[gem5-dev] changeset in gem5: cpu: Add a PC-value to the traffic generator ...

2015-03-02 Thread Stephan Diestelhorst via gem5-dev
changeset 7f67a8d786a2 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=7f67a8d786a2
description:
cpu: Add a PC-value to the traffic generator requests

Have the traffic generator add its masterID as the PC address to the
requests. That way, prefetchers (and other components) that use a PC
for request classification will see per-tester streams of requests.
This enables us to test strided prefetchers with the memchecker, too.

diffstat:

 src/cpu/testers/traffic_gen/generators.cc |  3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diffs (13 lines):

diff -r 9b71309d29f9 -r 7f67a8d786a2 src/cpu/testers/traffic_gen/generators.cc
--- a/src/cpu/testers/traffic_gen/generators.cc Mon Mar 02 04:00:29 2015 -0500
+++ b/src/cpu/testers/traffic_gen/generators.cc Mon Mar 02 04:00:31 2015 -0500
@@ -57,6 +57,9 @@
 {
 // Create new request
 Request *req = new Request(addr, size, flags, masterID);
+// Dummy PC to have PC-based prefetchers latch on; get entropy into higher
+// bits
+req->setPC(((Addr)masterID) << 2);
 
 // Embed it in a packet
 PacketPtr pkt = new Packet(req, cmd);
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Add byte mask to Packet::checkFunctional

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset b1d90d88420e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b1d90d88420e
description:
mem: Add byte mask to Packet::checkFunctional

This patch changes the valid-bytes start/end to a proper byte
mask. With the changes in timing introduced in previous patches there
are more packets waiting in queues, and regressions using the checker
CPU fail due to non-contiguous read data being found in the various
cache queues.

This patch also adds some more comments explaining what is going on,
and adds the fourth and missing case to Packet::checkFunctional.
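The four cases in the patch are the four ways two byte ranges can overlap. A compact way to see that the enumerated offsets are consistent is to compute the intersection directly; the sketch below is our own illustration (names like `computeOverlap` are hypothetical, not gem5 API) and agrees term-by-term with all four branches in the diff.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Overlap of a functional access [func_start, func_end) with a checked
// packet's data [val_start, val_end): offsets into each buffer plus the
// number of shared bytes (zero when the ranges are disjoint).
struct Overlap { std::uint64_t func_offset, val_offset, size; };

Overlap computeOverlap(std::uint64_t func_start, std::uint64_t func_end,
                       std::uint64_t val_start, std::uint64_t val_end) {
    std::uint64_t lo = std::max(func_start, val_start);
    std::uint64_t hi = std::min(func_end, val_end);
    if (lo >= hi)
        return {0, 0, 0};          // no overlap at all
    return {lo - func_start,       // offset into the functional data
            lo - val_start,        // offset into the checked packet
            hi - lo};              // bytes covered by both ranges
}
```

The fourth, previously missing case is val_start < func_start with val_end > func_end: the current packet is wholly inside the checked one, giving func_offset 0 and overlap_size equal to the full functional size, exactly as the new branch computes.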

diffstat:

 src/mem/packet.cc |  109 -
 src/mem/packet.hh |   13 +
 2 files changed, 46 insertions(+), 76 deletions(-)

diffs (195 lines):

diff -r 886d2458e0d6 -r b1d90d88420e src/mem/packet.cc
--- a/src/mem/packet.cc Mon Mar 02 04:00:49 2015 -0500
+++ b/src/mem/packet.cc Mon Mar 02 04:00:52 2015 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2011-2014 ARM Limited
+ * Copyright (c) 2011-2015 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -207,6 +207,8 @@
 if (isRead()) {
 if (func_start >= val_start && func_end <= val_end) {
 memcpy(getPtr(), _data + offset, getSize());
+if (bytesValid.empty())
+bytesValid.resize(getSize(), true);
 // complete overlap, and as the current packet is a read
 // we are done
 return true;
@@ -218,91 +220,64 @@
 
 // calculate offsets and copy sizes for the two byte arrays
 if (val_start < func_start && val_end <= func_end) {
+// the one we are checking against starts before and
+// ends before or the same
 val_offset = func_start - val_start;
 func_offset = 0;
 overlap_size = val_end - func_start;
 } else if (val_start >= func_start && val_end > func_end) {
+// the one we are checking against starts after or the
+// same, and ends after
 val_offset = 0;
 func_offset = val_start - func_start;
 overlap_size = func_end - val_start;
 } else if (val_start >= func_start && val_end <= func_end) {
+// the one we are checking against is completely
+// subsumed in the current packet, possibly starting
+// and ending at the same address
 val_offset = 0;
 func_offset = val_start - func_start;
 overlap_size = size;
+} else if (val_start < func_start && val_end > func_end) {
+// the current packet is completely subsumed in the
+// one we are checking against
+val_offset = func_start - val_start;
+func_offset = 0;
+overlap_size = func_end - func_start;
 } else {
-panic("BUG: Missed a case for a partial functional request");
+panic("Missed a case for checkFunctional with "
+  " %s 0x%x size %d, against 0x%x size %d\n",
+  cmdString(), getAddr(), getSize(), addr, size);
 }
 
-// Figure out how much of the partial overlap should be copied
-// into the packet and not overwrite previously found bytes.
-if (bytesValidStart == 0 && bytesValidEnd == 0) {
-// No bytes have been copied yet, just set indices
-// to found range
-bytesValidStart = func_offset;
-bytesValidEnd = func_offset + overlap_size;
-} else {
-// Some bytes have already been copied. Use bytesValid
-// indices and offset values to figure out how much data
-// to copy and where to copy it to.
-
-// Indice overlap conditions to check
-int a = func_offset - bytesValidStart;
-int b = (func_offset + overlap_size) - bytesValidEnd;
-int c = func_offset - bytesValidEnd;
-int d = (func_offset + overlap_size) - bytesValidStart;
-
-if (a >= 0 && b <= 0) {
-// bytes already in pkt data array are superset of
-// found bytes, will not copy any bytes
-overlap_size = 0;
-} else if (a < 0 && d >= 0 && b <= 0) {
-// found bytes will move bytesValidStart towards 0
-overlap_size = bytesValidStart - func_offset;
-bytesValidStart = func_offset;
-} else if (b > 0 && c <= 0 && a >= 0) {
-// found bytes will move bytesValidEnd
-// towards end of pkt data array
-   

[gem5-dev] changeset in gem5: mem: Split port retry for all different packe...

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset eddb533708cb in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=eddb533708cb
description:
mem: Split port retry for all different packet classes

This patch fixes a long-standing issue with the port flow
control. Before this patch the retry mechanism was shared between all
different packet classes. As a result, a snoop response could get
stuck behind a request waiting for a retry, even if the send/recv
functions were split. This caused message-dependent deadlocks in
stress-test scenarios.

The patch splits the retry into one per packet (message) class. Thus,
sendTimingReq has a corresponding recvReqRetry, sendTimingResp has
recvRespRetry etc. Most of the changes to the code involve simply
clarifying what type of request a specific object was accepting.

The biggest change in functionality is in the cache downstream packet
queue, facing the memory. This queue was shared by requests and snoop
responses, and it is now split into two queues, each with its own
flow control, but the same physical MasterPort. These changes fix
the previously seen deadlocks.
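The deadlock and its cure can be sketched abstractly: with one shared retry flag, a blocked request "uses up" the retry that a snoop response was waiting for. Tracking a pending retry per packet class removes the dependency. This is a hedged illustration of the idea only; the class and member names below are ours, not the actual gem5 port interface.

```cpp
#include <cassert>

// One pending-retry flag per packet (message) class, so a blocked
// request can no longer stall a snoop response behind it.
enum class PacketClass { Req = 0, Resp = 1, SnoopResp = 2 };

struct PortFlowControl {
    bool retryPending[3] = {false, false, false};

    // Attempt a timing send; on refusal, remember that *this class*
    // needs a retry, independently of the other classes.
    bool sendTiming(PacketClass c, bool peerReady) {
        if (!peerReady) {
            retryPending[static_cast<int>(c)] = true;
            return false;
        }
        return true;
    }

    // The peer signals readiness for exactly one class (analogous to
    // recvReqRetry / recvRespRetry); other classes stay blocked.
    bool recvRetry(PacketClass c) {
        bool wasPending = retryPending[static_cast<int>(c)];
        retryPending[static_cast<int>(c)] = false;
        return wasPending;
    }
};
```

A retry for the request class leaves a pending snoop-response retry untouched, which is precisely the message-class separation the patch introduces.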

diffstat:

 src/arch/x86/pagetable_walker.cc   |6 +-
 src/arch/x86/pagetable_walker.hh   |4 +-
 src/cpu/kvm/base.hh|4 +-
 src/cpu/minor/fetch1.cc|2 +-
 src/cpu/minor/fetch1.hh|4 +-
 src/cpu/minor/lsq.cc   |2 +-
 src/cpu/minor/lsq.hh   |4 +-
 src/cpu/o3/cpu.cc  |8 +-
 src/cpu/o3/cpu.hh  |4 +-
 src/cpu/o3/fetch.hh|2 +-
 src/cpu/o3/fetch_impl.hh   |2 +-
 src/cpu/o3/lsq.hh  |2 +-
 src/cpu/o3/lsq_impl.hh |2 +-
 src/cpu/simple/atomic.hh   |2 +-
 src/cpu/simple/timing.cc   |8 +-
 src/cpu/simple/timing.hh   |8 +-
 src/cpu/testers/directedtest/RubyDirectedTester.hh |2 +-
 src/cpu/testers/memtest/memtest.cc |2 +-
 src/cpu/testers/memtest/memtest.hh |2 +-
 src/cpu/testers/networktest/networktest.cc |2 +-
 src/cpu/testers/networktest/networktest.hh |2 +-
 src/cpu/testers/rubytest/RubyTester.hh |2 +-
 src/cpu/testers/traffic_gen/traffic_gen.cc |2 +-
 src/cpu/testers/traffic_gen/traffic_gen.hh |4 +-
 src/dev/dma_device.cc  |2 +-
 src/dev/dma_device.hh  |2 +-
 src/mem/addr_mapper.cc |8 +-
 src/mem/addr_mapper.hh |   12 +-
 src/mem/bridge.cc  |8 +-
 src/mem/bridge.hh  |4 +-
 src/mem/cache/base.cc  |2 +-
 src/mem/cache/base.hh  |   12 +-
 src/mem/cache/cache.hh |   15 +-
 src/mem/cache/cache_impl.hh|  120 +--
 src/mem/coherent_xbar.cc   |   12 +-
 src/mem/coherent_xbar.hh   |   20 +-
 src/mem/comm_monitor.cc|8 +-
 src/mem/comm_monitor.hh|   12 +-
 src/mem/dram_ctrl.cc   |4 +-
 src/mem/dram_ctrl.hh   |2 +-
 src/mem/dramsim2.cc|8 +-
 src/mem/dramsim2.hh|4 +-
 src/mem/external_slave.cc  |6 +-
 src/mem/mem_checker_monitor.cc |8 +-
 src/mem/mem_checker_monitor.hh |   12 +-
 src/mem/mport.hh   |6 +-
 src/mem/noncoherent_xbar.cc|2 +-
 src/mem/noncoherent_xbar.hh|   10 +-
 src/mem/packet_queue.cc|  153 +++
 src/mem/packet_queue.hh|  160 +++-
 src/mem/port.cc|   16 +-
 src/mem/port.hh|   48 --
 src/mem/qport.hh   |   65 +---
 src/mem/ruby/slicc_interface/AbstractController.cc |6 +-
 src/mem/ruby/slicc_interface/AbstractController.hh |5 +-
 src/mem/ruby/structures/RubyMemoryControl.hh   |2 +-
 src/mem/ruby/system/DMASequencer.cc|2 +-
 src/mem/ruby/system/DMASequencer.hh|2 +-
 src/mem/ruby/system/RubyPort.cc|8 +-
 src/mem/ruby/system/RubyPort.hh

[gem5-dev] changeset in gem5: arm: Don't truncate 16-bit ASIDs to 8 bits

2015-03-02 Thread Andreas Sandberg via gem5-dev
changeset 890269a13188 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=890269a13188
description:
arm: Don't truncate 16-bit ASIDs to 8 bits

The ISA code sometimes stores 16-bit ASIDs as 8-bit unsigned integers
and has a couple of inverted checks that mask out the high 8 bits of
an ASID if 16-bit ASIDs have been /enabled/. This changeset fixes both
of those issues.
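The corrected logic truncates only when 16-bit ASIDs are *disabled*, the opposite of the inverted check being removed. A minimal sketch of the fixed behavior (the helper name is ours, for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Only when large (16-bit) ASIDs are disabled should the ASID be
// truncated to its low 8 bits; with them enabled, keep all 16 bits.
std::uint16_t effectiveAsid(std::uint16_t asid, bool haveLargeAsid64) {
    if (!haveLargeAsid64)
        asid &= 0xff;  // mask(8): legacy 8-bit ASID space
    return asid;
}
```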

diffstat:

 src/arch/arm/isa.cc |  8 
 src/arch/arm/isa.hh |  2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diffs (38 lines):

diff -r fe09d1bc6721 -r 890269a13188 src/arch/arm/isa.cc
--- a/src/arch/arm/isa.cc   Mon Mar 02 04:00:27 2015 -0500
+++ b/src/arch/arm/isa.cc   Mon Mar 02 04:00:28 2015 -0500
@@ -1386,7 +1386,7 @@
 oc = sys->getThreadContext(x);
 assert(oc->getITBPtr() && oc->getDTBPtr());
 asid = bits(newVal, 63, 48);
-if (haveLargeAsid64)
+if (!haveLargeAsid64)
 asid &= mask(8);
 oc->getITBPtr()->flushAsid(asid, secure_lookup, target_el);
 oc->getDTBPtr()->flushAsid(asid, secure_lookup, target_el);
@@ -1941,10 +1941,10 @@
 }
 
 void
-ISA::tlbiVA(ThreadContext *tc, MiscReg newVal, uint8_t asid, bool secure_lookup,
-uint8_t target_el)
+ISA::tlbiVA(ThreadContext *tc, MiscReg newVal, uint16_t asid,
+bool secure_lookup, uint8_t target_el)
 {
-if (haveLargeAsid64)
+if (!haveLargeAsid64)
 asid &= mask(8);
 Addr va = ((Addr) bits(newVal, 43, 0)) << 12;
 System *sys = tc->getSystemPtr();
diff -r fe09d1bc6721 -r 890269a13188 src/arch/arm/isa.hh
--- a/src/arch/arm/isa.hh   Mon Mar 02 04:00:27 2015 -0500
+++ b/src/arch/arm/isa.hh   Mon Mar 02 04:00:28 2015 -0500
@@ -221,7 +221,7 @@
 assert(!cpsr.width);
 }
 
-void tlbiVA(ThreadContext *tc, MiscReg newVal, uint8_t asid,
+void tlbiVA(ThreadContext *tc, MiscReg newVal, uint16_t asid,
 bool secure_lookup, uint8_t target_el);
 
 void tlbiALL(ThreadContext *tc, bool secure_lookup, uint8_t target_el);


[gem5-dev] changeset in gem5: mem: Add option to force in-order insertion i...

2015-03-02 Thread Stephan Diestelhorst via gem5-dev
changeset 886d2458e0d6 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=886d2458e0d6
description:
mem: Add option to force in-order insertion in PacketQueue

By default, the packet queue is ordered by the ticks of the to-be-sent
packets. With the recent modifications of packets sinking their header
time when their response leaves the caches, there could be cases of
MSHR targets being allocated and ordered A, B, but their responses
being sent out in the order B, A. This led to inconsistencies in bus
traffic, in particular the snoop filter observing first a ReadExResp
and later a ReadRespWithInv. Logically, these were ordered the other
way around behind the MSHR, but due to the timing adjustments when
inserting into the PacketQueue, they were sent out in the wrong order
on the bus, confusing the snoop filter.

This patch adds a flag (off by default) such that these special cases
can request in-order insertion into the packet queue, which might
offset timing slightly. This is expected to occur rarely and not to
affect timing results.
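The mechanism in the diff below boils down to clamping the requested send tick to the tail of the queue, so a forced-order insertion always degenerates to a push_back. A hedged sketch of just that clamp (standalone, with hypothetical names; the real code operates on a list of deferred packets):

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

using Tick = std::uint64_t;

// If force_order is set and the requested tick would sort before the
// last queued entry, fudge it up to the tail tick so that insertion
// preserves arrival order (at a slight cost in timing fidelity).
Tick clampForOrder(const std::deque<Tick> &transmitTicks, Tick when,
                   bool force_order) {
    if (force_order && !transmitTicks.empty() && when < transmitTicks.back())
        return transmitTicks.back();
    return when;
}
```

Without the flag, the packet would be insertion-sorted ahead of later-tick entries and could overtake a logically earlier response, which is exactly the snoop-filter confusion described above.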

diffstat:

 src/mem/cache/cache_impl.hh |   9 +
 src/mem/packet_queue.cc |  27 ---
 src/mem/packet_queue.hh |   4 +++-
 src/mem/qport.hh|   5 +++--
 4 files changed, 35 insertions(+), 10 deletions(-)

diffs (109 lines):

diff -r 3e6a3eaac71b -r 886d2458e0d6 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:48 2015 -0500
+++ b/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:49 2015 -0500
@@ -1596,15 +1596,16 @@
 // invalidate it.
 pkt->cmd = MemCmd::ReadRespWithInvalidate;
 }
-DPRINTF(Cache, "%s created response: %s address %x size %d\n",
-__func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize());
-// Here we condiser forward_time, paying for just forward latency and
+// Here we consider forward_time, paying for just forward latency and
 // also charging the delay provided by the xbar.
 // forward_time is used as send_time in next allocateWriteBuffer().
 Tick forward_time = clockEdge(forwardLatency) + pkt->headerDelay;
 // Here we reset the timing of the packet.
 pkt->headerDelay = pkt->payloadDelay = 0;
-memSidePort->schedTimingSnoopResp(pkt, forward_time);
+DPRINTF(Cache, "%s created response: %s address %x size %d tick: %lu\n",
+__func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize(),
+forward_time);
+memSidePort->schedTimingSnoopResp(pkt, forward_time, true);
 }
 
 template
diff -r 3e6a3eaac71b -r 886d2458e0d6 src/mem/packet_queue.cc
--- a/src/mem/packet_queue.cc   Mon Mar 02 04:00:48 2015 -0500
+++ b/src/mem/packet_queue.cc   Mon Mar 02 04:00:49 2015 -0500
@@ -100,10 +100,11 @@
 }
 
 void
-PacketQueue::schedSendTiming(PacketPtr pkt, Tick when)
+PacketQueue::schedSendTiming(PacketPtr pkt, Tick when, bool force_order)
 {
-DPRINTF(PacketQueue, "%s for %s address %x size %d\n", __func__,
-pkt->cmdString(), pkt->getAddr(), pkt->getSize());
+DPRINTF(PacketQueue, "%s for %s address %x size %d when %lu ord: %i\n",
+__func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize(), when,
+force_order);
 
 // we can still send a packet before the end of this tick
 assert(when >= curTick());
@@ -118,9 +119,26 @@
   name());
 }
 
+// if requested, force the timing to be in-order by changing the when
+// parameter
+if (force_order && !transmitList.empty()) {
+Tick back = transmitList.back().tick;
+
+// fudge timing if required; relies on the code below to do the right
+// thing (push_back) with the updated time-stamp
+if (when < back) {
+DPRINTF(PacketQueue, "%s force_order shifted packet %s address "\
+"%x from %lu to %lu\n", __func__, pkt->cmdString(),
+pkt->getAddr(), when, back);
+when = back;
+}
+}
+
 // nothing on the list, or earlier than current front element,
 // schedule an event
 if (transmitList.empty() || when < transmitList.front().tick) {
+// force_order-ed in here only when list is empty
+assert(!force_order || transmitList.empty());
 // note that currently we ignore a potentially outstanding retry
 // and could in theory put a new packet at the head of the
 // transmit list before retrying the existing packet
@@ -143,6 +161,9 @@
 return;
 }
 
+// forced orders never need insertion in the middle
+assert(!force_order);
+
 // this belongs in the middle somewhere, insertion sort
 auto i = transmitList.begin();
 ++i; // already checked for insertion at front
diff -r 3e6a3eaac71b -r 886d2458e0d6 src/mem/packet_queue.hh
--- a/src/mem/packet_queue.hh   Mon Mar 02 04:00:48 2015 -0500
+++ b/src/mem/p

[gem5-dev] changeset in gem5: mem: Tidy up the cache debug messages

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset 9ba5e70964a4 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=9ba5e70964a4
description:
mem: Tidy up the cache debug messages

Avoid redundant inclusion of the name in the DPRINTF string.

diffstat:

 src/mem/cache/base.cc   |  10 +-
 src/mem/cache/base.hh   |   3 ++-
 src/mem/cache/cache_impl.hh |   7 +++
 3 files changed, 10 insertions(+), 10 deletions(-)

diffs (80 lines):

diff -r eddb533708cb -r 9ba5e70964a4 src/mem/cache/base.cc
--- a/src/mem/cache/base.cc Mon Mar 02 04:00:35 2015 -0500
+++ b/src/mem/cache/base.cc Mon Mar 02 04:00:37 2015 -0500
@@ -92,13 +92,13 @@
 BaseCache::CacheSlavePort::setBlocked()
 {
 assert(!blocked);
-DPRINTF(CachePort, "Cache port %s blocking new requests\n", name());
+DPRINTF(CachePort, "Port is blocking new requests\n");
 blocked = true;
 // if we already scheduled a retry in this cycle, but it has not yet
 // happened, cancel it
 if (sendRetryEvent.scheduled()) {
 owner.deschedule(sendRetryEvent);
-DPRINTF(CachePort, "Cache port %s deschedule retry\n", name());
+DPRINTF(CachePort, "Port descheduled retry\n");
 mustSendRetry = true;
 }
 }
@@ -107,10 +107,10 @@
 BaseCache::CacheSlavePort::clearBlocked()
 {
 assert(blocked);
-DPRINTF(CachePort, "Cache port %s accepting new requests\n", name());
+DPRINTF(CachePort, "Port is accepting new requests\n");
 blocked = false;
 if (mustSendRetry) {
-// @TODO: need to find a better time (next bus cycle?)
+// @TODO: need to find a better time (next cycle?)
 owner.schedule(sendRetryEvent, curTick() + 1);
 }
 }
@@ -118,7 +118,7 @@
 void
 BaseCache::CacheSlavePort::processSendRetry()
 {
-DPRINTF(CachePort, "Cache port %s sending retry\n", name());
+DPRINTF(CachePort, "Port is sending retry\n");
 
 // reset the flag and call retry
 mustSendRetry = false;
diff -r eddb533708cb -r 9ba5e70964a4 src/mem/cache/base.hh
--- a/src/mem/cache/base.hh Mon Mar 02 04:00:35 2015 -0500
+++ b/src/mem/cache/base.hh Mon Mar 02 04:00:37 2015 -0500
@@ -129,7 +129,8 @@
  */
 void requestBus(RequestCause cause, Tick time)
 {
-DPRINTF(CachePort, "Asserting bus request for cause %d\n", cause);
+DPRINTF(CachePort, "Scheduling request at %llu due to %d\n",
+time, cause);
 reqQueue.schedSendEvent(time);
 }
 
diff -r eddb533708cb -r 9ba5e70964a4 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:35 2015 -0500
+++ b/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:37 2015 -0500
@@ -261,8 +261,7 @@
 markInServiceInternal(mshr, pending_dirty_resp);
 #if 0
 if (mshr->originalCmd == MemCmd::HardPFReq) {
-DPRINTF(HWPrefetch, "%s:Marking a HW_PF in service\n",
-name());
+DPRINTF(HWPrefetch, "Marking a HW_PF in service\n");
 //Also clear pending if need be
 if (!prefetcher->havePending())
 {
@@ -324,10 +323,10 @@
 // that can modify its value.
 blk = tags->accessBlock(pkt->getAddr(), pkt->isSecure(), lat, id);
 
-DPRINTF(Cache, "%s%s %x (%s) %s %s\n", pkt->cmdString(),
+DPRINTF(Cache, "%s%s %x (%s) %s\n", pkt->cmdString(),
 pkt->req->isInstFetch() ? " (ifetch)" : "",
 pkt->getAddr(), pkt->isSecure() ? "s" : "ns",
-blk ? "hit" : "miss", blk ? blk->print() : "");
+blk ? "hit " + blk->print() : "miss");
 
 // Writeback handling is special case.  We can write the block into
 // the cache without having a writeable copy (or any copy at all).


[gem5-dev] changeset in gem5: dev, arm: Clean up PL011 and rewrite interrup...

2015-03-02 Thread Andreas Sandberg via gem5-dev
changeset 4ed87af2930f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=4ed87af2930f
description:
dev, arm: Clean up PL011 and rewrite interrupt handling

The ARM PL011 UART model didn't clear and raise interrupts
correctly. This changeset rewrites the interrupt handling entirely,
making it simpler and fixing several cases where the correct
interrupts weren't raised or cleared. Additionally, it cleans up many
other aspects of the code.
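A key simplification visible in the diff is that the masked interrupt status is derived on demand from the raw status and the mask, rather than stored in a separate `maskInt` register that can go stale. A minimal sketch of that pattern (illustrative struct, not the actual Pl011 class):

```cpp
#include <cassert>
#include <cstdint>

// Raw interrupt status plus a mask; the masked status (what would be
// forwarded to the interrupt controller) is always computed, never
// cached, so raise/clear cannot leave it inconsistent.
struct UartIntState {
    std::uint16_t rawInt = 0;
    std::uint16_t imsc = 0;  // interrupt mask set/clear register

    void raiseInterrupts(std::uint16_t bits) { rawInt |= bits; }
    void clearInterrupts(std::uint16_t bits) { rawInt &= ~bits; }
    std::uint16_t maskInt() const { return rawInt & imsc; }
};
```

This matches the diff's replacement of the stored `maskInt` member with a `maskInt()` accessor and the paired `clearInterrupts`/`raiseInterrupts` helpers.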

diffstat:

 src/dev/arm/pl011.cc |  144 
 src/dev/arm/pl011.hh |  152 --
 2 files changed, 144 insertions(+), 152 deletions(-)

diffs (truncated from 442 to 300 lines):

diff -r 4f8c1bd6fdb8 -r 4ed87af2930f src/dev/arm/pl011.cc
--- a/src/dev/arm/pl011.cc  Mon Mar 02 04:00:42 2015 -0500
+++ b/src/dev/arm/pl011.cc  Mon Mar 02 04:00:44 2015 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010 ARM Limited
+ * Copyright (c) 2010, 2015 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -38,23 +38,29 @@
  * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
  *
  * Authors: Ali Saidi
+ *  Andreas Sandberg
  */
 
+#include "dev/arm/pl011.hh"
+
 #include "base/trace.hh"
 #include "debug/Checkpoint.hh"
 #include "debug/Uart.hh"
 #include "dev/arm/amba_device.hh"
 #include "dev/arm/base_gic.hh"
-#include "dev/arm/pl011.hh"
 #include "dev/terminal.hh"
 #include "mem/packet.hh"
 #include "mem/packet_access.hh"
 #include "sim/sim_exit.hh"
+#include "params/Pl011.hh"
 
-Pl011::Pl011(const Params *p)
-: Uart(p, 0xfff), control(0x300), fbrd(0), ibrd(0), lcrh(0), ifls(0x12),
-  imsc(0), rawInt(0), maskInt(0), intNum(p->int_num), gic(p->gic),
-  endOnEOT(p->end_on_eot), intDelay(p->int_delay), intEvent(this)
+Pl011::Pl011(const Pl011Params *p)
+: Uart(p, 0xfff),
+  intEvent(this),
+  control(0x300), fbrd(0), ibrd(0), lcrh(0), ifls(0x12),
+  imsc(0), rawInt(0),
+  gic(p->gic), endOnEOT(p->end_on_eot), intNum(p->int_num),
+  intDelay(p->int_delay)
 {
 }
 
@@ -75,17 +81,23 @@
 switch(daddr) {
   case UART_DR:
 data = 0;
-if (term->dataAvailable())
+if (term->dataAvailable()) {
 data = term->in();
+// Since we don't simulate a FIFO for incoming data, we
+// assume it's empty and clear RXINTR and RTINTR.
+clearInterrupts(UART_RXINTR | UART_RTINTR);
+}
 break;
   case UART_FR:
-// For now we're infintely fast, so TX is never full, always empty,
-// always clear to send
-data = UART_FR_TXFE | UART_FR_CTS;
-if (!term->dataAvailable())
-data |= UART_FR_RXFE;
-DPRINTF(Uart, "Reading FR register as %#x rawInt=0x%x imsc=0x%x maskInt=0x%x\n",
-data, rawInt, imsc, maskInt);
+data =
+UART_FR_CTS | // Clear To Send
+(!term->dataAvailable() ? UART_FR_RXFE : 0) | // RX FIFO Empty
+UART_FR_TXFE; // TX FIFO empty
+
+DPRINTF(Uart,
+"Reading FR register as %#x rawInt=0x%x "
+"imsc=0x%x maskInt=0x%x\n",
+data, rawInt, imsc, maskInt());
 break;
   case UART_CR:
 data = control;
@@ -110,8 +122,8 @@
 DPRINTF(Uart, "Reading Raw Int status as 0x%x\n", rawInt);
 break;
   case UART_MIS:
-DPRINTF(Uart, "Reading Masked Int status as 0x%x\n", rawInt);
-data = maskInt;
+DPRINTF(Uart, "Reading Masked Int status as 0x%x\n", maskInt());
+data = maskInt();
 break;
   default:
 if (readId(pkt, AMBA_ID, pioAddr)) {
@@ -182,15 +194,11 @@
 exitSimLoop("UART received EOT", 0);
 
 term->out(data & 0xFF);
-
-//raw interrupt is set regardless of imsc.txim
-rawInt.txim = 1;
-if (imsc.txim) {
-DPRINTF(Uart, "TX int enabled, scheduling interruptt\n");
-if (!intEvent.scheduled())
-schedule(intEvent, curTick() + intDelay);
-}
-
+// We're supposed to clear TXINTR when this register is
+// written to, however. since we're also infinitely fast, we
+// need to immediately raise it again.
+clearInterrupts(UART_TXINTR);
+raiseInterrupts(UART_TXINTR);
 break;
   case UART_CR:
 control = data;
@@ -208,35 +216,13 @@
 ifls = data;
 break;
   case UART_IMSC:
-imsc = data;
-
-if (imsc.feim || imsc.peim || imsc.beim || imsc.oeim || imsc.rsvd)
-panic("Unknown interrupt enabled\n");
-
-// rimim, ctsmim, dcdmim, dsrmim can be enabled but are ignored
-// they are supposed to interrupt on a change of status in the line
-// which we should never have since our terminal is happy to always
-   

[gem5-dev] changeset in gem5: mem: Unify all cache DPRINTF address formatting

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset d1387fcd94b8 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=d1387fcd94b8
description:
mem: Unify all cache DPRINTF address formatting

This patch changes all the DPRINTF messages in the cache to use
'%#llx' every time a packet address is printed. The inclusion of '#'
ensures '0x' is prepended, and since the address type is a uint64_t,
%x really should be %llx.
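The point is worth spelling out: passing a 64-bit value to '%x' is undefined behavior in variadic formatting (it expects an unsigned int), while '%#llx' both takes the full 64 bits and prepends the '0x' prefix. A quick self-contained check of the formatting (our own demo, not gem5 code):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <cstring>

// Format a 64-bit address with '%#llx' and verify the '0x'-prefixed
// lowercase hex result; returns 0 when the output matches.
int formatDemo() {
    std::uint64_t addr = 0xdeadbeef12ULL;
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%#llx",
                  static_cast<unsigned long long>(addr));
    return std::strcmp(buf, "0xdeadbeef12");
}
```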

diffstat:

 src/mem/cache/cache_impl.hh |  73 
 src/mem/cache/mshr.cc   |   4 +-
 2 files changed, 41 insertions(+), 36 deletions(-)

diffs (truncated from 318 to 300 lines):

diff -r 1072b1381560 -r d1387fcd94b8 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:54 2015 -0500
+++ b/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:56 2015 -0500
@@ -179,7 +179,7 @@
 // appended themselves to this cache before knowing the store
 // will fail.
 blk->status |= BlkDirty;
-DPRINTF(Cache, "%s for %s address %x size %d (write)\n", __func__,
+DPRINTF(Cache, "%s for %s addr %#llx size %d (write)\n", __func__,
 pkt->cmdString(), pkt->getAddr(), pkt->getSize());
 } else if (pkt->isRead()) {
 if (pkt->isLLSC()) {
@@ -241,7 +241,7 @@
 assert(blk != tempBlock);
 tags->invalidate(blk);
 blk->invalidate();
-DPRINTF(Cache, "%s for %s address %x size %d (invalidation)\n",
+DPRINTF(Cache, "%s for %s addr %#llx size %d (invalidation)\n",
 __func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize());
 }
 }
@@ -308,7 +308,7 @@
 // sanity check
 assert(pkt->isRequest());
 
-DPRINTF(Cache, "%s for %s address %x size %d\n", __func__,
+DPRINTF(Cache, "%s for %s addr %#llx size %d\n", __func__,
 pkt->cmdString(), pkt->getAddr(), pkt->getSize());
 if (pkt->req->isUncacheable()) {
 uncacheableFlush(pkt);
@@ -323,9 +323,9 @@
 // that can modify its value.
 blk = tags->accessBlock(pkt->getAddr(), pkt->isSecure(), lat, id);
 
-DPRINTF(Cache, "%s%s %x (%s) %s\n", pkt->cmdString(),
+DPRINTF(Cache, "%s%s addr %#llx size %d (%s) %s\n", pkt->cmdString(),
 pkt->req->isInstFetch() ? " (ifetch)" : "",
-pkt->getAddr(), pkt->isSecure() ? "s" : "ns",
+pkt->getAddr(), pkt->getSize(), pkt->isSecure() ? "s" : "ns",
 blk ? "hit " + blk->print() : "miss");
 
 // Writeback handling is special case.  We can write the block into
@@ -392,7 +392,7 @@
 void
 Cache::recvTimingSnoopResp(PacketPtr pkt)
 {
-DPRINTF(Cache, "%s for %s address %x size %d\n", __func__,
+DPRINTF(Cache, "%s for %s addr %#llx size %d\n", __func__,
 pkt->cmdString(), pkt->getAddr(), pkt->getSize());
 
 assert(pkt->isResponse());
@@ -409,7 +409,7 @@
 assert(pkt->cmd == MemCmd::HardPFResp);
 // Check if it's a prefetch response and handle it. We shouldn't
 // get any other kinds of responses without FRRs.
-DPRINTF(Cache, "Got prefetch response from above for addr %#x (%s)\n",
+DPRINTF(Cache, "Got prefetch response from above for addr %#llx 
(%s)\n",
 pkt->getAddr(), pkt->isSecure() ? "s" : "ns");
 recvTimingResp(pkt);
 return;
@@ -470,7 +470,7 @@
 if (pkt->memInhibitAsserted()) {
 // a cache above us (but not where the packet came from) is
 // responding to the request
-DPRINTF(Cache, "mem inhibited on 0x%x (%s): not responding\n",
+DPRINTF(Cache, "mem inhibited on addr %#llx (%s): not responding\n",
 pkt->getAddr(), pkt->isSecure() ? "s" : "ns");
 assert(!pkt->req->isUncacheable());
 
@@ -678,6 +678,10 @@
 
 // Coalesce unless it was a software prefetch (see above).
 if (pkt) {
+DPRINTF(Cache, "%s coalescing MSHR for %s addr %#llx size 
%d\n",
+__func__, pkt->cmdString(), pkt->getAddr(),
+pkt->getSize());
+
 assert(pkt->req->masterId() < system->maxMasters());
 mshr_hits[pkt->cmdToIndex()][pkt->req->masterId()]++;
 if (mshr->threadNum != 0/*pkt->req->threadId()*/) {
@@ -833,7 +837,7 @@
 PacketPtr pkt = new Packet(cpu_pkt->req, cmd, blkSize);
 
 pkt->allocate();
-DPRINTF(Cache, "%s created %s address %x size %d\n",
+DPRINTF(Cache, "%s created %s addr %#llx size %d\n",
 __func__, pkt->cmdString(), pkt->getAddr(), pkt->getSize());
 return pkt;
 }
@@ -863,19 +867,19 @@
 if (blk && blk->isValid()) {
 tags->invalidate(blk);
 blk->invalidate();
-DPRINTF(Cache, "rcvd mem-inhibited %s on 0x%x (%s):"
+DPRINTF(Cache, "rcvd mem-inhibited %s on %#llx (%s):"
 " invalidating\n",
 pkt->cmdString(), pkt->getAddr(),
   

[gem5-dev] changeset in gem5: stats: Update stats to reflect cache and inte...

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset 8a20e2a1562d in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=8a20e2a1562d
description:
stats: Update stats to reflect cache and interconnect changes

This is a bulk update of stats to match the changes to cache timing,
interconnect timing, and a few minor changes to the o3 CPU.

diffstat:

 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-minor/stats.txt |  1493 +-
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3-dual/stats.txt |  3848 +++---
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-o3/stats.txt |  2079 +-
 tests/long/fs/10.linux-boot/ref/alpha/linux/tsunami-switcheroo-full/stats.txt |  3027 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor-dual/stats.txt |  4418 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-minor/stats.txt |  1832 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-checker/stats.txt |  2532 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3-dual/stats.txt |  5864 
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-o3/stats.txt |  2460 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-full/stats.txt |  3756 ++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-o3/stats.txt |  4062 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview-switcheroo-timing/stats.txt |  2795 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-minor-dual/stats.txt |  4969 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-minor/stats.txt |  2129 +-
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3-checker/stats.txt |  2818 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3-dual/stats.txt |  6279 -
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-o3/stats.txt |  2704 ++--
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-full/stats.txt |  4291 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-o3/stats.txt |  4414 +++---
 tests/long/fs/10.linux-boot/ref/arm/linux/realview64-switcheroo-timing/stats.txt |  3220 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt |  2555 ++--
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-switcheroo-full/stats.txt |  3105 ++--
 tests/long/se/10.mcf/ref/arm/linux/minor-timing/stats.txt |   584 +-
 tests/long/se/10.mcf/ref/arm/linux/o3-timing/stats.txt |  1595 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-atomic/stats.txt |    22 +-
 tests/long/se/10.mcf/ref/arm/linux/simple-timing/stats.txt |   228 +-
 tests/long/se/10.mcf/ref/sparc/linux/simple-timing/stats.txt |   572 +-
 tests/long/se/10.mcf/ref/x86/linux/o3-timing/stats.txt |  1439 +-
 tests/long/se/10.mcf/ref/x86/linux/simple-timing/stats.txt |   572 +-
 tests/long/se/20.parser/ref/alpha/tru64/minor-timing/stats.txt |  1034 +-
 tests/long/se/20.parser/ref/arm/linux/minor-timing/stats.txt |  1050 +-
 tests/long/se/20.parser/ref/arm/linux/o3-timing/stats.txt |  1628 +-
 tests/long/se/20.parser/ref/arm/linux/simple-atomic/stats.txt |    22 +-
 tests/long/se/20.parser/ref/arm/linux/simple-timing/stats.txt |   282 +-
 tests/long/se/20.parser/ref/x86/linux/o3-timing/stats.txt |  1657 +-
 tests/long/se/20.parser/ref/x86/linux/simple-timing/stats.txt |   452 +-
 tests/long/se/30.eon/ref/alpha/tru64/minor-timing/stats.txt |   500 +-
 tests/long/se/30.eon/ref/alpha/tru64/o3-timing/stats.txt |  1338 +-
 tests/long/se/30.eon/ref/alpha/tru64/simple-timing/stats.txt |   564 +-
 tests/long/se/30.eon/ref/arm/linux/minor-timing/stats.txt |   708 +-
 tests/long/se/30.eon/ref/arm/linux/o3-timing/stats.txt |  1464 +-
 tests/long/se/30.eon/ref/arm/linux/simple-atomic/stats.txt |    22 +-
 tests/long/se/30.eon/ref/arm/linux/simple-timing/stats.txt |   250 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/minor-timing/stats.txt |   730 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/o3-timing/stats.txt |  1404 +-
 tests/long/se/40.perlbmk/ref/alpha/tru64/simple-timing/stats.txt |   430 +-
 tests/long/se/40.perlbmk/ref/arm/linux/minor-timing/stats.txt |   924 +-
 tests/long/se/40.perlbmk/ref/arm/linux/o3-timing/stats.txt

[gem5-dev] changeset in gem5: arm: Share a port for the two table walker ob...

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset 4f8c1bd6fdb8 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=4f8c1bd6fdb8
description:
arm: Share a port for the two table walker objects

This patch changes how the MMU and table walkers are created such that
a single port is used to connect the MMU and the TLBs to the memory
system. Previously two ports were needed as there are two table walker
objects (stage one and stage two), and they both had a port. Now the
port itself is moved to the Stage2MMU, and each TableWalker is simply
using the port from the parent.

By using the same port we also remove the need for having an
additional crossbar joining the two ports before the walker cache or
the L2. This simplifies the creation of the CPU cache topology in
BaseCPU.py considerably. Moreover, for naming and symmetry reasons,
the TLB walker port is connected through the stage-one table walker
thus making the naming identical to x86. Along the same line, we use
the stage-one table walker to generate the master id that is used by
all TLB-related requests.
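
Outside gem5, the ownership structure this describes can be sketched as a
parent MMU that owns the single memory-side port, with both walker objects
borrowing it. All class and method names below are illustrative stand-ins,
not the actual gem5 C++ interfaces:

```python
class Port:
    """Stand-in for a master port; records the requests sent through it."""
    def __init__(self, owner):
        self.owner = owner
        self.sent = []

    def send(self, addr):
        self.sent.append(addr)


class TableWalker:
    """A stage-1 or stage-2 walker that borrows its parent MMU's port."""
    def __init__(self, mmu):
        self.mmu = mmu

    def walk(self, addr):
        # Delegate to the shared port instead of owning one.
        self.mmu.port.send(addr)


class Stage2MMU:
    """Owns the single port shared by both table walkers."""
    def __init__(self):
        self.port = Port(self)
        self.stage1_walker = TableWalker(self)
        self.stage2_walker = TableWalker(self)


mmu = Stage2MMU()
mmu.stage1_walker.walk(0x1000)
mmu.stage2_walker.walk(0x2000)
# Both walks travel through the same port, so no extra crossbar is
# needed to join two separate walker ports.
assert mmu.port.sent == [0x1000, 0x2000]
```

Because both walkers reach memory through one object, the walker cache or L2
sees a single upstream port, which is what removes the extra crossbar.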

diffstat:

 src/arch/arm/ArmTLB.py   |  22 ++---
 src/arch/arm/stage2_mmu.cc   |  43 +++---
 src/arch/arm/stage2_mmu.hh   |  58 ---
 src/arch/arm/table_walker.cc |  72 +++
 src/arch/arm/table_walker.hh |  63 +++---
 src/arch/arm/tlb.cc  |  12 +-
 src/arch/arm/tlb.hh  |  13 +--
 src/cpu/BaseCPU.py   |  26 +--
 8 files changed, 161 insertions(+), 148 deletions(-)

diffs (truncated from 674 to 300 lines):

diff -r 4408a83f7881 -r 4f8c1bd6fdb8 src/arch/arm/ArmTLB.py
--- a/src/arch/arm/ArmTLB.pyMon Mar 02 04:00:41 2015 -0500
+++ b/src/arch/arm/ArmTLB.pyMon Mar 02 04:00:42 2015 -0500
@@ -1,6 +1,6 @@
 # -*- mode:python -*-
 
-# Copyright (c) 2009, 2013 ARM Limited
+# Copyright (c) 2009, 2013, 2015 ARM Limited
 # All rights reserved.
 #
 # The license below extends only to copyright in the software and shall
@@ -48,11 +48,17 @@
 cxx_class = 'ArmISA::TableWalker'
 cxx_header = "arch/arm/table_walker.hh"
 is_stage2 =  Param.Bool(False, "Is this object for stage 2 translation?")
-port = MasterPort("Port for TableWalker to do walk the translation with")
-sys = Param.System(Parent.any, "system object parameter")
 num_squash_per_cycle = Param.Unsigned(2,
 "Number of outstanding walks that can be squashed per cycle")
 
+# The port to the memory system. This port ultimately belongs
+# to the Stage2MMU, and is shared by the two table walkers, but we
+# access it through the ITB and DTB walker objects in the CPU for
+# symmetry with the other ISAs.
+port = MasterPort("Port used by the two table walkers")
+
+sys = Param.System(Parent.any, "system object parameter")
+
 class ArmTLB(SimObject):
 type = 'ArmTLB'
 cxx_class = 'ArmISA::TLB'
@@ -77,10 +83,16 @@
 tlb = Param.ArmTLB("Stage 1 TLB")
 stage2_tlb = Param.ArmTLB("Stage 2 TLB")
 
+sys = Param.System(Parent.any, "system object parameter")
+
 class ArmStage2IMMU(ArmStage2MMU):
+# We rely on the itb being a parameter of the CPU, and get the
+# appropriate object that way
 tlb = Parent.itb
-stage2_tlb = ArmStage2TLB(walker = ArmStage2TableWalker())
+stage2_tlb = ArmStage2TLB()
 
 class ArmStage2DMMU(ArmStage2MMU):
+# We rely on the dtb being a parameter of the CPU, and get the
+# appropriate object that way
 tlb = Parent.dtb
-stage2_tlb = ArmStage2TLB(walker = ArmStage2TableWalker())
+stage2_tlb = ArmStage2TLB()
diff -r 4408a83f7881 -r 4f8c1bd6fdb8 src/arch/arm/stage2_mmu.cc
--- a/src/arch/arm/stage2_mmu.ccMon Mar 02 04:00:41 2015 -0500
+++ b/src/arch/arm/stage2_mmu.ccMon Mar 02 04:00:42 2015 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2012-2013 ARM Limited
+ * Copyright (c) 2012-2013, 2015 ARM Limited
  * All rights reserved
  *
  * The license below extends only to copyright in the software and shall
@@ -37,29 +37,31 @@
  * Authors: Thomas Grocutt
  */
 
+#include "arch/arm/stage2_mmu.hh"
 #include "arch/arm/faults.hh"
-#include "arch/arm/stage2_mmu.hh"
 #include "arch/arm/system.hh"
+#include "arch/arm/table_walker.hh"
 #include "arch/arm/tlb.hh"
 #include "cpu/base.hh"
 #include "cpu/thread_context.hh"
-#include "debug/Checkpoint.hh"
-#include "debug/TLB.hh"
-#include "debug/TLBVerbose.hh"
 
 using namespace ArmISA;
 
 Stage2MMU::Stage2MMU(const Params *p)
-: SimObject(p), _stage1Tlb(p->tlb), _stage2Tlb(p->stage2_tlb)
+: SimObject(p), _stage1Tlb(p->tlb), _stage2Tlb(p->stage2_tlb),
+  port(_stage1Tlb->getTableWalker(), p->sys),
+  masterId(p->sys->getMasterId(_stage1Tlb->getTableWalker()->name()))
 {
-stage1Tlb()->setMMU(this);
-stage2Tlb()->setMMU(this);
+// we use the stage-on

[gem5-dev] changeset in gem5: arm: Correctly access the stack pointer in GDB

2015-03-02 Thread Andreas Sandberg via gem5-dev
changeset fe09d1bc6721 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=fe09d1bc6721
description:
arm: Correctly access the stack pointer in GDB

We currently use INTREG_X31 instead of INTREG_SPX when accessing the
stack pointer in GDB. gem5 normally uses INTREG_SPX to access the
stack pointer, which gets mapped to the stack pointer (INTREG_SPn)
corresponding to the current exception level. This changeset updates
the GDB interface to use SPX instead of X31 (which is always zero)
when transferring CPU state to gdb.
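
The distinction can be illustrated with a toy register-file model
(illustrative only; the real mapping lives in gem5's ARM ISA code): index 31
always reads as zero, while SPX resolves to the stack pointer banked per
exception level.

```python
class RegFile:
    """Toy AArch64-style register file: 31 general registers, a zero
    register at index 31, and one banked stack pointer per exception
    level, selected through a single 'spx' alias."""
    def __init__(self):
        self.x = [0] * 31          # x0-x30
        self.sp = {0: 0, 1: 0}     # SP_EL0, SP_EL1
        self.el = 1                # current exception level

    def read(self, name):
        if name == 'x31':
            return 0               # the zero register: always reads 0
        if name == 'spx':
            return self.sp[self.el]  # SPX maps to SP of the current EL
        return self.x[int(name[1:])]


rf = RegFile()
rf.sp[1] = 0x7fff0000
assert rf.read('x31') == 0            # what the old GDB code transferred
assert rf.read('spx') == 0x7fff0000   # what the fixed code transfers
```

Since writing CPSR/PSTATE can change the current exception level, a
write-back has to set CPSR before SPX, matching the ordering comment in the
patch.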

diffstat:

 src/arch/arm/remote_gdb.cc |  13 +
 src/arch/arm/remote_gdb.hh |   1 +
 2 files changed, 10 insertions(+), 4 deletions(-)

diffs (47 lines):

diff -r f7d17d8a854c -r fe09d1bc6721 src/arch/arm/remote_gdb.cc
--- a/src/arch/arm/remote_gdb.ccMon Mar 02 04:00:27 2015 -0500
+++ b/src/arch/arm/remote_gdb.ccMon Mar 02 04:00:27 2015 -0500
@@ -204,9 +204,10 @@
 memset(gdbregs.regs, 0, gdbregs.bytes());
 
 if (inAArch64(context)) {  // AArch64
-// x0-x31
-for (int i = 0; i < 32; ++i)
+// x0-x30
+for (int i = 0; i < 31; ++i)
 gdbregs.regs64[GDB64_X0 + i] = context->readIntReg(INTREG_X0 + i);
+gdbregs.regs64[GDB64_SPX] = context->readIntReg(INTREG_SPX);
 // pc
 gdbregs.regs64[GDB64_PC] = context->pcState().pc();
 // cpsr
@@ -262,13 +263,17 @@
 
 DPRINTF(GDBAcc, "setregs in remotegdb \n");
 if (inAArch64(context)) {  // AArch64
-// x0-x31
-for (int i = 0; i < 32; ++i)
+// x0-x30
+for (int i = 0; i < 31; ++i)
 context->setIntReg(INTREG_X0 + i, gdbregs.regs64[GDB64_X0 + i]);
 // pc
 context->pcState(gdbregs.regs64[GDB64_PC]);
 // cpsr
 context->setMiscRegNoEffect(MISCREG_CPSR, gdbregs.regs64[GDB64_CPSR]);
+// Update the stack pointer. This should be done after
+// updating CPSR/PSTATE since that might affect how SPX gets
+// mapped.
+context->setIntReg(INTREG_SPX, gdbregs.regs64[GDB64_SPX]);
 // v0-v31
 for (int i = 0; i < 128; i += 4) {
 int gdboff = GDB64_V0_32 + i;
diff -r f7d17d8a854c -r fe09d1bc6721 src/arch/arm/remote_gdb.hh
--- a/src/arch/arm/remote_gdb.hhMon Mar 02 04:00:27 2015 -0500
+++ b/src/arch/arm/remote_gdb.hhMon Mar 02 04:00:27 2015 -0500
@@ -68,6 +68,7 @@
 // AArch64 registers
 enum {
 GDB64_X0 = 0,
+GDB64_SPX = 31,
 GDB64_PC = 32,
 GDB64_CPSR = 33,
 GDB64_V0 = 34,
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: arm: Fix broken page table permissions checks...

2015-03-02 Thread Andreas Sandberg via gem5-dev
changeset f7d17d8a854c in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=f7d17d8a854c
description:
arm: Fix broken page table permissions checks in remote GDB

The remote GDB interface currently doesn't check if translations are
valid before reading memory. This causes a panic when GDB tries to
access unmapped memory (e.g., when getting a stack trace). There are
two reasons for this: 1) The function used to check for valid
translations (virtvalid()) doesn't work and panics on invalid
translations. 2) The method in the GDB interface used to test if a
translation is valid (RemoteGDB::acc) always returns true regardless
of the return from virtvalid().

This changeset fixes both of these issues.
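
The fixed access check walks every page the request touches, in the style of
the ChunkGenerator loop in the patch. A self-contained sketch, with a set of
valid page numbers standing in for the real translation check:

```python
def chunks(base, length, chunk_size):
    """Yield the start address of every chunk overlapping [base, base+length)."""
    addr = base
    end = base + length
    while addr < end:
        yield addr
        # Advance to the start of the next aligned chunk.
        addr = (addr // chunk_size + 1) * chunk_size


def acc(valid_pages, va, length, page_bytes=4096):
    """Accessible only if every touched page has a valid translation."""
    return all((a // page_bytes) in valid_pages
               for a in chunks(va, length, page_bytes))


valid = {0, 1}                       # pages 0 and 1 are mapped
assert acc(valid, 0x0ff0, 0x20)      # spans pages 0 and 1: accessible
assert not acc(valid, 0x1ff0, 0x20)  # spans page 1 and unmapped page 2
```

Note the old code returned true as soon as *one* page was valid; the fixed
loop returns false as soon as one page is invalid.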

diffstat:

 src/arch/arm/remote_gdb.cc |  15 ++-
 src/arch/arm/vtophys.cc|  27 +++
 2 files changed, 25 insertions(+), 17 deletions(-)

diffs (87 lines):

diff -r 4206946d60fe -r f7d17d8a854c src/arch/arm/remote_gdb.cc
--- a/src/arch/arm/remote_gdb.ccThu Feb 26 09:58:26 2015 -0600
+++ b/src/arch/arm/remote_gdb.ccMon Mar 02 04:00:27 2015 -0500
@@ -142,6 +142,7 @@
 #include "arch/arm/system.hh"
 #include "arch/arm/utility.hh"
 #include "arch/arm/vtophys.hh"
+#include "base/chunk_generator.hh"
 #include "base/intmath.hh"
 #include "base/remote_gdb.hh"
 #include "base/socket.hh"
@@ -172,16 +173,12 @@
 RemoteGDB::acc(Addr va, size_t len)
 {
 if (FullSystem) {
-Addr last_va;
-va   = truncPage(va);
-last_va  = roundPage(va + len);
-
-do  {
-if (virtvalid(context, va)) {
-return true;
+for (ChunkGenerator gen(va, len, PageBytes); !gen.done(); gen.next()) {
+if (!virtvalid(context, gen.addr())) {
+DPRINTF(GDBAcc, "acc:   %#x mapping is invalid\n", va);
+return false;
 }
-va += PageBytes;
-} while (va < last_va);
+}
 
 DPRINTF(GDBAcc, "acc:   %#x mapping is valid\n", va);
 return true;
diff -r 4206946d60fe -r f7d17d8a854c src/arch/arm/vtophys.cc
--- a/src/arch/arm/vtophys.cc   Thu Feb 26 09:58:26 2015 -0600
+++ b/src/arch/arm/vtophys.cc   Mon Mar 02 04:00:27 2015 -0500
@@ -63,8 +63,8 @@
 fatal("VTOPHYS: Can't convert vaddr to paddr on ARM without a thread 
context");
 }
 
-Addr
-ArmISA::vtophys(ThreadContext *tc, Addr addr)
+static std::pair
+try_translate(ThreadContext *tc, Addr addr)
 {
 Fault fault;
 // Set up a functional memory Request to pass to the TLB
@@ -82,22 +82,33 @@
 tlb = static_cast(tc->getDTBPtr());
 fault = tlb->translateFunctional(&req, tc, BaseTLB::Read, TLB::NormalTran);
 if (fault == NoFault)
-return req.getPaddr();
+return std::make_pair(true, req.getPaddr());
 
 tlb = static_cast(tc->getITBPtr());
 fault = tlb->translateFunctional(&req, tc, BaseTLB::Read, TLB::NormalTran);
 if (fault == NoFault)
-return req.getPaddr();
+return std::make_pair(true, req.getPaddr());
 
-panic("Table walkers support functional accesses. We should never get 
here\n");
+return std::make_pair(false, 0);
+}
+
+Addr
+ArmISA::vtophys(ThreadContext *tc, Addr addr)
+{
+const std::pair translation(try_translate(tc, addr));
+
+if (translation.first)
+return translation.second;
+else
+panic("Table walkers support functional accesses. We should never get 
here\n");
 }
 
 bool
 ArmISA::virtvalid(ThreadContext *tc, Addr vaddr)
 {
-if (vtophys(tc, vaddr) != -1)
-return true;
-return false;
+const std::pair translation(try_translate(tc, vaddr));
+
+return translation.first;
 }
 
 


[gem5-dev] changeset in gem5: mem: Downstream components consume new cross...

2015-03-02 Thread Marco Balboni via gem5-dev
changeset 3e6a3eaac71b in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=3e6a3eaac71b
description:
mem: Downstream components consume new crossbar delays

This patch makes the caches and memory controllers consume the delay
that is annotated to a packet by the crossbar. Previously many
components simply threw these delays away. Note that the devices still
do not pay for these delays.
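
The pattern applied throughout the patch is the same at every site: read the
annotated delay, add it to the scheduled time, then zero it so it cannot be
charged twice. A minimal sketch in plain Python (field and function names
are illustrative):

```python
class Packet:
    """Toy packet carrying crossbar-annotated delays (in ticks)."""
    def __init__(self, header_delay=0, payload_delay=0):
        self.header_delay = header_delay
        self.payload_delay = payload_delay


def schedule_response(clock_edge, pkt):
    """Consume the annotated header delay instead of discarding it."""
    when = clock_edge + pkt.header_delay      # pay for the bus latency
    pkt.header_delay = pkt.payload_delay = 0  # reset: now accounted for
    return when


pkt = Packet(header_delay=5)
assert schedule_response(100, pkt) == 105   # delay consumed
assert pkt.header_delay == 0                # and cleared, not double-charged
```

Previously the equivalent of `pkt.header_delay` was simply zeroed before
scheduling, so the annotated latency was thrown away.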

diffstat:

 src/mem/cache/cache_impl.hh |  140 ---
 src/mem/dram_ctrl.cc|   11 ++-
 src/mem/dramsim2.cc |8 +-
 3 files changed, 103 insertions(+), 56 deletions(-)

diffs (truncated from 367 to 300 lines):

diff -r 67b3e74de9ae -r 3e6a3eaac71b src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:47 2015 -0500
+++ b/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:48 2015 -0500
@@ -417,12 +417,14 @@
 
 pkt->popSenderState();
 delete rec;
-// @todo someone should pay for this
-pkt->headerDelay = pkt->payloadDelay = 0;
 // forwardLatency is set here because there is a response from an
 // upper level cache.
-memSidePort->schedTimingSnoopResp(pkt, clockEdge(forwardLatency));
-
+// To pay the delay that occurs if the packet comes from the bus,
+// we charge also headerDelay.
+Tick snoop_resp_time = clockEdge(forwardLatency) + pkt->headerDelay;
+// Reset the timing of the packet.
+pkt->headerDelay = pkt->payloadDelay = 0;
+memSidePort->schedTimingSnoopResp(pkt, snoop_resp_time);
 }
 
 template
@@ -519,31 +521,41 @@
 if (pkt->req->isUncacheable()) {
 uncacheableFlush(pkt);
 
-// @todo: someone should pay for this
-pkt->headerDelay = pkt->payloadDelay = 0;
-
 // writes go in write buffer, reads use MSHR,
 // prefetches are acknowledged (responded to) and dropped
 if (pkt->cmd.isPrefetch()) {
 // prefetching (cache loading) uncacheable data is nonsensical
 pkt->makeTimingResponse();
 std::memset(pkt->getPtr(), 0xFF, pkt->getSize());
-// We use lookupLatency here because the request is uncacheable
-cpuSidePort->schedTimingResp(pkt, clockEdge(lookupLatency));
+// We use lookupLatency here because the request is uncacheable.
+// We pay also for headerDelay that is charged of bus latencies if
+// the packet comes from the bus.
+Tick time = clockEdge(lookupLatency) + pkt->headerDelay;
+// Reset the timing of the packet.
+pkt->headerDelay = pkt->payloadDelay = 0;
+cpuSidePort->schedTimingResp(pkt, time);
 return true;
 } else if (pkt->isWrite() && !pkt->isRead()) {
-// We use forwardLatency here because there is an uncached
-// memory write, forwarded to WriteBuffer. It specifies the
-// latency to allocate an internal buffer and to schedule an
-// event to the queued port.
-allocateWriteBuffer(pkt, clockEdge(forwardLatency), true);
+// We pay also for headerDelay that is charged of bus latencies if
+// the packet comes from the bus.
+Tick allocate_wr_buffer_time = clockEdge(forwardLatency) +
+pkt->headerDelay;
+// Reset the timing of the packet.
+pkt->headerDelay = pkt->payloadDelay = 0;
+allocateWriteBuffer(pkt, allocate_wr_buffer_time, true);
 } else {
 // We use forwardLatency here because there is an uncached
 // memory read, allocateded to MSHR queue (it requires the same
 // time of forwarding to WriteBuffer, in our assumption). It
 // specifies the latency to allocate an internal buffer and to
 // schedule an event to the queued port.
-allocateUncachedReadBuffer(pkt, clockEdge(forwardLatency), true);
+// We pay also for headerDelay that is charged of bus latencies if
+// the packet comes from the bus.
+Tick allocate_rd_buffer_time = clockEdge(forwardLatency) +
+pkt->headerDelay;
+// Reset the timing of the packet.
+pkt->headerDelay = pkt->payloadDelay = 0;
+allocateUncachedReadBuffer(pkt, allocate_rd_buffer_time, true);
 }
 assert(pkt->needsResponse()); // else we should delete it here??
 return true;
@@ -557,6 +569,20 @@
 // Note that lat is passed by reference here. The function access() calls
 // accessBlock() which can modify lat value.
 bool satisfied = access(pkt, blk, lat, writebacks);
+// Here we charge the headerDelay that takes into account the latencies
+// of the bus, if the packet comes from it.
+// The latency charged it is just lat that is the value of lookupLatency
+// modified by access

[gem5-dev] changeset in gem5: mem: Move crossbar default latencies to subcl...

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset 67b3e74de9ae in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=67b3e74de9ae
description:
mem: Move crossbar default latencies to subclasses

This patch introduces a few subclasses to the CoherentXBar and
NoncoherentXBar to distinguish the different uses in the system. We
use the crossbar in a wide range of places: interfacing cores to the
L2, as a system interconnect, connecting I/O and peripherals,
etc. Needless to say, these crossbars have very different performance,
and the clock frequency alone is not enough to distinguish these
scenarios.

Instead of trying to capture every possible case, this patch
introduces dedicated subclasses for the three primary use-cases:
L2XBar, SystemXBar and IOXbar. More can be added if needed, and the
defaults can be overridden.
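
The mechanism is ordinary subclassing of parameter defaults, which can be
sketched outside gem5 like this (the specific numbers are illustrative, not
necessarily the values the patch chooses):

```python
class BaseXBar:
    """Base crossbar parameters; subclasses override per use-case."""
    width = 8                # datapath width in bytes
    frontend_latency = 3


class L2XBar(BaseXBar):
    width = 32               # a wide, fast core-to-L2 interconnect


class IOXBar(BaseXBar):
    frontend_latency = 2     # hypothetical I/O-oriented default


assert L2XBar.width == 32
assert L2XBar.frontend_latency == 3   # inherited default, still overridable
assert IOXBar.width == 8
```

Configs then instantiate `L2XBar`, `SystemXBar`, or `IOXBar` directly, and
any parameter can still be overridden at the point of use.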

diffstat:

 configs/common/CacheConfig.py  |   6 +--
 configs/common/FSConfig.py |  14 
 configs/dram/sweep.py  |   2 +-
 configs/example/memcheck.py|   4 +-
 configs/example/memtest.py |   4 +-
 configs/example/ruby_mem_test.py   |   2 +-
 configs/example/se.py  |   2 +-
 configs/ruby/Ruby.py   |   2 +-
 configs/splash2/cluster.py |  10 +++---
 configs/splash2/run.py |   4 +-
 src/cpu/BaseCPU.py |   7 +---
 src/mem/XBar.py|  51 ++---
 tests/configs/base_config.py   |   4 +-
 tests/configs/memtest-filter.py|   6 ++--
 tests/configs/memtest.py   |   4 +-
 tests/configs/o3-timing-mp-ruby.py |   2 +-
 tests/configs/o3-timing-ruby.py|   2 +-
 tests/configs/simple-atomic-mp-ruby.py |   2 +-
 tests/configs/tgen-dram-ctrl.py|   2 +-
 tests/configs/tgen-simple-mem.py   |   2 +-
 20 files changed, 84 insertions(+), 48 deletions(-)

diffs (truncated from 458 to 300 lines):

diff -r b4fc9ad648aa -r 67b3e74de9ae configs/common/CacheConfig.py
--- a/configs/common/CacheConfig.py Mon Mar 02 04:00:46 2015 -0500
+++ b/configs/common/CacheConfig.py Mon Mar 02 04:00:47 2015 -0500
@@ -65,14 +65,12 @@
 if options.l2cache:
 # Provide a clock for the L2 and the L1-to-L2 bus here as they
 # are not connected using addTwoLevelCacheHierarchy. Use the
-# same clock as the CPUs, and set the L1-to-L2 bus width to 32
-# bytes (256 bits).
+# same clock as the CPUs.
 system.l2 = l2_cache_class(clk_domain=system.cpu_clk_domain,
size=options.l2_size,
assoc=options.l2_assoc)
 
-system.tol2bus = CoherentXBar(clk_domain = system.cpu_clk_domain,
-  width = 32)
+system.tol2bus = L2XBar(clk_domain = system.cpu_clk_domain)
 system.l2.cpu_side = system.tol2bus.master
 system.l2.mem_side = system.membus.slave
 
diff -r b4fc9ad648aa -r 67b3e74de9ae configs/common/FSConfig.py
--- a/configs/common/FSConfig.pyMon Mar 02 04:00:46 2015 -0500
+++ b/configs/common/FSConfig.pyMon Mar 02 04:00:47 2015 -0500
@@ -50,7 +50,7 @@
 def childImage(self, ci):
 self.image.child.image_file = ci
 
-class MemBus(CoherentXBar):
+class MemBus(SystemXBar):
 badaddr_responder = BadAddr()
 default = Self.badaddr_responder.pio
 
@@ -78,7 +78,7 @@
 self.tsunami = BaseTsunami()
 
 # Create the io bus to connect all device ports
-self.iobus = NoncoherentXBar()
+self.iobus = IOXBar()
 self.tsunami.attachIO(self.iobus)
 
 self.tsunami.ide.pio = self.iobus.master
@@ -143,7 +143,7 @@
 # generic system
 mdesc = SysConfig()
 self.readfile = mdesc.script()
-self.iobus = NoncoherentXBar()
+self.iobus = IOXBar()
 self.membus = MemBus()
 self.bridge = Bridge(delay='50ns')
 self.t1000 = T1000()
@@ -205,7 +205,7 @@
 mdesc = SysConfig()
 
 self.readfile = mdesc.script()
-self.iobus = NoncoherentXBar()
+self.iobus = IOXBar()
 self.membus = MemBus()
 self.membus.badaddr_responder.warn_access = "warn"
 self.bridge = Bridge(delay='50ns')
@@ -311,7 +311,7 @@
 # generic system
 mdesc = SysConfig()
 self.readfile = mdesc.script()
-self.iobus = NoncoherentXBar()
+self.iobus = IOXBar()
 self.membus = MemBus()
 self.bridge = Bridge(delay='50ns')
 self.mem_ranges = [AddrRange('1GB')]
@@ -358,7 +358,7 @@
 x86_sys.membus = MemBus()
 
 # North Bridge
-x86_sys.iobus = NoncoherentXBar()
+x86_sys.iobus = IOXBar()
 x86_sys.bridge = Bridge(delay='50ns')
 x86_sys.bridge.master = x86_sys.iobus.slave
 x86_sys.bridge.slave = x86_sys.membus.master
@@ -394,7 +394,7 @@
 
 def connectX86RubySystem(x86_sys):
 # North Bridge
-x86_sys.iobus = NoncoherentXBar()
+x86_sys.iobus = IOXBar

[gem5-dev] changeset in gem5: mem: Fix cache MSHR conflict determination

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset 1072b1381560 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1072b1381560
description:
mem: Fix cache MSHR conflict determination

This patch fixes a rather subtle issue in the sending of MSHR requests
in the cache, where the logic previously did not check for conflicts
between the MSHR queue and the write queue when requests were not
ready. The correct thing to do is to always check, since not having a
ready MSHR does not guarantee that there is no conflict.

The underlying problem seems to have slipped past due to the symmetric
timings used for the write queue and MSHR queue. However, with the
recent timing changes the bug caused regressions to fail.
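
The corrected control flow can be sketched as follows (a toy model with
dict-based entries and a pluggable conflict lookup; names are illustrative).
The key point is that the conflict search runs on *every* path, not only
when both queues have a ready entry:

```python
def get_next_mshr(miss_mshr, write_mshr, wb_full, find_pending):
    """Pick the next entry to service, always checking the other queue
    for an older conflicting entry."""
    if write_mshr and (wb_full or not miss_mshr):
        conflict = find_pending('mshr', write_mshr)
        if conflict and conflict['order'] < write_mshr['order']:
            return conflict           # service the older miss first
        return write_mshr             # no conflict: issue the write
    if miss_mshr:
        conflict = find_pending('write', miss_mshr)
        if conflict and conflict['order'] < miss_mshr['order']:
            return conflict           # flush the older writeback first
        return miss_mshr              # no conflict: issue the read
    return None


# Toy scenario: a ready write at order 5, no ready miss, but an older
# conflicting miss at order 3 hiding in the MSHR queue.
older_miss = {'order': 3}
write = {'order': 5}
lookup = lambda queue, entry: older_miss if queue == 'mshr' else None
assert get_next_mshr(None, write, False, lookup) is older_miss
```

Under the old logic, the "one of each" shortcut returned `write` without ever
searching for `older_miss` when no miss MSHR was ready.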

diffstat:

 src/mem/cache/cache_impl.hh |  48 
 1 files changed, 22 insertions(+), 26 deletions(-)

diffs (72 lines):

diff -r b1d90d88420e -r 1072b1381560 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:52 2015 -0500
+++ b/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:54 2015 -0500
@@ -1887,39 +1887,33 @@
 MSHR *
 Cache::getNextMSHR()
 {
-// Check both MSHR queue and write buffer for potential requests
+// Check both MSHR queue and write buffer for potential requests,
+// note that null does not mean there is no request, it could
+// simply be that it is not ready
 MSHR *miss_mshr  = mshrQueue.getNextMSHR();
 MSHR *write_mshr = writeBuffer.getNextMSHR();
 
-// Now figure out which one to send... some cases are easy
-if (miss_mshr && !write_mshr) {
-return miss_mshr;
-}
-if (write_mshr && !miss_mshr) {
-return write_mshr;
-}
+// If we got a write buffer request ready, first priority is a
+// full write buffer, otherwise we favour the miss requests
+if (write_mshr &&
+((writeBuffer.isFull() && writeBuffer.inServiceEntries == 0) ||
+ !miss_mshr)) {
+// need to search MSHR queue for conflicting earlier miss.
+MSHR *conflict_mshr =
+mshrQueue.findPending(write_mshr->addr, write_mshr->size,
+  write_mshr->isSecure);
 
-if (miss_mshr && write_mshr) {
-// We have one of each... normally we favor the miss request
-// unless the write buffer is full
-if (writeBuffer.isFull() && writeBuffer.inServiceEntries == 0) {
-// Write buffer is full, so we'd like to issue a write;
-// need to search MSHR queue for conflicting earlier miss.
-MSHR *conflict_mshr =
-mshrQueue.findPending(write_mshr->addr, write_mshr->size,
-  write_mshr->isSecure);
+if (conflict_mshr && conflict_mshr->order < write_mshr->order) {
+// Service misses in order until conflict is cleared.
+return conflict_mshr;
 
-if (conflict_mshr && conflict_mshr->order < write_mshr->order) {
-// Service misses in order until conflict is cleared.
-return conflict_mshr;
-}
-
-// No conflicts; issue write
-return write_mshr;
+// @todo Note that we ignore the ready time of the conflict here
 }
 
-// Write buffer isn't full, but need to check it for
-// conflicting earlier writeback
+// No conflicts; issue write
+return write_mshr;
+} else if (miss_mshr) {
+// need to check for conflicting earlier writeback
 MSHR *conflict_mshr =
 writeBuffer.findPending(miss_mshr->addr, miss_mshr->size,
 miss_mshr->isSecure);
@@ -1937,6 +1931,8 @@
 // have to flush writes in order?  I don't think so... not
 // for Alpha anyway.  Maybe for x86?
 return conflict_mshr;
+
+// @todo Note that we ignore the ready time of the conflict here
 }
 
 // No conflicts; issue read


[gem5-dev] changeset in gem5: cpu: o3 register renaming request handling im...

2015-03-02 Thread Rekai via gem5-dev
changeset ced453290507 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=ced453290507
description:
cpu: o3 register renaming request handling improved

Now, prior to renaming, the instruction requests the exact number of
registers it will need, and the rename_map decides whether the
instruction is allowed to proceed or not.
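
The per-class check the patch adds can be sketched like this (free-list
sizes here are made up for illustration):

```python
# Hypothetical free physical-register counts per register class.
free = {'int': 4, 'float': 2, 'cc': 1}


def can_rename(need_int, need_float, need_cc):
    """Proceed only if every class has enough free physical registers."""
    return (need_int <= free['int'] and
            need_float <= free['float'] and
            need_cc <= free['cc'])


# The old check compared min(free int, free float) against the total
# destination count, so an instruction needing 3 int and 0 float regs
# was wrongly blocked (min is 2, total is 3); the per-class check admits it:
assert can_rename(3, 0, 0)
assert not can_rename(0, 3, 0)   # genuinely short of float registers
```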

diffstat:

 src/cpu/base_dyn_inst.hh  |   1 +
 src/cpu/o3/rename_impl.hh |   6 --
 src/cpu/o3/rename_map.hh  |  23 +++
 src/cpu/static_inst.hh|   3 +++
 4 files changed, 31 insertions(+), 2 deletions(-)

diffs (83 lines):

diff -r 9ba5e70964a4 -r ced453290507 src/cpu/base_dyn_inst.hh
--- a/src/cpu/base_dyn_inst.hh  Mon Mar 02 04:00:37 2015 -0500
+++ b/src/cpu/base_dyn_inst.hh  Mon Mar 02 04:00:38 2015 -0500
@@ -594,6 +594,7 @@
 // for machines with separate int & FP reg files
 int8_t numFPDestRegs()  const { return staticInst->numFPDestRegs(); }
 int8_t numIntDestRegs() const { return staticInst->numIntDestRegs(); }
+int8_t numCCDestRegs() const { return staticInst->numCCDestRegs(); }
 
 /** Returns the logical register index of the i'th destination register. */
 RegIndex destRegIdx(int i) const { return staticInst->destRegIdx(i); }
diff -r 9ba5e70964a4 -r ced453290507 src/cpu/o3/rename_impl.hh
--- a/src/cpu/o3/rename_impl.hh Mon Mar 02 04:00:37 2015 -0500
+++ b/src/cpu/o3/rename_impl.hh Mon Mar 02 04:00:38 2015 -0500
@@ -1,5 +1,5 @@
 /*
- * Copyright (c) 2010-2012, 2014 ARM Limited
+ * Copyright (c) 2010-2012, 2014-2015 ARM Limited
  * Copyright (c) 2013 Advanced Micro Devices, Inc.
  * All rights reserved.
  *
@@ -633,7 +633,9 @@
 
 // Check here to make sure there are enough destination registers
 // to rename to.  Otherwise block.
-if (renameMap[tid]->numFreeEntries() < inst->numDestRegs()) {
+if (!renameMap[tid]->canRename(inst->numIntDestRegs(),
+   inst->numFPDestRegs(),
+   inst->numCCDestRegs())) {
 DPRINTF(Rename, "Blocking due to lack of free "
 "physical registers to rename to.\n");
 blockThisCycle = true;
diff -r 9ba5e70964a4 -r ced453290507 src/cpu/o3/rename_map.hh
--- a/src/cpu/o3/rename_map.hh  Mon Mar 02 04:00:37 2015 -0500
+++ b/src/cpu/o3/rename_map.hh  Mon Mar 02 04:00:38 2015 -0500
@@ -1,4 +1,16 @@
 /*
+ * Copyright (c) 2015 ARM Limited
+ * All rights reserved.
+ *
+ * The license below extends only to copyright in the software and shall
+ * not be construed as granting a license to any other intellectual
+ * property including but not limited to intellectual property relating
+ * to a hardware implementation of the functionality of the software
+ * licensed hereunder.  You may use the software subject to the license
+ * terms below provided that you ensure that this notice is replicated
+ * unmodified and in its entirety in all distributions of the software,
+ * modified or unmodified, in source code or in binary form.
+ *
  * Copyright (c) 2004-2005 The Regents of The University of Michigan
  * Copyright (c) 2013 Advanced Micro Devices, Inc.
  * All rights reserved.
@@ -346,6 +358,17 @@
 {
 return std::min(intMap.numFreeEntries(), floatMap.numFreeEntries());
 }
+
+/**
+ * Return whether there are enough registers to serve the request.
+ */
+bool canRename(uint32_t intRegs, uint32_t floatRegs, uint32_t ccRegs) const
+{
+return intRegs <= intMap.numFreeEntries() &&
+floatRegs <= floatMap.numFreeEntries() &&
+ccRegs <= ccMap.numFreeEntries();
+}
+
 };
 
 #endif //__CPU_O3_RENAME_MAP_HH__
diff -r 9ba5e70964a4 -r ced453290507 src/cpu/static_inst.hh
--- a/src/cpu/static_inst.hhMon Mar 02 04:00:37 2015 -0500
+++ b/src/cpu/static_inst.hhMon Mar 02 04:00:38 2015 -0500
@@ -117,6 +117,9 @@
 /// Number of integer destination regs.
 int8_t numIntDestRegs() const { return _numIntDestRegs; }
 //@}
+/// Number of condition-code destination regs.
+int8_t numCCDestRegs() const { return _numCCDestRegs; }
+//@}
 
 /// @name Flag accessors.
 /// These functions are used to access the values of the various


[gem5-dev] changeset in gem5: mem: Add crossbar latencies

2015-03-02 Thread Marco Balboni via gem5-dev
changeset b4fc9ad648aa in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b4fc9ad648aa
description:
mem: Add crossbar latencies

This patch introduces latencies in the crossbar that were neglected
before. In particular, it adds three parameters to the crossbar model:
front_end_latency, forward_latency, and response_latency. Along with
these parameters, three corresponding members are added:
frontEndLatency, forwardLatency, and responseLatency. The coherent
crossbar has an additional snoop_response_latency.

The latency of the request path through the xbar is set as
--> frontEndLatency + forwardLatency

In case the snoop filter is enabled, the request path latency is also
charged with the look-up latency of the snoop filter.
--> frontEndLatency + SF(lookupLatency) + forwardLatency.

The latency of the response path through the xbar is set instead as
--> responseLatency.

In case of a snoop response, if the response is treated as a normal
response, the associated latency is again
--> responseLatency;

If instead it is forwarded as a snoop response, we add an additional
variable
+ snoopResponseLatency
and the associated latency is
--> snoopResponseLatency;

Furthermore, this patch lets the crossbar progress on the next clock
edge after an unused retry, changing the time the crossbar considers
itself busy after sending a retry that was not acted upon.
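
The path latencies described above compose additively. A minimal sketch
using the defaults the patch's diff introduces (frontend 3, forward 4,
response 2, snoop response 4, snoop-filter lookup 1, all in cycles):

```python
def request_latency(frontend, forward, sf_lookup=0):
    """Request path: frontend (+ optional snoop-filter lookup) + forward."""
    return frontend + sf_lookup + forward


def response_latency(response, snoop_response=0, forwarded_snoop=False):
    """Responses pay the response latency; a packet forwarded as a snoop
    response pays the snoop-response latency instead."""
    return snoop_response if forwarded_snoop else response


assert request_latency(3, 4) == 7                 # no snoop filter
assert request_latency(3, 4, sf_lookup=1) == 8    # with snoop filter
assert response_latency(2) == 2
assert response_latency(2, snoop_response=4, forwarded_snoop=True) == 4
```

As the description notes, the crossbar annotates these cycles on the packet
rather than stalling itself; the neighbouring modules consume them.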

diffstat:

 src/mem/XBar.py |  42 +++-
 src/mem/coherent_xbar.cc|  51 +++
 src/mem/coherent_xbar.hh|   5 +++-
 src/mem/noncoherent_xbar.cc |  31 ++--
 src/mem/noncoherent_xbar.hh |   2 +-
 src/mem/xbar.cc |  57 ++--
 src/mem/xbar.hh |  16 ++--
 7 files changed, 150 insertions(+), 54 deletions(-)

diffs (truncated from 396 to 300 lines):

diff -r 4ed87af2930f -r b4fc9ad648aa src/mem/XBar.py
--- a/src/mem/XBar.py   Mon Mar 02 04:00:44 2015 -0500
+++ b/src/mem/XBar.py   Mon Mar 02 04:00:46 2015 -0500
@@ -1,4 +1,4 @@
-# Copyright (c) 2012 ARM Limited
+# Copyright (c) 2012, 2015 ARM Limited
 # All rights reserved.
 #
 # The license below extends only to copyright in the software and shall
@@ -49,10 +49,29 @@
 type = 'BaseXBar'
 abstract = True
 cxx_header = "mem/xbar.hh"
-slave = VectorSlavePort("vector port for connecting masters")
-master = VectorMasterPort("vector port for connecting slaves")
-header_cycles = Param.Cycles(1, "cycles of overhead per transaction")
-width = Param.Unsigned(8, "xbar width (bytes)")
+
+slave = VectorSlavePort("Vector port for connecting masters")
+master = VectorMasterPort("Vector port for connecting slaves")
+
+# Latencies governing the time taken for the various paths a
+# packet has through the crossbar. Note that the crossbar itself
+# does not add the latency due to assumptions in the coherency
+# mechanism. Instead the latency is annotated on the packet and
+# left to the neighbouring modules.
+#
+# A request incurs the frontend latency, possibly snoop filter
+# lookup latency, and forward latency. A response incurs the
+# response latency. Frontend latency encompasses arbitration and
+# deciding what to do when a request arrives. The forward latency
+# is the latency involved once a decision is made to forward the
+# request. The response latency, is similar to the forward
+# latency, but for responses rather than requests.
+frontend_latency = Param.Cycles(3, "Frontend latency")
+forward_latency = Param.Cycles(4, "Forward latency")
+response_latency = Param.Cycles(2, "Response latency")
+
+# Width governing the throughput of the crossbar
+width = Param.Unsigned(8, "Datapath width per port (bytes)")
 
 # The default port can be left unconnected, or be used to connect
 # a default slave port
@@ -74,12 +93,21 @@
 type = 'CoherentXBar'
 cxx_header = "mem/coherent_xbar.hh"
 
+# The coherent crossbar additionally has snoop responses that are
+# forwarded after a specific latency.
+snoop_response_latency = Param.Cycles(4, "Snoop response latency")
+
+# An optional snoop filter
+snoop_filter = Param.SnoopFilter(NULL, "Selected snoop filter")
+
 system = Param.System(Parent.any, "System that the crossbar belongs to.")
-snoop_filter = Param.SnoopFilter(NULL, "Selected snoop filter.")
 
 class SnoopFilter(SimObject):
 type = 'SnoopFilter'
 cxx_header = "mem/snoop_filter.hh"
-lookup_latency = Param.Cycles(3, "lookup latency (cycles)")
+
+# Lookup latency of the snoop filter, added to requests that pass
+# through a coherent crossbar.
+lookup_latency = Param.Cycles(1, "Lookup latency")
 
 system = Param.

[gem5-dev] changeset in gem5: arm: Remove unnecessary dependencies between ...

2015-03-02 Thread Giacomo Gabrielli via gem5-dev
changeset 4408a83f7881 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=4408a83f7881
description:
arm: Remove unnecessary dependencies between AArch64 FP instructions

diffstat:

 src/arch/arm/isa/templates/vfp64.isa |  15 ---
 1 files changed, 0 insertions(+), 15 deletions(-)

diffs (52 lines):

diff -r ced453290507 -r 4408a83f7881 src/arch/arm/isa/templates/vfp64.isa
--- a/src/arch/arm/isa/templates/vfp64.isa  Mon Mar 02 04:00:38 2015 -0500
+++ b/src/arch/arm/isa/templates/vfp64.isa  Mon Mar 02 04:00:41 2015 -0500
@@ -45,9 +45,6 @@
 _dest, _op1, mode)
 {
 %(constructor)s;
-for (int x = 0; x < _numDestRegs; x++) {
-_srcRegIdx[_numSrcRegs++] = _destRegIdx[x];
-}
 }
 }};
 
@@ -58,9 +55,6 @@
 _dest, _imm, mode)
 {
 %(constructor)s;
-for (int x = 0; x < _numDestRegs; x++) {
-_srcRegIdx[_numSrcRegs++] = _destRegIdx[x];
-}
 }
 }};
 
@@ -74,9 +68,6 @@
  _dest, _op1, _imm, mode)
 {
 %(constructor)s;
-for (int x = 0; x < _numDestRegs; x++) {
-_srcRegIdx[_numSrcRegs++] = _destRegIdx[x];
-}
 }
 }};
 
@@ -90,9 +81,6 @@
  _dest, _op1, _op2, mode)
 {
 %(constructor)s;
-for (int x = 0; x < _numDestRegs; x++) {
-_srcRegIdx[_numSrcRegs++] = _destRegIdx[x];
-}
 }
 }};
 
@@ -119,8 +107,5 @@
  _dest, _op1, _op2, _op3, mode)
 {
 %(constructor)s;
-for (int x = 0; x < _numDestRegs; x++) {
-_srcRegIdx[_numSrcRegs++] = _destRegIdx[x];
-}
 }
 }};
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: tests: Run regression timeout as foreground

2015-03-02 Thread Andreas Hansson via gem5-dev
changeset 9b71309d29f9 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=9b71309d29f9
description:
tests: Run regression timeout as foreground

Allow the user to send signals such as Ctrl-C to the gem5 runs. Note
that this assumes coreutils >= 8.13, which aligns with Ubuntu 12.04
and RHEL 6.

diffstat:

 SConstruct   |  13 +
 tests/SConscript |   2 +-
 2 files changed, 10 insertions(+), 5 deletions(-)

diffs (35 lines):

diff -r 890269a13188 -r 9b71309d29f9 SConstruct
--- a/SConstruct    Mon Mar 02 04:00:28 2015 -0500
+++ b/SConstruct    Mon Mar 02 04:00:29 2015 -0500
@@ -784,10 +784,15 @@
 swig_flags=Split('-c++ -python -modern -templatereduce $_CPPINCFLAGS')
 main.Append(SWIGFLAGS=swig_flags)
 
-# Check for 'timeout' from GNU coreutils.  If present, regressions
-# will be run with a time limit.
-TIMEOUT_version = readCommand(['timeout', '--version'], exception=False)
-main['TIMEOUT'] = TIMEOUT_version and TIMEOUT_version.find('timeout') == 0
+# Check for 'timeout' from GNU coreutils. If present, regressions will
+# be run with a time limit. We require version 8.13 since we rely on
+# support for the '--foreground' option.
+timeout_lines = readCommand(['timeout', '--version'],
+exception='').splitlines()
+# Get the first line and tokenize it
+timeout_version = timeout_lines[0].split() if timeout_lines else []
+main['TIMEOUT'] =  timeout_version and \
+compareVersions(timeout_version[-1], '8.13') >= 0
 
 # filter out all existing swig scanners, they mess up the dependency
 # stuff for some reason
diff -r 890269a13188 -r 9b71309d29f9 tests/SConscript
--- a/tests/SConscript  Mon Mar 02 04:00:28 2015 -0500
+++ b/tests/SConscript  Mon Mar 02 04:00:29 2015 -0500
@@ -107,7 +107,7 @@
 # The slowest regression (bzip2) requires ~2.8 hours;
 # 4 hours was chosen to be conservative.
 elif env['TIMEOUT']:
-cmd = 'timeout 4h %s' % cmd
+cmd = 'timeout --foreground 4h %s' % cmd
 
 # Create a default value for the status string, changed as needed
 # based on the status.
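
The check added in SConstruct boils down to tokenizing the first line of `timeout --version` output and comparing its last token against 8.13. A self-contained sketch of that logic, with a simple stand-in for gem5's `compareVersions` helper (the helper below is an assumption, not gem5's actual implementation):

```python
def compare_versions(a, b):
    """Stand-in for gem5's compareVersions: returns a negative, zero,
    or positive number when comparing dotted numeric version strings,
    like the classic cmp()."""
    pa = [int(x) for x in a.split('.')]
    pb = [int(x) for x in b.split('.')]
    # Pad the shorter list with zeros so '8.13' compares equal to '8.13.0'
    n = max(len(pa), len(pb))
    pa += [0] * (n - len(pa))
    pb += [0] * (n - len(pb))
    return (pa > pb) - (pa < pb)

def timeout_supported(version_output):
    """Return True if this 'timeout --version' output indicates
    coreutils >= 8.13, which is needed for the --foreground option."""
    lines = version_output.splitlines()
    if not lines:
        return False  # command missing or produced no output
    tokens = lines[0].split()
    return bool(tokens) and compare_versions(tokens[-1], '8.13') >= 0

print(timeout_supported("timeout (GNU coreutils) 8.21"))  # True
print(timeout_supported("timeout (GNU coreutils) 8.12"))  # False
```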
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Fix prefetchSquash + memInhibitAsserted bug

2015-03-02 Thread Ali Jafri via gem5-dev
changeset 245cd4691cbf in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=245cd4691cbf
description:
mem: Fix prefetchSquash + memInhibitAsserted bug

This patch resolves a bug with hardware prefetches. Before a hardware
prefetch is sent towards the memory, the system generates a snoop
request to check all caches above the prefetch-generating cache for the
presence of the prefetch target. If the prefetch target is found in the
tags or the MSHRs of the upper caches, the cache sets the
prefetchSquashed flag in the snoop packet. When the snoop packet
returns with the prefetchSquashed flag set, the prefetch-generating
cache deallocates the MSHR reserved for the prefetch. If the prefetch
target is found in the writeback buffer of an upper cache, the cache
sets the memInhibit flag, which signals the prefetch-generating cache
to expect the data from the writeback. When the snoop packet returns
with the memInhibitAsserted flag set, the cache marks the allocated
MSHR as inService and waits for the data from the writeback.

If the prefetch target is found in multiple upper-level caches,
specifically in the tags or MSHRs of one upper-level cache and the
writeback buffer of another, the snoop packet returns with both
prefetchSquashed and memInhibitAsserted set, an outcome the current
code does not handle. The current code checks the prefetchSquashed flag
first and, if it is set, deallocates the reserved MSHR. This leads to
an assert failure when the data from the writeback arrives at the
cache. This fix simply switches the order of the checks: we first check
memInhibitAsserted and then prefetchSquashed.

diffstat:

 src/mem/cache/cache_impl.hh |  38 ++
 1 files changed, 22 insertions(+), 16 deletions(-)

diffs (55 lines):

diff -r 7f67a8d786a2 -r 245cd4691cbf src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:31 2015 -0500
+++ b/src/mem/cache/cache_impl.hh   Mon Mar 02 04:00:34 2015 -0500
@@ -1966,10 +1966,28 @@
 snoop_pkt.senderState = mshr;
 cpuSidePort->sendTimingSnoopReq(&snoop_pkt);
 
-// Check to see if the prefetch was squashed by an upper
-// cache (to prevent us from grabbing the line) or if a
-// writeback arrived between the time the prefetch was
-// placed in the MSHRs and when it was selected to be sent.
+// Check to see if the prefetch was squashed by an upper cache
+// (to prevent us from grabbing the line) or if a writeback
+// arrived between the time the prefetch was placed in the MSHRs
+// and when it was selected to be sent.
+
+// It is important to check memInhibitAsserted before
+// prefetchSquashed. If another cache has asserted MEM_INHIBIT, it
+// will be sending a response which will arrive at the MSHR
+// allocated for this request. Checking the prefetchSquashed flag
+// first may result in the MSHR being prematurely deallocated.
+
+if (snoop_pkt.memInhibitAsserted()) {
+// If we are getting a non-shared response it is dirty
+bool pending_dirty_resp = !snoop_pkt.sharedAsserted();
+markInService(mshr, pending_dirty_resp);
+DPRINTF(Cache, "Upward snoop of prefetch for addr"
+" %#x (%s) hit\n",
+tgt_pkt->getAddr(), tgt_pkt->isSecure()? "s": "ns");
+return NULL;
+}
+
 if (snoop_pkt.prefetchSquashed() || blk != NULL) {
 DPRINTF(Cache, "Prefetch squashed by cache.  "
"Deallocating mshr target %#x.\n", mshr->addr);
@@ -1983,18 +2001,6 @@
 return NULL;
 }
 
-// Check if the prefetch hit a writeback in an upper cache
-// and if so we will eventually get a HardPFResp from
-// above
-if (snoop_pkt.memInhibitAsserted()) {
-// If we are getting a non-shared response it is dirty
-bool pending_dirty_resp = !snoop_pkt.sharedAsserted();
-markInService(mshr, pending_dirty_resp);
-DPRINTF(Cache, "Upward snoop of prefetch for addr"
-" %#x (%s) hit\n",
-tgt_pkt->getAddr(), tgt_pkt->isSecure()? "s": "ns");
-return NULL;
-}
 }
 
 pkt = getBusPacket(tgt_pkt, blk, mshr->needsExclusive());
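
The reordering in the diff above reduces to a small decision function: when both flags are set, the MEM_INHIBIT case must win so the MSHR stays allocated for the incoming response. An illustrative sketch in Python (names mirror the diff; this is not gem5 code):

```python
def handle_prefetch_snoop(mem_inhibit_asserted, prefetch_squashed,
                          block_present):
    """Decide what to do with a prefetch MSHR after the upward snoop
    returns. Checking mem_inhibit_asserted first is the bug fix: if an
    upper cache's writeback will supply the data, the MSHR must be
    marked in-service rather than deallocated -- even if another upper
    cache also squashed the prefetch."""
    if mem_inhibit_asserted:
        return 'mark_in_service'   # a response is coming; keep the MSHR
    if prefetch_squashed or block_present:
        return 'deallocate_mshr'   # line exists above; drop the prefetch
    return 'send_prefetch'

# The problematic case: both flags set (hit in one cache's writeback
# buffer, squashed by another cache). The old order would deallocate
# and later assert on the unexpected response; the new order keeps
# the MSHR.
print(handle_prefetch_snoop(True, True, False))  # mark_in_service
```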
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Review Request 2636: mem: fix prefetcher bug regarding write buffer hits

2015-03-02 Thread Andreas Hansson via gem5-dev


> On Feb. 10, 2015, 5:37 p.m., Stephan Diestelhorst wrote:
> > I have had a similar impulse, when inspecting this code.  However, the 
> > prefetch hitting a write-back in an upper cache is actually already handled 
> > in Cache::getTimingPacket():
> > 
> > // Check if the prefetch hit a writeback in an upper cache
> > // and if so we will eventually get a HardPFResp from
> > // above
> > if (snoop_pkt.memInhibitAsserted()) {
> > // If we are getting a non-shared response it is dirty
> > bool pending_dirty_resp = !snoop_pkt.sharedAsserted();
> > markInService(mshr, pending_dirty_resp);
> > DPRINTF(Cache, "Upward snoop of prefetch for addr"
> > " %#x (%s) hit\n",
> > tgt_pkt->getAddr(), tgt_pkt->isSecure()? "s": "ns");
> > return NULL;
> > }
> > 
> > We are currently testing a patch that shuffles this thing upwards; we have 
> > more detail tomorrow.  Either way, if we want to go with this patch, it 
> > should address this seemingly dead bit of logic, as well.  My suggestion is 
> > to give this an additional day, and in the meantime test as Andreas has 
> > suggested.
> 
> Steve Reinhardt wrote:
> Good catch, I hadn't noticed that before.  I believe what's happening in 
> George's case is that there are multiple L1s above a shared L2, and one of 
> them has/had the block in O state (dirty shared) and is in the process of 
> writing it back, while others have the block in the shared state.  So the one 
> with the block in writeback goes to write back the data (in accordance with 
> the code snippet you have there), but meanwhile other caches have squashed 
> the prefetch, so your code never gets executed, because this code immediately 
> above it gets executed instead:
> 
> // Check to see if the prefetch was squashed by an upper
> // cache (to prevent us from grabbing the line) or if a
> // writeback arrived between the time the prefetch was
> // placed in the MSHRs and when it was selected to be sent.
> if (snoop_pkt.prefetchSquashed() || blk != NULL) {
> DPRINTF(Cache, "Prefetch squashed by cache.  "
>"Deallocating mshr target %#x.\n", 
> mshr->addr);
> [...]
> return NULL;
> }
> 
> Interestingly the point where the prefetches get squashed is here, in 
> handleSnoop():
> 
> // Invalidate any prefetch's from below that would strip write 
> permissions
> // MemCmd::HardPFReq is only observed by upstream caches.  After 
> missing
> // above and in it's own cache, a new MemCmd::ReadReq is created that
> // downstream caches observe.
> if (pkt->cmd == MemCmd::HardPFReq) {
> DPRINTF(Cache, "Squashing prefetch from lower cache %#x\n",
> pkt->getAddr());
> pkt->setPrefetchSquashed();
> return;
> }
> 
> where despite the language about "strip write permissions", the prefetch 
> appears to get squashed as long as the block is valid, regardless of the 
> state.
> 
> Steve Reinhardt wrote:
> So if nothing else, my commit message describes the bug incorrectly... 
> it's not just a matter of hitting in the write buffer, it's handling the case 
> where it *both* hits in the write buffer of one upper-level cache, and also 
> gets squashed because of a hit in another upper-level cache.  The actual 
> symptom we were seeing was that the response from the cache with the 
> write-buffer copy was causing an assertion, since the receiving cache wasn't 
> expecting a response because it had squashed the prefetch.
> 
> Andreas Hansson wrote:
> Here is the fix: http://reviews.gem5.org/r/2654/

The fix has been pushed. I believe this patch can be discarded.


- Andreas


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2636/#review5885
---


On Feb. 6, 2015, 12:38 a.m., Steve Reinhardt wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2636/
> ---
> 
> (Updated Feb. 6, 2015, 12:38 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 10683:3147f3a868f7
> ---
> mem: fix prefetcher bug regarding write buffer hits
> 
> Prefetches are supposed to be squashed if the block is already
> present in a higher-level cache.  We squash appropriately if
> the block is in a higher-level cache or MSHR, but did not
> properly handle the

Re: [gem5-dev] Review Request 2655: config: Fix for 'android' lookup in disk name

2015-03-02 Thread Andreas Hansson via gem5-dev


> On Feb. 19, 2015, 10:50 p.m., Andreas Hansson wrote:
> > Ship It!

Could someone be kind enough to push this?


- Andreas


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2655/#review5898
---


On Feb. 19, 2015, 10:46 p.m., Rizwana Begum wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2655/
> ---
> 
> (Updated Feb. 19, 2015, 10:46 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 10695:74aaa564d5cc
> ---
> config: Fix for 'android' lookup in disk name
> 
> This patch modifies FSConfig.py to look for 'android' only in disk
> image name. Before this patch, 'android' was searched in full
> disk path.
> 
> 
> Diffs
> -
> 
>   configs/common/FSConfig.py 1a6785e37d81 
> 
> Diff: http://reviews.gem5.org/r/2655/diff/
> 
> 
> Testing
> ---
> 
> All quick and long regressions for ARM passed
> 
> 
> Thanks,
> 
> Rizwana Begum
> 
>

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Review Request 2636: mem: fix prefetcher bug regarding write buffer hits

2015-03-02 Thread Steve Reinhardt via gem5-dev
Done.  Thanks for the reminder.

Steve

On Mon, Mar 2, 2015 at 2:59 AM, Andreas Hansson wrote:

>This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2636/
>
> On February 10th, 2015, 5:37 p.m. UTC, *Stephan Diestelhorst* wrote:
>
> I have had a similar impulse, when inspecting this code.  However, the 
> prefetch hitting a write-back in an upper cache is actually already handled 
> in Cache::getTimingPacket():
>
> // Check if the prefetch hit a writeback in an upper cache
> // and if so we will eventually get a HardPFResp from
> // above
> if (snoop_pkt.memInhibitAsserted()) {
> // If we are getting a non-shared response it is dirty
> bool pending_dirty_resp = !snoop_pkt.sharedAsserted();
> markInService(mshr, pending_dirty_resp);
> DPRINTF(Cache, "Upward snoop of prefetch for addr"
> " %#x (%s) hit\n",
> tgt_pkt->getAddr(), tgt_pkt->isSecure()? "s": "ns");
> return NULL;
> }
>
> We are currently testing a patch that shuffles this thing upwards; we have 
> more detail tomorrow.  Either way, if we want to go with this patch, it 
> should address this seemingly dead bit of logic, as well.  My suggestion is 
> to give this an additional day, and in the meantime test as Andreas has 
> suggested.
>
>  On February 10th, 2015, 7:44 p.m. UTC, *Steve Reinhardt* wrote:
>
> Good catch, I hadn't noticed that before.  I believe what's happening in 
> George's case is that there are multiple L1s above a shared L2, and one of 
> them has/had the block in O state (dirty shared) and is in the process of 
> writing it back, while others have the block in the shared state.  So the one 
> with the block in writeback goes to write back the data (in accordance with 
> the code snippet you have there), but meanwhile other caches have squashed 
> the prefetch, so your code never gets executed, because this code immediately 
> above it gets executed instead:
>
> // Check to see if the prefetch was squashed by an upper
> // cache (to prevent us from grabbing the line) or if a
> // writeback arrived between the time the prefetch was
> // placed in the MSHRs and when it was selected to be sent.
> if (snoop_pkt.prefetchSquashed() || blk != NULL) {
> DPRINTF(Cache, "Prefetch squashed by cache.  "
>"Deallocating mshr target %#x.\n", mshr->addr);
> [...]
> return NULL;
> }
>
> Interestingly the point where the prefetches get squashed is here, in 
> handleSnoop():
>
> // Invalidate any prefetch's from below that would strip write permissions
> // MemCmd::HardPFReq is only observed by upstream caches.  After missing
> // above and in it's own cache, a new MemCmd::ReadReq is created that
> // downstream caches observe.
> if (pkt->cmd == MemCmd::HardPFReq) {
> DPRINTF(Cache, "Squashing prefetch from lower cache %#x\n",
> pkt->getAddr());
> pkt->setPrefetchSquashed();
> return;
> }
>
> where despite the language about "strip write permissions", the prefetch 
> appears to get squashed as long as the block is valid, regardless of the 
> state.
>
>  On February 10th, 2015, 7:48 p.m. UTC, *Steve Reinhardt* wrote:
>
> So if nothing else, my commit message describes the bug incorrectly... it's 
> not just a matter of hitting in the write buffer, it's handling the case 
> where it *both* hits in the write buffer of one upper-level cache, and also 
> gets squashed because of a hit in another upper-level cache.  The actual 
> symptom we were seeing was that the response from the cache with the 
> write-buffer copy was causing an assertion, since the receiving cache wasn't 
> expecting a response because it had squashed the prefetch.
>
>  On February 13th, 2015, 8:37 a.m. UTC, *Andreas Hansson* wrote:
>
> Here is the fix: http://reviews.gem5.org/r/2654/
>
>  The fix has been pushed. I believe this patch can be discarded.
>
>
> - Andreas
>
> On February 6th, 2015, 12:38 a.m. UTC, Steve Reinhardt wrote:
>   Review request for Default.
> By Steve Reinhardt.
>
> *Updated Feb. 6, 2015, 12:38 a.m.*
>  *Repository: * gem5
> Description
>
> Changeset 10683:3147f3a868f7
> ---
> mem: fix prefetcher bug regarding write buffer hits
>
> Prefetches are supposed to be squashed if the block is already
> present in a higher-level cache.  We squash appropriately if
> the block is in a higher-level cache or MSHR, but did not
> properly handle the case where the block is in the write buffer.
>
> Thanks to George Michelogiannakis  for
> help in tracking down and testing this fix.
>
>   Diffs
>
>- src/mem/cache/cache_impl.hh
>(3d17366c0423a59478ae63d40c8feeea34df218a)
>
> View Diff 

Re: [gem5-dev] Review Request 2655: config: Fix for 'android' lookup in disk name

2015-03-02 Thread Nilay Vaish via gem5-dev

I'll push it along with my own patches.


--
Nilay

On Mon, 2 Mar 2015, Andreas Hansson via gem5-dev wrote:





On Feb. 19, 2015, 10:50 p.m., Andreas Hansson wrote:

Ship It!


Could someone be kind enough to push this?


- Andreas


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2655/#review5898
---


On Feb. 19, 2015, 10:46 p.m., Rizwana Begum wrote:


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2655/
---

(Updated Feb. 19, 2015, 10:46 p.m.)


Review request for Default.


Repository: gem5


Description
---

Changeset 10695:74aaa564d5cc
---
config: Fix for 'android' lookup in disk name

This patch modifies FSConfig.py to look for 'android' only in disk
image name. Before this patch, 'android' was searched in full
disk path.


Diffs
-

  configs/common/FSConfig.py 1a6785e37d81

Diff: http://reviews.gem5.org/r/2655/diff/


Testing
---

All quick and long regressions for ARM passed


Thanks,

Rizwana Begum




___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Review Request 2636: mem: fix prefetcher bug regarding write buffer hits

2015-03-02 Thread Andreas Hansson via gem5-dev
Great. Let us know if there are still any remaining issues.

We’ve got some additional cache fixes that should be on RB before the end of 
the week.

Andreas

From: Steve Reinhardt <ste...@gmail.com>
Date: Monday, 2 March 2015 16:47
To: Andreas Hansson <andreas.hans...@arm.com>
Cc: Stephan Diestelhorst <stephan.diestelho...@gmail.com>,
Default <gem5-dev@gem5.org>
Subject: Re: Review Request 2636: mem: fix prefetcher bug regarding write 
buffer hits

Done.  Thanks for the reminder.

Steve

On Mon, Mar 2, 2015 at 2:59 AM, Andreas Hansson
<andreas.hans...@arm.com> wrote:
This is an automatically generated e-mail. To reply, visit: 
http://reviews.gem5.org/r/2636/


On February 10th, 2015, 5:37 p.m. UTC, Stephan Diestelhorst wrote:

I have had a similar impulse, when inspecting this code.  However, the prefetch 
hitting a write-back in an upper cache is actually already handled in 
Cache::getTimingPacket():

// Check if the prefetch hit a writeback in an upper cache
// and if so we will eventually get a HardPFResp from
// above
if (snoop_pkt.memInhibitAsserted()) {
// If we are getting a non-shared response it is dirty
bool pending_dirty_resp = !snoop_pkt.sharedAsserted();
markInService(mshr, pending_dirty_resp);
DPRINTF(Cache, "Upward snoop of prefetch for addr"
" %#x (%s) hit\n",
tgt_pkt->getAddr(), tgt_pkt->isSecure()? "s": "ns");
return NULL;
}

We are currently testing a patch that shuffles this thing upwards; we have more 
detail tomorrow.  Either way, if we want to go with this patch, it should 
address this seemingly dead bit of logic, as well.  My suggestion is to give 
this an additional day, and in the meantime test as Andreas has suggested.

On February 10th, 2015, 7:44 p.m. UTC, Steve Reinhardt wrote:

Good catch, I hadn't noticed that before.  I believe what's happening in 
George's case is that there are multiple L1s above a shared L2, and one of them 
has/had the block in O state (dirty shared) and is in the process of writing it 
back, while others have the block in the shared state.  So the one with the 
block in writeback goes to write back the data (in accordance with the code 
snippet you have there), but meanwhile other caches have squashed the prefetch, 
so your code never gets executed, because this code immediately above it gets 
executed instead:

// Check to see if the prefetch was squashed by an upper
// cache (to prevent us from grabbing the line) or if a
// writeback arrived between the time the prefetch was
// placed in the MSHRs and when it was selected to be sent.
if (snoop_pkt.prefetchSquashed() || blk != NULL) {
DPRINTF(Cache, "Prefetch squashed by cache.  "
   "Deallocating mshr target %#x.\n", mshr->addr);
[...]
return NULL;
}

Interestingly the point where the prefetches get squashed is here, in 
handleSnoop():

// Invalidate any prefetch's from below that would strip write permissions
// MemCmd::HardPFReq is only observed by upstream caches.  After missing
// above and in it's own cache, a new MemCmd::ReadReq is created that
// downstream caches observe.
if (pkt->cmd == MemCmd::HardPFReq) {
DPRINTF(Cache, "Squashing prefetch from lower cache %#x\n",
pkt->getAddr());
pkt->setPrefetchSquashed();
return;
}

where despite the language about "strip write permissions", the prefetch 
appears to get squashed as long as the block is valid, regardless of the state.

On February 10th, 2015, 7:48 p.m. UTC, Steve Reinhardt wrote:

So if nothing else, my commit message describes the bug incorrectly... it's not 
just a matter of hitting in the write buffer, it's handling the case where it 
*both* hits in the write buffer of one upper-level cache, and also gets 
squashed because of a hit in another upper-level cache.  The actual symptom we 
were seeing was that the response from the cache with the write-buffer copy was 
causing an assertion, since the receiving cache wasn't expecting a response 
because it had squashed the prefetch.

On February 13th, 2015, 8:37 a.m. UTC, Andreas Hansson wrote:

Here is the fix: http://reviews.gem5.org/r/2654/

The fix has been pushed. I believe this patch can be discarded.


- Andreas


On February 6th, 2015, 12:38 a.m. UTC, Steve Reinhardt wrote:

Review request for Default.
By Steve Reinhardt.

Updated Feb. 6, 2015, 12:38 a.m.

Repository: gem5
Description

Changeset 10683:3147f3a868f7
---
mem: fix prefetcher bug regarding write buffer hits

Prefetches are supposed to be squashed if the block is already
present in a higher-level cache.  We squash appropriately if
the block is in a higher-level ca

Re: [gem5-dev] Review Request 2676: cpu: o3: remove unused stat variables.

2015-03-02 Thread Steve Reinhardt via gem5-dev

---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2676/#review5925
---

Ship it!


Ship It!

- Steve Reinhardt


On Feb. 28, 2015, 2:40 p.m., Nilay Vaish wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2676/
> ---
> 
> (Updated Feb. 28, 2015, 2:40 p.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 10712:bb6de70c386f
> ---
> cpu: o3: remove unused stat variables.
> 
> 
> Diffs
> -
> 
>   src/cpu/o3/commit.hh 4206946d60fe 
>   src/cpu/o3/commit_impl.hh 4206946d60fe 
> 
> Diff: http://reviews.gem5.org/r/2676/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Nilay Vaish
> 
>

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


Re: [gem5-dev] Review Request 2665: sim: Reuse the same limit_event in simulate()

2015-03-02 Thread Curtis Dunham via gem5-dev


> On Feb. 25, 2015, 4:26 a.m., Steve Reinhardt wrote:
> > src/sim/sim_events.hh, line 77
> > 
> >
> > seems like it would be safer just to say:
> > 
> > if (scheduled())
> > deschedule();
> > 
> > then if some derived class no longer meets the condition of always 
> > being scheduled, we'll still be OK

I can do this; however, I originally made this trade-off because 
BaseGlobalEvent::deschedule() already checks for scheduled().
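
The trade-off under discussion is where the scheduled() guard lives; either way the net effect is an idempotent deschedule. A minimal Python sketch of the two equivalent patterns (a toy Event class, not gem5's actual C++ hierarchy):

```python
class Event:
    """Toy event with gem5-like scheduled()/deschedule() semantics."""
    def __init__(self):
        self._scheduled = False

    def schedule(self):
        self._scheduled = True

    def scheduled(self):
        return self._scheduled

    def deschedule(self):
        # Guard inside deschedule, mirroring the claim that
        # BaseGlobalEvent::deschedule() already checks scheduled():
        # a no-op when the event is not scheduled.
        if self._scheduled:
            self._scheduled = False

def safe_cancel(event):
    # Guard at the call site, as suggested in the review; harmless
    # even if deschedule() also checks, and safer if a derived class
    # ever drops the internal check.
    if event.scheduled():
        event.deschedule()

e = Event()
e.schedule()
safe_cancel(e)        # descheduled exactly once
safe_cancel(e)        # second call is a no-op
print(e.scheduled())  # False
```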


- Curtis


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2665/#review5913
---


On Feb. 25, 2015, 1:22 a.m., Curtis Dunham wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> http://reviews.gem5.org/r/2665/
> ---
> 
> (Updated Feb. 25, 2015, 1:22 a.m.)
> 
> 
> Review request for Default.
> 
> 
> Repository: gem5
> 
> 
> Description
> ---
> 
> Changeset 10706:e77f25583997
> ---
> sim: Reuse the same limit_event in simulate()
> 
> This patch accomplishes two things:
> 1. Makes simulate() reuse the same GlobalSimLoopExitEvent
>across calls. This is slightly more efficient than recreating
>it every time.
> 2. Gives callers to simulate() (especially other simulators) a
>foolproof way of knowing that the simulation period ended
>successfully by hitting the limit event. They can compare the
>global simulate_limit_event to the return value of simulate().
> 
> This change was motivated by an ongoing effort to integrate gem5
> and SST, with SST as the master sim and gem5 as the slave sim.
> 
> 
> Diffs
> -
> 
>   src/sim/sim_events.hh c6cb94a14fea4c59780d73d1623d7031bcede6af 
>   src/sim/simulate.hh c6cb94a14fea4c59780d73d1623d7031bcede6af 
>   src/sim/simulate.cc c6cb94a14fea4c59780d73d1623d7031bcede6af 
> 
> Diff: http://reviews.gem5.org/r/2665/diff/
> 
> 
> Testing
> ---
> 
> 
> Thanks,
> 
> Curtis Dunham
> 
>

___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev