[gem5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick

2015-01-22 Thread Cron Daemon via gem5-dev
scons: *** [build/ALPHA/gem5.opt] Error 1
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MESI_Two_Level passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/o3-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing-ruby passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-atomic passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest-filter passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-dram-ctrl passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-simple-mem passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/o3-timing passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing-ruby passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC/tests/opt/quick/se/40.m5threads-test-atomic/sparc/linux/o3-timing-mp passed.
* build/SPARC/tests/opt/quick/se/40.m5threads-test-atomic/sparc/linux/simple-atomic-mp passed.
* build/SPARC/tests/opt/quick/se/40.m5threads-test-atomic/sparc/linux/simple-timing-mp passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/o3-timing passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-atomic passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-timing passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-timing-ruby passed.
* build/X86/tests/opt/quick/fs/10.linux-boot/x86/linux/pc-simple-atomic passed.
* build/X86/tests/opt/quick/fs/10.linux-boot/x86/linux/pc-simple-timing passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/minor-timing passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/o3-timing passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/o3-timing-checker passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-atomic passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-atomic-dummychecker passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-timing passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic-dual passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-timing passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-timing-dual passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-switcheroo-atomic passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview64-simple-atomic

[gem5-dev] changeset in gem5: stats: Update stats to reflect x86 table walk...

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset b7ff344c3061 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b7ff344c3061
description:
stats: Update stats to reflect x86 table walker changes

diffstat:

 tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt       |  106 +-
 tests/long/fs/10.linux-boot/ref/x86/linux/pc-switcheroo-full/stats.txt |   12 +-
 2 files changed, 59 insertions(+), 59 deletions(-)

diffs (199 lines):

diff -r e49bf4884c59 -r b7ff344c3061 tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt
--- a/tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt  Thu Jan 22 05:00:54 2015 -0500
+++ b/tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt  Thu Jan 22 05:00:57 2015 -0500
@@ -4,11 +4,11 @@
 sim_ticks 5121937205500   # Number of ticks simulated
 final_tick   5121937205500   # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
 sim_freq 1   # Frequency of simulated ticks
-host_inst_rate 133395   # Simulator instruction rate (inst/s)
-host_op_rate   263673   # Simulator op (including micro ops) rate (op/s)
-host_tick_rate 1674179733   # Simulator tick rate (ticks/s)
-host_mem_usage 798472   # Number of bytes of host memory used
-host_seconds  3059.37   # Real time elapsed on the host
+host_inst_rate 250170   # Simulator instruction rate (inst/s)
+host_op_rate   494496   # Simulator op (including micro ops) rate (op/s)
+host_tick_rate 3139783576   # Simulator tick rate (ticks/s)
+host_mem_usage 754660   # Number of bytes of host memory used
+host_seconds  1631.30   # Real time elapsed on the host
 sim_insts   408103625   # Number of instructions simulated
 sim_ops 806672783   # Number of ops (including micro ops) simulated
 system.voltage_domain.voltage   1   # Voltage in Volts
@@ -808,12 +808,12 @@
 system.cpu.dtb_walker_cache.demand_misses::total 76507   # number of demand (read+write) misses
 system.cpu.dtb_walker_cache.overall_misses::cpu.dtb.walker 76507   # number of overall misses
 system.cpu.dtb_walker_cache.overall_misses::total 76507   # number of overall misses
-system.cpu.dtb_walker_cache.ReadReq_miss_latency::cpu.dtb.walker 935770692   # number of ReadReq miss cycles
-system.cpu.dtb_walker_cache.ReadReq_miss_latency::total 935770692   # number of ReadReq miss cycles
-system.cpu.dtb_walker_cache.demand_miss_latency::cpu.dtb.walker 935770692   # number of demand (read+write) miss cycles
-system.cpu.dtb_walker_cache.demand_miss_latency::total 935770692   # number of demand (read+write) miss cycles
-system.cpu.dtb_walker_cache.overall_miss_latency::cpu.dtb.walker 935770692   # number of overall miss cycles
-system.cpu.dtb_walker_cache.overall_miss_latency::total 935770692   # number of overall miss cycles
+system.cpu.dtb_walker_cache.ReadReq_miss_latency::cpu.dtb.walker 935770691   # number of ReadReq miss cycles
+system.cpu.dtb_walker_cache.ReadReq_miss_latency::total 935770691   # number of ReadReq miss cycles
+system.cpu.dtb_walker_cache.demand_miss_latency::cpu.dtb.walker 935770691   # number of demand (read+write) miss cycles
+system.cpu.dtb_walker_cache.demand_miss_latency::total 935770691   # number of demand (read+write) miss cycles
+system.cpu.dtb_walker_cache.overall_miss_latency::cpu.dtb.walker 935770691   # number of overall miss cycles
+system.cpu.dtb_walker_cache.overall_miss_latency::total 935770691   # number of overall miss cycles
 system.cpu.dtb_walker_cache.ReadReq_accesses::cpu.dtb.walker   190525   # number of ReadReq accesses(hits+misses)
 system.cpu.dtb_walker_cache.ReadReq_accesses::total   190525   # number of ReadReq accesses(hits+misses)
 system.cpu.dtb_walker_cache.demand_accesses::cpu.dtb.walker   190525   

[gem5-dev] changeset in gem5: mem: Always use SenderState for response rout...

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset 8bb4a9717eaa in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=8bb4a9717eaa
description:
mem: Always use SenderState for response routing in RubyPort

This patch aligns how the response routing is done in the RubyPort,
using the SenderState for both memory and I/O accesses. Before this
patch, only the I/O used the SenderState, whereas the memory accesses
relied on the src field in the packet. With this patch we shift to
using SenderState in both cases, thus not relying on the src field any
longer.
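
As a rough illustration of the routing scheme described above, here is a
minimal standalone sketch. All types in it (Port, SenderState,
PortSenderState, Packet) are simplified, hypothetical stand-ins written for
this example and are not the gem5 classes; the only point is that the
issuing port is pushed onto the packet on the request path and popped again
on the response path, so no src field is needed.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins; these are not the gem5 classes.
struct Port {
    std::string name;
    std::vector<int> delivered;  // addresses of responses delivered here
    explicit Port(std::string n) : name(std::move(n)) {}
    void hitCallback(int addr) { delivered.push_back(addr); }
};

// Base of a linked stack of per-hop state carried by the packet.
struct SenderState {
    SenderState *predecessor = nullptr;
    virtual ~SenderState() = default;
};

// State pushed by the port so that the response can find its way back.
struct PortSenderState : SenderState {
    Port *port;
    explicit PortSenderState(Port *p) : port(p) {}
};

struct Packet {
    int addr;
    SenderState *senderState = nullptr;
    explicit Packet(int a) : addr(a) {}

    void pushSenderState(SenderState *s) {
        s->predecessor = senderState;
        senderState = s;
    }
    SenderState *popSenderState() {
        assert(senderState != nullptr);
        SenderState *top = senderState;
        senderState = top->predecessor;
        top->predecessor = nullptr;
        return top;
    }
};

// Request path: remember the issuing port in the packet itself.
void issueRequest(Packet &pkt, Port &from) {
    pkt.pushSenderState(new PortSenderState(&from));
}

// Response path: pop the state and route to the remembered port.
void routeResponse(Packet &pkt) {
    auto *state = dynamic_cast<PortSenderState *>(pkt.popSenderState());
    assert(state != nullptr);
    Port *port = state->port;
    delete state;
    port->hitCallback(pkt.addr);
}

// Two ports issue one request each; responses return out of order.
bool demoSenderStateRouting() {
    Port cpu0("cpu0"), cpu1("cpu1");
    Packet a(0x40), b(0x80);
    issueRequest(a, cpu0);
    issueRequest(b, cpu1);
    routeResponse(b);
    routeResponse(a);
    return cpu0.delivered == std::vector<int>{0x40} &&
           cpu1.delivered == std::vector<int>{0x80} &&
           a.senderState == nullptr && b.senderState == nullptr;
}
```

In this sketch the same mechanism serves every kind of access, which is the
alignment the patch description is after.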

diffstat:

 src/mem/ruby/system/RubyPort.cc  |  28 
 src/mem/ruby/system/Sequencer.cc |   2 ++
 2 files changed, 18 insertions(+), 12 deletions(-)

diffs (79 lines):

diff -r bd376adfb7d4 -r 8bb4a9717eaa src/mem/ruby/system/RubyPort.cc
--- a/src/mem/ruby/system/RubyPort.cc   Thu Jan 22 05:01:14 2015 -0500
+++ b/src/mem/ruby/system/RubyPort.cc   Thu Jan 22 05:01:24 2015 -0500
@@ -180,11 +180,6 @@
 // got a response from a device
 assert(pkt->isResponse());
 
-// In FS mode, ruby memory will receive pio responses from devices
-// and it must forward these responses back to the particular CPU.
-DPRINTF(RubyPort,  "Pio response for address %#x, going to %d\n",
-pkt->getAddr(), pkt->getDest());
-
 // First we must retrieve the request port from the sender State
 RubyPort::SenderState *senderState =
 safe_cast<RubyPort::SenderState *>(pkt->popSenderState());
@@ -192,6 +187,11 @@
 assert(port != NULL);
 delete senderState;
 
+// In FS mode, ruby memory will receive pio responses from devices
+// and it must forward these responses back to the particular CPU.
+DPRINTF(RubyPort,  "Pio response for address %#x, going to %s\n",
+pkt->getAddr(), port->name());
+
 // attempt to send the response in the next cycle
 port->schedTimingResp(pkt, curTick() + g_system_ptr->clockPeriod());
 
@@ -246,9 +246,6 @@
 return true;
 }
 
-// Save the port id to be used later to route the response
-pkt->setSrc(id);
-
 assert(Address(pkt->getAddr()).getOffset() + pkt->getSize() <=
 RubySystem::getBlockSizeBytes());
 
@@ -259,6 +256,10 @@
 // Otherwise, we need to tell the port to retry at a later point
 // and return false.
 if (requestStatus == RequestStatus_Issued) {
+// Save the port in the sender state object to be used later to
+// route the response
+pkt->pushSenderState(new SenderState(this));
+
 DPRINTF(RubyPort, "Request %s 0x%x issued\n", pkt->cmdString(),
 pkt->getAddr());
 return true;
@@ -343,11 +344,14 @@
 assert(system->isMemAddr(pkt->getAddr()));
 assert(pkt->isRequest());
 
-// As it has not yet been turned around, the source field tells us
-// which port it came from.
-assert(pkt->getSrc() < slave_ports.size());
+// First we must retrieve the request port from the sender State
+RubyPort::SenderState *senderState =
+safe_cast<RubyPort::SenderState *>(pkt->popSenderState());
+MemSlavePort *port = senderState->port;
+assert(port != NULL);
+delete senderState;
 
-slave_ports[pkt->getSrc()]->hitCallback(pkt);
+port->hitCallback(pkt);
 
 //
 // If we had to stall the MemSlavePorts, wake them up because the sequencer
diff -r bd376adfb7d4 -r 8bb4a9717eaa src/mem/ruby/system/Sequencer.cc
--- a/src/mem/ruby/system/Sequencer.cc  Thu Jan 22 05:01:14 2015 -0500
+++ b/src/mem/ruby/system/Sequencer.cc  Thu Jan 22 05:01:24 2015 -0500
@@ -547,6 +547,8 @@
 // subBlock with the recieved data.  The tester will later access
 // this state.
 if (m_usingRubyTester) {
+DPRINTF(RubySequencer, "hitCallback %s 0x%x using RubyTester\n",
+pkt->cmdString(), pkt->getAddr());
 RubyTester::SenderState* testerSenderState =
 pkt->findNextSenderState<RubyTester::SenderState>();
 assert(testerSenderState);
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev


[gem5-dev] changeset in gem5: mem: Make the XBar responsible for tracking r...

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset bd376adfb7d4 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=bd376adfb7d4
description:
mem: Make the XBar responsible for tracking response routing

This patch removes the need for a source and destination field in the
packet by shifting the onus of the tracking to the crossbar, much like
a real implementation. This change in behaviour also means we no
longer need a SenderState to remember the source/dest whenever we
have multiple crossbars in the system. Thus, the stack that was
created by the SenderState is not needed, and each crossbar locally
tracks the response routing.

The fields in the packet are still left behind as the RubyPort (which
also acts as a crossbar) does routing based on them. In the succeeding
patches the uses of the src and dest field will be removed. Combined,
these patches improve the simulation performance by roughly 2%.
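
To make the idea of local response tracking concrete, the following is a
small standalone sketch under stated assumptions. ToyXBar, RequestID and the
method names are invented for illustration; the real crossbar code in the
truncated diff below is more involved (snoops, layers, retries). The sketch
only shows a crossbar remembering, per outstanding request, which port the
request arrived on, and consulting that table when the response comes back.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

using PortID = int;               // hypothetical alias for this sketch
using RequestID = std::uint64_t;  // stands in for a pointer to the request

class ToyXBar {
  public:
    // Request path: record which slave port the request came in on.
    void recvRequest(RequestID req, PortID slavePort, bool needsResponse) {
        if (!needsResponse)
            return;  // nothing will come back, so nothing to remember
        assert(routeTo.find(req) == routeTo.end());
        routeTo[req] = slavePort;
    }

    // Response path: look up the port and forget the entry.
    PortID recvResponse(RequestID req) {
        auto it = routeTo.find(req);
        assert(it != routeTo.end());
        PortID dest = it->second;
        routeTo.erase(it);
        return dest;
    }

    std::size_t outstanding() const { return routeTo.size(); }

  private:
    std::unordered_map<RequestID, PortID> routeTo;
};

// Three requests, one of which needs no response; responses arrive
// out of order and are still routed to the right port.
bool demoXBarRouting() {
    ToyXBar xbar;
    xbar.recvRequest(1, 3, true);
    xbar.recvRequest(2, 0, true);
    xbar.recvRequest(3, 5, false);
    if (xbar.outstanding() != 2)
        return false;
    bool ok = xbar.recvResponse(2) == 0 && xbar.recvResponse(1) == 3;
    return ok && xbar.outstanding() == 0;
}
```

Because each crossbar keeps its own table, a chain of crossbars needs no
shared field in the packet and no stack of saved source ports.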

diffstat:

 src/mem/coherent_xbar.cc|  140 ---
 src/mem/coherent_xbar.hh|9 +-
 src/mem/noncoherent_xbar.cc |   23 +-
 src/mem/xbar.hh |9 ++
 4 files changed, 108 insertions(+), 73 deletions(-)

diffs (truncated from 360 to 300 lines):

diff -r b7ff344c3061 -r bd376adfb7d4 src/mem/coherent_xbar.cc
--- a/src/mem/coherent_xbar.cc  Thu Jan 22 05:00:57 2015 -0500
+++ b/src/mem/coherent_xbar.cc  Thu Jan 22 05:01:14 2015 -0500
@@ -142,6 +142,10 @@
 
 // remember if the packet is an express snoop
 bool is_express_snoop = pkt->isExpressSnoop();
+bool is_inhibited = pkt->memInhibitAsserted();
+// for normal requests, going downstream, the express snoop flag
+// and the inhibited flag should always be the same
+assert(is_express_snoop == is_inhibited);
 
 // determine the destination based on the address
 PortID master_port_id = findPort(pkt->getAddr());
@@ -163,9 +167,6 @@
 unsigned int pkt_size = pkt->hasData() ? pkt->getSize() : 0;
 unsigned int pkt_cmd = pkt->cmdToIndex();
 
-// set the source port for routing of the response
-pkt->setSrc(slave_port_id);
-
 calcPacketTiming(pkt);
 Tick packetFinishTime = pkt->lastWordDelay + curTick();
 
@@ -187,21 +188,10 @@
 }
 }
 
-// remember if we add an outstanding req so we can undo it if
-// necessary, if the packet needs a response, we should add it
-// as outstanding and express snoops never fail so there is
-// not need to worry about them
-bool add_outstanding = !is_express_snoop && pkt->needsResponse();
-
-// keep track that we have an outstanding request packet
-// matching this request, this is used by the coherency
-// mechanism in determining what to do with snoop responses
-// (in recvTimingSnoop)
-if (add_outstanding) {
-// we should never have an exsiting request outstanding
-assert(outstandingReq.find(pkt->req) == outstandingReq.end());
-outstandingReq.insert(pkt->req);
-}
+// remember if the packet will generate a snoop response
+const bool expect_snoop_resp = !is_inhibited && pkt->memInhibitAsserted();
+const bool expect_response = pkt->needsResponse() &&
+!pkt->memInhibitAsserted();
 
 // Note: Cannot create a copy of the full packet, here.
 MemCmd orig_cmd(pkt->cmd);
@@ -224,41 +214,58 @@
 pkt->cmd = tmp_cmd;
 }
 
-// if this is an express snoop, we are done at this point
-if (is_express_snoop) {
-assert(success);
-snoops++;
+// check if we were successful in sending the packet onwards
+if (!success)  {
+// express snoops and inhibited packets should never be forced
+// to retry
+assert(!is_express_snoop);
+assert(!pkt->memInhibitAsserted());
+
+// undo the calculation so we can check for 0 again
+pkt->firstWordDelay = pkt->lastWordDelay = 0;
+
+DPRINTF(CoherentXBar, "recvTimingReq: src %s %s 0x%x RETRY\n",
+src_port->name(), pkt->cmdString(), pkt->getAddr());
+
+// update the layer state and schedule an idle event
+reqLayers[master_port_id]->failedTiming(src_port,
+clockEdge(headerCycles));
 } else {
-// for normal requests, check if successful
-if (!success)  {
-// inhibited packets should never be forced to retry
-assert(!pkt->memInhibitAsserted());
+// express snoops currently bypass the crossbar state entirely
+if (!is_express_snoop) {
+// if this particular request will generate a snoop
+// response
+if (expect_snoop_resp) {
+// we should never have an exsiting request outstanding
+assert(outstandingSnoop.find(pkt->req) ==
+   outstandingSnoop.end());
+outstandingSnoop.insert(pkt->req);
 
-// if it was added as outstanding and the 

[gem5-dev] changeset in gem5: mem: Clean up Request initialisation

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset e3fc6bc7f97e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=e3fc6bc7f97e
description:
mem: Clean up Request initialisation

This patch tidies up how we create and set the fields of a Request. In
essence it tries to use the constructor where possible (as opposed to
setPhys and setVirt), thus avoiding spreading the information across a
number of locations. In fact, setPhys is made private as part of this
patch, and a number of places where we called setVirt instead use
the appropriate constructor.
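
The constructor-based style described above can be sketched as follows.
ToyRequest and its parameter list are hypothetical and much smaller than the
real Request class; the toy translate() function and its fixed offset are
likewise assumptions made only for this example. The sketch shows the two
effects the description mentions: all fields are set in one place, and a
short-lived request can live on the stack instead of being allocated and
deleted by hand.

```cpp
#include <cassert>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical, simplified request; not the gem5 Request class.
class ToyRequest {
  public:
    // All fields are supplied in one place, so a ToyRequest can never
    // be observed half-initialised.
    ToyRequest(Addr vaddr, unsigned size, unsigned flags,
               int contextId, int threadId)
        : vaddr_(vaddr), size_(size), flags_(flags),
          contextId_(contextId), threadId_(threadId) {}

    Addr vaddr() const { return vaddr_; }
    unsigned size() const { return size_; }
    unsigned flags() const { return flags_; }
    int contextId() const { return contextId_; }
    int threadId() const { return threadId_; }

    bool hasPaddr() const { return paddrValid_; }
    Addr paddr() const { assert(paddrValid_); return paddr_; }

    // Filled in later by address translation.
    void setPaddr(Addr p) { paddr_ = p; paddrValid_ = true; }

  private:
    Addr vaddr_;
    unsigned size_;
    unsigned flags_;
    int contextId_;
    int threadId_;
    Addr paddr_ = 0;
    bool paddrValid_ = false;
};

// Toy translation: a fixed offset, chosen only for this example.
bool translate(ToyRequest &req) {
    req.setPaddr(req.vaddr() + 0x1000);
    return true;
}

// The request lives on the stack, so there is no new/delete pair to
// keep in sync across early returns.
Addr demoStackRequest() {
    ToyRequest req(0x2000, 4, 0, /*contextId=*/0, /*threadId=*/0);
    translate(req);
    return req.paddr();
}
```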

diffstat:

 src/arch/arm/isa.cc|  12 +--
 src/cpu/checker/cpu.cc |   8 +-
 src/cpu/kvm/base.cc|   5 +-
 src/cpu/kvm/base.hh|   3 -
 src/cpu/kvm/x86_cpu.cc |   5 +-
 src/cpu/kvm/x86_cpu.hh |   3 -
 src/cpu/simple/timing.cc   |  16 ++---
 src/cpu/simple/timing.hh   |   2 +-
 src/cpu/testers/memtest/memtest.cc |   6 +-
 src/cpu/testers/networktest/networktest.cc |  12 ++--
 src/mem/port_proxy.cc  |   8 +--
 src/mem/request.hh |  82 ++---
 src/mem/ruby/system/CacheRecorder.cc   |   8 +-
 13 files changed, 76 insertions(+), 94 deletions(-)

diffs (truncated from 469 to 300 lines):

diff -r e5936c2d53a0 -r e3fc6bc7f97e src/arch/arm/isa.cc
--- a/src/arch/arm/isa.cc   Tue Jan 20 14:15:28 2015 -0600
+++ b/src/arch/arm/isa.cc   Thu Jan 22 05:00:53 2015 -0500
@@ -1490,7 +1490,6 @@
   case MISCREG_ATS1HR:
   case MISCREG_ATS1HW:
 {
-  RequestPtr req = new Request;
   unsigned flags = 0;
   BaseTLB::Mode mode = BaseTLB::Read;
   TLB::ArmTranslationType tranType = TLB::NormalTran;
@@ -1562,16 +1561,16 @@
   // can't be an atomic translation because that causes problems
   // with unexpected atomic snoop requests.
   warn("Translating via MISCREG(%d) in functional mode! Fix Me!\n", misc_reg);
-  req->setVirt(0, val, 1, flags,  Request::funcMasterId,
-   tc->pcState().pc());
-  req->setThreadContext(tc->contextId(), tc->threadId());
-  fault = tc->getDTBPtr()->translateFunctional(req, tc, mode, tranType);
+  Request req(0, val, 1, flags,  Request::funcMasterId,
+  tc->pcState().pc(), tc->contextId(),
+  tc->threadId());
+  fault = tc->getDTBPtr()->translateFunctional(&req, tc, mode, tranType);
   TTBCR ttbcr = readMiscRegNoEffect(MISCREG_TTBCR);
   HCR   hcr   = readMiscRegNoEffect(MISCREG_HCR);
 
   MiscReg newVal;
   if (fault == NoFault) {
-  Addr paddr = req->getPaddr();
+  Addr paddr = req.getPaddr();
   if (haveLPAE && (ttbcr.eae || tranType & TLB::HypMode ||
  ((tranType & TLB::S1S2NsTran) && hcr.vm) )) {
   newVal = (paddr & mask(39, 12)) |
@@ -1605,7 +1604,6 @@
   "MISCREG: Translated addr 0x%08x fault fsr %#x: PAR: 0x%08x\n",
   val, fsr, newVal);
   }
-  delete req;
   setMiscRegNoEffect(MISCREG_PAR, newVal);
   return;
 }
diff -r e5936c2d53a0 -r e3fc6bc7f97e src/cpu/checker/cpu.cc
--- a/src/cpu/checker/cpu.cc    Tue Jan 20 14:15:28 2015 -0600
+++ b/src/cpu/checker/cpu.cc    Thu Jan 22 05:00:53 2015 -0500
@@ -154,8 +154,8 @@
 
 // Need to account for multiple accesses like the Atomic and TimingSimple
 while (1) {
-memReq = new Request();
-memReq->setVirt(0, addr, size, flags, masterId, thread->pcState().instAddr());
+memReq = new Request(0, addr, size, flags, masterId,
+ thread->pcState().instAddr(), tc->contextId(), 0);
 
 // translate to physical address
 fault = dtb->translateFunctional(memReq, tc, BaseTLB::Read);
@@ -242,8 +242,8 @@
 
 // Need to account for a multiple access like Atomic and Timing CPUs
 while (1) {
-memReq = new Request();
-memReq->setVirt(0, addr, size, flags, masterId, thread->pcState().instAddr());
+memReq = new Request(0, addr, size, flags, masterId,
+ thread->pcState().instAddr(), tc->contextId(), 0);
 
 // translate to physical address
 fault = dtb->translateFunctional(memReq, tc, BaseTLB::Write);
diff -r e5936c2d53a0 -r e3fc6bc7f97e src/cpu/kvm/base.cc
--- a/src/cpu/kvm/base.cc   Tue Jan 20 14:15:28 2015 -0600
+++ b/src/cpu/kvm/base.cc   Thu Jan 22 05:00:53 2015 -0500
@@ -118,8 +118,6 @@
 // initialize CPU, including PC
 if (FullSystem && !switchedOut())
 TheISA::initCPU(tc, tc->contextId());
-
-

[gem5-dev] changeset in gem5: x86: Delay X86 table walk on receiving walker...

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset e49bf4884c59 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=e49bf4884c59
description:
x86: Delay X86 table walk on receiving walker response

This patch fixes a minor issue in the X86 page table walker where it
ended up sending new request packets to the crossbar before the
response processing was finished (recvTimingResp is directly calling
sendTimingReq). Under certain conditions this caused the crossbar to
see illegal combinations of request/response overlap, in turn causing
problems with a slightly modified crossbar implementation.
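
The reordering this fix is after can be illustrated with a small standalone
sketch. ToyWalker, its queue of callbacks and its log are assumptions made
for this example and do not mirror the gem5 event queue API; the sketch only
shows that scheduling the start of the next walk as an event, instead of
calling it directly from the response handler, lets the response processing
finish before any new request is sent.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy walker with its own tiny event queue.
struct ToyWalker {
    std::deque<std::function<void()>> eventQueue;
    std::vector<std::string> log;
    int pendingWalks = 0;
    bool startEventScheduled = false;

    // Sends the next request, if any walk is waiting.
    void startWalk() {
        startEventScheduled = false;
        if (pendingWalks == 0)
            return;
        --pendingWalks;
        log.push_back("send request");
    }

    // Response handler: defers the next walk instead of calling
    // startWalk() directly, and never schedules the event twice.
    void recvResponse() {
        log.push_back("response begin");
        if (pendingWalks > 0 && !startEventScheduled) {
            startEventScheduled = true;
            eventQueue.push_back([this] { startWalk(); });
        }
        log.push_back("response end");
    }

    // Drains the queue, as the simulator main loop would do later.
    void runEvents() {
        while (!eventQueue.empty()) {
            auto ev = std::move(eventQueue.front());
            eventQueue.pop_front();
            ev();
        }
    }
};

// One walk is waiting when a response arrives.
std::vector<std::string> demoDeferredWalk() {
    ToyWalker w;
    w.pendingWalks = 1;
    w.recvResponse();
    w.runEvents();
    return w.log;
}
```

In the sketch the log ends with the new request, after both response
entries; a direct call from the handler would have placed it between them,
which is the overlap the description says the crossbar could not tolerate.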

diffstat:

 src/arch/x86/pagetable_walker.cc |  6 --
 src/arch/x86/pagetable_walker.hh |  8 +++-
 2 files changed, 11 insertions(+), 3 deletions(-)

diffs (41 lines):

diff -r e3fc6bc7f97e -r e49bf4884c59 src/arch/x86/pagetable_walker.cc
--- a/src/arch/x86/pagetable_walker.cc  Thu Jan 22 05:00:53 2015 -0500
+++ b/src/arch/x86/pagetable_walker.cc  Thu Jan 22 05:00:54 2015 -0500
@@ -124,8 +124,10 @@
 delete senderWalk;
 // Since we block requests when another is outstanding, we
 // need to check if there is a waiting request to be serviced
-if (currStates.size())
-startWalkWrapper();
+if (currStates.size() && !startWalkWrapperEvent.scheduled())
+// delay sending any new requests until we are finished
+// with the responses
+schedule(startWalkWrapperEvent, clockEdge());
 }
 return true;
 }
diff -r e3fc6bc7f97e -r e49bf4884c59 src/arch/x86/pagetable_walker.hh
--- a/src/arch/x86/pagetable_walker.hh  Thu Jan 22 05:00:53 2015 -0500
+++ b/src/arch/x86/pagetable_walker.hh  Thu Jan 22 05:00:54 2015 -0500
@@ -183,6 +183,11 @@
 // Wrapper for checking for squashes before starting a translation.
 void startWalkWrapper();
 
+/**
+ * Event used to call startWalkWrapper.
+ **/
+EventWrapper<Walker, &Walker::startWalkWrapper> startWalkWrapperEvent;
+
 // Functions for dealing with packets.
 bool recvTimingResp(PacketPtr pkt);
 void recvRetry();
@@ -207,7 +212,8 @@
 MemObject(params), port(name() + ".port", this),
 funcState(this, NULL, NULL, true), tlb(NULL), sys(params->system),
 masterId(sys->getMasterId(name())),
-numSquashable(params->num_squash_per_cycle)
+numSquashable(params->num_squash_per_cycle),
+startWalkWrapperEvent(this)
 {
 }
 };


[gem5-dev] changeset in gem5: mem: Remove unused Packet src and dest fields

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset 87f7b5a07584 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=87f7b5a07584
description:
mem: Remove unused Packet src and dest fields

This patch takes the final step in removing the src and dest fields in
the packet. These fields were rather confusing in that they only
remembered a single multiplexing component, and pushed the
responsibility to the bridge and caches to store the fields in a
SenderState, thus effectively creating a stack. With the recent
changes to the crossbar response routing, the crossbar itself is now
responsible for routing responses, without relying on the packet
fields. Thus, these variables are now unused and can be removed.

diffstat:

 src/arch/x86/pagetable_walker.cc |   1 -
 src/mem/packet.hh|  49 +--
 2 files changed, 2 insertions(+), 48 deletions(-)

diffs (112 lines):

diff -r 3a3bb559b112 -r 87f7b5a07584 src/arch/x86/pagetable_walker.cc
--- a/src/arch/x86/pagetable_walker.cc  Thu Jan 22 05:01:30 2015 -0500
+++ b/src/arch/x86/pagetable_walker.cc  Thu Jan 22 05:01:31 2015 -0500
@@ -523,7 +523,6 @@
 write = oldRead;
 write->set<uint64_t>(pte);
 write->cmd = MemCmd::WriteReq;
-write->clearDest();
 } else {
 write = NULL;
 delete oldRead->req;
diff -r 3a3bb559b112 -r 87f7b5a07584 src/mem/packet.hh
--- a/src/mem/packet.hh Thu Jan 22 05:01:30 2015 -0500
+++ b/src/mem/packet.hh Thu Jan 22 05:01:31 2015 -0500
@@ -296,30 +296,6 @@
 unsigned size;
 
 /**
- * Source port identifier set on a request packet to enable
- * appropriate routing of the responses. The source port
- * identifier is set by any multiplexing component, e.g. a
- * crossbar, as the timing responses need this information to be
- * routed back to the appropriate port at a later point in
- * time. The field can be updated (over-written) as the request
- * packet passes through additional multiplexing components, and
- * it is their responsibility to remember the original source port
- * identifier, for example by using an appropriate sender
- * state. The latter is done in the cache and bridge.
- */
-PortID src;
-
-/**
- * Destination port identifier that is present on all response
- * packets that passed through a multiplexing component as a
- * request packet. The source port identifier is turned into a
- * destination port identifier when the packet is turned into a
- * response, and the destination is used, e.g. by the crossbar, to
- * select the appropriate path through the interconnect.
- */
-PortID dest;
-
-/**
  * The original value of the command field.  Only valid when the
  * current command field is an error condition; in that case, the
  * previous contents of the command field are copied here.  This
@@ -547,18 +523,6 @@
 bool hadBadAddress() const { return cmd == MemCmd::BadAddressError; }
 void copyError(Packet *pkt) { assert(pkt->isError()); cmd = pkt->cmd; }
 
-/// Accessor function to get the source index of the packet.
-PortID getSrc() const { return src; }
-/// Accessor function to set the source index of the packet.
-void setSrc(PortID _src) { src = _src; }
-
-/// Accessor function for the destination index of the packet.
-PortID getDest() const { return dest; }
-/// Accessor function to set the destination index of the packet.
-void setDest(PortID _dest) { dest = _dest; }
-/// Reset destination field, e.g. to turn a response into a request again.
-void clearDest() { dest = InvalidPortID; }
-
 Addr getAddr() const { assert(flags.isSet(VALID_ADDR)); return addr; }
 /**
  * Update the address of this packet mid-transaction. This is used
@@ -609,8 +573,7 @@
  */
 Packet(const RequestPtr _req, MemCmd _cmd)
 :  cmd(_cmd), req(_req), data(nullptr), addr(0), _isSecure(false),
-   size(0), src(InvalidPortID), dest(InvalidPortID),
-   bytesValidStart(0), bytesValidEnd(0),
+   size(0), bytesValidStart(0), bytesValidEnd(0),
firstWordDelay(0), lastWordDelay(0),
senderState(NULL)
 {
@@ -632,7 +595,6 @@
  */
 Packet(const RequestPtr _req, MemCmd _cmd, int _blkSize)
 :  cmd(_cmd), req(_req), data(nullptr), addr(0), _isSecure(false),
-   src(InvalidPortID), dest(InvalidPortID),
bytesValidStart(0), bytesValidEnd(0),
firstWordDelay(0), lastWordDelay(0),
senderState(NULL)
@@ -657,7 +619,6 @@
 :  cmd(pkt->cmd), req(pkt->req),
 data(nullptr),
 addr(pkt->addr), _isSecure(pkt->_isSecure), size(pkt->size),
-   src(pkt->src), dest(pkt->dest),
 bytesValidStart(pkt->bytesValidStart),
 bytesValidEnd(pkt->bytesValidEnd),
 firstWordDelay(pkt->firstWordDelay),
@@ -743,10 +704,7 @@
 
 

[gem5-dev] changeset in gem5: sim: fix reference counting of PythonEvent

2015-01-22 Thread Curtis Dunham via gem5-dev
changeset a0dab21e422f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=a0dab21e422f
description:
sim: fix reference counting of PythonEvent

When gem5 is a slave to another simulator and the Python is only used
to initialize the configuration (and not perform actual simulation), a
debug start (--debug-start) event will get freed during or immediately
after the initial Python frame's execution rather than remaining in the
event queue. This tricky patch fixes the GC issue causing this.

diffstat:

 src/python/swig/event.i|  4 
 src/python/swig/pyevent.cc |  8 
 src/python/swig/pyevent.hh |  5 +++--
 3 files changed, 11 insertions(+), 6 deletions(-)

diffs (56 lines):

diff -r 87f7b5a07584 -r a0dab21e422f src/python/swig/event.i
--- a/src/python/swig/event.i   Thu Jan 22 05:01:31 2015 -0500
+++ b/src/python/swig/event.i   Tue Dec 23 11:51:40 2014 -0600
@@ -71,6 +71,10 @@
 }
 }
 
+%typemap(out) PythonEvent* {
+   result->object = $result = SWIG_NewPointerObj(SWIG_as_voidptr(result), SWIGTYPE_p_PythonEvent, SWIG_POINTER_NEW);
+}
+
 %ignore EventQueue::schedule;
 %ignore EventQueue::deschedule;
 
diff -r 87f7b5a07584 -r a0dab21e422f src/python/swig/pyevent.cc
--- a/src/python/swig/pyevent.cc    Thu Jan 22 05:01:31 2015 -0500
+++ b/src/python/swig/pyevent.cc    Tue Dec 23 11:51:40 2014 -0600
@@ -34,10 +34,10 @@
 #include "sim/async.hh"
 #include "sim/eventq.hh"
 
-PythonEvent::PythonEvent(PyObject *obj, Priority priority)
-: Event(priority), object(obj)
+PythonEvent::PythonEvent(PyObject *code, Priority priority)
+: Event(priority), eventCode(code)
 {
-if (object == NULL)
+if (code == NULL)
 panic("Passed in invalid object");
 }
 
@@ -49,7 +49,7 @@
 PythonEvent::process()
 {
 PyObject *args = PyTuple_New(0);
-PyObject *result = PyObject_Call(object, args, NULL);
+PyObject *result = PyObject_Call(eventCode, args, NULL);
 Py_DECREF(args);
 
 if (result) {
diff -r 87f7b5a07584 -r a0dab21e422f src/python/swig/pyevent.hh
--- a/src/python/swig/pyevent.hh    Thu Jan 22 05:01:31 2015 -0500
+++ b/src/python/swig/pyevent.hh    Tue Dec 23 11:51:40 2014 -0600
@@ -37,9 +37,10 @@
 class PythonEvent : public Event
 {
   private:
-PyObject *object;
+PyObject *eventCode; // PyObject to call to perform event
+  public:
+PyObject *object;// PyObject wrapping this PythonEvent
 
-  public:
 PythonEvent(PyObject *obj, Event::Priority priority);
 ~PythonEvent();
 


[gem5-dev] changeset in gem5: mem: Remove Packet source from ForwardRespons...

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset 3a3bb559b112 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=3a3bb559b112
description:
mem: Remove Packet source from ForwardResponseRecord

This patch removes the source field from the ForwardResponseRecord,
but keeps the class as it is part of how the cache identifies
responses to hardware prefetches that are snooped upwards.

diffstat:

 src/mem/cache/cache_impl.hh |  11 +--
 1 files changed, 5 insertions(+), 6 deletions(-)

diffs (42 lines):

diff -r 1de300588c4f -r 3a3bb559b112 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh   Thu Jan 22 05:01:27 2015 -0500
+++ b/src/mem/cache/cache_impl.hh   Thu Jan 22 05:01:30 2015 -0500
@@ -385,10 +385,7 @@
 {
   public:
 
-PortID prevSrc;
-
-ForwardResponseRecord(PortID prev_src) : prevSrc(prev_src)
-{}
+ForwardResponseRecord() {}
 };
 
 templateclass TagStore
@@ -407,6 +404,9 @@
 assert(!system-bypassCaches());
 
 if (rec == NULL) {
+// @todo What guarantee do we have that this HardPFResp is
+// actually for this cache, and not a cache closer to the
+// memory?
 assert(pkt-cmd == MemCmd::HardPFResp);
 // Check if it's a prefetch response and handle it. We shouldn't
 // get any other kinds of responses without FRRs.
@@ -417,7 +417,6 @@
 }
 
 pkt->popSenderState();
-pkt->setDest(rec->prevSrc);
 delete rec;
 // @todo someone should pay for this
 pkt->firstWordDelay = pkt->lastWordDelay = 0;
@@ -1542,7 +1541,7 @@
 if (is_timing) {
 Packet snoopPkt(pkt, true, false);  // clear flags, no allocation
 snoopPkt.setExpressSnoop();
-snoopPkt.pushSenderState(new ForwardResponseRecord(pkt->getSrc()));
+snoopPkt.pushSenderState(new ForwardResponseRecord());
 // the snoop packet does not need to wait any additional
 // time
 snoopPkt.firstWordDelay = snoopPkt.lastWordDelay = 0;


[gem5-dev] changeset in gem5: mem: Remove unused RequestState in the bridge

2015-01-22 Thread Andreas Hansson via gem5-dev
changeset 1de300588c4f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1de300588c4f
description:
mem: Remove unused RequestState in the bridge

This patch removes the bridge sender state as the Crossbar now takes
care of remembering its own routing decisions.

diffstat:

 src/mem/bridge.cc |  22 ----------------------
 src/mem/bridge.hh |  17 -----------------
 2 files changed, 0 insertions(+), 39 deletions(-)

diffs (66 lines):

diff -r 8bb4a9717eaa -r 1de300588c4f src/mem/bridge.cc
--- a/src/mem/bridge.cc Thu Jan 22 05:01:24 2015 -0500
+++ b/src/mem/bridge.cc Thu Jan 22 05:01:27 2015 -0500
@@ -207,15 +207,6 @@
 void
 Bridge::BridgeMasterPort::schedTimingReq(PacketPtr pkt, Tick when)
 {
-// If we expect to see a response, we need to restore the source
-// and destination field that is potentially changed by a second
-// crossbar
-if (!pkt->memInhibitAsserted() && pkt->needsResponse()) {
-// Update the sender state so we can deal with the response
-// appropriately
-pkt->pushSenderState(new RequestState(pkt->getSrc()));
-}
-
 // If we're about to put this packet at the head of the queue, we
 // need to schedule an event to do the transmit.  Otherwise there
 // should already be an event scheduled for sending the head
@@ -233,19 +224,6 @@
 void
 Bridge::BridgeSlavePort::schedTimingResp(PacketPtr pkt, Tick when)
 {
-// This is a response for a request we forwarded earlier.  The
-// corresponding request state should be stored in the packet's
-// senderState field.
-RequestState *req_state =
-dynamic_cast<RequestState*>(pkt->popSenderState());
-assert(req_state != NULL);
-pkt->setDest(req_state->origSrc);
-delete req_state;
-
-// the bridge sets the destination irrespective of it is valid or
-// not, as it is checked in the crossbar
-DPRINTF(Bridge, "response, new dest %d\n", pkt->getDest());
-
 // If we're about to put this packet at the head of the queue, we
 // need to schedule an event to do the transmit.  Otherwise there
 // should already be an event scheduled for sending the head
diff -r 8bb4a9717eaa -r 1de300588c4f src/mem/bridge.hh
--- a/src/mem/bridge.hh Thu Jan 22 05:01:24 2015 -0500
+++ b/src/mem/bridge.hh Thu Jan 22 05:01:27 2015 -0500
@@ -75,23 +75,6 @@
   protected:
 
 /**
- * A bridge request state stores packets along with their sender
- * state and original source. It has enough information to also
- * restore the response once it comes back to the bridge.
- */
-class RequestState : public Packet::SenderState
-{
-
-  public:
-
-const PortID origSrc;
-
-RequestState(PortID orig_src) : origSrc(orig_src)
-{ }
-
-};
-
-/**
  * A deferred packet stores a packet along with its scheduled
  * transmission time
  */


Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.

2015-01-22 Thread Dutu, Alexandru via gem5-dev
From AMD's system programming manual:

SYSCALL
New selectors are loaded, without permission checking (see above), as follows:

Bits 47:32 of the STAR register specify the selector that is copied into the CS 
register.
Bits 47:32 of the STAR register + 8 specify the selector that is copied into 
the SS register.
The CS_base and the SS_base are both forced to zero.
The CS_limit and the SS_limit are both forced to 4 Gbyte.
The CS segment attributes are set to execute/read 64-bit code with a CPL of 
zero.
The SS segment attributes are set to read/write and expand-up with a 64-bit 
stack referenced by RSP.

SYSRET

When a system procedure performs a SYSRET back to application software, the CS 
selector is updated from bits 63:50 of the STAR register (STAR.SYSRET_CS) as 
follows:
If the return is to 32-bit mode (legacy or compatibility), CS is updated with 
the value of STAR.SYSRET_CS.
If the return is to 64-bit mode, CS is updated with the value of STAR.SYSRET_CS 
+ 16.
In both cases, the CPL is forced to 3, effectively ignoring STAR bits 49:48. 
The SS selector is updated to point to the next descriptor-table entry after 
the CS descriptor (STAR.SYSRET_CS + 8), and its RPL is not forced to 3.
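
The selector arithmetic quoted above can be sketched in a few lines. This is only an illustration under stated assumptions: the struct and function names are hypothetical, bits 63:48 are treated as the full 16-bit SYSRET selector field, and the code does not model the forcing of CPL/RPL to 3.

```cpp
#include <cassert>
#include <cstdint>

// Selectors derived from the STAR MSR (field names are hypothetical).
struct StarSelectors {
    uint16_t syscallCs;   // STAR[47:32]
    uint16_t syscallSs;   // STAR[47:32] + 8
    uint16_t sysretCs32;  // STAR[63:48], return to 32-bit mode
    uint16_t sysretCs64;  // STAR[63:48] + 16, return to 64-bit mode
    uint16_t sysretSs;    // STAR[63:48] + 8
};

inline StarSelectors decodeStar(uint64_t star)
{
    const uint16_t syscallBase = static_cast<uint16_t>((star >> 32) & 0xffff);
    const uint16_t sysretBase = static_cast<uint16_t>((star >> 48) & 0xffff);
    StarSelectors s;
    s.syscallCs = syscallBase;
    s.syscallSs = static_cast<uint16_t>(syscallBase + 8);
    s.sysretCs32 = sysretBase;
    s.sysretCs64 = static_cast<uint16_t>(sysretBase + 16);
    s.sysretSs = static_cast<uint16_t>(sysretBase + 8);
    return s;
}

// Descriptor-table index encoded in a selector: bits 15:3
// (bits 2:0 hold the table indicator and the RPL).
inline unsigned gdtIndex(uint16_t selector)
{
    return selector >> 3;
}
```

With a made-up STAR value whose SYSCALL field is 0x08 and whose SYSRET field is 0x1b, SYSCALL uses GDT entries 1 (CS) and 2 (SS), while SYSRET uses entries 3 (32-bit CS), 4 (SS) and 5 (64-bit CS). Each group of descriptors therefore has to sit in consecutive GDT slots in exactly that order.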

I am wondering if we could detect the CPU vendor with CPUID and have different 
setup of the GDT based on the platform you are running on. Could scons actually 
detect this at build time?

Best,
Alex

From: gem5-dev [gem5-dev-boun...@gem5.org] on behalf of Gabe Black via gem5-dev 
[gem5-dev@gem5.org]
Sent: Wednesday, January 21, 2015 3:48 PM
To: mike upton; Default; Gabe Black
Subject: Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE 
and FS on Intel CPUs.

 On Jan. 21, 2015, 9:22 p.m., mike upton wrote:
  src/arch/x86/process.cc, lines 218-237
  http://reviews.gem5.org/r/2557/diff/2/?file=42948#file42948line218
 
  For AMD systems, the sys descriptors need to come first. On intel 
  systems they need to come second.
 
  I do not know how to resolve...

 mike upton wrote:
 I have been debugging why patch rb2557 breaks AMD KVM functionality.

 I was hoping to get to code that would work on both Intel and AMD
 platforms, but am not there yet.

 This patch is to be applied on top of rb2557.patch.

 There are 2 main issues, neither of which I understand well enough to
 take much further.

 The first issue is that the order in which the segment descriptors get
 instantiated in the GDT seems to matter between AMD and Intel, and the
 two orders seem to be mutually incompatible.

 AMD wants:
 csSys
 dsSys
 ds
 cs

 Intel wants:
 ds
 cs
 dsSys
 csSys

 I am not sure the relative ordering of ds and cs within a class matters,
 only that AMD wants the Sys ones first, and Intel wants them second.

 There is also an issue with how 'star' gets defined.
 I cannot make the Intel code work for AMD.

 Both issues are addressed in this patch.

 The patch makes the AMD system work, but breaks Intel functionality.

 I am also not sure how to upload this into review board. Do I create a
 separate patch from TOT, or can I somehow attach this to rb2557?

 Hopefully Gabe or Alexandru can weigh in. I am happy to help, but I am at
 my 'Peter Principle limit' as far as my understanding goes.

 I think it would be really ugly to have a machine-type test to version
 the code...

You should grab a copy of the architecture manual. From there:


STAR: The STAR register has the following fields (unless otherwise noted, all
bits are read/write):
- SYSRET CS and SS Selectors: Bits 63:48. This field is used to specify both
  the CS and SS selectors loaded into CS and SS during SYSRET. If SYSRET is
  returning to 32-bit mode (either legacy or compatibility), this field is
  copied directly into the CS selector field. If SYSRET is returning to
  64-bit mode, the CS selector is set to this field + 16. SS.Sel is set to
  this field + 8, regardless of the target mode. Because SYSRET always
  returns to CPL 3, the RPL bits 49:48 should be initialized to 11b.
- SYSCALL CS and SS Selectors: Bits 47:32. This field is used to specify both
  the CS and SS selectors loaded into CS and SS during SYSCALL. This field is
  copied directly into CS.Sel. SS.Sel is set to this field + 8. Because
  SYSCALL always switches to CPL 0, the RPL bits 33:32 should be initialized
  to 00b.


That's why the order matters and is what it is.


- Gabe


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2557/#review5782
---


On Dec. 10, 2014, 10:11 a.m., Gabe Black wrote:

 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2557/
 

[gem5-dev] Uncachable memory requests in Ruby

2015-01-22 Thread Mohammad Alian via gem5-dev
Hello,
How can I force a request to be uncacheable when using the Ruby memory
system? req->setFlags(Request::UNCACHEABLE) works for the classic memory
system, but it has no effect on the request when using Ruby.
Thank you,
Mohammad


Re: [gem5-dev] Bug With Thread Suspend Instructions, Interrupts, x86 O3 CPU?

2015-01-22 Thread Joel Hestness via gem5-dev
Hey Gabe,
  Thanks for the suggestion. This work-around doesn't appear to work. In
the O3CPU, the instruction still does not get committed due to the fault
(DefaultCommitImpl::commitHead(suspend instruction) generates a trap
and returns that the instruction cannot be committed). After thread
reactivation, the instruction is executed again, causing the thread to
suspend. The SimpleTiming CPU has a similar issue that it executes the
fault and suspends the thread, but while the thread is suspended, the core
appears to just continue trying to execute the suspend instruction.

  It seems like the right way to fix this may be to introduce a
ThreadContext state, say Activating, which the thread is put into when
activate() is called on it, and the thread is not allowed to enter the
Active state until the ROB has been cleared (i.e. any remaining
instructions from before the suspend are squashed and retired). Does this
sound reasonable?
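
A minimal sketch of the proposed state, to make the idea concrete. Everything here is hypothetical (ThreadModel, robEntries and the method names are illustrative and are not gem5's ThreadContext API); it only models the rule that a thread stays in Activating until the ROB has drained.

```cpp
#include <cassert>

// Hypothetical thread states; Activating is the proposed addition.
enum class ThreadState { Suspended, Activating, Active };

struct ThreadModel {
    ThreadState state = ThreadState::Active;
    unsigned robEntries = 0;  // in-flight instructions from before the suspend

    void suspend() { state = ThreadState::Suspended; }

    // activate() only requests activation; the thread becomes Active
    // once no pre-suspend instructions remain in the ROB.
    void activate()
    {
        if (state == ThreadState::Suspended)
            state = ThreadState::Activating;
        tryFinishActivation();
    }

    // Called each time a squashed instruction leaves the ROB.
    void retireSquashed()
    {
        if (robEntries > 0)
            --robEntries;
        tryFinishActivation();
    }

    // Fetch is allowed only in the Active state.
    bool canFetch() const { return state == ThreadState::Active; }

  private:
    void tryFinishActivation()
    {
        if (state == ThreadState::Activating && robEntries == 0)
            state = ThreadState::Active;
    }
};
```

In this model a thread that is reactivated with two stale ROB entries cannot fetch until both have been retired, which is the ordering the paragraph above asks for.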

  Thanks,
  Joel


On Tue, Jan 20, 2015 at 12:32 PM, Gabe Black via gem5-dev gem5-dev@gem5.org
 wrote:

 It sounds like a bug/race condition in the O3 CPU, which I think you
 already knew. You could try moving the suspend call into a fault returned
 by the MicroHalt microop instead of the instruction itself. That might
 break the race, although it's not really fixing the issue with O3.

 Gabe

 On Tue, Jan 20, 2015 at 8:43 AM, Joel Hestness via gem5-dev 
 gem5-dev@gem5.org wrote:

  Hi guys,
    I'm running into a very tricky problem with halt/suspend x86 instructions
  with the O3 CPU. This might be a question for Nilay, Gabe B. or Mitch H.,
  and I'm really hoping for input given the complexity of this one.

    The specific problem is that when calling suspend from the execute stage
  of an instruction (e.g. a pseudoinstruction or the x86 MicroHalt microop),
  the CPU context gets suspended, but after reactivating the context later,
  the instruction gets squashed and replayed, potentially causing the context
  to get suspended again immediately. The pseudoinstruction that I'm using
  doesn't do anything except call the thread context suspend, and the
  functionality is nearly identical to that of the MicroHalt op (I've now
  tried swapping in the MicroHalt and run into the same problem, so I suspect
  this may also affect the MWAIT implementation). The instruction that
  suspends the context moves to the commit buffer in the core, but cannot be
  committed before the thread is suspended. When the thread is restarted, the
  commit stage squashes all instructions, retiring the suspend instruction,
  and fetch starts back at the PC of the suspend instruction. In cases that
  appear to execute correctly, the pipeline re-fetches the suspend
  instruction, but it gets squashed from the commit stage and removed from
  the instruction list. In apparently broken cases, the instruction does not
  get squashed, so the thread goes back to sleep. Interrupts can jar the CPU
  out of the incorrect suspend loop, but sometimes it takes 3-6 interrupts
  (i.e. up to 10s of milliseconds).

    Some details: I'm currently using gem5 revision 10237:b2850bdcec07 and
  the bug occurs in long-running sims in FS mode (single-threaded cores - no
  SMT). I've also pulled some more recent changesets and applied them to my
  repo, since they address O3 CPU issues:
  10239 http://repo.gem5.org/gem5/rev/592f0bb6bd6f,
  10327 http://repo.gem5.org/gem5/rev/5b6279635c49,
  10328 http://repo.gem5.org/gem5/rev/867b536a68be,
  10329 http://repo.gem5.org/gem5/rev/12e3be8203a5,
  10331 http://repo.gem5.org/gem5/rev/ed05298e8566,
  10332 http://repo.gem5.org/gem5/rev/1ba825974ee6,
  10340 http://repo.gem5.org/gem5/rev/40d24a672351.
  I'm unable to reproduce the bug in SE mode, and I suspect that sporadic
  interrupt handling in O3 may be part of the problem, since the examples
  that I can generate show CPU interrupts raised in close proximity to the
  thread suspend and activate activity.

    I've attached annotated O3 execution traces for seemingly correct and
  incorrect instances. Here are some specific questions I'm hoping for help
  with:

    1) Are there any other known changes in the mainline repo that might fix
  this?

    2) If not (1), I'm not clear on the purpose of retiring the suspend
  instruction (rather than committing) after reactivating the thread context.
  I can understand that the full pipeline squash would be a standard
  procedure after reactivating a thread. However, the suspend instruction
  finishes execution in the same cycle that thread suspend starts, so it
  should be free to commit. Since the suspend instruction doesn't commit, it
  is pointed to as the next PC after thread reactivation, which allows it to
  be reexecuted incorrectly. Is this retirement process the intended behavior
  for thread suspend instructions like these? It seems like this might
  be a corner case where the suspend instruction should get committed rather
  than squashed/retired, 

[gem5-dev] get vfp/simd register value in ARMv8 simulation

2015-01-22 Thread Meng Wang via gem5-dev
Hello,
I'm trying to get the vfp/simd register values during simulation. I
found a clue in src/cpu/simple_thread.hh, which defines an array
`FloatReg f[TheISA::NumFloatRegs];`. I also checked the value of
ArmISA::NumFloatRegs: it is 160.

Can I get the V0-V31 register values by reading the FloatReg array? Also,
according to the manual, there should be 32 128-bit vfp/simd
registers. How do they map to the 160 elements of the FloatReg array?


Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.

2015-01-22 Thread mike upton via gem5-dev
OK,

I believe I have a patch that unifies the code for both AMD and Intel.

Do I post it as a separate review-board item?

On Thu, Jan 22, 2015 at 11:32 AM, mike upton michaelup...@gmail.com wrote:

This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2557/

 On January 21st, 2015, 9:22 p.m. UTC, *mike upton* wrote:

   src/arch/x86/process.cc
 http://reviews.gem5.org/r/2557/diff/2/?file=42948#file42948line218 (Diff
 revision 2)

 X86_64LiveProcess::initState()

   217  SegDescriptor csLowPLDesc = initDesc;
   218  csLowPLDesc.type.codeOrData = 1;
   219  csLowPLDesc.dpl = 0;
   220  uint64_t csLowPLDescVal = csLowPLDesc;
   221  physProxy.writeBlob(GDTPhysAddr + numGDTEntries * 8,
   222                      (uint8_t *)(&csLowPLDescVal), 8);
   223
   224  numGDTEntries++;
   225
   226  SegSelector csLowPL = 0;
   227  csLowPL.si = numGDTEntries - 1;
   228  csLowPL.rpl = 0;
   229
   230  //64 bit data segment
   231  SegDescriptor dsLowPLDesc = initDesc;
   232  dsLowPLDesc.type.codeOrData = 0;
   233  dsLowPLDesc.dpl = 0;
   234  uint64_t dsLowPLDescVal = dsLowPLDesc;
   235  physProxy.writeBlob(GDTPhysAddr + numGDTEntries * 8,
   236                      (uint8_t *)(&dsLowPLDescVal), 8);

For AMD systems, the sys descriptors need to come first. On intel systems 
 they need to come second.

 I do not know how to resolve...


  On January 21st, 2015, 11:15 p.m. UTC, *mike upton* wrote:

 AMD and Intel use different solutions, right?

 AMD: Syscall, sysret
 Intel: Sysenter, sysexit

 Do we need independent code streams for each?

 The original code worked for AMD, but not intel.
 The current 2557 patch works for Intel, but not AMD.

  On January 21st, 2015, 11:35 p.m. UTC, *Gabe Black* wrote:

 Yeah, there are some differences between the two. I think both support both 
 pairs of instructions, but I think one or the other only works in 32-bit mode 
 for one of the vendors, or something along those lines. At one point I 
 could have told you exactly what the difference was, but now I'd have to 
 check the manuals. My expectation/hope is that a single GDT layout would work 
 for both. I doubt the kernel, for instance, specializes its layout based on 
 whose CPU it's running on.

  OK, it seems like the feedback is 

Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.

2015-01-22 Thread Steve Reinhardt via gem5-dev
Glad you all seem to be figuring this out.  I think we should do runtime
rather than compile-time detection though, as some people may have mixed
clusters they want to run the same binary on, or may cross-compile.

Steve

On Wed, Jan 21, 2015 at 3:30 PM, Dutu, Alexandru via gem5-dev 
gem5-dev@gem5.org wrote:

 I am wondering if we could detect the CPU vendor with CPUID and have
 different setup of the GDT based on the platform you are running on. Could
 scons actually detect this at build time?


Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.

2015-01-22 Thread mike upton via gem5-dev


 mike upton wrote:
 AMD and Intel use different solutions, right?
 
 AMD: Syscall, sysret
 Intel: Sysenter, sysexit
 
 Do we need independent code streams for each?
 
 The original code worked for AMD, but not intel.
 The current 2557 patch works for Intel, but not AMD.
 
 Gabe Black wrote:
 Yeah, there are some differences between the two. I think both support 
 both pairs of instructions, but I think one or the other only works in 32-bit 
 mode for one of the vendors, or something along those lines. At one point 
 I could have told you exactly what the difference was, but now I'd have to 
 check the manuals. My expectation/hope is that a single GDT layout would work 
 for both. I doubt the kernel, for instance, specializes its layout based on 
 whose CPU it's running on.

OK, it seems like the feedback is that we do need to test at runtime which CPU
we are running on and use CPU-specific code.
I can certainly code this up.

Any pointers to what state is available in the simulator to test?
Or should I just add a cpuHostType() routine that will return Intel, AMD, or 
UNKNOWN?
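
One possible shape for such a routine, as a sketch only: it assumes a GCC or Clang toolchain (for `<cpuid.h>` and `__get_cpuid`), and the names HostCpuVendor, classifyVendor and hostVendorString are made up for illustration. Only cpuHostType() comes from the question above. On non-x86 hosts it reports Unknown.

```cpp
#include <cassert>
#include <cstring>
#include <string>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

enum class HostCpuVendor { Intel, AMD, Unknown };

// Map the 12-character CPUID vendor string to a vendor enum.
inline HostCpuVendor classifyVendor(const std::string &vendor)
{
    if (vendor == "GenuineIntel")
        return HostCpuVendor::Intel;
    if (vendor == "AuthenticAMD")
        return HostCpuVendor::AMD;
    return HostCpuVendor::Unknown;
}

// Read the vendor string from CPUID leaf 0 (register order EBX, EDX, ECX).
// Returns an empty string if CPUID is unavailable.
inline std::string hostVendorString()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return std::string();
    char buf[12];
    std::memcpy(buf + 0, &ebx, 4);
    std::memcpy(buf + 4, &edx, 4);
    std::memcpy(buf + 8, &ecx, 4);
    return std::string(buf, sizeof(buf));
#else
    return std::string();
#endif
}

// Runtime check, so one binary can run on mixed Intel/AMD clusters.
inline HostCpuVendor cpuHostType()
{
    return classifyVendor(hostVendorString());
}
```

Keeping the string classification separate from the CPUID read makes the decision logic testable on any host, including cross-compiled builds.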


- mike


---
This is an automatically generated e-mail. To reply, visit:
http://reviews.gem5.org/r/2557/#review5782
---


On Dec. 10, 2014, 10:11 a.m., Gabe Black wrote:
 
 ---
 This is an automatically generated e-mail. To reply, visit:
 http://reviews.gem5.org/r/2557/
 ---
 
 (Updated Dec. 10, 2014, 10:11 a.m.)
 
 
 Review request for Default.
 
 
 Repository: gem5
 
 
 Description
 ---
 
 

[gem5-dev] memory change during system call emulation

2015-01-22 Thread Meng Wang via gem5-dev
Hello,
I am writing a trace probe for AtomicSimpleCPU. The simulation is
planned to run in SE mode. For user-level instructions, I can get the
address and data of each memory access from the *traceData* member in
BaseSimpleCPU. For system calls, however, I don't know how to collect the
memory changes made during syscall emulation, especially when read/write
syscalls are emulated. Can anybody give me a clue?

Thanks,
Meng