[gem5-dev] Cron m5test@zizzer /z/m5/regression/do-regression quick
scons: *** [build/ALPHA/gem5.opt] Error 1

* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_hammer passed.
* build/ALPHA_MOESI_hammer/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_hammer passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MESI_Two_Level passed.
* build/ALPHA_MESI_Two_Level/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MESI_Two_Level passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_directory/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_directory passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/linux/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/00.hello/alpha/tru64/simple-timing-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/50.memtest/alpha/linux/memtest-ruby-MOESI_CMP_token passed.
* build/ALPHA_MOESI_CMP_token/tests/opt/quick/se/60.rubytest/alpha/linux/rubytest-ruby-MOESI_CMP_token passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/o3-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-timing-ruby passed.
* build/MIPS/tests/opt/quick/se/00.hello/mips/linux/simple-atomic passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest passed.
* build/NULL/tests/opt/quick/se/50.memtest/null/none/memtest-filter passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-dram-ctrl passed.
* build/NULL/tests/opt/quick/se/70.tgen/null/none/tgen-simple-mem passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/o3-timing passed.
* build/POWER/tests/opt/quick/se/00.hello/power/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing passed.
* build/SPARC/tests/opt/quick/se/00.hello/sparc/linux/simple-timing-ruby passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/o3-timing passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/simple-atomic passed.
* build/SPARC/tests/opt/quick/se/02.insttest/sparc/linux/simple-timing passed.
* build/SPARC/tests/opt/quick/se/40.m5threads-test-atomic/sparc/linux/o3-timing-mp passed.
* build/SPARC/tests/opt/quick/se/40.m5threads-test-atomic/sparc/linux/simple-atomic-mp passed.
* build/SPARC/tests/opt/quick/se/40.m5threads-test-atomic/sparc/linux/simple-timing-mp passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/o3-timing passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-atomic passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-timing passed.
* build/X86/tests/opt/quick/se/00.hello/x86/linux/simple-timing-ruby passed.
* build/X86/tests/opt/quick/fs/10.linux-boot/x86/linux/pc-simple-atomic passed.
* build/X86/tests/opt/quick/fs/10.linux-boot/x86/linux/pc-simple-timing passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/minor-timing passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/o3-timing passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/o3-timing-checker passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-atomic passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-atomic-dummychecker passed.
* build/ARM/tests/opt/quick/se/00.hello/arm/linux/simple-timing passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-atomic-dual passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-timing passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-simple-timing-dual passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview-switcheroo-atomic passed.
* build/ARM/tests/opt/quick/fs/10.linux-boot/arm/linux/realview64-simple-atomic
[gem5-dev] changeset in gem5: stats: Update stats to reflect x86 table walk...
changeset b7ff344c3061 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=b7ff344c3061
description:
stats: Update stats to reflect x86 table walker changes

diffstat:

tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt | 106 +-
tests/long/fs/10.linux-boot/ref/x86/linux/pc-switcheroo-full/stats.txt | 12 +-
2 files changed, 59 insertions(+), 59 deletions(-)

diffs (199 lines):

diff -r e49bf4884c59 -r b7ff344c3061 tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt
--- a/tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt Thu Jan 22 05:00:54 2015 -0500
+++ b/tests/long/fs/10.linux-boot/ref/x86/linux/pc-o3-timing/stats.txt Thu Jan 22 05:00:57 2015 -0500
@@ -4,11 +4,11 @@
 sim_ticks 5121937205500 # Number of ticks simulated
 final_tick 5121937205500 # Number of ticks from beginning of simulation (restored from checkpoints and never reset)
 sim_freq 1 # Frequency of simulated ticks
-host_inst_rate 133395 # Simulator instruction rate (inst/s)
-host_op_rate 263673 # Simulator op (including micro ops) rate (op/s)
-host_tick_rate 1674179733 # Simulator tick rate (ticks/s)
-host_mem_usage 798472 # Number of bytes of host memory used
-host_seconds 3059.37 # Real time elapsed on the host
+host_inst_rate 250170 # Simulator instruction rate (inst/s)
+host_op_rate 494496 # Simulator op (including micro ops) rate (op/s)
+host_tick_rate 3139783576 # Simulator tick rate (ticks/s)
+host_mem_usage 754660 # Number of bytes of host memory used
+host_seconds 1631.30 # Real time elapsed on the host
 sim_insts 408103625 # Number of instructions simulated
 sim_ops 806672783 # Number of ops (including micro ops) simulated
 system.voltage_domain.voltage 1 # Voltage in Volts
@@ -808,12 +808,12 @@
 system.cpu.dtb_walker_cache.demand_misses::total 76507 # number of demand (read+write) misses
 system.cpu.dtb_walker_cache.overall_misses::cpu.dtb.walker 76507 # number of overall misses
 system.cpu.dtb_walker_cache.overall_misses::total 76507 # number of overall misses
-system.cpu.dtb_walker_cache.ReadReq_miss_latency::cpu.dtb.walker 935770692 # number of ReadReq miss cycles
-system.cpu.dtb_walker_cache.ReadReq_miss_latency::total 935770692 # number of ReadReq miss cycles
-system.cpu.dtb_walker_cache.demand_miss_latency::cpu.dtb.walker 935770692 # number of demand (read+write) miss cycles
-system.cpu.dtb_walker_cache.demand_miss_latency::total 935770692 # number of demand (read+write) miss cycles
-system.cpu.dtb_walker_cache.overall_miss_latency::cpu.dtb.walker 935770692 # number of overall miss cycles
-system.cpu.dtb_walker_cache.overall_miss_latency::total 935770692 # number of overall miss cycles
+system.cpu.dtb_walker_cache.ReadReq_miss_latency::cpu.dtb.walker 935770691 # number of ReadReq miss cycles
+system.cpu.dtb_walker_cache.ReadReq_miss_latency::total 935770691 # number of ReadReq miss cycles
+system.cpu.dtb_walker_cache.demand_miss_latency::cpu.dtb.walker 935770691 # number of demand (read+write) miss cycles
+system.cpu.dtb_walker_cache.demand_miss_latency::total 935770691 # number of demand (read+write) miss cycles
+system.cpu.dtb_walker_cache.overall_miss_latency::cpu.dtb.walker 935770691 # number of overall miss cycles
+system.cpu.dtb_walker_cache.overall_miss_latency::total 935770691 # number of overall miss cycles
 system.cpu.dtb_walker_cache.ReadReq_accesses::cpu.dtb.walker 190525 # number of ReadReq accesses(hits+misses)
 system.cpu.dtb_walker_cache.ReadReq_accesses::total 190525 # number of ReadReq accesses(hits+misses)
 system.cpu.dtb_walker_cache.demand_accesses::cpu.dtb.walker 190525
[gem5-dev] changeset in gem5: mem: Always use SenderState for response rout...
changeset 8bb4a9717eaa in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=8bb4a9717eaa
description:
mem: Always use SenderState for response routing in RubyPort

This patch aligns how the response routing is done in the RubyPort, using the SenderState for both memory and I/O accesses. Before this patch, only the I/O used the SenderState, whereas the memory accesses relied on the src field in the packet. With this patch we shift to using SenderState in both cases, thus not relying on the src field any longer.

diffstat:

src/mem/ruby/system/RubyPort.cc | 28
src/mem/ruby/system/Sequencer.cc | 2 ++
2 files changed, 18 insertions(+), 12 deletions(-)

diffs (79 lines):

diff -r bd376adfb7d4 -r 8bb4a9717eaa src/mem/ruby/system/RubyPort.cc
--- a/src/mem/ruby/system/RubyPort.cc Thu Jan 22 05:01:14 2015 -0500
+++ b/src/mem/ruby/system/RubyPort.cc Thu Jan 22 05:01:24 2015 -0500
@@ -180,11 +180,6 @@
 // got a response from a device
 assert(pkt->isResponse());
-// In FS mode, ruby memory will receive pio responses from devices
-// and it must forward these responses back to the particular CPU.
-DPRINTF(RubyPort, "Pio response for address %#x, going to %d\n",
-pkt->getAddr(), pkt->getDest());
-
 // First we must retrieve the request port from the sender State
 RubyPort::SenderState *senderState =
 safe_cast<RubyPort::SenderState *>(pkt->popSenderState());
@@ -192,6 +187,11 @@
 assert(port != NULL);
 delete senderState;
+// In FS mode, ruby memory will receive pio responses from devices
+// and it must forward these responses back to the particular CPU.
+DPRINTF(RubyPort, "Pio response for address %#x, going to %s\n",
+pkt->getAddr(), port->name());
+
 // attempt to send the response in the next cycle
 port->schedTimingResp(pkt, curTick() + g_system_ptr->clockPeriod());
@@ -246,9 +246,6 @@
 return true;
 }
-// Save the port id to be used later to route the response
-pkt->setSrc(id);
-
 assert(Address(pkt->getAddr()).getOffset() + pkt->getSize() <=
 RubySystem::getBlockSizeBytes());
@@ -259,6 +256,10 @@
 // Otherwise, we need to tell the port to retry at a later point
 // and return false.
 if (requestStatus == RequestStatus_Issued) {
+// Save the port in the sender state object to be used later to
+// route the response
+pkt->pushSenderState(new SenderState(this));
+
 DPRINTF(RubyPort, "Request %s 0x%x issued\n", pkt->cmdString(),
 pkt->getAddr());
 return true;
@@ -343,11 +344,14 @@
 assert(system->isMemAddr(pkt->getAddr()));
 assert(pkt->isRequest());
-// As it has not yet been turned around, the source field tells us
-// which port it came from.
-assert(pkt->getSrc() < slave_ports.size());
+// First we must retrieve the request port from the sender State
+RubyPort::SenderState *senderState =
+safe_cast<RubyPort::SenderState *>(pkt->popSenderState());
+MemSlavePort *port = senderState->port;
+assert(port != NULL);
+delete senderState;
-slave_ports[pkt->getSrc()]->hitCallback(pkt);
+port->hitCallback(pkt);
 //
 // If we had to stall the MemSlavePorts, wake them up because the sequencer
diff -r bd376adfb7d4 -r 8bb4a9717eaa src/mem/ruby/system/Sequencer.cc
--- a/src/mem/ruby/system/Sequencer.cc Thu Jan 22 05:01:14 2015 -0500
+++ b/src/mem/ruby/system/Sequencer.cc Thu Jan 22 05:01:24 2015 -0500
@@ -547,6 +547,8 @@
 // subBlock with the recieved data. The tester will later access
 // this state.
 if (m_usingRubyTester) {
+DPRINTF(RubySequencer, "hitCallback %s 0x%x using RubyTester\n",
+pkt->cmdString(), pkt->getAddr());
 RubyTester::SenderState* testerSenderState =
 pkt->findNextSenderState<RubyTester::SenderState>();
 assert(testerSenderState);
___
gem5-dev mailing list
gem5-dev@gem5.org
http://m5sim.org/mailman/listinfo/gem5-dev
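The push-on-request, pop-on-response pattern this changeset describes can be sketched in isolation. The following is a minimal toy model, assuming simplified stand-in types: `Packet`, `Port`, `SenderState`, `PortSenderState`, `issueRequest` and `routeResponse` here are hypothetical illustrations, not the gem5 classes or functions of similar names.

```cpp
#include <cassert>
#include <string>

// Toy stand-ins for illustration only; these are NOT the gem5 classes.
struct SenderState {
    SenderState *predecessor = nullptr;
    virtual ~SenderState() = default;
};

struct Port {
    std::string name;
};

// State attached by a RubyPort-like component to remember the originating port.
struct PortSenderState : SenderState {
    Port *port;
    explicit PortSenderState(Port *p) : port(p) {}
};

struct Packet {
    SenderState *senderState = nullptr;

    // Push a new state on top of the per-packet stack.
    void pushSenderState(SenderState *s) {
        s->predecessor = senderState;
        senderState = s;
    }

    // Pop the top state; the caller takes ownership of it.
    SenderState *popSenderState() {
        assert(senderState != nullptr);
        SenderState *top = senderState;
        senderState = top->predecessor;
        top->predecessor = nullptr;
        return top;
    }
};

// Request path: remember which port the packet arrived on.
void issueRequest(Packet &pkt, Port *from) {
    pkt.pushSenderState(new PortSenderState(from));
}

// Response path: recover the port from the sender state, then free the state.
Port *routeResponse(Packet &pkt) {
    SenderState *top = pkt.popSenderState();
    PortSenderState *state = dynamic_cast<PortSenderState *>(top);
    assert(state != nullptr);
    Port *port = state->port;
    delete state;
    return port;
}
```

Because the routing information travels with the packet, the component needs no packet-level source field and no lookup table of its own; the cost is one small allocation per issued request.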
[gem5-dev] changeset in gem5: mem: Make the XBar responsible for tracking r...
changeset bd376adfb7d4 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=bd376adfb7d4
description:
mem: Make the XBar responsible for tracking response routing

This patch removes the need for a source and destination field in the packet by shifting the onus of the tracking to the crossbar, much like a real implementation. This change in behaviour also means we no longer need a SenderState to remember the source/dest whenever we have multiple crossbars in the system. Thus, the stack that was created by the SenderState is not needed, and each crossbar locally tracks the response routing.

The fields in the packet are still left behind as the RubyPort (which also acts as a crossbar) does routing based on them. In the succeeding patches the uses of the src and dest field will be removed. Combined, these patches improve the simulation performance by roughly 2%.

diffstat:

src/mem/coherent_xbar.cc | 140 ---
src/mem/coherent_xbar.hh | 9 +-
src/mem/noncoherent_xbar.cc | 23 +-
src/mem/xbar.hh | 9 ++
4 files changed, 108 insertions(+), 73 deletions(-)

diffs (truncated from 360 to 300 lines):

diff -r b7ff344c3061 -r bd376adfb7d4 src/mem/coherent_xbar.cc
--- a/src/mem/coherent_xbar.cc Thu Jan 22 05:00:57 2015 -0500
+++ b/src/mem/coherent_xbar.cc Thu Jan 22 05:01:14 2015 -0500
@@ -142,6 +142,10 @@
 // remember if the packet is an express snoop
 bool is_express_snoop = pkt->isExpressSnoop();
+bool is_inhibited = pkt->memInhibitAsserted();
+// for normal requests, going downstream, the express snoop flag
+// and the inhibited flag should always be the same
+assert(is_express_snoop == is_inhibited);
 // determine the destination based on the address
 PortID master_port_id = findPort(pkt->getAddr());
@@ -163,9 +167,6 @@
 unsigned int pkt_size = pkt->hasData() ? pkt->getSize() : 0;
 unsigned int pkt_cmd = pkt->cmdToIndex();
-// set the source port for routing of the response
-pkt->setSrc(slave_port_id);
-
 calcPacketTiming(pkt);
 Tick packetFinishTime = pkt->lastWordDelay + curTick();
@@ -187,21 +188,10 @@
 }
 }
-// remember if we add an outstanding req so we can undo it if
-// necessary, if the packet needs a response, we should add it
-// as outstanding and express snoops never fail so there is
-// not need to worry about them
-bool add_outstanding = !is_express_snoop && pkt->needsResponse();
-
-// keep track that we have an outstanding request packet
-// matching this request, this is used by the coherency
-// mechanism in determining what to do with snoop responses
-// (in recvTimingSnoop)
-if (add_outstanding) {
-// we should never have an exsiting request outstanding
-assert(outstandingReq.find(pkt->req) == outstandingReq.end());
-outstandingReq.insert(pkt->req);
-}
+// remember if the packet will generate a snoop response
+const bool expect_snoop_resp = !is_inhibited && pkt->memInhibitAsserted();
+const bool expect_response = pkt->needsResponse() &&
+!pkt->memInhibitAsserted();
 // Note: Cannot create a copy of the full packet, here.
 MemCmd orig_cmd(pkt->cmd);
@@ -224,41 +214,58 @@
 pkt->cmd = tmp_cmd;
 }
-// if this is an express snoop, we are done at this point
-if (is_express_snoop) {
-assert(success);
-snoops++;
+// check if we were successful in sending the packet onwards
+if (!success) {
+// express snoops and inhibited packets should never be forced
+// to retry
+assert(!is_express_snoop);
+assert(!pkt->memInhibitAsserted());
+
+// undo the calculation so we can check for 0 again
+pkt->firstWordDelay = pkt->lastWordDelay = 0;
+
+DPRINTF(CoherentXBar, "recvTimingReq: src %s %s 0x%x RETRY\n",
+src_port->name(), pkt->cmdString(), pkt->getAddr());
+
+// update the layer state and schedule an idle event
+reqLayers[master_port_id]->failedTiming(src_port,
+clockEdge(headerCycles));
 } else {
-// for normal requests, check if successful
-if (!success) {
-// inhibited packets should never be forced to retry
-assert(!pkt->memInhibitAsserted());
+// express snoops currently bypass the crossbar state entirely
+if (!is_express_snoop) {
+// if this particular request will generate a snoop
+// response
+if (expect_snoop_resp) {
+// we should never have an exsiting request outstanding
+assert(outstandingSnoop.find(pkt->req) ==
+ outstandingSnoop.end());
+outstandingSnoop.insert(pkt->req);
-// if it was added as outstanding and the
[gem5-dev] changeset in gem5: mem: Clean up Request initialisation
changeset e3fc6bc7f97e in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=e3fc6bc7f97e
description:
mem: Clean up Request initialisation

This patch tidies up how we create and set the fields of a Request. In essence it tries to use the constructor where possible (as opposed to setPhys and setVirt), thus avoiding spreading the information across a number of locations. In fact, setPhys is made private as part of this patch, and a number of places where we called setVirt instead use the appropriate constructor.

diffstat:

src/arch/arm/isa.cc | 12 +--
src/cpu/checker/cpu.cc | 8 +-
src/cpu/kvm/base.cc | 5 +-
src/cpu/kvm/base.hh | 3 -
src/cpu/kvm/x86_cpu.cc | 5 +-
src/cpu/kvm/x86_cpu.hh | 3 -
src/cpu/simple/timing.cc | 16 ++---
src/cpu/simple/timing.hh | 2 +-
src/cpu/testers/memtest/memtest.cc | 6 +-
src/cpu/testers/networktest/networktest.cc | 12 ++--
src/mem/port_proxy.cc | 8 +--
src/mem/request.hh | 82 ++---
src/mem/ruby/system/CacheRecorder.cc | 8 +-
13 files changed, 76 insertions(+), 94 deletions(-)

diffs (truncated from 469 to 300 lines):

diff -r e5936c2d53a0 -r e3fc6bc7f97e src/arch/arm/isa.cc
--- a/src/arch/arm/isa.cc Tue Jan 20 14:15:28 2015 -0600
+++ b/src/arch/arm/isa.cc Thu Jan 22 05:00:53 2015 -0500
@@ -1490,7 +1490,6 @@
 case MISCREG_ATS1HR:
 case MISCREG_ATS1HW:
 {
-RequestPtr req = new Request;
 unsigned flags = 0;
 BaseTLB::Mode mode = BaseTLB::Read;
 TLB::ArmTranslationType tranType = TLB::NormalTran;
@@ -1562,16 +1561,16 @@
 // can't be an atomic translation because that causes problems
 // with unexpected atomic snoop requests.
 warn("Translating via MISCREG(%d) in functional mode! Fix Me!\n", misc_reg);
-req->setVirt(0, val, 1, flags, Request::funcMasterId,
-tc->pcState().pc());
-req->setThreadContext(tc->contextId(), tc->threadId());
-fault = tc->getDTBPtr()->translateFunctional(req, tc, mode, tranType);
+Request req(0, val, 1, flags, Request::funcMasterId,
+tc->pcState().pc(), tc->contextId(),
+tc->threadId());
+fault = tc->getDTBPtr()->translateFunctional(&req, tc, mode, tranType);
 TTBCR ttbcr = readMiscRegNoEffect(MISCREG_TTBCR);
 HCR hcr = readMiscRegNoEffect(MISCREG_HCR);
 MiscReg newVal;
 if (fault == NoFault) {
-Addr paddr = req->getPaddr();
+Addr paddr = req.getPaddr();
 if (haveLPAE && (ttbcr.eae || tranType & TLB::HypMode ||
 ((tranType & TLB::S1S2NsTran) && hcr.vm) )) {
 newVal = (paddr & mask(39, 12)) |
@@ -1605,7 +1604,6 @@
 "MISCREG: Translated addr 0x%08x fault fsr %#x: PAR: 0x%08x\n",
 val, fsr, newVal);
 }
-delete req;
 setMiscRegNoEffect(MISCREG_PAR, newVal);
 return;
 }
diff -r e5936c2d53a0 -r e3fc6bc7f97e src/cpu/checker/cpu.cc
--- a/src/cpu/checker/cpu.cc Tue Jan 20 14:15:28 2015 -0600
+++ b/src/cpu/checker/cpu.cc Thu Jan 22 05:00:53 2015 -0500
@@ -154,8 +154,8 @@
 // Need to account for multiple accesses like the Atomic and TimingSimple
 while (1) {
-memReq = new Request();
-memReq->setVirt(0, addr, size, flags, masterId, thread->pcState().instAddr());
+memReq = new Request(0, addr, size, flags, masterId,
+ thread->pcState().instAddr(), tc->contextId(), 0);
 // translate to physical address
 fault = dtb->translateFunctional(memReq, tc, BaseTLB::Read);
@@ -242,8 +242,8 @@
 // Need to account for a multiple access like Atomic and Timing CPUs
 while (1) {
-memReq = new Request();
-memReq->setVirt(0, addr, size, flags, masterId, thread->pcState().instAddr());
+memReq = new Request(0, addr, size, flags, masterId,
+ thread->pcState().instAddr(), tc->contextId(), 0);
 // translate to physical address
 fault = dtb->translateFunctional(memReq, tc, BaseTLB::Write);
diff -r e5936c2d53a0 -r e3fc6bc7f97e src/cpu/kvm/base.cc
--- a/src/cpu/kvm/base.cc Tue Jan 20 14:15:28 2015 -0600
+++ b/src/cpu/kvm/base.cc Thu Jan 22 05:00:53 2015 -0500
@@ -118,8 +118,6 @@
 // initialize CPU, including PC
 if (FullSystem && !switchedOut())
 TheISA::initCPU(tc, tc->contextId());
-
-
[gem5-dev] changeset in gem5: x86: Delay X86 table walk on receiving walker...
changeset e49bf4884c59 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=e49bf4884c59
description:
x86: Delay X86 table walk on receiving walker response

This patch fixes a minor issue in the X86 page table walker where it ended up sending new request packets to the crossbar before the response processing was finished (recvTimingResp is directly calling sendTimingReq). Under certain conditions this caused the crossbar to see illegal combinations of request/response overlap, in turn causing problems with a slightly modified crossbar implementation.

diffstat:

src/arch/x86/pagetable_walker.cc | 6 --
src/arch/x86/pagetable_walker.hh | 8 +++-
2 files changed, 11 insertions(+), 3 deletions(-)

diffs (41 lines):

diff -r e3fc6bc7f97e -r e49bf4884c59 src/arch/x86/pagetable_walker.cc
--- a/src/arch/x86/pagetable_walker.cc Thu Jan 22 05:00:53 2015 -0500
+++ b/src/arch/x86/pagetable_walker.cc Thu Jan 22 05:00:54 2015 -0500
@@ -124,8 +124,10 @@
 delete senderWalk;
 // Since we block requests when another is outstanding, we
 // need to check if there is a waiting request to be serviced
-if (currStates.size())
-startWalkWrapper();
+if (currStates.size() && !startWalkWrapperEvent.scheduled())
+// delay sending any new requests until we are finished
+// with the responses
+schedule(startWalkWrapperEvent, clockEdge());
 }
 return true;
 }
diff -r e3fc6bc7f97e -r e49bf4884c59 src/arch/x86/pagetable_walker.hh
--- a/src/arch/x86/pagetable_walker.hh Thu Jan 22 05:00:53 2015 -0500
+++ b/src/arch/x86/pagetable_walker.hh Thu Jan 22 05:00:54 2015 -0500
@@ -183,6 +183,11 @@
 // Wrapper for checking for squashes before starting a translation.
 void startWalkWrapper();
+/**
+ * Event used to call startWalkWrapper.
+ **/
+EventWrapper<Walker, &Walker::startWalkWrapper> startWalkWrapperEvent;
+
 // Functions for dealing with packets.
 bool recvTimingResp(PacketPtr pkt);
 void recvRetry();
@@ -207,7 +212,8 @@
 MemObject(params), port(name() + ".port", this),
 funcState(this, NULL, NULL, true), tlb(NULL), sys(params->system),
 masterId(sys->getMasterId(name())),
-numSquashable(params->num_squash_per_cycle)
+numSquashable(params->num_squash_per_cycle),
+startWalkWrapperEvent(this)
 {
 }
 };
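The fix described in this changeset is an instance of a general pattern: instead of starting new work from inside a response handler, schedule an event and guard against scheduling it twice. A minimal sketch follows, assuming a toy single-threaded queue; `ToyEventQueue` and `ToyWalker` are hypothetical names and do not model gem5's `EventQueue`, `EventWrapper`, or clock edges.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Minimal single-threaded event queue; a toy model, not gem5's EventQueue.
class ToyEventQueue {
  public:
    void schedule(std::function<void()> fn) { pending.push(std::move(fn)); }

    // Run every pending event in FIFO order.
    void runAll() {
        while (!pending.empty()) {
            std::function<void()> fn = std::move(pending.front());
            pending.pop();
            fn();
        }
    }

  private:
    std::queue<std::function<void()>> pending;
};

class ToyWalker {
  public:
    explicit ToyWalker(ToyEventQueue &q) : eq(q) {}

    // Queue a walk that has to wait for the outstanding one to finish.
    void enqueueWalk(int id) { waiting.push_back(id); }

    // Response handler: it never starts a walk directly, it only schedules one.
    void recvResponse() {
        inResponse = true;
        if (!waiting.empty() && !startScheduled) {
            startScheduled = true;  // guard against double scheduling
            eq.schedule([this] { startWalk(); });
        }
        inResponse = false;
    }

    std::vector<int> started;   // walks started so far, in order
    bool overlapSeen = false;   // set if a walk starts during response handling

  private:
    void startWalk() {
        startScheduled = false;
        if (inResponse)
            overlapSeen = true;
        if (waiting.empty())
            return;
        started.push_back(waiting.front());
        waiting.erase(waiting.begin());
    }

    ToyEventQueue &eq;
    std::vector<int> waiting;
    bool startScheduled = false;
    bool inResponse = false;
};
```

With this structure the new request is only issued once the response handler has returned, which is the ordering the crossbar in the description relies on.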
[gem5-dev] changeset in gem5: mem: Remove unused Packet src and dest fields
changeset 87f7b5a07584 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=87f7b5a07584
description:
mem: Remove unused Packet src and dest fields

This patch takes the final step in removing the src and dest fields in the packet. These fields were rather confusing in that they only remember a single multiplexing component, and pushed the responsibility to the bridge and caches to store the fields in a senderstate, thus effectively creating a stack. With the recent changes to the crossbar response routing the crossbar is now responsible without relying on the packet fields. Thus, these variables are now unused and can be removed.

diffstat:

src/arch/x86/pagetable_walker.cc | 1 -
src/mem/packet.hh | 49 +--
2 files changed, 2 insertions(+), 48 deletions(-)

diffs (112 lines):

diff -r 3a3bb559b112 -r 87f7b5a07584 src/arch/x86/pagetable_walker.cc
--- a/src/arch/x86/pagetable_walker.cc Thu Jan 22 05:01:30 2015 -0500
+++ b/src/arch/x86/pagetable_walker.cc Thu Jan 22 05:01:31 2015 -0500
@@ -523,7 +523,6 @@
 write = oldRead;
 write->set<uint64_t>(pte);
 write->cmd = MemCmd::WriteReq;
-write->clearDest();
 } else {
 write = NULL;
 delete oldRead->req;
diff -r 3a3bb559b112 -r 87f7b5a07584 src/mem/packet.hh
--- a/src/mem/packet.hh Thu Jan 22 05:01:30 2015 -0500
+++ b/src/mem/packet.hh Thu Jan 22 05:01:31 2015 -0500
@@ -296,30 +296,6 @@
 unsigned size;
 /**
- * Source port identifier set on a request packet to enable
- * appropriate routing of the responses. The source port
- * identifier is set by any multiplexing component, e.g. a
- * crossbar, as the timing responses need this information to be
- * routed back to the appropriate port at a later point in
- * time. The field can be updated (over-written) as the request
- * packet passes through additional multiplexing components, and
- * it is their responsibility to remember the original source port
- * identifier, for example by using an appropriate sender
- * state. The latter is done in the cache and bridge.
- */
-PortID src;
-
-/**
- * Destination port identifier that is present on all response
- * packets that passed through a multiplexing component as a
- * request packet. The source port identifier is turned into a
- * destination port identifier when the packet is turned into a
- * response, and the destination is used, e.g. by the crossbar, to
- * select the appropriate path through the interconnect.
- */
-PortID dest;
-
-/**
 * The original value of the command field. Only valid when the
 * current command field is an error condition; in that case, the
 * previous contents of the command field are copied here. This
@@ -547,18 +523,6 @@
 bool hadBadAddress() const { return cmd == MemCmd::BadAddressError; }
 void copyError(Packet *pkt) { assert(pkt->isError()); cmd = pkt->cmd; }
-/// Accessor function to get the source index of the packet.
-PortID getSrc() const { return src; }
-/// Accessor function to set the source index of the packet.
-void setSrc(PortID _src) { src = _src; }
-
-/// Accessor function for the destination index of the packet.
-PortID getDest() const { return dest; }
-/// Accessor function to set the destination index of the packet.
-void setDest(PortID _dest) { dest = _dest; }
-/// Reset destination field, e.g. to turn a response into a request again.
-void clearDest() { dest = InvalidPortID; }
-
 Addr getAddr() const { assert(flags.isSet(VALID_ADDR)); return addr; }
 /**
 * Update the address of this packet mid-transaction. This is used
@@ -609,8 +573,7 @@
 */
 Packet(const RequestPtr _req, MemCmd _cmd)
 : cmd(_cmd), req(_req), data(nullptr), addr(0), _isSecure(false),
- size(0), src(InvalidPortID), dest(InvalidPortID),
- bytesValidStart(0), bytesValidEnd(0),
+ size(0), bytesValidStart(0), bytesValidEnd(0),
 firstWordDelay(0), lastWordDelay(0),
 senderState(NULL)
 {
@@ -632,7 +595,6 @@
 */
 Packet(const RequestPtr _req, MemCmd _cmd, int _blkSize)
 : cmd(_cmd), req(_req), data(nullptr), addr(0), _isSecure(false),
- src(InvalidPortID), dest(InvalidPortID),
 bytesValidStart(0), bytesValidEnd(0),
 firstWordDelay(0), lastWordDelay(0),
 senderState(NULL)
@@ -657,7 +619,6 @@
 : cmd(pkt->cmd), req(pkt->req),
 data(nullptr),
 addr(pkt->addr), _isSecure(pkt->_isSecure), size(pkt->size),
- src(pkt->src), dest(pkt->dest),
 bytesValidStart(pkt->bytesValidStart),
 bytesValidEnd(pkt->bytesValidEnd),
 firstWordDelay(pkt->firstWordDelay),
@@ -743,10 +704,7 @@
[gem5-dev] changeset in gem5: sim: fix reference counting of PythonEvent
changeset a0dab21e422f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=a0dab21e422f
description:
sim: fix reference counting of PythonEvent

When gem5 is a slave to another simulator and the Python is only used to initialize the configuration (and not perform actual simulation), a debug start (--debug-start) event will get freed during or immediately after the initial Python frame's execution rather than remaining in the event queue. This tricky patch fixes the GC issue causing this.

diffstat:

src/python/swig/event.i | 4
src/python/swig/pyevent.cc | 8
src/python/swig/pyevent.hh | 5 +++--
3 files changed, 11 insertions(+), 6 deletions(-)

diffs (56 lines):

diff -r 87f7b5a07584 -r a0dab21e422f src/python/swig/event.i
--- a/src/python/swig/event.i Thu Jan 22 05:01:31 2015 -0500
+++ b/src/python/swig/event.i Tue Dec 23 11:51:40 2014 -0600
@@ -71,6 +71,10 @@
 }
 }
+%typemap(out) PythonEvent* {
+ result->object = $result = SWIG_NewPointerObj(SWIG_as_voidptr(result), SWIGTYPE_p_PythonEvent, SWIG_POINTER_NEW);
+}
+
 %ignore EventQueue::schedule;
 %ignore EventQueue::deschedule;
diff -r 87f7b5a07584 -r a0dab21e422f src/python/swig/pyevent.cc
--- a/src/python/swig/pyevent.cc Thu Jan 22 05:01:31 2015 -0500
+++ b/src/python/swig/pyevent.cc Tue Dec 23 11:51:40 2014 -0600
@@ -34,10 +34,10 @@
 #include "sim/async.hh"
 #include "sim/eventq.hh"
-PythonEvent::PythonEvent(PyObject *obj, Priority priority)
-: Event(priority), object(obj)
+PythonEvent::PythonEvent(PyObject *code, Priority priority)
+: Event(priority), eventCode(code)
 {
-if (object == NULL)
+if (code == NULL)
 panic("Passed in invalid object");
 }
@@ -49,7 +49,7 @@
 PythonEvent::process()
 {
 PyObject *args = PyTuple_New(0);
-PyObject *result = PyObject_Call(object, args, NULL);
+PyObject *result = PyObject_Call(eventCode, args, NULL);
 Py_DECREF(args);
 if (result) {
diff -r 87f7b5a07584 -r a0dab21e422f src/python/swig/pyevent.hh
--- a/src/python/swig/pyevent.hh Thu Jan 22 05:01:31 2015 -0500
+++ b/src/python/swig/pyevent.hh Tue Dec 23 11:51:40 2014 -0600
@@ -37,9 +37,10 @@
 class PythonEvent : public Event
 {
 private:
-PyObject *object;
+PyObject *eventCode; // PyObject to call to perform event
+ public:
+PyObject *object;// PyObject wrapping this PythonEvent
- public:
 PythonEvent(PyObject *obj, Event::Priority priority);
 ~PythonEvent();
[gem5-dev] changeset in gem5: mem: Remove Packet source from ForwardRespons...
changeset 3a3bb559b112 in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=3a3bb559b112
description:
mem: Remove Packet source from ForwardResponseRecord

This patch removes the source field from the ForwardResponseRecord, but keeps the class as it is part of how the cache identifies responses to hardware prefetches that are snooped upwards.

diffstat:

src/mem/cache/cache_impl.hh | 11 +--
1 files changed, 5 insertions(+), 6 deletions(-)

diffs (42 lines):

diff -r 1de300588c4f -r 3a3bb559b112 src/mem/cache/cache_impl.hh
--- a/src/mem/cache/cache_impl.hh Thu Jan 22 05:01:27 2015 -0500
+++ b/src/mem/cache/cache_impl.hh Thu Jan 22 05:01:30 2015 -0500
@@ -385,10 +385,7 @@
 {
 public:
-PortID prevSrc;
-
-ForwardResponseRecord(PortID prev_src) : prevSrc(prev_src)
-{}
+ForwardResponseRecord() {}
 };
 template<class TagStore>
@@ -407,6 +404,9 @@
 assert(!system->bypassCaches());
 if (rec == NULL) {
+// @todo What guarantee do we have that this HardPFResp is
+// actually for this cache, and not a cache closer to the
+// memory?
 assert(pkt->cmd == MemCmd::HardPFResp);
 // Check if it's a prefetch response and handle it. We shouldn't
 // get any other kinds of responses without FRRs.
@@ -417,7 +417,6 @@
 }
 pkt->popSenderState();
-pkt->setDest(rec->prevSrc);
 delete rec;
 // @todo someone should pay for this
 pkt->firstWordDelay = pkt->lastWordDelay = 0;
@@ -1542,7 +1541,7 @@
 if (is_timing) {
 Packet snoopPkt(pkt, true, false); // clear flags, no allocation
 snoopPkt.setExpressSnoop();
-snoopPkt.pushSenderState(new ForwardResponseRecord(pkt->getSrc()));
+snoopPkt.pushSenderState(new ForwardResponseRecord());
 // the snoop packet does not need to wait any additional
 // time
 snoopPkt.firstWordDelay = snoopPkt.lastWordDelay = 0;
[gem5-dev] changeset in gem5: mem: Remove unused RequestState in the bridge
changeset 1de300588c4f in /z/repo/gem5
details: http://repo.gem5.org/gem5?cmd=changeset;node=1de300588c4f
description:
mem: Remove unused RequestState in the bridge

This patch removes the bridge sender state as the Crossbar now takes care of remembering its own routing decisions.

diffstat:

src/mem/bridge.cc | 22 --
src/mem/bridge.hh | 17 -
2 files changed, 0 insertions(+), 39 deletions(-)

diffs (66 lines):

diff -r 8bb4a9717eaa -r 1de300588c4f src/mem/bridge.cc
--- a/src/mem/bridge.cc Thu Jan 22 05:01:24 2015 -0500
+++ b/src/mem/bridge.cc Thu Jan 22 05:01:27 2015 -0500
@@ -207,15 +207,6 @@
 void
 Bridge::BridgeMasterPort::schedTimingReq(PacketPtr pkt, Tick when)
 {
-// If we expect to see a response, we need to restore the source
-// and destination field that is potentially changed by a second
-// crossbar
-if (!pkt->memInhibitAsserted() && pkt->needsResponse()) {
-// Update the sender state so we can deal with the response
-// appropriately
-pkt->pushSenderState(new RequestState(pkt->getSrc()));
-}
-
 // If we're about to put this packet at the head of the queue, we
 // need to schedule an event to do the transmit. Otherwise there
 // should already be an event scheduled for sending the head
@@ -233,19 +224,6 @@
 void
 Bridge::BridgeSlavePort::schedTimingResp(PacketPtr pkt, Tick when)
 {
-// This is a response for a request we forwarded earlier. The
-// corresponding request state should be stored in the packet's
-// senderState field.
-RequestState *req_state =
-dynamic_cast<RequestState*>(pkt->popSenderState());
-assert(req_state != NULL);
-pkt->setDest(req_state->origSrc);
-delete req_state;
-
-// the bridge sets the destination irrespective of it is valid or
-// not, as it is checked in the crossbar
-DPRINTF(Bridge, "response, new dest %d\n", pkt->getDest());
-
 // If we're about to put this packet at the head of the queue, we
 // need to schedule an event to do the transmit. Otherwise there
 // should already be an event scheduled for sending the head
diff -r 8bb4a9717eaa -r 1de300588c4f src/mem/bridge.hh
--- a/src/mem/bridge.hh Thu Jan 22 05:01:24 2015 -0500
+++ b/src/mem/bridge.hh Thu Jan 22 05:01:27 2015 -0500
@@ -75,23 +75,6 @@
 protected:
 /**
- * A bridge request state stores packets along with their sender
- * state and original source. It has enough information to also
- * restore the response once it comes back to the bridge.
- */
-class RequestState : public Packet::SenderState
-{
-
- public:
-
-const PortID origSrc;
-
-RequestState(PortID orig_src) : origSrc(orig_src)
-{ }
-
-};
-
-/**
 * A deferred packet stores a packet along with its scheduled
 * transmission time
 */
Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.
From AMD's system programming manual:

SYSCALL

New selectors are loaded, without permission checking (see above), as follows:
- Bits 47:32 of the STAR register specify the selector that is copied into the CS register.
- Bits 47:32 of the STAR register + 8 specify the selector that is copied into the SS register.
- The CS_base and the SS_base are both forced to zero.
- The CS_limit and the SS_limit are both forced to 4 Gbyte.
- The CS segment attributes are set to execute/read 64-bit code with a CPL of zero.
- The SS segment attributes are set to read/write and expand-up with a 64-bit stack referenced by RSP.

SYSRET

When a system procedure performs a SYSRET back to application software, the CS selector is updated from bits 63:50 of the STAR register (STAR.SYSRET_CS) as follows:
- If the return is to 32-bit mode (legacy or compatibility), CS is updated with the value of STAR.SYSRET_CS.
- If the return is to 64-bit mode, CS is updated with the value of STAR.SYSRET_CS + 16.

In both cases, the CPL is forced to 3, effectively ignoring STAR bits 49:48. The SS selector is updated to point to the next descriptor-table entry after the CS descriptor (STAR.SYSRET_CS + 8), and its RPL is not forced to 3.

I am wondering if we could detect the CPU vendor with CPUID and set up the GDT differently depending on the platform you are running on. Could scons actually detect this at build time?

Best,
Alex

From: gem5-dev [gem5-dev-boun...@gem5.org] on behalf of Gabe Black via gem5-dev [gem5-dev@gem5.org]
Sent: Wednesday, January 21, 2015 3:48 PM
To: mike upton; Default; Gabe Black
Subject: Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.

On Jan. 21, 2015, 9:22 p.m., mike upton wrote:
src/arch/x86/process.cc, lines 218-237
http://reviews.gem5.org/r/2557/diff/2/?file=42948#file42948line218

For AMD systems, the sys descriptors need to come first. On Intel systems they need to come second. I do not know how to resolve...

mike upton wrote:

I have been debugging why patch rb2557 breaks AMD KVM functionality. I was hoping to get to code that would work on both Intel and AMD platforms, but am not there yet. This patch is to be applied on top of rb2557.patch.

There are 2 main issues, neither of which I understand well enough to take much further. The first issue is that the order in which the segment descriptors get instantiated in the GDT seems to matter between AMD and Intel, and the two orders seem to be mutually incompatible.

AMD wants: csSys dsSys ds cs
Intel wants: ds cs dsSys csSys

I am not sure the relative ordering of ds and cs within a class matters, only that AMD wants the Sys ones first, and Intel wants them second.

There is also an issue with how 'star' gets defined. I cannot make the Intel code work for AMD. Both issues are addressed in this patch. The patch makes the AMD system work, but breaks Intel functionality.

I am also not sure how to upload this into Review Board. Do I create a separate patch from TOT, or can I somehow attach this to rb2557? Hopefully Gabe or Alexandru can weigh in. I am happy to help, but I am at my 'Peter Principle limit' as far as my understanding goes. I think it would be really ugly to have a machine-type test to version the code...

You should grab a copy of the architecture manual. From there:

STAR: The STAR register has the following fields (unless otherwise noted, all bits are read/write):

- SYSRET CS and SS Selectors, bits 63:48. This field is used to specify both the CS and SS selectors loaded into CS and SS during SYSRET. If SYSRET is returning to 32-bit mode (either legacy or compatibility), this field is copied directly into the CS selector field. If SYSRET is returning to 64-bit mode, the CS selector is set to this field + 16. SS.Sel is set to this field + 8, regardless of the target mode. Because SYSRET always returns to CPL 3, the RPL bits 49:48 should be initialized to 11b.
- SYSCALL CS and SS Selectors, bits 47:32. This field is used to specify both the CS and SS selectors loaded into CS and SS during SYSCALL. This field is copied directly into CS.Sel. SS.Sel is set to this field + 8. Because SYSCALL always switches to CPL 0, the RPL bits 33:32 should be initialized to 00b.

That's why the order matters and is what it is.

- Gabe

---
This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2557/#review5782
---

On Dec. 10, 2014, 10:11 a.m., Gabe Black wrote:
---
This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2557/
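To make the ordering constraint concrete, here is a minimal C++ sketch of the selector arithmetic described in the quoted STAR text. It is not gem5 code: `StarSelectors`, `starSelectors`, `makeStar` and the GDT indices in the note below are names and values made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Selectors the CPU derives from a STAR value, following the field rules
// quoted above. All names are illustrative, not gem5 identifiers.
struct StarSelectors {
    uint16_t syscallCS;   // STAR[47:32]
    uint16_t syscallSS;   // STAR[47:32] + 8
    uint16_t sysret32CS;  // STAR[63:48]
    uint16_t sysret64CS;  // STAR[63:48] + 16
    uint16_t sysretSS;    // STAR[63:48] + 8
};

StarSelectors starSelectors(uint64_t star)
{
    const uint16_t sc = (star >> 32) & 0xffff;  // SYSCALL field
    const uint16_t sr = (star >> 48) & 0xffff;  // SYSRET field
    return { sc, uint16_t(sc + 8), sr, uint16_t(sr + 16), uint16_t(sr + 8) };
}

// Build a STAR value from two GDT indices (assumed layout):
//   kernelCsIndex     : kernel code, with kernel data at kernelCsIndex + 1
//   userBaseIndex     : 32-bit user code
//   userBaseIndex + 1 : user data (SS)
//   userBaseIndex + 2 : 64-bit user code
uint64_t makeStar(unsigned kernelCsIndex, unsigned userBaseIndex)
{
    // A selector holds a 13-bit index; make sure the derived ones still fit.
    assert(kernelCsIndex + 1 < 8192 && userBaseIndex + 2 < 8192);
    const uint64_t syscallField = uint64_t(kernelCsIndex) << 3;       // RPL 00b
    const uint64_t sysretField = (uint64_t(userBaseIndex) << 3) | 3;  // RPL 11b
    return (sysretField << 48) | (syscallField << 32);
}
```

With kernel code at GDT index 1 and the user block starting at index 3, `starSelectors(makeStar(1, 3))` yields SYSCALL CS/SS of 0x08/0x10 and SYSRET SS/64-bit CS of 0x23/0x2b. In other words, the kernel data descriptor has to sit directly after the kernel code descriptor, and the 64-bit user code descriptor directly after the user data descriptor.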
[gem5-dev] Uncachable memory requests in Ruby
Hello,

How can I force a request to be uncacheable when using the Ruby memory system? req->setFlags(Request::UNCACHEABLE) works for the classic memory system, but it has no effect on the request when using Ruby.

Thank you,
Mohammad
Re: [gem5-dev] Bug With Thread Suspend Instructions, Interrupts, x86 O3 CPU?
Hey Gabe,

Thanks for the suggestion. This work-around doesn't appear to work. In the O3CPU, the instruction still does not get committed due to the fault (DefaultCommitImpl::commitHead(suspend instruction) generates a trap and returns that the instruction cannot be committed). After thread reactivation, the instruction is executed again, causing the thread to suspend. The SimpleTiming CPU has a similar issue: it executes the fault and suspends the thread, but while the thread is suspended, the core appears to just continue trying to execute the suspend instruction.

It seems like the right way to fix this may be to introduce a ThreadContext state, say Activating, which the thread is put into when activate() is called on it. The thread would not be allowed to enter the Active state until the ROB has been cleared (i.e. any remaining instructions from before the suspend are squashed and retired). Does this sound reasonable?

Thanks,
Joel

On Tue, Jan 20, 2015 at 12:32 PM, Gabe Black via gem5-dev gem5-dev@gem5.org wrote:

It sounds like a bug/race condition in the O3 CPU, which I think you already knew. You could try moving the suspend call into a fault returned by the MicroHalt microop instead of the instruction itself. That might break the race, although it's not really fixing the issue with O3.

Gabe

On Tue, Jan 20, 2015 at 8:43 AM, Joel Hestness via gem5-dev gem5-dev@gem5.org wrote:

Hi guys,

I'm running into a very tricky problem with halt/suspend x86 instructions with the O3 CPU. This might be a question for Nilay, Gabe B. or Mitch H., and I'm really hoping for input given the complexity of this one.

The specific problem is that when calling suspend from the execute stage of an instruction (e.g. a pseudoinstruction or the x86 MicroHalt microop), the CPU context gets suspended, but after reactivating the context later, the instruction gets squashed and replayed, potentially causing the context to get suspended again immediately. The pseudoinstruction that I'm using doesn't do anything except call the thread context suspend, and the functionality is nearly identical to that of the MicroHalt op (I've now tried swapping in the MicroHalt and run into the same problem, so I suspect this may also affect the MWAIT implementation).

The instruction that suspends the context moves to the commit buffer in the core, but cannot be committed before the thread is suspended. When the thread is restarted, the commit stage squashes all instructions, retiring the suspend instruction, and fetch starts back at the PC of the suspend instruction. In cases that appear to execute correctly, the pipeline re-fetches the suspend instruction, but it gets squashed from the commit stage and removed from the instruction list. In apparently broken cases, the instruction does not get squashed, so the thread goes back to sleep. Interrupts can jar the CPU out of the incorrect suspend loop, but sometimes it takes 3-6 interrupts (i.e. up to 10s of milliseconds).

Some details: I'm currently using gem5 revision 10237:b2850bdcec07 and the bug occurs in long-running sims in FS mode (single-threaded cores - no SMT). I've also pulled some more recent changesets and applied them to my repo, since they address O3 CPU issues:
- 10239 http://repo.gem5.org/gem5/rev/592f0bb6bd6f
- 10327 http://repo.gem5.org/gem5/rev/5b6279635c49
- 10328 http://repo.gem5.org/gem5/rev/867b536a68be
- 10329 http://repo.gem5.org/gem5/rev/12e3be8203a5
- 10331 http://repo.gem5.org/gem5/rev/ed05298e8566
- 10332 http://repo.gem5.org/gem5/rev/1ba825974ee6
- 10340 http://repo.gem5.org/gem5/rev/40d24a672351

I'm unable to reproduce the bug in SE mode, and I suspect that sporadic interrupt handling in O3 may be part of the problem, since the examples that I can generate show CPU interrupts raised in close proximity to the thread suspend and activate activity. I've attached annotated O3 execution traces for seemingly correct and incorrect instances.

Here are some specific questions I'm hoping for help with:

1) Are there any other known changes in the mainline repo that might fix this?

2) If not (1), I'm not clear on the purpose of retiring the suspend instruction (rather than committing) after reactivating the thread context. I can understand that the full pipeline squash would be a standard procedure after reactivating a thread. However, the suspend instruction finishes execution in the same cycle that thread suspend starts, so it should be free to commit. Since the suspend instruction doesn't commit, it is pointed to as the next PC after thread reactivation, which allows it to be reexecuted incorrectly. Is this retirement process the intended behavior for thread suspend instructions like these? It seems like this might be a corner case where the suspend instruction should get committed rather than squashed/retired,
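For reference, the Activating state proposed at the top of this message could look roughly like the following toy model. This is only a sketch under my own assumptions, not gem5's ThreadContext: `ToyThread`, `ThreadStatus`, `tick` and `mayFetch` are hypothetical names, and the ROB is reduced to a count of stale (pre-suspend) entries.

```cpp
#include <cassert>
#include <cstddef>

// Toy model of the proposed activation protocol; not gem5 code.
enum class ThreadStatus { Active, Suspended, Activating };

class ToyThread
{
  public:
    ThreadStatus status() const { return _status; }

    // Called when the suspend instruction executes.
    void suspend() { _status = ThreadStatus::Suspended; }

    // Called on wake-up: request activation, but do not resume fetch yet.
    void activate()
    {
        if (_status == ThreadStatus::Suspended)
            _status = ThreadStatus::Activating;
    }

    // Called once per cycle with the number of pre-suspend instructions
    // still in the ROB. The thread becomes Active only once none remain.
    void tick(std::size_t staleRobEntries)
    {
        if (_status == ThreadStatus::Activating && staleRobEntries == 0)
            _status = ThreadStatus::Active;
    }

    // Fetch is allowed only in the Active state.
    bool mayFetch() const { return _status == ThreadStatus::Active; }

  private:
    ThreadStatus _status = ThreadStatus::Active;
};
```

The point of the extra state is that a stale suspend instruction can never be re-fetched between activate() and the ROB draining, because fetch stays gated until the last pre-suspend entry is gone.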
[gem5-dev] get vfp/simd register value in ARMv8 simulation
Hello,

I'm trying to get the vfp/simd register values during simulation. I found a clue in src/cpu/simple_thread.hh, which defines an array `FloatReg f[TheISA::NumFloatRegs];`. I also checked the value of ArmISA::NumFloatRegs; it's 160. Can I get the V0-V31 register values by reading the FloatReg array? Also, according to the manual, there should be 32 128-bit vfp/simd registers. How do they map to the 160 elements of the FloatReg array?
Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.
OK, I believe I have a patch that unifies the code for both AMD and Intel. Do I post it as a separate Review Board item?

On Thu, Jan 22, 2015 at 11:32 AM, mike upton michaelup...@gmail.com wrote:

This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2557/

On January 21st, 2015, 9:22 p.m. UTC, *mike upton* wrote:

src/arch/x86/process.cc http://reviews.gem5.org/r/2557/diff/2/?file=42948#file42948line218 (Diff revision 2)
X86_64LiveProcess::initState()

217 SegDescriptor csLowPLDesc = initDesc;
218 csLowPLDesc.type.codeOrData = 1;
219 csLowPLDesc.dpl = 0;
220 uint64_t csLowPLDescVal = csLowPLDesc;
221 physProxy.writeBlob(GDTPhysAddr + numGDTEntries * 8,
222 (uint8_t *)(&csLowPLDescVal), 8);
223
224 numGDTEntries++;
225
226 SegSelector csLowPL = 0;
227 csLowPL.si = numGDTEntries - 1;
228 csLowPL.rpl = 0;
229
230 //64 bit data segment
231 SegDescriptor dsLowPLDesc = initDesc;
232 dsLowPLDesc.type.codeOrData = 0;
233 dsLowPLDesc.dpl = 0;
234 uint64_t dsLowPLDescVal = dsLowPLDesc;
235 physProxy.writeBlob(GDTPhysAddr + numGDTEntries * 8,
236 (uint8_t *)(&dsLowPLDescVal), 8);

For AMD systems, the sys descriptors need to come first. On Intel systems they need to come second. I do not know how to resolve...

On January 21st, 2015, 9:23 p.m. UTC, *mike upton* wrote:

I have been debugging why patch rb2557 breaks AMD KVM functionality. I was hoping to get to code that would work on both Intel and AMD platforms, but am not there yet. This patch is to be applied on top of rb2557.patch.

There are 2 main issues, neither of which I understand well enough to take much further. The first issue is that the order in which the segment descriptors get instantiated in the GDT seems to matter between AMD and Intel, and the two orders seem to be mutually incompatible.

AMD wants: csSys dsSys ds cs
Intel wants: ds cs dsSys csSys

I am not sure the relative ordering of ds and cs within a class matters, only that AMD wants the Sys ones first, and Intel wants them second.

There is also an issue with how 'star' gets defined. I cannot make the Intel code work for AMD. Both issues are addressed in this patch. The patch makes the AMD system work, but breaks Intel functionality.

I am also not sure how to upload this into Review Board. Do I create a separate patch from TOT, or can I somehow attach this to rb2557? Hopefully Gabe or Alexandru can weigh in. I am happy to help, but I am at my 'Peter Principle limit' as far as my understanding goes. I think it would be really ugly to have a machine-type test to version the code...

On January 21st, 2015, 9:48 p.m. UTC, *Gabe Black* wrote:

You should grab a copy of the architecture manual. From there:

STAR: The STAR register has the following fields (unless otherwise noted, all bits are read/write):

- SYSRET CS and SS Selectors, bits 63:48. This field is used to specify both the CS and SS selectors loaded into CS and SS during SYSRET. If SYSRET is returning to 32-bit mode (either legacy or compatibility), this field is copied directly into the CS selector field. If SYSRET is returning to 64-bit mode, the CS selector is set to this field + 16. SS.Sel is set to this field + 8, regardless of the target mode. Because SYSRET always returns to CPL 3, the RPL bits 49:48 should be initialized to 11b.
- SYSCALL CS and SS Selectors, bits 47:32. This field is used to specify both the CS and SS selectors loaded into CS and SS during SYSCALL. This field is copied directly into CS.Sel. SS.Sel is set to this field + 8. Because SYSCALL always switches to CPL 0, the RPL bits 33:32 should be initialized to 00b.

That's why the order matters and is what it is.

On January 21st, 2015, 11:15 p.m. UTC, *mike upton* wrote:

AMD and Intel use different solutions, right?

AMD: Syscall, sysret
Intel: Sysenter, sysexit

Do we need independent code streams for each? The original code worked for AMD, but not Intel. The current 2557 patch works for Intel, but not AMD.

On January 21st, 2015, 11:35 p.m. UTC, *Gabe Black* wrote:

Yeah, there are some differences between the two. I think both support both pairs of instructions, but I think one or the other only works in 32-bit mode for one of the vendors, or something along those lines. At one point I could have told you exactly what the difference was, but now I'd have to check the manuals. My expectation/hope is that a single GDT layout would work for both. I doubt the kernel, for instance, specializes its layout based on whose CPU it's running on.

OK, it seems like the feedback is
Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.
Glad you all seem to be figuring this out. I think we should do runtime rather than compile-time detection though, as some people may have mixed clusters they want to run the same binary on, or may cross-compile.

Steve
Re: [gem5-dev] Review Request 2557: x86: kvm: Fix the KVM CPU in SE and FS on Intel CPUs.
On Jan. 21, 2015, 9:22 p.m., mike upton wrote:
src/arch/x86/process.cc, lines 218-237
http://reviews.gem5.org/r/2557/diff/2/?file=42948#file42948line218

For AMD systems, the sys descriptors need to come first. On Intel systems they need to come second. I do not know how to resolve...

mike upton wrote:

I have been debugging why patch rb2557 breaks AMD KVM functionality. I was hoping to get to code that would work on both Intel and AMD platforms, but am not there yet. This patch is to be applied on top of rb2557.patch.

There are 2 main issues, neither of which I understand well enough to take much further. The first issue is that the order in which the segment descriptors get instantiated in the GDT seems to matter between AMD and Intel, and the two orders seem to be mutually incompatible.

AMD wants: csSys dsSys ds cs
Intel wants: ds cs dsSys csSys

I am not sure the relative ordering of ds and cs within a class matters, only that AMD wants the Sys ones first, and Intel wants them second.

There is also an issue with how 'star' gets defined. I cannot make the Intel code work for AMD. Both issues are addressed in this patch. The patch makes the AMD system work, but breaks Intel functionality.

I am also not sure how to upload this into Review Board. Do I create a separate patch from TOT, or can I somehow attach this to rb2557? Hopefully Gabe or Alexandru can weigh in. I am happy to help, but I am at my 'Peter Principle limit' as far as my understanding goes. I think it would be really ugly to have a machine-type test to version the code...

Gabe Black wrote:

You should grab a copy of the architecture manual. From there:

STAR: The STAR register has the following fields (unless otherwise noted, all bits are read/write):

- SYSRET CS and SS Selectors, bits 63:48. This field is used to specify both the CS and SS selectors loaded into CS and SS during SYSRET. If SYSRET is returning to 32-bit mode (either legacy or compatibility), this field is copied directly into the CS selector field. If SYSRET is returning to 64-bit mode, the CS selector is set to this field + 16. SS.Sel is set to this field + 8, regardless of the target mode. Because SYSRET always returns to CPL 3, the RPL bits 49:48 should be initialized to 11b.
- SYSCALL CS and SS Selectors, bits 47:32. This field is used to specify both the CS and SS selectors loaded into CS and SS during SYSCALL. This field is copied directly into CS.Sel. SS.Sel is set to this field + 8. Because SYSCALL always switches to CPL 0, the RPL bits 33:32 should be initialized to 00b.

That's why the order matters and is what it is.

mike upton wrote:

AMD and Intel use different solutions, right?

AMD: Syscall, sysret
Intel: Sysenter, sysexit

Do we need independent code streams for each? The original code worked for AMD, but not Intel. The current 2557 patch works for Intel, but not AMD.

Gabe Black wrote:

Yeah, there are some differences between the two. I think both support both pairs of instructions, but I think one or the other only works in 32-bit mode for one of the vendors, or something along those lines. At one point I could have told you exactly what the difference was, but now I'd have to check the manuals. My expectation/hope is that a single GDT layout would work for both. I doubt the kernel, for instance, specializes its layout based on whose CPU it's running on.

OK, it seems like the feedback is that we do need to test at runtime which CPU we are running on and run CPU-specific code. I can certainly code this up. Any pointers to what state is available in the simulator to test? Or should I just add a cpuHostType() routine that will return Intel, AMD, or UNKNOWN?

- mike

---
This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2557/#review5782
---

On Dec. 10, 2014, 10:11 a.m., Gabe Black wrote:
---
This is an automatically generated e-mail. To reply, visit: http://reviews.gem5.org/r/2557/
---
(Updated Dec. 10, 2014, 10:11 a.m.)

Review request for Default.
Repository: gem5 Description ---
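A runtime check along the lines of the cpuHostType() routine suggested in this thread could be sketched as below. This is an assumption-laden illustration, not code from the patch: `HostCpuVendor` and `vendorFromRegs` are hypothetical names, and the host query relies on `__get_cpuid` from the GCC/Clang `<cpuid.h>` header, so on other compilers or non-x86 hosts it simply reports Unknown.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>  // GCC/Clang wrapper around the CPUID instruction
#endif

enum class HostCpuVendor { Intel, AMD, Unknown };

// Classify the vendor from the registers returned by CPUID leaf 0.
// The 12-character vendor string is stored in EBX, EDX, ECX, in that
// order, lowest byte first.
HostCpuVendor vendorFromRegs(uint32_t ebx, uint32_t edx, uint32_t ecx)
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    char id[13];
    for (int r = 0; r < 3; ++r)
        for (int b = 0; b < 4; ++b)
            id[r * 4 + b] = char((regs[r] >> (8 * b)) & 0xff);
    id[12] = '\0';

    if (std::strcmp(id, "GenuineIntel") == 0)
        return HostCpuVendor::Intel;
    if (std::strcmp(id, "AuthenticAMD") == 0)
        return HostCpuVendor::AMD;
    return HostCpuVendor::Unknown;
}

// Hypothetical cpuHostType(): query the host CPU at run time.
HostCpuVendor cpuHostType()
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (__get_cpuid(0, &eax, &ebx, &ecx, &edx))
        return vendorFromRegs(ebx, edx, ecx);
#endif
    return HostCpuVendor::Unknown;
}
```

Keeping the string classification separate from the CPUID call means the vendor logic can be exercised on any build host, while the actual query still happens at run time as Steve suggested.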
[gem5-dev] memory change during system call emulation
Hello,

I am writing a trace probe for AtomicSimpleCPU. The simulation is planned to run in SE mode. For user-level instructions, I can get the address and data of each memory access through the *traceData* member in BaseSimpleCPU. But for system calls, I don't know how to collect the memory changes made during syscall emulation, especially when read/write syscalls are emulated. Can anybody give me a pointer?

Thanks,
Meng