[m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
# HG changeset patch # User Gabe Black gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault = curStaticInst-initiateAcc(this, traceData); +inInitiateAcc = false; +if (completionPkt) { +completeDataAccess(completionPkt); +completionPkt = NULL; +} if (_status != Running) { // instruction will complete in dcache response callback assert(_status == DcacheWaitResponse || @@ -856,6 +864,10 @@ void TimingSimpleCPU::completeDataAccess(PacketPtr pkt) { +if (inInitiateAcc) { +completionPkt = pkt; +return; +} // received a response from the dcache: complete the load or store // instruction assert(!pkt-isError()); diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh --- a/src/cpu/simple/timing.hh +++ b/src/cpu/simple/timing.hh @@ -317,6 +317,9 @@ Tick previousTick; +bool inInitiateAcc; +PacketPtr completionPkt; + public: virtual Port *getPort(const std::string if_name, int idx = -1); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault = curStaticInst-initiateAcc(this, traceData); +inInitiateAcc = false; +if (completionPkt) { +completeDataAccess(completionPkt); +completionPkt = NULL; +} if (_status != Running) { // instruction will complete in dcache response callback assert(_status == DcacheWaitResponse || @@ -856,6 +864,10 @@ void TimingSimpleCPU::completeDataAccess(PacketPtr pkt) { +if (inInitiateAcc) { +completionPkt = pkt; +return; +} // received a response from the dcache: complete the load or store // instruction assert(!pkt-isError()); diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh --- a/src/cpu/simple/timing.hh +++ b/src/cpu/simple/timing.hh @@ -317,6 +317,9 @@ Tick previousTick; +bool inInitiateAcc; +PacketPtr completionPkt; + public: virtual Port *getPort(const std::string if_name, int idx = -1); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Configuration files/tables for/in the simulated guest.
Before I get rolling on this too seriously, should the builder function go within the python compiled into M5, or should it go into the post compile config files? Do we have a rule of thumb for what goes where? This seems important enough to be built in and not repeated in every user config, but it's also not a SimObject or anything like that. Gabe Steve Reinhardt wrote: I agree with Nate... I think deducing the logical system configuration directly from an arbitrary M5 SimObject tree is going to be next to impossible, given that the tree doesn't reflect the memory hierarchy interconnection topology, and since the M5 tree could very easily correspond to an undescribable config from the BIOS perspective. I wouldn't worry too much in the short term about making your system builder flexible; is it even possible to describe a non-traditional configuration like a mesh in the BIOS tables? I think having a basic tool that supports a handful of straightforward configs is an appropriate first step; people that feel the need to do more than that can be responsible for generalizing your solution as needed. I'm sure you've heard my spiel about spending too much time designing in unnecessary generality up front... Steve On Tue, Apr 21, 2009 at 12:36 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: It seems reasonable, although I'm sure it will be easier said than done one you get down to the details. The challenge will be putting it together so that it doesn't make so many assumptions that it becomes useless outside of traditional sorts of configurations like, for instance, meshes. I agree that trying to guess what the user really wanted your at best going to be close most of the time. Gabe nathan binkert wrote: My initial reaction is that it's probably better to build some sort of semi-configurable system builder function that creates a system with the right number of CPUs and the right topology of caches. If you do try to deduce what the user had in mind, I just think you're probably in for a very fragile system. Does this sort of thing seem reasonable? Nate I reordered my patches so that I could push this, and now all of the remaining ones are dependent on or are hacks that force there to be two CPUs. In order to push these last ~5 patches, I need to know how many CPUs there are when I'm constructing the BIOS tables on the python side of things. First, I think it would be useful to have a mechanism that will allow enumerating a simobjects children that are also simobjects. That way you could, for instance, go through and gather up all the CPUs to count them, assign an ISA level ID to them, etc. There are potential complications where you might leak into some other system, not be able to follow certain configurations correctly, not make the jump over ports, etc that might make this more complicated. There may also be CPUs you don't want to count that you'll, for instance, switch over to later. Besides CPUs, these tables also tend to have information about caches, and at least on the x86 side they expect to be identified by depth. There would need to be some way of identifying how far a particular cache is from a particular CPU in order to label it the, for instance, L2. I think you may also need to know what CPUs are connected to a particular cache so you can call it shared, etc. These sorts of names may not make sense in whatever crazy configuration somebody sets up. Another issue is determining bus topology and interrupt assignment. In x86 each bus needs to be identified with a particular technology, and all interrupts need to be described as far as who generates them, what bus they originate on, and where they ultimately end up. Linux gets confused when at least the interrupt information is wrong and things get lost. I think it's important for this to be automated as much as possible because I will attest it can be tricky to get everything right. Having a system in place which automatically builds up a semi-qualitative description of whatever whacky configuration people want to throw at it is going to be very challenging, though, because it might not always be clear what the right thing to do is when setting things up manually, and fixing a generated configuration may actually be harder than building it yourself. How this works has the potential to introduce a lot of problems, and if we get it wrong it could take a major effort to fix an entrenched system. I see this primarily as an
[m5-dev] Cron m5t...@zizzer /z/m5/regression/do-regression --scratch all
See /z/m5/regression/regress-2009-04-26-03:00:01 for details. ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Configuration files/tables for/in the simulated guest.
I don't think a builder function that does everything I want will fit into the layout of the config stuff as is. I'm just going to augment what I've got to consider the number of CPUs. Gabe Gabe Black wrote: Before I get rolling on this too seriously, should the builder function go within the python compiled into M5, or should it go into the post compile config files? Do we have a rule of thumb for what goes where? This seems important enough to be built in and not repeated in every user config, but it's also not a SimObject or anything like that. Gabe Steve Reinhardt wrote: I agree with Nate... I think deducing the logical system configuration directly from an arbitrary M5 SimObject tree is going to be next to impossible, given that the tree doesn't reflect the memory hierarchy interconnection topology, and since the M5 tree could very easily correspond to an undescribable config from the BIOS perspective. I wouldn't worry too much in the short term about making your system builder flexible; is it even possible to describe a non-traditional configuration like a mesh in the BIOS tables? I think having a basic tool that supports a handful of straightforward configs is an appropriate first step; people that feel the need to do more than that can be responsible for generalizing your solution as needed. I'm sure you've heard my spiel about spending too much time designing in unnecessary generality up front... Steve On Tue, Apr 21, 2009 at 12:36 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: It seems reasonable, although I'm sure it will be easier said than done one you get down to the details. The challenge will be putting it together so that it doesn't make so many assumptions that it becomes useless outside of traditional sorts of configurations like, for instance, meshes. I agree that trying to guess what the user really wanted your at best going to be close most of the time. Gabe nathan binkert wrote: My initial reaction is that it's probably better to build some sort of semi-configurable system builder function that creates a system with the right number of CPUs and the right topology of caches. If you do try to deduce what the user had in mind, I just think you're probably in for a very fragile system. Does this sort of thing seem reasonable? Nate I reordered my patches so that I could push this, and now all of the remaining ones are dependent on or are hacks that force there to be two CPUs. In order to push these last ~5 patches, I need to know how many CPUs there are when I'm constructing the BIOS tables on the python side of things. First, I think it would be useful to have a mechanism that will allow enumerating a simobjects children that are also simobjects. That way you could, for instance, go through and gather up all the CPUs to count them, assign an ISA level ID to them, etc. There are potential complications where you might leak into some other system, not be able to follow certain configurations correctly, not make the jump over ports, etc that might make this more complicated. There may also be CPUs you don't want to count that you'll, for instance, switch over to later. Besides CPUs, these tables also tend to have information about caches, and at least on the x86 side they expect to be identified by depth. There would need to be some way of identifying how far a particular cache is from a particular CPU in order to label it the, for instance, L2. I think you may also need to know what CPUs are connected to a particular cache so you can call it shared, etc. These sorts of names may not make sense in whatever crazy configuration somebody sets up. Another issue is determining bus topology and interrupt assignment. In x86 each bus needs to be identified with a particular technology, and all interrupts need to be described as far as who generates them, what bus they originate on, and where they ultimately end up. Linux gets confused when at least the interrupt information is wrong and things get lost. I think it's important for this to be automated as much as possible because I will attest it can be tricky to get everything right. Having a system in place which automatically builds up a semi-qualitative description of whatever whacky configuration people want to throw at it is going to be very challenging, though, because it might not always be clear what the right thing to do is when setting things up manually, and fixing a generated configuration may actually be harder than building
[m5-dev] changeset in m5: X86: Record the initial APIC ID which identifie...
changeset 4f8af2f3185f in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=4f8af2f3185f description: X86: Record the initial APIC ID which identifies an APIC in M5. The ID as exposed to software can be changed. Tracking those changes in M5 would be cumbersome, especially since there's no guarantee the IDs will remain unique. diffstat: 4 files changed, 23 insertions(+), 23 deletions(-) src/arch/x86/interrupts.cc | 36 src/arch/x86/interrupts.hh |2 ++ src/dev/x86/i82094aa.cc|2 +- src/dev/x86/i82094aa.hh|6 -- diffs (139 lines): diff -r 9327451a8e7a -r 4f8af2f3185f src/arch/x86/interrupts.cc --- a/src/arch/x86/interrupts.ccSun Apr 26 02:04:32 2009 -0700 +++ b/src/arch/x86/interrupts.ccSun Apr 26 02:06:21 2009 -0700 @@ -295,17 +295,21 @@ void X86ISA::Interrupts::setCPU(BaseCPU * newCPU) { +assert(newCPU); +if (cpu != NULL cpu-cpuId() != newCPU-cpuId()) { +panic(Local APICs can't be moved between CPUs + with different IDs.\n); +} cpu = newCPU; -assert(cpu); -regs[APIC_ID] = (cpu-cpuId() 24); +initialApicId = cpu-cpuId(); +regs[APIC_ID] = (initialApicId 24); } Tick X86ISA::Interrupts::recvMessage(PacketPtr pkt) { -uint8_t id = (regs[APIC_ID] 24); -Addr offset = pkt-getAddr() - x86InterruptAddress(id, 0); +Addr offset = pkt-getAddr() - x86InterruptAddress(initialApicId, 0); assert(pkt-cmd == MemCmd::MessageReq); switch(offset) { @@ -315,9 +319,6 @@ DPRINTF(LocalApic, Got Trigger Interrupt message with vector %#x.\n, message.vector); -// Make sure we're really supposed to get this. -assert((message.destMode == 0 message.destination == id) || - (bits((int)message.destination, id))); requestInterrupt(message.vector, message.deliveryMode, message.trigger); @@ -354,10 +355,10 @@ void X86ISA::Interrupts::addressRanges(AddrRangeList range_list) { -uint8_t id = (regs[APIC_ID] 24); range_list.clear(); -RangeAddr range = RangeEx(x86LocalAPICAddress(id, 0), -x86LocalAPICAddress(id, 0) + PageBytes); +RangeAddr range = RangeEx(x86LocalAPICAddress(initialApicId, 0), +x86LocalAPICAddress(initialApicId, 0) + +PageBytes); range_list.push_back(range); pioAddr = range.start; } @@ -366,10 +367,10 @@ void X86ISA::Interrupts::getIntAddrRange(AddrRangeList range_list) { -uint8_t id = (regs[APIC_ID] 24); range_list.clear(); -range_list.push_back(RangeEx(x86InterruptAddress(id, 0), -x86InterruptAddress(id, 0) + PhysAddrAPICRangeSize)); +range_list.push_back(RangeEx(x86InterruptAddress(initialApicId, 0), +x86InterruptAddress(initialApicId, 0) + +PhysAddrAPICRangeSize)); } @@ -515,14 +516,9 @@ { int numContexts = sys-numContexts(); pendingIPIs += (numContexts - 1); -// We have no way to get at the thread context we're part -// of, so we'll just have to go with the CPU for now. -hack_once(Broadcast IPIs can't handle more than -one context per CPU.\n); -int myId = cpu-getContext(0)-contextId(); for (int i = 0; i numContexts; i++) { int thisId = sys-getThreadContext(i)-contextId(); -if (thisId != myId) { +if (thisId != initialApicId) { PacketPtr pkt = buildIntRequest(thisId, message); if (timing) intPort-sendMessageTiming(pkt, latency); @@ -589,7 +585,7 @@ pendingInit(false), initVector(0), pendingStartup(false), startupVector(0), startedUp(false), pendingUnmaskableInt(false), -pendingIPIs(0) +pendingIPIs(0), cpu(NULL) { pioSize = PageBytes; memset(regs, 0, sizeof(regs)); diff -r 9327451a8e7a -r 4f8af2f3185f src/arch/x86/interrupts.hh --- a/src/arch/x86/interrupts.hhSun Apr 26 02:04:32 2009 -0700 +++ b/src/arch/x86/interrupts.hhSun Apr 26 02:06:21 2009 -0700 @@ -191,6 +191,8 @@ BaseCPU *cpu; +int initialApicId; + public: /* * Params stuff. diff -r 9327451a8e7a -r 4f8af2f3185f src/dev/x86/i82094aa.cc --- a/src/dev/x86/i82094aa.cc Sun Apr 26 02:04:32 2009 -0700 +++ b/src/dev/x86/i82094aa.cc Sun Apr 26 02:06:21 2009 -0700 @@ -40,7 +40,7 @@ extIntPic(p-external_int_pic) { // This assumes there's only one I/O APIC in the system -id = p-apic_id; +initialApicId = id = p-apic_id; assert(id = 0xf); arbId = id; regSel = 0; diff -r
[m5-dev] changeset in m5: X86: Make the local APICs register themselves w...
changeset d3ee4e0d690c in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=d3ee4e0d690c description: X86: Make the local APICs register themselves with the IO APIC. This is a hack so that the IO APIC can figure out information about the local APICs. The local APICs still have no way to find out about each other. Ideally, when the local APICs update state that's relevant to somebody else, they'd send an update to everyone. Without being able to do a broadcast, that would still require knowing who else there is to notify. Other broadcasts are implemented using assumptions that may not always be true. diffstat: 4 files changed, 31 insertions(+) src/arch/x86/interrupts.cc | 13 + src/arch/x86/interrupts.hh |5 + src/dev/x86/i82094aa.cc|7 +++ src/dev/x86/i82094aa.hh|6 ++ diffs (96 lines): diff -r 4f8af2f3185f -r d3ee4e0d690c src/arch/x86/interrupts.cc --- a/src/arch/x86/interrupts.ccSun Apr 26 02:06:21 2009 -0700 +++ b/src/arch/x86/interrupts.ccSun Apr 26 02:09:13 2009 -0700 @@ -59,6 +59,9 @@ #include arch/x86/interrupts.hh #include arch/x86/intmessage.hh #include cpu/base.hh +#include dev/x86/i82094aa.hh +#include dev/x86/pc.hh +#include dev/x86/south_bridge.hh #include mem/packet_access.hh #include sim/system.hh @@ -306,6 +309,16 @@ } +void +X86ISA::Interrupts::init() +{ +BasicPioDevice::init(); +Pc * pc = dynamic_castPc *(platform); +assert(pc); +pc-southBridge-ioApic-registerLocalApic(initialApicId, this); +} + + Tick X86ISA::Interrupts::recvMessage(PacketPtr pkt) { diff -r 4f8af2f3185f -r d3ee4e0d690c src/arch/x86/interrupts.hh --- a/src/arch/x86/interrupts.hhSun Apr 26 02:06:21 2009 -0700 +++ b/src/arch/x86/interrupts.hhSun Apr 26 02:09:13 2009 -0700 @@ -214,6 +214,11 @@ } /* + * Initialize this object by registering it with the IO APIC. + */ +void init(); + +/* * Functions to interact with the interrupt port from IntDev. */ Tick read(PacketPtr pkt); diff -r 4f8af2f3185f -r d3ee4e0d690c src/dev/x86/i82094aa.cc --- a/src/dev/x86/i82094aa.cc Sun Apr 26 02:06:21 2009 -0700 +++ b/src/dev/x86/i82094aa.cc Sun Apr 26 02:09:13 2009 -0700 @@ -182,6 +182,13 @@ pinStates[number] = false; } +void +X86ISA::I82094AA::registerLocalApic(int initialId, Interrupts *localApic) +{ +assert(localApic); +localApics[initialId] = localApic; +} + X86ISA::I82094AA * I82094AAParams::create() { diff -r 4f8af2f3185f -r d3ee4e0d690c src/dev/x86/i82094aa.hh --- a/src/dev/x86/i82094aa.hh Sun Apr 26 02:06:21 2009 -0700 +++ b/src/dev/x86/i82094aa.hh Sun Apr 26 02:09:13 2009 -0700 @@ -37,10 +37,13 @@ #include dev/x86/intdev.hh #include params/I82094AA.hh +#include map + namespace X86ISA { class I8259; +class Interrupts; class I82094AA : public PioDevice, public IntDev { @@ -67,6 +70,8 @@ I8259 * extIntPic; +std::mapint, Interrupts * localApics; + uint8_t regSel; uint8_t initialApicId; uint8_t id; @@ -122,6 +127,7 @@ void signalInterrupt(int line); void raiseInterruptPin(int number); void lowerInterruptPin(int number); +void registerLocalApic(int id, Interrupts *localApic); }; }; // namespace X86ISA ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
[m5-dev] changeset in m5: X86: Tell the function that sends int messages ...
changeset 6cbdd76b93db in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=6cbdd76b93db description: X86: Tell the function that sends int messages who to send to instead of figuring it out itself. diffstat: 4 files changed, 84 insertions(+), 56 deletions(-) src/arch/x86/interrupts.cc | 51 src/dev/x86/i82094aa.cc| 34 - src/dev/x86/intdev.cc | 48 ++--- src/dev/x86/intdev.hh |7 +- diffs (214 lines): diff -r d3ee4e0d690c -r 6cbdd76b93db src/arch/x86/interrupts.cc --- a/src/arch/x86/interrupts.ccSun Apr 26 02:09:13 2009 -0700 +++ b/src/arch/x86/interrupts.ccSun Apr 26 02:09:27 2009 -0700 @@ -510,11 +510,41 @@ bool timing = sys-getMemoryMode() == Enums::timing; // Be careful no updates of the delivery status bit get lost. regs[APIC_INTERRUPT_COMMAND_LOW] = low; +ApicList apics; +int numContexts = sys-numContexts(); switch (low.destShorthand) { case 0: -pendingIPIs++; -intPort-sendMessage(message, timing); -newVal = regs[APIC_INTERRUPT_COMMAND_LOW]; +if (message.deliveryMode == DeliveryMode::LowestPriority) { +panic(Lowest priority delivery mode +IPIs aren't implemented.\n); +} +if (message.destMode == 1) { +int dest = message.destination; +hack_once(Assuming logical destinations are 1 id.\n); +for (int i = 0; i numContexts; i++) { +if (dest 0x1) +apics.push_back(i); +dest = dest 1; +} +} else { +if (message.destination == 0xFF) { +for (int i = 0; i numContexts; i++) { +if (i == initialApicId) { +requestInterrupt(message.vector, +message.deliveryMode, message.trigger); +} else { +apics.push_back(i); +} +} +} else { +if (message.destination == initialApicId) { +requestInterrupt(message.vector, +message.deliveryMode, message.trigger); +} else { +apics.push_back(message.destination); +} +} +} break; case 1: newVal = val; @@ -527,22 +557,17 @@ // Fall through case 3: { -int numContexts = sys-numContexts(); -pendingIPIs += (numContexts - 1); for (int i = 0; i numContexts; i++) { -int thisId = sys-getThreadContext(i)-contextId(); -if (thisId != initialApicId) { -PacketPtr pkt = buildIntRequest(thisId, message); -if (timing) -intPort-sendMessageTiming(pkt, latency); -else -intPort-sendMessageAtomic(pkt); +if (i != initialApicId) { +apics.push_back(i); } } } -newVal = regs[APIC_INTERRUPT_COMMAND_LOW]; break; } +pendingIPIs += apics.size(); +intPort-sendMessage(apics, message, timing); +newVal = regs[APIC_INTERRUPT_COMMAND_LOW]; } break; case APIC_LVT_TIMER: diff -r d3ee4e0d690c -r 6cbdd76b93db src/dev/x86/i82094aa.cc --- a/src/dev/x86/i82094aa.cc Sun Apr 26 02:09:13 2009 -0700 +++ b/src/dev/x86/i82094aa.cc Sun Apr 26 02:09:27 2009 -0700 @@ -28,6 +28,7 @@ * Authors: Gabe Black */ +#include arch/x86/interrupts.hh #include arch/x86/intmessage.hh #include dev/x86/i82094aa.hh #include dev/x86/i8259.hh @@ -162,7 +163,38 @@ message.destMode = entry.destMode; message.level = entry.polarity; message.trigger = entry.trigger; -intPort-sendMessage(message, sys-getMemoryMode() == Enums::timing); +ApicList apics; +int numContexts = sys-numContexts(); +if (message.destMode == 0) { +if (message.deliveryMode == DeliveryMode::LowestPriority) { +panic(Lowest priority delivery mode from the +IO APIC aren't supported in physical +destination mode.\n); +} +
[m5-dev] changeset in m5: X86: Implement lowest priority interrupts more ...
changeset 2bfd792b1cc0 in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=2bfd792b1cc0 description: X86: Implement lowest priority interrupts more correctly. Lowest priority interrupts are now delivered based on a rotating offset into the list of potential recipients. There could be parasitic cases were a processor gets picked on and ends up at that rotating offset all the time, but it's much more likely that the group will stay consistent and the pain will be distributed evenly. diffstat: 2 files changed, 19 insertions(+), 3 deletions(-) src/dev/x86/i82094aa.cc | 20 +--- src/dev/x86/i82094aa.hh |2 ++ diffs (49 lines): diff -r 6cbdd76b93db -r 2bfd792b1cc0 src/dev/x86/i82094aa.cc --- a/src/dev/x86/i82094aa.cc Sun Apr 26 02:09:27 2009 -0700 +++ b/src/dev/x86/i82094aa.cc Sun Apr 26 02:09:54 2009 -0700 @@ -38,7 +38,7 @@ X86ISA::I82094AA::I82094AA(Params *p) : PioDevice(p), IntDev(this), latency(p-pio_latency), pioAddr(p-pio_addr), -extIntPic(p-external_int_pic) +extIntPic(p-external_int_pic), lowestPriorityOffset(0) { // This assumes there's only one I/O APIC in the system initialApicId = id = p-apic_id; @@ -189,8 +189,22 @@ apics.push_back(localApicIt-first); } } -if (message.deliveryMode == DeliveryMode::LowestPriority) { -panic(Lowest priority delivery mode is not implemented.\n); +if (message.deliveryMode == DeliveryMode::LowestPriority +apics.size()) { +// The manual seems to suggest that the chipset just does +// something reasonable for these instead of actually using +// state from the local APIC. We'll just rotate an offset +// through the set of APICs selected above. +uint64_t modOffset = lowestPriorityOffset % apics.size(); +lowestPriorityOffset++; +ApicList::iterator apicIt = apics.begin(); +while (modOffset--) { +apicIt++; +assert(apicIt != apics.end()); +} +int selected = *apicIt; +apics.clear(); +apics.push_back(selected); } } intPort-sendMessage(apics, message, diff -r 6cbdd76b93db -r 2bfd792b1cc0 src/dev/x86/i82094aa.hh --- a/src/dev/x86/i82094aa.hh Sun Apr 26 02:09:27 2009 -0700 +++ b/src/dev/x86/i82094aa.hh Sun Apr 26 02:09:54 2009 -0700 @@ -77,6 +77,8 @@ uint8_t id; uint8_t arbId; +uint64_t lowestPriorityOffset; + static const uint8_t TableSize = 24; // This implementation is based on version 0x11, but 0x14 avoids having // to deal with the arbitration and APIC bus guck. ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] changeset in m5: X86: Implement lowest priority interrupts more ...
X86 can now be considered to have nominal SMP support. An SMP version of the 2.6.22.9 linux kernel boots and fails in init in the same way as the UP kernel. I've uploaded the kernel I used to zizzer if you want to give it a try yourself. Gabe Gabe Black wrote: changeset 2bfd792b1cc0 in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=2bfd792b1cc0 description: X86: Implement lowest priority interrupts more correctly. Lowest priority interrupts are now delivered based on a rotating offset into the list of potential recipients. There could be parasitic cases were a processor gets picked on and ends up at that rotating offset all the time, but it's much more likely that the group will stay consistent and the pain will be distributed evenly. diffstat: 2 files changed, 19 insertions(+), 3 deletions(-) src/dev/x86/i82094aa.cc | 20 +--- src/dev/x86/i82094aa.hh |2 ++ diffs (49 lines): diff -r 6cbdd76b93db -r 2bfd792b1cc0 src/dev/x86/i82094aa.cc --- a/src/dev/x86/i82094aa.cc Sun Apr 26 02:09:27 2009 -0700 +++ b/src/dev/x86/i82094aa.cc Sun Apr 26 02:09:54 2009 -0700 @@ -38,7 +38,7 @@ X86ISA::I82094AA::I82094AA(Params *p) : PioDevice(p), IntDev(this), latency(p-pio_latency), pioAddr(p-pio_addr), -extIntPic(p-external_int_pic) +extIntPic(p-external_int_pic), lowestPriorityOffset(0) { // This assumes there's only one I/O APIC in the system initialApicId = id = p-apic_id; @@ -189,8 +189,22 @@ apics.push_back(localApicIt-first); } } -if (message.deliveryMode == DeliveryMode::LowestPriority) { -panic(Lowest priority delivery mode is not implemented.\n); +if (message.deliveryMode == DeliveryMode::LowestPriority +apics.size()) { +// The manual seems to suggest that the chipset just does +// something reasonable for these instead of actually using +// state from the local APIC. We'll just rotate an offset +// through the set of APICs selected above. +uint64_t modOffset = lowestPriorityOffset % apics.size(); +lowestPriorityOffset++; +ApicList::iterator apicIt = apics.begin(); +while (modOffset--) { +apicIt++; +assert(apicIt != apics.end()); +} +int selected = *apicIt; +apics.clear(); +apics.push_back(selected); } } intPort-sendMessage(apics, message, diff -r 6cbdd76b93db -r 2bfd792b1cc0 src/dev/x86/i82094aa.hh --- a/src/dev/x86/i82094aa.hh Sun Apr 26 02:09:27 2009 -0700 +++ b/src/dev/x86/i82094aa.hh Sun Apr 26 02:09:54 2009 -0700 @@ -77,6 +77,8 @@ uint8_t id; uint8_t arbId; +uint64_t lowestPriorityOffset; + static const uint8_t TableSize = 24; // This implementation is based on version 0x11, but 0x14 avoids having // to deal with the arbitration and APIC bus guck. ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault = curStaticInst-initiateAcc(this, traceData); +inInitiateAcc = false; +if (completionPkt) { +completeDataAccess(completionPkt); +completionPkt = NULL; +} if (_status != Running) { // instruction will complete in dcache response callback assert(_status == DcacheWaitResponse || @@ -856,6 +864,10 @@ void TimingSimpleCPU::completeDataAccess(PacketPtr pkt) { +if (inInitiateAcc) { +completionPkt = pkt; +return; +} // received a response from the dcache: complete the load or store // instruction assert(!pkt-isError()); diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh --- a/src/cpu/simple/timing.hh +++ b/src/cpu/simple/timing.hh @@ -317,6 +317,9 @@ Tick previousTick; +bool inInitiateAcc; +PacketPtr completionPkt; + public: virtual Port *getPort(const std::string if_name, int idx = -1); ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- -- Korey L Sewell Graduate Student - PhD Candidate Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] Configuration files/tables for/in the simulated guest.
Before I get rolling on this too seriously, should the builder function go within the python compiled into M5, or should it go into the post compile config files? Do we have a rule of thumb for what goes where? This seems important enough to be built in and not repeated in every user config, but it's also not a SimObject or anything like that. It's hard to say. I guess I'd lean towards saying no. The downside of it being compiled in is that a change requires a recompile. The problem is that the stuff that's not compiled in is a total mess. Nate ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault = curStaticInst-initiateAcc(this, traceData); +inInitiateAcc = false; +if (completionPkt) { +completeDataAccess(completionPkt); +completionPkt = NULL; +} if (_status != Running) { // instruction will complete in dcache response callback assert(_status == DcacheWaitResponse || @@ -856,6 +864,10 @@ void TimingSimpleCPU::completeDataAccess(PacketPtr pkt) { +if (inInitiateAcc) { +completionPkt = pkt; +return; +} // received a response from the dcache: complete the load or store // instruction assert(!pkt-isError()); diff --git a/src/cpu/simple/timing.hh b/src/cpu/simple/timing.hh --- a/src/cpu/simple/timing.hh +++ b/src/cpu/simple/timing.hh @@ -317,6 +317,9 @@ Tick previousTick; +bool inInitiateAcc; +PacketPtr completionPkt; + public: virtual Port *getPort(const std::string if_name, int idx = -1); ___ m5-dev mailing list m5-dev@m5sim.org mailto:m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org mailto:m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev -- -- Korey L Sewell Graduate Student - PhD Candidate Computer Science Engineering University of Michigan ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev ___ m5-dev mailing list m5-dev@m5sim.org http://m5sim.org/mailman/listinfo/m5-dev
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
Yea, I was on that thread. Your explanation wasnt particularly in great detail there (files?, line #s?) so its hard to follow without getting your hand dirty on where this code is you are referring to. I'll comment and you can clarify my assumptions... Oooh. I see what's broken. This is a result of my changes to allow delaying translation. What happens is that Stl_c goes into initiateAcc. That function calls write on the CPU which calls into the TLB which calls the translation callback which recognizes a failed store conditional which completes the instruction execution with completeAcc and cleans up. The call stack then collapses back to the initiateAcc which is still waiting to finish and which then tries to call a member function on traceData which was deleted during the cleanup. The problem here is not fundamentally complicated, but the mechanisms involved are. One solution would be to record the fact that we're still in initiateAcc, and if we are wait for the call stack to collapse back down to initiateAcc's caller before calling into completeAcc. That matches the semantics an instruction would expect more, I think, where the initiateAcc/completeAcc pair are called sequentially. A couple of thoughts come to mind: 1. If the problem is with traceData and specifially just with store conditionals, what prevents you from a solution that handles that is a small overhead on that case instead of on every memory access? Particularly, why not add a StoreCondInitiateAcc template in mem.isa that handles this special case? In there after the write to the CPU, you can maybe check to see if you want to mess with traceData stuff or not depending on if the conditional is already failed or whatever. 2. If the traceData is deleted, then why is it available to be accessed in initiateAcc anyway? Isnt it enclosed by a if (traceData)? Shouldnt that return false if that is a NULL pointer? 3. Lastly, it sounds like this could be a slightly good candidate case for having the ability to functionally split TLB access from the timing memory access since in this case you want to do the access depending on something that happens with the TLB... On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu wrote: There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc b/src/cpu/simple/timing.cc --- a/src/cpu/simple/timing.cc +++ b/src/cpu/simple/timing.cc @@ -115,6 +115,8 @@ ifetch_pkt = dcache_pkt = NULL; drainEvent = NULL; previousTick = 0; +inInitiateAcc = false; +completionPkt = NULL; changeState(SimObject::Running); } @@ -758,7 +760,13 @@ if (curStaticInst curStaticInst-isMemRef() !curStaticInst-isDataPrefetch()) { // load or store: just send to dcache +inInitiateAcc = true; Fault fault =
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
You should read my description, read the code, and then once you understand the problem comment on my solution. Gabe Korey Sewell wrote: Yea, I was on that thread. Your explanation wasnt particularly in great detail there (files?, line #s?) so its hard to follow without getting your hand dirty on where this code is you are referring to. I'll comment and you can clarify my assumptions... Oooh. I see what's broken. This is a result of my changes to allow delaying translation. What happens is that Stl_c goes into initiateAcc. That function calls write on the CPU which calls into the TLB which calls the translation callback which recognizes a failed store conditional which completes the instruction execution with completeAcc and cleans up. The call stack then collapses back to the initiateAcc which is still waiting to finish and which then tries to call a member function on traceData which was deleted during the cleanup. The problem here is not fundamentally complicated, but the mechanisms involved are. One solution would be to record the fact that we're still in initiateAcc, and if we are wait for the call stack to collapse back down to initiateAcc's caller before calling into completeAcc. That matches the semantics an instruction would expect more, I think, where the initiateAcc/completeAcc pair are called sequentially. A couple of thoughts come to mind: 1. If the problem is with traceData and specifially just with store conditionals, what prevents you from a solution that handles that is a small overhead on that case instead of on every memory access? Particularly, why not add a StoreCondInitiateAcc template in mem.isa that handles this special case? In there after the write to the CPU, you can maybe check to see if you want to mess with traceData stuff or not depending on if the conditional is already failed or whatever. 2. If the traceData is deleted, then why is it available to be accessed in initiateAcc anyway? Isnt it enclosed by a if (traceData)? Shouldnt that return false if that is a NULL pointer? 3. Lastly, it sounds like this could be a slightly good candidate case for having the ability to functionally split TLB access from the timing memory access since in this case you want to do the access depending on something that happens with the TLB... On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch # User Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu # Date 1240729422 25200 # Node ID 284ff3c27bbddacf5763e7b5a54e99e4dcd1912a # Parent 8652636856b3d24fb0088fb1af5f5dca5008d9c8 CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc. diff --git a/src/cpu/simple/timing.cc
Re: [m5-dev] [PATCH] CPU: Make the CPU wait until initiateAcc finishes before calling completeAcc
I should add, I realize it's not very clear what's going on, but it would take a lot more effort to explain everything in enough detail to get the problem across than for you (or anyone else) to look at the code. If you do look into the code, it's not a functional problem as far as how the timing CPU works or organizes events per se. It's a C++ problem where things may or may not end up being called from the middle of each other. Gabe Gabe Black wrote: You should read my description, read the code, and then once you understand the problem comment on my solution. Gabe Korey Sewell wrote: Yea, I was on that thread. Your explanation wasnt particularly in great detail there (files?, line #s?) so its hard to follow without getting your hand dirty on where this code is you are referring to. I'll comment and you can clarify my assumptions... Oooh. I see what's broken. This is a result of my changes to allow delaying translation. What happens is that Stl_c goes into initiateAcc. That function calls write on the CPU which calls into the TLB which calls the translation callback which recognizes a failed store conditional which completes the instruction execution with completeAcc and cleans up. The call stack then collapses back to the initiateAcc which is still waiting to finish and which then tries to call a member function on traceData which was deleted during the cleanup. The problem here is not fundamentally complicated, but the mechanisms involved are. One solution would be to record the fact that we're still in initiateAcc, and if we are wait for the call stack to collapse back down to initiateAcc's caller before calling into completeAcc. That matches the semantics an instruction would expect more, I think, where the initiateAcc/completeAcc pair are called sequentially. A couple of thoughts come to mind: 1. If the problem is with traceData and specifially just with store conditionals, what prevents you from a solution that handles that is a small overhead on that case instead of on every memory access? Particularly, why not add a StoreCondInitiateAcc template in mem.isa that handles this special case? In there after the write to the CPU, you can maybe check to see if you want to mess with traceData stuff or not depending on if the conditional is already failed or whatever. 2. If the traceData is deleted, then why is it available to be accessed in initiateAcc anyway? Isnt it enclosed by a if (traceData)? Shouldnt that return false if that is a NULL pointer? 3. Lastly, it sounds like this could be a slightly good candidate case for having the ability to functionally split TLB access from the timing memory access since in this case you want to do the access depending on something that happens with the TLB... On Sun, Apr 26, 2009 at 2:43 PM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: There's an explanation already in the thread with the subject Memory corruption inm5devrepositorywhenusing --trace-flags=ExecEnable. Stalling or sleeping doesn't make sense in this situation since it's not that sort of problem. Gabe Korey Sewell wrote: Can you explain how the completeAcc ever gets called before the initiatieAcc finishes in a SimpleCPU? I'm thinking, why isnt the SimpleCPU just stalling if there is a delay waiting for a memory access.If that's the right behavior, then isnt the solution to just stall the CPU (or even sleep it) and wait for initiateAcc to finish rather than try to work around the initiateAcc/completeAcc simultaneous call? I guess I dont understand fully the problem but it seems workable without a delay in overhead for all other memory accesses. Also, would this be a issue for any other CPU models (O3 or InOrder)? On Sun, Apr 26, 2009 at 3:22 AM, Gabe Black gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: This patch should fix the issue Geoff was having where tracing would cause a segfault. If you could please give it a try, Geoff, let me know if it actually solves the problem. This will add a small overhead for every memory instruction even though the old behavior is wrong only in certain circumstances. I considered trying to avoid that overhead in cases where the problem can't happen, but then tracing would change the behavior of the CPU and that seems like a bad idea. Assuming this actually fixes the bug and nobody objects I'll commit this soon. Gabe gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu mailto:gbl...@eecs.umich.edu wrote: # HG changeset patch #
[m5-dev] changeset in m5: X86: Centralize updates to the handy M5 reg.
changeset 7a2dc7d41ee1 in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=7a2dc7d41ee1 description: X86: Centralize updates to the handy M5 reg. diffstat: 4 files changed, 65 insertions(+), 54 deletions(-) src/arch/x86/faults.cc | 10 +- src/arch/x86/miscregfile.cc | 71 +-- src/arch/x86/miscregfile.hh |2 + src/arch/x86/process.cc | 36 ++--- diffs (239 lines): diff -r 2bfd792b1cc0 -r 7a2dc7d41ee1 src/arch/x86/faults.cc --- a/src/arch/x86/faults.ccSun Apr 26 02:09:54 2009 -0700 +++ b/src/arch/x86/faults.ccSun Apr 26 16:47:48 2009 -0700 @@ -271,12 +271,8 @@ tc-setMiscReg(MISCREG_DR6, 0x0ff0ULL); tc-setMiscReg(MISCREG_DR7, 0x0400ULL); -// We're now in real mode, effectively at CPL 0 -HandyM5Reg m5Reg = 0; -m5Reg.mode = LegacyMode; -m5Reg.submode = RealMode; -m5Reg.cpl = 0; -tc-setMiscReg(MISCREG_M5_REG, m5Reg); +// Update the handy M5 Reg. +tc-setMiscReg(MISCREG_M5_REG, 0); MicroPC entry = X86ISAInst::RomLabels::extern_label_initIntHalt; tc-setMicroPC(romMicroPC(entry)); tc-setNextMicroPC(romMicroPC(entry) + 1); @@ -289,7 +285,7 @@ HandyM5Reg m5Reg = tc-readMiscReg(MISCREG_M5_REG); if (m5Reg.mode != LegacyMode || m5Reg.submode != RealMode) { panic(Startup IPI recived outside of real mode. -Don't know what to do.); +Don't know what to do. %d, %d, m5Reg.mode, m5Reg.submode); } tc-setMiscReg(MISCREG_CS, vector 8); diff -r 2bfd792b1cc0 -r 7a2dc7d41ee1 src/arch/x86/miscregfile.cc --- a/src/arch/x86/miscregfile.cc Sun Apr 26 02:09:54 2009 -0700 +++ b/src/arch/x86/miscregfile.cc Sun Apr 26 16:47:48 2009 -0700 @@ -96,6 +96,31 @@ class Checkpoint; +void MiscRegFile::updateHandyM5Reg(Efer efer, CR0 cr0, +SegAttr csAttr, RFLAGS rflags) +{ +HandyM5Reg m5reg; +if (efer.lma) { +m5reg.mode = LongMode; +if (csAttr.longMode) +m5reg.submode = SixtyFourBitMode; +else +m5reg.submode = CompatabilityMode; +} else { +m5reg.mode = LegacyMode; +if (cr0.pe) { +if (rflags.vm) +m5reg.submode = Virtual8086Mode; +else +m5reg.submode = ProtectedMode; +} else { +m5reg.submode = RealMode; +} +} +m5reg.cpl = csAttr.dpl; +regVal[MISCREG_M5_REG] = m5reg; +} + void MiscRegFile::clear() { // Blank everything. 0 might not be an appropriate value for some things, @@ -151,39 +176,17 @@ CR0 toggled = regVal[miscReg] ^ val; CR0 newCR0 = val; Efer efer = regVal[MISCREG_EFER]; -HandyM5Reg m5reg = regVal[MISCREG_M5_REG]; if (toggled.pg efer.lme) { if (newCR0.pg) { //Turning on long mode efer.lma = 1; -m5reg.mode = LongMode; regVal[MISCREG_EFER] = efer; } else { //Turning off long mode efer.lma = 0; -m5reg.mode = LegacyMode; regVal[MISCREG_EFER] = efer; } } -// Figure out what submode we're in. -if (m5reg.mode == LongMode) { -SegAttr csAttr = regVal[MISCREG_CS_ATTR]; -if (csAttr.longMode) -m5reg.submode = SixtyFourBitMode; -else -m5reg.submode = CompatabilityMode; -} else { -if (newCR0.pe) { -RFLAGS rflags = regVal[MISCREG_RFLAGS]; -if (rflags.vm) -m5reg.submode = Virtual8086Mode; -else -m5reg.submode = ProtectedMode; -} else { -m5reg.submode = RealMode; -} -} -regVal[MISCREG_M5_REG] = m5reg; if (toggled.pg) { tc-getITBPtr()-invalidateAll(); tc-getDTBPtr()-invalidateAll(); @@ -191,6 +194,10 @@ //This must always be 1. newCR0.et = 1; newVal = newCR0; +updateHandyM5Reg(regVal[MISCREG_EFER], + newCR0, + regVal[MISCREG_CS_ATTR], + regVal[MISCREG_RFLAGS]); } break; case MISCREG_CR2: @@ -214,26 +221,23 @@ { SegAttr toggled = regVal[miscReg] ^ val; SegAttr newCSAttr = val; -HandyM5Reg m5reg = regVal[MISCREG_M5_REG]; if (toggled.longMode) { if (newCSAttr.longMode) { -if (m5reg.mode == LongMode) -
[m5-dev] changeset in m5: X86: Split out the internal memory space from t...
changeset 5babc3f3d8c8 in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=5babc3f3d8c8 description: X86: Split out the internal memory space from the regular translate() and precompute mode. diffstat: 4 files changed, 382 insertions(+), 373 deletions(-) src/arch/x86/miscregfile.cc |2 src/arch/x86/miscregs.hh|2 src/arch/x86/tlb.cc | 749 +-- src/arch/x86/tlb.hh |2 diffs (truncated from 830 to 300 lines): diff -r 7a2dc7d41ee1 -r 5babc3f3d8c8 src/arch/x86/miscregfile.cc --- a/src/arch/x86/miscregfile.cc Sun Apr 26 16:47:48 2009 -0700 +++ b/src/arch/x86/miscregfile.cc Sun Apr 26 16:48:44 2009 -0700 @@ -118,6 +118,8 @@ } } m5reg.cpl = csAttr.dpl; +m5reg.paging = cr0.pg; +m5reg.prot = cr0.pe; regVal[MISCREG_M5_REG] = m5reg; } diff -r 7a2dc7d41ee1 -r 5babc3f3d8c8 src/arch/x86/miscregs.hh --- a/src/arch/x86/miscregs.hh Sun Apr 26 16:47:48 2009 -0700 +++ b/src/arch/x86/miscregs.hh Sun Apr 26 16:48:44 2009 -0700 @@ -518,6 +518,8 @@ Bitfield0 mode; Bitfield3, 1 submode; Bitfield5, 4 cpl; +Bitfield6 paging; +Bitfield7 prot; EndBitUnion(HandyM5Reg) /** diff -r 7a2dc7d41ee1 -r 5babc3f3d8c8 src/arch/x86/tlb.cc --- a/src/arch/x86/tlb.cc Sun Apr 26 16:47:48 2009 -0700 +++ b/src/arch/x86/tlb.cc Sun Apr 26 16:48:44 2009 -0700 @@ -186,391 +186,393 @@ } Fault +TLB::translateInt(RequestPtr req, ThreadContext *tc) +{ +DPRINTF(TLB, Addresses references internal memory.\n); +Addr vaddr = req-getVaddr(); +Addr prefix = (vaddr 3) IntAddrPrefixMask; +if (prefix == IntAddrPrefixCPUID) { +panic(CPUID memory space not yet implemented!\n); +} else if (prefix == IntAddrPrefixMSR) { +vaddr = vaddr 3; +req-setMmapedIpr(true); +Addr regNum = 0; +switch (vaddr ~IntAddrPrefixMask) { + case 0x10: +regNum = MISCREG_TSC; +break; + case 0x1B: +regNum = MISCREG_APIC_BASE; +break; + case 0xFE: +regNum = MISCREG_MTRRCAP; +break; + case 0x174: +regNum = MISCREG_SYSENTER_CS; +break; + case 0x175: +regNum = MISCREG_SYSENTER_ESP; +break; + case 0x176: +regNum = MISCREG_SYSENTER_EIP; +break; + case 0x179: +regNum = MISCREG_MCG_CAP; +break; + case 0x17A: +regNum = MISCREG_MCG_STATUS; +break; + case 0x17B: +regNum = MISCREG_MCG_CTL; +break; + case 0x1D9: +regNum = MISCREG_DEBUG_CTL_MSR; +break; + case 0x1DB: +regNum = MISCREG_LAST_BRANCH_FROM_IP; +break; + case 0x1DC: +regNum = MISCREG_LAST_BRANCH_TO_IP; +break; + case 0x1DD: +regNum = MISCREG_LAST_EXCEPTION_FROM_IP; +break; + case 0x1DE: +regNum = MISCREG_LAST_EXCEPTION_TO_IP; +break; + case 0x200: +regNum = MISCREG_MTRR_PHYS_BASE_0; +break; + case 0x201: +regNum = MISCREG_MTRR_PHYS_MASK_0; +break; + case 0x202: +regNum = MISCREG_MTRR_PHYS_BASE_1; +break; + case 0x203: +regNum = MISCREG_MTRR_PHYS_MASK_1; +break; + case 0x204: +regNum = MISCREG_MTRR_PHYS_BASE_2; +break; + case 0x205: +regNum = MISCREG_MTRR_PHYS_MASK_2; +break; + case 0x206: +regNum = MISCREG_MTRR_PHYS_BASE_3; +break; + case 0x207: +regNum = MISCREG_MTRR_PHYS_MASK_3; +break; + case 0x208: +regNum = MISCREG_MTRR_PHYS_BASE_4; +break; + case 0x209: +regNum = MISCREG_MTRR_PHYS_MASK_4; +break; + case 0x20A: +regNum = MISCREG_MTRR_PHYS_BASE_5; +break; + case 0x20B: +regNum = MISCREG_MTRR_PHYS_MASK_5; +break; + case 0x20C: +regNum = MISCREG_MTRR_PHYS_BASE_6; +break; + case 0x20D: +regNum = MISCREG_MTRR_PHYS_MASK_6; +break; + case 0x20E: +regNum = MISCREG_MTRR_PHYS_BASE_7; +break; + case 0x20F: +regNum = MISCREG_MTRR_PHYS_MASK_7; +break; + case 0x250: +regNum = MISCREG_MTRR_FIX_64K_0; +break; + case 0x258: +regNum = MISCREG_MTRR_FIX_16K_8; +break; + case 0x259: +regNum = MISCREG_MTRR_FIX_16K_A; +break; + case 0x268: +regNum = MISCREG_MTRR_FIX_4K_C; +
[m5-dev] changeset in m5: X86: Precompute the default and alternate addre...
changeset af13ed3bea48 in /z/repo/m5 details: http://repo.m5sim.org/m5?cmd=changeset;node=af13ed3bea48 description: X86: Precompute the default and alternate address and operand size and the stack size. diffstat: 5 files changed, 69 insertions(+), 64 deletions(-) src/arch/x86/miscregfile.cc | 43 +++ src/arch/x86/miscregfile.hh |2 - src/arch/x86/miscregs.hh|5 ++ src/arch/x86/predecoder.cc | 78 +-- src/arch/x86/predecoder.hh |5 ++ diffs (260 lines): diff -r 5babc3f3d8c8 -r af13ed3bea48 src/arch/x86/miscregfile.cc --- a/src/arch/x86/miscregfile.cc Sun Apr 26 16:48:44 2009 -0700 +++ b/src/arch/x86/miscregfile.cc Sun Apr 26 16:49:24 2009 -0700 @@ -97,7 +97,7 @@ class Checkpoint; void MiscRegFile::updateHandyM5Reg(Efer efer, CR0 cr0, -SegAttr csAttr, RFLAGS rflags) +SegAttr csAttr, SegAttr ssAttr, RFLAGS rflags) { HandyM5Reg m5reg; if (efer.lma) { @@ -120,6 +120,37 @@ m5reg.cpl = csAttr.dpl; m5reg.paging = cr0.pg; m5reg.prot = cr0.pe; + +// Compute the default and alternate operand size. +if (m5reg.submode == SixtyFourBitMode || csAttr.defaultSize) { +m5reg.defOp = 2; +m5reg.altOp = 1; +} else { +m5reg.defOp = 1; +m5reg.altOp = 2; +} + +// Compute the default and alternate address size. +if (m5reg.submode == SixtyFourBitMode) { +m5reg.defAddr = 3; +m5reg.altAddr = 2; +} else if (csAttr.defaultSize) { +m5reg.defAddr = 2; +m5reg.altAddr = 1; +} else { +m5reg.defAddr = 1; +m5reg.altAddr = 2; +} + +// Compute the stack size +if (m5reg.submode == SixtyFourBitMode) { +m5reg.stack = 3; +} else if (ssAttr.defaultSize) { +m5reg.stack = 2; +} else { +m5reg.stack = 1; +} + regVal[MISCREG_M5_REG] = m5reg; } @@ -199,6 +230,7 @@ updateHandyM5Reg(regVal[MISCREG_EFER], newCR0, regVal[MISCREG_CS_ATTR], + regVal[MISCREG_SS_ATTR], regVal[MISCREG_RFLAGS]); } break; @@ -239,9 +271,17 @@ updateHandyM5Reg(regVal[MISCREG_EFER], regVal[MISCREG_CR0], newCSAttr, + regVal[MISCREG_SS_ATTR], regVal[MISCREG_RFLAGS]); } break; + case MISCREG_SS_ATTR: +updateHandyM5Reg(regVal[MISCREG_EFER], + regVal[MISCREG_CR0], + regVal[MISCREG_CS_ATTR], + val, + regVal[MISCREG_RFLAGS]); +break; // These segments always actually use their bases, or in other words // their effective bases must stay equal to their actual bases. case MISCREG_FS_BASE: @@ -346,6 +386,7 @@ updateHandyM5Reg(regVal[MISCREG_EFER], regVal[MISCREG_CR0], regVal[MISCREG_CS_ATTR], + regVal[MISCREG_SS_ATTR], regVal[MISCREG_RFLAGS]); return; default: diff -r 5babc3f3d8c8 -r af13ed3bea48 src/arch/x86/miscregfile.hh --- a/src/arch/x86/miscregfile.hh Sun Apr 26 16:48:44 2009 -0700 +++ b/src/arch/x86/miscregfile.hh Sun Apr 26 16:49:24 2009 -0700 @@ -108,7 +108,7 @@ protected: MiscReg regVal[NumMiscRegs]; void updateHandyM5Reg(Efer efer, CR0 cr0, -SegAttr csAttr, RFLAGS rflags); +SegAttr csAttr, SegAttr ssAttr, RFLAGS rflags); public: void clear(); diff -r 5babc3f3d8c8 -r af13ed3bea48 src/arch/x86/miscregs.hh --- a/src/arch/x86/miscregs.hh Sun Apr 26 16:48:44 2009 -0700 +++ b/src/arch/x86/miscregs.hh Sun Apr 26 16:49:24 2009 -0700 @@ -520,6 +520,11 @@ Bitfield5, 4 cpl; Bitfield6 paging; Bitfield7 prot; +Bitfield9, 8 defOp; +Bitfield11, 10 altOp; +Bitfield13, 12 defAddr; +Bitfield15, 14 altAddr; +Bitfield17, 16 stack; EndBitUnion(HandyM5Reg) /** diff -r 5babc3f3d8c8 -r af13ed3bea48 src/arch/x86/predecoder.cc --- a/src/arch/x86/predecoder.ccSun Apr 26 16:48:44 2009 -0700 +++ b/src/arch/x86/predecoder.ccSun Apr 26 16:49:24 2009 -0700 @@ -80,9 +80,9 @@ emi.modRM = 0; emi.sib = 0; -HandyM5Reg m5reg = tc-readMiscRegNoEffect(MISCREG_M5_REG); -emi.mode.mode = m5reg.mode; -emi.mode.submode = m5reg.submode; +m5Reg = tc-readMiscRegNoEffect(MISCREG_M5_REG); +emi.mode.mode = m5Reg.mode; +emi.mode.submode = m5Reg.submode; } void Predecoder::process() @@ -216,34 +216,15 @@ DPRINTF(Predecoder, Found opcode %#x.\n, nextByte); emi.opcode.op = nextByte; -