Re: [m5-users] running multi core- instruction count in all cores is zero except 1... why?

Korey Sewell Mon, 23 Jun 2008 13:24:33 -0700

If starting workloads at PC 0 is what we intend to do  (when did that
happen?), that would mean that the real starting PC would need to be
loaded in the correct system register.


For SE mode, are we making sure we load the system registers correctly
so that the fault handler can pick it up and start at the right
address?

Last time I looked at the code (admittedly a while back), the AlphaTLB
the fault handling code for ITLB miss loads from a system register
right but then the problem I encountered was that in SE mode all the
system stuff isnt loaded correctly so even though the fault is being
taken correctly the registers to handle it at init. time werent set up
correctly (Probably in the src/arch/alpha/process.cc) for SE and thus
the trap handler didnt go through right.

Again, if the code is updated to a diff. point I could be wrong. I
dont even remember us intentionally starting workloads at PC 0 to get
the ITLB Miss Fault so that's why I think that's a problem.

prannav,
is the CPU0 getting the same PC 0 fault to start off with or is it
just CPU1? The answer to that question probably tells the story.

On 6/23/08, Gabe Black <[EMAIL PROTECTED]> wrote:
> Yes, it starts with PC 0, and yest there's an I TLB miss, but latter
> there are actual instructions in the rest of the pipeline which wouldn't
> happen if the process was bogus. There would also not be any
> instructions to squash. Every time o3 starts up, it has to get itself
> initialized and causes some bogus faults and such for the first few
> ticks. After that, it gets it's act straightened out and goes to the
> right place. I think he mentioned trying initializing the workload
> outside of the loop the, so I'm pretty sure a bad process object isn't
> it the culprit.
>
> To answer your question, yes there is a TLB object in SE mode. The TLB
> itself doesn't handle the faults and is instead loaded by the faults, at
> which point execution continues like it normally would. There is still
> an abstracted page table structure the fault uses to look up the
> official mapping of a virtual address to load the TLB, so if the address
> is completely invalid, ie. not mapped in any way by the process, there
> would still be a fail or panic just like before. I do think the TLB
> could be to blame, though, since it's possible the fault is loading the
> TLB in such a way as the TLB never matches with the entry.
>
> Prannav, if you could find the answers to those questions I mentioned,
> it would really help clarify what's actually going on. If you run a gdb
> targeted at Alpha on the benchmark binary and disassemble the
> instruction at 0x12000dde8 that's what the instruction should be. In the
> O3 output, I think it prints out what the instruction disassembles to
> when it's fetched, so that will tell you what M5 is trying to execute.
>
> Gabe
>
> Korey Sewell wrote:
> > but why would there be a TLB miss?
> >
> > It's because you are trying to execute PC "0x0" which is obviously not 
> > valid.
> >
> > I'm pretty sure that's the culprit. That's happened to me a bunch of
> > times in the past and it's always initializing the process wrong in
> > some way.
> >
> > But Did we not add TLB code for SE mode and the like? So now, instead
> > of a "unmapped" failure and die (like old M5) we are probably just
> > repeatedly trying to handle a trap that we shouldnt be handling.
> >
> >
> > On 6/23/08, Gabe Black <[EMAIL PROTECTED]> wrote:
> >
> >> I don't think that's actually the problem since later the cpu goes
> >> through a lot of the motions of executing instructions. The line
> >>
> >>  13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault
> >>
> >>
> >> suggests that the instructions are executing, they're just faulting over
> >> and over. What would be helpful is if you can figure out:
> >>
> >> 1. What the instruction actually is.
> >> 2. What the fault it's throwing is.
> >> 3. Why it's throwing that fault.
> >> 4. Why it never successfully fixes that condition.
> >>
> >> What I'd guess is that there's some sort of data TLB miss that's
> >> happening which is never successfully being fixed. Usually in glibc, one
> >> of if not the first instruction a process executes sets the frame
> >> pointer to 0, so I'm not sure what fault this could be throwing. It's
> >> also possible the instruction address is being mistranslated and you're
> >> executing the wrong memory.
> >>
> >> Gabe
> >>
> >> Korey Sewell wrote:
> >>
> >>> You need to look just a bit closer at this... The line(s) of interest are:
> >>> " 0: system.cpu1.fetch: [tid:0]: Attempting to translate and read
> >>> instruction, starting at PC 0x000000."
> >>>
> >>> Thus, if CPU1 is starting at address 0x0, that probably means it is
> >>> starting with no workload, and eventually experienced a trap because
> >>> there is no code at that address.
> >>>
> >>> It probably would be the best thing to have some kind of check
> >>> somewhere to WARN a user that a CPU has no valid process to start from
> >>> and then sleep the CPU rather than waste sim. cycles on that.
> >>>
> >>> Anyway,
> >>> you have to figure out how to get the 2nd CPU to get a valid process.
> >>> On a first cut, I would just hardcode the CPU processes bindings
> >>> instead of using the loop like in your example. If you can get that to
> >>> work, then you know something  is going on with how the loop is
> >>> setting up your system.
> >>>
> >>> I'm guessing that something like this would work (check syntax though):
> >>> "
> >>> Process process1 = Benchmarks.SPECGCCEIO()
> >>> Process process2 = Benchmarks.SPECGCCEIO()
> >>> system.cpu[0].workload = process1
> >>> system.cpu[1].workload = process2
> >>> "
> >>>
> >>> I've done similar when i've needed something quick to work. I've
> >>> noticed that you're using EIO so if you aren't able to hardcode it,
> >>> then the EIO functionality could be the culprit as well.
> >>>
> >>> On Mon, Jun 23, 2008 at 12:15 AM, prannav shrestha <[EMAIL PROTECTED]> 
> >>> wrote:
> >>>
> >>>
> >>>> HI Sewell!!
> >>>> I run the simulation with O3CPU flags, and looking at the information
> >>>> provided, i found that in my case, PC value for CPU remains same forever,
> >>>> whereas CPU0 is executing the workload.  Also, the insructions fetched by
> >>>> CPU1, which is always the same, is squashed everytime. Some part of the
> >>>> debug output is as below:
> >>>>
> >>>> Global frequency set at 1000000000000 ticks per second
> >>>>       0: global: BTB: Creating BTB object.
> >>>>       0: system.cpu0.iew.lsq: LSQ sharing policy set to Partitioned: 32
> >>>> entries per LQ | 32 entries per SQ      0: system.cpu0.commit: Commit 
> >>>> Policy
> >>>> set to Round Robin.      0: system.cpu0.rob: ROB sharing policy set to
> >>>> Partitioned
> >>>>       0: global: Creating AlphaO3CPU object.
> >>>>
> >>>> number of threads: 1
> >>>>       0: global: Workload[0] process is 0      0: global: BTB: Creating 
> >>>> BTB
> >>>> object.
> >>>>       0: system.cpu1.iew.lsq: LSQ sharing policy set to Partitioned: 32
> >>>> entries per LQ | 32 entries per SQ      0: system.cpu1.commit: Commit 
> >>>> Policy
> >>>> set to Round Robin.      0: system.cpu1.rob: ROB sharing policy set to
> >>>> Partitioned
> >>>>       0: global: Creating AlphaO3CPU object.
> >>>>
> >>>> number of threads: 1
> >>>>       0: global: Workload[0] process is 0      0: global: Calling 
> >>>> activate
> >>>> on Thread Context 0
> >>>>       0: system.cpu0: [tid:0]: Calling activate thread.
> >>>>       0: system.cpu0: [tid:0]: Adding to active threads list
> >>>>       0: system.cpu0.fetch: Waking up from quiesce
> >>>>       0: system.cpu0.commit: Generating TC squash event for [tid:0]
> >>>>       0: global: Calling activate on Thread Context 0
> >>>>       0: system.cpu1: [tid:0]: Calling activate thread.
> >>>>       0: system.cpu1: [tid:0]: Adding to active threads list
> >>>>       0: system.cpu1.fetch: Waking up from quiesce
> >>>>       0: system.cpu1.commit: Generating TC squash event for [tid:0]
> >>>>       0: system.cpu1:
> >>>>
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>       0: system.cpu1.fetch: Running stage.
> >>>>       0: system.cpu1.fetch: Attempting to fetch from [tid:0]
> >>>>       0: system.cpu1.fetch: [tid:0]: Attempting to translate and read
> >>>> instruction, starting at PC 0x000000.
> >>>>       0: system.cpu1.fetch: [tid:0]: Blocked, need to handle the trap.
> >>>>       0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC
> >>>> 0x000000      0: system.cpu1.decode: Processing [tid:0]
> >>>>       0: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>>       0: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>> ......................
> >>>> ........
> >>>> ....
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>    6500: system.cpu0.fetch: Running stage.
> >>>>    6500: system.cpu0.fetch: There are no more threads available to fetch
> >>>> from.
> >>>>    6500: system.cpu0.decode: Processing [tid:0]
> >>>>    6500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>>    6500: system.cpu0.decode: [tid:0] Nothing to do, breaking out early.
> >>>>    6500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>>    6500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>>    6500: system.cpu0.commit: [tid:0]: Instruction [sn:2] PC 0x120140ce8 
> >>>> is
> >>>> head of ROB and ready to commit
> >>>>    6500: system.cpu0.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>    6500: system.cpu0: Scheduling next tick!
> >>>>    6500: system.cpu1:
> >>>>
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>    6500: system.cpu1.fetch: Running stage.
> >>>>    6500: system.cpu1.fetch: There are no more threads available to fetch
> >>>> from.
> >>>>    6500: system.cpu1.decode: Processing [tid:0]
> >>>>    6500: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>>    6500: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>>    6500: system.cpu1.commit: Getting instructions from Rename stage.
> >>>>    6500: system.cpu1.commit: Trying to commit instructions in the ROB.
> >>>>    6500: system.cpu1.commit: [tid:0]: Instruction [sn:2] PC 0x12000dde8 
> >>>> is
> >>>> head of ROB and ready to commit
> >>>>    6500: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>    6500: system.cpu1: Scheduling next tick!
> >>>>    7000: system.cpu1:
> >>>>
> >>>> .............
> >>>> ..........
> >>>> ..........
> >>>>
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   13000: system.cpu1.fetch: Running stage.
> >>>>   13000: system.cpu1.fetch: There are no more threads available to fetch
> >>>> from.
> >>>>   13000: system.cpu1.decode: Processing [tid:0]
> >>>>   13000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>>   13000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>>   13000: system.cpu1.commit: Getting instructions from Rename stage.
> >>>>   13000: system.cpu1.commit: Trying to commit instructions in the ROB.
> >>>>   13000: system.cpu1.commit: Trying to commit head instruction, [sn:3]
> >>>> [tid:0]
> >>>>   13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault
> >>>>   13000: system.cpu1.commit: Generating trap event for [tid:0]
> >>>>   13000: system.cpu1.commit: Unable to commit head instruction
> >>>> PC:0x12000dde8 [tid:0] [sn:3].
> >>>>   13000: system.cpu1.commit: [tid:0]: Instruction [sn:3] PC 0x12000dde8 
> >>>> is
> >>>> head of ROB and ready to commit
> >>>>   13000: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>   13000: system.cpu1: Scheduling next tick!
> >>>>   13000: system.cpu0:
> >>>>
> >>>> .....
> >>>> ....
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   32500: system.cpu0.fetch: Running stage.
> >>>>   32500: system.cpu0.fetch: Attempting to fetch from [tid:0]
> >>>>   32500: system.cpu0.fetch: [tid:0]: Attempting to translate and read
> >>>> instruction, starting at PC 0x120140cf0.
> >>>>   32500: system.cpu0.fetch: Fetch: Doing instruction read.
> >>>>   32500: system.cpu0.fetch: [tid:0]: Doing cache access.
> >>>>   32500: system.cpu0.decode: Processing [tid:0]
> >>>>   32500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>>   32500: system.cpu0.decode: [tid:0]: Sending instruction to rename.
> >>>>   32500: system.cpu0.decode: [tid:0]: Processing instruction [sn:3] with 
> >>>> PC
> >>>> 0x120140ce8
> >>>>   32500: system.cpu0.decode: [tid:0]: Processing instruction [sn:4] with 
> >>>> PC
> >>>> 0x120140cec
> >>>>   32500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>>   32500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>>   32500: system.cpu0.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> >>>>   32500: system.cpu0: Scheduling next tick!
> >>>>   33000: system.cpu0:
> >>>>
> >>>> ..............
> >>>> .........
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   39500: system.cpu0.fetch: Running stage.
> >>>>   39500: system.cpu0.fetch: Attempting to fetch from [tid:0]
> >>>>   39500: system.cpu0.fetch: [tid:0]: Attempting to translate and read
> >>>> instruction, starting at PC 0x120140d00.
> >>>>   39500: system.cpu0.fetch: Fetch: Doing instruction read.
> >>>>   39500: system.cpu0.fetch: [tid:0]: Doing cache access.
> >>>>   39500: system.cpu0.decode: Processing [tid:0]
> >>>>   39500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>>   39500: system.cpu0.decode: [tid:0]: Sending instruction to rename.
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:5] with 
> >>>> PC
> >>>> 0x120140cf0
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:6] with 
> >>>> PC
> >>>> 0x120140cf4
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:7] with 
> >>>> PC
> >>>> 0x120140cf8
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:8] with 
> >>>> PC
> >>>> 0x120140cfc
> >>>>   39500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>>   39500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>>   39500: system.cpu0.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> >>>>   39500: system.cpu0: Scheduling next tick!
> >>>>   40000: system.cpu0:
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   40000: system.cpu1.fetch: Running stage.
> >>>>   40000: system.cpu1.fetch: There are no more threads available to fetch
> >>>> from.
> >>>>   40000: system.cpu1.decode: Processing [tid:0]
> >>>>   40000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>>   40000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>>   40000: system.cpu1.commit: Squashing from trap, restarting at PC
> >>>> 0x12000dde8
> >>>>   40000: system.cpu1.commit: [tid:0]: Instruction [sn:5] PC 0x12000dde8 
> >>>> is
> >>>> head of ROB and ready to commit
> >>>>   40000: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>   40000: system.cpu1: Scheduling next tick!
> >>>>   40500: system.cpu1.fetch-iport: Received timing
> >>>>   40500: system.cpu1:
> >>>> ....................
> >>>> .................
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>> 1243500: system.cpu0.fetch: Running stage.
> >>>> 1243500: system.cpu0.fetch: There are no more threads available to fetch
> >>>> from.
> >>>> 1243500: system.cpu0.decode: Processing [tid:0]
> >>>> 1243500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>> 1243500: system.cpu0.decode: [tid:0] Nothing to do, breaking out early.
> >>>> 1243500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>> 1243500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>> 1243500: system.cpu0.commit: [tid:0]: Instruction [sn:369] PC 
> >>>> 0x120106160 is
> >>>> head of ROB and ready to commit
> >>>> 1243500: system.cpu0.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>> 1243500: system.cpu0: Scheduling next tick!
> >>>> 1244000: system.cpu0:
> >>>>
> >>>> .........
> >>>> ........
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>> 4001000: system.cpu1.fetch: [tid:0]: Done squashing, switching to 
> >>>> running.
> >>>> 4001000: system.cpu1.fetch: Running stage.
> >>>> 4001000: system.cpu1.fetch: Attempting to fetch from [tid:0]
> >>>> 4001000: system.cpu1.fetch: [tid:0]: Attempting to translate and read
> >>>> instruction, starting at PC 0x12000dde8.
> >>>> 4001000: system.cpu1.fetch: [tid:0]: Blocked, need to handle the trap.
> >>>> 4001000: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC
> >>>> 0x12000dde84001000: system.cpu1.decode: Processing [tid:0]
> >>>> 4001000: system.cpu1.decode: [tid:0]: Done squashing, switching to 
> >>>> running.
> >>>> 4001000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run
> >>>> stage.
> >>>> 4001000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>> 4001000: system.cpu1.commit: Getting instructions from Rename stage.
> >>>> 4001000: system.cpu1.commit: Trying to commit instructions in the ROB.
> >>>> 4001000: system.cpu1.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> >>>> 4001000: system.cpu1: Scheduling next tick!
> >>>> 4001500: system.cpu1:
> >>>> ......
> >>>>
> >>>> Exiting @ cycle 4002500 because all threads reached the max instruction
> >>>> count
> >>>>
> >>>> I am running two different benchmarks in those two cores. I have included
> >>>> the long list of the debug. The PC of CPU1 is always the same. Why the
> >>>> instruction at CPU1 is always being squashed?
> >>>>
> >>>> regards,
> >>>> prannav
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Jun 22, 2008 at 10:49 PM, Korey Sewell <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>
> >>>>> Turn on the trace flags for O3CPU and see what it says.... Inside the
> >>>>> "o3" folder there should be a *.py file that list the trace-flags.
> >>>>>
> >>>>> If you run the simulation with all the O3 trace-flags on you should be
> >>>>> able to see what's happening. I suggest only running for maybe 100
> >>>>> ticks so that you dont get overloaded with text.
> >>>>>
> >>>>> From the debug output, you should see each CPU get initialized, fetch,
> >>>>> and all that. I'm guessing what happens is that CPU1 starts up, sees
> >>>>> no process to latch onto, and then sleeps, but the debug output should
> >>>>> verify that for you.
> >>>>>
> >>>>> Simply looking at the instruction commits (Exec) wont help you in this
> >>>>> case since you're saying that there is no instructions commit. You'll
> >>>>> at least need to turn on "Fetch Decode Exec" and all that so you can
> >>>>> get a detailed view.
> >>>>>
> >>>>>
> >>>>> --
> >>>>> ----------
> >>>>> Korey L Sewell
> >>>>> Graduate Student - PhD Candidate
> >>>>> Computer Science & Engineering
> >>>>> University of Michigan
> >>>>> _______________________________________________
> >>>>> m5-users mailing list
> >>>>> [email protected]
> >>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> >>>>>
> >>>>>
> >>>> _______________________________________________
> >>>> m5-users mailing list
> >>>> [email protected]
> >>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >> _______________________________________________
> >> m5-users mailing list
> >> [email protected]
> >> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> >>
> >>
> >
> >
> >
>
> _______________________________________________
> m5-users mailing list
> [email protected]
> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
>


-- 
----------
Korey L Sewell
Graduate Student - PhD Candidate
Computer Science & Engineering
University of Michigan
_______________________________________________
m5-users mailing list
[email protected]
http://m5sim.org/cgi-bin/mailman/listinfo/m5-users

Re: [m5-users] running multi core- instruction count in all cores is zero except 1... why?

Reply via email to