Re: [m5-users] running multi core- instruction count in all cores is zero except 1... why?

prannav shrestha Mon, 23 Jun 2008 13:56:32 -0700

Gabe,
I am new to M5, so could you please guide me on how to get the information you
need? I am running my configuration file in debug mode, using the command
              gdb ../../build/ALPHA_SE/m5.debug


There are a lot of commands. If you could tell me which ones to run in debug
mode, it would be much faster...

Thanks, guys

Prannav



On Mon, Jun 23, 2008 at 10:22 AM, Gabe Black <[EMAIL PROTECTED]> wrote:

> Yes, it starts with PC 0, and yes, there's an I TLB miss, but later
> there are actual instructions in the rest of the pipeline, which wouldn't
> happen if the process were bogus. There would also not be any
> instructions to squash. Every time O3 starts up, it has to get itself
> initialized and causes some bogus faults and such for the first few
> ticks. After that, it gets its act straightened out and goes to the
> right place. I think he mentioned trying to initialize the workload
> outside of the loop, so I'm pretty sure a bad process object isn't
> the culprit.
>
> To answer your question, yes there is a TLB object in SE mode. The TLB
> itself doesn't handle the faults and is instead loaded by the faults, at
> which point execution continues like it normally would. There is still
> an abstracted page table structure the fault uses to look up the
> official mapping of a virtual address to load the TLB, so if the address
> is completely invalid, i.e. not mapped in any way by the process, there
> would still be a fail or panic just like before. I do think the TLB
> could be to blame, though, since it's possible the fault is loading the
> TLB in such a way that the TLB never matches the entry.
>
> Prannav, if you could find the answers to those questions I mentioned,
> it would really help clarify what's actually going on. If you run a gdb
> targeted at Alpha on the benchmark binary and disassemble the
> instruction at 0x12000dde8, that's what the instruction should be. In the
> O3 output, I think it prints out what the instruction disassembles to
> when it's fetched, so that will tell you what M5 is trying to execute.
>
> Gabe
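
One way to start on the questions about which fault is thrown and where is to tally the fault reports in a saved O3 trace. The following is a minimal sketch, assuming the trace lines look like the excerpt quoted further down in this thread; the function name `tally_faults` and both regular expressions are illustrative and are not part of M5.

```python
import re
from collections import Counter

# Fetch-stage lines name the fault, e.g.
#   0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000
FETCH_FAULT = re.compile(
    r"system\.(cpu\d+)\.fetch: \[tid:\d+\]: fault \((\w+)\) detected @ PC\s+(0x[0-9a-fA-F]+)"
)
# Commit-stage lines only say that a fault exists, e.g.
#   13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault
COMMIT_FAULT = re.compile(
    r"system\.(cpu\d+)\.commit: Inst \[sn:\d+\] PC (0x[0-9a-fA-F]+) has a fault"
)


def tally_faults(lines):
    """Count fault reports per (cpu, fault label, pc) in an iterable of lines."""
    tally = Counter()
    for line in lines:
        # search() rather than match(), so mail quote prefixes do not matter.
        m = FETCH_FAULT.search(line)
        if m:
            cpu, name, pc = m.groups()
            tally[(cpu, name, pc)] += 1
            continue
        m = COMMIT_FAULT.search(line)
        if m:
            cpu, pc = m.groups()
            tally[(cpu, "commit-stage fault", pc)] += 1
    return tally


# Sample lines copied from the trace quoted in this thread.
sample = [
    "      0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000",
    "  13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault",
    "4001000: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x12000dde8",
]
for (cpu, label, pc), count in sorted(tally_faults(sample).items()):
    print(cpu, label, pc, count)
```

If the same (cpu, fault, pc) triple keeps growing while the other CPU shows none, that would be consistent with the "faulting over and over" reading; it does not by itself explain why the fault is never resolved.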
>
> Korey Sewell wrote:
> > but why would there be a TLB miss?
> >
> > It's because you are trying to execute PC "0x0", which is obviously not
> > valid.
> >
> > I'm pretty sure that's the culprit. That's happened to me a bunch of
> > times in the past and it's always initializing the process wrong in
> > some way.
> >
> > But did we not add TLB code for SE mode and the like? So now, instead
> > of failing with an "unmapped" error and dying (like old M5), we are
> > probably just repeatedly trying to handle a trap that we shouldn't be
> > handling.
> >
> >
> > On 6/23/08, Gabe Black <[EMAIL PROTECTED]> wrote:
> >
> >> I don't think that's actually the problem since later the cpu goes
> >> through a lot of the motions of executing instructions. The line
> >>
> >>  13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault
> >>
> >>
> >> suggests that the instructions are executing, they're just faulting over
> >> and over. What would be helpful is if you can figure out:
> >>
> >> 1. What the instruction actually is.
> >> 2. What the fault it's throwing is.
> >> 3. Why it's throwing that fault.
> >> 4. Why it never successfully fixes that condition.
> >>
> >> What I'd guess is that there's some sort of data TLB miss that's
> >> happening which is never successfully being fixed. Usually in glibc, one
> >> of the first instructions a process executes, if not the very first, sets
> >> the frame pointer to 0, so I'm not sure what fault this could be throwing. It's
> >> also possible the instruction address is being mistranslated and you're
> >> executing the wrong memory.
> >>
> >> Gabe
> >>
> >> Korey Sewell wrote:
> >>
> >>> You need to look just a bit closer at this... The line(s) of interest are:
> >>> " 0: system.cpu1.fetch: [tid:0]: Attempting to translate and read
> >>> instruction, starting at PC 0x000000."
> >>>
> >>> Thus, if CPU1 is starting at address 0x0, that probably means it is
> >>> starting with no workload, and eventually experienced a trap because
> >>> there is no code at that address.
> >>>
> >>> It probably would be the best thing to have some kind of check
> >>> somewhere to WARN a user that a CPU has no valid process to start from
> >>> and then sleep the CPU rather than waste sim. cycles on that.
> >>>
> >>> Anyway,
> >>> you have to figure out how to get the 2nd CPU to get a valid process.
> >>> As a first cut, I would just hardcode the CPU process bindings
> >>> instead of using the loop like in your example. If you can get that to
> >>> work, then you know something is going on with how the loop is
> >>> setting up your system.
> >>>
> >>> I'm guessing that something like this would work (check syntax though):
> >>> "
> >>> # One separate process object per CPU. M5 config scripts are Python,
> >>> # so there are no type declarations.
> >>> process1 = Benchmarks.SPECGCCEIO()
> >>> process2 = Benchmarks.SPECGCCEIO()
> >>> system.cpu[0].workload = process1
> >>> system.cpu[1].workload = process2
> >>> "
> >>>
> >>> I've done something similar when I've needed something quick to work. I've
> >>> noticed that you're using EIO, so if you aren't able to hardcode it,
> >>> then the EIO functionality could be the culprit as well.
> >>>
> >>> On Mon, Jun 23, 2008 at 12:15 AM, prannav shrestha <[EMAIL PROTECTED]> wrote:
> >>>
> >>>
> >>>> Hi Sewell!!
> >>>> I ran the simulation with the O3CPU flags, and looking at the information
> >>>> provided, I found that in my case the PC value for CPU1 remains the same
> >>>> forever, whereas CPU0 is executing the workload. Also, the instruction
> >>>> fetched by CPU1, which is always the same, is squashed every time. Part of
> >>>> the debug output is below:
> >>>>
> >>>> Global frequency set at 1000000000000 ticks per second
> >>>>       0: global: BTB: Creating BTB object.
> >>>>       0: system.cpu0.iew.lsq: LSQ sharing policy set to Partitioned: 32 entries per LQ | 32 entries per SQ
> >>>>       0: system.cpu0.commit: Commit Policy set to Round Robin.
> >>>>       0: system.cpu0.rob: ROB sharing policy set to Partitioned
> >>>>       0: global: Creating AlphaO3CPU object.
> >>>>
> >>>> number of threads: 1
> >>>>       0: global: Workload[0] process is 0
> >>>>       0: global: BTB: Creating BTB object.
> >>>>       0: system.cpu1.iew.lsq: LSQ sharing policy set to Partitioned: 32 entries per LQ | 32 entries per SQ
> >>>>       0: system.cpu1.commit: Commit Policy set to Round Robin.
> >>>>       0: system.cpu1.rob: ROB sharing policy set to Partitioned
> >>>>       0: global: Creating AlphaO3CPU object.
> >>>>
> >>>> number of threads: 1
> >>>>       0: global: Workload[0] process is 0
> >>>>       0: global: Calling activate on Thread Context 0
> >>>>       0: system.cpu0: [tid:0]: Calling activate thread.
> >>>>       0: system.cpu0: [tid:0]: Adding to active threads list
> >>>>       0: system.cpu0.fetch: Waking up from quiesce
> >>>>       0: system.cpu0.commit: Generating TC squash event for [tid:0]
> >>>>       0: global: Calling activate on Thread Context 0
> >>>>       0: system.cpu1: [tid:0]: Calling activate thread.
> >>>>       0: system.cpu1: [tid:0]: Adding to active threads list
> >>>>       0: system.cpu1.fetch: Waking up from quiesce
> >>>>       0: system.cpu1.commit: Generating TC squash event for [tid:0]
> >>>>       0: system.cpu1:
> >>>>
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>       0: system.cpu1.fetch: Running stage.
> >>>>       0: system.cpu1.fetch: Attempting to fetch from [tid:0]
> >>>>       0: system.cpu1.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x000000.
> >>>>       0: system.cpu1.fetch: [tid:0]: Blocked, need to handle the trap.
> >>>>       0: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x000000
> >>>>       0: system.cpu1.decode: Processing [tid:0]
> >>>>       0: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>>       0: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>> ......................
> >>>> ........
> >>>> ....
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>    6500: system.cpu0.fetch: Running stage.
> >>>>    6500: system.cpu0.fetch: There are no more threads available to fetch from.
> >>>>    6500: system.cpu0.decode: Processing [tid:0]
> >>>>    6500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>>    6500: system.cpu0.decode: [tid:0] Nothing to do, breaking out early.
> >>>>    6500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>>    6500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>>    6500: system.cpu0.commit: [tid:0]: Instruction [sn:2] PC 0x120140ce8 is head of ROB and ready to commit
> >>>>    6500: system.cpu0.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>    6500: system.cpu0: Scheduling next tick!
> >>>>    6500: system.cpu1:
> >>>>
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>    6500: system.cpu1.fetch: Running stage.
> >>>>    6500: system.cpu1.fetch: There are no more threads available to fetch from.
> >>>>    6500: system.cpu1.decode: Processing [tid:0]
> >>>>    6500: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>>    6500: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>>    6500: system.cpu1.commit: Getting instructions from Rename stage.
> >>>>    6500: system.cpu1.commit: Trying to commit instructions in the ROB.
> >>>>    6500: system.cpu1.commit: [tid:0]: Instruction [sn:2] PC 0x12000dde8 is head of ROB and ready to commit
> >>>>    6500: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>    6500: system.cpu1: Scheduling next tick!
> >>>>    7000: system.cpu1:
> >>>>
> >>>> .............
> >>>> ..........
> >>>> ..........
> >>>>
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   13000: system.cpu1.fetch: Running stage.
> >>>>   13000: system.cpu1.fetch: There are no more threads available to fetch from.
> >>>>   13000: system.cpu1.decode: Processing [tid:0]
> >>>>   13000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>>   13000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>>   13000: system.cpu1.commit: Getting instructions from Rename stage.
> >>>>   13000: system.cpu1.commit: Trying to commit instructions in the ROB.
> >>>>   13000: system.cpu1.commit: Trying to commit head instruction, [sn:3] [tid:0]
> >>>>   13000: system.cpu1.commit: Inst [sn:3] PC 0x12000dde8 has a fault
> >>>>   13000: system.cpu1.commit: Generating trap event for [tid:0]
> >>>>   13000: system.cpu1.commit: Unable to commit head instruction PC:0x12000dde8 [tid:0] [sn:3].
> >>>>   13000: system.cpu1.commit: [tid:0]: Instruction [sn:3] PC 0x12000dde8 is head of ROB and ready to commit
> >>>>   13000: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>   13000: system.cpu1: Scheduling next tick!
> >>>>   13000: system.cpu0:
> >>>>
> >>>> .....
> >>>> ....
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   32500: system.cpu0.fetch: Running stage.
> >>>>   32500: system.cpu0.fetch: Attempting to fetch from [tid:0]
> >>>>   32500: system.cpu0.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x120140cf0.
> >>>>   32500: system.cpu0.fetch: Fetch: Doing instruction read.
> >>>>   32500: system.cpu0.fetch: [tid:0]: Doing cache access.
> >>>>   32500: system.cpu0.decode: Processing [tid:0]
> >>>>   32500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>>   32500: system.cpu0.decode: [tid:0]: Sending instruction to rename.
> >>>>   32500: system.cpu0.decode: [tid:0]: Processing instruction [sn:3] with PC 0x120140ce8
> >>>>   32500: system.cpu0.decode: [tid:0]: Processing instruction [sn:4] with PC 0x120140cec
> >>>>   32500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>>   32500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>>   32500: system.cpu0.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> >>>>   32500: system.cpu0: Scheduling next tick!
> >>>>   33000: system.cpu0:
> >>>>
> >>>> ..............
> >>>> .........
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   39500: system.cpu0.fetch: Running stage.
> >>>>   39500: system.cpu0.fetch: Attempting to fetch from [tid:0]
> >>>>   39500: system.cpu0.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x120140d00.
> >>>>   39500: system.cpu0.fetch: Fetch: Doing instruction read.
> >>>>   39500: system.cpu0.fetch: [tid:0]: Doing cache access.
> >>>>   39500: system.cpu0.decode: Processing [tid:0]
> >>>>   39500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>>   39500: system.cpu0.decode: [tid:0]: Sending instruction to rename.
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:5] with PC 0x120140cf0
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:6] with PC 0x120140cf4
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:7] with PC 0x120140cf8
> >>>>   39500: system.cpu0.decode: [tid:0]: Processing instruction [sn:8] with PC 0x120140cfc
> >>>>   39500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>>   39500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>>   39500: system.cpu0.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> >>>>   39500: system.cpu0: Scheduling next tick!
> >>>>   40000: system.cpu0:
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>>   40000: system.cpu1.fetch: Running stage.
> >>>>   40000: system.cpu1.fetch: There are no more threads available to fetch from.
> >>>>   40000: system.cpu1.decode: Processing [tid:0]
> >>>>   40000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>>   40000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>>   40000: system.cpu1.commit: Squashing from trap, restarting at PC 0x12000dde8
> >>>>   40000: system.cpu1.commit: [tid:0]: Instruction [sn:5] PC 0x12000dde8 is head of ROB and ready to commit
> >>>>   40000: system.cpu1.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>>   40000: system.cpu1: Scheduling next tick!
> >>>>   40500: system.cpu1.fetch-iport: Received timing
> >>>>   40500: system.cpu1:
> >>>> ....................
> >>>> .................
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>> 1243500: system.cpu0.fetch: Running stage.
> >>>> 1243500: system.cpu0.fetch: There are no more threads available to fetch from.
> >>>> 1243500: system.cpu0.decode: Processing [tid:0]
> >>>> 1243500: system.cpu0.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>> 1243500: system.cpu0.decode: [tid:0] Nothing to do, breaking out early.
> >>>> 1243500: system.cpu0.commit: Getting instructions from Rename stage.
> >>>> 1243500: system.cpu0.commit: Trying to commit instructions in the ROB.
> >>>> 1243500: system.cpu0.commit: [tid:0]: Instruction [sn:369] PC 0x120106160 is head of ROB and ready to commit
> >>>> 1243500: system.cpu0.commit: [tid:0]: ROB has 1 insts & 191 free entries.
> >>>> 1243500: system.cpu0: Scheduling next tick!
> >>>> 1244000: system.cpu0:
> >>>>
> >>>> .........
> >>>> ........
> >>>> FullO3CPU: Ticking main, FullO3CPU.
> >>>> 4001000: system.cpu1.fetch: [tid:0]: Done squashing, switching to running.
> >>>> 4001000: system.cpu1.fetch: Running stage.
> >>>> 4001000: system.cpu1.fetch: Attempting to fetch from [tid:0]
> >>>> 4001000: system.cpu1.fetch: [tid:0]: Attempting to translate and read instruction, starting at PC 0x12000dde8.
> >>>> 4001000: system.cpu1.fetch: [tid:0]: Blocked, need to handle the trap.
> >>>> 4001000: system.cpu1.fetch: [tid:0]: fault (itbmiss) detected @ PC 0x12000dde8
> >>>> 4001000: system.cpu1.decode: Processing [tid:0]
> >>>> 4001000: system.cpu1.decode: [tid:0]: Done squashing, switching to running.
> >>>> 4001000: system.cpu1.decode: [tid:0]: Not blocked, so attempting to run stage.
> >>>> 4001000: system.cpu1.decode: [tid:0] Nothing to do, breaking out early.
> >>>> 4001000: system.cpu1.commit: Getting instructions from Rename stage.
> >>>> 4001000: system.cpu1.commit: Trying to commit instructions in the ROB.
> >>>> 4001000: system.cpu1.commit: [tid:0]: ROB has 0 insts & 192 free entries.
> >>>> 4001000: system.cpu1: Scheduling next tick!
> >>>> 4001500: system.cpu1:
> >>>> ......
> >>>>
> >>>> Exiting @ cycle 4002500 because all threads reached the max instruction count
> >>>>
> >>>> I am running two different benchmarks on those two cores. I have included
> >>>> a long excerpt of the debug output. The PC of CPU1 is always the same. Why
> >>>> is the instruction on CPU1 always being squashed?
> >>>>
> >>>> regards,
> >>>> prannav
> >>>>
> >>>>
> >>>>
> >>>> On Sun, Jun 22, 2008 at 10:49 PM, Korey Sewell <[EMAIL PROTECTED]> wrote:
> >>>>
> >>>>
> >>>>> Turn on the trace flags for O3CPU and see what it says.... Inside the
> >>>>> "o3" folder there should be a *.py file that lists the trace flags.
> >>>>>
> >>>>> If you run the simulation with all the O3 trace flags on, you should be
> >>>>> able to see what's happening. I suggest only running for maybe 100
> >>>>> ticks so that you don't get overloaded with text.
> >>>>>
> >>>>> From the debug output, you should see each CPU get initialized, fetch,
> >>>>> and all that. I'm guessing what happens is that CPU1 starts up, sees
> >>>>> no process to latch onto, and then sleeps, but the debug output should
> >>>>> verify that for you.
> >>>>>
> >>>>> Simply looking at the instruction commits (Exec) won't help you in this
> >>>>> case since you're saying that no instructions commit. You'll
> >>>>> at least need to turn on "Fetch Decode Exec" and all that so you can
> >>>>> get a detailed view.
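
Because the trace interleaves both CPUs, it can help to cut a saved trace down to one component and a small tick window, in the spirit of the "run for maybe 100 ticks" advice. The following is a minimal sketch that assumes each trace line starts with "<tick>: <component>: " as in the output quoted in this thread; `filter_trace` and its parameters are hypothetical names, not an M5 facility.

```python
import re

# A trace line is assumed to look like "   6500: system.cpu1.fetch: message".
TICK_LINE = re.compile(r"^\s*(\d+): (\S+?): ")


def filter_trace(lines, component, max_tick):
    """Keep lines at or before max_tick that belong to component or its children."""
    kept = []
    for line in lines:
        m = TICK_LINE.match(line)
        if not m:
            # Banner lines such as "FullO3CPU: Ticking main" carry no tick prefix.
            continue
        tick, name = int(m.group(1)), m.group(2)
        # Require an exact match or a dotted child so cpu1 does not match cpu10.
        belongs = name == component or name.startswith(component + ".")
        if tick <= max_tick and belongs:
            kept.append(line.rstrip("\n"))
    return kept


# Sample lines copied from the trace quoted in this thread.
sample = [
    "      0: system.cpu0.fetch: Waking up from quiesce",
    "      0: system.cpu1.fetch: Running stage.",
    "FullO3CPU: Ticking main, FullO3CPU.",
    "   6500: system.cpu1.fetch: Running stage.",
]
for line in filter_trace(sample, "system.cpu1", 100):
    print(line)
```

Filtering after the fact keeps the full trace available for later questions, at the cost of a larger file on disk than limiting the run itself.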
> >>>>>
> >>>>>
> >>>>> --
> >>>>> ----------
> >>>>> Korey L Sewell
> >>>>> Graduate Student - PhD Candidate
> >>>>> Computer Science & Engineering
> >>>>> University of Michigan
> >>>>> _______________________________________________
> >>>>> m5-users mailing list
> >>>>> [email protected]
> >>>>> http://m5sim.org/cgi-bin/mailman/listinfo/m5-users
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>
> >>
> >
> >
> >
>
>

