And actually,

Looking at the output, there is definitely something in the hypervisor that
relies on cpu numbers. Because it is the hypervisor that prints out "cpu1
failed to start", BUT the hardware description file says it should be 'cpu4'
which is why we swizzle 4 to 1. So, clearly switching the cpu number
confuses the hypervisor because now there are two different numbers for the
second cpu. So, I will be looking into the hypervisor.
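
To make the numbering mismatch concrete, here is a minimal Python sketch. It
assumes four hardware threads per core (the assumption Ali raised earlier in
the thread), so the first thread of the second core gets id 4. The names
THREADS_PER_CORE, hw_cpu_id and swizzle are made up for illustration; they
are not from M5 or the OpenSparc hypervisor.

```python
# Hypothetical illustration only; none of these names come from M5 or OpenSparc.
THREADS_PER_CORE = 4  # assumption: four hardware threads per core


def hw_cpu_id(core, thread, threads_per_core=THREADS_PER_CORE):
    """ID the hardware description would assign to (core, thread)."""
    return core * threads_per_core + thread


def swizzle(hw_id, threads_per_core=THREADS_PER_CORE):
    """Map a hardware ID to a dense index, accepting thread 0 of each core only."""
    core, thread = divmod(hw_id, threads_per_core)
    if thread != 0:
        raise ValueError("only the first thread of each core is configured")
    return core


second_cpu_hw = hw_cpu_id(core=1, thread=0)  # the 'cpu4' in the description
second_cpu_dense = swizzle(second_cpu_hw)    # the 'cpu1' in the panic message
print(second_cpu_hw, second_cpu_dense)
```

Any component that sees both numbers (4 from the description, 1 after
swizzling) has to agree on which one names the second cpu.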

Polina

On Thu, Mar 12, 2009 at 11:29 AM, Polina Dudnik <pdud...@gmail.com> wrote:

> Ali,
>
> I am a little worried that NumInterruptTypes is set to seven. I ran into an
> assert which was indicating that the received interrupt was beyond
> NumInterruptTypes, and I ignored it to see what happens. But, I looked
> through catchexc.fth in OpenSparc and there are many more exceptions than 7.
> So, I am clearly ignoring some important interrupts, like 'Fast Data Access
> MMU Miss' or 'VA Watchpoint'. I am not sure what to do about it, but I am
> pretty sure that must be the reason I get the output below.
>
>
> panic[cpu0]/thread=180e000: cpu1 failed to start (2)
>
> 000000000180b8b0 unix:start_cpu+124 (1, 10166a0, 0, 181cc00, 2, 0)
>   %l0-3: 0000000000000000 0000000000000000 000000000181cc00 0000000001072000
>   %l4-7: 0000030001385a20 000003000134a0f8 0000000000000004 000000000181a800
> 000000000180b960 unix:start_other_cpus+194 (1813000, 2, 1, 0, 1836f28, 1839400)
>   %l0-3: 000000000183c5f0 0000000000000001 ffffffffffffffff 0000000000000002
>   %l4-7: 0000000000000001 00000000018b1400 0000000001072000 0000000001016400
> 000000000180ba10 genunix:main+1d4 (18a9938, 18a5800, 1836340, 1861c00, 0, 18abc00)
>   %l0-3: 0000000070002000 0000000000000001 00000000018abc00 0000000000000002
>   %l4-7: 00000000018aca28 00000000018ac800 00000000018a9948 00000000018a9800
>
> syncing file systems... done
> skipping system dump - no dump device configured
> rebooting...
> panic - kernel: prom_reboot: reboot call returned!
> Program terminated
> {0} ok boot disk
> ERROR: Last Trap: Fast Data Access MMU Miss
>
>
>
>
>
> On Tue, Mar 10, 2009 at 10:53 PM, Ali Saidi <sa...@umich.edu> wrote:
>
>> This is definitely a possibility. While the format might support any
>> valid combination, it's possible that the hypervisor code itself
>> assumes that there are four threads per core and so another CPU must
>> be id 4. You might try changing the numthreads to 2 and keep numcores
>> at 1.
>>
>> Ali
>>
>> On Mar 10, 2009, at 2:44 PM, Polina Dudnik wrote:
>>
>> > Yeah, I kinda thought that too, more along the lines that other
>> > OpenSparc binaries are dependent somehow on the hypervisor binary
>> > and should also be recompiled. So, I did post to their forum, but
>> > they are not responding. So, I will look through OpenSparc and see
>> > if maybe they are hardcoding this.
>> >
>> > On Tue, Mar 10, 2009 at 1:30 PM, Gabriel Michael Black <gbl...@eecs.umich.edu> wrote:
>> > I was thinking about that the other day, and maybe OpenSparc is
>> > configuring the first thread of each core? Those could be numerically
>> > 4 apart, one per thread, which I believe is what you said is
>> > happening.
>> >
>> > Gabe
>> >
>> > Quoting Polina Dudnik <pdud...@gmail.com>:
>> >
>> > > So, just to keep everyone posted: I did what Gabe suggested and
>> > > returned whenever a thread is unallocated, and got the same seg fault
>> > > I was getting in the stable release where the processor numbers had to
>> > > be swizzled. Meanwhile, in the stable release where I did swizzle the
>> > > numbers I am getting an assertion error which tells me that I am
>> > > seeing an interrupt number that is out of range.
>> > >
>> > > In general I think it is worthwhile fixing the cpu number assignment
>> > > at the root; otherwise I will keep seeing seg faults that require
>> > > swizzling. So, I am trying to understand in OpenSparc why changing the
>> > > cpu numbers in the hypervisor doesn't fix the problem.
>> > >
>> > > Polina
>> > >
>> > > On Fri, Mar 6, 2009 at 11:24 AM, Polina Dudnik <pdud...@gmail.com> wrote:
>> > >
>> > >>
>> > >>
>> > >> On Thu, Mar 5, 2009 at 5:10 PM, Gabriel Michael Black <gbl...@eecs.umich.edu> wrote:
>> > >>
>> > >>> The change is simple enough that I'll just describe it. This deals
>> > >>> solely with the simple CPU, so if you're trying to use O3, for
>> > >>> example, it won't help you directly. The code here:
>> > >>> http://repo.m5sim.org/m5/file/886da6fa6d4a/src/cpu/simple/base.cc#l307
>> > >>> should return if the thread is suspended -or- unallocated. After you
>> > >>> change that, I think you'll also run into an assert in the CPU. I
>> > >>> just got rid of the assert and haven't had any problems, but that
>> > >>> might not be the right thing to do.
>> > >>>
>> > >>> Gabe
>> > >>
>> > >>
>> > >> I looked at l307 and I don't think it should return if the thread is
>> > >> suspended. It should get activated if the thread is suspended, isn't
>> > >> that right or am I missing something?
>> > >>
>> > >> Polina
>> > >>
>> > >>
>> > >>>
>> > >>>
>> > >>> Quoting Polina Dudnik <pdud...@gmail.com>:
>> > >>>
>> > >>> > Oh, I see. Do you think you can distribute the partial patch you have?
>> > >>> >
>> > >>> > Thank you.
>> > >>> >
>> > >>> > Polina
>> > >>> >
>> > >>> > On Thu, Mar 5, 2009 at 4:48 PM, Gabriel Michael Black <gbl...@eecs.umich.edu> wrote:
>> > >>> >
>> > >>> >> Quoting Polina Dudnik <pdud...@gmail.com>:
>> > >>> >>
>> > >>> >> > On Thu, Mar 5, 2009 at 3:38 PM, Gabriel Michael Black <gbl...@eecs.umich.edu> wrote:
>> > >>> >> >
>> > >>> >> >> There's actually a bug in the CPU wakeup code which prevents
>> > >>> >> >> any CPU that isn't activated and then suspended, like SPARC's
>> > >>> >> >> APs which are suspended directly, from waking up on
>> > >>> >> >> interrupts, etc. I have a partial fix which I've been using to
>> > >>> >> >> work around the problem, but we need to come up with a full
>> > >>> >> >> solution. I don't know if this is what the problem is, but it
>> > >>> >> >> sounds like it could be.
>> > >>> >> >>
>> > >>> >> >> Gabe
>> > >>> >> >
>> > >>> >> >
>> > >>> >> > Are you talking about the seg fault in m5-stable that I get? Or
>> > >>> >> > the CPU ids?
>> > >>> >> >
>> > >>> >> > Polina
>> > >>> >> >
>> > >>> >> >
>> > >>> >>
>> > >>> >> I was talking about the hang Ali described. If the BP is waiting
>> > >>> >> for an AP to tell it it's alive and the AP never wakes up, the
>> > >>> >> system will likely hang. I ran into that problem in X86_FS.
>> > >>> >>
>> > >>> >> Gabe
>> > >>> >>
>> > >>> >> _______________________________________________
>> > >>> >> m5-dev mailing list
>> > >>> >> m5-dev@m5sim.org
>> > >>> >> http://m5sim.org/mailman/listinfo/m5-dev