Thanks to the tips in this thread, I was able to boot qemu with 16 or more
cores fairly quickly. However, a simulation with 32 cores resulted in the
following segmentation fault:

Program terminated with signal 11, Segmentation fault.
#0  superstl::SelfHashtable<RIPVirtPhys, BasicBlock, 16384,
BasicBlockHashtableLinkManager, superstl::HashtableKeyManager<RIPVirtPhys,
32768> >::Iterator::next (this=0x1347ce0, context_id=<value optimized out>)
at ptlsim/lib/superstl.h:2610
2610              link = link->next;
(gdb) bt
#0  superstl::SelfHashtable<RIPVirtPhys, BasicBlock, 16384,
BasicBlockHashtableLinkManager, superstl::HashtableKeyManager<RIPVirtPhys,
32768> >::Iterator::next (this=0x1347ce0, context_id=<value optimized out>)
at ptlsim/lib/superstl.h:2610
#1  BasicBlockCache::flush (this=0x1347ce0, context_id=<value optimized
out>) at ptlsim/build/x86/decode-core.cpp:1931
#2  0x000000000063fb29 in ptl_flush_bbcache (context_id=3 '\003') at
ptlsim/build/core/basecore.cpp:30
#3  0x000000000053b99a in tlb_flush (env=0x29cb190, flush_global=<value
optimized out>) at qemu/exec.c:1964
#4  0x0000000000795296 in assist_write_cr3 (ctx=...) at
ptlsim/build/x86/decode-complex.cpp:708
#5  0x00000000006cdadd in ooo_4::ThreadContext::handle_barrier
(this=0xdc6b710) at ptlsim/build/core/ooo-core/ooo.cpp:1296
#6  0x00000000006d2526 in ooo_4::OooCore::runcycle (this=0xdc55ee0,
none=<value optimized out>)
    at ptlsim/build/core/ooo-core/ooo.cpp:840
#7  0x00000000007697b5 in BaseMachine::run (this=0x12fd1a0, config=<value
optimized out>) at ptlsim/build/sim/machine.cpp:447
#8  0x000000000077a32f in ptl_simulate () at
ptlsim/build/sim/ptlsim.cpp:1426
#9  0x00000000005d5ba7 in sim_cpu_exec () at qemu/cpu-exec.c:310
#10 0x000000000042280d in main_loop (argc=<value optimized out>,
argv=<value optimized out>, envp=<value optimized out>)
    at qemu/vl.c:1450
#11 main (argc=<value optimized out>, argv=<value optimized out>,
envp=<value optimized out>) at qemu/vl.c:3189

The pointer `link` at ptlsim/lib/superstl.h:2610 is not null, but it looks
like it is corrupted. Has anyone come across this before? Any thoughts on
the above fault?
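
For reference, here is my reading of the crash site, as a simplified
sketch of a chained-hashtable iterator's next(); the names below are
illustrative, not the actual superstl code:

    // Simplified sketch in the spirit of superstl::SelfHashtable::Iterator
    // -- illustrative only, not the real implementation.
    struct Link { Link* next; };      // stand-in for the per-entry link field

    struct Iterator {
        Link** buckets;               // table of bucket heads
        int nbuckets;                 // bucket count (16384 for the bbcache)
        int bucket = -1;              // current bucket index
        Link* link = nullptr;         // current position within a chain

        Link* next() {
            while (!link) {           // skip to the next non-empty bucket
                if (++bucket >= nbuckets) return nullptr;  // table exhausted
                link = buckets[bucket];
            }
            Link* cur = link;
            link = link->next;        // superstl.h:2610 -- faults here if
                                      // 'link' is stale or corrupted
            return cur;
        }
    };

If link is non-null but garbage at that line, my first guess would be a
BasicBlock freed (or invalidated) while still linked into a chain -- e.g.
a flush racing with an insert -- which would surface more often at 32
cores simply because tlb_flush -> ptl_flush_bbcache runs much more
frequently. That is just a guess, though.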

On Thu, Mar 14, 2013 at 11:08 AM, Paul Rosenfeld <[email protected]> wrote:

> Just tried the new image and I'm happy to report that it took only 2.5
> minutes to get to a login prompt.
>
> I'll just move the benchmarks into this new image and retire our old
> images.
>
> Thanks,
> Paul
>
>
> On Thu, Mar 14, 2013 at 11:16 AM, Paul Rosenfeld <[email protected]> wrote:
>
>> Alrighty, I will try that later today and report back. I'm thinking maybe
>> that older image has a kernel with some crazy driver that takes forever
>> to emulate per-CPU, and that's what is causing it to take so long to boot.
>>
>> Thanks,
>> Paul
>>
>>
>> On Wed, Mar 13, 2013 at 2:52 PM, <[email protected]> wrote:
>>
>>> I'm trying to reproduce your issue, but I'm not able to do so...
>>>
>>> I was able to boot this image with c=16:
>>> http://bertha.cs.binghamton.edu/downloads/ubuntu-natty.tar.bz2
>>>
>>> In fact, I didn't even need the rootdelay argument until I configured
>>> MARSS for >16 cores. Could you try that image, and maybe consider
>>> upgrading to it if it works for you?
>>>
>>> Tyler
>>>
>>> > I can see the rootdelay parameter did its thing because I see the
>>> > kernel saying "waiting 200sec before mounting root partition" or
>>> > whatever. After a few minutes I get here:
>>> >
>>> >  * Filesystem type 'fusectl' is not supported. Skipping mount.
>>> >  * Starting kernel event manager...                            [ OK ]
>>> >  * Loading hardware drivers...
>>> > input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
>>> > ACPI: Power Button [PWRF]
>>> > processor LNXCPU:00: registered as cooling_device0
>>> > processor LNXCPU:01: registered as cooling_device1
>>> > processor LNXCPU:02: registered as cooling_device2
>>> > processor LNXCPU:03: registered as cooling_device3
>>> > processor LNXCPU:04: registered as cooling_device4
>>> > processor LNXCPU:05: registered as cooling_device5
>>> > processor LNXCPU:06: registered as cooling_device6
>>> > processor LNXCPU:07: registered as cooling_device7
>>> > processor LNXCPU:08: registered as cooling_device8
>>> > processor LNXCPU:09: registered as cooling_device9
>>> > processor LNXCPU:0a: registered as cooling_device10
>>> > processor LNXCPU:0b: registered as cooling_device11
>>> > processor LNXCPU:0c: registered as cooling_device12
>>> > processor LNXCPU:0d: registered as cooling_device13
>>> > processor LNXCPU:0e: registered as cooling_device14
>>> > processor LNXCPU:0f: registered as cooling_device15
>>> >                                                                [ OK ]
>>> >
>>> > and that's about where it gets really stuck for me. It just sits and
>>> > waits for a really long time.
>>> >
>>> > On Wed, Mar 13, 2013 at 12:10 PM, Paul Rosenfeld
>>> > <[email protected]> wrote:
>>> >
>>> >> Just for reference, here's the entry from my menu.lst:
>>> >>
>>> >> title           Ubuntu 9.04, kernel 2.6.31.4qemu
>>> >> uuid            ab838715-9cb7-4299-96f7-459437993bde
>>> >> kernel          /boot/vmlinuz-2.6.31.4qemu root=/dev/hda1 ro single 1
>>> >> rootdelay=200
>>> >>
>>> >> Everything look OK here?
>>> >>
>>> >>
>>> >> On Wed, Mar 13, 2013 at 11:49 AM, Paul Rosenfeld
>>> >> <[email protected]> wrote:
>>> >>
>>> >>> Nope, nothing fancy. A few commits behind HEAD on master with some
>>> >>> modifications to the simulation (but nothing changed in qemu). Added
>>> >>> rootdelay=200 and it hangs right after freeing kernel memory and
>>> >>> takes a really long time to get the disks mounted and the devices
>>> >>> loaded.
>>> >>>
>>> >>> I'll double check my image to make sure that the rootdelay made it
>>> >>> into the correct menu.lst entry.
>>> >>>
>>> >>> The odd thing is that once I get to the login prompt (if I'm doing
>>> >>> an interactive session), qemu is perfectly responsive. Maybe I'll
>>> >>> try to boot a raw image instead of qcow2 to see if that changes
>>> >>> anything.
>>> >>>
>>> >>>
>>> >>>
>>> >>> On Wed, Mar 13, 2013 at 11:43 AM, <[email protected]> wrote:
>>> >>>
>>> >>>> I forgot to add --
>>> >>>>
>>> >>>> The only issue that I can see with your approach is keeping qemu
>>> >>>> in sync with ptlsim. If you look at `ptl_add_phys_memory_mapping`
>>> >>>> in qemu/cputlb.c, you'll notice that qemu feeds page mappings to
>>> >>>> ptlsim even when ptlsim isn't active.
>>> >>>>
>>> >>>> I could be wrong here, but I believe you'll need to update that
>>> >>>> mapping once you boot a checkpoint.
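>>> >>>>
>>> >>>> Roughly what I have in mind, purely as a sketch -- I haven't
>>> >>>> checked the real signature of ptl_add_phys_memory_mapping, and
>>> >>>> RamRegion/ram_regions below are stand-ins for whatever structure
>>> >>>> qemu actually uses to track guest RAM:
>>> >>>>
>>> >>>>     #include <stdint.h>
>>> >>>>
>>> >>>>     /* assumed signature -- the real one in cputlb.c may differ */
>>> >>>>     extern void ptl_add_phys_memory_mapping(uint64_t guest_phys,
>>> >>>>                                             void* host_ptr,
>>> >>>>                                             uint64_t size);
>>> >>>>
>>> >>>>     typedef struct RamRegion { /* hypothetical RAM-region record */
>>> >>>>         uint64_t guest_phys;   /* guest physical base address */
>>> >>>>         void* host_ptr;        /* host virtual address backing it */
>>> >>>>         uint64_t size;         /* region length in bytes */
>>> >>>>         struct RamRegion* next;
>>> >>>>     } RamRegion;
>>> >>>>
>>> >>>>     extern RamRegion* ram_regions; /* hypothetical list of RAM */
>>> >>>>
>>> >>>>     /* After restoring a post-boot snapshot, replay every RAM
>>> >>>>      * mapping so ptlsim's view matches what it would have built
>>> >>>>      * up incrementally during a normal boot. */
>>> >>>>     void ptl_resync_memory_mappings(void) {
>>> >>>>         for (RamRegion* r = ram_regions; r; r = r->next)
>>> >>>>             ptl_add_phys_memory_mapping(r->guest_phys, r->host_ptr,
>>> >>>>                                         r->size);
>>> >>>>     }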
>>> >>>>
>>> >>>> We'd be more than willing to help you in whatever way we can to
>>> >>>> get something like this committed to master.
>>> >>>>
>>> >>>> Tyler
>>> >>>>
>>> >>>> > Paul,
>>> >>>> >
>>> >>>> > Adding rootdelay to menu.lst is the same thing as passing it as
>>> >>>> > a kernel argument, so yes... no difference.
>>> >>>> >
>>> >>>> > As Avadh mentioned, 5 hours is a _long_ time to get things
>>> >>>> > going. I got a 16+ core instance to get to a prompt in a few
>>> >>>> > minutes last time I tried. Admittedly, I never tried to create
>>> >>>> > a checkpoint when I had that many cores... is the checkpointing
>>> >>>> > taking a long time, or are you waiting that long just to boot
>>> >>>> > the system?
>>> >>>> >
>>> >>>> > What value are you passing to rootdelay? You're building master
>>> >>>> > without debugging or anything fancy, right?
>>> >>>> >
>>> >>>> > Tyler
>>> >>>> >
>>> >>>> >> I added rootdelay to menu.lst (and I think in grub1 you don't
>>> >>>> >> have to do anything else, right?)
>>> >>>> >>
>>> >>>> >> I'm using the old parsec images with a reasonably ancient
>>> >>>> >> ubuntu. Should I be using something more recent?
>>> >>>> >>
>>> >>>> >>
>>> >>>> >>
>>> >>>> >> On Wed, Mar 13, 2013 at 11:13 AM, avadh patel
>>> >>>> >> <[email protected]> wrote:
>>> >>>> >>
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>> On Tue, Mar 12, 2013 at 1:19 PM, Paul Rosenfeld
>>> >>>> >>> <[email protected]>wrote:
>>> >>>> >>>
>>> >>>> >>>> Hello Everyone,
>>> >>>> >>>>
>>> >>>> >>>> It's been a while, but I'm starting to use MARSSx86 for
>>> >>>> >>>> simulations again. I've been trying to run 16 core
>>> >>>> >>>> simulations and am finding that the boot time is very long
>>> >>>> >>>> (~5 hours to make a checkpoint). This makes it quite
>>> >>>> >>>> frustrating when I accidentally set the wrong parameters
>>> >>>> >>>> inside the workload, run the wrong workload, or make any
>>> >>>> >>>> number of the other mistakes I tend to make.
>>> >>>> >>>>
>>> >>>> >>> Booting 16 cores should not take that long. Did you try
>>> >>>> >>> adding the 'rootdelay' option to the kernel command line? It
>>> >>>> >>> significantly improves kernel boot time in QEMU for a large
>>> >>>> >>> number of cores.
>>> >>>> >>>
>>> >>>> >>> - Avadh
>>> >>>> >>>
>>> >>>> >>>
>>> >>>> >>>> So I was thinking -- what if I made a post-boot but
>>> >>>> >>>> pre-simulation-switch checkpoint (i.e., checkpoint but stay
>>> >>>> >>>> in emulation mode)? That way, the create_checkpoints.py
>>> >>>> >>>> script could just launch the system from the post-boot
>>> >>>> >>>> snapshot and proceed to launch the workloads, which would
>>> >>>> >>>> have the PTL calls that would then make the actual
>>> >>>> >>>> simulation checkpoints. Not only would that reduce the time
>>> >>>> >>>> it takes to create a lot of checkpoints, but also, if I
>>> >>>> >>>> screwed up a checkpoint, I could just delete it, boot the
>>> >>>> >>>> post-boot snapshot, tweak the workload, and re-checkpoint
>>> >>>> >>>> the simulation.
>>> >>>> >>>>
>>> >>>> >>>> I think marss checkpoints piggyback on qemu's snapshot
>>> >>>> >>>> capabilities, but is there some downside to this approach
>>> >>>> >>>> that I'm missing?
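>>> >>>> >>>>
>>> >>>> >>>> Concretely, I'd imagine the flow looking something like this
>>> >>>> >>>> at the qemu monitor (the snapshot name is made up):
>>> >>>> >>>>
>>> >>>> >>>>   # boot in pure emulation mode, then at the monitor:
>>> >>>> >>>>   (qemu) savevm post_boot_16core
>>> >>>> >>>>
>>> >>>> >>>>   # later runs skip the multi-hour boot entirely:
>>> >>>> >>>>   qemu-system-x86_64 ... -loadvm post_boot_16core
>>> >>>> >>>>
>>> >>>> >>>>   # then create_checkpoints.py drives the workloads whose
>>> >>>> >>>>   # PTL calls make the real simulation checkpoints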
>>> >>>> >>>>
>>> >>>> >>>> Thanks,
>>> >>>> >>>> Paul
>>> >>>> >>>>
>>> >>>> >>>>
>>> >>>> >>>
>>> >>>> >
>>> >>>> >
>>> >>>>
>>> >>>>
>>> >>>
>>> >>
>>> >
>>>
>>>
>>
>
>
>


-- 
Regards,
Ankita
Graduate Student
Department of Computer Science
University of Texas at Austin
_______________________________________________
http://www.marss86.org
Marss86-Devel mailing list
[email protected]
https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
