I'm trying to reproduce your issue, but I'm not able to do so...

I was able to boot this image with c=16:
http://bertha.cs.binghamton.edu/downloads/ubuntu-natty.tar.bz2

In fact, I didn't even need the rootdelay argument until I configured
MARSS for >16 cores. Could you try that image, and maybe consider
upgrading to it if it works for you?

Tyler

> I can see the rootdelay parameter did its thing because I see the kernel
> saying "waiting 200sec before mounting root partition" or whatever. After
> a few minutes I get here:
>
>  * Filesystem type 'fusectl' is not supported. Skipping mount.
>  * Starting kernel event manager...                                [ OK ]
>  * Loading hardware drivers...
> input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input2
> ACPI: Power Button [PWRF]
> processor LNXCPU:00: registered as cooling_device0
> processor LNXCPU:01: registered as cooling_device1
> processor LNXCPU:02: registered as cooling_device2
> processor LNXCPU:03: registered as cooling_device3
> processor LNXCPU:04: registered as cooling_device4
> processor LNXCPU:05: registered as cooling_device5
> processor LNXCPU:06: registered as cooling_device6
> processor LNXCPU:07: registered as cooling_device7
> processor LNXCPU:08: registered as cooling_device8
> processor LNXCPU:09: registered as cooling_device9
> processor LNXCPU:0a: registered as cooling_device10
> processor LNXCPU:0b: registered as cooling_device11
> processor LNXCPU:0c: registered as cooling_device12
> processor LNXCPU:0d: registered as cooling_device13
> processor LNXCPU:0e: registered as cooling_device14
> processor LNXCPU:0f: registered as cooling_device15
>                                                                    [ OK ]
>
> and that's about where it gets really stuck for me. It just sits and
> waits for a really long time.
>
> On Wed, Mar 13, 2013 at 12:10 PM, Paul Rosenfeld
> <[email protected]> wrote:
>
>> Just for reference, here's the entry from my menu.lst:
>>
>> title           Ubuntu 9.04, kernel 2.6.31.4qemu
>> uuid            ab838715-9cb7-4299-96f7-459437993bde
>> kernel          /boot/vmlinuz-2.6.31.4qemu root=/dev/hda1 ro single 1 rootdelay=200
>>
>> Does everything look OK here?
>>
>>
>> On Wed, Mar 13, 2013 at 11:49 AM, Paul Rosenfeld
>> <[email protected]> wrote:
>>
>>> Nope, nothing fancy. A few commits behind HEAD on master with some
>>> modifications to the simulation (but nothing changed in qemu). Added
>>> rootdelay=200 and it hangs right after freeing kernel memory and takes
>>> a really long time to get the disks mounted and the devices loaded.
>>>
>>> I'll double check my image to make sure that the rootdelay made it into
>>> the correct menu.lst entry.
>>>
>>> The odd thing is that once I get to the login prompt (if I'm doing an
>>> interactive session), qemu is perfectly responsive. Maybe I'll try to
>>> boot a raw image instead of qcow2 to see if that changes anything.
>>>
>>>
>>>
>>> On Wed, Mar 13, 2013 at 11:43 AM, <[email protected]> wrote:
>>>
>>>> I forgot to add --
>>>>
>>>> The only issue that I can see with your approach is keeping qemu in
>>>> sync with ptlsim. If you look at `ptl_add_phys_memory_mapping` in
>>>> qemu/cputlb.c, you'll notice that qemu feeds page mappings to ptlsim
>>>> even when ptlsim isn't active.
>>>>
>>>> I could be wrong here, but I believe you'll need to update that
>>>> mapping once you boot a checkpoint.
>>>>
>>>> We'd be more than willing to help you in whatever way we can to get
>>>> something like this committed to master.
>>>>
>>>> Tyler
>>>>
>>>> > Paul,
>>>> >
>>>> > Adding rootdelay to menu.lst is the same thing as passing it as a
>>>> > kernel argument, so yes... no difference.
>>>> >
>>>> > As Avadh mentioned, 5 hours is a _long_ time to get things going.
>>>> > I got a 16+ core instance to get to a prompt in a few minutes last
>>>> > time I tried. Admittedly, I never tried to create a checkpoint when
>>>> > I had that many cores... is the checkpointing taking a long time,
>>>> > or are you waiting that long just to boot the system?
>>>> >
>>>> > What value are you passing to rootdelay? You're building master
>>>> > without debugging or anything fancy, right?
>>>> >
>>>> > Tyler
>>>> >
>>>> >> I added rootdelay to menu.lst (and I think in grub1 you don't
>>>> >> have to do anything else, right?)
>>>> >>
>>>> >> I'm using the old parsec images with a reasonably ancient ubuntu.
>>>> >> Should I be using something more recent?
>>>> >>
>>>> >>
>>>> >>
>>>> >> On Wed, Mar 13, 2013 at 11:13 AM, avadh patel <[email protected]>
>>>> >> wrote:
>>>> >>
>>>> >>>
>>>> >>>
>>>> >>>
>>>> >>> On Tue, Mar 12, 2013 at 1:19 PM, Paul Rosenfeld
>>>> >>> <[email protected]> wrote:
>>>> >>>
>>>> >>>> Hello Everyone,
>>>> >>>>
>>>> >>>> It's been a while, but I'm starting to use MARSSx86 for
>>>> >>>> simulations again. I've been trying to run 16-core simulations
>>>> >>>> and am finding that the boot time is very long (~5 hours to
>>>> >>>> make a checkpoint). This makes it quite frustrating when I
>>>> >>>> accidentally set the wrong parameters inside the workload, run
>>>> >>>> the wrong workload, or make any number of other mistakes.
>>>> >>>>
>>>> >>> Booting 16 cores should not take that long. Did you try adding
>>>> >>> the 'rootdelay' option to the kernel command line? It
>>>> >>> significantly improves kernel boot time in QEMU for large
>>>> >>> numbers of cores.
>>>> >>>
>>>> >>> - Avadh
>>>> >>>
>>>> >>>
>>>> >>>> So I was thinking -- what if I made a post-boot but
>>>> >>>> pre-simulation-switch checkpoint (i.e., checkpoint but stay in
>>>> >>>> emulation mode)? That way, the create_checkpoints.py script
>>>> >>>> could just launch the system from the post-boot snapshot and
>>>> >>>> proceed to launch the workloads, which would have the PTL calls
>>>> >>>> that would then make the actual simulation checkpoints. Not
>>>> >>>> only would that reduce the time it takes to create a lot of
>>>> >>>> checkpoints, but also, if I screwed up a checkpoint, I could
>>>> >>>> just delete it, boot the post-boot snapshot, tweak the
>>>> >>>> workload, and re-checkpoint the simulation.
>>>> >>>>
>>>> >>>> I think marss checkpoints piggyback on qemu's snapshot
>>>> >>>> capabilities, but is there some downside to this approach here
>>>> >>>> that I'm missing?
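[Editor's note: the workflow Paul proposes maps onto qemu's built-in qcow2 internal snapshots (`savevm`/`-loadvm` and `qemu-img snapshot`, which are standard qemu features). The image path, snapshot tag, and the rest of the command line below are placeholders, not marss's actual invocation:]

```shell
# One-time: boot as usual, then save a post-boot snapshot from the qemu
# monitor (Ctrl-Alt-2 in the display window):
#   (qemu) savevm post_boot

# Every later run resumes from the snapshot instead of re-booting 16 cores:
qemu-system-x86_64 -hda ubuntu-natty.qcow2 -smp 16 -loadvm post_boot

# Internal snapshots can be inspected and cleaned up offline with qemu-img:
qemu-img snapshot -l ubuntu-natty.qcow2             # list snapshots
qemu-img snapshot -d post_boot ubuntu-natty.qcow2   # delete a stale one
```

Note that internal snapshots require a qcow2 image; this wouldn't work with the raw image Paul mentions trying.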
>>>> >>>>
>>>> >>>> Thanks,
>>>> >>>> Paul
>>>> >>>>
>>>> >>>> _______________________________________________
>>>> >>>> http://www.marss86.org
>>>> >>>> Marss86-Devel mailing list
>>>> >>>> [email protected]
>>>> >>>> https://www.cs.binghamton.edu/mailman/listinfo/marss86-devel
>>>> >>>>
>>>> >>>>
>>>> >>>
>>>> >
>>>> >
>>>> >
>>>>
>>>>
>>>
>>


