Re: [PATCH -mm] kexec jump -v9

2008-03-12 Thread Pavel Machek
Hi!

> > > > The features of this patch can be used as follows:
> > > > 
> > > > - A simple hibernation implementation without ACPI support. You can
> > > >   kexec a hibernating kernel, save the memory image of original system
> > > >   and shutdown the system. When resuming, you restore the memory image
> > > >   of original system via ordinary kexec load then jump back.
> > > > 
> > > 
> > > The main usage of this functionality is for hibernation. I am not sure
> > > what has been the conclusion of previous discussions.
> > > 
> > > Rafael/Pavel, does the approach of doing hibernation using a separate
> > > kernel hold promise?
> > 
> > It's certainly a "more traditional" method of doing hibernation than
> > the tricks swsusp currently plays.
> 
> What exactly are you referring to?

Well, traditionally it is 'A saves B to disk' (like bootloader saves
kernel&userspace). In swsusp we have 'kernel saves itself'... which
works, too, but is pretty different design.

> > Now, I guess there are some difficulties, like ACPI integration, and
> > some basic drawbacks, like the few seconds needed to boot the second kernel
> > during suspend.
> > 
> > ...OTOH this is probably the only chance to eliminate the freezer from
> > swsusp...
> 
> Some facts:
> 
> * There's no reason to think that we can't use this same mechanism for
>   hibernation (the only difficulty seems to be the handling of devices used
>   for saving the image).

Ok, at least kexec makes handling of the suspend device easier.

> Moreover, if this had been the _only_ argument for the $subject functionality,
> I'd have been against it.

Fortunately it's not the only one :-).

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: Fwd: Kexec support for PS3 (ppc64)

2008-03-12 Thread Piet Delaney

Geoff Levand wrote:

Pete/Piet Delaney wrote:
  

Geoff Levand wrote:
| Geoff Levand wrote:
|> Rajasekaran P wrote:
|>> Hi Geoff,
|>> Thanks for your information.
|>>
|>> I used latest kexec tools with 2.6.23 kernel built with ps3_defconfig.
|>> This kernel boots normally on PS3. But when I add the "crashkernel"
|>> argument, it hangs.
|>>
|>> # kexec --command-line="video=ps3fb:mode:166 rhgb root=/dev/ps3da3
|>> [EMAIL PROTECTED]" \
|>> --initrd=/boot/initrd-2.6.23.img \
|>> -l /boot/vmlinux-2.6.23
|>>
|>> # taskset 1 kexec -e
|>>
|>> Nothing happens after this. As you said earlier in your mail, I am not
|>> getting any PRINTK outputs from kernel on screen.

I was wondering about bringing in the printf() code from purgatory
and using that to print info about what's going on in the kexec code.
The standalone printf() might come in handy in other places; Ex: kgdb
stub. If it was in a library it could also be used during kernel
decompression to inform on problems occurring, like checksum errors.



When the first stage kernel goes down it releases all hypervisor
resources so that whatever boots next can successfully open those
resources.  The frame buffer is one of those resources that are
released, and after it is, there will be no more output to the
display until the second stage image brings up its display.  It
is not a matter of adding print statements or string formatting.
  


Oh, I thought you experienced the same problem I had
with printk() calls in some of the kexec functions making
it no longer work. I was considering changing my KEXEC_DEBUG()
macros to use printf() instead of printk() and importing
the purgatory printf() code.

-Geoff

  




Re: initramfs failing, 7MB limit?

2008-03-12 Thread Michael Neuling
> This is OT but I'm seeing what seems like a 7MB limit on an initramfs 
> under 2.6.24.3.
> 
> I can boot an initramfs which also works under kexec that has a cpio 
> size of 6480384 bytes.  I have another (created the same way) that 
> has more userland tools with a size of 7395328 bytes. That fails 
> under both boot and kexec with the kernel not finding "/init" and 
> panicking. "/init"  is present and executable in the initramfs. This 
> fails on the target hardware and two other standard PC hardware with 
> 256M, 512M and 1G of RAM.
> 
> I use the kernel scripts (scripts/gen_initramfs_list.sh  and 
> usr/gen_init_cpio ) to create the cpio based initramfs but it's not 
> embedded in the kernel. It's a standalone initramfs that gets passed 
> to the bootloader/kexec.
> 
> It's my understanding that with an initramfs, one is just limited to 
> available RAM and there are no set size limits.
> 
> Anyone else see this behavior or am I doing something incredibly stupid?

We have booted with very large initramfs (greater than 100MB) on ppc64
and not had a problem.  

You're probably having problems with the initramfs overwriting some other
important information.  What does the memory map look like?  Where is
the initramfs located in memory and what is after it?

Mikey




Re: initramfs failing, 7MB limit?

2008-03-12 Thread Scott D. Davilla
>  > This is OT but I'm seeing what seems like a 7MB limit on an initramfs
>>  under 2.6.24.3.
>>
>>  I can boot an initramfs which also works under kexec that has a cpio
>>  size of 6480384 bytes.  I have another (created the same way) that
>>  has more userland tools with a size of 7395328 bytes. That fails
>>  under both boot and kexec with the kernel not finding "/init" and
>>  panicking. "/init"  is present and executable in the initramfs. This
>>  fails on the target hardware and two other standard PC hardware with
>>  256M, 512M and 1G of RAM.
>>
>>  I use the kernel scripts (scripts/gen_initramfs_list.sh  and
>>  usr/gen_init_cpio ) to create the cpio based initramfs but it's not
>>  embedded in the kernel. It's a standalone initramfs that gets passed
>>  to the bootloader/kexec.
>>
>>  It's my understanding that with an initramfs, one is just limited to
>>  available RAM and there are no set size limits.
>>
>>  Anyone else see this behavior or am I doing something incredibly stupid?
>
>We have booted with very large initramfs (greater than 100MB) on ppc64
>and not had a problem. 
>
>You're probably having problems with the initramfs overwriting some other
>important information.  What does the memory map look like?  Where is
>the initramfs located in memory and what is after it?
>

I'm not sure that's the case. This occurs regardless of whether I use a) 
kexec, b) syslinux or c) atv-bootloader, on three different x86 
hardware platforms, one of which has three different memory 
configurations.

I'm pretty sure kexec will place the initrd in a proper location as 
does syslinux. Since I'm writing atv-bootloader, the initrd is placed 
just under the bootloader which is loaded at 0xB00 (darwin mach 
kernel load location). So the failure does not seem to depend on 
initial load location of the initramfs. I could track down the actual 
load locations but don't think this problem is dependent on that.

In addition, on two of the platforms, I have serial support so I can 
run a serial console to capture the kernel output of the panicked 
load. The kernel finds the initramfs (initrd), loads it and seems 
happy with it. It then panics later when trying to run "/init". I am 
building both initrds with the kernel scripts, and the only difference is 
the addition of wget and its supporting glibc libs, which are not 
referenced in the init code. So the cpio build process should be 
ok, unless cpio itself is barfing.

It's either something about the cpio construction or the kernel's 
unpacking. I did read that one cannot have ext3 support built as 
a module or bad things happen, but that seems really goofy. 
Why should the kernel be using ext3 for an initramfs? ext3 seems 
really stupid for a ram disk; what's to save?
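
One way to rule the cpio construction in or out is to walk the archive
by hand and confirm that the init entry is really there. The following is
only a rough sketch, not something taken from the kernel scripts: it
assumes an uncompressed "newc" format archive (gunzip it first if needed),
and the function name list_newc_entries is made up for illustration.

```python
NEWC_MAGIC = b"070701"
HEADER_LEN = 110  # 6-byte magic plus 13 fields of 8 hex digits each


def _pad4(offset):
    """Round an offset up to the next multiple of 4."""
    return (offset + 3) & ~3


def list_newc_entries(data):
    """Return (name, size) pairs for an uncompressed newc cpio archive.

    Hypothetical helper for illustration only; it stops at the first
    TRAILER!!! record and does not handle concatenated archives.
    """
    entries = []
    pos = 0
    while pos + HEADER_LEN <= len(data):
        header = data[pos:pos + HEADER_LEN]
        if header[:6] != NEWC_MAGIC:
            raise ValueError("bad cpio magic at offset %d" % pos)
        # Decode the 13 hex fields that follow the magic.
        fields = [int(header[6 + 8 * i:14 + 8 * i], 16) for i in range(13)]
        filesize, namesize = fields[6], fields[11]
        name_start = pos + HEADER_LEN
        # namesize counts the trailing NUL, so drop the last byte.
        name = data[name_start:name_start + namesize - 1].decode()
        if name == "TRAILER!!!":
            break
        # Both the name and the file data are padded to 4-byte boundaries.
        data_start = _pad4(name_start + namesize)
        entries.append((name, filesize))
        pos = _pad4(data_start + filesize)
    return entries
```

Running this over both the working and the failing archive and comparing
the lists should show whether an entry named "init" (member names are
normally stored without the leading slash) is present in both, and whether
the walk reaches the trailer without hitting a bad magic.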

Scott



Re: [linux-pm] [PATCH -mm] kexec jump -v9

2008-03-12 Thread Alan Stern
On Wed, 12 Mar 2008, Huang, Ying wrote:

> I think "kexec based hibernation" is currently the only available
> method of writing out the image without the freezer (after the driver work
> is done). If other processes are running, how do we prevent them from writing
> to disk without freezing them in the current implementation?

This is a very good question.

It's a matter of managing the block layer's request queues.  Somehow 
the existing I/O requests must remain blocked while the requests needed 
for writing the image must be allowed to proceed.

I don't know what would be needed to make this work, but it ought to be 
possible somehow...

Alan Stern




Re: [PATCH -mm] kexec jump -v9

2008-03-12 Thread Vivek Goyal
On Wed, Mar 12, 2008 at 10:14:34AM +0800, Huang, Ying wrote:
> On Wed, 2008-03-12 at 08:59 +1100, Nigel Cunningham wrote:
> > Hi all.
> > 
> > I hope kexec turns out to be a good, usable solution. Unfortunately,
> > however, I still have some areas where I'm not convinced that kexec is
> > going to work or work well:
> > 
> > 1. Reliability.
> > 
> > It's being sold as a replacement for freezing processes, yet AFAICS it's
> > still going to require the freezer in order to be reliable. In the
> > normal case, there isn't much of an issue with freeing memory or
> > allocating swap, and so these steps can be expected to progress without
> > pain. Imagine, however, the situation where another process or processes
> > are trying to allocate large amounts of memory at the same time, or the
> > system is swapping heavily. Although such situations will not be common,
> > they are entirely conceivable, and any implementation ought to be able
> > to handle such a situation efficiently. If the freezer is removed, any
> > hibernation implementation - not just kexec - is going to have a much
> > harder job of being reliable in all circumstances. AFAICS, the only way
> > a kexec based solution is going to be able to get around this will be to
> > not have to allocate memory, but that will require permanent allocation
> > > of memory for the kexec kernel and its work area as well as the
> > permanent, exclusive allocation of storage for the kexec hibernation
> > implementation that's currently in place (making the LCA complaint about
> > not being able to hibernate to swap on NTFS on fuse equally relevant).
> 
> As Eric said, kexec only needs to allocate memory during loading, not
> during execution.

Yes. But this memory gets reserved at loading time and then this memory
remains unused for the whole duration (except hibernation).

In the example you gave, it looks like you are reserving 15MB of memory for
the second kernel. In practice, we are finding it difficult to boot a regular
kernel in 16MB of memory in kdump. We are now reserving 128MB of memory
for the kdump kernel on x86; otherwise the OOM killer kicks in during init
or while the core is being copied.

Kexec based hibernation does not look any different than kdump in terms
of memory requirements. The only difference seems to be that kdump does
the contiguous memory reservation at boot time and kexec based hibernation
does the memory reservation at kernel loading time.

The only difference I can think of is, kdump will generally run on servers
and hibernation will be required on desktops/laptops and run time memory
requirements might be little different. I don't have numbers though.

At the same time carrying a separate kernel binary just for hibernation
purposes does not sound very good.
  
[..]
> > 3. Usability.
> > 
> > Right now, kexec based hibernation looks quite complicated to configure,
> > and the user is apparently going to have to remember to boot a different
> > kernel or at least a different bootloader entry in order to resume. Not
> 
> No, the newest implementation does not need to boot a different kernel or
> a different bootloader entry. You just use one bootloader entry; it will
> resume if there's an image and boot normally if there's not. You can
> look at the newest hibernation example description.
> 

Following is the step from new method you have given.

7. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
   root file system.

This says to use rootfs.gz as the initrd. Without modifying the boot
loader entry, how would I switch the initrd dynamically?

Looks like it might be a typo. So basically we can just boot back into
normal kernel and then a user can load the resumable core file and kexec
to it?

I think all this functionality can be packed into normal initrd itself
to make user interface better.

A user can configure the destination for the hibernated image at system
installation time, and the initrd will be modified accordingly, both to save
the hibernated image and to check that user-specified location to find out
whether a hibernation image is available and needs to be resumed.
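
As a rough illustration of the check the initrd would make at boot, here
is a minimal sketch. It assumes the hibernated image is an ELF file (like
the dump.elf discussed in this thread) at a location configured at install
time; the function name choose_boot_action and the "resume"/"normal"
return values are made up for the example.

```python
ELF_MAGIC = b"\x7fELF"


def choose_boot_action(image_path):
    """Return "resume" if an ELF image exists at image_path, else "normal".

    Hypothetical helper: a real initrd would also need to validate the
    image and invalidate it once it has been resumed.
    """
    try:
        with open(image_path, "rb") as f:
            head = f.read(4)
    except OSError:
        # No image (or unreadable location): fall through to a normal boot.
        return "normal"
    return "resume" if head == ELF_MAGIC else "normal"
```

With something like this in the normal initrd, a single bootloader entry
is enough: the same boot path either loads the image and jumps back, or
continues booting as usual.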

Thanks
Vivek



Re: [PATCH -mm] kexec jump -v9

2008-03-12 Thread Vivek Goyal
On Tue, Mar 11, 2008 at 08:17:45PM -0600, Eric W. Biederman wrote:
> "Huang, Ying" <[EMAIL PROTECTED]> writes:
> 
> > Yes. The entry point should be saved in dump.elf itself, this can be
> > done via a user-space tool such as "makedumpfile". Because
> > "makedumpfile" is also used to exclude free pages from disk image, it
> > needs a communication method between two kernels (to get backup pages
> > map or something like that from kernel A). We have talked about this
> > before.
> >
> > - Your opinion is to communicate via the purgatory. (But I don't know
> > how to communicate between kernel A and purgatory).
> 
> How about the return address on the stack?
> 

I think he needs to pass on much more data than just return address. 

IIUC, he needs to pass the backup pages map to the new kernel, so that any
user space tool can use the backup pages map to reconstruct/rearrange the
first kernel's memory core, and tools like makedumpfile can do filtering
before the hibernated image is saved.

This brings me to a random thought. Can we break the process of loading
a hibernation kernel into separate steps?

- In first step just do the memory reservation for running second kernel.
  (kexec -l )

- This memory map of reserved pages is exported to user space.

- Use this memory map and regenerate the hibernation kernel initrd
  (rootfs.gz) and put the memory map there. This memory map can be used
  by makedumpfile in second kernel for filtering.

This way it will be user space to user space communication of information 
that gets fixed at kernel loading time.

> > - Eric's opinion is to communicate between the user space in kernel A
> > and user space in kernel B.
> 
> Purgatory is for all intents and purposes user space.  Because the
> return address falls on the trampoline page we won't know it's
> address before we call kexec.  But a return address and a stack
> on that page should be a perfectly good way to communicate.
> 
> > - My opinion is to communicate between two kernel directly.
> >
> > I think as a minimal infrastructure patch, we can communicate minimal
> > information between user space of two kernels. When we have consensus on
> > this topic, we can use makedumpfile for both excluding free pages and
> > saving the entry point. Now, we can save the entry point in a separate
> > file or I can write a simple tool to do this.
> 
> We need a fixed protocol so we do not make assumptions about how things
> will be implemented, allowing kernels to diverge and all kinds of other
> good things.
> 
> For communicating extra information from the kernel being shut down
> we have elf notes.
> 
> Direct kernel to kernel communication is forbidden.  We must have
> a well defined protocol.  Allowing the implementations to change
> at their different speeds, and still work together.
> 

Agreed. Without a proper protocol, we will often run into issues where
version X of the kernel does not work with version Y of the hibernation
kernel, etc.

> >> Maybe we can have a separate load flag (--load-resume-image) to mark
> >> that we are resuming a hibernated image and kexec does not have to
> >> prepare commandline, does not have to prepare zero page/setup page etc.
> >
> > There is already similar flag in original kexec-tools implementation:
> > "--args-none". If it is specified, kexec-tools does not prepare command
> > line and zero page/setup page etc. I think we can just re-use this flag.
> > And if desired, an alias is fine with me too.
> 
> My gut feel is we look at the image and detect what kind it is, and simply
> not enable image processing after we have read the note that says it
> is a resumable core or whatever.
> 

That makes sense. We would just have to put some kind of ELF note
or some other identifier in the resumable core file to identify it.
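
To make the detection idea concrete, here is a small sketch of scanning
the raw bytes of a PT_NOTE segment for a marker note. The note layout
(namesz, descsz, type, then name and descriptor each padded to 4 bytes) is
the standard ELF one; the note name "KEXEC-RESUME" and type 1 are purely
hypothetical placeholders, since no such identifier has been agreed on.

```python
import struct

# Hypothetical marker: neither the name nor the type is defined anywhere.
RESUME_NOTE_NAME = "KEXEC-RESUME"
RESUME_NOTE_TYPE = 1


def iter_elf_notes(segment, byteorder="<"):
    """Yield (name, type, desc) for each note in a PT_NOTE segment's bytes."""
    pos = 0
    while pos + 12 <= len(segment):
        namesz, descsz, ntype = struct.unpack_from(byteorder + "III", segment, pos)
        pos += 12
        name = segment[pos:pos + namesz].rstrip(b"\0").decode()
        pos += (namesz + 3) & ~3  # name is padded to a 4-byte boundary
        desc = segment[pos:pos + descsz]
        pos += (descsz + 3) & ~3  # so is the descriptor
        yield name, ntype, desc


def is_resumable_core(segment):
    """Return True if the segment carries the (hypothetical) resume note."""
    return any(
        name == RESUME_NOTE_NAME and ntype == RESUME_NOTE_TYPE
        for name, ntype, _ in iter_elf_notes(segment)
    )
```

The descriptor of such a note would also be a natural place to carry the
entry point, so that the loader gets both the image type and the jump
address from the same record.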

Thanks
Vivek



Re: [PATCH -mm] kexec jump -v9

2008-03-12 Thread Vivek Goyal
On Wed, Mar 12, 2008 at 09:45:26AM +0800, Huang, Ying wrote:

[..]
> > I have thought through it again and tried to put together some of the
> > new kexec options we can introduce to make the whole thing work. I am 
> > considering a simple case where a user boots kernel A and then
> > launches kernel B using "kexec --load-preserve-context". Now the user
> > might save the hibernated image or might want to come back to A.
> > 
> > - kexec -l 
> > Normal kexec functionality. Boot a new kernel, without preserving
> > existing kernel's context.
> > 
> > - kexec --load-preserve-context 
> > Boot a new kernel while preserving existing kernel's context.
> > 
> > Will be used for booting kernel B for the first time.
> > 
> > - kexec --load-resume-image 
> 
> In original kexec-tools, this can be done through:
> kexec -l --args-none 
> 
> Do you need to define an alias for it?

Ok, we can get rid of --load-resume-image and go by Eric's idea
of detecting image type and taking action accordingly.

Thanks
Vivek



Re: [linux-pm] [PATCH -mm] kexec jump -v9

2008-03-12 Thread Rafael J. Wysocki
On Wednesday, 12 of March 2008, Alan Stern wrote:
> On Wed, 12 Mar 2008, Huang, Ying wrote:
> 
> > I think "kexec based hibernation" is currently the only available
> > method of writing out the image without the freezer (after the driver work
> > is done). If other processes are running, how do we prevent them from writing
> > to disk without freezing them in the current implementation?
> 
> This is a very good question.
> 
> It's a matter of managing the block layer's request queues.  Somehow 
> the existing I/O requests must remain blocked while the requests needed 
> for writing the image must be allowed to proceed.
> 
> I don't know what would be needed to make this work, but it ought to be 
> possible somehow...

Yes, it ought to be possible.

Ultimately, IMHO, we should put all devices unnecessary for saving the image
(and doing some eye-candy work) into low power states before the image is
created and keep them in low power states until the system is eventually
powered off.

If this is done, the remaining problem is the handling of the devices that we
need to save the image.  I believe that will be achievable without using the
freezer.

Thanks,
Rafael



Re: [PATCH -mm] kexec jump -v9

2008-03-12 Thread Eric W. Biederman
Vivek Goyal <[EMAIL PROTECTED]> writes:

> Yes. But this memory gets reserved at loading time and then this memory
> remains unused for the whole duration (except hibernation).
>
> In the example you gave, it looks like you are reserving 15MB of memory for
> the second kernel. In practice, we are finding it difficult to boot a regular
> kernel in 16MB of memory in kdump. We are now reserving 128MB of memory
> for the kdump kernel on x86; otherwise the OOM killer kicks in during init
> or while the core is being copied.

Sounds like something we may want to fix.  Living at the default kernel
address may alleviate that problem somewhat.

> Kexec based hibernation does not look any different than kdump in terms
> of memory requirements. The only difference seems to be that kdump does
> the contiguous memory reservation at boot time and kexec based hibernation
> does the memory reservation at kernel loading time.
>
> The only difference I can think of is, kdump will generally run on servers
> and hibernation will be required on desktops/laptops and run time memory
> requirements might be little different. I don't have numbers though.
>
> At the same time carrying a separate kernel binary just for hibernation
> purposes does not sound very good.

One difference is that you only pay the memory penalty just before you hibernate,
instead of continuously.  So potentially you could swap things out to
make room for the kernel to save you to disk.

> [..]
>> > 3. Usability.
>> > 
>> > Right now, kexec based hibernation looks quite complicated to configure,
>> > and the user is apparently going to have to remember to boot a different
>> > kernel or at least a different bootloader entry in order to resume. Not
>> 
>> No, the newest implementation does not need to boot a different kernel or
>> a different bootloader entry. You just use one bootloader entry; it will
>> resume if there's an image and boot normally if there's not. You can
>> look at the newest hibernation example description.
>> 
>
> Following is the step from new method you have given.
>
> 7. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
>    root file system.
>
> This says to use rootfs.gz as the initrd. Without modifying the boot
> loader entry, how would I switch the initrd dynamically?
>
> Looks like it might be a typo. So basically we can just boot back into
> normal kernel and then a user can load the resumable core file and kexec
> to it?
>
> I think all this functionality can be packed into normal initrd itself
> to make user interface better.
>
> A user can configure the destination for the hibernated image at system
> installation time, and the initrd will be modified accordingly, both to save
> the hibernated image and to check that user-specified location to find out
> whether a hibernation image is available and needs to be resumed.

Yes.  And we don't need to load any of this until just before hibernation
time so we should be able to change things right up until the last moment.

Eric



Re: [linux-pm] [PATCH -mm] kexec jump -v9

2008-03-12 Thread Eric W. Biederman
"Rafael J. Wysocki" <[EMAIL PROTECTED]> writes:

> Yes, it ought to be possible.
>
> Ultimately, IMHO, we should put all devices unnecessary for saving the image
> (and doing some eye-candy work) into low power states before the image is
> created and keep them in low power states until the system is eventually
> powered off.

Why?  I guess I don't see why we care what power state the devices are in.
Especially since we should be able to quickly save the image.

We do need to disconnect the drivers from the hardware, so that filesystems
still work and applications that do direct hardware access still work
and don't need to reopen their connections.

I'm leery of low power states as they don't always work, and bringing
in low power states seems to confuse hibernation to disk with suspend to
ram.

> If this is done, the remaining problem is the handling of the devices that we
> need to save the image.  I believe that will be achievable without using the
> freezer.

Reasonable.  In general the problem is much easier if we don't store
the hibernation image in a filesystem or partition that the rest of
the system is using.  That way we avoid inconsistencies.

Eric



Re: bug/patch for i386 EFI boot

2008-03-12 Thread Eric W. Biederman
"Scott D. Davilla" <[EMAIL PROTECTED]> writes:
> Done and resubmitted with a proper subject line with commented out 
> lines removed. VMware was mangling the leading tabs on the drag and 
> drop from Linux to OS X ???
>
> And as a follow-up question: is there any access to 
> screen_info.orig_video_isVGA besides linking to the kernel? If there 
> is access to orig_video_isVGA then kexec can set up the screen boot 
> params as the bootloader intended instead of assuming a default VGA 
> config. orig_video_isVGA is the only parameter missing to clone 
> the initial screen_info information.

Let me take a stab at answering part of this.

Originally I recall kexec passed a configuration for no screen at all.

I think that is still the default for how we set up the data
structures.  Then I merged a patch that detected which
type of screen there is (which you have recently amended).

The goal in kexec in this area has always been to use the user space
APIs and generate the data a normal bootloader or the 16bit setup code
would, without performing the BIOS calls.

If you can see a better way to do this we should go for it.

What we want is not so much the screen layout that the bootloader
provided but the current video mode.  In frame buffer consoles
generally this can not change so it is a pass through.

Hopefully this helps a little.

Eric
