Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.

2018-10-25 Thread Huang, Ying
Arun KS  writes:

> Remove managed_page_count_lock spinlock and instead use atomic
> variables.
>
> Suggested-by: Michal Hocko 
> Suggested-by: Vlastimil Babka 
> Signed-off-by: Arun KS 
>
> ---
> As discussed here,
> https://patchwork.kernel.org/patch/10627521/#22261253

My 2 cents.  I think you should include at least part of the discussion
in the patch description to make it more readable by itself.

Best Regards,
Huang, Ying
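
For readers without the linked discussion at hand, the idea in the quoted
description can be sketched in user space. This is only an illustration
under assumptions: it uses C11 <stdatomic.h> as a stand-in for the kernel's
atomic types, and the names managed_pages, managed_pages_add and
managed_pages_read are hypothetical, not taken from the patch.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a page counter that used to need a lock. */
static atomic_long managed_pages;

/* Add pages (or subtract, with a negative delta) without taking a lock. */
static void managed_pages_add(long delta)
{
	atomic_fetch_add(&managed_pages, delta);
}

/* Read a snapshot of the current counter value. */
static long managed_pages_read(void)
{
	return atomic_load(&managed_pages);
}
```

Each update is a single atomic read-modify-write, so concurrent callers do
not need a shared spinlock just to keep the counter consistent.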

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC 1/3] PCI/PM: Fix kexec for D3cold and bridge suspending

2012-09-20 Thread Huang Ying
On Thu, 2012-09-20 at 21:20 +0200, Rafael J. Wysocki wrote:
> On Monday, September 17, 2012, Bjorn Helgaas wrote:
> > +cc Eric and kexec list
> > 
> > On Mon, Sep 17, 2012 at 2:54 AM, Huang Ying  wrote:
> > > If PCI devices are put into D3cold before kexec, their configuration
> > > registers are not accessible.
> > >
> > > And if PCI bridges are put into a low power state before kexec, the
> > > configuration registers of PCI devices underneath those bridges are
> > > not accessible either.
> > >
> > > As a result, some PCI devices cannot be scanned after kexec, so
> > > resume the PCI devices in D3cold and the PCI bridges in low power
> > > state before kexec.
> > 
> > Don't we need to resume the device even without the kexec issue?  And
> > even if it's in D1 or D2?
> > 
> > It looks to me like pci_msi_shutdown() (and probably drv->shutdown())
> > depend on the device being in D0.
> 
> We should in theory, but we didn't do any power management of PCI bridges
> before, so this is the first time we have a problem with it.
> 
> So I'd say, yeah, let's resume if current_state is between D1 and D3cold
> inclusive and the kexec comment is not very helpful (the problem is not
> kexec-specific in general).

Should we resume from D1 through D3cold for any device, or just for bridges?

Best Regards,
Huang Ying

> Speaking of kexec, it might consider using the hibernation device freeze
> instead of device shutdown (which the kexec jump feature does).  I've seen
> reports of problems that would be solved this way most likely.
> 
> Thanks,
> Rafael
> 
> 
> > > Signed-off-by: Huang Ying 
> > > ---
> > >  drivers/pci/pci-driver.c |4 
> > >  1 file changed, 4 insertions(+)
> > >
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -421,6 +421,10 @@ static void pci_device_shutdown(struct d
> > > struct pci_dev *pci_dev = to_pci_dev(dev);
> > > struct pci_driver *drv = pci_dev->driver;
> > >
> > > +   /* Resume bridges and devices in D3cold for kexec to work 
> > > properly */
> > > +   if (pci_dev->current_state == PCI_D3cold || pci_dev->subordinate)
> > > +   pm_runtime_resume(dev);
> > > +
> > > if (drv && drv->shutdown)
> > > drv->shutdown(pci_dev);
> > > pci_msi_shutdown(pci_dev);
> > 
> > 
> 
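
The two resume conditions discussed in this thread can be compared with a
small user-space sketch. The enum and function names below are hypothetical;
only the ordering D0 < D1 < D2 < D3hot < D3cold and the two conditions
themselves come from the thread.

```c
#include <stdbool.h>

/* Hypothetical mirror of the PCI power states, ordered from D0 to D3cold. */
enum pm_state { PM_D0, PM_D1, PM_D2, PM_D3HOT, PM_D3COLD };

/* Condition in the posted patch: device is in D3cold, or it is a bridge. */
static bool needs_resume_patch(enum pm_state state, bool is_bridge)
{
	return state == PM_D3COLD || is_bridge;
}

/* Condition suggested in review: any state from D1 to D3cold inclusive. */
static bool needs_resume_review(enum pm_state state)
{
	return state >= PM_D1 && state <= PM_D3COLD;
}
```

The difference shows up for a non-bridge device in D1, D2 or D3hot: the
posted patch leaves it alone, the reviewed condition resumes it.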





Re: [RFC 1/3] PCI/PM: Fix kexec for D3cold and bridge suspending

2012-09-20 Thread Huang Ying
On Thu, 2012-09-20 at 00:38 -0700, Eric W. Biederman wrote:
> Bjorn Helgaas  writes:
> 
> > +cc Eric and kexec list
> >
> > On Mon, Sep 17, 2012 at 2:54 AM, Huang Ying  wrote:
> >> If PCI devices are put into D3cold before kexec, their configuration
> >> registers are not accessible.
> >>
> >> And if PCI bridges are put into a low power state before kexec, the
> >> configuration registers of PCI devices underneath those bridges are
> >> not accessible either.
> >>
> >> As a result, some PCI devices cannot be scanned after kexec, so
> >> resume the PCI devices in D3cold and the PCI bridges in low power
> >> state before kexec.
> >
> > Don't we need to resume the device even without the kexec issue?  And
> > even if it's in D1 or D2?
> 
> The basic requirement is that the device needs to be visible so we can
> auto discover it.  As I recall most sleep states don't make the device
> invisible and we can handle the rest in the device initialization code.

PCI devices in D3cold or under a bridge in D3hot will not be visible, so
we must fix that for kexec to run properly.

> > It looks to me like pci_msi_shutdown() (and probably drv->shutdown())
> > depend on the device being in D0.
> 
> There is certainly a dependency on the config registers being visible.
> Although I don't know if much will go wrong if they aren't.
> 
> Certainly pci_msi_shutdown doesn't have anything to do if the device has
> had so much power removed that the device is not even executing.

I don't know which power state a device should be in for pci_msi_shutdown()
etc.  But it appears that normal shutdown/reboot and kexec have worked most
of the time so far.  Devices in D3cold and bridges in D3hot work for normal
shutdown/reboot, but not for kexec.  So I wrote this fix.

Best Regards,
Huang Ying

> >> Signed-off-by: Huang Ying 
> >> ---
> >>  drivers/pci/pci-driver.c |4 
> >>  1 file changed, 4 insertions(+)
> >>
> >> --- a/drivers/pci/pci-driver.c
> >> +++ b/drivers/pci/pci-driver.c
> >> @@ -421,6 +421,10 @@ static void pci_device_shutdown(struct d
> >> struct pci_dev *pci_dev = to_pci_dev(dev);
> >> struct pci_driver *drv = pci_dev->driver;
> >>
> >> +   /* Resume bridges and devices in D3cold for kexec to work properly */
> >> +   if (pci_dev->current_state == PCI_D3cold || pci_dev->subordinate)
> >> +   pm_runtime_resume(dev);
> >> +
> >> if (drv && drv->shutdown)
> >> drv->shutdown(pci_dev);
> >> pci_msi_shutdown(pci_dev);
> >





Re: [Bug] Kdump does not work when panic triggered due to MCE

2011-05-09 Thread Huang Ying
Hi, Prasad,

On 05/10/2011 12:35 AM, K.Prasad wrote:
> On Fri, May 06, 2011 at 07:38:25PM +0200, Andi Kleen wrote:
>>> Has anybody tested this before? Or have found kdump working when fatal
>>> MCEs have actually occurred?
>>
>> Ying did some testing. mce-test has test cases for kdump.
>>
> 
> We'd be glad to hear about any successful testcases with recent kernels.
> My manual testing was quite similar to what the LTP kdump testcase would
> do i.e. configure kdump service, trigger crash through
> /proc/sysrq-trigger and watch out for kdump, but as you could see in
> the logs, that did not happen.
> 
>> My guess is you injected the error into some area used by the kexec
>> code or boot up path of the kexec kernel.
>>
>> -Andi
> 
> The logs did not suggest that the second kernel was booted into. The
> "Rebooting in ... seconds" message appeared from the first kernel. I
> tried the kdump testcase in at least two dissimilar machines but with
> the same results, so it is not clear if the kexec code was affected by
> the MCE injection in both the cases.

From your panic logs, it seems that a panic is triggered by the MCE on one
CPU and, while crash_kexec is executing, another panic is triggered on
another CPU by the timeout mechanism in the MCE code.  We have seen
something like that while developing mce-test.  Please try the following
command line for MCE injection:

mce-inject --no-random /home/prasadkr/mce/mce-test/cases/soft-inj/panic_ucr/data/srar_over

This is also what the kdump test driver of mce-test uses.

Best Regards,
Huang Ying



Re: [PATCH][EFI] Run EFI in physical mode

2010-08-15 Thread huang ying
On Mon, Aug 16, 2010 at 11:27 AM, H. Peter Anvin  wrote:
> No, it should not be dynamic; rather we should unify all the users who need a 
> 1:1 map and just keep that page table set around.

Agreed. One known issue with a global 1:1 map is that at least part of the
page table must be PAGE_KERNEL_EXEC for the EFI runtime code, and
change_page_attr cannot be used before the page allocator is available.

Best Regards,
Huang Ying



Re: [PATCH][EFI] Run EFI in physical mode

2010-08-15 Thread huang ying
On Sat, Aug 14, 2010 at 3:18 AM, Takao Indoh  wrote:
> diff -Nurp linux-2.6.35.org/arch/x86/kernel/efi_64.c linux-2.6.35/arch/x86/kernel/efi_64.c
> --- linux-2.6.35.org/arch/x86/kernel/efi_64.c   2010-08-01 18:11:14.0 -0400
> +++ linux-2.6.35/arch/x86/kernel/efi_64.c       2010-08-13 14:39:25.819105004 -0400
> @@ -39,7 +39,9 @@
>  #include 
>
>  static pgd_t save_pgd __initdata;
> -static unsigned long efi_flags __initdata;
> +static unsigned long efi_flags;
> +static pgd_t efi_pgd[PTRS_PER_PGD] __page_aligned_bss;
> +static unsigned long save_cr3;
>
>  static void __init early_mapping_set_exec(unsigned long start,
>                                          unsigned long end,
> @@ -98,6 +100,19 @@ void __init efi_call_phys_epilog(void)
>        early_runtime_code_mapping_set_exec(0);
>  }
>
> +void efi_call_phys_prelog_in_physmode(void)
> +{
> +       local_irq_save(efi_flags);
> +       save_cr3 = read_cr3();
> +       write_cr3(virt_to_phys(efi_pgd));
> +}
> +
> +void efi_call_phys_epilog_in_physmode(void)
> +{
> +       write_cr3(save_cr3);
> +       local_irq_restore(efi_flags);
> +}

efi_flags and save_cr3 should be per-CPU, because they will now be
used after SMP is enabled.

efi_pgd should be dynamically allocated instead of statically
allocated, because EFI may not be enabled on some platforms.

And I think it is better to unify the early physical mode with the
run-time physical mode. Just allocate the page table with the early page
allocator (lmb?).

Best Regards,
Huang Ying



[PATCH -v2] kexec jump support for x86_64

2009-12-08 Thread Huang Ying
x86_64 specific support, including crash memory range and purgatory setup.
Corresponding kernel support has been merged already.

Together with the kexec jump feature in the Linux kernel, kexec jump can
be used for the following:

- A simple hibernation implementation without ACPI support. You can
  kexec a hibernating kernel, save the memory image of the original
  system and shut down the system. When resuming, you restore the memory
  image of the original system via an ordinary kexec load, then jump back.

- Kernel/system debugging through system snapshots. You can make a
  system snapshot with kexec/kdump, jump back, do something and make
  another system snapshot.

- Cooperative multi-kernel/system. With kexec jump, you can switch
  between several kernels/systems quickly, without a boot process except
  the first time. This looks like swapping a whole kernel/system out and in.

- A general method to call a program in physical mode (paging turned
  off). This can be used to invoke BIOS code under Linux.

Signed-off-by: Huang Ying 

---
 kexec/arch/x86_64/crashdump-x86_64.c |   48 ++-
 purgatory/arch/x86_64/purgatory-x86_64.c |   11 ++-
 purgatory/arch/x86_64/setup-x86_64.S |3 +
 3 files changed, 48 insertions(+), 14 deletions(-)

--- a/kexec/arch/x86_64/crashdump-x86_64.c
+++ b/kexec/arch/x86_64/crashdump-x86_64.c
@@ -161,7 +161,8 @@ static struct memory_range crash_reserve
  * to look into down the line. May be something like /proc/kernelmem or may
  * be zone data structures exported from kernel.
  */
-static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
+static int get_crash_memory_ranges(struct memory_range **range, int *ranges,
+  int kexec_flags)
 {
const char *iomem= proc_iomem();
int memory_ranges = 0, gart = 0;
@@ -179,10 +180,12 @@ static int get_crash_memory_ranges(struc
 
/* First entry is for first 640K region. Different bios report first
 * 640K in different manner hence hardcoding it */
-   crash_memory_range[0].start = 0x;
-   crash_memory_range[0].end = 0x0009;
-   crash_memory_range[0].type = RANGE_RAM;
-   memory_ranges++;
+   if (!(kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+   crash_memory_range[0].start = 0x;
+   crash_memory_range[0].end = 0x0009;
+   crash_memory_range[0].type = RANGE_RAM;
+   memory_ranges++;
+   }
 
while(fgets(line, sizeof(line), fp) != 0) {
char *str;
@@ -239,6 +242,22 @@ static int get_crash_memory_ranges(struc
memory_ranges++;
}
fclose(fp);
+   if (kexec_flags & KEXEC_PRESERVE_CONTEXT) {
+   int i;
+   for (i = 0; i < memory_ranges; i++) {
+   if (crash_memory_range[i].end > 0x0009) {
+   crash_reserved_mem.start = \
+   crash_memory_range[i].start;
+   break;
+   }
+   }
+   if (crash_reserved_mem.start >= mem_max) {
+   fprintf(stderr, "Too small mem_max: 0x%llx.\n", mem_max);
+   return -1;
+   }
+   crash_reserved_mem.end = mem_max;
+   crash_reserved_mem.type = RANGE_RAM;
+   }
if (exclude_region(&memory_ranges, crash_reserved_mem.start,
crash_reserved_mem.end) < 0)
return -1;
@@ -590,7 +609,8 @@ int load_crashdump_segments(struct kexec
if (get_kernel_vaddr_and_size(info))
return -1;
 
-   if (get_crash_memory_ranges(&mem_range, &nr_ranges) < 0)
+   if (get_crash_memory_ranges(&mem_range, &nr_ranges,
+   info->kexec_flags) < 0)
return -1;
 
/* Memory regions which panic kernel can safely use to boot into */
@@ -602,13 +622,15 @@ int load_crashdump_segments(struct kexec
add_memmap(memmap_p, crash_reserved_mem.start, sz);
 
/* Create a backup region segment to store backup data*/
-   sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
-   tmp = xmalloc(sz);
-   memset(tmp, 0, sz);
-   info->backup_start = add_buffer(info, tmp, sz, sz, align,
-   0, max_addr, 1);
-   if (delete_memmap(memmap_p, info->backup_start, sz) < 0)
-   return -1;
+   if (!(info->kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+   sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
+   tmp = xmalloc(sz);
+   memset(tmp, 0, sz);
+   info->backup_start = add_buffer(info, tmp, sz, sz, align,
+   0, max_addr, 1);
+   if (delete_memmap(memmap_p, info->backup_start, sz)

[PATCH] kexec jump support for x86_64

2009-12-08 Thread Huang Ying
x86_64 specific support, including crash memory range and purgatory setup.
Corresponding kernel support has been merged already.

Signed-off-by: Huang Ying 

---
 kexec/arch/x86_64/crashdump-x86_64.c |   48 ++-
 purgatory/arch/x86_64/purgatory-x86_64.c |   11 ++-
 purgatory/arch/x86_64/setup-x86_64.S |3 +
 3 files changed, 48 insertions(+), 14 deletions(-)

--- a/kexec/arch/x86_64/crashdump-x86_64.c
+++ b/kexec/arch/x86_64/crashdump-x86_64.c
@@ -161,7 +161,8 @@ static struct memory_range crash_reserve
  * to look into down the line. May be something like /proc/kernelmem or may
  * be zone data structures exported from kernel.
  */
-static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
+static int get_crash_memory_ranges(struct memory_range **range, int *ranges,
+  int kexec_flags)
 {
const char *iomem= proc_iomem();
int memory_ranges = 0, gart = 0;
@@ -179,10 +180,12 @@ static int get_crash_memory_ranges(struc
 
/* First entry is for first 640K region. Different bios report first
 * 640K in different manner hence hardcoding it */
-   crash_memory_range[0].start = 0x;
-   crash_memory_range[0].end = 0x0009;
-   crash_memory_range[0].type = RANGE_RAM;
-   memory_ranges++;
+   if (!(kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+   crash_memory_range[0].start = 0x;
+   crash_memory_range[0].end = 0x0009;
+   crash_memory_range[0].type = RANGE_RAM;
+   memory_ranges++;
+   }
 
while(fgets(line, sizeof(line), fp) != 0) {
char *str;
@@ -239,6 +242,22 @@ static int get_crash_memory_ranges(struc
memory_ranges++;
}
fclose(fp);
+   if (kexec_flags & KEXEC_PRESERVE_CONTEXT) {
+   int i;
+   for (i = 0; i < memory_ranges; i++) {
+   if (crash_memory_range[i].end > 0x0009) {
+   crash_reserved_mem.start = \
+   crash_memory_range[i].start;
+   break;
+   }
+   }
+   if (crash_reserved_mem.start >= mem_max) {
+   fprintf(stderr, "Too small mem_max: 0x%llx.\n", mem_max);
+   return -1;
+   }
+   crash_reserved_mem.end = mem_max;
+   crash_reserved_mem.type = RANGE_RAM;
+   }
if (exclude_region(&memory_ranges, crash_reserved_mem.start,
crash_reserved_mem.end) < 0)
return -1;
@@ -590,7 +609,8 @@ int load_crashdump_segments(struct kexec
if (get_kernel_vaddr_and_size(info))
return -1;
 
-   if (get_crash_memory_ranges(&mem_range, &nr_ranges) < 0)
+   if (get_crash_memory_ranges(&mem_range, &nr_ranges,
+   info->kexec_flags) < 0)
return -1;
 
/* Memory regions which panic kernel can safely use to boot into */
@@ -602,13 +622,15 @@ int load_crashdump_segments(struct kexec
add_memmap(memmap_p, crash_reserved_mem.start, sz);
 
/* Create a backup region segment to store backup data*/
-   sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
-   tmp = xmalloc(sz);
-   memset(tmp, 0, sz);
-   info->backup_start = add_buffer(info, tmp, sz, sz, align,
-   0, max_addr, 1);
-   if (delete_memmap(memmap_p, info->backup_start, sz) < 0)
-   return -1;
+   if (!(info->kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+   sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
+   tmp = xmalloc(sz);
+   memset(tmp, 0, sz);
+   info->backup_start = add_buffer(info, tmp, sz, sz, align,
+   0, max_addr, 1);
+   if (delete_memmap(memmap_p, info->backup_start, sz) < 0)
+   return -1;
+   }
 
/* Create elf header segment and store crash image data. */
if (crash_create_elf64_headers(info, &elf_info,
--- a/purgatory/arch/x86_64/purgatory-x86_64.c
+++ b/purgatory/arch/x86_64/purgatory-x86_64.c
@@ -6,6 +6,7 @@
 uint8_t reset_vga = 0;
 uint8_t legacy_pic = 0;
 uint8_t panic_kernel = 0;
+unsigned long jump_back_entry = 0;
 char *cmdline_end = NULL;
 
 void setup_arch(void)
@@ -14,8 +15,16 @@ void setup_arch(void)
if (legacy_pic)   x86_setup_legacy_pic();
 }
 
+void x86_setup_jump_back_entry(void)
+{
+   if (cmdline_end)
+   sprintf(cmdline_end, " kexec_jump_back_entry=0x%lx",
+   jump_back_entry);
+}
+
 /* This function can be used to execute after the SHA256 verification. */
 void post_verification_setup_arch(void)
 {
- 
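
The x86_setup_jump_back_entry() hunk above appends an option at cmdline_end.
A minimal user-space sketch of that step follows. It is an illustration
under assumptions: the function name append_jump_back_entry and the `room`
bound are hypothetical, and it uses snprintf where the patch uses sprintf.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append the jump-back entry option at the current end of a command line.
 * `end` plays the role of cmdline_end; `room` is a hypothetical bound on
 * the space left in the buffer, including the terminating NUL.
 * Returns 0 on success, -1 if there is no buffer or not enough room.
 */
static int append_jump_back_entry(char *end, size_t room, unsigned long entry)
{
	int n;

	if (!end)
		return -1;
	n = snprintf(end, room, " kexec_jump_back_entry=0x%lx", entry);
	return (n < 0 || (size_t)n >= room) ? -1 : 0;
}
```

As in the patch, a NULL end pointer means there is no command line to
extend, so nothing is written.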

[PATCH resend] kexec/x86_64: Use one page table in x86_64 machine_kexec

2009-02-03 Thread Huang Ying
Impact: reduce kernel BSS size by 7 pages, improve code readability

Two page tables are used in the current x86_64 kexec implementation. One
is used to jump from the kernel virtual address to the identity-mapped
address, the other is used to map all physical memory. In fact, on x86_64,
there is no conflict between the kernel virtual address space and the
physical memory space, so one page table is sufficient. The page table
pages used to map the control page are dynamically allocated, to save
memory when no kexec image is loaded. The ASM code used to map the control
page is replaced by C code too.
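
As a rough illustration of the walk the new C code performs, the sketch
below computes the four table indices for a virtual address under x86_64
4-level paging with 4 KiB pages (9 index bits per level), and the size of
the seven static 512-entry tables the patch removes. The helper names are
hypothetical and this is user-space code, not the kernel implementation.

```c
#include <stdint.h>

/* x86_64 4-level paging: 512 entries per table, 9 index bits per level. */
#define TABLE_ENTRIES 512u
#define PTE_SHIFT 12
#define PMD_SHIFT 21
#define PUD_SHIFT 30
#define PGD_SHIFT 39

/* Extract the 9-bit table index that starts at bit `shift`. */
static unsigned table_index(uint64_t vaddr, unsigned shift)
{
	return (unsigned)((vaddr >> shift) & (TABLE_ENTRIES - 1));
}

static unsigned pgd_idx(uint64_t v) { return table_index(v, PGD_SHIFT); }
static unsigned pud_idx(uint64_t v) { return table_index(v, PUD_SHIFT); }
static unsigned pmd_idx(uint64_t v) { return table_index(v, PMD_SHIFT); }
static unsigned pte_idx(uint64_t v) { return table_index(v, PTE_SHIFT); }

/* Bytes used by the seven static 512-entry u64 tables removed by the patch. */
static unsigned long removed_bss_bytes(void)
{
	return 7UL * TABLE_ENTRIES * sizeof(uint64_t);
}
```

For the address 0xffffffff80000000 the indices are 511, 510, 0 and 0, and
the seven tables add up to 7 * 4096 = 28672 bytes, which matches the
"7 pages" in the Impact line.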

Signed-off-by: Huang Ying 

---
 arch/x86/include/asm/kexec.h |   27 ++-
 arch/x86/kernel/machine_kexec_64.c   |   82 +++---
 arch/x86/kernel/relocate_kernel_64.S |  125 ---
 3 files changed, 67 insertions(+), 167 deletions(-)

--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -9,23 +9,8 @@
 # define PAGES_NR  4
 #else
 # define PA_CONTROL_PAGE   0
-# define VA_CONTROL_PAGE   1
-# define PA_PGD2
-# define VA_PGD3
-# define PA_PUD_0  4
-# define VA_PUD_0  5
-# define PA_PMD_0  6
-# define VA_PMD_0  7
-# define PA_PTE_0  8
-# define VA_PTE_0  9
-# define PA_PUD_1  10
-# define VA_PUD_1  11
-# define PA_PMD_1  12
-# define VA_PMD_1  13
-# define PA_PTE_1  14
-# define VA_PTE_1  15
-# define PA_TABLE_PAGE 16
-# define PAGES_NR  17
+# define PA_TABLE_PAGE 1
+# define PAGES_NR  2
 #endif
 
 #ifdef CONFIG_X86_32
@@ -157,9 +142,9 @@ relocate_kernel(unsigned long indirectio
unsigned long start_address) ATTRIB_NORET;
 #endif
 
-#ifdef CONFIG_X86_32
 #define ARCH_HAS_KIMAGE_ARCH
 
+#ifdef CONFIG_X86_32
 struct kimage_arch {
pgd_t *pgd;
 #ifdef CONFIG_X86_PAE
@@ -169,6 +154,12 @@ struct kimage_arch {
pte_t *pte0;
pte_t *pte1;
 };
+#else
+struct kimage_arch {
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+};
 #endif
 
 #endif /* __ASSEMBLY__ */
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,15 +18,6 @@
 #include 
 #include 
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u64 kexec_pgd[512] PAGE_ALIGNED;
-static u64 kexec_pud0[512] PAGE_ALIGNED;
-static u64 kexec_pmd0[512] PAGE_ALIGNED;
-static u64 kexec_pte0[512] PAGE_ALIGNED;
-static u64 kexec_pud1[512] PAGE_ALIGNED;
-static u64 kexec_pmd1[512] PAGE_ALIGNED;
-static u64 kexec_pte1[512] PAGE_ALIGNED;
-
 static void init_level2_page(pmd_t *level2p, unsigned long addr)
 {
unsigned long end_addr;
@@ -107,12 +98,65 @@ out:
return result;
 }
 
+static void free_transition_pgtable(struct kimage *image)
+{
+   free_page((unsigned long)image->arch.pud);
+   free_page((unsigned long)image->arch.pmd);
+   free_page((unsigned long)image->arch.pte);
+}
+
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+{
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   unsigned long vaddr, paddr;
+   int result = -ENOMEM;
+
+   vaddr = (unsigned long)relocate_kernel;
+   paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+   pgd += pgd_index(vaddr);
+   if (!pgd_present(*pgd)) {
+   pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
+   if (!pud)
+   goto err;
+   image->arch.pud = pud;
+   set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+   }
+   pud = pud_offset(pgd, vaddr);
+   if (!pud_present(*pud)) {
+   pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   if (!pmd)
+   goto err;
+   image->arch.pmd = pmd;
+   set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+   }
+   pmd = pmd_offset(pud, vaddr);
+   if (!pmd_present(*pmd)) {
+   pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   if (!pte)
+   goto err;
+   image->arch.pte = pte;
+   set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+   }
+   pte = pte_offset_kernel(pmd, vaddr);
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+   return 0;
+err:
+   free_transition_pgtable(image);
+   return result;
+}
+
 
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
pgd_t *level4p;
+   int result;
level4p = (pgd_t *)__va(start_pgtable);
-   return init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+   result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+   if (result)
+   return result;
+   return init_transition_pgtable(image, level4p);
 }
 
 stat

[PATCH] kexec/x86_64: Use one page table in x86_64 kexec

2009-01-10 Thread Huang Ying
Two page tables are used in the current x86_64 kexec implementation. One
is used to jump from the kernel virtual address to the identity-mapped
address, the other is used to map all physical memory. In fact, on x86_64,
there is no conflict between the kernel virtual address space and the
physical memory space, so one page table is sufficient. The page table
pages used to map the control page are dynamically allocated, to save
memory when no kexec image is loaded. The ASM code used to map the control
page is replaced by C code too.

Signed-off-by: Huang Ying 

---
 arch/x86/include/asm/kexec.h |   27 ++-
 arch/x86/kernel/machine_kexec_64.c   |   82 +++---
 arch/x86/kernel/relocate_kernel_64.S |  125 ---
 3 files changed, 67 insertions(+), 167 deletions(-)

--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -9,23 +9,8 @@
 # define PAGES_NR  4
 #else
 # define PA_CONTROL_PAGE   0
-# define VA_CONTROL_PAGE   1
-# define PA_PGD2
-# define VA_PGD3
-# define PA_PUD_0  4
-# define VA_PUD_0  5
-# define PA_PMD_0  6
-# define VA_PMD_0  7
-# define PA_PTE_0  8
-# define VA_PTE_0  9
-# define PA_PUD_1  10
-# define VA_PUD_1  11
-# define PA_PMD_1  12
-# define VA_PMD_1  13
-# define PA_PTE_1  14
-# define VA_PTE_1  15
-# define PA_TABLE_PAGE 16
-# define PAGES_NR  17
+# define PA_TABLE_PAGE 1
+# define PAGES_NR  2
 #endif
 
 #ifdef CONFIG_X86_32
@@ -157,9 +142,9 @@ relocate_kernel(unsigned long indirectio
unsigned long start_address) ATTRIB_NORET;
 #endif
 
-#ifdef CONFIG_X86_32
 #define ARCH_HAS_KIMAGE_ARCH
 
+#ifdef CONFIG_X86_32
 struct kimage_arch {
pgd_t *pgd;
 #ifdef CONFIG_X86_PAE
@@ -169,6 +154,12 @@ struct kimage_arch {
pte_t *pte0;
pte_t *pte1;
 };
+#else
+struct kimage_arch {
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+};
 #endif
 
 #endif /* __ASSEMBLY__ */
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,15 +18,6 @@
 #include 
 #include 
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u64 kexec_pgd[512] PAGE_ALIGNED;
-static u64 kexec_pud0[512] PAGE_ALIGNED;
-static u64 kexec_pmd0[512] PAGE_ALIGNED;
-static u64 kexec_pte0[512] PAGE_ALIGNED;
-static u64 kexec_pud1[512] PAGE_ALIGNED;
-static u64 kexec_pmd1[512] PAGE_ALIGNED;
-static u64 kexec_pte1[512] PAGE_ALIGNED;
-
 static void init_level2_page(pmd_t *level2p, unsigned long addr)
 {
unsigned long end_addr;
@@ -107,12 +98,65 @@ out:
return result;
 }
 
+static void free_transition_pgtable(struct kimage *image)
+{
+   free_page((unsigned long)image->arch.pud);
+   free_page((unsigned long)image->arch.pmd);
+   free_page((unsigned long)image->arch.pte);
+}
+
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+{
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   unsigned long vaddr, paddr;
+   int result = -ENOMEM;
+
+   vaddr = (unsigned long)relocate_kernel;
+   paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+   pgd += pgd_index(vaddr);
+   if (!pgd_present(*pgd)) {
+   pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
+   if (!pud)
+   goto err;
+   image->arch.pud = pud;
+   set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+   }
+   pud = pud_offset(pgd, vaddr);
+   if (!pud_present(*pud)) {
+   pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   if (!pmd)
+   goto err;
+   image->arch.pmd = pmd;
+   set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+   }
+   pmd = pmd_offset(pud, vaddr);
+   if (!pmd_present(*pmd)) {
+   pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   if (!pte)
+   goto err;
+   image->arch.pte = pte;
+   set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+   }
+   pte = pte_offset_kernel(pmd, vaddr);
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+   return 0;
+err:
+   free_transition_pgtable(image);
+   return result;
+}
+
 
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
pgd_t *level4p;
+   int result;
level4p = (pgd_t *)__va(start_pgtable);
-   return init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+   result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+   if (result)
+   return result;
+   return init_transition_pgtable(image, level4p);
 }
 
 static void set_idt(void *newidt, u16 limit)
@@ -174,7 +218,7 @@ int machi

Re: [PATCH] Fix kexec x86_64 load failed bug

2008-11-25 Thread Huang Ying
On Wed, 2008-11-26 at 14:16 +0800, Simon Horman wrote:
> On Wed, Nov 26, 2008 at 12:25:51PM +0800, Huang Ying wrote:
> > On Wed, 2008-11-26 at 11:25 +0800, Randy Dunlap wrote:
> > > This isn't kernel code?  Where is /purgatory/ ?
> > > 
> > > Anyway, for kernel code, that should be:
> > > char *cmdline_end = NULL;
> > 
> > This patch is against kexec tools, not kernel.
> > 
> > Best Regards,
> > Huang Ying
> 
> Hi Huang,
> 
> I think that I would prefer "char *cmdline_end = NULL;" for kexec-tools
> code too.

Patch v2 follows with NULL instead of 0.

Best Regards,
Huang Ying
>
Fix a kexec load bug on x86_64. kexec fails to load on x86_64 with the
error message:

  Symbol: cmdline_end not found cannot set

This happens because kexec/arch/i386/kexec-bzImage.c accesses the
cmdline_end symbol in the i386 purgatory, but there is no cmdline_end in
the x86_64 purgatory, and kexec-bzImage.c is used by x86_64 too.

cmdline_end is added to the x86_64 purgatory to fix the bug, because kexec
jump support for x86_64 is planned.

Reported-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 purgatory/arch/x86_64/purgatory-x86_64.c |2 ++
 1 file changed, 2 insertions(+)

--- a/purgatory/arch/x86_64/purgatory-x86_64.c
+++ b/purgatory/arch/x86_64/purgatory-x86_64.c
@@ -1,10 +1,12 @@
 #include 
+#include 
 #include 
 #include "purgatory-x86_64.h"
 
 uint8_t reset_vga = 0;
 uint8_t legacy_pic = 0;
 uint8_t panic_kernel = 0;
+char *cmdline_end = NULL;
 
 void setup_arch(void)
 {





Re: [PATCH] Fix kexec x86_64 load failed bug

2008-11-25 Thread Huang Ying
On Wed, 2008-11-26 at 11:25 +0800, Randy Dunlap wrote:
> This isn't kernel code?  Where is /purgatory/ ?
> 
> Anyway, for kernel code, that should be:
> char *cmdline_end = NULL;

This patch is against kexec tools, not kernel.

Best Regards,
Huang Ying





[PATCH] Fix kexec x86_64 load failed bug

2008-11-25 Thread Huang Ying
Fix a kexec load bug on x86_64. kexec fails to load on x86_64 with the
error message:

  Symbol: cmdline_end not found cannot set

This happens because kexec/arch/i386/kexec-bzImage.c accesses the
cmdline_end symbol in the i386 purgatory, but there is no cmdline_end in
the x86_64 purgatory, and kexec-bzImage.c is used by x86_64 too.

cmdline_end is added to the x86_64 purgatory to fix the bug, because kexec
jump support for x86_64 is planned.

Reported-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

diff --git a/purgatory/arch/x86_64/purgatory-x86_64.c b/purgatory/arch/x86_64/purgatory-x86_64.c
index 374b554..67a37f9 100644
--- a/purgatory/arch/x86_64/purgatory-x86_64.c
+++ b/purgatory/arch/x86_64/purgatory-x86_64.c
@@ -5,6 +5,7 @@
 uint8_t reset_vga = 0;
 uint8_t legacy_pic = 0;
 uint8_t panic_kernel = 0;
+char *cmdline_end = 0;
 
 void setup_arch(void)
 {





[PATCH 1/3 -v4] kexec/i386: remove PAGE_SIZE alignment from relocate_kernel

2008-10-30 Thread Huang Ying
This patch removes PAGE_SIZE alignment from relocate_kernel(). Before
the kexec jump patches were merged, the control page was mapped to
relocate_kernel in the kexec page tables, so relocate_kernel had to be
PAGE_SIZE aligned. Now the control page is mapped to an identity-mapped
address, so relocate_kernel no longer needs to be PAGE_SIZE aligned.
This can save a few KB in the kernel text segment.
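
How much an alignment directive costs depends on where the preceding code
ends. A small sketch of the padding arithmetic, assuming a power-of-two
alignment; align_padding is a hypothetical helper, not kernel code.

```c
/*
 * Number of padding bytes needed to move `addr` up to the next multiple
 * of `align`. `align` must be a power of two.
 */
static unsigned long align_padding(unsigned long addr, unsigned long align)
{
	return (align - (addr & (align - 1))) & (align - 1);
}
```

With a 4096-byte alignment the padding ranges from 0 to 4095 bytes, which is
where the "few KB" of savings can come from.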

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/relocate_kernel_32.S |1 -
 1 file changed, 1 deletion(-)

--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -39,7 +39,6 @@
 #define CP_PA_BACKUP_PAGES_MAP DATA(0x1c)
 
.text
-   .align PAGE_SIZE
.globl relocate_kernel
 relocate_kernel:
/* Save the CPU context, used for jumping back */





[PATCH 2/3 -v4] kexec/i386: allocate page table pages dynamically

2008-10-30 Thread Huang Ying
This patch adds an architecture-specific struct kimage_arch to struct
kimage. The pointers to the page table pages used by kexec are added to
struct kimage_arch. The page table pages are dynamically allocated in
machine_kexec_prepare instead of statically in the BSS segment. This
saves up to 20 KB of memory when no kexec image is loaded.
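
The allocation pattern described here (allocate every table, check them all
at once, free everything on failure) can be sketched in user space. This is
an analogue under assumptions, not the kernel code: calloc/free stand in for
get_zeroed_page/free_page, and struct tables, tables_alloc, tables_free and
static_bytes_saved are hypothetical names.

```c
#include <stdint.h>
#include <stdlib.h>

#define PAGE_BYTES 4096

/* Hypothetical user-space analogue of the per-image page table pointers. */
struct tables {
	void *pgd, *pmd0, *pmd1, *pte0, *pte1;
};

/* Free every table; free(NULL) is a no-op, so partial sets are fine. */
static void tables_free(struct tables *t)
{
	free(t->pgd);
	free(t->pmd0);
	free(t->pmd1);
	free(t->pte0);
	free(t->pte1);
	t->pgd = t->pmd0 = t->pmd1 = t->pte0 = t->pte1 = NULL;
}

/* Allocate all zeroed tables; on any failure release them all. */
static int tables_alloc(struct tables *t)
{
	t->pgd = calloc(1, PAGE_BYTES);
	t->pmd0 = calloc(1, PAGE_BYTES);
	t->pmd1 = calloc(1, PAGE_BYTES);
	t->pte0 = calloc(1, PAGE_BYTES);
	t->pte1 = calloc(1, PAGE_BYTES);
	if (!t->pgd || !t->pmd0 || !t->pmd1 || !t->pte0 || !t->pte1) {
		tables_free(t);
		return -1;
	}
	return 0;
}

/* Size of the five static 1024-entry u32 tables in the PAE case. */
static unsigned long static_bytes_saved(void)
{
	return 5UL * 1024 * sizeof(uint32_t);
}
```

Five tables of 1024 32-bit entries are 5 * 4096 = 20480 bytes, which is the
"up to 20 KB" figure; without PAE only three of the tables exist.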

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h   |   14 +++
 arch/x86/kernel/machine_kexec_32.c |   69 +
 include/linux/kexec.h  |4 ++
 3 files changed, 65 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -25,15 +26,6 @@
 #include 
 #include 
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
struct desc_ptr curidt;
@@ -76,6 +68,37 @@ static void load_segments(void)
 #undef __STR
 }
 
+static void machine_kexec_free_page_tables(struct kimage *image)
+{
+   free_page((unsigned long)image->arch.pgd);
+#ifdef CONFIG_X86_PAE
+   free_page((unsigned long)image->arch.pmd0);
+   free_page((unsigned long)image->arch.pmd1);
+#endif
+   free_page((unsigned long)image->arch.pte0);
+   free_page((unsigned long)image->arch.pte1);
+}
+
+static int machine_kexec_alloc_page_tables(struct kimage *image)
+{
+   image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+   image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+   image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   if (!image->arch.pgd ||
+#ifdef CONFIG_X86_PAE
+   !image->arch.pmd0 || !image->arch.pmd1 ||
+#endif
+   !image->arch.pte0 || !image->arch.pte1) {
+   machine_kexec_free_page_tables(image);
+   return -ENOMEM;
+   }
+   return 0;
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -87,13 +110,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Make control page executable.
+ * - Make control page executable.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
if (nx_enabled)
set_pages_x(image->control_code_page, 1);
-   return 0;
+   return machine_kexec_alloc_page_tables(image);
 }
 
 /*
@@ -104,6 +128,7 @@ void machine_kexec_cleanup(struct kimage
 {
if (nx_enabled)
set_pages_nx(image->control_code_page, 1);
+   machine_kexec_free_page_tables(image);
 }
 
 /*
@@ -150,18 +175,18 @@ void machine_kexec(struct kimage *image)
relocate_kernel_ptr = control_page;
page_list[PA_CONTROL_PAGE] = __pa(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
-   page_list[PA_PGD] = __pa(kexec_pgd);
-   page_list[VA_PGD] = (unsigned long)kexec_pgd;
+   page_list[PA_PGD] = __pa(image->arch.pgd);
+   page_list[VA_PGD] = (unsigned long)image->arch.pgd;
 #ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(kexec_pmd0);
-   page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-   page_list[PA_PMD_1] = __pa(kexec_pmd1);
-   page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(kexec_pte0);
-   page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-   page_list[PA_PTE_1] = __pa(kexec_pte1);
-   page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+   page_list[PA_PMD_0] = __pa(image->arch.pmd0);
+   page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
+   page_list[PA_PMD_1] = __pa(image->arch.pmd1);
+   page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
+#endif
+   page_list[PA_PTE_0] = __pa(image->arch.pte0);
+   page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
+   page_list[PA_PTE_1] = __pa(image->arch.pte1);
+   page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
 
if (image->type == KEXEC_TYPE_DEFAULT)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -100,6 +100,10 @@ struct kimage {
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH   1
unsigned int preserve_context : 1;
+
+#ifdef ARCH_HAS_KIMAGE_ARCH
+   struct kimage_arch arch;
+#endif
 };
 
 
--- a/arch/x86/include
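
The allocate-everything-then-check pattern used by
machine_kexec_alloc_page_tables() in the diff above can be tried out in
user space. The following is a minimal sketch, not the kernel code:
calloc()/free() stand in for get_zeroed_page()/free_page(), and struct
example_tables, example_alloc_tables() and example_free_tables() are
hypothetical names. It relies on free(NULL) being a no-op, in the same
way that the kernel's free_page() ignores a zero address, so the cleanup
function can be called on a partially allocated set.

```c
#include <errno.h>
#include <stdlib.h>

#define EXAMPLE_PAGE_SIZE 4096

/* Hypothetical stand-in for the page table pointers in struct kimage_arch. */
struct example_tables {
    void *pgd;
    void *pte0;
    void *pte1;
};

/* Release every table; free(NULL) is a no-op, so partial allocations are fine. */
static void example_free_tables(struct example_tables *t)
{
    free(t->pgd);
    free(t->pte0);
    free(t->pte1);
    t->pgd = t->pte0 = t->pte1 = NULL;
}

/* Allocate all zeroed tables first, then check once and unwind on any failure. */
static int example_alloc_tables(struct example_tables *t)
{
    t->pgd = calloc(1, EXAMPLE_PAGE_SIZE);
    t->pte0 = calloc(1, EXAMPLE_PAGE_SIZE);
    t->pte1 = calloc(1, EXAMPLE_PAGE_SIZE);
    if (!t->pgd || !t->pte0 || !t->pte1) {
        example_free_tables(t);
        return -ENOMEM;
    }
    return 0;
}
```

Checking once after all allocations keeps the error path to a single
branch, at the cost of attempting the later allocations even when an
earlier one has already failed.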

[PATCH 0/3 -v4] kexec/i386: kexec page table code clean up

2008-10-30 Thread Huang Ying
This patchset cleans up page table setup code of kexec on i386.

This patchset is based on v2.6.28-rc2-338-g65fc716 and has been tested
on i386.

v4:

- Re-based on mainline git tree: v2.6.28-rc2-338-g65fc716.

v3:

- Remove PAGE_SIZE alignment from relocate_kernel()

- Re-based on 2.6.28-rc2-mm1

v2:

- Rename some function names, such as alloc_page_tables ->
  machine_kexec_alloc_page_tables, etc.

- Cleanup error processing for machine_alloc_page_tables.


Best Regards,
Huang Ying




[PATCH 3/3 -v4] kexec/i386: setup kexec page table in C

2008-10-30 Thread Huang Ying
This patch converts the kexec page table setup code from assembly to
C code called from machine_kexec_prepare(). This improves readability and
reduces the line count.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h |   17 -
 arch/x86/kernel/machine_kexec_32.c   |   59 ++
 arch/x86/kernel/relocate_kernel_32.S |  114 ---
 3 files changed, 49 insertions(+), 141 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -99,6 +99,45 @@ static int machine_kexec_alloc_page_tabl
return 0;
 }
 
+static void machine_kexec_page_table_set_one(
+   pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+   unsigned long vaddr, unsigned long paddr)
+{
+   pud_t *pud;
+
+   pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+   if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+   set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+   pud = pud_offset(pgd, vaddr);
+   pmd = pmd_offset(pud, vaddr);
+   if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+   set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+   pte = pte_offset_kernel(pmd, vaddr);
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void machine_kexec_prepare_page_tables(struct kimage *image)
+{
+   void *control_page;
+   pmd_t *pmd = 0;
+
+   control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch.pmd0;
+#endif
+   machine_kexec_page_table_set_one(
+   image->arch.pgd, pmd, image->arch.pte0,
+   (unsigned long)control_page, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch.pmd1;
+#endif
+   machine_kexec_page_table_set_one(
+   image->arch.pgd, pmd, image->arch.pte1,
+   __pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -112,12 +151,19 @@ static int machine_kexec_alloc_page_tabl
  *
  * - Make control page executable.
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+   int error;
+
if (nx_enabled)
set_pages_x(image->control_code_page, 1);
-   return machine_kexec_alloc_page_tables(image);
+   error = machine_kexec_alloc_page_tables(image);
+   if (error)
+   return error;
+   machine_kexec_prepare_page_tables(image);
+   return 0;
 }
 
 /*
@@ -176,17 +222,6 @@ void machine_kexec(struct kimage *image)
page_list[PA_CONTROL_PAGE] = __pa(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
page_list[PA_PGD] = __pa(image->arch.pgd);
-   page_list[VA_PGD] = (unsigned long)image->arch.pgd;
-#ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(image->arch.pmd0);
-   page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
-   page_list[PA_PMD_1] = __pa(image->arch.pmd1);
-   page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(image->arch.pte0);
-   page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
-   page_list[PA_PTE_1] = __pa(image->arch.pte1);
-   page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
 
if (image->type == KEXEC_TYPE_DEFAULT)
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -10,15 +10,12 @@
 #include 
 #include 
 #include 
-#include 
 
 /*
  * Must be relocatable PIC code callable as a C function
  */
 
 #define PTR(x) (x << 2)
-#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
-#define PAE_PGD_ATTR (_PAGE_PRESENT)
 
 /* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
  * ~ control_page + PAGE_SIZE are used as data storage and stack for
@@ -59,117 +56,6 @@ relocate_kernel:
    movl    %cr4, %eax
    movl    %eax, CR4(%edi)
 
-#ifdef CONFIG_X86_PAE
-   /* map the control page at its virtual address */
-
-   movl    PTR(VA_PGD)(%ebp), %edi
-   movl    PTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl    $0xc000, %eax
-   shrl    $27, %eax
-   addl    %edi, %eax
-
-   movl    PTR(PA_PMD_0)(%ebp), %edx
-   orl     $PAE_PGD_ATTR, %edx
-   movl    %edx, (%eax)
-
-   movl    PTR(VA_PMD_0)(%ebp), %edi
-   movl    PTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl    $0x3fe0, %eax
-   shrl    $18, %eax
-   addl    %edi, %eax
-
-   movl    PTR(PA_PTE_0)(%ebp), %edx
-   orl     $PAGE_ATTR, %edx
-   movl    %edx, (%eax)
-
-   movl    PTR(VA_PTE_0)(%ebp), %edi
-   movl    PTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl    $0x001ff000, %eax
-   shrl    $9, %eax
-   addl    %edi, %eax
-
-   
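
To make the address arithmetic behind this conversion easier to follow,
here is a small stand-alone sketch of how a 32-bit virtual address is
split into page table indices on i386. It is an illustration only: the
struct and function names are hypothetical, and in the kernel this work
is done by pgd_index(), pmd_offset() and pte_offset_kernel() as used in
the patch. Under PAE each entry is 8 bytes wide, which is why the removed
assembly shifts by 27, 18 and 9 (that is, 30, 21 and 12 minus 3) to turn
an index directly into a byte offset.

```c
#include <stdint.h>

/* Indices into each page table level for one 32-bit virtual address. */
struct example_pt_index {
    unsigned int pgd;
    unsigned int pmd;   /* always 0 without PAE, where this level is folded */
    unsigned int pte;
};

/* Non-PAE i386: 10-bit directory index, 10-bit table index, 12-bit offset. */
static struct example_pt_index example_split_nopae(uint32_t vaddr)
{
    struct example_pt_index idx = {
        .pgd = vaddr >> 22,
        .pmd = 0,
        .pte = (vaddr >> 12) & 0x3ff,
    };
    return idx;
}

/* PAE i386: 2-bit top-level index, 9-bit PMD index, 9-bit PTE index, 12-bit offset. */
static struct example_pt_index example_split_pae(uint32_t vaddr)
{
    struct example_pt_index idx = {
        .pgd = vaddr >> 30,
        .pmd = (vaddr >> 21) & 0x1ff,
        .pte = (vaddr >> 12) & 0x1ff,
    };
    return idx;
}
```

For the address 0xC0101000, the non-PAE split gives directory index 768
and table index 257, while the PAE split gives top-level index 3, PMD
index 0 and PTE index 257.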

[PATCH -mm 1/3 -v3] kexec/i386: remove PAGE_SIZE alignment from relocate_kernel

2008-10-30 Thread Huang Ying
This patch removes the PAGE_SIZE alignment from relocate_kernel(). Before
the kexec jump patches were merged, the control page was mapped to
relocate_kernel in the kexec page tables, so relocate_kernel had to be
PAGE_SIZE aligned. Now the control page is mapped to an identity-mapped
address, so relocate_kernel no longer needs to be PAGE_SIZE aligned.
This can save a few KB in the kernel text segment.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/relocate_kernel_32.S |1 -
 1 file changed, 1 deletion(-)

--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -39,7 +39,6 @@
 #define CP_PA_BACKUP_PAGES_MAP DATA(0x1c)
 
.text
-   .align PAGE_SIZE
.globl relocate_kernel
 relocate_kernel:
/* Save the CPU context, used for jumping back */





[PATCH -mm 2/3 -v3] kexec/i386: allocate page table pages dynamically

2008-10-30 Thread Huang Ying
This patch adds an architecture-specific struct kimage_arch to
struct kimage. The pointers to the page table pages used by kexec are
added to struct kimage_arch. The page table pages are now allocated
dynamically in machine_kexec_prepare() instead of statically in the BSS
segment. This saves up to 20 KB of memory when no kexec image is
loaded.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h   |   14 +++
 arch/x86/kernel/machine_kexec_32.c |   69 +
 include/linux/kexec.h  |4 ++
 3 files changed, 65 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -25,15 +26,6 @@
 #include 
 #include 
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
struct desc_ptr curidt;
@@ -76,6 +68,37 @@ static void load_segments(void)
 #undef __STR
 }
 
+static void machine_kexec_free_page_tables(struct kimage *image)
+{
+   free_page((unsigned long)image->arch.pgd);
+#ifdef CONFIG_X86_PAE
+   free_page((unsigned long)image->arch.pmd0);
+   free_page((unsigned long)image->arch.pmd1);
+#endif
+   free_page((unsigned long)image->arch.pte0);
+   free_page((unsigned long)image->arch.pte1);
+}
+
+static int machine_kexec_alloc_page_tables(struct kimage *image)
+{
+   image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+   image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+   image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   if (!image->arch.pgd ||
+#ifdef CONFIG_X86_PAE
+   !image->arch.pmd0 || !image->arch.pmd1 ||
+#endif
+   !image->arch.pte0 || !image->arch.pte1) {
+   machine_kexec_free_page_tables(image);
+   return -ENOMEM;
+   }
+   return 0;
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -87,13 +110,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Make control page executable.
+ * - Make control page executable.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
if (nx_enabled)
set_pages_x(image->control_code_page, 1);
-   return 0;
+   return machine_kexec_alloc_page_tables(image);
 }
 
 /*
@@ -104,6 +128,7 @@ void machine_kexec_cleanup(struct kimage
 {
if (nx_enabled)
set_pages_nx(image->control_code_page, 1);
+   machine_kexec_free_page_tables(image);
 }
 
 /*
@@ -150,18 +175,18 @@ void machine_kexec(struct kimage *image)
relocate_kernel_ptr = control_page;
page_list[PA_CONTROL_PAGE] = __pa(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
-   page_list[PA_PGD] = __pa(kexec_pgd);
-   page_list[VA_PGD] = (unsigned long)kexec_pgd;
+   page_list[PA_PGD] = __pa(image->arch.pgd);
+   page_list[VA_PGD] = (unsigned long)image->arch.pgd;
 #ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(kexec_pmd0);
-   page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-   page_list[PA_PMD_1] = __pa(kexec_pmd1);
-   page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(kexec_pte0);
-   page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-   page_list[PA_PTE_1] = __pa(kexec_pte1);
-   page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+   page_list[PA_PMD_0] = __pa(image->arch.pmd0);
+   page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
+   page_list[PA_PMD_1] = __pa(image->arch.pmd1);
+   page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
+#endif
+   page_list[PA_PTE_0] = __pa(image->arch.pte0);
+   page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
+   page_list[PA_PTE_1] = __pa(image->arch.pte1);
+   page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
 
/* The segment registers are funny things, they have both a
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -100,6 +100,10 @@ struct kimage {
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH   1
unsigned int preserve_context : 1;
+
+#ifdef ARCH_HAS_KIMAGE_ARCH
+   struct kimage_arch arch;

[PATCH -mm 3/3 -v3] kexec/i386: setup kexec page table in C

2008-10-30 Thread Huang Ying
This patch converts the kexec page table setup code from assembly to
C code called from machine_kexec_prepare(). This improves readability and
reduces the line count.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h |   17 -
 arch/x86/kernel/machine_kexec_32.c   |   59 ++
 arch/x86/kernel/relocate_kernel_32.S |  114 ---
 3 files changed, 49 insertions(+), 141 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -99,6 +99,45 @@ static int machine_kexec_alloc_page_tabl
return 0;
 }
 
+static void machine_kexec_page_table_set_one(
+   pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+   unsigned long vaddr, unsigned long paddr)
+{
+   pud_t *pud;
+
+   pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+   if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+   set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+   pud = pud_offset(pgd, vaddr);
+   pmd = pmd_offset(pud, vaddr);
+   if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+   set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+   pte = pte_offset_kernel(pmd, vaddr);
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void machine_kexec_prepare_page_tables(struct kimage *image)
+{
+   void *control_page;
+   pmd_t *pmd = 0;
+
+   control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch.pmd0;
+#endif
+   machine_kexec_page_table_set_one(
+   image->arch.pgd, pmd, image->arch.pte0,
+   (unsigned long)control_page, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch.pmd1;
+#endif
+   machine_kexec_page_table_set_one(
+   image->arch.pgd, pmd, image->arch.pte1,
+   __pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -112,12 +151,19 @@ static int machine_kexec_alloc_page_tabl
  *
  * - Make control page executable.
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+   int error;
+
if (nx_enabled)
set_pages_x(image->control_code_page, 1);
-   return machine_kexec_alloc_page_tables(image);
+   error = machine_kexec_alloc_page_tables(image);
+   if (error)
+   return error;
+   machine_kexec_prepare_page_tables(image);
+   return 0;
 }
 
 /*
@@ -176,17 +222,6 @@ void machine_kexec(struct kimage *image)
page_list[PA_CONTROL_PAGE] = __pa(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
page_list[PA_PGD] = __pa(image->arch.pgd);
-   page_list[VA_PGD] = (unsigned long)image->arch.pgd;
-#ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(image->arch.pmd0);
-   page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
-   page_list[PA_PMD_1] = __pa(image->arch.pmd1);
-   page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(image->arch.pte0);
-   page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
-   page_list[PA_PTE_1] = __pa(image->arch.pte1);
-   page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
 
/* The segment registers are funny things, they have both a
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -10,15 +10,12 @@
 #include 
 #include 
 #include 
-#include 
 
 /*
  * Must be relocatable PIC code callable as a C function
  */
 
 #define PTR(x) (x << 2)
-#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
-#define PAE_PGD_ATTR (_PAGE_PRESENT)
 
 /* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
  * ~ control_page + PAGE_SIZE are used as data storage and stack for
@@ -59,117 +56,6 @@ relocate_kernel:
    movl    %cr4, %eax
    movl    %eax, CR4(%edi)
 
-#ifdef CONFIG_X86_PAE
-   /* map the control page at its virtual address */
-
-   movl    PTR(VA_PGD)(%ebp), %edi
-   movl    PTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl    $0xc000, %eax
-   shrl    $27, %eax
-   addl    %edi, %eax
-
-   movl    PTR(PA_PMD_0)(%ebp), %edx
-   orl     $PAE_PGD_ATTR, %edx
-   movl    %edx, (%eax)
-
-   movl    PTR(VA_PMD_0)(%ebp), %edi
-   movl    PTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl    $0x3fe0, %eax
-   shrl    $18, %eax
-   addl    %edi, %eax
-
-   movl    PTR(PA_PTE_0)(%ebp), %edx
-   orl     $PAGE_ATTR, %edx
-   movl    %edx, (%eax)
-
-   movl    PTR(VA_PTE_0)(%ebp), %edi
-   movl    PTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl    $0x001ff000, %eax
-   shrl    $9, %eax

[PATCH -mm 0/3 -v3] kexec/i386: kexec page table code clean up

2008-10-30 Thread Huang Ying
This patchset cleans up page table setup code of kexec on i386.

This patchset is based on 2.6.28-rc2-mm1 and has been tested on i386.

v3:

- Remove PAGE_SIZE alignment from relocate_kernel()

- Re-based on 2.6.28-rc2-mm1

v2:

- Rename some function names, such as alloc_page_tables ->
  machine_kexec_alloc_page_tables, etc.

- Cleanup error processing for machine_alloc_page_tables.


Best Regards,
Huang Ying






[PATCH 0/2] kexec jump/hibernation support for kexec-tools

2008-10-28 Thread Huang Ying
This patchset adds kexec jump/hibernation support to kexec-tools.
Together with the kexec jump/hibernation features in the Linux kernel
(merged into mainline as of 2.6.27), it can be used for the
following:

- A simple hibernation implementation without ACPI support. You can
  kexec a hibernating kernel, save the memory image of the original
  system, and shut down the system. When resuming, you restore the
  memory image of the original system via an ordinary kexec load and
  then jump back.

- Kernel/system debugging through system snapshots. You can take a
  system snapshot with kexec/kdump, jump back, do something, and take
  another system snapshot.

- Cooperative multi-kernel/system operation. With kexec jump, you can
  switch between several kernels/systems quickly, without going through
  the boot process except the first time. This works like swapping a
  whole kernel/system out and in.

- A general method to call a program in physical mode (paging turned
  off). This can be used to invoke BIOS code under Linux.


The following additional kernel/tools may be needed for kexec
jump/hibernation:

- Linux kernel 2.6.27 or newer.

- makedumpfile with patches, used as the memory image saving tool. It
  can exclude free pages from the original kernel's memory image file.
  The patches and a precompiled makedumpfile can be downloaded from
  the following URLs:
   source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
   patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
   binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10

- An initramfs image, which can be used as the root file system of the
  kexeced kernel. An initramfs image built with "BuildRoot" can be
  downloaded from the following URL:
   initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz
  All user space tools above are included in the initramfs image.


Usage example of simple hibernation:

1. Compile and install a Linux kernel (2.6.27 or newer) with the
   following options selected:

CONFIG_X86_32=y
CONFIG_RELOCATABLE=y
CONFIG_KEXEC=y
CONFIG_CRASH_DUMP=y
CONFIG_PM=y
CONFIG_HIBERNATION=y
CONFIG_KEXEC_JUMP=y

2. Build an initramfs image that contains kexec-tools and makedumpfile,
   or download the pre-built initramfs image. It is called rootfs.gz in
   the following text.

3. Prepare a partition to save the memory image of the original kernel.
   It is called the hibernating partition in the following text.

4. Boot the kernel compiled in step 1 (kernel A).

5. In kernel A, load the kernel compiled in step 1 (kernel B) with
   /sbin/kexec. The shell command line can be as follows:

   /sbin/kexec --load-preserve-context /boot/bzImage --mem-max=0xff --initrd=rootfs.gz

6. Boot kernel B with the following shell command line:

   /sbin/kexec -e

7. Kernel B will boot as with a normal kexec. In kernel B, the memory
   image of kernel A can be saved into the hibernating partition as
   follows:

   jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2`
   echo $jump_back_entry > kexec_jump_back_entry
   cp /proc/vmcore dump.elf

   Then you can shut down the machine as normal.

8. Boot the kernel compiled in step 1 (kernel C). Use rootfs.gz as
   the root file system.

9. In kernel C, load the memory image of kernel A as follows:

   /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to kernel A as follows:

   /sbin/kexec -e

   Then, kernel A is resumed.
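
Step 7 above extracts the jump back entry from the kernel command line
with a shell pipeline. For illustration only, the same extraction could
be done in C roughly as follows. This is a sketch under assumptions: the
function name is hypothetical, and it assumes the parameter has the form
kexec_jump_back_entry=<number>, with a number that strtoul() can parse
(for example a 0x-prefixed hexadecimal value).

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "kexec_jump_back_entry=<value>" in a space-separated command line
 * and store the numeric value in *entry.
 * Returns 0 on success, -1 if the parameter is absent or has no number.
 */
static int example_parse_jump_back_entry(const char *cmdline,
                                         unsigned long *entry)
{
    static const char key[] = "kexec_jump_back_entry=";
    const char *p = cmdline;

    while ((p = strstr(p, key)) != NULL) {
        /* Only accept the key at the start of a word. */
        if (p == cmdline || p[-1] == ' ') {
            char *end;
            unsigned long val;

            p += sizeof(key) - 1;
            errno = 0;
            val = strtoul(p, &end, 0);  /* base 0: 0x... or decimal */
            if (end == p || errno != 0)
                return -1;
            *entry = val;
            return 0;
        }
        p += sizeof(key) - 1;
    }
    return -1;
}
```

The parsed value would then be passed to kexec via --entry, as in step 9.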


Currently, only the i386 architecture is supported. The patchset is based
on the latest kexec-tools git tree and has been tested on an IBM T42
with ACPI on and off.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>





[PATCH 1/2] kexec jump support for kexec-tools

2008-10-28 Thread Huang Ying
To support memory backup/restore, an option named
--load-preserve-context is added to kexec. When it is specified
together with --mem-max, most segments for crash dump support are
loaded, and the parts of the memory range between mem_min and mem_max
that have no segments loaded are loaded as backup segments. To support
jumping back from the kexeced kernel, options named
--load-jump-back-helper and --entry are added to load a helper image
with a specified entry point to jump back to.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kexec/arch/i386/crashdump-x86.c |   51 +++---
 kexec/arch/i386/kexec-bzImage.c |   10 +-
 kexec/arch/i386/kexec-elf-x86.c |4 
 kexec/arch/i386/kexec-x86-common.c  |3 
 kexec/arch/i386/x86-linux-setup.h   |3 
 kexec/crashdump-elf.c   |2 
 kexec/crashdump.c   |1 
 kexec/kexec-syscall.h   |5 -
 kexec/kexec.c   |  177 +++-
 kexec/kexec.h   |   12 ++
 purgatory/arch/i386/purgatory-x86.c |   14 ++
 purgatory/arch/i386/setup-x86.S |3 
 purgatory/include/purgatory.h   |1 
 purgatory/printf.c  |   38 ++-
 14 files changed, 291 insertions(+), 33 deletions(-)

--- a/kexec/kexec-syscall.h
+++ b/kexec/kexec-syscall.h
@@ -75,8 +75,9 @@ static inline long kexec_reboot(void)
 }
 
 
-#define KEXEC_ON_CRASH  0x0001
-#define KEXEC_ARCH_MASK 0x
+#define KEXEC_ON_CRASH 0x0001
+#define KEXEC_PRESERVE_CONTEXT 0x0002
+#define KEXEC_ARCH_MASK        0x
 
 /* These values match the ELF architecture values. 
  * Unless there is a good reason that should continue to be the case.
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -378,6 +378,91 @@ unsigned long add_buffer_virt(struct kex
buf_min, buf_max, buf_end, 0);
 }
 
+static int find_memory_range(struct kexec_info *info,
+unsigned long *base, unsigned long *size)
+{
+   int i;
+   unsigned long start, end;
+
+   for (i = 0; i < info->memory_ranges; i++) {
+   if (info->memory_range[i].type != RANGE_RAM)
+   continue;
+   start = info->memory_range[i].start;
+   end = info->memory_range[i].end;
+   if (end > *base && start < *base + *size) {
+   if (start > *base) {
+   *size = *base + *size - start;
+   *base = start;
+   }
+   if (end < *base + *size)
+   *size = end - *base;
+   return 1;
+   }
+   }
+   return 0;
+}
+
+static int find_segment_hole(struct kexec_info *info,
+unsigned long *base, unsigned long *size)
+{
+   int i;
+   unsigned long seg_base, seg_size;
+
+   for (i = 0; i < info->nr_segments; i++) {
+   seg_base = (unsigned long)info->segment[i].mem;
+   seg_size = info->segment[i].memsz;
+
+   if (seg_base + seg_size <= *base)
+   continue;
+   else if (seg_base >= *base + *size)
+   break;
+   else if (*base < seg_base) {
+   *size = seg_base - *base;
+   break;
+   } else if (seg_base + seg_size < *base + *size) {
+   *size = *base + *size - (seg_base + seg_size);
+   *base = seg_base + seg_size;
+   } else {
+   *size = 0;
+   break;
+   }
+   }
+   return *size;
+}
+
+int add_backup_segments(struct kexec_info *info, unsigned long backup_base,
+   unsigned long backup_size)
+{
+   unsigned long mem_base, mem_size, bkseg_base, bkseg_size, start, end;
+   unsigned long pagesize;
+
+   pagesize = getpagesize();
+   while (backup_size) {
+   mem_base = backup_base;
+   mem_size = backup_size;
+   if (!find_memory_range(info, &mem_base, &mem_size))
+   break;
+   backup_size = backup_base + backup_size - \
+   (mem_base + mem_size);
+   backup_base = mem_base + mem_size;
+   while (mem_size) {
+   bkseg_base = mem_base;
+   bkseg_size = mem_size;
+   if (sort_segments(info) < 0)
+   return -1;
+   if (!find_segment_hole(info, &bkseg_base, &bkseg_size))
+   break;
+   start = (bkseg_base + pagesize - 1) & ~(pagesize - 1);
+   end = (bkseg_base + bkseg_size) & ~(pagesize - 1);
+   add_segm
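
The range handling in the diff above combines two steps: clipping the
requested range to a RAM range (as find_memory_range() does) and then
shrinking the result to whole pages, with the start rounded up and the
end rounded down (as in the start/end computation in
add_backup_segments()). The following simplified sketch shows both steps
in one function. It is not the patch code: struct example_range and
example_clip_to_pages() are hypothetical names, and ranges are treated
as half-open [start, end) intervals.

```c
/* A half-open address range [start, end). */
struct example_range {
    unsigned long start;
    unsigned long end;
};

/*
 * Clip want to the part that overlaps ram, then shrink it to whole pages
 * (start rounded up, end rounded down). pagesize must be a power of two.
 * Returns 1 and fills *out if at least one whole page remains, else 0.
 */
static int example_clip_to_pages(struct example_range want,
                                 struct example_range ram,
                                 unsigned long pagesize,
                                 struct example_range *out)
{
    unsigned long start = want.start > ram.start ? want.start : ram.start;
    unsigned long end = want.end < ram.end ? want.end : ram.end;

    if (start >= end)
        return 0;
    start = (start + pagesize - 1) & ~(pagesize - 1);
    end &= ~(pagesize - 1);
    if (start >= end)
        return 0;
    out->start = start;
    out->end = end;
    return 1;
}
```

Rounding inward rather than outward means a backup segment never covers
a page that is only partly inside the usable range.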

[PATCH 2/2] core dump file support for ELF loader

2008-10-28 Thread Huang Ying
This patch adds core dump file support to ELF file loader. This can be
used by kexec based hibernation to load hibernated image, which is
from /proc/vmcore, a core dump file.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kexec/kexec-elf-exec.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kexec/kexec-elf-exec.c
+++ b/kexec/kexec-elf-exec.c
@@ -20,7 +20,8 @@ int build_elf_exec_info(const char *buf,
if (result < 0) {
return result;
}
-   if ((ehdr->e_type != ET_EXEC) && (ehdr->e_type != ET_DYN)) {
+   if ((ehdr->e_type != ET_EXEC) && (ehdr->e_type != ET_DYN) &&
+   (ehdr->e_type != ET_CORE)) {
/* not an ELF executable */
if (probe_debug) {
fprintf(stderr, "Not ELF type ET_EXEC or ET_DYN\n");
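
The widened check can be read as a small predicate over the ELF file
type. The sketch below restates it in stand-alone form. The function
name and the EXAMPLE_ prefixed macros are hypothetical; the numeric
values are the standard ELF e_type values that <elf.h> defines as
ET_EXEC, ET_DYN and ET_CORE.

```c
#include <stdint.h>

/* Standard ELF e_type values, redefined here to keep the sketch self-contained. */
#define EXAMPLE_ET_EXEC 2
#define EXAMPLE_ET_DYN  3
#define EXAMPLE_ET_CORE 4

/* Return 1 if the ELF file type is one the loader should accept, else 0. */
static int example_elf_type_loadable(uint16_t e_type)
{
    switch (e_type) {
    case EXAMPLE_ET_EXEC:
    case EXAMPLE_ET_DYN:
    case EXAMPLE_ET_CORE:   /* core dumps, e.g. a copy of /proc/vmcore */
        return 1;
    default:
        return 0;
    }
}
```

Note that the debug message in the diff still mentions only ET_EXEC and
ET_DYN, although ET_CORE is accepted as well after this change.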





Re: [PATCH -v3 6/7] kexec jump: __ftrace_enabled_save/restore

2008-08-17 Thread Huang Ying
On Fri, 2008-08-15 at 14:49 +0200, Ingo Molnar wrote:
> * Huang Ying <[EMAIL PROTECTED]> wrote:
> 
> > +/* Ftrace disable/restore without lock. Some synchronization mechanism
> > + * must be used to prevent ftrace_enabled to be changed between
> > + * disable/restore. */
> 
> use the proper comment style please:
> 
> /*
>  *
>  */

OK. I will change it.

> > +static inline int __ftrace_enabled_save(void)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +   int saved_ftrace_enabled = ftrace_enabled;
> > +   ftrace_enabled = 0;
> > +   return saved_ftrace_enabled;
> > +#else
> > +   return 0;
> > +#endif
> > +}
> > +
> > +static inline void __ftrace_enabled_restore(int enabled)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +   ftrace_enabled = enabled;
> > +#endif
> > +}
> 
> hm, what is this used for?
> 
> also, instead of such an ugly inline, why not create a proper 
> kernel/trace/* function for this. That would also give it access to all 
> the proper locking mechanisms - instead of relying on some extral 
> mechanism.

This function is used for kexec jump in machine_kexec(), where all
non-boot CPUs and IRQs are disabled and the system is about to kexec.
Scheduling to another process is not allowed in this situation, so a
lockless version is needed. A locked version has been implemented by
Steven Rostedt; I think it can be used in other situations.
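
The intended usage is a plain save/disable/restore bracket around the
critical section. A minimal user-space sketch of that pattern, with a
hypothetical flag and function names standing in for ftrace_enabled and
the two helpers quoted above:

```c
/* Hypothetical stand-in for the kernel's global ftrace_enabled flag. */
static int example_trace_enabled = 1;

/* Disable tracing and return the previous state; no locking is done. */
static int example_trace_save(void)
{
    int saved = example_trace_enabled;

    example_trace_enabled = 0;
    return saved;
}

/* Put back the state returned by example_trace_save(). */
static void example_trace_restore(int saved)
{
    example_trace_enabled = saved;
}
```

This is only safe when nothing else can change the flag between the two
calls, which is the situation described above.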

Best Regards,
Huang Ying





[PATCH] kexec jump: fix compiling warning on xchg(&kexec_lock, 0) in kernel_kexec()

2008-08-13 Thread Huang Ying
Fix the "value computed is not used" compile warning on
xchg(&kexec_lock, 0) in kernel_kexec() by checking the returned value
with BUG_ON().

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kernel/kexec.c |4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1433,6 +1433,7 @@ module_init(crash_save_vmcoreinfo_init)
 int kernel_kexec(void)
 {
int error = 0;
+   int locked;
 
if (xchg(&kexec_lock, 1))
return -EBUSY;
@@ -1498,7 +1499,8 @@ int kernel_kexec(void)
 #endif
 
  Unlock:
-   xchg(&kexec_lock, 0);
+   locked = xchg(&kexec_lock, 0);
+   BUG_ON(!locked);
 
return error;
 }
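
For readers who want to experiment with the exchange-based locking
outside the kernel, here is a rough user-space analogue using C11
atomics. It is a sketch, not the kernel code: atomic_exchange() stands
in for xchg(), and example_lock, example_trylock() and example_unlock()
are hypothetical names.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for kexec_lock: 0 = free, 1 = held. */
static atomic_int example_lock;

/* Try to take the lock; returns 1 on success, 0 if it was already held. */
static int example_trylock(void)
{
    return atomic_exchange(&example_lock, 1) == 0;
}

/* Release the lock; returns the previous value so callers can check it was held. */
static int example_unlock(void)
{
    return atomic_exchange(&example_lock, 0);
}
```

Using the value returned on unlock, as the patch does with BUG_ON(),
both silences the unused-value warning and catches an unlock of a lock
that was never taken.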





Re: [PATCH] kexec jump: fix code size checking

2008-08-12 Thread Huang Ying
On Tue, 2008-08-12 at 20:40 -0700, Eric W. Biederman wrote:
[...]
> 4) Put the code is a special section .text.kexec? and have the linker
>always do the size comparison and the computation of the section size.
> 
> The fewer conditionals we have the less likely something is to break.

Yes, this one is good. But I think the current one is acceptable too.

Best Regards,
Huang Ying





Re: [PATCH -v3 1/7] kexec jump: clean up #ifdef and comments

2008-08-12 Thread Huang Ying
On Tue, 2008-08-12 at 20:49 -0700, Andrew Morton wrote:
> On Tue, 12 Aug 2008 11:14:21 +0800 Huang Ying <[EMAIL PROTECTED]> wrote:
> 
> > xchg(&kexec_lock, 0);
> 
> kernel/kexec.c: In function 'kernel_kexec':
> kernel/kexec.c:1501: warning: value computed is not used
> 
> Is there any reason why we cannot use the more conventional
> test_and_set_bit() etc, rather than this peculiarity?
> 
> Or perhaps spin_trylock?

Hi, Andrew,

I think there is no problem with replacing xchg() with test_and_set_bit()
or spin_trylock().

Hi, Eric,

Do you have a reason for using xchg() instead of the alternatives?

Best Regards,
Huang Ying





Re: [PATCH] kexec jump: fix code size checking

2008-08-12 Thread Huang Ying
On Wed, 2008-08-13 at 12:47 +1000, Simon Horman wrote:
> On Wed, Aug 13, 2008 at 09:04:35AM +0800, Huang Ying wrote:
> > Fix building issue when CONFIG_KEXEC=n. Thanks to Vivek Goyal for his
> > reminding.
> > 
> > Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> > 
> > ---
> >  include/asm-x86/kexec.h |3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > --- a/include/asm-x86/kexec.h
> > +++ b/include/asm-x86/kexec.h
> > @@ -43,6 +43,9 @@
> >  
> >  #ifdef CONFIG_X86_32
> >  # define KEXEC_CONTROL_CODE_MAX_SIZE   2048
> > +# ifndef CONFIG_KEXEC
> > +#  define kexec_control_code_size  0
> > +# endif
> >  #endif
> >  
> >  #ifndef __ASSEMBLY__
> 
> Is it impossible to skip the linker check in the !CONFIG_KEXEC case?

It is possible. I think there are several ways to do that.

1) use #ifdef in vmlinux_32.lds.S, such as:

#ifdef CONFIG_KEXEC
ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
   "kexec control code size is too big")
#endif

2) #define a macro for the kexec linker-script check in asm/kexec.h, such as:

#define LD_CHECK_KEXEC() \
   ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE, \
          "kexec control code size is too big")

and use that in vmlinux_32.lds.S.

3) #define kexec_control_code_size 0, so that the check always passes.
A code size of 0 is reasonable when there is no code (CONFIG_KEXEC=n).


I think 3) is better. What do you think?

Best Regards,
Huang Ying
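
Option 3) can be tried out in plain C. This is a rough sketch only: the real kexec_control_code_size is a symbol set in relocate_kernel_32.S and checked by the linker script's ASSERT(), whereas here it is a macro checked with C11 _Static_assert. DEMO_CONFIG_KEXEC is a hypothetical stand-in for CONFIG_KEXEC, and the value 1024 is made up for illustration.

```c
#include <assert.h>

#define KEXEC_CONTROL_CODE_MAX_SIZE 2048

/*
 * DEMO_CONFIG_KEXEC is a hypothetical stand-in for CONFIG_KEXEC.
 * Leave it undefined to model a CONFIG_KEXEC=n build.
 */
#ifdef DEMO_CONFIG_KEXEC
# define kexec_control_code_size 1024	/* made-up size, illustration only */
#else
# define kexec_control_code_size 0	/* no kexec code, so size 0 */
#endif

/* Build-time check standing in for the linker script's ASSERT(). */
_Static_assert(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
	       "kexec control code size is too big");

/* Helper so the chosen value can be inspected at run time. */
static int demo_code_size(void)
{
	return kexec_control_code_size;
}
```

With the fallback definition the check stays unconditional, which matches the preference for fewer conditionals in the linker script.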





Re: [PATCH -v3 6/7] kexec jump: __ftrace_enabled_save/restore

2008-08-12 Thread Huang Ying
On Tue, 2008-08-12 at 09:06 -0400, Vivek Goyal wrote:
> On Tue, Aug 12, 2008 at 11:14:36AM +0800, Huang Ying wrote:
> > Add __ftrace_enabled_save/restore, used to disable ftrace for a
> > while. Now, this is used by kexec jump, which need a version without
> > lock, for general situation, a locked version should be used.
> > 
> > Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> > 
> > ---
> >  include/linux/ftrace.h |   21 +
> >  1 file changed, 21 insertions(+)
> > 
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -98,6 +98,27 @@ static inline void tracer_disable(void)
> >  #endif
> >  }
> >  
> > +/* Ftrace disable/restore without lock. Some synchronization mechanism
> > + * must be used to prevent ftrace_enabled to be changed between
> > + * disable/restore. */
> > +static inline int __ftrace_enabled_save(void)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +   int saved_ftrace_enabled = ftrace_enabled;
> > +   ftrace_enabled = 0;
> > +   return saved_ftrace_enabled;
> > +#else
> > +   return 0;
> > +#endif
> > +}
> > +
> > +static inline void __ftrace_enabled_restore(int enabled)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +   ftrace_enabled = enabled;
> > +#endif
> > +}
> > +
> >  #ifdef CONFIG_FRAME_POINTER
> >  /* TODO: need to fix this for ARM */
> >  # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
> 
> I guess steven would like to see a patch which introduces both locked
> and lockless versions and with a very good comment explaining in what
> kind of unusual situation one can use the lockless version.

I have sent a locked version to Steven. The patch above also has a
comment on the non-locked version, __ftrace_enabled_save(). What do you
think of it?

Best Regards,
Huang Ying





[PATCH] kexec jump: fix code size checking

2008-08-12 Thread Huang Ying
Fix the build issue when CONFIG_KEXEC=n. Thanks to Vivek Goyal for
pointing it out.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 include/asm-x86/kexec.h |3 +++
 1 file changed, 3 insertions(+)

--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -43,6 +43,9 @@
 
 #ifdef CONFIG_X86_32
 # define KEXEC_CONTROL_CODE_MAX_SIZE   2048
+# ifndef CONFIG_KEXEC
+#  define kexec_control_code_size  0
+# endif
 #endif
 
 #ifndef __ASSEMBLY__





[PATCH -v3 3/7] kexec jump: check code size in control page

2008-08-11 Thread Huang Ying
Kexec and kexec jump require the code in the control page to be no
larger than PAGE_SIZE/2. This patch adds a link-time check for this.

The ASSERT() command of the ld linker script is used as the link-time
checking mechanism.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c   |2 +-
 arch/x86/kernel/relocate_kernel_32.S |   10 +++---
 arch/x86/kernel/vmlinux_32.lds.S |6 ++
 include/asm-x86/kexec.h  |4 
 4 files changed, 18 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -138,7 +138,7 @@ void machine_kexec(struct kimage *image)
}
 
control_page = page_address(image->control_code_page);
-   memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
+   memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
 
relocate_kernel_ptr = control_page;
page_list[PA_CONTROL_PAGE] = __pa(control_page);
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -20,10 +20,11 @@
 #define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define PAE_PGD_ATTR (_PAGE_PRESENT)
 
-/* control_page + PAGE_SIZE/2 ~ control_page + PAGE_SIZE * 3/4 are
- * used to save some data for jumping back
+/* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
+ * ~ control_page + PAGE_SIZE are used as data storage and stack for
+ * jumping back
  */
-#define DATA(offset)   (PAGE_SIZE/2+(offset))
+#define DATA(offset)   (KEXEC_CONTROL_CODE_MAX_SIZE+(offset))
 
 /* Minimal CPU state */
 #define ESP   DATA(0x0)
@@ -376,3 +377,6 @@ swap_pages:
        popl    %ebx
        popl    %ebp
ret
+
+   .globl kexec_control_code_size
+.set kexec_control_code_size, . - relocate_kernel
--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -41,6 +41,10 @@
 # define PAGES_NR  17
 #endif
 
+#ifdef CONFIG_X86_32
+# define KEXEC_CONTROL_CODE_MAX_SIZE   2048
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include 
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -209,3 +209,9 @@ SECTIONS
 
   DWARF_DEBUG
 }
+
+/* Link time checks */
+#include 
+
+ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
+   "kexec control code size is too big")
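
The control page layout this patch relies on can be sketched in user-space C. This is only an illustration under stated assumptions: DEMO_PAGE_SIZE assumes a 4096-byte i386 page, the helpers demo_fill_control_page() and demo_data_area_size() are hypothetical, and the size check is done at run time here, whereas the patch does it at link time and machine_kexec() always copies KEXEC_CONTROL_CODE_MAX_SIZE bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096		/* assumed i386 page size */
#define KEXEC_CONTROL_CODE_MAX_SIZE 2048
/* Offsets into the data/stack area that follows the code area. */
#define DATA(offset) (KEXEC_CONTROL_CODE_MAX_SIZE + (offset))

/*
 * Copy code to the start of the control page.
 * Returns -1 if the code would spill into the data/stack area.
 */
static int demo_fill_control_page(unsigned char *page,
				  const unsigned char *code, size_t code_size)
{
	if (code_size > KEXEC_CONTROL_CODE_MAX_SIZE)
		return -1;
	memcpy(page, code, code_size);
	return 0;
}

/* Bytes left for data and stack after the code area. */
static size_t demo_data_area_size(void)
{
	return DEMO_PAGE_SIZE - DATA(0);
}
```

The point of the single constant is that the copy size, the DATA() offsets and the size check can no longer drift apart.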





[PATCH -v3 5/7] kexec jump: in sync with hibernation implementation

2008-08-11 Thread Huang Ying
Add device_pm_lock() and device_pm_unlock() to kernel_kexec() to keep it
in sync with the current hibernation implementation.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kernel/kexec.c |2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1457,6 +1457,7 @@ int kernel_kexec(void)
error = disable_nonboot_cpus();
if (error)
goto Resume_devices;
+   device_pm_lock();
local_irq_disable();
/* At this point, device_suspend() has been called,
 * but *not* device_power_down(). We *must*
@@ -1485,6 +1486,7 @@ int kernel_kexec(void)
device_power_up(PMSG_RESTORE);
  Enable_irqs:
local_irq_enable();
+   device_pm_unlock();
enable_nonboot_cpus();
  Resume_devices:
device_resume(PMSG_RESTORE);





[PATCH -v3 4/7] kexec jump: remove duplication of kexec_restart_prepare()

2008-08-11 Thread Huang Ying
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating
the code.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>

---
 include/linux/reboot.h |1 +
 kernel/kexec.c |6 +-
 kernel/sys.c   |2 +-
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -59,6 +59,7 @@ extern void machine_crash_shutdown(struc
  * Architecture independent implemenations of sys_reboot commands.
  */
 
+extern void kernel_restart_prepare(char *cmd);
 extern void kernel_restart(char *cmd);
 extern void kernel_halt(void);
 extern void kernel_power_off(void);
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -274,7 +274,7 @@ void emergency_restart(void)
 }
 EXPORT_SYMBOL_GPL(emergency_restart);
 
-static void kernel_restart_prepare(char *cmd)
+void kernel_restart_prepare(char *cmd)
 {
blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1472,11 +1472,7 @@ int kernel_kexec(void)
} else
 #endif
{
-   blocking_notifier_call_chain(&reboot_notifier_list,
-SYS_RESTART, NULL);
-   system_state = SYSTEM_RESTART;
-   device_shutdown();
-   sysdev_shutdown();
+   kernel_restart_prepare(NULL);
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();
}





[PATCH -v3 2/7] kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE

2008-08-11 Thread Huang Ying
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because on
some platforms the control page is used for more than code. For example,
kexec jump also uses it for data and stack.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/arm/include/asm/kexec.h |2 +-
 arch/ia64/include/asm/kexec.h|2 +-
 arch/powerpc/include/asm/kexec.h |2 +-
 arch/s390/include/asm/kexec.h|2 +-
 arch/sh/include/asm/kexec.h  |2 +-
 include/asm-mips/kexec.h |2 +-
 include/asm-x86/kexec.h  |4 ++--
 include/linux/kexec.h|4 ++--
 kernel/kexec.c   |4 ++--
 9 files changed, 12 insertions(+), 12 deletions(-)

--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -63,7 +63,7 @@
 /* Maximum address we can use for the control code buffer */
 # define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-# define KEXEC_CONTROL_CODE_SIZE   4096
+# define KEXEC_CONTROL_PAGE_SIZE   4096
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_386
@@ -79,7 +79,7 @@
 # define KEXEC_CONTROL_MEMORY_LIMIT (0xFFUL)
 
 /* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_CODE_SIZE  (4096UL + 4096UL)
+# define KEXEC_CONTROL_PAGE_SIZE  (4096UL + 4096UL)
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
--- a/arch/sh/include/asm/kexec.h
+++ b/arch/sh/include/asm/kexec.h
@@ -21,7 +21,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE4096
+#define KEXEC_CONTROL_PAGE_SIZE4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_SH
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -22,7 +22,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 #endif
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #ifdef __powerpc64__
--- a/arch/ia64/include/asm/kexec.h
+++ b/arch/ia64/include/asm/kexec.h
@@ -9,7 +9,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+#define KEXEC_CONTROL_PAGE_SIZE (8192 + 8192 + 4096)
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_IA_64
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -31,7 +31,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT (1UL<<31)
 
 /* Allocate one page for the pdp and the second for the code */
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_S390
--- a/arch/arm/include/asm/kexec.h
+++ b/arch/arm/include/asm/kexec.h
@@ -10,7 +10,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
 
-#define KEXEC_CONTROL_CODE_SIZE4096
+#define KEXEC_CONTROL_PAGE_SIZE4096
 
 #define KEXEC_ARCH KEXEC_ARCH_ARM
 
--- a/include/asm-mips/kexec.h
+++ b/include/asm-mips/kexec.h
@@ -16,7 +16,7 @@
  /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (0x2000)
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_MIPS
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -242,7 +242,7 @@ static int kimage_normal_alloc(struct ki
 */
result = -ENOMEM;
image->control_code_page = kimage_alloc_control_pages(image,
-  get_order(KEXEC_CONTROL_CODE_SIZE));
+  get_order(KEXEC_CONTROL_PAGE_SIZE));
if (!image->control_code_page) {
printk(KERN_ERR "Could not allocate control_code_buffer\n");
goto out;
@@ -317,7 +317,7 @@ static int kimage_crash_alloc(struct kim
 */
result = -ENOMEM;
image->control_code_page = kimage_alloc_control_pages(image,
-  get_order(KEXEC_CONTROL_CODE_SIZE));
+  get_order(KEXEC_CONTROL_PAGE_SIZE));
if (!image->control_code_page) {
printk(KERN_ERR "Could not allocate control_code_buffer\n");
goto out;
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -25,8 +25,8 @@
 #error KEXEC_CONTROL_MEMORY_LIMIT not defined
 #endif
 
-#ifndef KEXEC_CONTROL_CODE_SIZE
-#error KEXEC_CONTROL_CODE_SIZE not defined
+#ifndef KEXEC_CONTROL_PAGE_SIZE
+#error KEXEC_CONTROL_PAGE_SIZE not defined
 #endif
 
 #ifndef KEXEC_ARCH





[PATCH -v3 6/7] kexec jump: __ftrace_enabled_save/restore

2008-08-11 Thread Huang Ying
Add __ftrace_enabled_save/restore, used to disable ftrace temporarily.
For now they are used by kexec jump, which needs a version that takes no
lock; in the general case a locked version should be used.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 include/linux/ftrace.h |   21 +
 1 file changed, 21 insertions(+)

--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -98,6 +98,27 @@ static inline void tracer_disable(void)
 #endif
 }
 
+/* Ftrace disable/restore without lock. Some synchronization mechanism
+ * must be used to prevent ftrace_enabled to be changed between
+ * disable/restore. */
+static inline int __ftrace_enabled_save(void)
+{
+#ifdef CONFIG_FTRACE
+   int saved_ftrace_enabled = ftrace_enabled;
+   ftrace_enabled = 0;
+   return saved_ftrace_enabled;
+#else
+   return 0;
+#endif
+}
+
+static inline void __ftrace_enabled_restore(int enabled)
+{
+#ifdef CONFIG_FTRACE
+   ftrace_enabled = enabled;
+#endif
+}
+
 #ifdef CONFIG_FRAME_POINTER
 /* TODO: need to fix this for ARM */
 # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))





[PATCH -v3 0/7] kexec jump: fixes for 2.6.27

2008-08-11 Thread Huang Ying
Hi,

This patchset fixes some kexec jump issues for 2.6.27.

It is based on 2.6.27-rc2 and has been tested on the i386 platform.


ChangeLog:

v3:

- Merge the newly added file vmlinux_check_32.lds.S into vmlinux_32.lds.S.
- Add comments about lock for ftrace code.

v2:

- Check control code size at link time instead of run time.
- Encapsulate ftrace related code into functions.

Best Regards,
Huang Ying





[PATCH -v3 7/7] kexec jump: fix for ftrace

2008-08-11 Thread Huang Ying
Ftrace depends on some processor state that is destroyed during kexec
and restored by restore_processor_state(). So save_processor_state()
and restore_processor_state() are moved into machine_kexec(), and
ftrace is restored after restore_processor_state().

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |   16 +++-
 kernel/kexec.c |2 --
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -113,6 +114,7 @@ void machine_kexec(struct kimage *image)
 {
unsigned long page_list[PAGES_NR];
void *control_page;
+   int save_ftrace_enabled;
asmlinkage unsigned long
(*relocate_kernel_ptr)(unsigned long indirection_page,
   unsigned long control_page,
@@ -120,7 +122,12 @@ void machine_kexec(struct kimage *image)
   unsigned int has_pae,
   unsigned int preserve_context);
 
-   tracer_disable();
+#ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context)
+   save_processor_state();
+#endif
+
+   save_ftrace_enabled = __ftrace_enabled_save();
 
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
@@ -178,6 +185,13 @@ void machine_kexec(struct kimage *image)
   (unsigned long)page_list,
   image->start, cpu_has_pae,
   image->preserve_context);
+
+#ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context)
+   restore_processor_state();
+#endif
+
+   __ftrace_enabled_restore(save_ftrace_enabled);
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1469,7 +1469,6 @@ int kernel_kexec(void)
error = device_power_down(PMSG_FREEZE);
if (error)
goto Enable_irqs;
-   save_processor_state();
} else
 #endif
{
@@ -1482,7 +1481,6 @@ int kernel_kexec(void)
 
 #ifdef CONFIG_KEXEC_JUMP
if (kexec_image->preserve_context) {
-   restore_processor_state();
device_power_up(PMSG_RESTORE);
  Enable_irqs:
local_irq_enable();





[PATCH -v3 1/7] kexec jump: clean up #ifdef and comments

2008-08-11 Thread Huang Ying
Move if (kexec_image->preserve_context) { ... } into #ifdef
CONFIG_KEXEC_JUMP to make the code look cleaner.

Fix the no-longer-correct comment on kernel_kexec().

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>

---
 kernel/kexec.c |   17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1426,11 +1426,9 @@ static int __init crash_save_vmcoreinfo_
 
 module_init(crash_save_vmcoreinfo_init)
 
-/**
- * kernel_kexec - reboot the system
- *
- * Move into place and start executing a preloaded standalone
- * executable.  If nothing was preloaded return an error.
+/*
+ * Move into place and start executing a preloaded standalone
+ * executable.  If nothing was preloaded return an error.
  */
 int kernel_kexec(void)
 {
@@ -1443,8 +1441,8 @@ int kernel_kexec(void)
goto Unlock;
}
 
-   if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context) {
mutex_lock(&pm_mutex);
pm_prepare_console();
error = freeze_processes();
@@ -1471,8 +1469,9 @@ int kernel_kexec(void)
if (error)
goto Enable_irqs;
save_processor_state();
+   } else
 #endif
-   } else {
+   {
blocking_notifier_call_chain(&reboot_notifier_list,
 SYS_RESTART, NULL);
system_state = SYSTEM_RESTART;
@@ -1484,8 +1483,8 @@ int kernel_kexec(void)
 
machine_kexec(kexec_image);
 
-   if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context) {
restore_processor_state();
device_power_up(PMSG_RESTORE);
  Enable_irqs:
@@ -1499,8 +1498,8 @@ int kernel_kexec(void)
  Restore_console:
pm_restore_console();
mutex_unlock(&pm_mutex);
-#endif
}
+#endif
 
  Unlock:
xchg(&kexec_lock, 0);
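
The "} else" before "#endif" followed by a bare "{" in the hunks above can be easier to follow in isolation. Here is a minimal sketch of the idiom, assuming a hypothetical DEMO_JUMP macro in place of CONFIG_KEXEC_JUMP and a hypothetical demo_path() function. When the macro is undefined, the whole if/else header disappears and the trailing block runs unconditionally.

```c
#include <assert.h>

/* Hypothetical stand-in for CONFIG_KEXEC_JUMP; remove to model =n. */
#define DEMO_JUMP 1

/* Returns 1 for the "jump" path and 0 for the plain restart path. */
static int demo_path(int preserve_context)
{
	int path;

	(void)preserve_context;	/* unused when DEMO_JUMP is not defined */

#ifdef DEMO_JUMP
	if (preserve_context) {
		path = 1;	/* context-preserving path */
	} else
#endif
	{
		path = 0;	/* ordinary path; unconditional without DEMO_JUMP */
	}

	return path;
}
```

This keeps the preserve_context test out of builds that cannot use it, at the cost of a slightly unusual brace layout.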





Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore

2008-08-11 Thread Huang Ying
Hi, Vivek,

On Mon, 2008-08-11 at 09:51 -0400, Vivek Goyal wrote:
[...]
> So you want to use a non-locked version from optimization point of view?
> So that we don't end up taking and release a lock?

No, not for optimization. machine_kexec() may be called from
crash_kexec(), where it is not permitted to take and release a lock.

Best Regards,
Huang Ying





Re: [PATCH -v2 6/8] kexec jump: fix for lockdep

2008-08-10 Thread Huang Ying
On Mon, 2008-08-11 at 08:09 +0200, Peter Zijlstra wrote:
> On Mon, 2008-08-11 at 08:59 +0800, Huang Ying wrote:
> > On Fri, 2008-08-08 at 12:13 +0200, Peter Zijlstra wrote:
> > > On Fri, 2008-08-08 at 14:52 +0800, Huang Ying wrote:
> > > > Replace local_irq_disable() with raw_local_irq_disable() to prevent
> > > > lockdep complain.
> > > Uhhm, please provide more information - just using raw_* to silence
> > > lockdep is generally the wrong thing to do.
> > 
> > In traditional kexec, the new kernel will replace current one, so the
> > irq is simply disabled. But now jumping back from kexeced kernel is
> > supported, so the irq should be enabled again.
> > 
> > The code sequence of irq during kexec jump is as follow:
> > 
> > local_irq_disable(); /* in kernel_kexec() */
> > local_irq_disable(); /* in machine_kexec() */
> > local_irq_enable(); /* in kernel_kexec() */
> > 
> > The disable and enable is not match. Maybe another method is to use
> > local_irq_save(), local_irq_restore() pair in machine_kexec(), so the
> > disable and enable is matched.
> 
> And its the machine kernel's lockdep instance that goes complain?
> 
> whichever annotation gets used - and I think I can agree that raw_*
> might be approriate there, this should be accompanied with a rather
> elaborate changelog and preferably a comment in the code too. Without
> such we'll be wondering in the years to come WTH happens here.

Sorry, I find that lockdep does not complain. Unpaired irq
disable/enable is not a problem for lockdep; it only increments a counter
such as "redundant_hardirqs_off". Please ignore this thread.

Best Regards,
Huang Ying





Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore

2008-08-10 Thread Huang Ying
Hi, Steven,

On Fri, 2008-08-08 at 10:30 -0400, Steven Rostedt wrote:
[...]
> The only problem with this approach is what happens if the user changes 
> the enabled in between these two calls. This would make ftrace 
> inconsistent.
> 
> I have a patch from the -rt tree that handles what you want. It is 
> attached below. Not sure how well it will apply to mainline.
> 
> I really need to go through the rt patch set and start submitting a bunch 
> of clean-up/fixes to mainline. We've been meaning to do it, just have been 
> distracted :-(

Your version is better in the general case. Thank you very much!

But kexec/kjump is a special situation: the other CPUs are disabled,
local irqs are disabled, and switching to another process is not
permitted. So it is safe and sufficient to use the non-locked version here.

To satisfy both needs, I think it is better to provide both versions,
locked and non-locked. What do you think?

Best Regards,
Huang Ying
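
The two variants discussed above might look roughly like this in user space. This is a sketch under stated assumptions, not Steven's -rt patch and not the kernel code: demo_ftrace_enabled stands in for ftrace_enabled, the lock is a hypothetical C11 atomic_flag spinlock, and the locked variant is assumed to hold the lock from save until restore so that nobody can change the flag in between.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for ftrace_enabled and its lock. */
static int demo_ftrace_enabled = 1;
static atomic_flag demo_ftrace_lock = ATOMIC_FLAG_INIT;

/* Lockless variant: the caller must guarantee nothing else touches the flag. */
static int demo_enabled_save_nolock(void)
{
	int saved = demo_ftrace_enabled;

	demo_ftrace_enabled = 0;
	return saved;
}

static void demo_enabled_restore_nolock(int enabled)
{
	demo_ftrace_enabled = enabled;
}

/* Locked variant: takes the lock on save and keeps it until restore. */
static int demo_enabled_save_locked(void)
{
	while (atomic_flag_test_and_set(&demo_ftrace_lock))
		;	/* spin until the lock is free */
	return demo_enabled_save_nolock();
}

static void demo_enabled_restore_locked(int enabled)
{
	demo_enabled_restore_nolock(enabled);
	atomic_flag_clear(&demo_ftrace_lock);
}
```

The lockless pair is the one that fits the kexec path, where taking a lock is not an option; the locked pair is for ordinary callers.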





Re: [PATCH -v2 3/8] kexec jump: check code size in control page

2008-08-10 Thread Huang Ying
Hi, Vivek,

On Fri, 2008-08-08 at 10:09 -0400, Vivek Goyal wrote:
[...]
> > --- a/arch/x86/kernel/relocate_kernel_32.S
> > +++ b/arch/x86/kernel/relocate_kernel_32.S
> > @@ -20,10 +20,11 @@
> >  #define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
> >  #define PAE_PGD_ATTR (_PAGE_PRESENT)
> >  
> > -/* control_page + PAGE_SIZE/2 ~ control_page + PAGE_SIZE * 3/4 are
> > - * used to save some data for jumping back
> > +/* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
> > + * ~ control_page + PAGE_SIZE * 3/4 are used to save some data for
> > + * jumping back
> >   */
> 
> Hi Huang,
> 
> Above comment is not very clear. Can you please elaborate it. I thought
> that PAGE_SIZE/2 is used for control code and rest half is shared between
> kjump data and stack. What is PAGE_SIZE *3/4?

Yes, the remaining half is shared between kjump data and stack. I will
change the comment.

> > +++ b/arch/x86/kernel/vmlinux_check_32.lds.S
> > @@ -0,0 +1,7 @@
> > +/*
> > + * Link time checks
> > + */
> > +
> > +#include 
> > +
> > +ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
> "kexec control code size is too big")
> 
> Will it make sense to move it into vmlinux_32.lds.S itself? Creating a
> separate
> file for a single check seems superfluous.

I had hoped others could use it. But for now, putting it in
vmlinux_32.lds.S is better. I will change it.

Best Regards,
Huang Ying





Re: [PATCH -v2 6/8] kexec jump: fix for lockdep

2008-08-10 Thread Huang Ying
On Fri, 2008-08-08 at 12:13 +0200, Peter Zijlstra wrote:
> On Fri, 2008-08-08 at 14:52 +0800, Huang Ying wrote:
> > Replace local_irq_disable() with raw_local_irq_disable() to prevent
> > lockdep complain.
> Uhhm, please provide more information - just using raw_* to silence
> lockdep is generally the wrong thing to do.

In traditional kexec, the new kernel replaces the current one, so irqs
are simply disabled and never enabled again. But now that jumping back
from the kexeced kernel is supported, irqs must be enabled again afterwards.

The sequence of irq operations during a kexec jump is as follows:

local_irq_disable(); /* in kernel_kexec() */
local_irq_disable(); /* in machine_kexec() */
local_irq_enable(); /* in kernel_kexec() */

The disables and enables do not match. Another approach may be to use a
local_irq_save()/local_irq_restore() pair in machine_kexec(), so that the
disable and enable are matched.

Best Regards,
Huang Ying
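
The save/restore idea from the message above can be modelled with a single flag. This is a toy user-space sketch, assuming a boolean demo_irqs_enabled in place of the real CPU interrupt state. All demo_* names are hypothetical, and unlike the kernel's local_irq_save(flags) macro, the save helper here returns the old state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of the CPU interrupt-enable state. */
static bool demo_irqs_enabled = true;

static void demo_irq_disable(void) { demo_irqs_enabled = false; }
static void demo_irq_enable(void)  { demo_irqs_enabled = true; }

/* Disable and hand back the previous state. */
static bool demo_irq_save(void)
{
	bool flags = demo_irqs_enabled;

	demo_irqs_enabled = false;
	return flags;
}

/* Put back whatever state the caller had. */
static void demo_irq_restore(bool flags)
{
	demo_irqs_enabled = flags;
}

/* Inner function using a matched pair, as proposed for machine_kexec(). */
static void demo_inner_paired(void)
{
	bool flags = demo_irq_save();

	/* ... work that needs interrupts off ... */
	demo_irq_restore(flags);
}
```

With a matched pair the inner function leaves the interrupt state exactly as it found it, so the outer disable/enable in the caller stays balanced.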





[PATCH -v2 6/8] kexec jump: fix for lockdep

2008-08-07 Thread Huang Ying
Replace local_irq_disable() with raw_local_irq_disable() to prevent
lockdep from complaining.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -123,7 +123,7 @@ void machine_kexec(struct kimage *image)
tracer_disable();
 
/* Interrupts aren't acceptable while we reboot */
-   local_irq_disable();
+   raw_local_irq_disable();
 
if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC





[PATCH -v2 8/8] kexec jump: fix for ftrace

2008-08-07 Thread Huang Ying
Ftrace depends on some processor state that is destroyed during kexec
and restored by restore_processor_state(). So save_processor_state()
and restore_processor_state() are moved into machine_kexec(), and
ftrace is restored after restore_processor_state().

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |   16 +++-
 kernel/kexec.c |2 --
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -113,6 +114,7 @@ void machine_kexec(struct kimage *image)
 {
unsigned long page_list[PAGES_NR];
void *control_page;
+   int save_ftrace_enabled;
asmlinkage unsigned long
(*relocate_kernel_ptr)(unsigned long indirection_page,
   unsigned long control_page,
@@ -120,7 +122,12 @@ void machine_kexec(struct kimage *image)
   unsigned int has_pae,
   unsigned int preserve_context);
 
-   tracer_disable();
+#ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context)
+   save_processor_state();
+#endif
+
+   save_ftrace_enabled = ftrace_enabled_save();
 
/* Interrupts aren't acceptable while we reboot */
raw_local_irq_disable();
@@ -178,6 +185,13 @@ void machine_kexec(struct kimage *image)
   (unsigned long)page_list,
   image->start, cpu_has_pae,
   image->preserve_context);
+
+#ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context)
+   restore_processor_state();
+#endif
+
+   ftrace_enabled_restore(save_ftrace_enabled);
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1469,7 +1469,6 @@ int kernel_kexec(void)
error = device_power_down(PMSG_FREEZE);
if (error)
goto Enable_irqs;
-   save_processor_state();
} else
 #endif
{
@@ -1482,7 +1481,6 @@ int kernel_kexec(void)
 
 #ifdef CONFIG_KEXEC_JUMP
if (kexec_image->preserve_context) {
-   restore_processor_state();
device_power_up(PMSG_RESTORE);
  Enable_irqs:
local_irq_enable();





[PATCH -v2 5/8] kexec jump: in sync with hibernation implementation

2008-08-07 Thread Huang Ying
Add device_pm_lock() and device_pm_unlock() to kernel_kexec() to keep it
in sync with the current hibernation implementation.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kernel/kexec.c |2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1457,6 +1457,7 @@ int kernel_kexec(void)
error = disable_nonboot_cpus();
if (error)
goto Resume_devices;
+   device_pm_lock();
local_irq_disable();
/* At this point, device_suspend() has been called,
 * but *not* device_power_down(). We *must*
@@ -1485,6 +1486,7 @@ int kernel_kexec(void)
device_power_up(PMSG_RESTORE);
  Enable_irqs:
local_irq_enable();
+   device_pm_unlock();
enable_nonboot_cpus();
  Resume_devices:
device_resume(PMSG_RESTORE);





[PATCH -v2 1/8] kexec jump: clean up #ifdef and comments

2008-08-07 Thread Huang Ying
Move if (kexec_image->preserve_context) { ... } into #ifdef
CONFIG_KEXEC_JUMP to make the code look cleaner.

Fix the no-longer-correct comment on kernel_kexec().

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>

---
 kernel/kexec.c |   17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1426,11 +1426,9 @@ static int __init crash_save_vmcoreinfo_
 
 module_init(crash_save_vmcoreinfo_init)
 
-/**
- * kernel_kexec - reboot the system
- *
- * Move into place and start executing a preloaded standalone
- * executable.  If nothing was preloaded return an error.
+/*
+ * Move into place and start executing a preloaded standalone
+ * executable.  If nothing was preloaded return an error.
  */
 int kernel_kexec(void)
 {
@@ -1443,8 +1441,8 @@ int kernel_kexec(void)
goto Unlock;
}
 
-   if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context) {
mutex_lock(&pm_mutex);
pm_prepare_console();
error = freeze_processes();
@@ -1471,8 +1469,9 @@ int kernel_kexec(void)
if (error)
goto Enable_irqs;
save_processor_state();
+   } else
 #endif
-   } else {
+   {
blocking_notifier_call_chain(&reboot_notifier_list,
 SYS_RESTART, NULL);
system_state = SYSTEM_RESTART;
@@ -1484,8 +1483,8 @@ int kernel_kexec(void)
 
machine_kexec(kexec_image);
 
-   if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context) {
restore_processor_state();
device_power_up(PMSG_RESTORE);
  Enable_irqs:
@@ -1499,8 +1498,8 @@ int kernel_kexec(void)
  Restore_console:
pm_restore_console();
mutex_unlock(&pm_mutex);
-#endif
}
+#endif
 
  Unlock:
xchg(&kexec_lock, 0);





[PATCH -v2 0/8] kexec jump: fixes for 2.6.27

2008-08-07 Thread Huang Ying
Hi,

This patchset fixes some kexec jump issues for 2.6.27.

It is based on 2.6.27-rc2 and has been tested on the i386 platform.


ChangeLog:

v2:

- Check control code size at link time instead of run time.
- Encapsulate ftrace related code into functions.

Best Regards,
Huang Ying





[PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore

2008-08-07 Thread Huang Ying
Add ftrace_enabled_save() and ftrace_enabled_restore(), which disable
ftrace temporarily and later restore its previous state. They are used
by kexec jump.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 include/linux/ftrace.h |   18 ++
 1 file changed, 18 insertions(+)

--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -98,6 +98,24 @@ static inline void tracer_disable(void)
 #endif
 }
 
+static inline int ftrace_enabled_save(void)
+{
+#ifdef CONFIG_FTRACE
+   int saved_ftrace_enabled = ftrace_enabled;
+   ftrace_enabled = 0;
+   return saved_ftrace_enabled;
+#else
+   return 0;
+#endif
+}
+
+static inline void ftrace_enabled_restore(int enabled)
+{
+#ifdef CONFIG_FTRACE
+   ftrace_enabled = enabled;
+#endif
+}
+
 #ifdef CONFIG_FRAME_POINTER
 /* TODO: need to fix this for ARM */
 # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
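
The save/restore pair above follows the same shape as irq save/restore:
the caller keeps the old value and hands it back later. A minimal
user-space sketch of that pattern follows. All demo_* names are
hypothetical, and the flag is assumed to be a plain int; this is not
the kernel code itself.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's ftrace_enabled flag (assumed to be a plain int). */
static int demo_ftrace_enabled = 1;

/* Turn tracing off and hand the previous setting back to the caller. */
static int demo_ftrace_enabled_save(void)
{
	int saved = demo_ftrace_enabled;

	demo_ftrace_enabled = 0;
	return saved;
}

/* Put back whatever demo_ftrace_enabled_save() returned. */
static void demo_ftrace_enabled_restore(int enabled)
{
	demo_ftrace_enabled = enabled;
}

/* Hypothetical probe: reports whether tracing is on right now. */
static int demo_probe(void)
{
	return demo_ftrace_enabled;
}

/* Hypothetical caller: run fn() with tracing off, then restore the old state. */
static int demo_run_untraced(int (*fn)(void))
{
	int saved, ret;

	assert(fn != NULL);
	saved = demo_ftrace_enabled_save();
	ret = fn();
	demo_ftrace_enabled_restore(saved);
	return ret;
}
```

Because the caller holds the saved value, the pair restores "whatever
it was before" rather than forcing tracing back on, so it also behaves
correctly when tracing was already off.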





[PATCH -v2 4/8] kexec jump: remove duplication of kexec_restart_prepare()

2008-08-07 Thread Huang Ying
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating
the code.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>

---
 include/linux/reboot.h |1 +
 kernel/kexec.c |6 +-
 kernel/sys.c   |2 +-
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -59,6 +59,7 @@ extern void machine_crash_shutdown(struc
  * Architecture independent implemenations of sys_reboot commands.
  */
 
+extern void kernel_restart_prepare(char *cmd);
 extern void kernel_restart(char *cmd);
 extern void kernel_halt(void);
 extern void kernel_power_off(void);
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -274,7 +274,7 @@ void emergency_restart(void)
 }
 EXPORT_SYMBOL_GPL(emergency_restart);
 
-static void kernel_restart_prepare(char *cmd)
+void kernel_restart_prepare(char *cmd)
 {
blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1472,11 +1472,7 @@ int kernel_kexec(void)
} else
 #endif
{
-   blocking_notifier_call_chain(&reboot_notifier_list,
-SYS_RESTART, NULL);
-   system_state = SYSTEM_RESTART;
-   device_shutdown();
-   sysdev_shutdown();
+   kernel_restart_prepare(NULL);
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();
}





[PATCH -v2 2/8] kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE

2008-08-07 Thread Huang Ying
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because on
some platforms the control page holds more than code. For example,
kexec jump also uses it for data and stack.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/arm/include/asm/kexec.h |2 +-
 arch/ia64/include/asm/kexec.h|2 +-
 arch/powerpc/include/asm/kexec.h |2 +-
 arch/s390/include/asm/kexec.h|2 +-
 arch/sh/include/asm/kexec.h  |2 +-
 include/asm-mips/kexec.h |2 +-
 include/asm-x86/kexec.h  |4 ++--
 include/linux/kexec.h|4 ++--
 kernel/kexec.c   |4 ++--
 9 files changed, 12 insertions(+), 12 deletions(-)

--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -63,7 +63,7 @@
 /* Maximum address we can use for the control code buffer */
 # define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-# define KEXEC_CONTROL_CODE_SIZE   4096
+# define KEXEC_CONTROL_PAGE_SIZE   4096
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_386
@@ -79,7 +79,7 @@
 # define KEXEC_CONTROL_MEMORY_LIMIT (0xFFUL)
 
 /* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_CODE_SIZE  (4096UL + 4096UL)
+# define KEXEC_CONTROL_PAGE_SIZE  (4096UL + 4096UL)
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
--- a/arch/sh/include/asm/kexec.h
+++ b/arch/sh/include/asm/kexec.h
@@ -21,7 +21,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE4096
+#define KEXEC_CONTROL_PAGE_SIZE4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_SH
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -22,7 +22,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 #endif
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #ifdef __powerpc64__
--- a/arch/ia64/include/asm/kexec.h
+++ b/arch/ia64/include/asm/kexec.h
@@ -9,7 +9,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+#define KEXEC_CONTROL_PAGE_SIZE (8192 + 8192 + 4096)
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_IA_64
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -31,7 +31,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT (1UL<<31)
 
 /* Allocate one page for the pdp and the second for the code */
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_S390
--- a/arch/arm/include/asm/kexec.h
+++ b/arch/arm/include/asm/kexec.h
@@ -10,7 +10,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
 
-#define KEXEC_CONTROL_CODE_SIZE4096
+#define KEXEC_CONTROL_PAGE_SIZE4096
 
 #define KEXEC_ARCH KEXEC_ARCH_ARM
 
--- a/include/asm-mips/kexec.h
+++ b/include/asm-mips/kexec.h
@@ -16,7 +16,7 @@
  /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (0x2000)
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_MIPS
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -242,7 +242,7 @@ static int kimage_normal_alloc(struct ki
 */
result = -ENOMEM;
image->control_code_page = kimage_alloc_control_pages(image,
-  get_order(KEXEC_CONTROL_CODE_SIZE));
+  get_order(KEXEC_CONTROL_PAGE_SIZE));
if (!image->control_code_page) {
printk(KERN_ERR "Could not allocate control_code_buffer\n");
goto out;
@@ -317,7 +317,7 @@ static int kimage_crash_alloc(struct kim
 */
result = -ENOMEM;
image->control_code_page = kimage_alloc_control_pages(image,
-  get_order(KEXEC_CONTROL_CODE_SIZE));
+  get_order(KEXEC_CONTROL_PAGE_SIZE));
if (!image->control_code_page) {
printk(KERN_ERR "Could not allocate control_code_buffer\n");
goto out;
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -25,8 +25,8 @@
 #error KEXEC_CONTROL_MEMORY_LIMIT not defined
 #endif
 
-#ifndef KEXEC_CONTROL_CODE_SIZE
-#error KEXEC_CONTROL_CODE_SIZE not defined
+#ifndef KEXEC_CONTROL_PAGE_SIZE
+#error KEXEC_CONTROL_PAGE_SIZE not defined
 #endif
 
 #ifndef KEXEC_ARCH





[PATCH -v2 3/8] kexec jump: check code size in control page

2008-08-07 Thread Huang Ying
Kexec and kexec jump require that the code in the control page does
not exceed PAGE_SIZE/2. This patch adds a link-time check for this.

The ASSERT() command of the ld linker script is used as the link-time
checking mechanism.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |2 +-
 arch/x86/kernel/relocate_kernel_32.S   |   10 +++---
 arch/x86/kernel/vmlinux_32.lds.S   |2 ++
 arch/x86/kernel/vmlinux_check_32.lds.S |7 +++
 include/asm-x86/kexec.h|4 
 5 files changed, 21 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -138,7 +138,7 @@ void machine_kexec(struct kimage *image)
}
 
control_page = page_address(image->control_code_page);
-   memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
+   memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
 
relocate_kernel_ptr = control_page;
page_list[PA_CONTROL_PAGE] = __pa(control_page);
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -20,10 +20,11 @@
 #define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define PAE_PGD_ATTR (_PAGE_PRESENT)
 
-/* control_page + PAGE_SIZE/2 ~ control_page + PAGE_SIZE * 3/4 are
- * used to save some data for jumping back
+/* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
+ * ~ control_page + PAGE_SIZE * 3/4 are used to save some data for
+ * jumping back
  */
-#define DATA(offset)   (PAGE_SIZE/2+(offset))
+#define DATA(offset)   (KEXEC_CONTROL_CODE_MAX_SIZE+(offset))
 
 /* Minimal CPU state */
 #define ESPDATA(0x0)
@@ -376,3 +377,6 @@ swap_pages:
popl%ebx
popl%ebp
ret
+
+   .globl kexec_control_code_size
+.set kexec_control_code_size, . - relocate_kernel
--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -41,6 +41,10 @@
 # define PAGES_NR  17
 #endif
 
+#ifdef CONFIG_X86_32
+# define KEXEC_CONTROL_CODE_MAX_SIZE   2048
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include 
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -209,3 +209,5 @@ SECTIONS
 
   DWARF_DEBUG
 }
+
+#include "vmlinux_check_32.lds.S"
--- /dev/null
+++ b/arch/x86/kernel/vmlinux_check_32.lds.S
@@ -0,0 +1,7 @@
+/*
+ * Link time checks
+ */
+
+#include 
+
+ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE, "kexec control code size is too big")
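
The control page layout implied by the comment in relocate_kernel_32.S
can be sketched in plain C as follows. The 4096-byte page size and all
demo_*/DEMO_* names are assumptions made for illustration; only the
2048-byte code limit and the "code, then data up to 3/4 of the page"
split come from the patch. What the last quarter of the page holds is
not described here, so the sketch just calls it "rest".

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE                   4096u	/* assumed i386 page size */
#define DEMO_KEXEC_CONTROL_CODE_MAX_SIZE 2048u	/* value used in the patch */
#define DEMO_DATA_END                    (DEMO_PAGE_SIZE * 3 / 4)

/* Mirrors DATA(offset): saved data starts right after the code area. */
#define DEMO_DATA(offset) (DEMO_KEXEC_CONTROL_CODE_MAX_SIZE + (offset))

/* Compile-time analogue of the linker ASSERT(), for values the compiler knows. */
_Static_assert(DEMO_KEXEC_CONTROL_CODE_MAX_SIZE <= DEMO_PAGE_SIZE / 2,
	       "code area exceeds half of the control page");

enum demo_region { DEMO_REGION_CODE, DEMO_REGION_DATA, DEMO_REGION_REST };

/* Classify a byte offset inside the control page. */
static enum demo_region demo_classify(size_t off)
{
	assert(off < DEMO_PAGE_SIZE);
	if (off < DEMO_KEXEC_CONTROL_CODE_MAX_SIZE)
		return DEMO_REGION_CODE;
	if (off < DEMO_DATA_END)
		return DEMO_REGION_DATA;
	return DEMO_REGION_REST;
}

/* Same condition as the ASSERT() in vmlinux_check_32.lds.S. */
static int demo_code_fits(size_t code_size)
{
	return code_size <= DEMO_KEXEC_CONTROL_CODE_MAX_SIZE;
}
```

Using one named constant for both the memcpy() length and the DATA()
base, as the patch does, keeps the copy size and the data offset from
drifting apart.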





Re: [PATCH 6/6] kexec jump: fix for ftrace

2008-08-07 Thread Huang Ying
On Thu, 2008-08-07 at 09:38 -0400, Vivek Goyal wrote:
[...]
> What kind of problem we run into if we don't disable the ftracer?
> 
> I think there are too many #ifdefs now and probably we can at least
> get rid of the #ifdef CONFIG_FTRACE thing.
> 
> I think ftracer needs to export the function to enable the tracer
> back (tracer_enable()) so that we don't directly play with ftrace_enabled
> variable. tracer_enable() can be do {} while (0) in case of CONFIG_FTRACE=n
> so that we can get rid of #ifdefs here.

The ftrace issue for kexec was reported by Dhaval Giani and fixed by
Ingo in the following thread:

http://lkml.org/lkml/2008/2/19/175

After some testing, I found that the system hangs if ftrace is enabled
before restore_processor_state(). I think ftrace may depend on some
processor state that is destroyed during kexec and restored by
restore_processor_state(). So I moved save_processor_state() and
restore_processor_state() into machine_kexec() and enable ftrace only
after restore_processor_state().

The #ifdef CONFIG_FTRACE should be removed. I think an interface like
irq_save/restore would be good for this:

saved_ftrace_enabled = ftrace_save_enabled()
<...>
ftrace_restore_enabled(saved_ftrace_enabled)

Best Regards,
Huang Ying





Re: [PATCH 2/6] kexec jump: check code size in control page

2008-08-07 Thread Huang Ying
On Thu, 2008-08-07 at 22:31 +0200, Pavel Machek wrote:
> Hi!
> 
> > > PAGE_SIZE/2. This patch adds runtime checking for this.
> > > 
> > > Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> ...
> 
> > >  {
> > >   if (nx_enabled)
> > >   set_pages_x(image->control_code_page, 1);
> > > +
> > > + BUG_ON((unsigned long)kexec_control_page_code_end - \
> > > +(unsigned long)relocate_kernel >= PAGE_SIZE/2);
> > > +
> > 
> 
> > Run time check is better than nothing but I think in this case it would
> > be better if we can catch it at compile time. 
> > 
> > One of the methods will be to write a small program of your own and
> > put in script/ and at build time check for the size and flag error. May
> > be there are other better ways to do this.
> 
> BUILD_BUG_ON()?

I tried BUILD_BUG_ON(), and both of the following statements compile
without error:

BUILD_BUG_ON((unsigned long)kexec_control_page_code_end - \
 (unsigned long)relocate_kernel >= PAGE_SIZE/2);

BUILD_BUG_ON((unsigned long)kexec_control_page_code_end - \
 (unsigned long)relocate_kernel < PAGE_SIZE/2);

In general, I think the values of kexec_control_page_code_end and
relocate_kernel are not determined at compile time, so BUILD_BUG_ON()
doesn't work.
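
A small sketch of the distinction, assuming BUILD_BUG_ON() has the
negative-array-size form shown below (DEMO_BUILD_BUG_ON and the other
demo_* names are hypothetical). With that form, a condition that is
not a compile-time constant may simply turn the array into a
variable-length one, which would explain why both statements above
compile. Sizes of types can be checked by the compiler; symbol
addresses have to be checked at link time or run time.

```c
#include <assert.h>

/* Assumed form of the macro: a true constant condition yields char[-1]. */
#define DEMO_BUILD_BUG_ON(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

#define DEMO_CODE_LIMIT 2048UL	/* PAGE_SIZE/2 for a 4096-byte page */

struct demo_saved_state {
	unsigned long regs[8];
};

static void demo_compile_time_checks(void)
{
	/* sizeof() is a constant expression, so the compiler really checks this. */
	DEMO_BUILD_BUG_ON(sizeof(struct demo_saved_state) >= DEMO_CODE_LIMIT);
}

/*
 * Symbol addresses are only fixed by the linker, so a size derived from
 * them has to be tested at link time or, as here, at run time.
 */
static int demo_code_size_ok(unsigned long start, unsigned long end)
{
	assert(end >= start);
	return end - start < DEMO_CODE_LIMIT;
}
```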

Another idea is to use the ASSERT() command of the ld linker script, as
in the following patch:

--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -209,3 +209,5 @@ SECTIONS
 
   DWARF_DEBUG
 }
+
+#include "vmlinux_check_32.lds.S"
--- /dev/null
+++ b/arch/x86/kernel/vmlinux_check_32.lds.S
@@ -0,0 +1,3 @@
+#include 
+
+ASSERT(kexec_control_page_code_end - relocate_kernel < 2048, "kexec control page code size is too big")


It works for me. What do you think about that?

Best Regards,
Huang Ying





Re: [PATCH 4/6] kexec jump: in sync with hibernation implementation

2008-08-07 Thread Huang Ying
On Thu, 2008-08-07 at 11:22 +0200, Pavel Machek wrote:
> > Add device_pm_lock() and device_pm_unlock() in kernel_kexec() to be
> > in sync with current hibernation implementation.
> > 
> > Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> > 
> > ---
> >  kernel/kexec.c |2 ++
> >  1 file changed, 2 insertions(+)
> > 
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1457,6 +1457,7 @@ int kernel_kexec(void)
> > error = disable_nonboot_cpus();
> > if (error)
> > goto Resume_devices;
> > +   device_pm_lock();
> > local_irq_disable();
> > /* At this point, device_suspend() has been called,
> >  * but *not* device_power_down(). We *must*
> > @@ -1485,6 +1486,7 @@ int kernel_kexec(void)
> > device_power_up(PMSG_RESTORE);
> >   Enable_irqs:
> > local_irq_enable();
> > +   device_pm_unlock();
> > enable_nonboot_cpus();
> >   Resume_devices:
> > device_resume(PMSG_RESTORE);
> > 
> 
> Would it be possible to create common function for hibernation and
> kexec? Keeping complex stuff like this in sync is ugly.

Yes, it is ugly, but doing that is a little difficult: the hibernation
path is more complex than this one.

Best Regards,
Huang Ying





Re: [PATCH 1/6] kexec jump: clean up #ifdef and comments

2008-08-07 Thread Huang Ying
On Thu, 2008-08-07 at 11:20 +0200, Pavel Machek wrote:
> Hi!
> 
> > CONFIG_KEXEC_JUMP to make code looks cleaner.
> > 
> > Fix no longer correct comments of kernel_kexec().
> > 
> > Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> > 
> > ---
> >  kernel/kexec.c |   11 +--
> >  1 file changed, 5 insertions(+), 6 deletions(-)
> > 
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -1427,8 +1427,6 @@ static int __init crash_save_vmcoreinfo_
> >  module_init(crash_save_vmcoreinfo_init)
> >  
> >  /**
> > - * kernel_kexec - reboot the system
> > - *
> >   * Move into place and start executing a preloaded standalone
> >   * executable.  If nothing was preloaded return an error.
> >   */
> 
> If it is not kerneldoc, it should not be /** .

OK. I will fix it.

Best Regards,
Huang Ying





[PATCH 1/6] kexec jump: clean up #ifdef and comments

2008-08-07 Thread Huang Ying
Move if (kexec_image->preserve_context) { ... } into #ifdef
CONFIG_KEXEC_JUMP to make the code look cleaner.

Fix the comments of kernel_kexec(), which are no longer correct.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kernel/kexec.c |   11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1427,8 +1427,6 @@ static int __init crash_save_vmcoreinfo_
 module_init(crash_save_vmcoreinfo_init)
 
 /**
- * kernel_kexec - reboot the system
- *
  * Move into place and start executing a preloaded standalone
  * executable.  If nothing was preloaded return an error.
  */
@@ -1443,8 +1441,8 @@ int kernel_kexec(void)
goto Unlock;
}
 
-   if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context) {
mutex_lock(&pm_mutex);
pm_prepare_console();
error = freeze_processes();
@@ -1471,8 +1469,9 @@ int kernel_kexec(void)
if (error)
goto Enable_irqs;
save_processor_state();
+   } else
 #endif
-   } else {
+   {
blocking_notifier_call_chain(&reboot_notifier_list,
 SYS_RESTART, NULL);
system_state = SYSTEM_RESTART;
@@ -1484,8 +1483,8 @@ int kernel_kexec(void)
 
machine_kexec(kexec_image);
 
-   if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context) {
restore_processor_state();
device_power_up(PMSG_RESTORE);
  Enable_irqs:
@@ -1499,8 +1498,8 @@ int kernel_kexec(void)
  Restore_console:
pm_restore_console();
mutex_unlock(&pm_mutex);
-#endif
}
+#endif
 
  Unlock:
xchg(&kexec_lock, 0);
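
The "} else / #endif / {" arrangement in this patch can be reduced to
the following stand-alone sketch. DEMO_KEXEC_JUMP is a hypothetical
switch standing in for CONFIG_KEXEC_JUMP, and the function only
reports which branch was taken.

```c
#include <assert.h>

enum demo_path { DEMO_PATH_JUMP, DEMO_PATH_RESTART };

/*
 * Build with -DDEMO_KEXEC_JUMP to compile the jump branch in.
 * Without it, the whole if/else disappears and only the plain
 * block remains, so the restart path is always taken.
 */
static enum demo_path demo_choose_path(int preserve_context)
{
	enum demo_path taken;

	assert(preserve_context == 0 || preserve_context == 1);
#ifdef DEMO_KEXEC_JUMP
	if (preserve_context) {
		taken = DEMO_PATH_JUMP;
	} else
#endif
	{
		taken = DEMO_PATH_RESTART;
	}
	return taken;
}
```

The design choice is that the fallback block is written once and is
valid C in both configurations: it is either the else branch or an
ordinary compound statement.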





[PATCH 4/6] kexec jump: in sync with hibernation implementation

2008-08-07 Thread Huang Ying
Add device_pm_lock() and device_pm_unlock() in kernel_kexec() to be
in sync with current hibernation implementation.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kernel/kexec.c |2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1457,6 +1457,7 @@ int kernel_kexec(void)
error = disable_nonboot_cpus();
if (error)
goto Resume_devices;
+   device_pm_lock();
local_irq_disable();
/* At this point, device_suspend() has been called,
 * but *not* device_power_down(). We *must*
@@ -1485,6 +1486,7 @@ int kernel_kexec(void)
device_power_up(PMSG_RESTORE);
  Enable_irqs:
local_irq_enable();
+   device_pm_unlock();
enable_nonboot_cpus();
  Resume_devices:
device_resume(PMSG_RESTORE);





[PATCH 3/6] kexec jump: remove duplication of kexec_restart_prepare()

2008-08-07 Thread Huang Ying
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating
the code.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 include/linux/reboot.h |1 +
 kernel/kexec.c |6 +-
 kernel/sys.c   |2 +-
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -59,6 +59,7 @@ extern void machine_crash_shutdown(struc
  * Architecture independent implemenations of sys_reboot commands.
  */
 
+extern void kernel_restart_prepare(char *cmd);
 extern void kernel_restart(char *cmd);
 extern void kernel_halt(void);
 extern void kernel_power_off(void);
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -274,7 +274,7 @@ void emergency_restart(void)
 }
 EXPORT_SYMBOL_GPL(emergency_restart);
 
-static void kernel_restart_prepare(char *cmd)
+void kernel_restart_prepare(char *cmd)
 {
blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
system_state = SYSTEM_RESTART;
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1472,11 +1472,7 @@ int kernel_kexec(void)
} else
 #endif
{
-   blocking_notifier_call_chain(&reboot_notifier_list,
-SYS_RESTART, NULL);
-   system_state = SYSTEM_RESTART;
-   device_shutdown();
-   sysdev_shutdown();
+   kernel_restart_prepare(NULL);
printk(KERN_EMERG "Starting new kernel\n");
machine_shutdown();
}





[PATCH 5/6] kexec jump: fix for lockdep

2008-08-07 Thread Huang Ying
Replace local_irq_disable() with raw_local_irq_disable() to keep
lockdep from complaining.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -130,7 +130,7 @@ void machine_kexec(struct kimage *image)
 #endif
 
/* Interrupts aren't acceptable while we reboot */
-   local_irq_disable();
+   raw_local_irq_disable();
 
if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC





[PATCH 2/6] kexec jump: check code size in control page

2008-08-07 Thread Huang Ying
Kexec and kexec jump require the code in the control page to be
smaller than PAGE_SIZE/2. This patch adds a run-time check for this.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c   |4 
 arch/x86/kernel/relocate_kernel_32.S |3 +++
 include/asm-x86/kexec.h  |1 +
 3 files changed, 8 insertions(+)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -92,6 +92,10 @@ int machine_kexec_prepare(struct kimage 
 {
if (nx_enabled)
set_pages_x(image->control_code_page, 1);
+
+   BUG_ON((unsigned long)kexec_control_page_code_end - \
+  (unsigned long)relocate_kernel >= PAGE_SIZE/2);
+
return 0;
 }
 
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -376,3 +376,6 @@ swap_pages:
popl%ebx
popl%ebp
ret
+
+   .globl kexec_control_page_code_end
+kexec_control_page_code_end:
--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -159,6 +159,7 @@ relocate_kernel(unsigned long indirectio
unsigned long start_address,
unsigned int has_pae,
unsigned int preserve_context);
+void kexec_control_page_code_end(void);
 #else
 NORET_TYPE void
 relocate_kernel(unsigned long indirection_page,





[PATCH 0/6] kexec jump: fixes for 2.6.27

2008-08-07 Thread Huang Ying
Hi,

This patchset fixes several kexec jump issues for 2.6.27.

It is based on 2.6.27-rc2 and has been tested on the i386 platform.

Best Regards,
Huang Ying





[PATCH 6/6] kexec jump: fix for ftrace

2008-08-07 Thread Huang Ying
Restore ftrace after jumping back from the kexeced kernel.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |   19 +++
 kernel/kexec.c |2 --
 2 files changed, 19 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -12,6 +12,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -117,6 +118,7 @@ void machine_kexec(struct kimage *image)
 {
unsigned long page_list[PAGES_NR];
void *control_page;
+   int save_ftrace_enabled;
asmlinkage unsigned long
(*relocate_kernel_ptr)(unsigned long indirection_page,
   unsigned long control_page,
@@ -124,7 +126,15 @@ void machine_kexec(struct kimage *image)
   unsigned int has_pae,
   unsigned int preserve_context);
 
+#ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context)
+   save_processor_state();
+#endif
+
+#ifdef CONFIG_FTRACE
+   save_ftrace_enabled = ftrace_enabled;
tracer_disable();
+#endif
 
/* Interrupts aren't acceptable while we reboot */
raw_local_irq_disable();
@@ -182,6 +192,15 @@ void machine_kexec(struct kimage *image)
   (unsigned long)page_list,
   image->start, cpu_has_pae,
   image->preserve_context);
+
+#ifdef CONFIG_KEXEC_JUMP
+   if (kexec_image->preserve_context)
+   restore_processor_state();
+#endif
+
+#ifdef CONFIG_FTRACE
+   ftrace_enabled = save_ftrace_enabled;
+#endif
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1469,7 +1469,6 @@ int kernel_kexec(void)
error = device_power_down(PMSG_FREEZE);
if (error)
goto Enable_irqs;
-   save_processor_state();
} else
 #endif
{
@@ -1482,7 +1481,6 @@ int kernel_kexec(void)
 
 #ifdef CONFIG_KEXEC_JUMP
if (kexec_image->preserve_context) {
-   restore_processor_state();
device_power_up(PMSG_RESTORE);
  Enable_irqs:
local_irq_enable();





Re: [PATCH -mm 2/2] kexec jump -v12: save/restore device state

2008-07-14 Thread Huang Ying
Hi, Vivek,

On Mon, 2008-07-14 at 09:48 -0400, Vivek Goyal wrote:
[...]
> You have cited various possible use cases of this patchset. Which is
> the specific feature you are planning to use?

I think two features are useful:

1. Kexec based hibernation
2. Do kdump then continue

> Thinking more about it, what's the compelling feature out of this list
> which makes this patchset a strong candidate for inclusion?
> 
> Regarding hibernation, Rafael does not think this is the way to go 
> for future.

I think kexec-based hibernation can be a better hibernation scheme; at
the least, it allows more code sharing.

Best Regards,
Huang Ying





Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump

2008-07-14 Thread huang ying
On Sat, Jul 12, 2008 at 3:21 AM, Andrew Morton
<[EMAIL PROTECTED]> wrote:
> On Tue, 8 Jul 2008 10:50:51 -0400 Vivek Goyal <[EMAIL PROTECTED]> wrote:
>
>> On Mon, Jul 07, 2008 at 11:25:22AM +0800, Huang Ying wrote:
>> > This patch provides an enhancement to kexec/kdump. It implements
>> > the following features:
>> >
>> > - Backup/restore memory used by the original kernel before/after
>> >   kexec.
>> >
>> > - Save/restore CPU state before/after kexec.
>> >
>>
>> Hi Huang,
>>
>> In general this patch set looks good enough to live in -mm and
>> get some testing going.
>>
>> To me, adding capability to return back to original kernel looks
>> like a logical extension to kexec functionality.
>
> Exciting ;)  It's much less code than I expected.
>
> I don't think I understand the feature any more.  Once upon a time we
> thought that this might become a new and better (or at least
> better-code-sharing) way of doing suspend-to-disk.  How far are we from
> that?

At least the following issues remain:

- We need a mechanism to pass some information (such as the backup pages
map) from the hibernated kernel to the hibernating kernel, maybe using
the C calling convention.
- To load the hibernation image via /sbin/kexec, the segment number
constraint of sys_kexec_load needs to be extended (maybe via
multi-stage loading).
- Kexec-based hibernation must be made compatible with ACPI S4.
- The makedumpfile utility must be extended for kexec-based hibernation.

> What are the prospects of supporting other architectures?

I will work on x86_64 support.

> Who maintains kexec-tools, and are they OK with merging up the
> corresponding changes?

I will work with the kexec-tools mailing list on the corresponding kexec-tools patches.

Best Regards,
Huang Ying



Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump

2008-07-08 Thread Huang Ying
Hi, Pavel,

On Tue, 2008-07-08 at 12:40 +0200, Pavel Machek wrote:
> Hi!
> 
> > > > @@ -1411,3 +1421,50 @@ static int __init crash_save_vmcoreinfo_
> > > >  }
> > > > 
> > > >  module_init(crash_save_vmcoreinfo_init)
> > > > +
> > > > +/**
> > > > + *   kernel_kexec - reboot the system
> > 
> > > Really?
> > 
> > I will change the comments to reflect the changes to kernel_kexec.
> > 
> > > > + *   Move into place and start executing a preloaded standalone
> > > > + *   executable.  If nothing was preloaded return an error.
> > > > + */
> > > > +int kernel_kexec(void)
> > > > +{
> > > > + int error = 0;
> > > > +
> > > > + if (xchg(&kexec_lock, 1))
> > > > + return -EBUSY;
> > > 
> > > That's quite a strange way to provide a lock. mutex_trylock?
> > 
> > I think this is because kexec_lock is used by crash_kexec() too, which
> > may be called in some extreme environment, such as during panic().
> > 
> > > > + if (!kexec_image) {
> > > > + error = -EINVAL;
> > > > + goto Unlock;
> > > > + }
> > > > +
> > > > + if (kexec_image->preserve_context) {
> > > > +#ifdef CONFIG_KEXEC_JUMP
> > > > + local_irq_disable();
> > > > + save_processor_state();
> > > 
> > > #else
> > > BUG()
> > > 
> > > ...because otherwise you silently do nothing?
> > > 
> > > > +#endif
> > 
> > If CONFIG_KEXEC_JUMP is not defined, kexec_image->preserve_context will
> > always be 0. So current code is safe. Here, #ifdef is used to resolve
> > the dependency issue. For example, save_processor_state() may be
> > undefined if CONFIG_KEXEC_JUMP is not defined.
> 
> Move the #ifdef outside the if (), then, so this is clear?

I think this is reasonable, I will do it.

> Actually, if preserve_context is always zero in !KEXEC_JUMP case, it
> might make sense to remove whole variable...

I think this will add more #ifndef CONFIG_KEXEC_JUMP ... #endif blocks
than necessary. The memory and performance gain is too small to
compensate for the loss of code readability.

Best Regards,
Huang Ying





Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump

2008-07-08 Thread Huang Ying
On Tue, 2008-07-08 at 10:50 -0400, Vivek Goyal wrote:
> On Mon, Jul 07, 2008 at 11:25:22AM +0800, Huang Ying wrote:
> > This patch provides an enhancement to kexec/kdump. It implements
> > the following features:
> > 
> > - Backup/restore memory used by the original kernel before/after
> >   kexec.
> > 
> > - Save/restore CPU state before/after kexec.
> > 
> 
> Hi Huang,
> 
> In general this patch set looks good enough to live in -mm and
> get some testing going.
> 
> To me, adding capability to return back to original kernel looks
> like a logical extension to kexec functionality.
> 
> Acked-by: Vivek Goyal <[EMAIL PROTECTED]>
> 
> Few minor comments inline.

Thank you very much!

> [..]
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -22,6 +22,7 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  
> >  #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
> >  static u32 kexec_pgd[1024] PAGE_ALIGNED;
> > @@ -85,10 +86,12 @@ static void load_segments(void)
> >   * reboot code buffer to allow us to avoid allocations
> >   * later.
> >   *
> > - * Currently nothing.
> > + * Make control page executable.
> >   */
> >  int machine_kexec_prepare(struct kimage *image)
> >  {
> > +   if (nx_enabled)
> > +   set_pages_x(image->control_code_page, 1);
> > return 0;
> >  }
> >  
> > @@ -98,16 +101,24 @@ int machine_kexec_prepare(struct kimage 
> >   */
> >  void machine_kexec_cleanup(struct kimage *image)
> >  {
> > +   if (nx_enabled)
> > +   set_pages_nx(image->control_code_page, 1);
> >  }
> >  
> >  /*
> >   * Do not allocate memory (or fail in any way) in machine_kexec().
> >   * We are past the point of no return, committed to rebooting now.
> >   */
> > -NORET_TYPE void machine_kexec(struct kimage *image)
> > +void machine_kexec(struct kimage *image)
> >  {
> > unsigned long page_list[PAGES_NR];
> > void *control_page;
> > +   asmlinkage unsigned long
> > +   (*relocate_kernel_ptr)(unsigned long indirection_page,
> > +  unsigned long control_page,
> > +  unsigned long start_address,
> > +  unsigned int has_pae,
> > +  unsigned int preserve_context);
> >  
> > tracer_disable();
> >  
> > @@ -115,10 +126,11 @@ NORET_TYPE void machine_kexec(struct kim
> > local_irq_disable();
> >  
> > control_page = page_address(image->control_code_page);
> > -   memcpy(control_page, relocate_kernel, PAGE_SIZE);
> > +   memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
> >  
> 
> Is it possible to add either a compile time or run time check
> somewhere to make sure code in relocate_kernel.S does not exceed
> PAGE_SIZE/2.

OK, I will add it.

> [..]
> > --- a/kernel/kexec.c
> > +++ b/kernel/kexec.c
> > @@ -24,6 +24,8 @@
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> > +#include 
> >  
> >  #include 
> >  #include 
> > @@ -242,6 +244,12 @@ static int kimage_normal_alloc(struct ki
> > goto out;
> > }
> >  
> > +   image->swap_page = kimage_alloc_control_pages(image, 0);
> > +   if (!image->swap_page) {
> > +   printk(KERN_ERR "Could not allocate swap buffer\n");
> > +   goto out;
> > +   }
> > +
> > result = 0;
> >   out:
> > if (result == 0)
> > @@ -986,6 +994,8 @@ asmlinkage long sys_kexec_load(unsigned 
> > if (result)
> > goto out;
> >  
> > +   if (flags & KEXEC_PRESERVE_CONTEXT)
> > +   image->preserve_context = 1;
> > result = machine_kexec_prepare(image);
> > if (result)
> > goto out;
> > @@ -1411,3 +1421,50 @@ static int __init crash_save_vmcoreinfo_
> >  }
> >  
> >  module_init(crash_save_vmcoreinfo_init)
> > +
> > +/**
> > + * kernel_kexec - reboot the system
> > + *
> > + * Move into place and start executing a preloaded standalone
> > + * executable.  If nothing was preloaded return an error.
> > + */
> > +int kernel_kexec(void)
> > +{
> > +   int error = 0;
> > +
> > +   if (xchg(&kexec_lock, 1))
> > +   return -EBUSY;
> > +  

Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump

2008-07-08 Thread Huang Ying
Hi, Pavel,

On Mon, 2008-07-07 at 20:50 +0800, Pavel Machek wrote:
> Hi!
> 
> The patch looks mostly ok to me. (Perhaps there's time to split it
> into smaller chunks?)
> 
> You can add Acked-by: Pavel Machek <[EMAIL PROTECTED]> to it, I guess.

Thank you very much!

[...]
> > @@ -98,16 +101,24 @@ int machine_kexec_prepare(struct kimage
> >   */
> >  void machine_kexec_cleanup(struct kimage *image)
> >  {
> > + if (nx_enabled)
> > + set_pages_nx(image->control_code_page, 1);
> >  }
> 
> , 0 ? (setup and cleanup were same, which is strange).

Oh, Yes. That should be 0, I will change it.
> 
> > @@ -1411,3 +1421,50 @@ static int __init crash_save_vmcoreinfo_
> >  }
> > 
> >  module_init(crash_save_vmcoreinfo_init)
> > +
> > +/**
> > + *   kernel_kexec - reboot the system

> Really?

I will change the comments to reflect the changes to kernel_kexec.

> > + *   Move into place and start executing a preloaded standalone
> > + *   executable.  If nothing was preloaded return an error.
> > + */
> > +int kernel_kexec(void)
> > +{
> > + int error = 0;
> > +
> > + if (xchg(&kexec_lock, 1))
> > + return -EBUSY;
> 
> That's quite a strange way to provide a lock. mutex_trylock?

I think this is because kexec_lock is also used by crash_kexec(), which
may be called in some extreme environments, such as during panic().
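
The xchg() idiom amounts to a try-lock that never sleeps. A minimal user-space sketch of the same idea with C11 atomics, where atomic_exchange() stands in for the kernel's xchg(); the names demo_lock, demo_trylock and demo_unlock are hypothetical and not part of the patch.

```c
#include <stdatomic.h>

static atomic_int demo_lock;   /* 0 = free, 1 = held */

/* Returns 1 if the lock was taken, 0 if someone else already holds it. */
static int demo_trylock(void)
{
    /* atomic_exchange returns the previous value; 0 means we got the lock. */
    return atomic_exchange(&demo_lock, 1) == 0;
}

static void demo_unlock(void)
{
    atomic_store(&demo_lock, 0);
}
```

A caller that fails demo_trylock() simply backs off, which corresponds to returning -EBUSY in kernel_kexec().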

> > + if (!kexec_image) {
> > + error = -EINVAL;
> > + goto Unlock;
> > + }
> > +
> > + if (kexec_image->preserve_context) {
> > +#ifdef CONFIG_KEXEC_JUMP
> > + local_irq_disable();
> > + save_processor_state();
> 
> #else
> BUG()
> 
> ...because otherwise you silently do nothing?
> 
> > +#endif
> 
> Pavel

If CONFIG_KEXEC_JUMP is not defined, kexec_image->preserve_context will
always be 0, so the current code is safe. Here, #ifdef is used to resolve
the dependency issue. For example, save_processor_state() may be
undefined if CONFIG_KEXEC_JUMP is not defined.
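
A minimal sketch of this pattern, assuming the flag is rejected at load time when the feature is not built in (the mail does not say how preserve_context is kept at 0, so that part is an assumption). DEMO_CONFIG_KEXEC_JUMP, DEMO_PRESERVE_CONTEXT, struct demo_image, demo_load and demo_execute are hypothetical names.

```c
/* Uncomment to model a kernel built with the feature enabled. */
/* #define DEMO_CONFIG_KEXEC_JUMP 1 */

#define DEMO_PRESERVE_CONTEXT 0x2UL   /* hypothetical flag value */

struct demo_image {
    unsigned int preserve_context : 1;
};

/* Load path: the flag is only accepted when the feature is compiled in. */
static int demo_load(struct demo_image *image, unsigned long flags)
{
    image->preserve_context = 0;
#ifdef DEMO_CONFIG_KEXEC_JUMP
    if (flags & DEMO_PRESERVE_CONTEXT)
        image->preserve_context = 1;
#else
    if (flags & DEMO_PRESERVE_CONTEXT)
        return -1;   /* feature not built in: reject the request */
#endif
    return 0;
}

/* Execute path: returns 1 if state would be saved, 0 for a plain kexec. */
static int demo_execute(const struct demo_image *image)
{
    if (image->preserve_context) {
#ifdef DEMO_CONFIG_KEXEC_JUMP
        return 1;    /* save_processor_state() would be called here */
#endif
    }
    return 0;
}
```

With the feature compiled out, the branch guarded by preserve_context can never be taken, so the empty #ifdef body is unreachable rather than a silent no-op.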

Best Regards,
Huang Ying





[PATCH -mm 1/2] kexec jump -v12: kexec jump

2008-07-06 Thread Huang Ying
nel.


Now, only the i386 architecture is supported. The patchset is based on
Linux kernel 2.6.26-rc8-mm1, and has been tested on IBM T42.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/powerpc/kernel/machine_kexec.c  |2 
 arch/sh/kernel/machine_kexec.c   |2 
 arch/x86/Kconfig |7 +
 arch/x86/kernel/machine_kexec_32.c   |   27 -
 arch/x86/kernel/machine_kexec_64.c   |2 
 arch/x86/kernel/relocate_kernel_32.S |  174 ++-
 include/asm-x86/kexec.h  |   18 ++-
 include/linux/kexec.h|   17 ++-
 kernel/kexec.c   |   57 +++
 kernel/sys.c |   31 +-
 10 files changed, 269 insertions(+), 68 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -22,6 +22,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -85,10 +86,12 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Make control page executable.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+   if (nx_enabled)
+   set_pages_x(image->control_code_page, 1);
return 0;
 }
 
@@ -98,16 +101,24 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+   if (nx_enabled)
+   set_pages_nx(image->control_code_page, 1);
 }
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
unsigned long page_list[PAGES_NR];
void *control_page;
+   asmlinkage unsigned long
+   (*relocate_kernel_ptr)(unsigned long indirection_page,
+  unsigned long control_page,
+  unsigned long start_address,
+  unsigned int has_pae,
+  unsigned int preserve_context);
 
tracer_disable();
 
@@ -115,10 +126,11 @@ NORET_TYPE void machine_kexec(struct kim
local_irq_disable();
 
control_page = page_address(image->control_code_page);
-   memcpy(control_page, relocate_kernel, PAGE_SIZE);
+   memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
 
+   relocate_kernel_ptr = control_page;
page_list[PA_CONTROL_PAGE] = __pa(control_page);
-   page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+   page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
page_list[PA_PGD] = __pa(kexec_pgd);
page_list[VA_PGD] = (unsigned long)kexec_pgd;
 #ifdef CONFIG_X86_PAE
@@ -131,6 +143,7 @@ NORET_TYPE void machine_kexec(struct kim
page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
page_list[PA_PTE_1] = __pa(kexec_pte1);
page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+   page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
 
/* The segment registers are funny things, they have both a
 * visible and an invisible part.  Whenever the visible part is
@@ -149,8 +162,10 @@ NORET_TYPE void machine_kexec(struct kim
set_idt(phys_to_virt(0),0);
 
/* now call it */
-   relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
-   image->start, cpu_has_pae);
+   image->start = relocate_kernel_ptr((unsigned long)image->head,
+  (unsigned long)page_list,
+  image->start, cpu_has_pae,
+  image->preserve_context);
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -83,6 +83,7 @@ struct kimage {
 
unsigned long start;
struct page *control_code_page;
+   struct page *swap_page;
 
unsigned long nr_segments;
struct kexec_segment segment[KEXEC_SEGMENT_MAX];
@@ -98,18 +99,20 @@ struct kimage {
unsigned int type : 1;
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH   1
+   unsigned int preserve_context : 1;
 };
 


 /* kexec interface functions */
-extern NORET_TYPE void machine_kexec(struct kimage *image) ATTRIB_NORET;
+extern void machine_kexec(struct kimage *image);
 extern int machine_kexec_prepare(struct kimage *image);
 extern void machine_kexec_cleanup(struct kimage *image);
 extern asmlinkage long sys_kexec_load(unsigned long entry,
unsigned long nr_segments,
struct kexec_segment __user *segments,
unsigned long flags);

[PATCH -mm 2/2] kexec jump -v12: save/restore device state

2008-07-06 Thread Huang Ying
iver callback.

v10:

- Split from original kexec_jump patch.


Now, only the i386 architecture is supported. The patchset is based on
Linux kernel 2.6.26-rc8-mm1, and has been tested on IBM T42 with ACPI
on and off.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/Kconfig   |5 ++--
 arch/x86/kernel/machine_kexec_32.c |   12 +++
 include/linux/suspend.h|2 +
 kernel/kexec.c |   39 +
 kernel/power/power.h   |2 -
 5 files changed, 56 insertions(+), 4 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -26,6 +26,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1441,7 +1445,31 @@ int kernel_kexec(void)
 
if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+   mutex_lock(&pm_mutex);
+   pm_prepare_console();
+   error = freeze_processes();
+   if (error) {
+   error = -EBUSY;
+   goto Restore_console;
+   }
+   suspend_console();
+   error = device_suspend(PMSG_FREEZE);
+   if (error)
+   goto Resume_console;
+   error = disable_nonboot_cpus();
+   if (error)
+   goto Resume_devices;
local_irq_disable();
+   /* At this point, device_suspend() has been called,
+* but *not* device_power_down(). We *must*
+* device_power_down() now.  Otherwise, drivers for
+* some devices (e.g. interrupt controllers) become
+* desynchronized with the actual state of the
+* hardware at resume time, and evil weirdness ensues.
+*/
+   error = device_power_down(PMSG_FREEZE);
+   if (error)
+   goto Enable_irqs;
save_processor_state();
 #endif
} else {
@@ -1459,7 +1487,18 @@ int kernel_kexec(void)
if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
restore_processor_state();
+   device_power_up(PMSG_RESTORE);
+ Enable_irqs:
local_irq_enable();
+   enable_nonboot_cpus();
+ Resume_devices:
+   device_resume(PMSG_RESTORE);
+ Resume_console:
+   resume_console();
+   thaw_processes();
+ Restore_console:
+   pm_restore_console();
+   mutex_unlock(&pm_mutex);
 #endif
}
 
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo
 
 extern int pfn_is_nosave(unsigned long);
 
-extern struct mutex pm_mutex;
-
 #define power_attr(_name) \
 static struct kobj_attribute _name##_attr = {  \
.attr   = { \
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -278,4 +278,6 @@ static inline void register_nosave_regio
 }
 #endif
 
+extern struct mutex pm_mutex;
+
 #endif /* _LINUX_SUSPEND_H */
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -125,6 +125,18 @@ void machine_kexec(struct kimage *image)
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
 
+   if (image->preserve_context) {
+#ifdef CONFIG_X86_IO_APIC
+   /* We need to put APICs in legacy mode so that we can
+* get timer interrupts in second kernel. kexec/kdump
+* paths already have calls to disable_IO_APIC() in
+* one form or other. kexec jump path also need
+* one.
+*/
+   disable_IO_APIC();
+#endif
+   }
+
control_page = page_address(image->control_code_page);
memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
 
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1276,9 +1276,10 @@ config CRASH_DUMP
 config KEXEC_JUMP
bool "kexec jump (EXPERIMENTAL)"
depends on EXPERIMENTAL
-   depends on KEXEC && PM_SLEEP && X86_32
+   depends on KEXEC && HIBERNATION && X86_32
help
- Invoke code in physical address mode via KEXEC
+ Jump between original kernel and kexeced kernel and invoke
+ code in physical address mode via KEXEC
 
 config PHYSICAL_START
hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)




Re: [PATCH -mm 1/2] kexec jump -v11: kexec jump

2008-06-15 Thread Huang, Ying
On Fri, 2008-06-13 at 14:00 -0400, Vivek Goyal wrote:
[...]
> Ok, I found that in my config CONFIG_HIBERNATION was not enabled. After
> enabling CONFIG_HIBERNATION, both suspend to disk and kjump started
> working.
> 
> Does that mean there is some dependency on code under CONFIG_HIBERNATION.
> If yes, I think this dependency should be resolved during compile time.
> May be addtional config option (CONFIG_KEXEC_JUMP), which also selects
> the CONFIG_HIBERNATION automatically etc...

Yes. kexec jump needs to put devices into a quiescent state and save
device state into memory, which is implemented by calling the hibernation
function device_suspend(PMSG_FREEZE), whose implementation depends on
CONFIG_HIBERNATION.

So, I will add CONFIG_KEXEC_JUMP and have it select CONFIG_HIBERNATION
automatically.

Best Regards,
Huang Ying




Re: [linux-pm] [PATCH -mm 2/2] kexec jump -v11: save/restore device state

2008-06-15 Thread Huang, Ying
Hi, Vivek,

On Fri, 2008-06-13 at 14:05 -0400, Vivek Goyal wrote:
[...]
> 
> Can't we implement ACPI S5 state as an option in current hibernation
> framework? Or kexec jump is a requirement for that? 

ACPI S5 has been implemented in the current hibernation framework. If you
do:

echo shutdown > /sys/power/disk

ACPI S5 will be used instead of S4. That is, the corresponding ACPI method
is not executed.

Best Regards,
Huang Ying




Re: [PATCH -mm 2/2] kexec jump -v11: save/restore device state

2008-06-12 Thread Huang, Ying
On Thu, 2008-06-12 at 09:02 -0400, Vivek Goyal wrote:
[...]
> Few things I don't understand.
> 
> - Are you saying that hibernated image will be saved in initrd
>   (rootfs.gz)? But that saving is only in RAM, we never write back
>   it to disk?

No. The hibernated image should be saved in a dedicated raw partition, as
you say below.

> - I thought we probably have to dedicate a raw partition kind of thing
>   for saving image and then modify boot loader command line to something
>   similar to, "resume=partition". Then initrd can go hunting for image
>   in respective partition (as specified by command line parameter) and if
>   image is not available then continue with normal boot.

Yes. But the boot-loader command line only needs to be changed during
system install or hibernation setup. We need not change the location of
the "hibernation partition" frequently.

So I think one boot-loader command line is sufficient for:

- normal boot
- normal boot of a system that will be hibernated
- booting the helper system that restores the hibernated system
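
A minimal sketch of how an initrd helper might pick the hibernation partition out of that single command line. The function demo_cmdline_value is hypothetical, and the device names in the usage note are illustrative only; a real helper would read /proc/cmdline and then probe the partition for a valid image before deciding between restore and normal boot.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy the value of "key=" from a space-separated command line into out.
 * Returns 0 on success, -1 if the key is absent or the value does not fit.
 */
static int demo_cmdline_value(const char *cmdline, const char *key,
                              char *out, size_t out_size)
{
    size_t key_len = strlen(key);
    const char *p = cmdline;

    while (*p) {
        while (*p == ' ')
            p++;
        size_t tok_len = strcspn(p, " ");
        if (tok_len > key_len && strncmp(p, key, key_len) == 0 &&
            p[key_len] == '=') {
            size_t val_len = tok_len - key_len - 1;
            if (val_len + 1 > out_size)
                return -1;
            memcpy(out, p + key_len + 1, val_len);
            out[val_len] = '\0';
            return 0;
        }
        p += tok_len;
    }
    return -1;
}
```

For example, calling it with the command line "root=/dev/sda1 resume=/dev/sda2 quiet" and the key "resume" yields "/dev/sda2"; a return value of -1 means no hibernation partition was configured and normal boot should continue.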

Best Regards,
Huang Ying




Re: [PATCH -mm 1/2] kexec jump -v11: kexec jump

2008-06-12 Thread Huang, Ying
On Thu, 2008-06-12 at 15:20 -0400, Vivek Goyal wrote:
> On Tue, Jun 10, 2008 at 03:15:04PM +0800, Huang, Ying wrote:
> > This patch provides an enhancement to kexec/kdump. It implements
> > the following features:
> > 
> > - Backup/restore memory used by the original kernel before/after
> >   kexec.
> > 
> > - Save/restore CPU state before/after kexec.
> > 
> > The features of this patch can be used as a general method to call
> > program in physical mode (paging turning off). This can be used to
> > call BIOS code under Linux.
> > 
> > 
> 
> Hi Huang,
> 
> I was testing these patches and I get following error on my machine.
> 
> Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> Suspending console(s) (use no_console_suspend to debug)
> PM: Device i8042 failed to freeze: error -22
> Restarting tasks ... done.
> 
> Any idea why keyboard controller would not freeze?

Which kernel version are you using?

Does the original hibernation work? It can be set up easily. Just add the
following parameter to the kernel command line:

resume=/dev/

Where  is the swap partition.

And execute the following commands:

echo reboot > /sys/power/disk
echo disk > /sys/power/state

Best Regards,
Huang Ying




Re: [PATCH -mm 2/2] kexec jump -v11: save/restore device state

2008-06-11 Thread Huang, Ying
On Wed, 2008-06-11 at 12:30 -0400, Vivek Goyal wrote:
[...]
> > Usage example of simple hibernation:
> > 
> > 1. Compile and install patched kernel with following options selected:
> > 
> > CONFIG_X86_32=y
> > CONFIG_RELOCATABLE=y
> > CONFIG_KEXEC=y
> > CONFIG_CRASH_DUMP=y
> > CONFIG_PM=y
> > 
> > 2. Build an initramfs image contains kexec-tool and makedumpfile, or
> >download the pre-built initramfs image, called rootfs.gz in
> >following text.
> > 
> > 3. Prepare a partition to save memory image of original kernel, called
> >hibernating partition in following text.
> > 
> > 4. Boot kernel compiled in step 1 (kernel A).
> > 
> > 5. In the kernel A, load kernel compiled in step 1 (kernel B) with
> >/sbin/kexec. The shell command line can be as follow:
> > 
> >/sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x10
> >  --mem-max=0xff --initrd=rootfs.gz
> > 
> > 6. Boot the kernel B with following shell command line:
> > 
> >/sbin/kexec -e
> > 
> > 7. The kernel B will boot as normal kexec. In kernel B the memory
> >image of kernel A can be saved into hibernating partition as
> >follow:
> > 
> >jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep 
> > kexec_jump_back_entry | cut -d '='`
> >echo $jump_back_entry > kexec_jump_back_entry
> >cp /proc/vmcore dump.elf
> > 
> >Then you can shutdown the machine as normal.
> > 
> > 8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
> >root file system.
> > 
> 
> One of the concerns raised by hibernation people in the past was to use
> single boot loader entry to boot normally as well while resuming a kernel.
> 
> So in this case a user either needs to maintain two boot-loader entries
> or modify it on the fly. I wished there was a better way to handle that.

Two boot-loader entries are no longer needed; one is enough. Step 4 and
step 8 can share the same boot-loader entry. In deployment, rootfs.gz can
be the normal initramfs or initrd. In rootfs.gz, if there is a valid
hibernation image, the hibernated system is restored; otherwise, the
normal boot process follows.


> I am more interested in ability to have multiple kernel loaded in RAM
> and capability to switch between them. Allows me to take non-disruptive
> core dumps and somebody wanted to snapshots the kernels. That should
> still work.
> 
> 
> [..]
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -125,6 +125,12 @@ void machine_kexec(struct kimage *image)
> > /* Interrupts aren't acceptable while we reboot */
> > local_irq_disable();
> >  
> > +   if (image->preserve_context) {
> > +#ifdef CONFIG_X86_IO_APIC
> > +   disable_IO_APIC();
> > +#endif
> 
> I think it would be a good idea to put some kind of comment here. We
> need to put APICs in legacy mode so that we can get timer interrupts
> in second kernel. kexec/kdump paths already have calls to
> disable_IO_APIC() in one form or other. kexec jump path also needs one.

OK. I will add it.

Best Regards,
Huang Ying




Re: [linux-pm] [PATCH -mm 2/2] kexec jump -v11: save/restore device state

2008-06-11 Thread Huang, Ying
On Wed, 2008-06-11 at 04:21 -0400, Len Brown wrote:
> On Wed, 11 Jun 2008, Huang, Ying wrote:
> > On Tue, 2008-06-10 at 14:01 -0400, Len Brown wrote:
> > > 
> > > On Tue, 10 Jun 2008, Huang, Ying wrote:
> > > 
> > > > This patch implements devices state save/restore before after kexec.
> > > > 
> > > > 
> > > > This patch together with features in kexec_jump patch can be used for
> > > > following:
> > > > 
> > > > - A simple hibernation implementation without ACPI support. You can
> > > >   kexec a hibernating kernel, save the memory image of original system
> > > >   and shutdown the system. When resuming, you restore the memory image
> > > >   of original system via ordinary kexec load then jump back.
> > > 
> > > What part of ACPI's role in hibernation are you trying to avoid
> > > 1. enabling wake devices
> > > 2. removing power from the system
> > > 3. something else?
> > 
> > ACPI S5 is used instead of S4 for this simple hibernation
> > implementation. That is, before creating the hibernation image, the ACPI
> > _PTS is not executed, devices are not put into low power state and wake
> > devices are not enabled. After creating the hibernation image, the image
> > is saved to disk and system is shutdown (go to S5). When resuming from
> > hibernated image, ACPI _BFS and _WAK are not executed too.
> 
> Doesn't that resume the devices and their drivers into an unknown state?

device_suspend(PMSG_FREEZE) and device_power_down(PMSG_FREEZE) are
called before hibernation; device_power_up(PMSG_RESTORE) and
device_resume(PMSG_RESTORE) are called after restore. The new
hibernation/restore device driver callbacks introduced by Rafael are
used. So I think the device/driver state will be saved/restored
properly.
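
The ordering described above can be sketched as a small stand-alone trace: teardown steps run in one order and are undone in the reverse order, with a goto-based unwind on failure, as in the kexec_jump() patch. The enum values, demo_trace and demo_jump are hypothetical stand-ins, and the real device_*() calls are replaced by a recorder.

```c
#include <stddef.h>

enum demo_step { DEMO_SUSPEND, DEMO_POWER_DOWN, DEMO_JUMP,
                 DEMO_POWER_UP, DEMO_RESUME };

static enum demo_step demo_trace[8];
static size_t demo_trace_len;

static void demo_record(enum demo_step step)
{
    if (demo_trace_len < sizeof(demo_trace) / sizeof(demo_trace[0]))
        demo_trace[demo_trace_len++] = step;
}

/* Returns 0 on success, -1 if the (simulated) power-down step fails. */
static int demo_jump(int fail_power_down)
{
    int error = 0;

    demo_trace_len = 0;
    demo_record(DEMO_SUSPEND);       /* device_suspend(PMSG_FREEZE) */
    demo_record(DEMO_POWER_DOWN);    /* device_power_down(PMSG_FREEZE) attempted */
    if (fail_power_down) {
        error = -1;
        goto resume_devices;         /* skip the jump and the power-up */
    }
    demo_record(DEMO_JUMP);          /* jump to the kexeced image and back */
    demo_record(DEMO_POWER_UP);      /* device_power_up(PMSG_RESTORE) */
resume_devices:
    demo_record(DEMO_RESUME);        /* device_resume(PMSG_RESTORE) */
    return error;
}
```

On the failure path only the steps that completed are undone, which is why the label sits between the power-up and resume calls.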

It is planned to support ACPI S4 in the future. But I think a hibernation
scheme without ACPI may be useful in some situations too.

Best Regards,
Huang Ying




Re: [linux-pm] [PATCH -mm 2/2] kexec jump -v11: save/restore device state

2008-06-10 Thread Huang, Ying
On Tue, 2008-06-10 at 14:01 -0400, Len Brown wrote:
> 
> On Tue, 10 Jun 2008, Huang, Ying wrote:
> 
> > This patch implements devices state save/restore before after kexec.
> > 
> > 
> > This patch together with features in kexec_jump patch can be used for
> > following:
> > 
> > - A simple hibernation implementation without ACPI support. You can
> >   kexec a hibernating kernel, save the memory image of original system
> >   and shutdown the system. When resuming, you restore the memory image
> >   of original system via ordinary kexec load then jump back.
> 
> What part of ACPI's role in hibernation are you trying to avoid
> 1. enabling wake devices
> 2. removing power from the system
> 3. something else?

ACPI S5 is used instead of S4 for this simple hibernation
implementation. That is, before creating the hibernation image, the ACPI
_PTS is not executed, devices are not put into a low power state and wake
devices are not enabled. After creating the hibernation image, the image
is saved to disk and the system is shut down (goes to S5). When resuming
from the hibernated image, ACPI _BFS and _WAK are not executed either.

Best Regards,
Huang Ying




[PATCH -mm 2/2] kexec jump -v11: save/restore device state

2008-06-10 Thread Huang, Ying
rnel 2.6.26-rc5-mm1, and has been tested on IBM T42 with ACPI
on and off.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |6 +
 include/linux/suspend.h|2 +
 kernel/kexec.c |   43 -
 kernel/power/power.h   |2 -
 4 files changed, 50 insertions(+), 3 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -25,6 +25,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1427,8 +1431,34 @@ module_init(crash_save_vmcoreinfo_init)
 
 int kexec_jump(struct kimage *image)
 {
+   int error = 0;
+
+   mutex_lock(&pm_mutex);
if (image->preserve_context) {
+   pm_prepare_console();
+   error = freeze_processes();
+   if (error) {
+   error = -EBUSY;
+   goto Exit;
+   }
+   suspend_console();
+   error = device_suspend(PMSG_FREEZE);
+   if (error)
+   goto Resume_console;
+   error = disable_nonboot_cpus();
+   if (error)
+   goto Resume_devices;
local_irq_disable();
+   /* At this point, device_suspend() has been called,
+* but *not* device_power_down(). We *must*
+* device_power_down() now.  Otherwise, drivers for
+* some devices (e.g. interrupt controllers) become
+* desynchronized with the actual state of the
+* hardware at resume time, and evil weirdness ensues.
+*/
+   error = device_power_down(PMSG_FREEZE);
+   if (error)
+   goto Enable_irqs;
save_processor_state();
}
 
@@ -1436,7 +1466,18 @@ int kexec_jump(struct kimage *image)
 
if (image->preserve_context) {
restore_processor_state();
+   device_power_up(PMSG_RESTORE);
+ Enable_irqs:
local_irq_enable();
+   enable_nonboot_cpus();
+ Resume_devices:
+   device_resume(PMSG_RESTORE);
+ Resume_console:
+   resume_console();
+   thaw_processes();
+ Exit:
+   pm_restore_console();
}
-   return 0;
+   mutex_unlock(&pm_mutex);
+   return error;
 }
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo
 
 extern int pfn_is_nosave(unsigned long);
 
-extern struct mutex pm_mutex;
-
 #define power_attr(_name) \
 static struct kobj_attribute _name##_attr = {  \
.attr   = { \
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -266,4 +266,6 @@ static inline void register_nosave_regio
 }
 #endif
 
+extern struct mutex pm_mutex;
+
 #endif /* _LINUX_SUSPEND_H */
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -125,6 +125,12 @@ void machine_kexec(struct kimage *image)
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
 
+   if (image->preserve_context) {
+#ifdef CONFIG_X86_IO_APIC
+   disable_IO_APIC();
+#endif
+   }
+
control_page = page_address(image->control_code_page);
memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH -mm 1/2] kexec jump -v11: kexec jump

2008-06-10 Thread Huang, Ying
This patch provides an enhancement to kexec/kdump. It implements
the following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call a
program in physical mode (paging turned off). This can be used to
call BIOS code under Linux.


kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be downloaded from the following URLs:

   source: 
http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
   patches: 
http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
   binary: 
http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10


Usage example of calling some physical mode code and return:

1. Compile and install patched kernel with following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y

2. Build patched kexec-tool or download the pre-built one.

3. Build some physical mode executable, named for example "phy_mode".

4. Boot kernel compiled in step 1.

5. Load the physical mode executable with /sbin/kexec. The shell command
   line can be as follows:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call physical mode executable with following shell command line:

   /sbin/kexec -e


Implementation point:

Jumping is supported without reserving memory. One shadow backup page
(source page) is allocated for each page used by the kexeced code image
(destination page). During kexec_load, the image of the kexeced code is
loaded into the source pages, and before executing, the destination pages
and the source pages are swapped, so the contents of the destination
pages are backed up. Before jumping to the kexeced code image and after
jumping back to the original kernel, the destination pages and the
source pages are swapped as well.
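
A minimal user-space sketch of the page swap described above, using one scratch page in the role that image->swap_page plays in the patch. DEMO_PAGE_SIZE and demo_swap_pages are hypothetical names; the real swap is done in assembly in relocate_kernel_32.S while walking the indirection page list.

```c
#include <string.h>

#define DEMO_PAGE_SIZE 4096

/*
 * Exchange the contents of a destination page and its source (shadow)
 * page through a scratch page. Calling it twice restores both pages.
 */
static void demo_swap_pages(unsigned char *dest, unsigned char *src,
                            unsigned char *scratch)
{
    memcpy(scratch, src, DEMO_PAGE_SIZE);   /* stash the new image     */
    memcpy(src, dest, DEMO_PAGE_SIZE);      /* back up original memory */
    memcpy(dest, scratch, DEMO_PAGE_SIZE);  /* install the new image   */
}
```

Because the operation is its own inverse, the same routine can serve both directions of the jump: the first call installs the kexeced image and backs up the original memory, and the second call puts everything back.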

C ABI (calling convention) is used as communication protocol between
kernel and called code.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.


ChangeLog:

v10:

- Device state save/restore related code is split into another patch
  because it depends on the device hibernation/restore callbacks and is
  prone to change.

- C ABI (calling convention) is used as communication protocol between
  kernel and called code.

- Code cleanup: CPU state save/restore code goes in relocate_kernel().

v9:

- pm_mutex is locked during kexec jump to avoid potential conflict
  between kexec jump and suspend/resume/hibernation.

- Split /dev/oldmem writing and kimagecore patch out, keep only the
  core function.

v8:

- Split kexec jump patchset from kexec based hibernation patchset.

- Merge various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT
  because there is no need for such subtle control.

- Delete variable argument based "kernel to kernel" communication
  mechanism from basic kexec jump patchset.

v7:

- Refactor kexec jump to be a command driven programming model.

- Use kexec_lock to do synchronization.

v6:

- Refactor kexec jump to be a general facility to call real mode code.

v5:

- A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel
  image is used for jumping back. The reboot command for jumping back
  is removed. This interface is more stable (proposed by Eric
  Biederman).

- NX bit handling support for kexec is added.

- Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute
  from machine_kexec.

- Passing jump back entry to kexeced kernel via kernel command line
  (parsed by user space tool via /proc/cmdline instead of
  kernel). Original corresponding boot parameter and sysfs code is
  removed.

v4:

- The two reboot commands are merged back into one because the
  underlying implementation is the same.

- Jumping without reserving memory is implemented. As a side effect,
  two direction jumping is implemented.

- A jump back protocol is defined and documented. The original kernel
  and kexeced kernel are more independent from each other.

- The CPU state save/restore code is merged into relocate_kernel.S.

v3:

- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two
  reboot commands to reflect the different functions.

- Documentation is added for the new kernel parameters.

- /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
  memory image restoring.

- Console restoring after jumping back is implemented.

v2:

- The kexec jump implementation is put into the kexec/kdump framework
  instead of software suspend framework. The device and CPU state
  save/restore code of software suspend is called when needed.

- The same code path is used for both kexec a new kernel and jump back
  to original kernel.


Now, only the i386 architecture is supported. The patchset is based on
Linux kernel 2.6.26-rc5-mm1, and has been tested on IBM T42.


Signe

Re: [PATCH -mm] kexec jump -v9

2008-05-27 Thread Huang, Ying
On Tue, 2008-05-27 at 18:15 -0400, Vivek Goyal wrote:
[...]
> > But, because IOAPIC may need to be in original state during
> > suspend/resume, so it is not appropriate to call disable_IO_APIC() in
> > ioapic_suspend(). So I think we can call disable_IO_APIC() in new
> > hibernation/restore callback.
> 
> My hunch is suspend/resume will still work if we put this call in
> ioapic_suspend() but I would not recommend that. suspend/resume does
> not need to put IOAPIC in legacy mode.
>   
> I am not sure what is "new hibernation/restore callback"? Are you
> referring to new patches from Rafel?

Yes. Rafael has a new patch to separate the suspend and hibernation
device callbacks.
http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation

> I think this issue is specifc to kexec and kjump so probably we should
> not tweaking any suspend/resume related bit.
> 
> How about calling disable_IO_APIC() in kexec_jump()? We can probably even
> optimize it by calling it only when we are transitioning into new image
> for the first time and not for subsquent transitions (by keeping some kind of
> count in kimage). This is little hackish but, should work...

Yes. This issue is kexec/kjump specific. We can call it in kexec_jump().
Maybe we also need to call something else from native_machine_shutdown()?

BTW: I have a new version -v10: http://lkml.org/lkml/2008/5/22/106, do
you have time to review it?

Best Regards,
Huang Ying



Re: [PATCH -mm] kexec jump -v9

2008-05-27 Thread Huang, Ying
On Thu, 2008-05-15 at 20:51 -0400, Vivek Goyal wrote:
> On Thu, May 15, 2008 at 01:41:50PM +0800, Huang, Ying wrote:
> > Hi, Vivek,
> > 
> > On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
> > [...]
> > > Ok, I have done some testing on this patch. Currently I have just
> > > tested switching back and forth between two kernels and it is working for
> > > me.
> > > 
> > > Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. Few
> > > comments/questions are inline.
> > 
> > It seems that for LAPIC and IOAPIC, there is
> > lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
> > which will be called before/after kexec jump through
> > device_power_down()/device_power_up(). So, the mechanism for
> > LAPIC/IOAPIC is there, we may need to check the corresponding
> > implementation.
> > 
> 
> ioapic_suspend() is not putting APICs in Legacy mode and that's why
> we are seeing the issue. It only saves the IOAPIC routing table entries
> and these entries are restored during ioapic_resume().
> 
> But I think somebody has to put APICs in legacy mode for normal 
> hibernation also. Not sure who does it. May be BIOS, so that during
> resume, second kernel can get the timer interrupts.

As for IOAPIC legacy mode, is it related to the following code, which sets
the routing table entry for the i8259?


void disable_IO_APIC(void)
{
	/*
	 * Clear the IO-APIC before rebooting:
	 */
	clear_IO_APIC();

	/*
	 * If the i8259 is routed through an IOAPIC
	 * Put that IOAPIC in virtual wire mode
	 * so legacy interrupts can be delivered.
	 */
	if (ioapic_i8259.pin != -1) {
		struct IO_APIC_route_entry entry;

		memset(&entry, 0, sizeof(entry));
		entry.mask            = 0; /* Enabled */
		entry.trigger         = 0; /* Edge */
		entry.irr             = 0;
		entry.polarity        = 0; /* High */
		entry.delivery_status = 0;
		entry.dest_mode       = 0; /* Physical */
		entry.delivery_mode   = dest_ExtINT; /* ExtInt */
		entry.vector          = 0;
		entry.dest.physical.physical_dest =
			GET_APIC_ID(apic_read(APIC_ID));

		/*
		 * Add it to the IO-APIC irq-routing table:
		 */
		ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin,
				   entry);
	}
	disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}


But because the IOAPIC may need to stay in its original state during
suspend/resume, it is not appropriate to call disable_IO_APIC() in
ioapic_suspend(). So I think we can call disable_IO_APIC() in the new
hibernation/restore callback.

Am I right?

Best Regards,
Huang Ying




[PATCH 1/2] kexec jump -v10: kexec jump (resend, fix subject, please ignore previous one)

2008-05-22 Thread Huang, Ying
This patch provides an enhancement to kexec/kdump. It implements
the following features:

- Backup/restore memory used by the original kernel before/after
  kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call a
program in physical mode (with paging turned off). This can be used to
call BIOS code under Linux.


kexec-tools needs to be patched to support kexec jump. The patches and
the precompiled kexec can be downloaded from the following URLs:

   source: 
http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
   patches: 
http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
   binary: 
http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10


Usage example of calling some physical mode code and returning:

1. Compile and install the patched kernel with the following options selected:

CONFIG_X86_32=y
CONFIG_KEXEC=y

2. Build the patched kexec-tools or download the pre-built binary.

3. Build a physical mode executable, named for example "phy_mode".

4. Boot the kernel compiled in step 1.

5. Load the physical mode executable with /sbin/kexec. The shell command
   line can be as follows:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call the physical mode executable with the following shell command line:

   /sbin/kexec -e


Implementation point:

To support jumping without reserving memory, one shadow backup page
(source page) is allocated for each page used by the kexeced code image
(destination page). During kexec_load, the image of the kexeced code is
loaded into the source pages, and before execution the destination pages
and the source pages are swapped, so the contents of the destination
pages are backed up. Before jumping to the kexeced code image and after
jumping back to the original kernel, the destination pages and the
source pages are swapped too.
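
The swap described above can be sketched in plain C as follows. This is
only an illustration under simplifying assumptions: the pages are ordinary
user-space buffers, one scratch page is used as temporary storage, and the
names SKETCH_PAGE_SIZE, swap_one_page() and swap_pages_sketch() are made up
for this example. The patch itself does the swap in assembly with paging
turned off.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096

/* Exchange the contents of one destination page and its shadow source
 * page, using a third scratch page as temporary storage. */
static void swap_one_page(unsigned char *dest, unsigned char *src,
			  unsigned char *scratch)
{
	memcpy(scratch, dest, SKETCH_PAGE_SIZE);
	memcpy(dest, src, SKETCH_PAGE_SIZE);
	memcpy(src, scratch, SKETCH_PAGE_SIZE);
}

/* Swap every destination/source pair. Running this twice restores the
 * original contents, which is what makes jumping back possible. */
static void swap_pages_sketch(unsigned char **dest, unsigned char **src,
			      size_t npages, unsigned char *scratch)
{
	for (size_t i = 0; i < npages; i++)
		swap_one_page(dest[i], src[i], scratch);
}
```

After one call the destination pages hold the loaded image and the source
pages hold the backup of the original kernel's memory; a second call puts
everything back.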

The C ABI (calling convention) is used as the communication protocol
between the kernel and the called code.

A flag named KEXEC_PRESERVE_CONTEXT is added to sys_kexec_load to
indicate that the loaded kernel image is used for jumping back.


ChangeLog:

v10:

- Device state save/restore related code is split into another patch
  because it depends on the device hibernation/restore callbacks and is
  likely to change.

- C ABI (calling convention) is used as communication protocol between
  kernel and called code.

- Code cleanup: CPU state save/restore code goes in relocate_kernel().

v9:

- pm_mutex is locked during kexec jump to avoid potential conflict
  between kexec jump and suspend/resume/hibernation.

- Split /dev/oldmem writing and kimagecore patch out, keep only the
  core function.

v8:

- Split kexec jump patchset from kexec based hibernation patchset.

- Merge various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT
  because there is no need for such subtle control.

- Delete variable argument based "kernel to kernel" communication
  mechanism from basic kexec jump patchset.

v7:

- Refactor kexec jump to be a command driven programming model.

- Use kexec_lock to do synchronization.

v6:

- Refactor kexec jump to be a general facility to call real mode code.

v5:

- A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel
  image is used for jumping back. The reboot command for jumping back
  is removed. This interface is more stable (proposed by Eric
  Biederman).

- NX bit handling support for kexec is added.

- Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute
  from machine_kexec.

- Passing jump back entry to kexeced kernel via kernel command line
  (parsed by user space tool via /proc/cmdline instead of
  kernel). Original corresponding boot parameter and sysfs code is
  removed.

v4:

- The two reboot commands are merged back into one because the underlying
  implementation is the same.

- Jumping without reserving memory is implemented. As a side effect,
  two direction jumping is implemented.

- A jump back protocol is defined and documented. The original kernel
  and kexeced kernel are more independent from each other.

- The CPU state save/restore code is merged into relocate_kernel.S.

v3:

- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two
  reboot commands to reflect the different functions.

- Documentation is added for the added kernel parameters.

- /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
  memory image restoring.

- Console restoring after jumping back is implemented.

v2:

- The kexec jump implementation is put into the kexec/kdump framework
  instead of software suspend framework. The device and CPU state
  save/restore code of software suspend is called when needed.

- The same code path is used for both kexecing a new kernel and jumping
  back to the original kernel.


Currently, only the i386 architecture is supported. The patchset is based
on Linux kernel 2.6.26-rc3, and has been tested on an IBM T42.


Signed-off-by

[PATCH 2/2] kexec jump -v10: save/restore device state

2008-05-22 Thread Huang, Ying
d is limited,
  hibernation image with many segments may fail to load. This is
  planned to be eliminated by adding a new flag to sys_kexec_load so
  that an image can be loaded with multiple sys_kexec_load invocations.


ChangeLog:

v10:

- Split from original kexec_jump patch.


Currently, only the i386 architecture is supported. The patchset is based
on Linux kernel 2.6.26-rc3, and has been tested on an IBM T42 with ACPI on
and off.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 include/linux/suspend.h |2 ++
 kernel/kexec.c  |   43 ++-
 kernel/power/power.h|2 --
 3 files changed, 44 insertions(+), 3 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -25,6 +25,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -1427,8 +1431,34 @@ module_init(crash_save_vmcoreinfo_init)
 
 int kexec_jump(struct kimage *image)
 {
+	int error = 0;
+
+	mutex_lock(&pm_mutex);
 	if (image->preserve_context) {
+		pm_prepare_console();
+		error = freeze_processes();
+		if (error) {
+			error = -EBUSY;
+			goto Exit;
+		}
+		suspend_console();
+		error = device_suspend(PMSG_FREEZE);
+		if (error)
+			goto Resume_console;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Resume_devices;
 		local_irq_disable();
+		/* At this point, device_suspend() has been called,
+		 * but *not* device_power_down(). We *must*
+		 * device_power_down() now.  Otherwise, drivers for
+		 * some devices (e.g. interrupt controllers) become
+		 * desynchronized with the actual state of the
+		 * hardware at resume time, and evil weirdness ensues.
+		 */
+		error = device_power_down(PMSG_FREEZE);
+		if (error)
+			goto Enable_irqs;
 		save_processor_state();
 	}
 
@@ -1436,7 +1466,18 @@ int kexec_jump(struct kimage *image)
 
 	if (image->preserve_context) {
 		restore_processor_state();
+		device_power_up();
+ Enable_irqs:
 		local_irq_enable();
+		enable_nonboot_cpus();
+ Resume_devices:
+		device_resume();
+ Resume_console:
+		resume_console();
+		thaw_processes();
+ Exit:
+		pm_restore_console();
 	}
-	return 0;
+	mutex_unlock(&pm_mutex);
+	return error;
 }
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo
 
 extern int pfn_is_nosave(unsigned long);
 
-extern struct mutex pm_mutex;
-
 #define power_attr(_name) \
 static struct kobj_attribute _name##_attr = {  \
.attr   = { \
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -266,4 +266,6 @@ static inline void register_nosave_regio
 }
 #endif
 
+extern struct mutex pm_mutex;
+
 #endif /* _LINUX_SUSPEND_H */




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 19:55 -0700, Eric W. Biederman wrote:
> "Huang, Ying" <[EMAIL PROTECTED]> writes:
> 
> > The disadvantage of this solution is that kernel B must know it is
> > original kernel (A) or kexeced kernel (B). Different code should be used
> > by kernel A and kernel B. And after jump from A to B, jump from B to A,
> > when jump from A to B again, kernel A must use different code from the
> > first time.
> 
> I don't know what the case is for keeping two kernels in memory and switching
> between them.

This can be used to save the memory image of kernel B and speed up
hibernation. A real boot of kernel B is needed only the first time.

> I suspect a small piece of trampoline code between the two kernels could
> handle the case. (i.e. purgatory pays attention).
> 
> That is a fundamental aspect of the design.  A general purpose infrastructure
> with trampoline code to adapt it to whatever situation comes up.

It is possible to use purgatory to deal with this problem.

Jump from kernel A to kernel B
    Jump to the entry of purgatory (purgatory_entry)
    Purgatory saves the return address (kexec_jump_back_entry_A)
    Purgatory sets kexec_jump_back_entry for kernel B to a code
    segment in purgatory, say kexec_jump_back_entry_A_for_B
    Purgatory jumps to the entry point of kernel B
Jump from kernel B to kernel A
    Jump to purgatory (kexec_jump_back_entry_A_for_B)
    Purgatory saves the return address (kexec_jump_back_entry_B)
    Purgatory returns to kernel A (kexec_jump_back_entry_A)
Jump from kernel A to kernel B again
    Jump to the entry of purgatory (purgatory_entry)
    Purgatory saves the return address (kexec_jump_back_entry_A)
    Purgatory jumps to kexec_jump_back_entry_B

The disadvantage of this solution is that some information is saved in
purgatory (kexec_jump_back_entry_A, kexec_jump_back_entry_B). So
purgatory must be saved too when saving the memory image of kernel A or
kernel B. Purgatory can be seen as a part of kernel B, but it is a
little tricky to think of it as a part of kernel A too.
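
The bookkeeping purgatory would have to do in this scheme can be sketched
as a small state machine. This is a user-space model under stated
assumptions, not purgatory code: the struct and function names are
hypothetical, addresses are plain integers, and 0 is assumed never to be a
valid re-entry address so that it can mean "kernel B has not run yet".

```c
#include <stdint.h>

/* Hypothetical purgatory state; field names follow the labels used in
 * the sequence above. */
struct purgatory_state {
	uintptr_t jump_back_entry_A;	/* where to re-enter kernel A */
	uintptr_t jump_back_entry_B;	/* where to re-enter kernel B, 0 if B never ran */
	uintptr_t kernel_B_boot_entry;	/* initial entry point of kernel B */
};

/* Kernel A enters purgatory; ret_addr is A's re-entry address.
 * Returns the address purgatory should jump to in kernel B: the boot
 * entry on the first jump, the saved re-entry point afterwards. */
static uintptr_t purgatory_from_A(struct purgatory_state *p, uintptr_t ret_addr)
{
	p->jump_back_entry_A = ret_addr;
	return p->jump_back_entry_B ? p->jump_back_entry_B
				    : p->kernel_B_boot_entry;
}

/* Kernel B enters purgatory; ret_addr is B's re-entry address.
 * Returns the address purgatory should return to in kernel A. */
static uintptr_t purgatory_from_B(struct purgatory_state *p, uintptr_t ret_addr)
{
	p->jump_back_entry_B = ret_addr;
	return p->jump_back_entry_A;
}
```

The two saved addresses are exactly the state that would have to be
preserved along with a memory image, which is the disadvantage noted
above.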

Best Regards,
Huang Ying



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote:
> "Huang, Ying" <[EMAIL PROTECTED]> writes:
> 
> > On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
> > [...]
> >> 2) After we figure out our address read the stack pointer from
> >>a fixed location and simply set it.  (This is my preference)
> >
> > Just for confirmation (My English is poor).
> >
> > Do you mean that kernel A just read the stack top as re-entry point,
> > regardless of whether it is return address or argument 1?
> 
> What I was thinking was:
> 
> In kernel A()
> 
> relocate_new_kernel:
> 
> 	...
> 
> 	call	*%eax
> 
> kexec_jump_back_entry:
> 	/* This code should be PIC so figure out where we are */
> 	call	1f
> 1:
> 	popl	%edi
> 	subl	$(1b - relocate_kernel), %edi
> 
> 	/* Setup a safe stack */
> 	leal	PAGE_SIZE(%edi), %esp
> 	...
> 
> 
> Then in purgatory we can read the address of kexec_jump_back_entry
> by examining 0(%esp) and export it in whatever fashion is sane.
> 
> However we reach kexec_jump_back_entry we should be fine.

I think it is reasonable to enable jumping back and forth more than one
time. So the following should be possible:

1. Jump from A to B (actually jump to purgatory, trigger the boot of B)
2. Jump from B to A
3. Jump from A to B again (jump to the kexec_jump_back_entry of B)
4. Jump from B to A
...

So it should be possible to get the re-entry point of kernel B in
kexec_jump_back_entry of kernel A too. So I think that in
kexec_jump_back_entry, the caller's stack should be checked to get the
re-entry point of the peer. And the stack state differs depending on
where we come from: from relocate_new_kernel() or from a return.

Best Regards,
Huang Ying




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 22:00 -0400, Vivek Goyal wrote:
[...]
> IMHO, this kind of make more sense to me when keeping C function like
> semantics in mind.
> 
> Both the cases can be treated like calls to functions (calling BIOS function
> and jumping to kernel B). The basic difference between two cases is the
> re-entry point. In BIOS function case, we always re-enter the function at the
> start but in case of kernel B, except first entry, all other entries happen
> at a run time determined address, which needs to be communicated to kernel A.
> 
> I would think that second kernel B just should execute "ret" and new entry
> address of kernel B is passed to kernel A through %eax (return value of
> function).

The disadvantage of this solution is that kernel B must know whether it
is the original kernel (A) or the kexeced kernel (B). Different code must
be used by kernel A and kernel B. And after jumping from A to B and then
from B to A, when jumping from A to B again, kernel A must use different
code than the first time.

Best Regards,
Huang Ying




Re: [PATCH -mm] kexec jump -v9

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 21:51 -0400, Vivek Goyal wrote:
> On Fri, May 16, 2008 at 09:48:34AM +0800, Huang, Ying wrote:
> > On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote:
> > [...]
> > > Ok, You want to make BIOS calls. We already do that using vm86 mode and
> > > use bios real mode interrupts. So why do we need this interface? Or, IOW,
> > > how is this interface better?
> > 
> > It can call code in 32-bit physical mode in addition to real mode. So It
> > can be used to call EFI runtime service, especially call EFI 64 runtime
> > service under 32-bit kernel or vice versa.
> > 
> > The main purpose of kexec jump is for hibernation. But I think if the
> > effort is small, why not support general 32-bit physical mode code call
> > at same time.
> > 
> 
> In general what's the environment requirements for EFI runtime 
> services? I mean, just that processor should be in protected mode with
> paging disabled or one need to stop all other cpus and devices and then make
> the call (as we are doing in this case?). 

Putting the processor in protected mode with paging disabled is
sufficient. In one of the previous kexec jump versions, I provided options
to choose which state is saved (whether to stop the other CPUs, whether to
stop devices).
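
For reference, "protected mode with paging disabled" on x86 corresponds to
CR0.PE (bit 0) set and CR0.PG (bit 31) clear. The sketch below only models
the bit tests on a plain integer; the macro and helper names are made up
for illustration, and real code reads and writes %cr0 in assembly, as the
jump back path of the patch does with "orl $(1<<31), %eax".

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CR0_PE (UINT32_C(1) << 0)	/* protection enable */
#define SKETCH_CR0_PG (UINT32_C(1) << 31)	/* paging enable */

/* True if a CR0 value describes protected mode with paging off. */
static bool cr0_is_physical_mode(uint32_t cr0)
{
	return (cr0 & SKETCH_CR0_PE) != 0 && (cr0 & SKETCH_CR0_PG) == 0;
}

/* CR0 value with paging turned on, as done when re-entering the kernel. */
static uint32_t cr0_enable_paging(uint32_t cr0)
{
	return cr0 | SKETCH_CR0_PG;
}
```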

I agree that now we should focus on kexec based hibernation. But I think
it is reasonable to keep the possibility with minimal effort.

Best Regards,
Huang Ying




Re: [PATCH -mm] kexec jump -v9

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 18:35 -0700, Eric W. Biederman wrote:
> Vivek Goyal <[EMAIL PROTECTED]> writes:
> 
> > ioapic_suspend() is not putting APICs in Legacy mode and that's why
> > we are seeing the issue. It only saves the IOAPIC routing table entries
> > and these entries are restored during ioapic_resume().
> >
> > But I think somebody has to put APICs in legacy mode for normal 
> > hibernation also. Not sure who does it. May be BIOS, so that during
> > resume, second kernel can get the timer interrupts.
> 
> I doubt anything cares in the suspend to ram case. There should just
> be a small BIOS trampoline to get back to linux when the processor
> restarts.  And you don't need interrupts for any of that. 

As far as I know, in suspend to RAM an interrupt, such as a keyboard
interrupt, is used as the wake-up event.

Best Regards,
Huang Ying



Re: [PATCH -mm] kexec jump -v9

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote:
[...]
> Ok, You want to make BIOS calls. We already do that using vm86 mode and
> use bios real mode interrupts. So why do we need this interface? Or, IOW,
> how is this interface better?

It can call code in 32-bit physical mode in addition to real mode. So it
can be used to call EFI runtime services, especially to call EFI 64
runtime services under a 32-bit kernel or vice versa.

The main purpose of kexec jump is hibernation. But I think that if the
effort is small, why not support general 32-bit physical mode code calls
at the same time?

Best Regards,
Huang Ying




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-15 Thread Huang, Ying
On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote:
[...]
> 2) After we figure out our address read the stack pointer from
>a fixed location and simply set it.  (This is my preference)

Just for confirmation (My English is poor).

Do you mean that kernel A just reads the top of the stack as the re-entry
point, regardless of whether it is the return address or argument 1?

Best Regards,
Huang Ying




Re: [PATCH -mm] kexec jump -v9

2008-05-14 Thread Huang, Ying
Hi, Vivek,

On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
[...]
> Ok, I have done some testing on this patch. Currently I have just
> tested switching back and forth between two kernels and it is working for
> me.
> 
> Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. Few
> comments/questions are inline.

It seems that for LAPIC and IOAPIC, there are
lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
which will be called before/after kexec jump through
device_power_down()/device_power_up(). So the mechanism for
LAPIC/IOAPIC is there; we may need to check the corresponding
implementation.

Best Regards,
Huang Ying



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-14 Thread Huang, Ying
On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote:
[...]
> Then as a preliminary design let's plan on this.
> 
> - Pass the re-entry point as the return address (using the C ABI).
>   We may want to load the stack pointer etc so we can act as
>   a direct entry point for new code.

There are some issues with passing the entry point as the return address.
The kexec jump (or kexec with return) is used for

- Switching between the original kernel (A) and the kexeced kernel (B)
- Calling some code (such as BIOS code) in physical mode

1) When calling some code in physical mode, the called code can use a
simple return to return to kernel A. So there is no return address on
the stack after returning to kernel A. Instead, argument 1 is on top of
the stack.

2) When switching back from kernel B to kernel A, kernel B will call the
jump back entry of kernel A with the C ABI. So the return address is on
top of the stack, and kernel A gets the jump back entry of kernel B via
the return address.

Because the stack state is different between 1) and 2), the jump back
entry of kernel A should distinguish them. Possible solutions are as
follows:

a) Before kernel A calls some physical mode code or kernel B, it sets
argument 1 to a magic number that cannot be a return address (such as
-1). The jump back entry of kernel A can then check whether the top of
the stack is argument 1 or a return address.

b) Distinguish by return value. For example, called physical mode code
must return 0, while kernel B must set %eax to some other number.

c) Use different entry points for 1) and 2). The two entry points are
derived from the return address. For example:

entry1 = return_address;
entry2 = return_address & ~0xfff;   /* page aligned */

entry1 is used by physical mode code. entry2 is used by kernel B.
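
Options a) and c) can be sketched in C as follows. This is only an
illustration: the helper names and SKETCH_ARG_MAGIC are hypothetical, the
4 KiB page size is taken from the 0xfff mask above, and the sketch assumes
the jump back code is laid out so that the page-aligned address is a valid
second entry point.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_MASK ((uintptr_t)0xfff)	/* 4 KiB pages assumed */
#define SKETCH_ARG_MAGIC ((uintptr_t)-1)	/* cannot be a return address */

/* Option c): entry used by physical mode code is the return address. */
static uintptr_t entry_for_physical_code(uintptr_t return_address)
{
	return return_address;
}

/* Option c): entry used by kernel B is the start of the same page. */
static uintptr_t entry_for_kernel_B(uintptr_t return_address)
{
	return return_address & ~SKETCH_PAGE_MASK;
}

/* Option a): decide what the value on top of the stack is. */
static bool stack_top_is_return_address(uintptr_t stack_top)
{
	return stack_top != SKETCH_ARG_MAGIC;
}
```

With a return address of 0x00101234, entry1 stays 0x00101234 and entry2
becomes 0x00101000.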


Which one is better? Or some other solution?

Best Regards,
Huang Ying




Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-14 Thread Huang, Ying
On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote:
> "Huang, Ying" <[EMAIL PROTECTED]> writes:
> 
> >> So, IMHO, for first simple implementation, we don't have to pass around
> >> any data between kernels except entry point. (Please correct me if I am 
> >> wrong). Lets get that implementation in first and then we can get rest
> >> of the pieces in place.
> >
> > Yes. Kernel entry/re-entry point is the only information need to be
> > communicated between kernels for just switching between them. So we can
> > focus on kexec jump patch firstly.
> 
> Then as a preliminary design let's plan on this.
> 
> - Pass the re-entry point as the return address (using the C ABI).
>   We may want to load the stack pointer etc so we can act as
>   a direct entry point for new code.

OK, I will try to do this.

> - Look at passing a pointer to the mapping of pages that the kexec
>   trampoline uses in arg1 of the C ABI.  Largely the format is defacto
>   fixed anyway because we need to pass the structure from C to
>   assembly.

You mean passing image->head to the purgatory of /sbin/kexec using arg1
of the C ABI?

> Using the standard C ABI makes it much easier to pick
> a calling convention, and to document it.

Yes.

Best Regards,
Huang Ying




Re: [PATCH -mm] kexec jump -v9

2008-05-14 Thread Huang, Ying
On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
[...]
> Ok, I have done some testing on this patch. Currently I have just
> tested switching back and forth between two kernels and it is working for
> me.

Thanks.

[...]
> > +/*
> > + * Entry point for jumping back from kexeced kernel, the paging is
> > + * turned off.
> > + */
> > +kexec_jump_back_entry:
> > +	call	1f
> > +1:
> > +	popl	%ebx
> > +	subl	$(1b - kexec_relocate_page), %ebx
> > +	movl	%edi, KJUMP_ENTRY_OFF(%ebx)
> > +	movl	CP_VA_CONTROL_PAGE(%ebx), %edi
> > +	lea	STACK_TOP(%ebx), %esp
> > +	movl	CP_PA_SWAP_PAGE(%ebx), %eax
> > +	movl	CP_PA_BACKUP_PAGES_MAP(%ebx), %edx
> > +	pushl	%eax
> > +	pushl	%edx
> > +	call	swap_pages
> > +	addl	$8, %esp
> > +	movl	CP_PA_PGD(%ebx), %eax
> > +	movl	%eax, %cr3
> > +	movl	%cr0, %eax
> > +	orl	$(1<<31), %eax
> > +	movl	%eax, %cr0
> > +	lea	STACK_TOP(%edi), %esp
> > +	movl	%edi, %eax
> > +	addl	$(virtual_mapped - kexec_relocate_page), %eax
> > +	pushl	%eax
> > +	ret
> 
> Upon re-entering the kernel, what happens to GDT table? So gdtr will be
> pointing to GDT of other kernel (which is not there as pages have been
> swapped)? Do we need to reload the gdtr upon re-entering the kernel.

After re-entering the kernel and returning from machine_kexec,
restore_processor_state() is called, where the GDTR and some other CPU
state such as FPU, IDT, etc are restored.

> [..]
> > @@ -197,8 +282,54 @@ identity_mapped:
> > xorl%eax, %eax
> > movl%eax, %cr3
> >  
> > +   movlCP_PA_SWAP_PAGE(%edi), %eax
> > +   pushl   %eax
> > +   pushl   %ebx
> > +   callswap_pages
> > +   addl$8, %esp
> > +
> > +   /* To be certain of avoiding problems with self-modifying code
> > +* I need to execute a serializing instruction here.
> > +* So I flush the TLB, it's handy, and not processor dependent.
> > +*/
> > +   xorl%eax, %eax
> > +   movl%eax, %cr3
> > +
> > +   /* set all of the registers to known values */
> > +   /* leave %esp alone */
> > +
> > +   movlKJUMP_MAGIC_OFF(%edi), %eax
> > +   cmpl$KJUMP_MAGIC_NUMBER, %eax
> > +   jz 1f
> > +   xorl%edi, %edi
> > +   xorl%eax, %eax
> > +   xorl%ebx, %ebx
> > +   xorl%ecx, %ecx
> > +   xorl%edx, %edx
> > +   xorl%esi, %esi
> > +   xorl%ebp, %ebp
> > +   ret
> > +1:
> > +   popl%edx
> > +   movlCP_PA_SWAP_PAGE(%edi), %esp
> > +   addl$PAGE_SIZE_asm, %esp
> > +   pushl   %edx
> > +2:
> > +   call*%edx
> 
> > +   movl%edi, %edx
> > +   popl%edi
> > +   pushl   %edx
> > +   jmp 2b
> > +
> 
> What does above piece of code do? Looks like redundant for switching
> between the kernels? After call *%edx, we never return here. Instead
> we come back to "kexec_jump_back_entry"?

For switching between the kernels, this is redundant. Originally, another
feature of kexec jump was to call some code in physical mode. This code
is used to provide a C ABI to the called code.

Now, Eric suggests using a C ABI compatible mode to pass the jump back
entry point too, that is, using the return address on the stack instead
of %edi. I think that is reasonable. Maybe we can revise this code to be
compatible with the C ABI and provide a convenient interface for both the
kernel and other physical mode code.

> [..]
> > --- /dev/null
> > +++ b/Documentation/i386/jump_back_protocol.txt
> > @@ -0,0 +1,66 @@
> > +   THE LINUX/I386 JUMP BACK PROTOCOL
> > +   -
> > +
> > +   Huang Ying <[EMAIL PROTECTED]>
> > +   Last update 2007-12-19
> > +
> > +Currently, the following versions of the jump back protocol exist.
> > +
> > +Protocol 1.00: Jumping between original kernel and kexeced kernel
> > +   support. Calling ordinary C function support.
> > +
> > +
> > +*** JUMP BACK ENTRY
> > +
> > +At jump back entry of callee, the CPU must be in 32-bit protected mode
> > +with paging disabled; the CS, DS, ES and SS must be 4G flat segments;
> > +CS must have execute/read permission, and DS, ES and SS must have
> > +read/write permission; interrupt must be disabled; the contents of
> > +registers and corresponding memory must be as follow:
> > +
> > +Offset/Size    Meaning
> > +
> > +%edi   Real 

Re: [PATCH -mm] kexec jump -v9

2008-05-14 Thread Huang, Ying
On Wed, 2008-05-14 at 15:30 -0700, Eric W. Biederman wrote:
[...]
> >  
> > +   if (image->preserve_context) {
> > +   KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
> > +   if (kexec_jump_save_cpu(control_page)) {
> > +   image->start = KJUMP_ENTRY(control_page);
> > +   return;
> 
> Tricky, and I expect unnecessary.
> We should be able to just have relocate_new_kernel return?

OK, I will check this. Maybe we can move the CPU state saving code into
relocate_new_kernel.

[...]
> > -static void kernel_kexec(void)
> > +static int kernel_kexec(void)
> >  {
> > +   int ret = -ENOSYS;
> >  #ifdef CONFIG_KEXEC
> > -   struct kimage *image;
> > -   image = xchg(&kexec_image, NULL);
> > -   if (!image)
> > -   return;
> > -   kernel_restart_prepare(NULL);
> > -   printk(KERN_EMERG "Starting new kernel\n");
> > -   machine_shutdown();
> > -   machine_kexec(image);
> > +   if (xchg(&kexec_lock, 1))
> > +   return -EBUSY;
> > +   if (!kexec_image) {
> > +   ret = -EINVAL;
> > +   goto unlock;
> > +   }
> > +   if (!kexec_image->preserve_context) {
> > +   kernel_restart_prepare(NULL);
> > +   printk(KERN_EMERG "Starting new kernel\n");
> > +   machine_shutdown();
> > +   }
> > +   ret = kexec_jump(kexec_image);
> > +unlock:
> > +   xchg(&kexec_lock, 0);
> >  #endif
> 
> Ugh.  No.  Not sharing the shutdown methods with reboot and
> the normal kexec path looks like a recipe for failure to me.
> 
> This looks like where we really need to have the conversation.
> What methods do we use to shutdown the system.
> 
> My take on the situation is this.  For proper handling we
> need driver device_detach and device_reattach methods.
> 
> With the following semantics.  The device_detach methods
> will disable DMA and place the hardware in a sane state
> from which the device driver can reclaim and reinitialize it,
> but the hardware will not be touched.
> 
> device_reattach reattaches the driver to the hardware.

Yes. The current device PM callbacks are not suitable for hibernation
(kexec based or original). I think we can collaborate with Rafael J.
Wysocki on the new device driver hibernation callbacks.
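
To make the detach/reattach semantics discussed above concrete, here is
a minimal user-space sketch in C. Everything in it is hypothetical:
struct example_dev, struct example_dev_ops, detach_all() and
reattach_all() do not exist in the kernel, and rolling back on a failed
detach is an assumption of the sketch, not something agreed in this
thread. It only shows the intended pairing: detach stops DMA and
releases the device, reattach gives it back to the driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device state, just enough to show the semantics. */
struct example_dev {
	int dma_enabled;   /* 1 while the device may do DMA */
	int driver_bound;  /* 1 while the driver owns the hardware */
};

/* Hypothetical callback pair named after the proposed methods. */
struct example_dev_ops {
	int (*device_detach)(struct example_dev *dev);
	int (*device_reattach)(struct example_dev *dev);
};

/* Detach: stop DMA and let go of the hardware. */
static int example_detach(struct example_dev *dev)
{
	dev->dma_enabled = 0;
	dev->driver_bound = 0;
	return 0;
}

/* Reattach: the driver reclaims the hardware and re-enables DMA. */
static int example_reattach(struct example_dev *dev)
{
	dev->driver_bound = 1;
	dev->dma_enabled = 1;
	return 0;
}

static const struct example_dev_ops example_ops = {
	.device_detach = example_detach,
	.device_reattach = example_reattach,
};

/* Reattach devices [0, n) in reverse order; return the first error or 0. */
static int reattach_all(const struct example_dev_ops *ops,
			struct example_dev *devs, size_t n)
{
	int err = 0;

	while (n-- > 0) {
		int ret = ops->device_reattach(&devs[n]);

		if (ret && !err)
			err = ret;
	}
	return err;
}

/* Detach devices in order; on failure, reattach the ones already detached. */
static int detach_all(const struct example_dev_ops *ops,
		      struct example_dev *devs, size_t n)
{
	size_t i;

	assert(ops != NULL && devs != NULL);
	for (i = 0; i < n; i++) {
		int ret = ops->device_detach(&devs[i]);

		if (ret) {
			reattach_all(ops, devs, i);
			return ret;
		}
	}
	return 0;
}
```

In this sketch a resumable kexec would call detach_all() before jumping
to the other kernel and reattach_all() after jumping back.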

> So looking at this patch I see two very productive directions
> we can go.
> 1) A patch that just fixes up the kexec infrastructure code
>so it implements the swap page and provides the kernel
>reentry point.  And doesn't handle the upper layer
>user interface portion.
> 
> 2) A patch that renames device_shutdown to device_detach.
>And starts implementing the driver hooks needed from
>a resumable kexec.

OK. I can separate the patch into two patches.

Best Regards,
Huang Ying



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-13 Thread Huang, Ying
On Tue, 2008-05-13 at 22:56 -0400, Vivek Goyal wrote:
[...]
> > 
> > > Last time I tried the patches (V9) and kexec jump did not work for me. I
> > > was not getting timer interrupts in second kernel. Then I had to put 
> > > LAPIC and IOAPIC in legacy mode and then at one way jump started working.
> > > I am not sure how the next kernel boots for you without putting APICs
> > > in legacy mode. (Yet to make returning back to original kernel work
> > > using V9). 
> > 
> > > Can normal kexec (without kexec jump) work without putting LAPIC and
> > IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
> > into legacy mode before kexec and restore them after?
> > 
> 
> We do put LAPIC and IOAPIC in legacy mode in normal kexec. Look at 
> disable_IO_APIC() in native_machine_shutdown(). So I think we shall
> have to do the same thing in kexec jump code too.

OK. I will look at this.

> I went through above mail thread again where we were discussing what all
> information need to be passed between kernels.
> 
> Last time we enumerated three things.
> 
> - kernel entry/re-entry point for switch between kernels.
> - backup pages map for core filtering
> - Probably ELF core notes for saving hibernated image.
> 
> I think if we just implement the functionality so that one can switch
> back and forth between kernels (no hibernated image saving),then we probably
> need to pass around only kernel entry/re-entry point and nothing else and in
> your patches I think you are already doing using %edi.

Yes.

> So, IMHO, for first simple implementation, we don't have to pass around
> any data between kernels except entry point. (Please correct me if I am 
> wrong). Lets get that implementation in first and then we can get rest
> of the pieces in place.

Yes. The kernel entry/re-entry point is the only information that needs
to be communicated between kernels for just switching between them. So
we can focus on the kexec jump patch first.

Best Regards,
Huang Ying



Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-13 Thread Huang, Ying
Hi, Vivek,

On Tue, 2008-05-13 at 01:34 -0400, Vivek Goyal wrote:
> On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote:
> > This patch implements a prototype of kexec multi-stage load. With this
> > patch, the "backup pages map" can be passed to kexeced kernel via
> > /sbin/kexec; and the sys_kexec_load can be used to load large
> > hibernated image with huge number of segments.
> > 
> > 
> 
> Hi Huang,
> 
> Had a quick look at the patch. Will review in detail soon. Had few
> thoughts.
> 
> In general, these patches are on top of previous kexec jump patches.
> It would be good if you could repost your updated patches so that
> I can apply the patches and get some testing going.

The kexec jump patch v9 is sufficient for this patch to work. I have no
new version of the kexec jump patch so far.

> Last time I tried the patches (V9) and kexec jump did not work for me. I
> was not getting timer interrupts in second kernel. Then I had to put 
> LAPIC and IOAPIC in legacy mode and then at one way jump started working.
> I am not sure how the next kernel boots for you without putting APICs
> in legacy mode. (Yet to make returning back to original kernel work
> using V9). 

Can normal kexec (without kexec jump) work without putting LAPIC and
IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC
into legacy mode before kexec and restore them after?

The kexec jump patch works well on my IBM T42. But it seems that the
IOAPIC is disabled in BIOS, so I can only use i8259 and LAPIC on this
machine.

> > In kexec based hibernation, resuming from disk is implemented as
> > loading the hibernated disk image with sys_kexec_load(). But unlike
> > the normal kexec load, the hibernated image may have huge number of
> > segments. So multi-stage loading is necessary for kexec load based
> > resuming from disk implementation.
> 
> I understand that hibernated images are huge. But why do we require
> multi stage loading? I knew there was a maximum segment limit in kexec.
> But I think we can change that limit. Anything else prevents us from
> loading large images in one go?

There are two reasons for multi-stage loading:

- Pass the backup pages map from the original kernel (A) to the kexeced
kernel (B), because it is not known before loading. We have discussed
this before in:
http://lkml.org/lkml/2008/3/12/308
http://lkml.org/lkml/2008/3/14/59
http://lkml.org/lkml/2008/3/21/299

- Load a large hibernated image. The hibernated image can be not only
large but also discontiguous. For example, if the physical memory size
is 4G and there is one free page every 2 pages, there will be nearly 2G
segments. Loading these segments in one go is impossible, so multi-stage
load is necessary. And if the hibernated image is compressed, it is
also very difficult to load it in one go because of the anonymous pages
needed.
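
The arithmetic behind that example can be sketched as follows. This
assumes 4 KiB pages and reads "2G" as the amount of image data rather
than a count of segments; EXAMPLE_PAGE_SIZE and both helper functions
are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page size for the illustration: 4 KiB, as on i386. */
#define EXAMPLE_PAGE_SIZE 4096ULL

/*
 * Worst-case number of single-page segments when every second page
 * frame is free and therefore not part of the hibernated image.
 */
static uint64_t worst_case_segments(uint64_t mem_bytes)
{
	uint64_t pages = mem_bytes / EXAMPLE_PAGE_SIZE;

	assert(mem_bytes % EXAMPLE_PAGE_SIZE == 0);
	return pages / 2;
}

/* Total bytes of image data carried by those segments. */
static uint64_t worst_case_image_bytes(uint64_t mem_bytes)
{
	return worst_case_segments(mem_bytes) * EXAMPLE_PAGE_SIZE;
}
```

For 4G of memory, worst_case_segments(4ULL << 30) gives 524288
single-page segments carrying 2G of data in total, which is far more
than a single load call with a small fixed segment limit can describe.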

> > And, multi-stage loading is also
> > necessary for parameter passing from original kernel to kexeced kernel
> > because some information such as "backup pages map" is not available
> > before loading.
> > 
> > 
> > Four stages are defined:
> > 
> > - KS_start: start stage; begin a new kexec loading; there must be only
> >   one KS_start stage in one kexec loading.
> > 
> > - KS_mid: middle stage; continue loading some segments; there may be many
> >   or zero KS_mid stages in one kexec loading; follows a KS_start or
> >   KS_mid stage.
> > 
> > - KS_final: final stage; finish a kexec loading; there must be only
> >   one KS_final stage in one kexec loading; follows a KS_start or
> >   KS_mid stage.
> > 
> > - KS_full: backward compatible with original loading semantics, finish all
> >   work of a kexec loading in one KS_full stage.
> > 
> > 
> > Overlapping between pages of different segments is allowed to support
> > "parameter passing".
> > 
> > 
> > During loading, a hash table mapped from destination page to source
> > page is used instead of original linear mapping
> > implementation. Because the hibernated image may be very large (up to
> > near the size of physical memory), it is very time-consuming to search
> > a source page given the destination page, which is used to check
> > whether a newly allocated page is in the range of allocated
> > destination pages.
> 
> This seems to be an optimization of kexec so that it becomes efficient
> in loading large images (containing large number of segments). Probably
> this can be a separate patch.

If it is desired, I can separate it into another patch.

> IMHO, we can just first write a minimal patch where one can just switch
> between kernels. Once that patch is

[PATCH] kexec based hibernation: a prototype of kexec multi-stage load

2008-05-11 Thread Huang, Ying
This patch implements a prototype of kexec multi-stage load. With this
patch, the "backup pages map" can be passed to the kexeced kernel via
/sbin/kexec, and sys_kexec_load can be used to load a large
hibernated image with a huge number of segments.


In kexec based hibernation, resuming from disk is implemented as
loading the hibernated disk image with sys_kexec_load(). But unlike
a normal kexec load, the hibernated image may have a huge number of
segments, so multi-stage loading is necessary to implement kexec load
based resuming from disk. Multi-stage loading is also necessary for
passing parameters from the original kernel to the kexeced kernel,
because some information such as the "backup pages map" is not
available before loading.


Four stages are defined:

- KS_start: start stage; begin a new kexec loading; there must be only
  one KS_start stage in one kexec loading.

- KS_mid: middle stage; continue loading some segments; there may be
  zero or more KS_mid stages in one kexec loading; follows a KS_start or
  KS_mid stage.

- KS_final: final stage; finish a kexec loading; there must be only
  one KS_final stage in one kexec loading; follows a KS_start or
  KS_mid stage.

- KS_full: backward compatible with the original loading semantics;
  finish all work of a kexec loading in one KS_full stage.
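
The ordering rules above can be read as a small state machine. The
sketch below is a user-space illustration only: the stage names come
from the description, but the enum values, struct load_state and
stage_accept() are made up here, and rejecting a KS_start or KS_full
while a load is already in progress is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stage names follow the description; the numeric values are assumed. */
enum kexec_stage { KS_start, KS_mid, KS_final, KS_full };

/* Hypothetical tracker: is a multi-stage load currently in progress? */
struct load_state {
	bool in_progress;
};

/*
 * Return true if `next` is a legal stage in the current state and
 * update the state; return false and leave the state alone otherwise.
 */
static bool stage_accept(struct load_state *st, enum kexec_stage next)
{
	assert(st != NULL);

	switch (next) {
	case KS_start:
	case KS_full:
		/* Both begin a load, so neither may interrupt one. */
		if (st->in_progress)
			return false;
		/* KS_full also finishes the load in the same call. */
		st->in_progress = (next == KS_start);
		return true;
	case KS_mid:
		/* Only valid after KS_start or another KS_mid. */
		return st->in_progress;
	case KS_final:
		if (!st->in_progress)
			return false;
		st->in_progress = false;
		return true;
	}
	return false;
}
```

With this, the sequence KS_start, KS_mid, KS_mid, KS_final is accepted,
while a KS_mid or KS_final without a preceding KS_start is refused.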


Overlapping between pages of different segments is allowed to support
"parameter passing".


During loading, a hash table mapping destination pages to source
pages is used instead of the original linear mapping
implementation. Because the hibernated image may be very large (up to
nearly the size of physical memory), a linear search for the source
page of a given destination page is very time-consuming, and that
search is needed to check whether a newly allocated page is in the
range of allocated destination pages. The original mapping is now only
used by the assembly code to swap the page contents. This map is also
exported to user space via /proc/kexec_pgmap, so that /sbin/kexec can
use it to construct the "backup pages map" parameter for the kexeced
kernel.
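
As a rough user-space illustration of such a destination-to-source
lookup, here is a chained hash table in C. It reuses the
KIMAGE_HASH_BITS value and the dst_pfn/src_pfn field names from the
patch below, but everything else is an assumption: a plain next pointer
stands in for struct hlist_node, and pgmap_hash(), pgmap_insert(),
pgmap_lookup() and pgmap_free() are hypothetical helpers, not the
functions the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define KIMAGE_HASH_BITS 10
#define KIMAGE_PGTABLE_SIZE (1 << KIMAGE_HASH_BITS)

struct pgmap {
	struct pgmap *next;      /* stand-in for struct hlist_node */
	unsigned long dst_pfn;
	unsigned long src_pfn;
};

struct pgmap_table {
	struct pgmap *buckets[KIMAGE_PGTABLE_SIZE];
};

/* Simple multiplicative hash; an arbitrary choice for this sketch. */
static unsigned int pgmap_hash(unsigned long dst_pfn)
{
	unsigned long h = dst_pfn * 2654435761UL;

	return (unsigned int)(h >> 8) & (KIMAGE_PGTABLE_SIZE - 1);
}

/* Record dst_pfn -> src_pfn; returns 0 on success, -1 if out of memory. */
static int pgmap_insert(struct pgmap_table *t, unsigned long dst_pfn,
			unsigned long src_pfn)
{
	unsigned int h = pgmap_hash(dst_pfn);
	struct pgmap *p = malloc(sizeof(*p));

	if (!p)
		return -1;
	p->dst_pfn = dst_pfn;
	p->src_pfn = src_pfn;
	p->next = t->buckets[h];
	t->buckets[h] = p;
	return 0;
}

/* Returns 1 and stores the source pfn if dst_pfn is mapped, else 0. */
static int pgmap_lookup(const struct pgmap_table *t, unsigned long dst_pfn,
			unsigned long *src_pfn)
{
	const struct pgmap *p;

	for (p = t->buckets[pgmap_hash(dst_pfn)]; p; p = p->next) {
		if (p->dst_pfn == dst_pfn) {
			*src_pfn = p->src_pfn;
			return 1;
		}
	}
	return 0;
}

/* Free every entry and leave the table empty. */
static void pgmap_free(struct pgmap_table *t)
{
	size_t i;

	for (i = 0; i < KIMAGE_PGTABLE_SIZE; i++) {
		while (t->buckets[i]) {
			struct pgmap *p = t->buckets[i];

			t->buckets[i] = p->next;
			free(p);
		}
	}
}
```

A lookup only walks one bucket chain, so its cost depends on the chain
length instead of the total number of loaded pages. The table must start
zeroed, for example with "struct pgmap_table t = {0};".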


This patch is based on Linux kernel 2.6.25 and kexec_jump patch, and
has been tested on an IBM T42.


Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 include/linux/kexec.h |   19 +
 kernel/kexec.c|  608 +-
 2 files changed, 478 insertions(+), 149 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -29,6 +29,10 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -107,43 +111,29 @@ int kexec_should_crash(struct task_struc
  */
 #define KIMAGE_NO_DEST (-1UL)
 
+#define KIMAGE_HASH_BITS 10
+#define KIMAGE_PGTABLE_SIZE (1 << KIMAGE_HASH_BITS)
+
+struct pgmap {
+   struct hlist_node hlist;
+   unsigned long dst_pfn;
+   unsigned long src_pfn;
+};
+
 static int kimage_is_destination_range(struct kimage *image,
   unsigned long start, unsigned long end);
 static struct page *kimage_alloc_page(struct kimage *image,
   gfp_t gfp_mask,
   unsigned long dest);
 
-static int do_kimage_alloc(struct kimage **rimage, unsigned long entry,
-   unsigned long nr_segments,
-struct kexec_segment __user *segments)
+static int kimage_copy_segments(struct kimage *image,
+   unsigned long nr_segments,
+   struct kexec_segment __user *segments)
 {
size_t segment_bytes;
-   struct kimage *image;
unsigned long i;
int result;
 
-   /* Allocate a controlling structure */
-   result = -ENOMEM;
-   image = kzalloc(sizeof(*image), GFP_KERNEL);
-   if (!image)
-   goto out;
-
-   image->head = 0;
-   image->entry = &image->head;
-   image->last_entry = &image->head;
-   image->control_page = ~0; /* By default this does not apply */
-   image->start = entry;
-   image->type = KEXEC_TYPE_DEFAULT;
-
-   /* Initialize the list of control pages */
-   INIT_LIST_HEAD(&image->control_pages);
-
-   /* Initialize the list of destination pages */
-   INIT_LIST_HEAD(&image->dest_pages);
-
-   /* Initialize the list of unuseable pages */
-   INIT_LIST_HEAD(&image->unuseable_pages);
-
/* Read in the segments */
image->nr_segments = nr_segments;
segment_bytes = nr_segments * sizeof(*segments);
@@ -210,6 +200,44 @@ static int do_kimage_alloc(struct kimage
}
 
result = 0;
+ out:
+   return result;
+}
+
+static int do_kimage_alloc(struct kimage **rimage, unsigned long entry,
+   unsigned long nr_segments,
+   struct kexec_segment __user *segments)
+{
+   struct kimage *image;
+   int resul
