Re: [PATCH] mm: convert totalram_pages, totalhigh_pages and managed_pages to atomic.
Arun KS writes:

> Remove managed_page_count_lock spinlock and instead use atomic
> variables.
>
> Suggested-by: Michal Hocko
> Suggested-by: Vlastimil Babka
> Signed-off-by: Arun KS
>
> ---
> As discussed here,
> https://patchwork.kernel.org/patch/10627521/#22261253

My 2 cents. I think you should include at least part of the discussion in
the patch description, so that it is readable on its own.

Best Regards,
Huang, Ying
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec
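The core idea of the patch is to replace a spinlock-guarded page counter with an atomic one, so that neither updates nor reads need managed_page_count_lock. The kernel does this with its own atomic API. The following is only a user-space analogue using C11 `<stdatomic.h>`, and the `ram_pages` counter and helper names are hypothetical:

```c
#include <stdatomic.h>

/* Hypothetical stand-in for a page counter such as totalram_pages.
 * Objects with static storage start at zero, a valid atomic state. */
static atomic_long ram_pages;

/* Lock-free update: one atomic read-modify-write replaces lock/add/unlock. */
void ram_pages_add(long count)
{
	atomic_fetch_add_explicit(&ram_pages, count, memory_order_relaxed);
}

/* Lock-free read of the current value. */
long ram_pages_read(void)
{
	return atomic_load_explicit(&ram_pages, memory_order_relaxed);
}
```

Relaxed ordering is enough for a statistic that no other data is published through. If other state depended on the counter value, stronger ordering would be needed.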
Re: [RFC 1/3] PCI/PM: Fix kexec for D3cold and bridge suspending
On Thu, 2012-09-20 at 21:20 +0200, Rafael J. Wysocki wrote:
> On Monday, September 17, 2012, Bjorn Helgaas wrote:
> > +cc Eric and kexec list
> >
> > On Mon, Sep 17, 2012 at 2:54 AM, Huang Ying wrote:
> > > If PCI devices are put into D3cold before kexec, because the
> > > configuration registers of PCI devices in D3cold are not accessible.
> > >
> > > And if PCI bridges are put into low power state before kexec,
> > > configuration registers of PCI devices underneath the PCI bridges are
> > > not accessible too.
> > >
> > > These will make some PCI devices can not be scanned after kexec, so
> > > resume the PCI devices in D3cold or PCI bridges in low power state
> > > before kexec.
> >
> > Don't we need to resume the device even without the kexec issue? And
> > even if it's in D1 or D2?
> >
> > It looks to me like pci_msi_shutdown() (and probably drv->shutdown())
> > depend on the device being in D0.
>
> We should in theory, but we didn't do any power management of PCI bridges
> before, so this is the first time we have a problem with it.
>
> So I'd say, yeah, let's resume if current_state is between D1 and D3cold
> inclusive and the kexec comment is not very helpful (the problem is not
> kexec-specific in general).

Should we resume from D1 to D3cold for any device, or just for bridges?

Best Regards,
Huang Ying

> Speaking of kexec, it might consider using the hibernation device freeze
> instead of device shutdown (which the kexec jump feature does). I've seen
> reports of problems that would be solved this way most likely.
>
> Thanks,
> Rafael
>
> > > Signed-off-by: Huang Ying
> > > ---
> > >  drivers/pci/pci-driver.c | 4
> > >  1 file changed, 4 insertions(+)
> > >
> > > --- a/drivers/pci/pci-driver.c
> > > +++ b/drivers/pci/pci-driver.c
> > > @@ -421,6 +421,10 @@ static void pci_device_shutdown(struct d
> > >  	struct pci_dev *pci_dev = to_pci_dev(dev);
> > >  	struct pci_driver *drv = pci_dev->driver;
> > >  
> > > +	/* Resume bridges and devices in D3cold for kexec to work properly */
> > > +	if (pci_dev->current_state == PCI_D3cold || pci_dev->subordinate)
> > > +		pm_runtime_resume(dev);
> > > +
> > >  	if (drv && drv->shutdown)
> > >  		drv->shutdown(pci_dev);
> > >  	pci_msi_shutdown(pci_dev);
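The two resume conditions in this exchange can be compared side by side. The model below is a minimal user-space sketch. It assumes a hypothetical `enum power_state` whose ordering D0 < D1 < D2 < D3hot < D3cold follows the PCI power states. The kernel's own pci_power_t is not used.

```c
#include <stdbool.h>

/* Hypothetical power states, ordered from fully on to fully off. */
enum power_state { D0, D1, D2, D3HOT, D3COLD };

/* Condition in the posted patch: device in D3cold, or any bridge. */
bool resume_per_patch(enum power_state state, bool is_bridge)
{
	return state == D3COLD || is_bridge;
}

/* Condition suggested in review: anything from D1 to D3cold inclusive. */
bool resume_per_review(enum power_state state)
{
	return state >= D1 && state <= D3COLD;
}
```

The review version drops the bridge special case and widens the state range. Whether that wider range should apply to every device or only to bridges is exactly the open question in the reply.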
Re: [RFC 1/3] PCI/PM: Fix kexec for D3cold and bridge suspending
On Thu, 2012-09-20 at 00:38 -0700, Eric W. Biederman wrote:
> Bjorn Helgaas writes:
>
> > +cc Eric and kexec list
> >
> > On Mon, Sep 17, 2012 at 2:54 AM, Huang Ying wrote:
> >> If PCI devices are put into D3cold before kexec, because the
> >> configuration registers of PCI devices in D3cold are not accessible.
> >>
> >> And if PCI bridges are put into low power state before kexec,
> >> configuration registers of PCI devices underneath the PCI bridges are
> >> not accessible too.
> >>
> >> These will make some PCI devices can not be scanned after kexec, so
> >> resume the PCI devices in D3cold or PCI bridges in low power state
> >> before kexec.
> >
> > Don't we need to resume the device even without the kexec issue? And
> > even if it's in D1 or D2?
>
> The basic requirement is that the device needs to be visible so we can
> auto discover it. As I recall most sleep states don't make the device
> invisible and we can handle the rest in the device initialization code.

PCI devices in D3cold, or under a bridge in D3hot, will not be visible, so
we must fix that for kexec to work properly.

> > It looks to me like pci_msi_shutdown() (and probably drv->shutdown())
> > depend on the device being in D0.
>
> There is certainly a dependency on the config registers being visible.
> Although I don't know if much will go wrong if they aren't.
>
> Certainly pci_msi_shutdown doesn't have anything to do if the device has
> had so much power removed that the device is not even executing.

I don't know which power state the device should be in for
pci_msi_shutdown() etc. But it appears that normal shutdown/reboot and
kexec have worked most of the time so far. D3cold devices and bridges in
D3hot work for normal shutdown/reboot, but not for kexec, so I wrote this
fix.

Best Regards,
Huang Ying

> >> Signed-off-by: Huang Ying
> >> ---
> >>  drivers/pci/pci-driver.c | 4
> >>  1 file changed, 4 insertions(+)
> >>
> >> --- a/drivers/pci/pci-driver.c
> >> +++ b/drivers/pci/pci-driver.c
> >> @@ -421,6 +421,10 @@ static void pci_device_shutdown(struct d
> >>  	struct pci_dev *pci_dev = to_pci_dev(dev);
> >>  	struct pci_driver *drv = pci_dev->driver;
> >>  
> >> +	/* Resume bridges and devices in D3cold for kexec to work properly */
> >> +	if (pci_dev->current_state == PCI_D3cold || pci_dev->subordinate)
> >> +		pm_runtime_resume(dev);
> >> +
> >>  	if (drv && drv->shutdown)
> >>  		drv->shutdown(pci_dev);
> >>  	pci_msi_shutdown(pci_dev);
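The visibility argument above can be modeled as a walk up the bridge chain. This sketch encodes the rule as stated in the thread: a device is unreachable if it is in D3cold, or if any upstream bridge is in D3hot or deeper. The `dev_node` type and the function name are hypothetical, and this is not how the PCI core tracks topology.

```c
#include <stdbool.h>
#include <stddef.h>

enum power_state { D0, D1, D2, D3HOT, D3COLD };

/* Hypothetical device node: each device points at its upstream bridge. */
struct dev_node {
	enum power_state state;
	const struct dev_node *parent; /* NULL for a root-bus device */
};

/* True if config space can be reached under the rule stated above. */
bool config_space_visible(const struct dev_node *dev)
{
	if (dev->state == D3COLD)
		return false;
	/* Any bridge on the path in D3hot or D3cold hides everything below. */
	for (const struct dev_node *b = dev->parent; b != NULL; b = b->parent)
		if (b->state >= D3HOT)
			return false;
	return true;
}
```

In this model, a device that is itself in D0 still cannot be scanned while its bridge sleeps. That is why the posted patch resumes bridges as well as D3cold devices.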
Re: [Bug] Kdump does not work when panic triggered due to MCE
Hi, Prasad,

On 05/10/2011 12:35 AM, K.Prasad wrote:
> On Fri, May 06, 2011 at 07:38:25PM +0200, Andi Kleen wrote:
>>> Has anybody tested this before? Or have found kdump working when fatal
>>> MCEs have actually occurred?
>>
>> Ying did some testing. mce-test has test cases for kdump.
>>
>
> We'd be glad to hear about any successful testcases with recent kernels.
> My manual testing was quite similar to what the LTP kdump testcase would
> do i.e. configure kdump service, trigger crash through
> /proc/sysrq-trigger and watch out for kdump, but as you could see in
> the logs, that did not happen.
>
>> My guess is you injected the error into some area used by the kexec
>> code or boot up path of the kexec kernel.
>>
>> -Andi
>
> The logs did not suggest that the second kernel was booted into. The
> "Rebooting in ... seconds" message appeared from the first kernel. I
> tried the kdump testcase in at least two dissimilar machines but with
> the same results, so it is not clear if the kexec code was affected by
> the MCE injection in both the cases.

From your panic logs, it seems that a panic is triggered by the MCE on one
CPU, and while crash_kexec() is executing, another panic is triggered on a
different CPU by the MCE timeout mechanism. We saw something similar while
developing mce-test.

Please try the following command line for MCE injection:

  mce-inject --no-random /home/prasadkr/mce/mce-test/cases/soft-inj/panic_ucr/data/srar_over

It is also used by the kdump test driver of mce-test.

Best Regards,
Huang Ying
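The failure described above is two CPUs entering the crash path at once. One common way to serialize such a path is to let only the first CPU through. The following is a minimal user-space sketch of that idea using C11 atomics. The names are hypothetical, and this is not the remedy proposed in the reply, which is to use a different injection data file.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define NO_CPU (-1)

/* Hypothetical: id of the CPU that owns the crash path, NO_CPU if none. */
static atomic_int crash_owner = NO_CPU;

/*
 * Returns true only for the first CPU to arrive (and for that same CPU
 * if it re-enters). Later callers, such as a CPU hitting an MCE timeout
 * panic, get false and should stop instead of starting a second crash.
 */
bool claim_crash_path(int cpu)
{
	int expected = NO_CPU;

	if (atomic_compare_exchange_strong(&crash_owner, &expected, cpu))
		return true;
	/* On failure, expected now holds the current owner's id. */
	return expected == cpu;
}
```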
Re: [PATCH][EFI] Run EFI in physical mode
On Mon, Aug 16, 2010 at 11:27 AM, H. Peter Anvin wrote:
> No, it should not be dynamic; rather we should unify all the users who need a
> 1:1 map and just keep that page table set around.

Agreed. One known issue with a global 1:1 map is that at least part of the
page table must be made PAGE_KERNEL_EXEC for EFI runtime code, and
change_page_attr cannot be used before the page allocator is available.

Best Regards,
Huang Ying
Re: [PATCH][EFI] Run EFI in physical mode
On Sat, Aug 14, 2010 at 3:18 AM, Takao Indoh wrote:
> diff -Nurp linux-2.6.35.org/arch/x86/kernel/efi_64.c linux-2.6.35/arch/x86/kernel/efi_64.c
> --- linux-2.6.35.org/arch/x86/kernel/efi_64.c	2010-08-01 18:11:14.0 -0400
> +++ linux-2.6.35/arch/x86/kernel/efi_64.c	2010-08-13 14:39:25.819105004 -0400
> @@ -39,7 +39,9 @@
>  #include
>  
>  static pgd_t save_pgd __initdata;
> -static unsigned long efi_flags __initdata;
> +static unsigned long efi_flags;
> +static pgd_t efi_pgd[PTRS_PER_PGD] __page_aligned_bss;
> +static unsigned long save_cr3;
>  
>  static void __init early_mapping_set_exec(unsigned long start,
>  					  unsigned long end,
> @@ -98,6 +100,19 @@ void __init efi_call_phys_epilog(void)
>  	early_runtime_code_mapping_set_exec(0);
>  }
>  
> +void efi_call_phys_prelog_in_physmode(void)
> +{
> +	local_irq_save(efi_flags);
> +	save_cr3 = read_cr3();
> +	write_cr3(virt_to_phys(efi_pgd));
> +}
> +
> +void efi_call_phys_epilog_in_physmode(void)
> +{
> +	write_cr3(save_cr3);
> +	local_irq_restore(efi_flags);
> +}

efi_flags and save_cr3 should be per-CPU, because they will now be used
after SMP is enabled.

efi_pgd should be allocated dynamically instead of statically, because EFI
may not be enabled on some platforms.

And I think it is better to unify the early physical mode with the run-time
physical mode: just allocate the page table with the early page allocator
(lmb?).

Best Regards,
Huang Ying
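The per-CPU point can be illustrated without kernel APIs. With a single global, a second CPU entering the prolog would overwrite the first CPU's saved flags and CR3 before the first CPU reaches its epilog. The sketch below shows only the bookkeeping and does not touch CR3 or interrupts. Its names and the fixed CPU count are hypothetical. Real kernel code would use the per-CPU API (DEFINE_PER_CPU and related accessors) rather than an array.

```c
#include <assert.h>

#define SKETCH_NR_CPUS 8 /* hypothetical CPU count for the sketch */

/* The state the quoted patch keeps in globals. */
struct phys_call_ctx {
	unsigned long flags; /* plays the role of efi_flags */
	unsigned long cr3;   /* plays the role of save_cr3 */
};

/* One slot per CPU, so concurrent callers cannot clobber each other. */
static struct phys_call_ctx phys_ctx[SKETCH_NR_CPUS];

void phys_prolog(int cpu, unsigned long flags, unsigned long cr3)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	phys_ctx[cpu].flags = flags;
	phys_ctx[cpu].cr3 = cr3;
}

void phys_epilog(int cpu, unsigned long *flags, unsigned long *cr3)
{
	assert(cpu >= 0 && cpu < SKETCH_NR_CPUS);
	*flags = phys_ctx[cpu].flags;
	*cr3 = phys_ctx[cpu].cr3;
}
```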
[PATCH -v2] kexec jump support for x86_64
This patch adds x86_64-specific support, including crash memory range and
purgatory setup. The corresponding kernel support has already been merged.

Together with the kexec jump feature in the Linux kernel, kexec jump can be
used for the following:

- A simple hibernation implementation without ACPI support. You can kexec a
  hibernating kernel, save the memory image of the original system and shut
  down the system. When resuming, you restore the memory image of the
  original system via an ordinary kexec load and then jump back.

- Kernel/system debugging by taking system snapshots. You can take a system
  snapshot with kexec/kdump, jump back, do something and take another
  system snapshot.

- Cooperative multi-kernel/system. With kexec jump, you can switch quickly
  between several kernels/systems without going through the boot process,
  except the first time. This is like swapping a whole kernel/system out
  and in.

- A general method to call a program in physical mode (with paging turned
  off). This can be used to invoke BIOS code under Linux.

Signed-off-by: Huang Ying

---
 kexec/arch/x86_64/crashdump-x86_64.c     | 48 ++-
 purgatory/arch/x86_64/purgatory-x86_64.c | 11 ++-
 purgatory/arch/x86_64/setup-x86_64.S     |  3 +
 3 files changed, 48 insertions(+), 14 deletions(-)

--- a/kexec/arch/x86_64/crashdump-x86_64.c
+++ b/kexec/arch/x86_64/crashdump-x86_64.c
@@ -161,7 +161,8 @@ static struct memory_range crash_reserve
  * to look into down the line. May be something like /proc/kernelmem or may
  * be zone data structures exported from kernel.
  */
-static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
+static int get_crash_memory_ranges(struct memory_range **range, int *ranges,
+				   int kexec_flags)
 {
 	const char *iomem= proc_iomem();
 	int memory_ranges = 0, gart = 0;
@@ -179,10 +180,12 @@ static int get_crash_memory_ranges(struc
 
 	/* First entry is for first 640K region. Different bios report first
 	 * 640K in different manner hence hardcoding it */
-	crash_memory_range[0].start = 0x;
-	crash_memory_range[0].end = 0x0009;
-	crash_memory_range[0].type = RANGE_RAM;
-	memory_ranges++;
+	if (!(kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+		crash_memory_range[0].start = 0x;
+		crash_memory_range[0].end = 0x0009;
+		crash_memory_range[0].type = RANGE_RAM;
+		memory_ranges++;
+	}
 
 	while(fgets(line, sizeof(line), fp) != 0) {
 		char *str;
@@ -239,6 +242,22 @@ static int get_crash_memory_ranges(struc
 		memory_ranges++;
 	}
 	fclose(fp);
+	if (kexec_flags & KEXEC_PRESERVE_CONTEXT) {
+		int i;
+		for (i = 0; i < memory_ranges; i++) {
+			if (crash_memory_range[i].end > 0x0009) {
+				crash_reserved_mem.start = \
+					crash_memory_range[i].start;
+				break;
+			}
+		}
+		if (crash_reserved_mem.start >= mem_max) {
+			fprintf(stderr, "Too small mem_max: 0x%llx.\n", mem_max);
+			return -1;
+		}
+		crash_reserved_mem.end = mem_max;
+		crash_reserved_mem.type = RANGE_RAM;
+	}
 	if (exclude_region(&memory_ranges, crash_reserved_mem.start,
 				crash_reserved_mem.end) < 0)
 		return -1;
@@ -590,7 +609,8 @@ int load_crashdump_segments(struct kexec
 	if (get_kernel_vaddr_and_size(info))
 		return -1;
 
-	if (get_crash_memory_ranges(&mem_range, &nr_ranges) < 0)
+	if (get_crash_memory_ranges(&mem_range, &nr_ranges,
+				    info->kexec_flags) < 0)
 		return -1;
 
 	/* Memory regions which panic kernel can safely use to boot into */
@@ -602,13 +622,15 @@ int load_crashdump_segments(struct kexec
 	add_memmap(memmap_p, crash_reserved_mem.start, sz);
 
 	/* Create a backup region segment to store backup data*/
-	sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
-	tmp = xmalloc(sz);
-	memset(tmp, 0, sz);
-	info->backup_start = add_buffer(info, tmp, sz, sz, align,
-				0, max_addr, 1);
-	if (delete_memmap(memmap_p, info->backup_start, sz) < 0)
-		return -1;
+	if (!(info->kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+		sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
+		tmp = xmalloc(sz);
+		memset(tmp, 0, sz);
+		info->backup_start = add_buffer(info, tmp, sz, sz, align,
+					0, max_addr, 1);
+		if (delete_memmap(memmap_p, info->backup_start, sz)
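The KEXEC_PRESERVE_CONTEXT branch of get_crash_memory_ranges() selects the reserved region as follows. It starts at the beginning of the first memory range that ends above the low-memory boundary and extends up to mem_max. If mem_max is too small, the branch fails. The sketch below restates that selection as a standalone function. `struct range`, `pick_reserved()` and the `low_limit` parameter are hypothetical. `low_limit` stands in for the boundary constant used by the patch.

```c
#include <stddef.h>

struct range {
	unsigned long long start, end;
};

/*
 * Reserve [start of first range ending above low_limit, mem_max].
 * Returns 0 on success, -1 if mem_max is too small.
 * Assumption of this sketch: start defaults to 0 if no range qualifies.
 */
int pick_reserved(const struct range *ranges, size_t n,
		  unsigned long long low_limit, unsigned long long mem_max,
		  struct range *out)
{
	out->start = 0;
	for (size_t i = 0; i < n; i++) {
		if (ranges[i].end > low_limit) {
			out->start = ranges[i].start;
			break;
		}
	}
	if (out->start >= mem_max)
		return -1; /* the patch prints "Too small mem_max" here */
	out->end = mem_max;
	return 0;
}
```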
[PATCH] kexec jump support for x86_64
This patch adds x86_64-specific support, including crash memory range and
purgatory setup. The corresponding kernel support has already been merged.

Signed-off-by: Huang Ying

---
 kexec/arch/x86_64/crashdump-x86_64.c     | 48 ++-
 purgatory/arch/x86_64/purgatory-x86_64.c | 11 ++-
 purgatory/arch/x86_64/setup-x86_64.S     |  3 +
 3 files changed, 48 insertions(+), 14 deletions(-)

--- a/kexec/arch/x86_64/crashdump-x86_64.c
+++ b/kexec/arch/x86_64/crashdump-x86_64.c
@@ -161,7 +161,8 @@ static struct memory_range crash_reserve
  * to look into down the line. May be something like /proc/kernelmem or may
  * be zone data structures exported from kernel.
  */
-static int get_crash_memory_ranges(struct memory_range **range, int *ranges)
+static int get_crash_memory_ranges(struct memory_range **range, int *ranges,
+				   int kexec_flags)
 {
 	const char *iomem= proc_iomem();
 	int memory_ranges = 0, gart = 0;
@@ -179,10 +180,12 @@ static int get_crash_memory_ranges(struc
 
 	/* First entry is for first 640K region. Different bios report first
 	 * 640K in different manner hence hardcoding it */
-	crash_memory_range[0].start = 0x;
-	crash_memory_range[0].end = 0x0009;
-	crash_memory_range[0].type = RANGE_RAM;
-	memory_ranges++;
+	if (!(kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+		crash_memory_range[0].start = 0x;
+		crash_memory_range[0].end = 0x0009;
+		crash_memory_range[0].type = RANGE_RAM;
+		memory_ranges++;
+	}
 
 	while(fgets(line, sizeof(line), fp) != 0) {
 		char *str;
@@ -239,6 +242,22 @@ static int get_crash_memory_ranges(struc
 		memory_ranges++;
 	}
 	fclose(fp);
+	if (kexec_flags & KEXEC_PRESERVE_CONTEXT) {
+		int i;
+		for (i = 0; i < memory_ranges; i++) {
+			if (crash_memory_range[i].end > 0x0009) {
+				crash_reserved_mem.start = \
+					crash_memory_range[i].start;
+				break;
+			}
+		}
+		if (crash_reserved_mem.start >= mem_max) {
+			fprintf(stderr, "Too small mem_max: 0x%llx.\n", mem_max);
+			return -1;
+		}
+		crash_reserved_mem.end = mem_max;
+		crash_reserved_mem.type = RANGE_RAM;
+	}
 	if (exclude_region(&memory_ranges, crash_reserved_mem.start,
 				crash_reserved_mem.end) < 0)
 		return -1;
@@ -590,7 +609,8 @@ int load_crashdump_segments(struct kexec
 	if (get_kernel_vaddr_and_size(info))
 		return -1;
 
-	if (get_crash_memory_ranges(&mem_range, &nr_ranges) < 0)
+	if (get_crash_memory_ranges(&mem_range, &nr_ranges,
+				    info->kexec_flags) < 0)
 		return -1;
 
 	/* Memory regions which panic kernel can safely use to boot into */
@@ -602,13 +622,15 @@ int load_crashdump_segments(struct kexec
 	add_memmap(memmap_p, crash_reserved_mem.start, sz);
 
 	/* Create a backup region segment to store backup data*/
-	sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
-	tmp = xmalloc(sz);
-	memset(tmp, 0, sz);
-	info->backup_start = add_buffer(info, tmp, sz, sz, align,
-				0, max_addr, 1);
-	if (delete_memmap(memmap_p, info->backup_start, sz) < 0)
-		return -1;
+	if (!(info->kexec_flags & KEXEC_PRESERVE_CONTEXT)) {
+		sz = (BACKUP_SRC_SIZE + align - 1) & ~(align - 1);
+		tmp = xmalloc(sz);
+		memset(tmp, 0, sz);
+		info->backup_start = add_buffer(info, tmp, sz, sz, align,
+					0, max_addr, 1);
+		if (delete_memmap(memmap_p, info->backup_start, sz) < 0)
+			return -1;
+	}
 
 	/* Create elf header segment and store crash image data. */
 	if (crash_create_elf64_headers(info, &elf_info,
--- a/purgatory/arch/x86_64/purgatory-x86_64.c
+++ b/purgatory/arch/x86_64/purgatory-x86_64.c
@@ -6,6 +6,7 @@
 uint8_t reset_vga = 0;
 uint8_t legacy_pic = 0;
 uint8_t panic_kernel = 0;
+unsigned long jump_back_entry = 0;
 char *cmdline_end = NULL;
 
 void setup_arch(void)
@@ -14,8 +15,16 @@ void setup_arch(void)
 	if (legacy_pic)
 		x86_setup_legacy_pic();
 }
+void x86_setup_jump_back_entry(void)
+{
+	if (cmdline_end)
+		sprintf(cmdline_end, " kexec_jump_back_entry=0x%lx",
+			jump_back_entry);
+}
+
 /* This function can be used to execute after the SHA256 verification. */
 void post_verification_setup_arch(void)
 {
-
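x86_setup_jump_back_entry() appends ` kexec_jump_back_entry=0x...` at cmdline_end with sprintf(). The user-space variant below builds the same string with bounds checking. It assumes a hypothetical helper that receives the whole command line and its buffer size, because the quoted purgatory code passes no buffer size.

```c
#include <stdio.h>
#include <string.h>

/*
 * Append " kexec_jump_back_entry=0x<hex>" to cmdline, which lives in a
 * buffer of `size` bytes. Returns 0 on success, -1 if it would not fit.
 */
int append_jump_back_entry(char *cmdline, size_t size,
			   unsigned long jump_back_entry)
{
	size_t len = strlen(cmdline);
	int n;

	if (len >= size)
		return -1;
	n = snprintf(cmdline + len, size - len,
		     " kexec_jump_back_entry=0x%lx", jump_back_entry);
	/* snprintf reports the untruncated length; treat truncation as failure. */
	return (n < 0 || (size_t)n >= size - len) ? -1 : 0;
}
```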
[PATCH resend] kexec/x86_64: Use one page table in x86_64 machine_kexec
Impact: reduce kernel BSS size by 7 pages, improve code readability

Two page tables are used in the current x86_64 kexec implementation. One is
used to jump from the kernel virtual address to the identity-mapped address;
the other is used to map all physical memory. In fact, on x86_64 there is no
conflict between the kernel virtual address space and the physical memory
space, so a single page table is sufficient. The page table pages used to
map the control page are allocated dynamically, to save memory when no kexec
image is loaded. The assembly code used to map the control page is also
replaced by C code.

Signed-off-by: Huang Ying

---
 arch/x86/include/asm/kexec.h         |  27 ++-
 arch/x86/kernel/machine_kexec_64.c   |  82 +++---
 arch/x86/kernel/relocate_kernel_64.S | 125 ---
 3 files changed, 67 insertions(+), 167 deletions(-)

--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -9,23 +9,8 @@
 # define PAGES_NR		4
 #else
 # define PA_CONTROL_PAGE	0
-# define VA_CONTROL_PAGE	1
-# define PA_PGD		2
-# define VA_PGD		3
-# define PA_PUD_0		4
-# define VA_PUD_0		5
-# define PA_PMD_0		6
-# define VA_PMD_0		7
-# define PA_PTE_0		8
-# define VA_PTE_0		9
-# define PA_PUD_1		10
-# define VA_PUD_1		11
-# define PA_PMD_1		12
-# define VA_PMD_1		13
-# define PA_PTE_1		14
-# define VA_PTE_1		15
-# define PA_TABLE_PAGE		16
-# define PAGES_NR		17
+# define PA_TABLE_PAGE		1
+# define PAGES_NR		2
 #endif
 
 #ifdef CONFIG_X86_32
@@ -157,9 +142,9 @@ relocate_kernel(unsigned long indirectio
 		unsigned long start_address) ATTRIB_NORET;
 #endif
 
-#ifdef CONFIG_X86_32
 #define ARCH_HAS_KIMAGE_ARCH
 
+#ifdef CONFIG_X86_32
 struct kimage_arch {
 	pgd_t *pgd;
 #ifdef CONFIG_X86_PAE
@@ -169,6 +154,12 @@ struct kimage_arch {
 	pte_t *pte0;
 	pte_t *pte1;
 };
+#else
+struct kimage_arch {
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+};
 #endif
 #endif /* __ASSEMBLY__ */
 
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,15 +18,6 @@
 #include
 #include
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u64 kexec_pgd[512] PAGE_ALIGNED;
-static u64 kexec_pud0[512] PAGE_ALIGNED;
-static u64 kexec_pmd0[512] PAGE_ALIGNED;
-static u64 kexec_pte0[512] PAGE_ALIGNED;
-static u64 kexec_pud1[512] PAGE_ALIGNED;
-static u64 kexec_pmd1[512] PAGE_ALIGNED;
-static u64 kexec_pte1[512] PAGE_ALIGNED;
-
 static void init_level2_page(pmd_t *level2p, unsigned long addr)
 {
 	unsigned long end_addr;
@@ -107,12 +98,65 @@ out:
 	return result;
 }
 
+static void free_transition_pgtable(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pud);
+	free_page((unsigned long)image->arch.pmd);
+	free_page((unsigned long)image->arch.pte);
+}
+
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long vaddr, paddr;
+	int result = -ENOMEM;
+
+	vaddr = (unsigned long)relocate_kernel;
+	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+	pgd += pgd_index(vaddr);
+	if (!pgd_present(*pgd)) {
+		pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pud)
+			goto err;
+		image->arch.pud = pud;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+	pud = pud_offset(pgd, vaddr);
+	if (!pud_present(*pud)) {
+		pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pmd)
+			goto err;
+		image->arch.pmd = pmd;
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+	pmd = pmd_offset(pud, vaddr);
+	if (!pmd_present(*pmd)) {
+		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pte)
+			goto err;
+		image->arch.pte = pte;
+		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+	}
+	pte = pte_offset_kernel(pmd, vaddr);
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+	return 0;
+err:
+	free_transition_pgtable(image);
+	return result;
+}
+
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
 	pgd_t *level4p;
+	int result;
 
 	level4p = (pgd_t *)__va(start_pgtable);
-	return init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	if (result)
+		return result;
+	return init_transition_pgtable(image, level4p);
 }
 
 stat
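init_transition_pgtable() walks pgd_index(), pud_offset(), pmd_offset() and pte_offset_kernel() for the single virtual address of relocate_kernel. The index arithmetic behind that walk can be sketched in plain C. The sketch assumes standard x86_64 4-level paging, with 9 index bits per level and 4 KiB pages. The macro and function names are hypothetical, not kernel helpers.

```c
#include <stdint.h>

/* x86_64 4-level paging: 512 entries per table, 9 index bits per level. */
#define ENTRIES_PER_TABLE 512u
#define L4_SHIFT 39 /* PGD */
#define L3_SHIFT 30 /* PUD */
#define L2_SHIFT 21 /* PMD */
#define L1_SHIFT 12 /* PTE */

/* Index into the table at the level selected by `shift`. */
static inline unsigned level_index(uint64_t vaddr, unsigned shift)
{
	return (unsigned)((vaddr >> shift) & (ENTRIES_PER_TABLE - 1));
}
```

For an example kernel-text address such as 0xffffffff81000000, the top-level index is 511. Low identity-mapped physical addresses use index 0. This is one way to see why the kernel-virtual mapping and the identity mapping can share one PGD.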
[PATCH] kexec/x86_64: Use one page table in x86_64 kexec
Two page tables are used in the current x86_64 kexec implementation. One is
used to jump from the kernel virtual address to the identity-mapped address;
the other is used to map all physical memory. In fact, on x86_64 there is no
conflict between the kernel virtual address space and the physical memory
space, so a single page table is sufficient. The page table pages used to
map the control page are allocated dynamically, to save memory when no kexec
image is loaded. The assembly code used to map the control page is also
replaced by C code.

Signed-off-by: Huang Ying

---
 arch/x86/include/asm/kexec.h         |  27 ++-
 arch/x86/kernel/machine_kexec_64.c   |  82 +++---
 arch/x86/kernel/relocate_kernel_64.S | 125 ---
 3 files changed, 67 insertions(+), 167 deletions(-)

--- a/arch/x86/include/asm/kexec.h
+++ b/arch/x86/include/asm/kexec.h
@@ -9,23 +9,8 @@
 # define PAGES_NR		4
 #else
 # define PA_CONTROL_PAGE	0
-# define VA_CONTROL_PAGE	1
-# define PA_PGD		2
-# define VA_PGD		3
-# define PA_PUD_0		4
-# define VA_PUD_0		5
-# define PA_PMD_0		6
-# define VA_PMD_0		7
-# define PA_PTE_0		8
-# define VA_PTE_0		9
-# define PA_PUD_1		10
-# define VA_PUD_1		11
-# define PA_PMD_1		12
-# define VA_PMD_1		13
-# define PA_PTE_1		14
-# define VA_PTE_1		15
-# define PA_TABLE_PAGE		16
-# define PAGES_NR		17
+# define PA_TABLE_PAGE		1
+# define PAGES_NR		2
 #endif
 
 #ifdef CONFIG_X86_32
@@ -157,9 +142,9 @@ relocate_kernel(unsigned long indirectio
 		unsigned long start_address) ATTRIB_NORET;
 #endif
 
-#ifdef CONFIG_X86_32
 #define ARCH_HAS_KIMAGE_ARCH
 
+#ifdef CONFIG_X86_32
 struct kimage_arch {
 	pgd_t *pgd;
 #ifdef CONFIG_X86_PAE
@@ -169,6 +154,12 @@ struct kimage_arch {
 	pte_t *pte0;
 	pte_t *pte1;
 };
+#else
+struct kimage_arch {
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+};
 #endif
 #endif /* __ASSEMBLY__ */
 
--- a/arch/x86/kernel/machine_kexec_64.c
+++ b/arch/x86/kernel/machine_kexec_64.c
@@ -18,15 +18,6 @@
 #include
 #include
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u64 kexec_pgd[512] PAGE_ALIGNED;
-static u64 kexec_pud0[512] PAGE_ALIGNED;
-static u64 kexec_pmd0[512] PAGE_ALIGNED;
-static u64 kexec_pte0[512] PAGE_ALIGNED;
-static u64 kexec_pud1[512] PAGE_ALIGNED;
-static u64 kexec_pmd1[512] PAGE_ALIGNED;
-static u64 kexec_pte1[512] PAGE_ALIGNED;
-
 static void init_level2_page(pmd_t *level2p, unsigned long addr)
 {
 	unsigned long end_addr;
@@ -107,12 +98,65 @@ out:
 	return result;
 }
 
+static void free_transition_pgtable(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pud);
+	free_page((unsigned long)image->arch.pmd);
+	free_page((unsigned long)image->arch.pte);
+}
+
+static int init_transition_pgtable(struct kimage *image, pgd_t *pgd)
+{
+	pud_t *pud;
+	pmd_t *pmd;
+	pte_t *pte;
+	unsigned long vaddr, paddr;
+	int result = -ENOMEM;
+
+	vaddr = (unsigned long)relocate_kernel;
+	paddr = __pa(page_address(image->control_code_page)+PAGE_SIZE);
+	pgd += pgd_index(vaddr);
+	if (!pgd_present(*pgd)) {
+		pud = (pud_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pud)
+			goto err;
+		image->arch.pud = pud;
+		set_pgd(pgd, __pgd(__pa(pud) | _KERNPG_TABLE));
+	}
+	pud = pud_offset(pgd, vaddr);
+	if (!pud_present(*pud)) {
+		pmd = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pmd)
+			goto err;
+		image->arch.pmd = pmd;
+		set_pud(pud, __pud(__pa(pmd) | _KERNPG_TABLE));
+	}
+	pmd = pmd_offset(pud, vaddr);
+	if (!pmd_present(*pmd)) {
+		pte = (pte_t *)get_zeroed_page(GFP_KERNEL);
+		if (!pte)
+			goto err;
+		image->arch.pte = pte;
+		set_pmd(pmd, __pmd(__pa(pte) | _KERNPG_TABLE));
+	}
+	pte = pte_offset_kernel(pmd, vaddr);
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+	return 0;
+err:
+	free_transition_pgtable(image);
+	return result;
+}
+
 static int init_pgtable(struct kimage *image, unsigned long start_pgtable)
 {
 	pgd_t *level4p;
+	int result;
 
 	level4p = (pgd_t *)__va(start_pgtable);
-	return init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	result = init_level4_page(image, level4p, 0, max_pfn << PAGE_SHIFT);
+	if (result)
+		return result;
+	return init_transition_pgtable(image, level4p);
 }
 
 static void set_idt(void *newidt, u16 limit)
@@ -174,7 +218,7 @@ int machi
Re: [PATCH] Fix kexec x86_64 load failed bug
On Wed, 2008-11-26 at 14:16 +0800, Simon Horman wrote:
> On Wed, Nov 26, 2008 at 12:25:51PM +0800, Huang Ying wrote:
> > On Wed, 2008-11-26 at 11:25 +0800, Randy Dunlap wrote:
> > > This isn't kernel code? Where is /purgatory/ ?
> > >
> > > Anyway, for kernel code, that should be:
> > > char *cmdline_end = NULL;
> >
> > This patch is against kexec tools, not kernel.
> >
> > Best Regards,
> > Huang Ying
>
> Hi Huang,
>
> I think that I would prefer "char *cmdline_end = NULL;" for kexec-tools
> code too.

Patch v2 follows, with NULL instead of 0.

Best Regards,
Huang Ying

Fix a kexec load failure on x86_64. Kexec fails to load on x86_64 with the
error message:

	Symbol: cmdline_end not found cannot set

This is because kexec/arch/i386/kexec-bzImage.c, which is used by x86_64
too, accesses the cmdline_end symbol in the i386 purgatory, but there is no
cmdline_end in the x86_64 purgatory. This fix adds cmdline_end to the x86_64
purgatory, because kexec jump support for x86_64 is planned.

Reported-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 purgatory/arch/x86_64/purgatory-x86_64.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/purgatory/arch/x86_64/purgatory-x86_64.c
+++ b/purgatory/arch/x86_64/purgatory-x86_64.c
@@ -1,10 +1,12 @@
 #include
+#include
 #include
 #include "purgatory-x86_64.h"
 
 uint8_t reset_vga = 0;
 uint8_t legacy_pic = 0;
 uint8_t panic_kernel = 0;
+char *cmdline_end = NULL;
 
 void setup_arch(void)
 {
Re: [PATCH] Fix kexec x86_64 load failed bug
On Wed, 2008-11-26 at 11:25 +0800, Randy Dunlap wrote:
> This isn't kernel code? Where is /purgatory/ ?
>
> Anyway, for kernel code, that should be:
> char *cmdline_end = NULL;

This patch is against kexec-tools, not the kernel.

Best Regards,
Huang Ying
[PATCH] Fix kexec x86_64 load failed bug
Fix a kexec load failure on x86_64. Kexec fails to load on x86_64 with the
error message:

	Symbol: cmdline_end not found cannot set

This is because kexec/arch/i386/kexec-bzImage.c, which is used by x86_64
too, accesses the cmdline_end symbol in the i386 purgatory, but there is no
cmdline_end in the x86_64 purgatory. This fix adds cmdline_end to the x86_64
purgatory, because kexec jump support for x86_64 is planned.

Reported-by: Bernhard Walle <[EMAIL PROTECTED]>
Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

diff --git a/purgatory/arch/x86_64/purgatory-x86_64.c b/purgatory/arch/x86_64/purgatory-x86_64.c
index 374b554..67a37f9 100644
--- a/purgatory/arch/x86_64/purgatory-x86_64.c
+++ b/purgatory/arch/x86_64/purgatory-x86_64.c
@@ -5,6 +5,7 @@
 uint8_t reset_vga = 0;
 uint8_t legacy_pic = 0;
 uint8_t panic_kernel = 0;
+char *cmdline_end = 0;
 
 void setup_arch(void)
 {
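The error message makes sense once you see that the loader sets purgatory variables by name. A variable that is not defined in the x86_64 purgatory therefore cannot be set. The following toy model shows a by-name setter. `struct sym` and `set_symbol()` are hypothetical. They only mimic the behaviour described in the patch and are not kexec-tools' actual implementation.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a purgatory symbol table entry. */
struct sym {
	const char *name;
	void *addr;
	size_t size;
};

/* Copy `size` bytes from val into the named symbol; fail if it is absent. */
int set_symbol(const struct sym *tab, size_t n, const char *name,
	       const void *val, size_t size)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(tab[i].name, name) == 0) {
			if (tab[i].size != size)
				return -1;
			memcpy(tab[i].addr, val, size);
			return 0;
		}
	}
	fprintf(stderr, "Symbol: %s not found cannot set\n", name);
	return -1;
}
```

Defining `cmdline_end` in the x86_64 purgatory corresponds to adding the second table entry in the usage below. After that, the lookup succeeds.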
[PATCH 1/3 -v4] kexec/i386: remove PAGE_SIZE alignment from relocate_kernel
This patch removes the PAGE_SIZE alignment from relocate_kernel(). Before
the kexec jump patches were merged, the control page was mapped at
relocate_kernel in the kexec page tables, so relocate_kernel had to be
PAGE_SIZE aligned. Now the control page is mapped at its identity-mapped
address, so relocate_kernel no longer needs to be PAGE_SIZE aligned. This
can save a few KB in the kernel text segment.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/relocate_kernel_32.S | 1 -
 1 file changed, 1 deletion(-)

--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -39,7 +39,6 @@
 #define CP_PA_BACKUP_PAGES_MAP	DATA(0x1c)
 
 	.text
-	.align PAGE_SIZE
 	.globl relocate_kernel
 relocate_kernel:
 	/* Save the CPU context, used for jumping back */
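The saving comes from alignment padding: `.align PAGE_SIZE` can insert up to PAGE_SIZE - 1 bytes in front of relocate_kernel. The round-up arithmetic is sketched below. It uses the same `(x + align - 1) & ~(align - 1)` expression that appears in the x86_64 crashdump code. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Round x up to the next multiple of align, which must be a power of two. */
uint64_t align_up(uint64_t x, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (x + align - 1) & ~(align - 1);
}

/* Bytes of padding an alignment directive would insert at offset x. */
uint64_t align_padding(uint64_t x, uint64_t align)
{
	return align_up(x, align) - x;
}
```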
[PATCH 2/3 -v4] kexec/i386: allocate page table pages dynamically
This patch adds an architecture-specific struct kimage_arch to struct
kimage, and moves the pointers to the page table pages used by kexec
into struct kimage_arch. The page table pages are now allocated
dynamically in machine_kexec_prepare() instead of statically in the BSS
segment. This saves up to 20 KB of memory when no kexec image is loaded.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h       |   14 +++
 arch/x86/kernel/machine_kexec_32.c |   69 +
 include/linux/kexec.h              |    4 ++
 3 files changed, 65 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -25,15 +26,6 @@
 #include
 #include
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
 	struct desc_ptr curidt;
@@ -76,6 +68,37 @@ static void load_segments(void)
 #undef __STR
 }
 
+static void machine_kexec_free_page_tables(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pgd);
+#ifdef CONFIG_X86_PAE
+	free_page((unsigned long)image->arch.pmd0);
+	free_page((unsigned long)image->arch.pmd1);
+#endif
+	free_page((unsigned long)image->arch.pte0);
+	free_page((unsigned long)image->arch.pte1);
+}
+
+static int machine_kexec_alloc_page_tables(struct kimage *image)
+{
+	image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+	image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+	image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	if (!image->arch.pgd ||
+#ifdef CONFIG_X86_PAE
+	    !image->arch.pmd0 || !image->arch.pmd1 ||
+#endif
+	    !image->arch.pte0 || !image->arch.pte1) {
+		machine_kexec_free_page_tables(image);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -87,13 +110,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Make control page executable.
+ * - Make control page executable.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
 	if (nx_enabled)
 		set_pages_x(image->control_code_page, 1);
-	return 0;
+	return machine_kexec_alloc_page_tables(image);
 }
 
 /*
@@ -104,6 +128,7 @@ void machine_kexec_cleanup(struct kimage
 {
 	if (nx_enabled)
 		set_pages_nx(image->control_code_page, 1);
+	machine_kexec_free_page_tables(image);
 }
 
 /*
@@ -150,18 +175,18 @@ void machine_kexec(struct kimage *image)
 	relocate_kernel_ptr = control_page;
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
-	page_list[PA_PGD] = __pa(kexec_pgd);
-	page_list[VA_PGD] = (unsigned long)kexec_pgd;
+	page_list[PA_PGD] = __pa(image->arch.pgd);
+	page_list[VA_PGD] = (unsigned long)image->arch.pgd;
 #ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(kexec_pmd0);
-	page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-	page_list[PA_PMD_1] = __pa(kexec_pmd1);
-	page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(kexec_pte0);
-	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-	page_list[PA_PTE_1] = __pa(kexec_pte1);
-	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+	page_list[PA_PMD_0] = __pa(image->arch.pmd0);
+	page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
+	page_list[PA_PMD_1] = __pa(image->arch.pmd1);
+	page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
+#endif
+	page_list[PA_PTE_0] = __pa(image->arch.pte0);
+	page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
+	page_list[PA_PTE_1] = __pa(image->arch.pte1);
+	page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
 
 	if (image->type == KEXEC_TYPE_DEFAULT)
 		page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -100,6 +100,10 @@ struct kimage {
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH 1
 	unsigned int preserve_context : 1;
+
+#ifdef ARCH_HAS_KIMAGE_ARCH
+	struct kimage_arch arch;
+#endif
 };
--- a/arch/x86/include
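machine_kexec_alloc_page_tables() allocates every table first, checks all pointers in one condition, and on any failure hands the partially filled structure to the free routine. This works because freeing a zero pointer is harmless. The user-space sketch below shows the same all-or-nothing pattern, assuming calloc()/free() as stand-ins for get_zeroed_page()/free_page(); the struct and function names are hypothetical, and all five tables are allocated unconditionally (in the patch, pmd0/pmd1 exist only with CONFIG_X86_PAE). As a defensive choice the sketch also clears each pointer after freeing it, so a second call to the free routine (for example from a later cleanup path) cannot free the same page twice; the patch as posted does not clear them.

```c
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096

/* Hypothetical stand-in for struct kimage_arch on a PAE kernel. */
struct sketch_kimage_arch {
	void *pgd, *pmd0, *pmd1, *pte0, *pte1;
};

/* free(NULL) is a no-op, so partially allocated state is fine.
 * Clearing the pointers makes a repeated call harmless. */
static void sketch_free_page_tables(struct sketch_kimage_arch *arch)
{
	free(arch->pgd);
	arch->pgd = NULL;
	free(arch->pmd0);
	arch->pmd0 = NULL;
	free(arch->pmd1);
	arch->pmd1 = NULL;
	free(arch->pte0);
	arch->pte0 = NULL;
	free(arch->pte1);
	arch->pte1 = NULL;
}

/* Allocate zeroed "pages", check them all at once, and release
 * everything on any failure (returns -1, standing in for -ENOMEM). */
static int sketch_alloc_page_tables(struct sketch_kimage_arch *arch)
{
	arch->pgd = calloc(1, SKETCH_PAGE_SIZE);
	arch->pmd0 = calloc(1, SKETCH_PAGE_SIZE);
	arch->pmd1 = calloc(1, SKETCH_PAGE_SIZE);
	arch->pte0 = calloc(1, SKETCH_PAGE_SIZE);
	arch->pte1 = calloc(1, SKETCH_PAGE_SIZE);
	if (!arch->pgd || !arch->pmd0 || !arch->pmd1 ||
	    !arch->pte0 || !arch->pte1) {
		sketch_free_page_tables(arch);
		return -1;
	}
	return 0;
}
```

Checking all pointers in a single condition keeps the error path to one call, at the cost of attempting every allocation even after the first one fails.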
[PATCH 0/3 -v4] kexec/i386: kexec page table code clean up
This patchset cleans up the kexec page table setup code on i386. It is
based on v2.6.28-rc2-338-g65fc716 and has been tested on i386.

v4:

- Re-based on mainline git tree: v2.6.28-rc2-338-g65fc716.

v3:

- Remove PAGE_SIZE alignment from relocate_kernel()
- Re-based on 2.6.28-rc2-mm1

v2:

- Rename some functions, such as alloc_page_tables ->
  machine_kexec_alloc_page_tables, etc.
- Clean up error handling in machine_kexec_alloc_page_tables().

Best Regards,
Huang Ying
[PATCH 3/3 -v4] kexec/i386: setup kexec page table in C
This patch moves the kexec page table setup from assembly code in
relocate_kernel_32.S into C code in machine_kexec_prepare(). This
improves readability and reduces the line count.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h         |   17 -
 arch/x86/kernel/machine_kexec_32.c   |   59 ++
 arch/x86/kernel/relocate_kernel_32.S |  114 ---
 3 files changed, 49 insertions(+), 141 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -99,6 +99,45 @@ static int machine_kexec_alloc_page_tabl
 	return 0;
 }
 
+static void machine_kexec_page_table_set_one(
+	pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+	unsigned long vaddr, unsigned long paddr)
+{
+	pud_t *pud;
+
+	pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+	if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+		set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+	pud = pud_offset(pgd, vaddr);
+	pmd = pmd_offset(pud, vaddr);
+	if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+		set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+	pte = pte_offset_kernel(pmd, vaddr);
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void machine_kexec_prepare_page_tables(struct kimage *image)
+{
+	void *control_page;
+	pmd_t *pmd = 0;
+
+	control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch.pmd0;
+#endif
+	machine_kexec_page_table_set_one(
+		image->arch.pgd, pmd, image->arch.pte0,
+		(unsigned long)control_page, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch.pmd1;
+#endif
+	machine_kexec_page_table_set_one(
+		image->arch.pgd, pmd, image->arch.pte1,
+		__pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -112,12 +151,19 @@ static int machine_kexec_alloc_page_tabl
  *
  * - Make control page executable.
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+	int error;
+
 	if (nx_enabled)
 		set_pages_x(image->control_code_page, 1);
-	return machine_kexec_alloc_page_tables(image);
+	error = machine_kexec_alloc_page_tables(image);
+	if (error)
+		return error;
+	machine_kexec_prepare_page_tables(image);
+	return 0;
 }
 
 /*
@@ -176,17 +222,6 @@ void machine_kexec(struct kimage *image)
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
 	page_list[PA_PGD] = __pa(image->arch.pgd);
-	page_list[VA_PGD] = (unsigned long)image->arch.pgd;
-#ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(image->arch.pmd0);
-	page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
-	page_list[PA_PMD_1] = __pa(image->arch.pmd1);
-	page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(image->arch.pte0);
-	page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
-	page_list[PA_PTE_1] = __pa(image->arch.pte1);
-	page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
 
 	if (image->type == KEXEC_TYPE_DEFAULT)
 		page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page)
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -10,15 +10,12 @@
 #include
 #include
 #include
-#include
 
 /*
  * Must be relocatable PIC code callable as a C function
  */
 
 #define PTR(x) (x << 2)
-#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
-#define PAE_PGD_ATTR (_PAGE_PRESENT)
 
 /* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
  * ~ control_page + PAGE_SIZE are used as data storage and stack for
@@ -59,117 +56,6 @@ relocate_kernel:
 	movl	%cr4, %eax
 	movl	%eax, CR4(%edi)
 
-#ifdef CONFIG_X86_PAE
-	/* map the control page at its virtual address */
-
-	movl	PTR(VA_PGD)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0xc0000000, %eax
-	shrl	$27, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PMD_0)(%ebp), %edx
-	orl	$PAE_PGD_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PMD_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x3fe00000, %eax
-	shrl	$18, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PTE_0)(%ebp), %edx
-	orl	$PAGE_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PTE_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x001ff000, %eax
-	shrl	$9, %eax
-	addl	%edi, %eax
-
-
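The removed assembly computed each table entry's byte offset straight from the virtual address with one mask and one shift, while the new C code walks the tables with pgd_index(), pmd_offset() and pte_offset_kernel(). The standalone sketch below, assuming the i386 PAE layout (bits 31-30 select the PGD entry, bits 29-21 the PMD entry, bits 20-12 the PTE entry, 8-byte entries), checks that the two formulations agree. The helper names are hypothetical and are not kernel APIs.

```c
#include <stdint.h>

/* Table indices, as the C page-table helpers conceptually derive them. */
static uint32_t pae_pgd_index(uint32_t vaddr) { return vaddr >> 30; }
static uint32_t pae_pmd_index(uint32_t vaddr) { return (vaddr >> 21) & 0x1ff; }
static uint32_t pae_pte_index(uint32_t vaddr) { return (vaddr >> 12) & 0x1ff; }

/* Byte offsets as the removed assembly computed them: mask the index
 * bits, then shift right by (index shift - 3), which yields index * 8. */
static uint32_t asm_pgd_offset(uint32_t vaddr) { return (vaddr & 0xc0000000u) >> 27; }
static uint32_t asm_pmd_offset(uint32_t vaddr) { return (vaddr & 0x3fe00000u) >> 18; }
static uint32_t asm_pte_offset(uint32_t vaddr) { return (vaddr & 0x001ff000u) >> 9; }
```

For example, the address 0xc1234567 lands in PGD entry 3, PMD entry 9 and PTE entry 52, so the assembly offsets are 24, 72 and 416 bytes.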
[PATCH -mm 1/3 -v3] kexec/i386: remove PAGE_SIZE alignment from relocate_kernel
This patch removes the PAGE_SIZE alignment of relocate_kernel(). Before
the kexec jump patches were merged, the control page was mapped at the
virtual address of relocate_kernel in the kexec page tables, so
relocate_kernel had to be PAGE_SIZE aligned. Now the control page is
mapped at an identity-mapped address, so relocate_kernel no longer needs
to be PAGE_SIZE aligned. This saves up to PAGE_SIZE - 1 bytes of padding
in the kernel text segment.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/relocate_kernel_32.S |    1 -
 1 file changed, 1 deletion(-)

--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -39,7 +39,6 @@
 #define CP_PA_BACKUP_PAGES_MAP	DATA(0x1c)
 
 	.text
-	.align PAGE_SIZE
 	.globl relocate_kernel
 relocate_kernel:
 	/* Save the CPU context, used for jumping back */
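The saving comes from the padding the assembler would otherwise insert to move relocate_kernel up to the next page boundary, which is always less than one page. A small sketch of that arithmetic, assuming a 4096-byte page and a power-of-two alignment; the function name is hypothetical:

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Bytes of padding needed to move addr up to the next multiple of
 * align (a power of two); 0 if addr is already aligned. */
static uint32_t align_padding(uint32_t addr, uint32_t align)
{
	return (align - (addr & (align - 1))) & (align - 1);
}
```

The worst case is an address one byte past a page boundary, which needs SKETCH_PAGE_SIZE - 1 bytes of padding.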
[PATCH -mm 2/3 -v3] kexec/i386: allocate page table pages dynamically
This patch adds an architecture-specific struct kimage_arch to struct
kimage, and moves the pointers to the page table pages used by kexec
into struct kimage_arch. The page table pages are now allocated
dynamically in machine_kexec_prepare() instead of statically in the BSS
segment. This saves up to 20 KB of memory when no kexec image is loaded.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h       |   14 +++
 arch/x86/kernel/machine_kexec_32.c |   69 +
 include/linux/kexec.h              |    4 ++
 3 files changed, 65 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -25,15 +26,6 @@
 #include
 #include
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
 	struct desc_ptr curidt;
@@ -76,6 +68,37 @@ static void load_segments(void)
 #undef __STR
 }
 
+static void machine_kexec_free_page_tables(struct kimage *image)
+{
+	free_page((unsigned long)image->arch.pgd);
+#ifdef CONFIG_X86_PAE
+	free_page((unsigned long)image->arch.pmd0);
+	free_page((unsigned long)image->arch.pmd1);
+#endif
+	free_page((unsigned long)image->arch.pte0);
+	free_page((unsigned long)image->arch.pte1);
+}
+
+static int machine_kexec_alloc_page_tables(struct kimage *image)
+{
+	image->arch.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+	image->arch.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+	image->arch.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	if (!image->arch.pgd ||
+#ifdef CONFIG_X86_PAE
+	    !image->arch.pmd0 || !image->arch.pmd1 ||
+#endif
+	    !image->arch.pte0 || !image->arch.pte1) {
+		machine_kexec_free_page_tables(image);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -87,13 +110,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Make control page executable.
+ * - Make control page executable.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
 	if (nx_enabled)
 		set_pages_x(image->control_code_page, 1);
-	return 0;
+	return machine_kexec_alloc_page_tables(image);
 }
 
 /*
@@ -104,6 +128,7 @@ void machine_kexec_cleanup(struct kimage
 {
 	if (nx_enabled)
 		set_pages_nx(image->control_code_page, 1);
+	machine_kexec_free_page_tables(image);
 }
 
 /*
@@ -150,18 +175,18 @@ void machine_kexec(struct kimage *image)
 	relocate_kernel_ptr = control_page;
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
-	page_list[PA_PGD] = __pa(kexec_pgd);
-	page_list[VA_PGD] = (unsigned long)kexec_pgd;
+	page_list[PA_PGD] = __pa(image->arch.pgd);
+	page_list[VA_PGD] = (unsigned long)image->arch.pgd;
 #ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(kexec_pmd0);
-	page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-	page_list[PA_PMD_1] = __pa(kexec_pmd1);
-	page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(kexec_pte0);
-	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-	page_list[PA_PTE_1] = __pa(kexec_pte1);
-	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+	page_list[PA_PMD_0] = __pa(image->arch.pmd0);
+	page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
+	page_list[PA_PMD_1] = __pa(image->arch.pmd1);
+	page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
+#endif
+	page_list[PA_PTE_0] = __pa(image->arch.pte0);
+	page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
+	page_list[PA_PTE_1] = __pa(image->arch.pte1);
+	page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
 
 	page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
 	/* The segment registers are funny things, they have both a
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -100,6 +100,10 @@ struct kimage {
 #define KEXEC_TYPE_DEFAULT 0
 #define KEXEC_TYPE_CRASH 1
 	unsigned int preserve_context : 1;
+
+#ifdef ARCH_HAS_KIMAGE_ARCH
+	struct kimage_arch arch;
[PATCH -mm 3/3 -v3] kexec/i386: setup kexec page table in C
This patch moves the kexec page table setup from assembly code in
relocate_kernel_32.S into C code in machine_kexec_prepare(). This
improves readability and reduces the line count.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/include/asm/kexec.h         |   17 -
 arch/x86/kernel/machine_kexec_32.c   |   59 ++
 arch/x86/kernel/relocate_kernel_32.S |  114 ---
 3 files changed, 49 insertions(+), 141 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -99,6 +99,45 @@ static int machine_kexec_alloc_page_tabl
 	return 0;
 }
 
+static void machine_kexec_page_table_set_one(
+	pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+	unsigned long vaddr, unsigned long paddr)
+{
+	pud_t *pud;
+
+	pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+	if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+		set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+	pud = pud_offset(pgd, vaddr);
+	pmd = pmd_offset(pud, vaddr);
+	if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+		set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+	pte = pte_offset_kernel(pmd, vaddr);
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void machine_kexec_prepare_page_tables(struct kimage *image)
+{
+	void *control_page;
+	pmd_t *pmd = 0;
+
+	control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch.pmd0;
+#endif
+	machine_kexec_page_table_set_one(
+		image->arch.pgd, pmd, image->arch.pte0,
+		(unsigned long)control_page, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch.pmd1;
+#endif
+	machine_kexec_page_table_set_one(
+		image->arch.pgd, pmd, image->arch.pte1,
+		__pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -112,12 +151,19 @@ static int machine_kexec_alloc_page_tabl
  *
  * - Make control page executable.
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+	int error;
+
 	if (nx_enabled)
 		set_pages_x(image->control_code_page, 1);
-	return machine_kexec_alloc_page_tables(image);
+	error = machine_kexec_alloc_page_tables(image);
+	if (error)
+		return error;
+	machine_kexec_prepare_page_tables(image);
+	return 0;
 }
 
 /*
@@ -176,17 +222,6 @@ void machine_kexec(struct kimage *image)
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
 	page_list[PA_PGD] = __pa(image->arch.pgd);
-	page_list[VA_PGD] = (unsigned long)image->arch.pgd;
-#ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(image->arch.pmd0);
-	page_list[VA_PMD_0] = (unsigned long)image->arch.pmd0;
-	page_list[PA_PMD_1] = __pa(image->arch.pmd1);
-	page_list[VA_PMD_1] = (unsigned long)image->arch.pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(image->arch.pte0);
-	page_list[VA_PTE_0] = (unsigned long)image->arch.pte0;
-	page_list[PA_PTE_1] = __pa(image->arch.pte1);
-	page_list[VA_PTE_1] = (unsigned long)image->arch.pte1;
 
 	page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT);
 	/* The segment registers are funny things, they have both a
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -10,15 +10,12 @@
 #include
 #include
 #include
-#include
 
 /*
  * Must be relocatable PIC code callable as a C function
  */
 
 #define PTR(x) (x << 2)
-#define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
-#define PAE_PGD_ATTR (_PAGE_PRESENT)
 
 /* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
  * ~ control_page + PAGE_SIZE are used as data storage and stack for
@@ -59,117 +56,6 @@ relocate_kernel:
 	movl	%cr4, %eax
 	movl	%eax, CR4(%edi)
 
-#ifdef CONFIG_X86_PAE
-	/* map the control page at its virtual address */
-
-	movl	PTR(VA_PGD)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0xc0000000, %eax
-	shrl	$27, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PMD_0)(%ebp), %edx
-	orl	$PAE_PGD_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PMD_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x3fe00000, %eax
-	shrl	$18, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PTE_0)(%ebp), %edx
-	orl	$PAGE_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PTE_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x001ff000, %eax
-	shrl	$9, %eax
[PATCH -mm 0/3 -v3] kexec/i386: kexec page table code clean up
This patchset cleans up the kexec page table setup code on i386. It is
based on 2.6.28-rc2-mm1 and has been tested on i386.

v3:

- Remove PAGE_SIZE alignment from relocate_kernel()
- Re-based on 2.6.28-rc2-mm1

v2:

- Rename some functions, such as alloc_page_tables ->
  machine_kexec_alloc_page_tables, etc.
- Clean up error handling in machine_kexec_alloc_page_tables().

Best Regards,
Huang Ying
[PATCH 0/2] kexec jump/hibernation support for kexec-tools
This patchset adds kexec jump/hibernation support to kexec-tools.
Together with the kexec jump/hibernation features of the Linux kernel
(merged into mainline as of 2.6.27), it can be used for the following:

- A simple hibernation implementation without ACPI support. You can
  kexec a hibernating kernel, save the memory image of the original
  system, and shut down the system. When resuming, you restore the
  memory image of the original system via an ordinary kexec load, then
  jump back.

- Kernel/system debugging through system snapshots. You can make a
  system snapshot with kexec/kdump, jump back, do something, and then
  make another system snapshot.

- Cooperative multi-kernel/system operation. With kexec jump, you can
  switch quickly between several kernels/systems without going through
  the boot process, except the first time. This is like swapping a whole
  kernel/system out and in.

- A general method to call a program in physical mode (with paging
  turned off). This can be used to invoke BIOS code under Linux.

The following additional kernel/tools may be needed for kexec
jump/hibernation:

- Linux kernel 2.6.27 or later.

- makedumpfile with patches, used as the memory image saving tool; it
  can exclude free pages from the memory image file of the original
  kernel. The patches and a precompiled makedumpfile can be downloaded
  from the following URLs:

  source: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-src_cvs_kh10.tar.bz2
  patches: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile-patches_cvs_kh10.tar.bz2
  binary: http://khibernation.sourceforge.net/download/release_v10/makedumpfile/makedumpfile_cvs_kh10

- An initramfs image, which can be used as the root file system of the
  kexeced kernel. An initramfs image built with "BuildRoot" can be
  downloaded from the following URL:

  initramfs image: http://khibernation.sourceforge.net/download/release_v10/initramfs/rootfs_cvs_kh10.gz

All user space tools above are included in the initramfs image.

Usage example of simple hibernation:

1. Compile and install a Linux kernel (2.6.27 or later) with the
   following options selected:

   CONFIG_X86_32=y
   CONFIG_RELOCATABLE=y
   CONFIG_KEXEC=y
   CONFIG_CRASH_DUMP=y
   CONFIG_PM=y
   CONFIG_HIBERNATION=y
   CONFIG_KEXEC_JUMP=y

2. Build an initramfs image containing kexec-tools and makedumpfile, or
   download the pre-built initramfs image (called rootfs.gz below).

3. Prepare a partition to save the memory image of the original kernel
   (called the hibernating partition below).

4. Boot the kernel compiled in step 1 (kernel A).

5. In kernel A, load the kernel compiled in step 1 (kernel B) with
   /sbin/kexec. The shell command line can be as follows:

   /sbin/kexec --load-preserve-context /boot/bzImage --mem-max=0xff --initrd=rootfs.gz

6. Boot kernel B with the following shell command line:

   /sbin/kexec -e

7. Kernel B boots as in a normal kexec. In kernel B, the memory image of
   kernel A can be saved into the hibernating partition as follows:

   jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2`
   echo $jump_back_entry > kexec_jump_back_entry
   cp /proc/vmcore dump.elf

   Then you can shut down the machine as normal.

8. Boot the kernel compiled in step 1 (kernel C). Use rootfs.gz as the
   root file system.

9. In kernel C, load the memory image of kernel A as follows:

   /sbin/kexec -l --args-none --entry=`cat kexec_jump_back_entry` dump.elf

10. Jump back to kernel A as follows:

   /sbin/kexec -e

   Then kernel A is resumed.

Currently only the i386 architecture is supported. The patchset is based
on the latest kexec-tools git tree, and has been tested on an IBM T42
with ACPI on and off.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
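Step 7 pulls the value of the kexec_jump_back_entry= parameter out of /proc/cmdline with a shell pipeline. For illustration only, the same extraction is sketched in C below. It works on a command-line string supplied by the caller (reading /proc/cmdline itself is left out), parses the value with strtoul() in base 0 so a 0x prefix is accepted, and find_jump_back_entry() is a hypothetical name, not part of kexec-tools.

```c
#include <stdlib.h>
#include <string.h>

/* Look for "kexec_jump_back_entry=<value>" among the space-separated
 * words of cmdline. Returns 0 and stores the value on success, or -1
 * if the parameter is absent. */
static int find_jump_back_entry(const char *cmdline, unsigned long *entry)
{
	static const char key[] = "kexec_jump_back_entry=";
	const char *p = cmdline;

	while (*p) {
		while (*p == ' ')		/* skip separators */
			p++;
		if (strncmp(p, key, sizeof(key) - 1) == 0) {
			/* base 0: accepts 0x-prefixed hex as well as decimal */
			*entry = strtoul(p + sizeof(key) - 1, NULL, 0);
			return 0;
		}
		while (*p && *p != ' ')		/* move to the next word */
			p++;
	}
	return -1;
}
```

Matching only at the start of a word mirrors what `tr ' ' '\n' | grep` does in practice for well-formed command lines, while avoiding accidental matches inside longer parameter names.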
[PATCH 1/2] kexec jump support for kexec-tools
To support memory backup/restore, an option named --load-preserve-context
is added to kexec. When it is specified together with --mem-max, most of
the segments used for crash dump support are loaded, and every part of
the memory range from mem_min to mem_max that has no segment loaded is
loaded as a backup segment.

To support jumping back from the kexeced kernel, the options
--load-jump-back-helper and --entry are added to load a helper image that
jumps back to the specified entry.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kexec/arch/i386/crashdump-x86.c     |   51 +++---
 kexec/arch/i386/kexec-bzImage.c     |   10 +-
 kexec/arch/i386/kexec-elf-x86.c     |    4
 kexec/arch/i386/kexec-x86-common.c  |    3
 kexec/arch/i386/x86-linux-setup.h   |    3
 kexec/crashdump-elf.c               |    2
 kexec/crashdump.c                   |    1
 kexec/kexec-syscall.h               |    5 -
 kexec/kexec.c                       |  177 +++-
 kexec/kexec.h                       |   12 ++
 purgatory/arch/i386/purgatory-x86.c |   14 ++
 purgatory/arch/i386/setup-x86.S     |    3
 purgatory/include/purgatory.h       |    1
 purgatory/printf.c                  |   38 ++-
 14 files changed, 291 insertions(+), 33 deletions(-)

--- a/kexec/kexec-syscall.h
+++ b/kexec/kexec-syscall.h
@@ -75,8 +75,9 @@ static inline long kexec_reboot(void)
 }
 
 
-#define KEXEC_ON_CRASH	0x00000001
-#define KEXEC_ARCH_MASK	0xffff0000
+#define KEXEC_ON_CRASH		0x00000001
+#define KEXEC_PRESERVE_CONTEXT	0x00000002
+#define KEXEC_ARCH_MASK		0xffff0000
 
 /* These values match the ELF architecture values.
  * Unless there is a good reason that should continue to be the case.
--- a/kexec/kexec.c
+++ b/kexec/kexec.c
@@ -378,6 +378,91 @@ unsigned long add_buffer_virt(struct kex
 				buf_min, buf_max, buf_end, 0);
 }
 
+static int find_memory_range(struct kexec_info *info,
+			     unsigned long *base, unsigned long *size)
+{
+	int i;
+	unsigned long start, end;
+
+	for (i = 0; i < info->memory_ranges; i++) {
+		if (info->memory_range[i].type != RANGE_RAM)
+			continue;
+		start = info->memory_range[i].start;
+		end = info->memory_range[i].end;
+		if (end > *base && start < *base + *size) {
+			if (start > *base) {
+				*size = *base + *size - start;
+				*base = start;
+			}
+			if (end < *base + *size)
+				*size = end - *base;
+			return 1;
+		}
+	}
+	return 0;
+}
+
+static int find_segment_hole(struct kexec_info *info,
+			     unsigned long *base, unsigned long *size)
+{
+	int i;
+	unsigned long seg_base, seg_size;
+
+	for (i = 0; i < info->nr_segments; i++) {
+		seg_base = (unsigned long)info->segment[i].mem;
+		seg_size = info->segment[i].memsz;
+
+		if (seg_base + seg_size <= *base)
+			continue;
+		else if (seg_base >= *base + *size)
+			break;
+		else if (*base < seg_base) {
+			*size = seg_base - *base;
+			break;
+		} else if (seg_base + seg_size < *base + *size) {
+			*size = *base + *size - (seg_base + seg_size);
+			*base = seg_base + seg_size;
+		} else {
+			*size = 0;
+			break;
+		}
+	}
+	return *size;
+}
+
+int add_backup_segments(struct kexec_info *info, unsigned long backup_base,
+			unsigned long backup_size)
+{
+	unsigned long mem_base, mem_size, bkseg_base, bkseg_size, start, end;
+	unsigned long pagesize;
+
+	pagesize = getpagesize();
+	while (backup_size) {
+		mem_base = backup_base;
+		mem_size = backup_size;
+		if (!find_memory_range(info, &mem_base, &mem_size))
+			break;
+		backup_size = backup_base + backup_size - \
+			(mem_base + mem_size);
+		backup_base = mem_base + mem_size;
+		while (mem_size) {
+			bkseg_base = mem_base;
+			bkseg_size = mem_size;
+			if (sort_segments(info) < 0)
+				return -1;
+			if (!find_segment_hole(info, &bkseg_base, &bkseg_size))
+				break;
+			start = (bkseg_base + pagesize - 1) & ~(pagesize - 1);
+			end = (bkseg_base + bkseg_size) & ~(pagesize - 1);
+			add_segm
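add_backup_segments() looks for holes not covered by any loaded segment inside each RAM range, then rounds each hole inward to page boundaries (start rounded up, end rounded down) before adding it as a backup segment. The simplified sketch below reproduces just the hole search and the rounding, assuming the segments are already sorted by base address and leaving out the RAM-range step; struct sketch_seg, first_hole() and page_round_inward() are hypothetical names.

```c
#include <stddef.h>

struct sketch_seg {
	unsigned long base, size;
};

/* Narrow [*base, *base + *size) to its first gap not covered by any of
 * the n sorted segments; returns the gap size (0 if fully covered).
 * Same case analysis as find_segment_hole(). */
static unsigned long first_hole(const struct sketch_seg *segs, size_t n,
				unsigned long *base, unsigned long *size)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long s = segs[i].base;
		unsigned long e = segs[i].base + segs[i].size;

		if (e <= *base)
			continue;		/* segment lies below the range */
		if (s >= *base + *size)
			break;			/* segment lies above the range */
		if (*base < s) {
			*size = s - *base;	/* gap ends where segment starts */
			break;
		}
		if (e < *base + *size) {	/* skip past this segment */
			*size = *base + *size - e;
			*base = e;
		} else {
			*size = 0;		/* rest of range is covered */
			break;
		}
	}
	return *size;
}

/* Round a hole inward to whole pages; pagesize must be a power of two. */
static void page_round_inward(unsigned long base, unsigned long size,
			      unsigned long pagesize,
			      unsigned long *start, unsigned long *end)
{
	*start = (base + pagesize - 1) & ~(pagesize - 1);
	*end = (base + size) & ~(pagesize - 1);
}
```

Rounding inward rather than outward guarantees that a backup segment never overlaps a page that a loaded segment already uses.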
[PATCH 2/2] core dump file support for ELF loader
This patch adds core dump file support to the ELF file loader.
kexec-based hibernation can use this to load the hibernated image, which
is a core dump file taken from /proc/vmcore.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kexec/kexec-elf-exec.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

--- a/kexec/kexec-elf-exec.c
+++ b/kexec/kexec-elf-exec.c
@@ -20,7 +20,8 @@ int build_elf_exec_info(const char *buf,
 	if (result < 0) {
 		return result;
 	}
-	if ((ehdr->e_type != ET_EXEC) && (ehdr->e_type != ET_DYN)) {
+	if ((ehdr->e_type != ET_EXEC) && (ehdr->e_type != ET_DYN) &&
+	    (ehdr->e_type != ET_CORE)) {
 		/* not an ELF executable */
 		if (probe_debug) {
 			fprintf(stderr, "Not ELF type ET_EXEC or ET_DYN\n");
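The whole change is one extra accepted value of the ELF header's e_type field. A minimal sketch of the resulting check, assuming the system <elf.h> (as shipped with glibc) for Elf32_Ehdr and the ET_* constants; elf_type_loadable() is a hypothetical helper, not the kexec-tools function:

```c
#include <elf.h>

/* Accept executables, ET_DYN objects and, after this patch, core
 * dumps such as an image copied from /proc/vmcore. */
static int elf_type_loadable(const Elf32_Ehdr *ehdr)
{
	return ehdr->e_type == ET_EXEC ||
	       ehdr->e_type == ET_DYN ||
	       ehdr->e_type == ET_CORE;
}
```

Relocatable objects (ET_REL) are still rejected, since they have no program headers describing where to load them.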
Re: [PATCH -v3 6/7] kexec jump: __ftrace_enabled_save/restore
On Fri, 2008-08-15 at 14:49 +0200, Ingo Molnar wrote:
> * Huang Ying <[EMAIL PROTECTED]> wrote:
>
> > +/* Ftrace disable/restore without lock. Some synchronization mechanism
> > + * must be used to prevent ftrace_enabled to be changed between
> > + * disable/restore. */
>
> use the proper comment style please:
>
>   /*
>    *
>    */

OK. I will change it.

> > +static inline int __ftrace_enabled_save(void)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +	int saved_ftrace_enabled = ftrace_enabled;
> > +	ftrace_enabled = 0;
> > +	return saved_ftrace_enabled;
> > +#else
> > +	return 0;
> > +#endif
> > +}
> > +
> > +static inline void __ftrace_enabled_restore(int enabled)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +	ftrace_enabled = enabled;
> > +#endif
> > +}
>
> hm, what is this used for?
>
> also, instead of such an ugly inline, why not create a proper
> kernel/trace/* function for this. That would also give it access to all
> the proper locking mechanisms - instead of relying on some extral
> mechanism.

This function is used by kexec jump in machine_kexec(), where all
non-boot CPUs and IRQs are disabled and the system is about to kexec.
Scheduling to another process is not allowed in this situation, so a
lockless version is needed. Steven Rostedt has implemented a locked
version, which I think can be used in other situations.

Best Regards,
Huang Ying
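The helpers follow a save/force-off/restore pattern: the caller gets back whatever value was in effect, so a nested save under an already disabled state restores "disabled" instead of blindly re-enabling. A user-space sketch of the lockless variant, with a plain global standing in for ftrace_enabled (all names hypothetical); like the kernel helpers, it is only correct when nothing else can modify the flag between save and restore, as in the single-CPU, IRQs-off context described above:

```c
/* Hypothetical stand-in for the kernel's ftrace_enabled. */
static int sketch_ftrace_enabled = 1;

/* Lockless save: the caller must guarantee there are no concurrent
 * writers between this call and the matching restore. */
static int sketch_ftrace_enabled_save(void)
{
	int saved = sketch_ftrace_enabled;

	sketch_ftrace_enabled = 0;
	return saved;
}

/* Put back exactly the value that was saved. */
static void sketch_ftrace_enabled_restore(int enabled)
{
	sketch_ftrace_enabled = enabled;
}
```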
[PATCH] kexec jump: fix compiling warning on xchg(&kexec_lock, 0) in kernel_kexec()
Fix the compiler warning "value computed is not used" on
xchg(&kexec_lock, 0) in kernel_kexec(). The value returned by xchg() is
now checked with BUG_ON(), which also catches releasing a kexec_lock
that was not held.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 kernel/kexec.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1433,6 +1433,7 @@ module_init(crash_save_vmcoreinfo_init)
 int kernel_kexec(void)
 {
 	int error = 0;
+	int locked;
 
 	if (xchg(&kexec_lock, 1))
 		return -EBUSY;
@@ -1498,7 +1499,8 @@ int kernel_kexec(void)
 #endif
 
  Unlock:
-	xchg(&kexec_lock, 0);
+	locked = xchg(&kexec_lock, 0);
+	BUG_ON(!locked);
 	return error;
 }
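kexec_lock is used as a try-lock: xchg() stores 1 and returns the previous value, so a non-zero result means another kexec is already in progress. The fix applies the same idea on release by checking that the lock really was held. A user-space sketch of the protocol with C11 atomic_exchange(), where assert() stands in for BUG_ON() and all names are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

static atomic_int sketch_kexec_lock;	/* 0 = free, 1 = held */

static int sketch_kexec_trylock(void)
{
	/* atomic_exchange() returns the previous value, like xchg(). */
	if (atomic_exchange(&sketch_kexec_lock, 1))
		return -EBUSY;
	return 0;
}

static void sketch_kexec_unlock(void)
{
	int locked = atomic_exchange(&sketch_kexec_lock, 0);

	/* Releasing a lock nobody held is a bug, as BUG_ON(!locked). */
	assert(locked);
	(void)locked;	/* avoid an unused-variable warning with NDEBUG */
}
```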
Re: [PATCH] kexec jump: fix code size checking
On Tue, 2008-08-12 at 20:40 -0700, Eric W. Biederman wrote:
[...]
> 4) Put the code is a special section .text.kexec? and have the linker
>    always do the size comparison and the computation of the section size.
>
> The fewer conditionals we have the less likely something is to break.

Yes, this one is good. But I think the current approach is acceptable
too.

Best Regards,
Huang Ying
Re: [PATCH -v3 1/7] kexec jump: clean up #ifdef and comments
On Tue, 2008-08-12 at 20:49 -0700, Andrew Morton wrote:
> On Tue, 12 Aug 2008 11:14:21 +0800 Huang Ying <[EMAIL PROTECTED]> wrote:
>
> > 	xchg(&kexec_lock, 0);
>
> kernel/kexec.c: In function 'kernel_kexec':
> kernel/kexec.c:1501: warning: value computed is not used
>
> Is there any reason why we cannot use the more conventional
> test_and_set_bit() etc, rather than this peculiarity?
>
> Or perhaps spin_trylock?

Hi, Andrew,

I think there is no problem replacing xchg() with test_and_set_bit() or
spin_trylock().

Hi, Eric,

Do you have any reason to use xchg() instead of the others?

Best Regards,
Huang Ying
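Andrew suggests the more conventional test_and_set_bit(). Its closest portable C11 analogue is atomic_flag_test_and_set(); the sketch below shows the same try-lock semantics with it. This is an illustration in user space, not the kernel API, and the names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static atomic_flag sketch_kexec_busy = ATOMIC_FLAG_INIT;

/* Returns true if the flag was acquired, false if it was already set,
 * in the spirit of "if (test_and_set_bit(0, &lock)) return -EBUSY;". */
static bool sketch_kexec_try_begin(void)
{
	return !atomic_flag_test_and_set(&sketch_kexec_busy);
}

static void sketch_kexec_end(void)
{
	atomic_flag_clear(&sketch_kexec_busy);
}
```

One trade-off: atomic_flag_clear() returns nothing, so unlike the xchg() version this release path cannot detect an unbalanced unlock.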
Re: [PATCH] kexec jump: fix code size checking
On Wed, 2008-08-13 at 12:47 +1000, Simon Horman wrote:
> On Wed, Aug 13, 2008 at 09:04:35AM +0800, Huang Ying wrote:
> > Fix building issue when CONFIG_KEXEC=n. Thanks to Vivek Goyal for his
> > reminding.
> >
> > Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> >
> > ---
> >  include/asm-x86/kexec.h |    3 +++
> >  1 file changed, 3 insertions(+)
> >
> > --- a/include/asm-x86/kexec.h
> > +++ b/include/asm-x86/kexec.h
> > @@ -43,6 +43,9 @@
> >
> >  #ifdef CONFIG_X86_32
> >  # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
> > +# ifndef CONFIG_KEXEC
> > +# define kexec_control_code_size 0
> > +# endif
> >  #endif
> >
> >  #ifndef __ASSEMBLY__
>
> Is it impossible to skip the linker check in the !CONFIG_KEXEC case?

It is possible. I think there are several ways to do that:

1) Use #ifdef in vmlinux_32.lds.S, such as:

#ifdef CONFIG_KEXEC
ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
       "kexec control code size is too big")
#endif

2) #define a macro for the kexec ld script check in asm/kexec.h, such
   as:

#define LD_CHECK_KEXEC()	ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE, \
				       "kexec control code size is too big")

   and use that in vmlinux_32.lds.S.

3) #define kexec_control_code_size 0, so that the check always passes.
   A code size of 0 is reasonable when there is no code
   (CONFIG_KEXEC=n).

I think 3) is better. What do you think?

Best Regards,
Huang Ying
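Option 3) works because a size of 0 trivially satisfies the bound, so the check needs no conditional around it. The C sketch below mimics that idea with C11 _Static_assert in place of the ld ASSERT(). It is only an analogy: in the kernel, kexec_control_code_size is a linker symbol defined in relocate_kernel_32.S, not a compile-time constant, and SKETCH_CONFIG_KEXEC plus the 1536-byte size are hypothetical.

```c
#define KEXEC_CONTROL_CODE_MAX_SIZE 2048

#ifdef SKETCH_CONFIG_KEXEC
/* Hypothetical measured size of the relocation code. */
# define kexec_control_code_size 1536
#else
/* Option 3): no kexec code, so its size is 0 and the check passes. */
# define kexec_control_code_size 0
#endif

/* The same condition the linker script asserts, checked unconditionally. */
_Static_assert(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
	       "kexec control code size is too big");
```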
Re: [PATCH -v3 6/7] kexec jump: __ftrace_enabled_save/restore
On Tue, 2008-08-12 at 09:06 -0400, Vivek Goyal wrote:
> On Tue, Aug 12, 2008 at 11:14:36AM +0800, Huang Ying wrote:
> > Add __ftrace_enabled_save/restore, used to disable ftrace for a
> > while. Now, this is used by kexec jump, which need a version without
> > lock, for general situation, a locked version should be used.
> >
> > Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
> >
> > ---
> >  include/linux/ftrace.h |   21 +
> >  1 file changed, 21 insertions(+)
> >
> > --- a/include/linux/ftrace.h
> > +++ b/include/linux/ftrace.h
> > @@ -98,6 +98,27 @@ static inline void tracer_disable(void)
> >  #endif
> >  }
> >
> > +/* Ftrace disable/restore without lock. Some synchronization mechanism
> > + * must be used to prevent ftrace_enabled to be changed between
> > + * disable/restore. */
> > +static inline int __ftrace_enabled_save(void)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +	int saved_ftrace_enabled = ftrace_enabled;
> > +	ftrace_enabled = 0;
> > +	return saved_ftrace_enabled;
> > +#else
> > +	return 0;
> > +#endif
> > +}
> > +
> > +static inline void __ftrace_enabled_restore(int enabled)
> > +{
> > +#ifdef CONFIG_FTRACE
> > +	ftrace_enabled = enabled;
> > +#endif
> > +}
> > +
> >  #ifdef CONFIG_FRAME_POINTER
> >  /* TODO: need to fix this for ARM */
> >  # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
>
> I guess steven would like to see a patch which introduces both locked
> and lockless versions and with a very good comment explaining in what
> kind of unusual situation one can use the lockless version.

I have sent a locked version to Steven. As for the lockless version,
__ftrace_enabled_save() in the patch above already carries a comment
explaining the constraint on its use. What do you think of it?

Best Regards,
Huang Ying
[PATCH] kexec jump: fix code size checking
Fix a build issue when CONFIG_KEXEC=n. Thanks to Vivek Goyal for
pointing it out.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 include/asm-x86/kexec.h |    3 +++
 1 file changed, 3 insertions(+)

--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -43,6 +43,9 @@
 
 #ifdef CONFIG_X86_32
 # define KEXEC_CONTROL_CODE_MAX_SIZE	2048
+# ifndef CONFIG_KEXEC
+# define kexec_control_code_size 0
+# endif
 #endif
 
 #ifndef __ASSEMBLY__
[PATCH -v3 3/7] kexec jump: check code size in control page
kexec and kexec jump require the code copied into the control page to
fit within PAGE_SIZE/2. This patch adds a link-time check for this,
using the ASSERT() command of the ld linker script.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c   |    2 +-
 arch/x86/kernel/relocate_kernel_32.S |   10 +++---
 arch/x86/kernel/vmlinux_32.lds.S     |    6 ++
 include/asm-x86/kexec.h              |    4
 4 files changed, 18 insertions(+), 4 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -138,7 +138,7 @@ void machine_kexec(struct kimage *image)
 	}
 
 	control_page = page_address(image->control_code_page);
-	memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
+	memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE);
 
 	relocate_kernel_ptr = control_page;
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -20,10 +20,11 @@
 #define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
 #define PAE_PGD_ATTR (_PAGE_PRESENT)
 
-/* control_page + PAGE_SIZE/2 ~ control_page + PAGE_SIZE * 3/4 are
- * used to save some data for jumping back
+/* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
+ * ~ control_page + PAGE_SIZE are used as data storage and stack for
+ * jumping back
  */
-#define DATA(offset)		(PAGE_SIZE/2+(offset))
+#define DATA(offset)		(KEXEC_CONTROL_CODE_MAX_SIZE+(offset))
 
 /* Minimal CPU state */
 #define ESP			DATA(0x0)
@@ -376,3 +377,6 @@ swap_pages:
 	popl	%ebx
 	popl	%ebp
 	ret
+
+	.globl kexec_control_code_size
+.set kexec_control_code_size, . - relocate_kernel
--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -41,6 +41,10 @@
 # define PAGES_NR		17
 #endif
 
+#ifdef CONFIG_X86_32
+# define KEXEC_CONTROL_CODE_MAX_SIZE	2048
+#endif
+
 #ifndef __ASSEMBLY__
 
 #include
--- a/arch/x86/kernel/vmlinux_32.lds.S
+++ b/arch/x86/kernel/vmlinux_32.lds.S
@@ -209,3 +209,9 @@ SECTIONS
 
   DWARF_DEBUG
 }
+
+/* Link time checks */
+#include
+
+ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE,
+       "kexec control code size is too big")
[PATCH -v3 5/7] kexec jump: in sync with hibernation implementation
Add device_pm_lock() and device_pm_unlock() calls to kernel_kexec() to keep it in sync with the current hibernation implementation.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 kernel/kexec.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1457,6 +1457,7 @@ int kernel_kexec(void)
 		error = disable_nonboot_cpus();
 		if (error)
 			goto Resume_devices;
+		device_pm_lock();
 		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
@@ -1485,6 +1486,7 @@ int kernel_kexec(void)
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
 		local_irq_enable();
+		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
 		device_resume(PMSG_RESTORE);
[PATCH -v3 4/7] kexec jump: remove duplication of kexec_restart_prepare()
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the code.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>
---
 include/linux/reboot.h | 1 +
 kernel/kexec.c         | 6 +-
 kernel/sys.c           | 2 +-
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -59,6 +59,7 @@ extern void machine_crash_shutdown(struc
  * Architecture independent implemenations of sys_reboot commands.
  */
 
+extern void kernel_restart_prepare(char *cmd);
 extern void kernel_restart(char *cmd);
 extern void kernel_halt(void);
 extern void kernel_power_off(void);
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -274,7 +274,7 @@ void emergency_restart(void)
 }
 EXPORT_SYMBOL_GPL(emergency_restart);
 
-static void kernel_restart_prepare(char *cmd)
+void kernel_restart_prepare(char *cmd)
 {
 	blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
 	system_state = SYSTEM_RESTART;
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1472,11 +1472,7 @@ int kernel_kexec(void)
 	} else
 #endif
 	{
-		blocking_notifier_call_chain(&reboot_notifier_list,
-					     SYS_RESTART, NULL);
-		system_state = SYSTEM_RESTART;
-		device_shutdown();
-		sysdev_shutdown();
+		kernel_restart_prepare(NULL);
 		printk(KERN_EMERG "Starting new kernel\n");
 		machine_shutdown();
 	}
[PATCH -v3 2/7] kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because on some platforms the control page holds more than just code. In kexec jump, for example, it is also used for data and stack.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 arch/arm/include/asm/kexec.h     | 2 +-
 arch/ia64/include/asm/kexec.h    | 2 +-
 arch/powerpc/include/asm/kexec.h | 2 +-
 arch/s390/include/asm/kexec.h    | 2 +-
 arch/sh/include/asm/kexec.h      | 2 +-
 include/asm-mips/kexec.h         | 2 +-
 include/asm-x86/kexec.h          | 4 ++--
 include/linux/kexec.h            | 4 ++--
 kernel/kexec.c                   | 4 ++--
 9 files changed, 12 insertions(+), 12 deletions(-)

--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -63,7 +63,7 @@
 /* Maximum address we can use for the control code buffer */
 # define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-# define KEXEC_CONTROL_CODE_SIZE 4096
+# define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_386
@@ -79,7 +79,7 @@
 # define KEXEC_CONTROL_MEMORY_LIMIT (0xFFUL)
 
 /* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_CODE_SIZE (4096UL + 4096UL)
+# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
--- a/arch/sh/include/asm/kexec.h
+++ b/arch/sh/include/asm/kexec.h
@@ -21,7 +21,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_SH
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -22,7 +22,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 #endif
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #ifdef __powerpc64__
--- a/arch/ia64/include/asm/kexec.h
+++ b/arch/ia64/include/asm/kexec.h
@@ -9,7 +9,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+#define KEXEC_CONTROL_PAGE_SIZE (8192 + 8192 + 4096)
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_IA_64
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -31,7 +31,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT (1UL<<31)
 
 /* Allocate one page for the pdp and the second for the code */
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_S390
--- a/arch/arm/include/asm/kexec.h
+++ b/arch/arm/include/asm/kexec.h
@@ -10,7 +10,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 #define KEXEC_ARCH KEXEC_ARCH_ARM
--- a/include/asm-mips/kexec.h
+++ b/include/asm-mips/kexec.h
@@ -16,7 +16,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (0x2000)
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_MIPS
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -242,7 +242,7 @@ static int kimage_normal_alloc(struct ki
 	 */
 	result = -ENOMEM;
 	image->control_code_page = kimage_alloc_control_pages(image,
-					   get_order(KEXEC_CONTROL_CODE_SIZE));
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
 	if (!image->control_code_page) {
 		printk(KERN_ERR "Could not allocate control_code_buffer\n");
 		goto out;
@@ -317,7 +317,7 @@ static int kimage_crash_alloc(struct kim
 	 */
 	result = -ENOMEM;
 	image->control_code_page = kimage_alloc_control_pages(image,
-					   get_order(KEXEC_CONTROL_CODE_SIZE));
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
 	if (!image->control_code_page) {
 		printk(KERN_ERR "Could not allocate control_code_buffer\n");
 		goto out;
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -25,8 +25,8 @@
 #error KEXEC_CONTROL_MEMORY_LIMIT not defined
 #endif
 
-#ifndef KEXEC_CONTROL_CODE_SIZE
-#error KEXEC_CONTROL_CODE_SIZE not defined
+#ifndef KEXEC_CONTROL_PAGE_SIZE
+#error KEXEC_CONTROL_PAGE_SIZE not defined
 #endif
 
 #ifndef KEXEC_ARCH
[PATCH -v3 6/7] kexec jump: __ftrace_enabled_save/restore
Add __ftrace_enabled_save() and __ftrace_enabled_restore(), which disable ftrace temporarily and later restore its previous state. For now they are used by kexec jump, which needs a version that does not take a lock; in the general case, a locked version should be used.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 include/linux/ftrace.h | 21 +
 1 file changed, 21 insertions(+)

--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -98,6 +98,27 @@ static inline void tracer_disable(void)
 #endif
 }
 
+/* Ftrace disable/restore without lock. Some synchronization mechanism
+ * must be used to prevent ftrace_enabled to be changed between
+ * disable/restore. */
+static inline int __ftrace_enabled_save(void)
+{
+#ifdef CONFIG_FTRACE
+	int saved_ftrace_enabled = ftrace_enabled;
+	ftrace_enabled = 0;
+	return saved_ftrace_enabled;
+#else
+	return 0;
+#endif
+}
+
+static inline void __ftrace_enabled_restore(int enabled)
+{
+#ifdef CONFIG_FTRACE
+	ftrace_enabled = enabled;
+#endif
+}
+
 #ifdef CONFIG_FRAME_POINTER
 /* TODO: need to fix this for ARM */
 # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
[PATCH -v3 0/7] kexec jump: fixes for 2.6.27
Hi,

This patchset fixes some kexec jump issues for 2.6.27. It is based on 2.6.27-rc2 and has been tested on the i386 platform.

ChangeLog:

v3:

- Merge the added file vmlinux_check_32.lds.S into vmlinux_32.lds.S.
- Add comments about locking to the ftrace code.

v2:

- Check the control code size at link time instead of run time.
- Encapsulate the ftrace-related code into functions.

Best Regards,
Huang Ying
[PATCH -v3 7/7] kexec jump: fix for ftrace
Ftrace depends on some processor state that is destroyed during kexec and restored by restore_processor_state(). So save_processor_state() and restore_processor_state() are moved into machine_kexec(), and ftrace is restored after restore_processor_state().

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 arch/x86/kernel/machine_kexec_32.c | 16 +++-
 kernel/kexec.c                     |  2 --
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -113,6 +114,7 @@ void machine_kexec(struct kimage *image)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
+	int save_ftrace_enabled;
 	asmlinkage unsigned long
 		(*relocate_kernel_ptr)(unsigned long indirection_page,
 				       unsigned long control_page,
@@ -120,7 +122,12 @@ void machine_kexec(struct kimage *image)
 				       unsigned int has_pae,
 				       unsigned int preserve_context);
 
-	tracer_disable();
+#ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context)
+		save_processor_state();
+#endif
+
+	save_ftrace_enabled = __ftrace_enabled_save();
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
@@ -178,6 +185,13 @@ void machine_kexec(struct kimage *image)
 					   (unsigned long)page_list,
 					   image->start, cpu_has_pae,
 					   image->preserve_context);
+
+#ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context)
+		restore_processor_state();
+#endif
+
+	__ftrace_enabled_restore(save_ftrace_enabled);
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1469,7 +1469,6 @@ int kernel_kexec(void)
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
 			goto Enable_irqs;
-		save_processor_state();
 	} else
 #endif
 	{
@@ -1482,7 +1481,6 @@ int kernel_kexec(void)
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
-		restore_processor_state();
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
 		local_irq_enable();
[PATCH -v3 1/7] kexec jump: clean up #ifdef and comments
Move the if (kexec_image->preserve_context) { ... } block inside #ifdef CONFIG_KEXEC_JUMP to make the code cleaner. Also fix the kernel_kexec() comment, which is no longer correct.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>
---
 kernel/kexec.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1426,11 +1426,9 @@ static int __init crash_save_vmcoreinfo_
 
 module_init(crash_save_vmcoreinfo_init)
 
-/**
- * kernel_kexec - reboot the system
- *
- * Move into place and start executing a preloaded standalone
- * executable. If nothing was preloaded return an error.
+/*
+ * Move into place and start executing a preloaded standalone
+ * executable. If nothing was preloaded return an error.
  */
 int kernel_kexec(void)
 {
@@ -1443,8 +1441,8 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
-	if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context) {
 		mutex_lock(&pm_mutex);
 		pm_prepare_console();
 		error = freeze_processes();
@@ -1471,8 +1469,9 @@ int kernel_kexec(void)
 		if (error)
 			goto Enable_irqs;
 		save_processor_state();
+	} else
 #endif
-	} else {
+	{
 		blocking_notifier_call_chain(&reboot_notifier_list,
 					     SYS_RESTART, NULL);
 		system_state = SYSTEM_RESTART;
@@ -1484,8 +1483,8 @@ int kernel_kexec(void)
 
 	machine_kexec(kexec_image);
 
-	if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context) {
 		restore_processor_state();
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
@@ -1499,8 +1498,8 @@ int kernel_kexec(void)
  Restore_console:
 		pm_restore_console();
 		mutex_unlock(&pm_mutex);
-#endif
 	}
+#endif
 
  Unlock:
 	xchg(&kexec_lock, 0);
Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore
Hi, Vivek,

On Mon, 2008-08-11 at 09:51 -0400, Vivek Goyal wrote:
[...]
> So you want to use a non-locked version from optimization point of view?
> So that we don't end up taking and release a lock?

Not from an optimization point of view. machine_kexec() may be called from crash_kexec(), where taking and releasing a lock is not permitted.

Best Regards,
Huang Ying
Re: [PATCH -v2 6/8] kexec jump: fix for lockdep
On Mon, 2008-08-11 at 08:09 +0200, Peter Zijlstra wrote:
> On Mon, 2008-08-11 at 08:59 +0800, Huang Ying wrote:
> > On Fri, 2008-08-08 at 12:13 +0200, Peter Zijlstra wrote:
> > > On Fri, 2008-08-08 at 14:52 +0800, Huang Ying wrote:
> > > > Replace local_irq_disable() with raw_local_irq_disable() to prevent
> > > > lockdep complain.
> > > Uhhm, please provide more information - just using raw_* to silence
> > > lockdep is generally the wrong thing to do.
> >
> > In traditional kexec, the new kernel will replace current one, so the
> > irq is simply disabled. But now jumping back from kexeced kernel is
> > supported, so the irq should be enabled again.
> >
> > The code sequence of irq during kexec jump is as follow:
> >
> > local_irq_disable(); /* in kernel_kexec() */
> > local_irq_disable(); /* in machine_kexec() */
> > local_irq_enable(); /* in kernel_kexec() */
> >
> > The disable and enable is not match. Maybe another method is to use
> > local_irq_save(), local_irq_restore() pair in machine_kexec(), so the
> > disable and enable is matched.
>
> And its the machine kernel's lockdep instance that goes complain?
>
> whichever annotation gets used - and I think I can agree that raw_*
> might be approriate there, this should be accompanied with a rather
> elaborate changelog and preferably a comment in the code too. Without
> such we'll be wondering in the years to come WTH happens here.

Sorry, it turns out there is no complaint from lockdep. Unpaired irq disable/enable calls cause no problem for lockdep; they just increment statistics such as "redundant_hardirqs_off". Please ignore this thread.

Best Regards,
Huang Ying
Re: [PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore
Hi, Steven,

On Fri, 2008-08-08 at 10:30 -0400, Steven Rostedt wrote:
[...]
> The only problem with this approach is what happens if the user changes
> the enabled in between these two calls. This would make ftrace
> inconsistent.
>
> I have a patch from the -rt tree that handles what you want. It is
> attached below. Not sure how well it will apply to mainline.
>
> I really need to go through the rt patch set and start submitting a bunch
> of clean-up/fixes to mainline. We've been meaning to do it, just have been
> distracted :-(

Your version is better in the general case. Thank you very much!

But consider the specific situation of kexec/kjump: in that execution environment the other CPUs are disabled, local irqs are disabled, and switching to another process is not permitted. So it is safe and sufficient to use a non-locked version here.

To satisfy both needs, I think it is better to provide both versions, locked and non-locked. What do you think?

Best Regards,
Huang Ying
Re: [PATCH -v2 3/8] kexec jump: check code size in control page
Hi, Vivek,

On Fri, 2008-08-08 at 10:09 -0400, Vivek Goyal wrote:
[...]
> > --- a/arch/x86/kernel/relocate_kernel_32.S
> > +++ b/arch/x86/kernel/relocate_kernel_32.S
> > @@ -20,10 +20,11 @@
> >  #define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY)
> >  #define PAE_PGD_ATTR (_PAGE_PRESENT)
> >
> > -/* control_page + PAGE_SIZE/2 ~ control_page + PAGE_SIZE * 3/4 are
> > - * used to save some data for jumping back
> > +/* control_page + KEXEC_CONTROL_CODE_MAX_SIZE
> > + * ~ control_page + PAGE_SIZE * 3/4 are used to save some data for
> > + * jumping back
> >   */
>
> Hi Huang,
>
> Above comment is not very clear. Can you please elaborate it. I thought
> that PAGE_SIZE/2 is used for control code and rest half is shared between
> kjump data and stack. What is PAGE_SIZE *3/4?

Yes. The other half is shared between the kjump data and the stack. I will change the comment.

> > +++ b/arch/x86/kernel/vmlinux_check_32.lds.S
> > @@ -0,0 +1,7 @@
> > +/*
> > + * Link time checks
> > + */
> > +
> > +#include
> > +
> > +ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE, "kexec control code size is too big")
>
> Will it make sense to move it into vmlinux_32.lds.S itself? Creating a separate
> file for a single check seems superfluous.

I hoped other checks could use it as well. But for now, putting it in vmlinux_32.lds.S is better. I will change it.

Best Regards,
Huang Ying
Re: [PATCH -v2 6/8] kexec jump: fix for lockdep
On Fri, 2008-08-08 at 12:13 +0200, Peter Zijlstra wrote:
> On Fri, 2008-08-08 at 14:52 +0800, Huang Ying wrote:
> > Replace local_irq_disable() with raw_local_irq_disable() to prevent
> > lockdep complain.
> Uhhm, please provide more information - just using raw_* to silence
> lockdep is generally the wrong thing to do.

In traditional kexec, the new kernel replaces the current one, so irqs are simply disabled. But now that jumping back from the kexeced kernel is supported, irqs must be enabled again.

The irq-related call sequence during kexec jump is as follows:

local_irq_disable(); /* in kernel_kexec() */
local_irq_disable(); /* in machine_kexec() */
local_irq_enable(); /* in kernel_kexec() */

The disable and enable calls do not match. Another option may be to use a local_irq_save()/local_irq_restore() pair in machine_kexec(), so that the disable and enable calls are matched.

Best Regards,
Huang Ying
[PATCH -v2 6/8] kexec jump: fix for lockdep
Replace local_irq_disable() with raw_local_irq_disable() to prevent lockdep from complaining.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 arch/x86/kernel/machine_kexec_32.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -123,7 +123,7 @@ void machine_kexec(struct kimage *image)
 	tracer_disable();
 
 	/* Interrupts aren't acceptable while we reboot */
-	local_irq_disable();
+	raw_local_irq_disable();
 
 	if (image->preserve_context) {
 #ifdef CONFIG_X86_IO_APIC
[PATCH -v2 8/8] kexec jump: fix for ftrace
Ftrace depends on some processor state that is destroyed during kexec and restored by restore_processor_state(). So save_processor_state() and restore_processor_state() are moved into machine_kexec(), and ftrace is restored after restore_processor_state().

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 arch/x86/kernel/machine_kexec_32.c | 16 +++-
 kernel/kexec.c                     |  2 --
 2 files changed, 15 insertions(+), 3 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -12,6 +12,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -113,6 +114,7 @@ void machine_kexec(struct kimage *image)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
+	int save_ftrace_enabled;
 	asmlinkage unsigned long
 		(*relocate_kernel_ptr)(unsigned long indirection_page,
 				       unsigned long control_page,
@@ -120,7 +122,12 @@ void machine_kexec(struct kimage *image)
 				       unsigned int has_pae,
 				       unsigned int preserve_context);
 
-	tracer_disable();
+#ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context)
+		save_processor_state();
+#endif
+
+	save_ftrace_enabled = ftrace_enabled_save();
 
 	/* Interrupts aren't acceptable while we reboot */
 	raw_local_irq_disable();
@@ -178,6 +185,13 @@ void machine_kexec(struct kimage *image)
 					   (unsigned long)page_list,
 					   image->start, cpu_has_pae,
 					   image->preserve_context);
+
+#ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context)
+		restore_processor_state();
+#endif
+
+	ftrace_enabled_restore(save_ftrace_enabled);
 }
 
 void arch_crash_save_vmcoreinfo(void)
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1469,7 +1469,6 @@ int kernel_kexec(void)
 		error = device_power_down(PMSG_FREEZE);
 		if (error)
 			goto Enable_irqs;
-		save_processor_state();
 	} else
 #endif
 	{
@@ -1482,7 +1481,6 @@ int kernel_kexec(void)
 
 #ifdef CONFIG_KEXEC_JUMP
 	if (kexec_image->preserve_context) {
-		restore_processor_state();
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
 		local_irq_enable();
[PATCH -v2 5/8] kexec jump: in sync with hibernation implementation
Add device_pm_lock() and device_pm_unlock() calls to kernel_kexec() to keep it in sync with the current hibernation implementation.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 kernel/kexec.c | 2 ++
 1 file changed, 2 insertions(+)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1457,6 +1457,7 @@ int kernel_kexec(void)
 		error = disable_nonboot_cpus();
 		if (error)
 			goto Resume_devices;
+		device_pm_lock();
 		local_irq_disable();
 		/* At this point, device_suspend() has been called,
 		 * but *not* device_power_down(). We *must*
@@ -1485,6 +1486,7 @@ int kernel_kexec(void)
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
 		local_irq_enable();
+		device_pm_unlock();
 		enable_nonboot_cpus();
  Resume_devices:
 		device_resume(PMSG_RESTORE);
[PATCH -v2 1/8] kexec jump: clean up #ifdef and comments
Move the if (kexec_image->preserve_context) { ... } block inside #ifdef CONFIG_KEXEC_JUMP to make the code cleaner. Also fix the kernel_kexec() comment, which is no longer correct.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>
---
 kernel/kexec.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1426,11 +1426,9 @@ static int __init crash_save_vmcoreinfo_
 
 module_init(crash_save_vmcoreinfo_init)
 
-/**
- * kernel_kexec - reboot the system
- *
- * Move into place and start executing a preloaded standalone
- * executable. If nothing was preloaded return an error.
+/*
+ * Move into place and start executing a preloaded standalone
+ * executable. If nothing was preloaded return an error.
  */
 int kernel_kexec(void)
 {
@@ -1443,8 +1441,8 @@ int kernel_kexec(void)
 		goto Unlock;
 	}
 
-	if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context) {
 		mutex_lock(&pm_mutex);
 		pm_prepare_console();
 		error = freeze_processes();
@@ -1471,8 +1469,9 @@ int kernel_kexec(void)
 		if (error)
 			goto Enable_irqs;
 		save_processor_state();
+	} else
 #endif
-	} else {
+	{
 		blocking_notifier_call_chain(&reboot_notifier_list,
 					     SYS_RESTART, NULL);
 		system_state = SYSTEM_RESTART;
@@ -1484,8 +1483,8 @@ int kernel_kexec(void)
 
 	machine_kexec(kexec_image);
 
-	if (kexec_image->preserve_context) {
 #ifdef CONFIG_KEXEC_JUMP
+	if (kexec_image->preserve_context) {
 		restore_processor_state();
 		device_power_up(PMSG_RESTORE);
  Enable_irqs:
@@ -1499,8 +1498,8 @@ int kernel_kexec(void)
  Restore_console:
 		pm_restore_console();
 		mutex_unlock(&pm_mutex);
-#endif
 	}
+#endif
 
  Unlock:
 	xchg(&kexec_lock, 0);
[PATCH -v2 0/8] kexec jump: fixes for 2.6.27
Hi,

This patchset fixes some kexec jump issues for 2.6.27. It is based on 2.6.27-rc2 and has been tested on the i386 platform.

ChangeLog:

v2:

- Check the control code size at link time instead of run time.
- Encapsulate the ftrace-related code into functions.

Best Regards,
Huang Ying
[PATCH -v2 7/8] kexec jump: ftrace_enabled_save/restore
Add ftrace_enabled_save() and ftrace_enabled_restore(), which disable ftrace temporarily and later restore its previous state. They are used by kexec jump.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 include/linux/ftrace.h | 18 ++
 1 file changed, 18 insertions(+)

--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -98,6 +98,24 @@ static inline void tracer_disable(void)
 #endif
 }
 
+static inline int ftrace_enabled_save(void)
+{
+#ifdef CONFIG_FTRACE
+	int saved_ftrace_enabled = ftrace_enabled;
+	ftrace_enabled = 0;
+	return saved_ftrace_enabled;
+#else
+	return 0;
+#endif
+}
+
+static inline void ftrace_enabled_restore(int enabled)
+{
+#ifdef CONFIG_FTRACE
+	ftrace_enabled = enabled;
+#endif
+}
+
 #ifdef CONFIG_FRAME_POINTER
 /* TODO: need to fix this for ARM */
 # define CALLER_ADDR0 ((unsigned long)__builtin_return_address(0))
[PATCH -v2 4/8] kexec jump: remove duplication of kexec_restart_prepare()
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the code.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
Acked-by: Pavel Machek <[EMAIL PROTECTED]>
Acked-by: Vivek Goyal <[EMAIL PROTECTED]>
---
 include/linux/reboot.h | 1 +
 kernel/kexec.c         | 6 +-
 kernel/sys.c           | 2 +-
 3 files changed, 3 insertions(+), 6 deletions(-)

--- a/include/linux/reboot.h
+++ b/include/linux/reboot.h
@@ -59,6 +59,7 @@ extern void machine_crash_shutdown(struc
  * Architecture independent implemenations of sys_reboot commands.
  */
 
+extern void kernel_restart_prepare(char *cmd);
 extern void kernel_restart(char *cmd);
 extern void kernel_halt(void);
 extern void kernel_power_off(void);
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -274,7 +274,7 @@ void emergency_restart(void)
 }
 EXPORT_SYMBOL_GPL(emergency_restart);
 
-static void kernel_restart_prepare(char *cmd)
+void kernel_restart_prepare(char *cmd)
 {
 	blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd);
 	system_state = SYSTEM_RESTART;
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1472,11 +1472,7 @@ int kernel_kexec(void)
 	} else
 #endif
 	{
-		blocking_notifier_call_chain(&reboot_notifier_list,
-					     SYS_RESTART, NULL);
-		system_state = SYSTEM_RESTART;
-		device_shutdown();
-		sysdev_shutdown();
+		kernel_restart_prepare(NULL);
 		printk(KERN_EMERG "Starting new kernel\n");
 		machine_shutdown();
 	}
[PATCH -v2 2/8] kexec jump: rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE
Rename KEXEC_CONTROL_CODE_SIZE to KEXEC_CONTROL_PAGE_SIZE, because on some platforms the control page holds more than just code. In kexec jump, for example, it is also used for data and stack.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 arch/arm/include/asm/kexec.h     | 2 +-
 arch/ia64/include/asm/kexec.h    | 2 +-
 arch/powerpc/include/asm/kexec.h | 2 +-
 arch/s390/include/asm/kexec.h    | 2 +-
 arch/sh/include/asm/kexec.h      | 2 +-
 include/asm-mips/kexec.h         | 2 +-
 include/asm-x86/kexec.h          | 4 ++--
 include/linux/kexec.h            | 4 ++--
 kernel/kexec.c                   | 4 ++--
 9 files changed, 12 insertions(+), 12 deletions(-)

--- a/include/asm-x86/kexec.h
+++ b/include/asm-x86/kexec.h
@@ -63,7 +63,7 @@
 /* Maximum address we can use for the control code buffer */
 # define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-# define KEXEC_CONTROL_CODE_SIZE 4096
+# define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_386
@@ -79,7 +79,7 @@
 # define KEXEC_CONTROL_MEMORY_LIMIT (0xFFUL)
 
 /* Allocate one page for the pdp and the second for the code */
-# define KEXEC_CONTROL_CODE_SIZE (4096UL + 4096UL)
+# define KEXEC_CONTROL_PAGE_SIZE (4096UL + 4096UL)
 
 /* The native architecture */
 # define KEXEC_ARCH KEXEC_ARCH_X86_64
--- a/arch/sh/include/asm/kexec.h
+++ b/arch/sh/include/asm/kexec.h
@@ -21,7 +21,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_SH
--- a/arch/powerpc/include/asm/kexec.h
+++ b/arch/powerpc/include/asm/kexec.h
@@ -22,7 +22,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 #endif
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #ifdef __powerpc64__
--- a/arch/ia64/include/asm/kexec.h
+++ b/arch/ia64/include/asm/kexec.h
@@ -9,7 +9,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT TASK_SIZE
 
-#define KEXEC_CONTROL_CODE_SIZE (8192 + 8192 + 4096)
+#define KEXEC_CONTROL_PAGE_SIZE (8192 + 8192 + 4096)
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_IA_64
--- a/arch/s390/include/asm/kexec.h
+++ b/arch/s390/include/asm/kexec.h
@@ -31,7 +31,7 @@
 #define KEXEC_CONTROL_MEMORY_LIMIT (1UL<<31)
 
 /* Allocate one page for the pdp and the second for the code */
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_S390
--- a/arch/arm/include/asm/kexec.h
+++ b/arch/arm/include/asm/kexec.h
@@ -10,7 +10,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (-1UL)
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 #define KEXEC_ARCH KEXEC_ARCH_ARM
--- a/include/asm-mips/kexec.h
+++ b/include/asm-mips/kexec.h
@@ -16,7 +16,7 @@
 /* Maximum address we can use for the control code buffer */
 #define KEXEC_CONTROL_MEMORY_LIMIT (0x2000)
 
-#define KEXEC_CONTROL_CODE_SIZE 4096
+#define KEXEC_CONTROL_PAGE_SIZE 4096
 
 /* The native architecture */
 #define KEXEC_ARCH KEXEC_ARCH_MIPS
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -242,7 +242,7 @@ static int kimage_normal_alloc(struct ki
 	 */
 	result = -ENOMEM;
 	image->control_code_page = kimage_alloc_control_pages(image,
-					   get_order(KEXEC_CONTROL_CODE_SIZE));
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
 	if (!image->control_code_page) {
 		printk(KERN_ERR "Could not allocate control_code_buffer\n");
 		goto out;
@@ -317,7 +317,7 @@ static int kimage_crash_alloc(struct kim
 	 */
 	result = -ENOMEM;
 	image->control_code_page = kimage_alloc_control_pages(image,
-					   get_order(KEXEC_CONTROL_CODE_SIZE));
+					   get_order(KEXEC_CONTROL_PAGE_SIZE));
 	if (!image->control_code_page) {
 		printk(KERN_ERR "Could not allocate control_code_buffer\n");
 		goto out;
--- a/include/linux/kexec.h
+++ b/include/linux/kexec.h
@@ -25,8 +25,8 @@
 #error KEXEC_CONTROL_MEMORY_LIMIT not defined
 #endif
 
-#ifndef KEXEC_CONTROL_CODE_SIZE
-#error KEXEC_CONTROL_CODE_SIZE not defined
+#ifndef KEXEC_CONTROL_PAGE_SIZE
+#error KEXEC_CONTROL_PAGE_SIZE not defined
 #endif
 
 #ifndef KEXEC_ARCH
[PATCH -v2 3/8] kexec jump: check code size in control page
Kexec/Kexec-jump require code size in control page is less than PAGE_SIZE/2. This patch add link-time checking for this. ASSERT() of ld link script is used as the link-time checking mechanism. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- arch/x86/kernel/machine_kexec_32.c |2 +- arch/x86/kernel/relocate_kernel_32.S | 10 +++--- arch/x86/kernel/vmlinux_32.lds.S |2 ++ arch/x86/kernel/vmlinux_check_32.lds.S |7 +++ include/asm-x86/kexec.h|4 5 files changed, 21 insertions(+), 4 deletions(-) --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -138,7 +138,7 @@ void machine_kexec(struct kimage *image) } control_page = page_address(image->control_code_page); - memcpy(control_page, relocate_kernel, PAGE_SIZE/2); + memcpy(control_page, relocate_kernel, KEXEC_CONTROL_CODE_MAX_SIZE); relocate_kernel_ptr = control_page; page_list[PA_CONTROL_PAGE] = __pa(control_page); --- a/arch/x86/kernel/relocate_kernel_32.S +++ b/arch/x86/kernel/relocate_kernel_32.S @@ -20,10 +20,11 @@ #define PAGE_ATTR (_PAGE_PRESENT | _PAGE_RW | _PAGE_ACCESSED | _PAGE_DIRTY) #define PAE_PGD_ATTR (_PAGE_PRESENT) -/* control_page + PAGE_SIZE/2 ~ control_page + PAGE_SIZE * 3/4 are - * used to save some data for jumping back +/* control_page + KEXEC_CONTROL_CODE_MAX_SIZE + * ~ control_page + PAGE_SIZE * 3/4 are used to save some data for + * jumping back */ -#define DATA(offset) (PAGE_SIZE/2+(offset)) +#define DATA(offset) (KEXEC_CONTROL_CODE_MAX_SIZE+(offset)) /* Minimal CPU state */ #define ESPDATA(0x0) @@ -376,3 +377,6 @@ swap_pages: popl%ebx popl%ebp ret + + .globl kexec_control_code_size +.set kexec_control_code_size, . 
- relocate_kernel --- a/include/asm-x86/kexec.h +++ b/include/asm-x86/kexec.h @@ -41,6 +41,10 @@ # define PAGES_NR 17 #endif +#ifdef CONFIG_X86_32 +# define KEXEC_CONTROL_CODE_MAX_SIZE 2048 +#endif + #ifndef __ASSEMBLY__ #include --- a/arch/x86/kernel/vmlinux_32.lds.S +++ b/arch/x86/kernel/vmlinux_32.lds.S @@ -209,3 +209,5 @@ SECTIONS DWARF_DEBUG } + +#include "vmlinux_check_32.lds.S" --- /dev/null +++ b/arch/x86/kernel/vmlinux_check_32.lds.S @@ -0,0 +1,7 @@ +/* + * Link time checks + */ + +#include + +ASSERT(kexec_control_code_size <= KEXEC_CONTROL_CODE_MAX_SIZE, "kexec control code size is too big")
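The control-page layout this patch sets up can be modelled in plain C: code at the start, then the jump-back data area from KEXEC_CONTROL_CODE_MAX_SIZE up to PAGE_SIZE * 3/4. This is a sketch that assumes 4 KiB pages. install_control_code() and the static control_page buffer are hypothetical. Unlike the real machine_kexec(), which always copies KEXEC_CONTROL_CODE_MAX_SIZE bytes and relies on the link-time ASSERT(), this sketch checks the size at run time.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096UL			/* assumption: 4 KiB pages */
#define KEXEC_CONTROL_CODE_MAX_SIZE 2048UL	/* value from the patch */

/* Saved jump-back data starts right after the code area, as in DATA(). */
#define DATA(offset) (KEXEC_CONTROL_CODE_MAX_SIZE + (offset))
/* The data area ends at PAGE_SIZE * 3/4. */
#define DATA_END (SKETCH_PAGE_SIZE * 3 / 4)

static unsigned char control_page[SKETCH_PAGE_SIZE];	/* stand-in page */

/*
 * Hypothetical helper: copy the relocation code to the start of the
 * control page. Returns 0 on success, or -1 if the code would overlap
 * the saved-data area.
 */
static int install_control_code(const void *code, size_t code_size)
{
	if (code_size > KEXEC_CONTROL_CODE_MAX_SIZE)
		return -1;
	memcpy(control_page, code, code_size);
	return 0;
}
```

With these assumptions, the data area spans bytes 2048 to 3071 of the page, so code and saved state never overlap as long as the size check holds.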
Re: [PATCH 6/6] kexec jump: fix for ftrace
On Thu, 2008-08-07 at 09:38 -0400, Vivek Goyal wrote: [...] > What kind of problem we run into if we don't disable the ftracer? > > I think there are too many #ifdefs now and probably we can at least > get rid if #ifdef CONFIG_FTRACE thing. > > I think ftracer needs to export the function to enable the tracer > back (tracer_enable()) so that we don't directly play with ftrace_enabled > variable. tracer_enable() can be do {} while{0} in case of CONFIG_FTRACE=n > so that we can get rid of #ifdefs here. The ftrace issue with kexec was reported by Dhaval Giani and fixed by Ingo in the following thread: http://lkml.org/lkml/2008/2/19/175 After some testing, I found that if ftrace is re-enabled before restore_processor_state(), the system hangs. I suspect ftrace depends on some processor state that is destroyed during kexec and restored by restore_processor_state(). So I moved save_processor_state() and restore_processor_state() into machine_kexec() and re-enable ftrace after restore_processor_state(). The #ifdef CONFIG_FTRACE should be removed. I think an interface like local_irq_save()/local_irq_restore() would work well for this: saved_ftrace_enabled = ftrace_save_enabled() <...> ftrace_restore_enabled(saved_ftrace_enabled) Best Regards, Huang Ying
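Here is a userspace sketch of the save/restore interface proposed above, modelled on the local_irq_save()/local_irq_restore() pattern. The helper names are taken from the proposal and are hypothetical. The ftrace_enabled variable here is a plain global standing in for the kernel's. Returning the previous state, rather than unconditionally re-enabling, keeps nested or already-disabled callers correct.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's ftrace_enabled flag. */
static int ftrace_enabled = 1;

/*
 * Hypothetical helper: disable tracing and return the previous state,
 * in the style of local_irq_save().
 */
static int ftrace_save_enabled(void)
{
	int saved = ftrace_enabled;

	ftrace_enabled = 0;
	return saved;
}

/* Hypothetical helper: put back whatever ftrace_save_enabled() returned. */
static void ftrace_restore_enabled(int saved)
{
	ftrace_enabled = saved;
}
```

In machine_kexec(), the save would go before the jump and the restore after restore_processor_state(). For CONFIG_FTRACE=n, both helpers could plausibly become trivial inline stubs, which is what would let the #ifdef CONFIG_FTRACE blocks disappear.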
Re: [PATCH 2/6] kexec jump: check code size in control page
On Thu, 2008-08-07 at 22:31 +0200, Pavel Machek wrote: > Hi! > > > > PAGE_SIZE/2. This patch adds runtime checking for this. > > > > > > Signed-off-by: Huang Ying <[EMAIL PROTECTED]> > ... > > > > { > > > if (nx_enabled) > > > set_pages_x(image->control_code_page, 1); > > > + > > > + BUG_ON((unsigned long)kexec_control_page_code_end - \ > > > +(unsigned long)relocate_kernel >= PAGE_SIZE/2); > > > + > > > > > Run time check is better than nothing but I think in this case it would > > be better if we can catch it at compile time. > > > > One of the methods will be to write a small program of your own and > > put in script/ and at build time check for the size and flag error. May > > be there are other better ways to do this. > > BUILD_BUG_ON()? I tried BUILD_BUG_ON(), and both of the following statements compile without error: BUILD_BUG_ON((unsigned long)kexec_control_page_code_end - \ (unsigned long)relocate_kernel >= PAGE_SIZE/2); BUILD_BUG_ON((unsigned long)kexec_control_page_code_end - \ (unsigned long)relocate_kernel < PAGE_SIZE/2); In general, I think the values of kexec_control_page_code_end and relocate_kernel are not known at compile time, so BUILD_BUG_ON() does not work here. Another idea is to use the ASSERT() command of the ld linker script, as in the following patch: --- a/arch/x86/kernel/vmlinux_32.lds.S +++ b/arch/x86/kernel/vmlinux_32.lds.S @@ -209,3 +209,5 @@ SECTIONS DWARF_DEBUG } + +#include "vmlinux_check_32.lds.S" --- /dev/null +++ b/arch/x86/kernel/vmlinux_check_32.lds.S @@ -0,0 +1,3 @@ +#include + +ASSERT(kexec_control_page_code_end - relocate_kernel <= 2048, "kexec control page code size is too big") It works for me. What do you think? Best Regards, Huang Ying
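The distinction drawn above can be shown in userspace C. A compile-time assertion (here C11 `_Static_assert`, playing the role of BUILD_BUG_ON()) only accepts integer constant expressions, such as the limit itself. The distance between two code symbols is only fixed at link time, so in C it can only be checked at run time, as the earlier BUG_ON() version did. The names below are hypothetical, and the addresses are passed in as plain integers for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumption: 4 KiB pages */
#define CODE_MAX_SIZE 2048UL	/* mirrors KEXEC_CONTROL_CODE_MAX_SIZE */

/* A constant expression: the compiler can evaluate and enforce this. */
_Static_assert(CODE_MAX_SIZE <= SKETCH_PAGE_SIZE / 2,
	       "control code limit must leave room for saved data");

/*
 * Symbol addresses such as relocate_kernel and kexec_control_page_code_end
 * are not constant expressions, so a size derived from them can only be
 * checked at run time (or by the linker). Returns 1 if the code fits,
 * using the same "size <= limit" condition as the linker-script ASSERT().
 */
static int control_code_fits(uintptr_t start, uintptr_t end)
{
	return end >= start && end - start <= CODE_MAX_SIZE;
}
```

When writing the linker-script version, keep in mind that GNU ld's ASSERT(exp, message) aborts the link when exp evaluates to zero. The expression must therefore state the condition that has to hold (size within the limit), not the failure condition.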
Re: [PATCH 4/6] kexec jump: in sync with hibernation implementation
On Thu, 2008-08-07 at 11:22 +0200, Pavel Machek wrote: > > Add device_pm_lock() and device_pm_unlock() in kernel_kexec() to be > > in sync with current hibernation implementation. > > > > Signed-off-by: Huang Ying <[EMAIL PROTECTED]> > > > > --- > > kernel/kexec.c |2 ++ > > 1 file changed, 2 insertions(+) > > > > --- a/kernel/kexec.c > > +++ b/kernel/kexec.c > > @@ -1457,6 +1457,7 @@ int kernel_kexec(void) > > error = disable_nonboot_cpus(); > > if (error) > > goto Resume_devices; > > + device_pm_lock(); > > local_irq_disable(); > > /* At this point, device_suspend() has been called, > > * but *not* device_power_down(). We *must* > > @@ -1485,6 +1486,7 @@ int kernel_kexec(void) > > device_power_up(PMSG_RESTORE); > > Enable_irqs: > > local_irq_enable(); > > + device_pm_unlock(); > > enable_nonboot_cpus(); > > Resume_devices: > > device_resume(PMSG_RESTORE); > > > > Would it be possible to create common function for hibernation and > kexec? Keeping complex stuff like this in sync is ugly. Yes, it is ugly, but that is a little difficult to do: the hibernation code path is more complex than this one. Best Regards, Huang Ying
Re: [PATCH 1/6] kexec jump: clean up #ifdef and comments
On Thu, 2008-08-07 at 11:20 +0200, Pavel Machek wrote: > Hi! > > > CONFIG_KEXEC_JUMP to make code looks cleaner. > > > > Fix no longer correct comments of kernel_kexec(). > > > > Signed-off-by: Huang Ying <[EMAIL PROTECTED]> > > > > --- > > kernel/kexec.c | 11 +-- > > 1 file changed, 5 insertions(+), 6 deletions(-) > > > > --- a/kernel/kexec.c > > +++ b/kernel/kexec.c > > @@ -1427,8 +1427,6 @@ static int __init crash_save_vmcoreinfo_ > > module_init(crash_save_vmcoreinfo_init) > > > > /** > > - * kernel_kexec - reboot the system > > - * > > * Move into place and start executing a preloaded standalone > > * executable. If nothing was preloaded return an error. > > */ > > If it is not kerneldoc, it should not be /** . OK. I will fix it. Best Regards, Huang Ying
[PATCH 1/6] kexec jump: clean up #ifdef and comments
Move the if (kexec_image->preserve_context) { ... } block inside #ifdef CONFIG_KEXEC_JUMP to make the code cleaner. Fix the comments of kernel_kexec(), which are no longer correct. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- kernel/kexec.c | 11 +-- 1 file changed, 5 insertions(+), 6 deletions(-) --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1427,8 +1427,6 @@ static int __init crash_save_vmcoreinfo_ module_init(crash_save_vmcoreinfo_init) /** - * kernel_kexec - reboot the system - * * Move into place and start executing a preloaded standalone * executable. If nothing was preloaded return an error. */ @@ -1443,8 +1441,8 @@ int kernel_kexec(void) goto Unlock; } - if (kexec_image->preserve_context) { #ifdef CONFIG_KEXEC_JUMP + if (kexec_image->preserve_context) { mutex_lock(&pm_mutex); pm_prepare_console(); error = freeze_processes(); @@ -1471,8 +1469,9 @@ int kernel_kexec(void) if (error) goto Enable_irqs; save_processor_state(); + } else #endif - } else { + { blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, NULL); system_state = SYSTEM_RESTART; @@ -1484,8 +1483,8 @@ int kernel_kexec(void) machine_kexec(kexec_image); - if (kexec_image->preserve_context) { #ifdef CONFIG_KEXEC_JUMP + if (kexec_image->preserve_context) { restore_processor_state(); device_power_up(PMSG_RESTORE); Enable_irqs: @@ -1499,8 +1498,8 @@ int kernel_kexec(void) Restore_console: pm_restore_console(); mutex_unlock(&pm_mutex); -#endif } +#endif Unlock: xchg(&kexec_lock, 0);
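The "} else #endif {" construct used in this patch can look odd at first. Here is a minimal standalone sketch of the same control-flow shape. The function name and return strings are hypothetical, and CONFIG_KEXEC_JUMP is defined locally only so that both paths can be exercised. If the define is removed, the preprocessor drops the whole "if (...) { ... } else" and only the bare block remains, so no dead branch is left referencing functions that may not exist.

```c
#include <assert.h>
#include <string.h>

/* Assumption for the sketch: the feature is configured in. */
#define CONFIG_KEXEC_JUMP 1

/*
 * Mirrors the shape of kernel_kexec() after this patch. Without
 * CONFIG_KEXEC_JUMP, only the final plain block is compiled.
 */
static const char *kexec_path(int preserve_context)
{
	(void)preserve_context;	/* unused when CONFIG_KEXEC_JUMP is off */
#ifdef CONFIG_KEXEC_JUMP
	if (preserve_context) {
		return "jump";		/* save state, jump, come back */
	} else
#endif
	{
		return "reboot";	/* ordinary kexec reboot path */
	}
}
```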
[PATCH 4/6] kexec jump: in sync with hibernation implementation
Add device_pm_lock() and device_pm_unlock() to kernel_kexec() to keep it in sync with the current hibernation implementation. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- kernel/kexec.c |2 ++ 1 file changed, 2 insertions(+) --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1457,6 +1457,7 @@ int kernel_kexec(void) error = disable_nonboot_cpus(); if (error) goto Resume_devices; + device_pm_lock(); local_irq_disable(); /* At this point, device_suspend() has been called, * but *not* device_power_down(). We *must* @@ -1485,6 +1486,7 @@ int kernel_kexec(void) device_power_up(PMSG_RESTORE); Enable_irqs: local_irq_enable(); + device_pm_unlock(); enable_nonboot_cpus(); Resume_devices: device_resume(PMSG_RESTORE);
[PATCH 3/6] kexec jump: remove duplication of kernel_restart_prepare()
Call kernel_restart_prepare() in kernel_kexec() instead of duplicating the code. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- include/linux/reboot.h |1 + kernel/kexec.c |6 +- kernel/sys.c |2 +- 3 files changed, 3 insertions(+), 6 deletions(-) --- a/include/linux/reboot.h +++ b/include/linux/reboot.h @@ -59,6 +59,7 @@ extern void machine_crash_shutdown(struc * Architecture independent implemenations of sys_reboot commands. */ +extern void kernel_restart_prepare(char *cmd); extern void kernel_restart(char *cmd); extern void kernel_halt(void); extern void kernel_power_off(void); --- a/kernel/sys.c +++ b/kernel/sys.c @@ -274,7 +274,7 @@ void emergency_restart(void) } EXPORT_SYMBOL_GPL(emergency_restart); -static void kernel_restart_prepare(char *cmd) +void kernel_restart_prepare(char *cmd) { blocking_notifier_call_chain(&reboot_notifier_list, SYS_RESTART, cmd); system_state = SYSTEM_RESTART; --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1472,11 +1472,7 @@ int kernel_kexec(void) } else #endif { - blocking_notifier_call_chain(&reboot_notifier_list, -SYS_RESTART, NULL); - system_state = SYSTEM_RESTART; - device_shutdown(); - sysdev_shutdown(); + kernel_restart_prepare(NULL); printk(KERN_EMERG "Starting new kernel\n"); machine_shutdown(); }
[PATCH 5/6] kexec jump: fix for lockdep
Replace local_irq_disable() with raw_local_irq_disable() to prevent lockdep from complaining. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- arch/x86/kernel/machine_kexec_32.c |2 +- 1 file changed, 1 insertion(+), 1 deletion(-) --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -130,7 +130,7 @@ void machine_kexec(struct kimage *image) #endif /* Interrupts aren't acceptable while we reboot */ - local_irq_disable(); + raw_local_irq_disable(); if (image->preserve_context) { #ifdef CONFIG_X86_IO_APIC
[PATCH 2/6] kexec jump: check code size in control page
Kexec and kexec jump require the code in the control page to be smaller than PAGE_SIZE/2. This patch adds a runtime check for this. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- arch/x86/kernel/machine_kexec_32.c |4 arch/x86/kernel/relocate_kernel_32.S |3 +++ include/asm-x86/kexec.h |1 + 3 files changed, 8 insertions(+) --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -92,6 +92,10 @@ int machine_kexec_prepare(struct kimage { if (nx_enabled) set_pages_x(image->control_code_page, 1); + + BUG_ON((unsigned long)kexec_control_page_code_end - \ + (unsigned long)relocate_kernel >= PAGE_SIZE/2); + return 0; } --- a/arch/x86/kernel/relocate_kernel_32.S +++ b/arch/x86/kernel/relocate_kernel_32.S @@ -376,3 +376,6 @@ swap_pages: popl %ebx popl %ebp ret + + .globl kexec_control_page_code_end +kexec_control_page_code_end: --- a/include/asm-x86/kexec.h +++ b/include/asm-x86/kexec.h @@ -159,6 +159,7 @@ relocate_kernel(unsigned long indirectio unsigned long start_address, unsigned int has_pae, unsigned int preserve_context); +void kexec_control_page_code_end(void); #else NORET_TYPE void relocate_kernel(unsigned long indirection_page,
[PATCH 0/6] kexec jump: fixes for 2.6.27
Hi, This patchset fixes some kexec jump issues for 2.6.27. It is based on 2.6.27-rc2 and has been tested on the i386 platform. Best Regards, Huang Ying
[PATCH 6/6] kexec jump: fix for ftrace
Restore ftrace after jumping back from the kexeced kernel. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- arch/x86/kernel/machine_kexec_32.c | 19 +++ kernel/kexec.c |2 -- 2 files changed, 19 insertions(+), 2 deletions(-) --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -117,6 +118,7 @@ void machine_kexec(struct kimage *image) { unsigned long page_list[PAGES_NR]; void *control_page; + int save_ftrace_enabled; asmlinkage unsigned long (*relocate_kernel_ptr)(unsigned long indirection_page, unsigned long control_page, @@ -124,7 +126,15 @@ void machine_kexec(struct kimage *image) unsigned int has_pae, unsigned int preserve_context); +#ifdef CONFIG_KEXEC_JUMP + if (kexec_image->preserve_context) + save_processor_state(); +#endif + +#ifdef CONFIG_FTRACE + save_ftrace_enabled = ftrace_enabled; tracer_disable(); +#endif /* Interrupts aren't acceptable while we reboot */ raw_local_irq_disable(); @@ -182,6 +192,15 @@ void machine_kexec(struct kimage *image) (unsigned long)page_list, image->start, cpu_has_pae, image->preserve_context); + +#ifdef CONFIG_KEXEC_JUMP + if (kexec_image->preserve_context) + restore_processor_state(); +#endif + +#ifdef CONFIG_FTRACE + ftrace_enabled = save_ftrace_enabled; +#endif } void arch_crash_save_vmcoreinfo(void) --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1469,7 +1469,6 @@ int kernel_kexec(void) error = device_power_down(PMSG_FREEZE); if (error) goto Enable_irqs; - save_processor_state(); } else #endif { @@ -1482,7 +1481,6 @@ int kernel_kexec(void) #ifdef CONFIG_KEXEC_JUMP if (kexec_image->preserve_context) { - restore_processor_state(); device_power_up(PMSG_RESTORE); Enable_irqs: local_irq_enable();
Re: [PATCH -mm 2/2] kexec jump -v12: save/restore device state
Hi, Vivek, On Mon, 2008-07-14 at 09:48 -0400, Vivek Goyal wrote: [...] > You have cited various possible use cases of this patchset. Which is > the specific feature you are planning to use? I think two features are useful: 1. Kexec-based hibernation 2. Doing a kdump, then continuing > Thinking more about it, what's the compelling feature out of this list > which makes this patchset a strong candidate for inclusion? > > Regarding hibernation, Rafael does not think this is the way to go > for future. I think kexec-based hibernation can be a better hibernation scheme, at least in terms of code sharing. Best Regards, Huang Ying
Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump
On Sat, Jul 12, 2008 at 3:21 AM, Andrew Morton <[EMAIL PROTECTED]> wrote: > On Tue, 8 Jul 2008 10:50:51 -0400 Vivek Goyal <[EMAIL PROTECTED]> wrote: > >> On Mon, Jul 07, 2008 at 11:25:22AM +0800, Huang Ying wrote: >> > This patch provides an enhancement to kexec/kdump. It implements >> > the following features: >> > >> > - Backup/restore memory used by the original kernel before/after >> > kexec. >> > >> > - Save/restore CPU state before/after kexec. >> > >> >> Hi Huang, >> >> In general this patch set looks good enough to live in -mm and >> get some testing going. >> >> To me, adding capability to return back to original kernel looks >> like a logical extension to kexec functionality. > > Exciting ;) It's much less code than I expected. > > I don't think I understand the feature any more. Once upon a time we > thought that this might become a new and better (or at least > better-code-sharing) way of doing suspend-to-disk. How far are we from > that? At least the following issues remain: - We need a mechanism to pass some information (such as the backup pages map) from the hibernated kernel to the hibernating kernel, maybe using the C calling convention. - To load the hibernation image via /sbin/kexec, the segment-number limit of sys_kexec_load needs to be extended (maybe via multi-stage loading). - Make kexec-based hibernation compatible with ACPI S4. - Extend the makedumpfile utility for kexec-based hibernation. > What are the prospects of supporting other architectures? I will work on x86_64 support. > Who maintains kexec-tools, and are they OK with merging up the > corresponding changes? I will work with the kexec-tools mailing list on the corresponding kexec-tools patches. Best Regards, Huang Ying
Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump
Hi, Pavel, On Tue, 2008-07-08 at 12:40 +0200, Pavel Machek wrote: > Hi! > > > > > @@ -1411,3 +1421,50 @@ static int __init crash_save_vmcoreinfo_ > > > > } > > > > > > > > module_init(crash_save_vmcoreinfo_init) > > > > + > > > > +/** > > > > + * kernel_kexec - reboot the system > > > > > Really? > > > > I will change the comments to reflect the changes to kernel_kexec. > > > > > > + * Move into place and start executing a preloaded standalone > > > > + * executable. If nothing was preloaded return an error. > > > > + */ > > > > +int kernel_kexec(void) > > > > +{ > > > > + int error = 0; > > > > + > > > > + if (xchg(&kexec_lock, 1)) > > > > + return -EBUSY; > > > > > > That's quite a strange way to provide a lock. mutex_trylock? > > > > I think this is because kexec_lock is used by crash_kexec() too, which > > may be called in some extreme environment, such as during panic(). > > > > > > + if (!kexec_image) { > > > > + error = -EINVAL; > > > > + goto Unlock; > > > > + } > > > > + > > > > + if (kexec_image->preserve_context) { > > > > +#ifdef CONFIG_KEXEC_JUMP > > > > + local_irq_disable(); > > > > + save_processor_state(); > > > > > > #else > > > BUG() > > > > > > ...because otherwise you silently do nothing? > > > > > > > +#endif > > > > If CONFIG_KEXEC_JUMP is not defined, kexec_image->preserve_context will > > always be 0. So current code is safe. Here, #ifdef is used to resolve > > the dependency issue. For example, save_processor_state() may be > > undefined if CONFIG_KEXEC_JUMP is not defined. > > Move the #ifdef outside the if (), then, so this is clear? I think this is reasonable; I will do it. > Actually, if preserve_context is always zero in !KEXEC_JUMP case, it > might make sense to remove whole variable... I think this would add more #ifndef CONFIG_KEXEC_JUMP ... #endif blocks than necessary. The memory and performance gain is too small to compensate for the reduced readability.
Best Regards, Huang Ying
Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump
On Tue, 2008-07-08 at 10:50 -0400, Vivek Goyal wrote: > On Mon, Jul 07, 2008 at 11:25:22AM +0800, Huang Ying wrote: > > This patch provides an enhancement to kexec/kdump. It implements > > the following features: > > > > - Backup/restore memory used by the original kernel before/after > > kexec. > > > > - Save/restore CPU state before/after kexec. > > > > Hi Huang, > > In general this patch set looks good enough to live in -mm and > get some testing going. > > To me, adding capability to return back to original kernel looks > like a logical extension to kexec functionality. > > Acked-by: Vivek Goyal <[EMAIL PROTECTED]> > > Few minor comments inline. Thank you very much! > [..] > > --- a/arch/x86/kernel/machine_kexec_32.c > > +++ b/arch/x86/kernel/machine_kexec_32.c > > @@ -22,6 +22,7 @@ > > #include > > #include > > #include > > +#include > > > > #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) > > static u32 kexec_pgd[1024] PAGE_ALIGNED; > > @@ -85,10 +86,12 @@ static void load_segments(void) > > * reboot code buffer to allow us to avoid allocations > > * later. > > * > > - * Currently nothing. > > + * Make control page executable. > > */ > > int machine_kexec_prepare(struct kimage *image) > > { > > + if (nx_enabled) > > + set_pages_x(image->control_code_page, 1); > > return 0; > > } > > > > @@ -98,16 +101,24 @@ int machine_kexec_prepare(struct kimage > > */ > > void machine_kexec_cleanup(struct kimage *image) > > { > > + if (nx_enabled) > > + set_pages_nx(image->control_code_page, 1); > > } > > > > /* > > * Do not allocate memory (or fail in any way) in machine_kexec(). > > * We are past the point of no return, committed to rebooting now. 
> > */ > > -NORET_TYPE void machine_kexec(struct kimage *image) > > +void machine_kexec(struct kimage *image) > > { > > unsigned long page_list[PAGES_NR]; > > void *control_page; > > + asmlinkage unsigned long > > + (*relocate_kernel_ptr)(unsigned long indirection_page, > > + unsigned long control_page, > > + unsigned long start_address, > > + unsigned int has_pae, > > + unsigned int preserve_context); > > > > tracer_disable(); > > > > @@ -115,10 +126,11 @@ NORET_TYPE void machine_kexec(struct kim > > local_irq_disable(); > > > > control_page = page_address(image->control_code_page); > > - memcpy(control_page, relocate_kernel, PAGE_SIZE); > > + memcpy(control_page, relocate_kernel, PAGE_SIZE/2); > > > > Is it possible to add either a compile time or run time check > somewhere to make sure code in relocate_kernel.S does not exceed > PAGE_SIZE/2. OK, I will add it. > [..] > > --- a/kernel/kexec.c > > +++ b/kernel/kexec.c > > @@ -24,6 +24,8 @@ > > #include > > #include > > #include > > +#include > > +#include > > > > #include > > #include > > @@ -242,6 +244,12 @@ static int kimage_normal_alloc(struct ki > > goto out; > > } > > > > + image->swap_page = kimage_alloc_control_pages(image, 0); > > + if (!image->swap_page) { > > + printk(KERN_ERR "Could not allocate swap buffer\n"); > > + goto out; > > + } > > + > > result = 0; > > out: > > if (result == 0) > > @@ -986,6 +994,8 @@ asmlinkage long sys_kexec_load(unsigned > > if (result) > > goto out; > > > > + if (flags & KEXEC_PRESERVE_CONTEXT) > > + image->preserve_context = 1; > > result = machine_kexec_prepare(image); > > if (result) > > goto out; > > @@ -1411,3 +1421,50 @@ static int __init crash_save_vmcoreinfo_ > > } > > > > module_init(crash_save_vmcoreinfo_init) > > + > > +/** > > + * kernel_kexec - reboot the system > > + * > > + * Move into place and start executing a preloaded standalone > > + * executable. If nothing was preloaded return an error. 
> > + */ > > +int kernel_kexec(void) > > +{ > > + int error = 0; > > + > > + if (xchg(&kexec_lock, 1)) > > + return -EBUSY; > > +
Re: [PATCH -mm 1/2] kexec jump -v12: kexec jump
Hi, Pavel, On Mon, 2008-07-07 at 20:50 +0800, Pavel Machek wrote: > Hi! > > The patch looks mostly ok to me. (Perhaps there's time to split it > into smaller chunks?) > > You can add Acked-by: Pavel Machek <[EMAIL PROTECTED]> to it, I guess. Thank you very much! [...] > > @@ -98,16 +101,24 @@ int machine_kexec_prepare(struct kimage > > */ > > void machine_kexec_cleanup(struct kimage *image) > > { > > + if (nx_enabled) > > + set_pages_nx(image->control_code_page, 1); > > } > > , 0 ? (setup and cleanup were same, which is strange). Oh, yes. That should be 0; I will change it. > > > @@ -1411,3 +1421,50 @@ static int __init crash_save_vmcoreinfo_ > > } > > > > module_init(crash_save_vmcoreinfo_init) > > + > > +/** > > + * kernel_kexec - reboot the system > Really? I will change the comments to reflect the changes to kernel_kexec. > > + * Move into place and start executing a preloaded standalone > > + * executable. If nothing was preloaded return an error. > > + */ > > +int kernel_kexec(void) > > +{ > > + int error = 0; > > + > > + if (xchg(&kexec_lock, 1)) > > + return -EBUSY; > > That's quite a strange way to provide a lock. mutex_trylock? I think this is because kexec_lock is also used by crash_kexec(), which may be called in extreme environments, such as during panic(). > > + if (!kexec_image) { > > + error = -EINVAL; > > + goto Unlock; > > + } > > + > > + if (kexec_image->preserve_context) { > > +#ifdef CONFIG_KEXEC_JUMP > > + local_irq_disable(); > > + save_processor_state(); > > #else > BUG() > > ...because otherwise you silently do nothing? > > > +#endif > > Pavel If CONFIG_KEXEC_JUMP is not defined, kexec_image->preserve_context will always be 0, so the current code is safe. Here, the #ifdef is used to resolve a dependency issue: for example, save_processor_state() may be undefined if CONFIG_KEXEC_JUMP is not defined. Best Regards, Huang Ying
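For readers puzzled by the xchg()-based lock discussed above, here is a userspace sketch of the same try-lock pattern. It uses C11 atomic_exchange() in place of the kernel's xchg(), and the names are hypothetical. The lock never sleeps and keeps no owner bookkeeping: it simply swaps in 1 and reports -EBUSY if the old value was already 1. That plausibly fits the reason given above for sharing kexec_lock with crash_kexec().

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>

static atomic_int kexec_lock_sketch = 0;	/* 0 = free, 1 = held */

/* Returns 0 on success, -EBUSY if the lock is already held. */
static int kexec_trylock(void)
{
	if (atomic_exchange(&kexec_lock_sketch, 1))
		return -EBUSY;
	return 0;
}

/* Release; the kernel code does the equivalent xchg(&kexec_lock, 0). */
static void kexec_unlock(void)
{
	atomic_exchange(&kexec_lock_sketch, 0);
}
```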
[PATCH -mm 1/2] kexec jump -v12: kexec jump
nel. Now, only the i386 architecture is supported. The patchset is based on Linux kernel 2.6.26-rc8-mm1, and has been tested on IBM T42. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- arch/powerpc/kernel/machine_kexec.c |2 arch/sh/kernel/machine_kexec.c |2 arch/x86/Kconfig |7 + arch/x86/kernel/machine_kexec_32.c | 27 - arch/x86/kernel/machine_kexec_64.c |2 arch/x86/kernel/relocate_kernel_32.S | 174 ++- include/asm-x86/kexec.h | 18 ++- include/linux/kexec.h| 17 ++- kernel/kexec.c | 57 +++ kernel/sys.c | 31 +- 10 files changed, 269 insertions(+), 68 deletions(-) --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -22,6 +22,7 @@ #include #include #include +#include #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) static u32 kexec_pgd[1024] PAGE_ALIGNED; @@ -85,10 +86,12 @@ static void load_segments(void) * reboot code buffer to allow us to avoid allocations * later. * - * Currently nothing. + * Make control page executable. */ int machine_kexec_prepare(struct kimage *image) { + if (nx_enabled) + set_pages_x(image->control_code_page, 1); return 0; } @@ -98,16 +101,24 @@ int machine_kexec_prepare(struct kimage */ void machine_kexec_cleanup(struct kimage *image) { + if (nx_enabled) + set_pages_nx(image->control_code_page, 1); } /* * Do not allocate memory (or fail in any way) in machine_kexec(). * We are past the point of no return, committed to rebooting now. 
*/ -NORET_TYPE void machine_kexec(struct kimage *image) +void machine_kexec(struct kimage *image) { unsigned long page_list[PAGES_NR]; void *control_page; + asmlinkage unsigned long + (*relocate_kernel_ptr)(unsigned long indirection_page, + unsigned long control_page, + unsigned long start_address, + unsigned int has_pae, + unsigned int preserve_context); tracer_disable(); @@ -115,10 +126,11 @@ NORET_TYPE void machine_kexec(struct kim local_irq_disable(); control_page = page_address(image->control_code_page); - memcpy(control_page, relocate_kernel, PAGE_SIZE); + memcpy(control_page, relocate_kernel, PAGE_SIZE/2); + relocate_kernel_ptr = control_page; page_list[PA_CONTROL_PAGE] = __pa(control_page); - page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel; + page_list[VA_CONTROL_PAGE] = (unsigned long)control_page; page_list[PA_PGD] = __pa(kexec_pgd); page_list[VA_PGD] = (unsigned long)kexec_pgd; #ifdef CONFIG_X86_PAE @@ -131,6 +143,7 @@ NORET_TYPE void machine_kexec(struct kim page_list[VA_PTE_0] = (unsigned long)kexec_pte0; page_list[PA_PTE_1] = __pa(kexec_pte1); page_list[VA_PTE_1] = (unsigned long)kexec_pte1; + page_list[PA_SWAP_PAGE] = (page_to_pfn(image->swap_page) << PAGE_SHIFT); /* The segment registers are funny things, they have both a * visible and an invisible part. 
Whenever the visible part is @@ -149,8 +162,10 @@ NORET_TYPE void machine_kexec(struct kim set_idt(phys_to_virt(0),0); /* now call it */ - relocate_kernel((unsigned long)image->head, (unsigned long)page_list, - image->start, cpu_has_pae); + image->start = relocate_kernel_ptr((unsigned long)image->head, + (unsigned long)page_list, + image->start, cpu_has_pae, + image->preserve_context); } void arch_crash_save_vmcoreinfo(void) --- a/include/linux/kexec.h +++ b/include/linux/kexec.h @@ -83,6 +83,7 @@ struct kimage { unsigned long start; struct page *control_code_page; + struct page *swap_page; unsigned long nr_segments; struct kexec_segment segment[KEXEC_SEGMENT_MAX]; @@ -98,18 +99,20 @@ struct kimage { unsigned int type : 1; #define KEXEC_TYPE_DEFAULT 0 #define KEXEC_TYPE_CRASH 1 + unsigned int preserve_context : 1; }; /* kexec interface functions */ -extern NORET_TYPE void machine_kexec(struct kimage *image) ATTRIB_NORET; +extern void machine_kexec(struct kimage *image); extern int machine_kexec_prepare(struct kimage *image); extern void machine_kexec_cleanup(struct kimage *image); extern asmlinkage long sys_kexec_load(unsigned long entry, unsigned long nr_segments, struct kexec_segment __user *segments, unsigned long flags);
[PATCH -mm 2/2] kexec jump -v12: save/restore device state
iver callback. v10: - Split from original kexec_jump patch. Now, only the i386 architecture is supported. The patchset is based on Linux kernel 2.6.26-rc8-mm1, and has been tested on IBM T42 with ACPI on and off. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- arch/x86/Kconfig |5 ++-- arch/x86/kernel/machine_kexec_32.c | 12 +++ include/linux/suspend.h|2 + kernel/kexec.c | 39 + kernel/power/power.h |2 - 5 files changed, 56 insertions(+), 4 deletions(-) --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -26,6 +26,10 @@ #include #include #include +#include +#include +#include +#include #include #include @@ -1441,7 +1445,31 @@ int kernel_kexec(void) if (kexec_image->preserve_context) { #ifdef CONFIG_KEXEC_JUMP + mutex_lock(&pm_mutex); + pm_prepare_console(); + error = freeze_processes(); + if (error) { + error = -EBUSY; + goto Restore_console; + } + suspend_console(); + error = device_suspend(PMSG_FREEZE); + if (error) + goto Resume_console; + error = disable_nonboot_cpus(); + if (error) + goto Resume_devices; local_irq_disable(); + /* At this point, device_suspend() has been called, +* but *not* device_power_down(). We *must* +* device_power_down() now. Otherwise, drivers for +* some devices (e.g. interrupt controllers) become +* desynchronized with the actual state of the +* hardware at resume time, and evil weirdness ensues. 
+*/ + error = device_power_down(PMSG_FREEZE); + if (error) + goto Enable_irqs; save_processor_state(); #endif } else { @@ -1459,7 +1487,18 @@ int kernel_kexec(void) if (kexec_image->preserve_context) { #ifdef CONFIG_KEXEC_JUMP restore_processor_state(); + device_power_up(PMSG_RESTORE); + Enable_irqs: local_irq_enable(); + enable_nonboot_cpus(); + Resume_devices: + device_resume(PMSG_RESTORE); + Resume_console: + resume_console(); + thaw_processes(); + Restore_console: + pm_restore_console(); + mutex_unlock(&pm_mutex); #endif } --- a/kernel/power/power.h +++ b/kernel/power/power.h @@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo extern int pfn_is_nosave(unsigned long); -extern struct mutex pm_mutex; - #define power_attr(_name) \ static struct kobj_attribute _name##_attr = { \ .attr = { \ --- a/include/linux/suspend.h +++ b/include/linux/suspend.h @@ -278,4 +278,6 @@ static inline void register_nosave_regio } #endif +extern struct mutex pm_mutex; + #endif /* _LINUX_SUSPEND_H */ --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -125,6 +125,18 @@ void machine_kexec(struct kimage *image) /* Interrupts aren't acceptable while we reboot */ local_irq_disable(); + if (image->preserve_context) { +#ifdef CONFIG_X86_IO_APIC + /* We need to put APICs in legacy mode so that we can +* get timer interrupts in second kernel. kexec/kdump +* paths already have calls to disable_IO_APIC() in +* one form or other. kexec jump path also need +* one. 
+*/ + disable_IO_APIC(); +#endif + } + + control_page = page_address(image->control_code_page); memcpy(control_page, relocate_kernel, PAGE_SIZE/2); --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -1276,9 +1276,10 @@ config CRASH_DUMP config KEXEC_JUMP bool "kexec jump (EXPERIMENTAL)" depends on EXPERIMENTAL - depends on KEXEC && PM_SLEEP && X86_32 + depends on KEXEC && HIBERNATION && X86_32 help - Invoke code in physical address mode via KEXEC + Jump between original kernel and kexeced kernel and invoke + code in physical address mode via KEXEC config PHYSICAL_START hex "Physical address where the kernel is loaded" if (EMBEDDED || CRASH_DUMP)
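The error handling added to kernel_kexec() follows the usual kernel goto-unwind idiom. Each label undoes one step in reverse order, and the success path falls through the same labels after the jump returns. Below is a minimal userspace sketch with hypothetical step names and a fault-injection parameter. The comments show which kernel call each step loosely corresponds to; that mapping is illustrative only.

```c
#include <assert.h>
#include <string.h>

static char trace_buf[64];	/* records which steps ran, in order */

static void note(const char *s)
{
	strcat(trace_buf, s);
}

/* Hypothetical step: fails (returns -1) when its id equals fail_at. */
static int step(char id, char fail_at)
{
	if (id == fail_at)
		return -1;
	note((char[]){ id, '\0' });
	return 0;
}

/* Lower case: acquire a step; upper case: undo it; 'J': the jump. */
static int do_sequence(char fail_at)
{
	int error;

	trace_buf[0] = '\0';
	error = step('a', fail_at);	/* cf. freeze_processes() */
	if (error)
		goto Out;
	error = step('b', fail_at);	/* cf. device_suspend() */
	if (error)
		goto Undo_a;
	error = step('c', fail_at);	/* cf. disable_nonboot_cpus() */
	if (error)
		goto Undo_b;

	note("J");			/* cf. machine_kexec() and return */
	note("C");			/* cf. enable_nonboot_cpus() */
Undo_b:
	note("B");			/* cf. device_resume() */
Undo_a:
	note("A");			/* cf. thaw_processes() */
Out:
	return error;
}
```

A failure at step 'c' therefore records "abBA": only the steps that succeeded are undone, newest first.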
Re: [PATCH -mm 1/2] kexec jump -v11: kexec jump
On Fri, 2008-06-13 at 14:00 -0400, Vivek Goyal wrote:
[...]
> Ok, I found that in my config CONFIG_HIBERNATION was not enabled. After
> enabling CONFIG_HIBERNATION, both suspend to disk and kjump started
> working.
>
> Does that mean there is some dependency on code under CONFIG_HIBERNATION.
> If yes, I think this dependency should be resolved during compile time.
> May be addtional config option (CONFIG_KEXEC_JUMP), which also selects
> the CONFIG_HIBERNATION automatically etc...

Yes. kexec jump needs to put devices into a quiescent state and save device state into memory, which is implemented by calling the hibernation function device_suspend(PMSG_FREEZE), whose implementation depends on CONFIG_HIBERNATION. So I will add CONFIG_KEXEC_JUMP and have it select CONFIG_HIBERNATION automatically.

Best Regards,
Huang Ying
Re: [linux-pm] [PATCH -mm 2/2] kexec jump -v11: save/restore device state
Hi, Vivek,

On Fri, 2008-06-13 at 14:05 -0400, Vivek Goyal wrote:
[...]
>
> Can't we implement ACPI S5 state as an option in current hibernation
> framework? Or kexec jump is a requirement for that?

ACPI S5 has already been implemented in the current hibernation framework. If you do:

echo shutdown > /sys/power/disk

ACPI S5 will be used instead of S4. That is, the corresponding ACPI method is not executed.

Best Regards,
Huang Ying
Re: [PATCH -mm 2/2] kexec jump -v11: save/restore device state
On Thu, 2008-06-12 at 09:02 -0400, Vivek Goyal wrote:
[...]
> Few things I don't understand.
>
> - Are you saying that hibernated image will be saved in initrd
>   (rootfs.gz)? But that saving is only in RAM, we never write back
>   it to disk?

No. The hibernated image should be saved in a dedicated raw partition, as you said below.

> - I thought we probably have to dedicate a raw partition kind of thing
>   for saving image and then modify boot loader command line to something
>   similar to, "resume=partition". Then initrd can go hunting for image
>   in respective partition (as specified by command line parameter) and if
>   image is not available then continue with normal boot.

Yes. But the boot-loader command line only needs to be changed during system installation or hibernation setup; the location of the "hibernation partition" does not need to change frequently. So I think one boot-loader command line is sufficient for:

- normal boot
- normal boot of a system that is to be hibernated
- booting the helper system that restores the hibernated system

Best Regards,
Huang Ying
Re: [PATCH -mm 1/2] kexec jump -v11: kexec jump
On Thu, 2008-06-12 at 15:20 -0400, Vivek Goyal wrote:
> On Tue, Jun 10, 2008 at 03:15:04PM +0800, Huang, Ying wrote:
> > This patch provides an enhancement to kexec/kdump. It implements
> > the following features:
> >
> > - Backup/restore memory used by the original kernel before/after
> >   kexec.
> >
> > - Save/restore CPU state before/after kexec.
> >
> > The features of this patch can be used as a general method to call
> > program in physical mode (paging turning off). This can be used to
> > call BIOS code under Linux.
> >
> >
>
> Hi Huang,
>
> I was testing these patches and I get following error on my machine.
>
> Freezing remaining freezable tasks ... (elapsed 0.00 seconds) done.
> Suspending console(s) (use no_console_suspend to debug)
> PM: Device i8042 failed to freeze: error -22
> Restarting tasks ... done.
>
> Any idea why keyboard controller would not freeze?

Which kernel version do you use? Does the original hibernation work? It can be set up easily. Just add the following parameter to the kernel command line:

resume=/dev/

Where is the swap partition. Then execute the following commands:

echo reboot > /sys/power/disk
echo disk > /sys/power/state

Best Regards,
Huang Ying
Re: [PATCH -mm 2/2] kexec jump -v11: save/restore device state
On Wed, 2008-06-11 at 12:30 -0400, Vivek Goyal wrote:
[...]
> > Usage example of simple hibernation:
> >
> > 1. Compile and install patched kernel with following options selected:
> >
> >    CONFIG_X86_32=y
> >    CONFIG_RELOCATABLE=y
> >    CONFIG_KEXEC=y
> >    CONFIG_CRASH_DUMP=y
> >    CONFIG_PM=y
> >
> > 2. Build an initramfs image contains kexec-tool and makedumpfile, or
> >    download the pre-built initramfs image, called rootfs.gz in
> >    following text.
> >
> > 3. Prepare a partition to save memory image of original kernel, called
> >    hibernating partition in following text.
> >
> > 4. Boot kernel compiled in step 1 (kernel A).
> >
> > 5. In the kernel A, load kernel compiled in step 1 (kernel B) with
> >    /sbin/kexec. The shell command line can be as follow:
> >
> >    /sbin/kexec --load-preserve-context /boot/bzImage --mem-min=0x10
> >      --mem-max=0xff --initrd=rootfs.gz
> >
> > 6. Boot the kernel B with following shell command line:
> >
> >    /sbin/kexec -e
> >
> > 7. The kernel B will boot as normal kexec. In kernel B the memory
> >    image of kernel A can be saved into hibernating partition as
> >    follow:
> >
> >    jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep
> >      kexec_jump_back_entry | cut -d '='`
> >    echo $jump_back_entry > kexec_jump_back_entry
> >    cp /proc/vmcore dump.elf
> >
> >    Then you can shutdown the machine as normal.
> >
> > 8. Boot kernel compiled in step 1 (kernel C). Use the rootfs.gz as
> >    root file system.
> >
>
> One of the concerns raised by hibernation people in the past was to use
> single boot loader entry to boot normally as well while resuming a kernel.
>
> So in this case a user either needs to maintain two boot-loader entries
> or modify it on the fly. I wished there was a better way to handle that.

Two boot-loader entries are no longer needed; one is enough. Steps 4 and 8 can share the same boot-loader entry. In deployment, rootfs.gz can be the normal initramfs or initrd. In rootfs.gz, if there is a valid hibernation image, the hibernated system will be restored; otherwise, the normal boot process follows.

> I am more interested in ability to have multiple kernel loaded in RAM
> and capability to switch between them. Allows me to take non-disruptive
> core dumps and somebody wanted to snapshots the kernels. That should
> still work.
>
> > [..]
> > --- a/arch/x86/kernel/machine_kexec_32.c
> > +++ b/arch/x86/kernel/machine_kexec_32.c
> > @@ -125,6 +125,12 @@ void machine_kexec(struct kimage *image)
> >  	/* Interrupts aren't acceptable while we reboot */
> >  	local_irq_disable();
> >
> > +	if (image->preserve_context) {
> > +#ifdef CONFIG_X86_IO_APIC
> > +		disable_IO_APIC();
> > +#endif
>
> I think it would be a good idea to put some kind of comment here. We
> need to put APICs in legacy mode so that we can get timer interrupts
> in second kernel. kexec/kdump paths already have calls to
> disable_IO_APIC() in one form or other. kexec jump path also needs one.

OK. I will add it.

Best Regards,
Huang Ying
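Step 7 only pulls a single `key=value` token out of /proc/cmdline. Note that `cut -d '='` on its own is rejected by cut, which also needs a field list such as `-f 2` to print the value after the `=`. For illustration, the same lookup can be sketched in C; `cmdline_get()` is a hypothetical helper (it is not part of kexec-tools) and it assumes tokens are separated by spaces or newlines, with no quoting.

```c
#include <stddef.h>
#include <string.h>

/*
 * Look up "key=value" in a space-separated command line such as the
 * contents of /proc/cmdline and copy the value into out.
 * Returns 0 on success, -1 if the key is missing or out is too small.
 * Hypothetical helper for illustration; quoting is not handled.
 */
static int cmdline_get(const char *cmdline, const char *key,
		       char *out, size_t outlen)
{
	size_t klen = strlen(key);
	const char *p = cmdline;

	while (*p) {
		/* Skip separators between tokens. */
		while (*p == ' ' || *p == '\n')
			p++;
		if (*p == '\0')
			break;

		/* The current token is [p, end). */
		const char *end = p;
		while (*end != '\0' && *end != ' ' && *end != '\n')
			end++;

		size_t tlen = (size_t)(end - p);
		/* Match "key=" exactly, so "keyX=..." is not a hit. */
		if (tlen > klen && strncmp(p, key, klen) == 0 && p[klen] == '=') {
			size_t vlen = tlen - klen - 1;

			if (vlen + 1 > outlen)
				return -1;
			memcpy(out, p + klen + 1, vlen);
			out[vlen] = '\0';
			return 0;
		}
		p = end;
	}
	return -1;
}
```

Calling `cmdline_get(cmdline, "kexec_jump_back_entry", val, sizeof val)` leaves the value as a string in `val`; turning it into an address (for example with strtoul(val, NULL, 0)) is left to the caller.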
Re: [linux-pm] [PATCH -mm 2/2] kexec jump -v11: save/restore device state
On Wed, 2008-06-11 at 04:21 -0400, Len Brown wrote:
> On Wed, 11 Jun 2008, Huang, Ying wrote:
> > On Tue, 2008-06-10 at 14:01 -0400, Len Brown wrote:
> > >
> > > On Tue, 10 Jun 2008, Huang, Ying wrote:
> > >
> > > > This patch implements devices state save/restore before after kexec.
> > > >
> > > >
> > > > This patch together with features in kexec_jump patch can be used for
> > > > following:
> > > >
> > > > - A simple hibernation implementation without ACPI support. You can
> > > >   kexec a hibernating kernel, save the memory image of original system
> > > >   and shutdown the system. When resuming, you restore the memory image
> > > >   of original system via ordinary kexec load then jump back.
> > >
> > > What part of ACPI's role in hibernation are you trying to avoid
> > > 1. enabling wake devices
> > > 2. removing power from the system
> > > 3. something else?
> >
> > ACPI S5 is used instead of S4 for this simple hibernation
> > implementation. That is, before creating the hibernation image, the ACPI
> > _PTS is not executed, devices are not put into low power state and wake
> > devices are not enabled. After creating the hibernation image, the image
> > is saved to disk and system is shutdown (go to S5). When resuming from
> > hibernated image, ACPI _BFS and _WAK are not executed too.
>
> Doesn't that resume the devices and their drivers into an unknown state?

device_suspend(PMSG_FREEZE) and device_power_down(PMSG_FREEZE) are called before hibernation; device_power_up(PMSG_RESTORE) and device_resume(PMSG_RESTORE) are called after restore. The new hibernation/restore device driver callbacks introduced by Rafael are used. So I think the device and driver state will be saved and restored properly.

Support for ACPI S4 is planned for the future. But I think a hibernation scheme without ACPI may be useful in some situations too.

Best Regards,
Huang Ying
Re: [linux-pm] [PATCH -mm 2/2] kexec jump -v11: save/restore device state
On Tue, 2008-06-10 at 14:01 -0400, Len Brown wrote:
>
> On Tue, 10 Jun 2008, Huang, Ying wrote:
>
> > This patch implements devices state save/restore before after kexec.
> >
> >
> > This patch together with features in kexec_jump patch can be used for
> > following:
> >
> > - A simple hibernation implementation without ACPI support. You can
> >   kexec a hibernating kernel, save the memory image of original system
> >   and shutdown the system. When resuming, you restore the memory image
> >   of original system via ordinary kexec load then jump back.
>
> What part of ACPI's role in hibernation are you trying to avoid
> 1. enabling wake devices
> 2. removing power from the system
> 3. something else?

ACPI S5 is used instead of S4 for this simple hibernation implementation. That is, before creating the hibernation image, the ACPI _PTS method is not executed, devices are not put into a low power state, and wake devices are not enabled. After creating the hibernation image, the image is saved to disk and the system is shut down (goes to S5). When resuming from the hibernated image, ACPI _BFS and _WAK are not executed either.

Best Regards,
Huang Ying
[PATCH -mm 2/2] kexec jump -v11: save/restore device state
rnel 2.6.26-rc5-mm1, and has been tested on IBM T42 with ACPI on and off.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 arch/x86/kernel/machine_kexec_32.c |    6 +
 include/linux/suspend.h            |    2 +
 kernel/kexec.c                     |   43 -
 kernel/power/power.h               |    2 -
 4 files changed, 50 insertions(+), 3 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -25,6 +25,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
 #include
 #include
@@ -1427,8 +1431,34 @@ module_init(crash_save_vmcoreinfo_init)
 int kexec_jump(struct kimage *image)
 {
+	int error = 0;
+
+	mutex_lock(&pm_mutex);
 	if (image->preserve_context) {
+		pm_prepare_console();
+		error = freeze_processes();
+		if (error) {
+			error = -EBUSY;
+			goto Exit;
+		}
+		suspend_console();
+		error = device_suspend(PMSG_FREEZE);
+		if (error)
+			goto Resume_console;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Resume_devices;
 		local_irq_disable();
+		/* At this point, device_suspend() has been called,
+		 * but *not* device_power_down(). We *must*
+		 * device_power_down() now.  Otherwise, drivers for
+		 * some devices (e.g. interrupt controllers) become
+		 * desynchronized with the actual state of the
+		 * hardware at resume time, and evil weirdness ensues.
+		 */
+		error = device_power_down(PMSG_FREEZE);
+		if (error)
+			goto Enable_irqs;
 		save_processor_state();
 	}
@@ -1436,7 +1466,18 @@ int kexec_jump(struct kimage *image)
 	if (image->preserve_context) {
 		restore_processor_state();
+		device_power_up(PMSG_RESTORE);
+ Enable_irqs:
 		local_irq_enable();
+		enable_nonboot_cpus();
+ Resume_devices:
+		device_resume(PMSG_RESTORE);
+ Resume_console:
+		resume_console();
+		thaw_processes();
+ Exit:
+		pm_restore_console();
 	}
-	return 0;
+	mutex_unlock(&pm_mutex);
+	return error;
 }
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo
 extern int pfn_is_nosave(unsigned long);
-extern struct mutex pm_mutex;
-
 #define power_attr(_name) \
 static struct kobj_attribute _name##_attr = {	\
 	.attr	= {				\
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -266,4 +266,6 @@ static inline void register_nosave_regio
 }
 #endif
+extern struct mutex pm_mutex;
+
 #endif /* _LINUX_SUSPEND_H */
--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -125,6 +125,12 @@ void machine_kexec(struct kimage *image)
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 
+	if (image->preserve_context) {
+#ifdef CONFIG_X86_IO_APIC
+		disable_IO_APIC();
+#endif
+	}
+
 	control_page = page_address(image->control_code_page);
 	memcpy(control_page, relocate_kernel, PAGE_SIZE/2);
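kexec_jump() above follows the usual kernel goto-unwind layout: each failure jumps to a label that undoes only the steps that had already succeeded, in reverse order of setup. The following is a user-space model, not kernel code: every kernel call is replaced by a hypothetical logging stub, `step()`, so the call order on the success path and on each failure path can be checked. It assumes `image->preserve_context` is set.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define LOG_MAX 32

/* Names of the stubbed calls, in the order they ran. */
static const char *call_log[LOG_MAX];
static int log_len;
/* Name of the call that should fail, or NULL for the success path. */
static const char *fail_at;

static void model_reset(const char *fail)
{
	log_len = 0;
	fail_at = fail;
}

/* Hypothetical stand-in for every kernel call: log it, maybe fail. */
static int step(const char *name)
{
	if (log_len < LOG_MAX)
		call_log[log_len++] = name;
	return (fail_at != NULL && strcmp(fail_at, name) == 0) ? -1 : 0;
}

/* Same goto layout as kexec_jump() with image->preserve_context set. */
static int kexec_jump_model(void)
{
	int error;

	step("mutex_lock");
	step("pm_prepare_console");
	error = step("freeze_processes");
	if (error) {
		error = -EBUSY;
		goto Exit;
	}
	step("suspend_console");
	error = step("device_suspend");
	if (error)
		goto Resume_console;
	error = step("disable_nonboot_cpus");
	if (error)
		goto Resume_devices;
	step("local_irq_disable");
	error = step("device_power_down");
	if (error)
		goto Enable_irqs;
	step("save_processor_state");
	/* ... machine_kexec() jumps away and eventually comes back here ... */
	step("restore_processor_state");
	step("device_power_up");
Enable_irqs:
	step("local_irq_enable");
	step("enable_nonboot_cpus");
Resume_devices:
	step("device_resume");
Resume_console:
	step("resume_console");
	step("thaw_processes");
Exit:
	step("pm_restore_console");
	step("mutex_unlock");
	return error;
}
```

For example, after `model_reset("device_suspend")` the model logs nine calls: the four setup steps, the failing device_suspend, then only resume_console, thaw_processes, pm_restore_console and mutex_unlock. The disabled CPUs and interrupts are never touched because they were never changed.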
[PATCH -mm 1/2] kexec jump -v11: kexec jump
This patch provides an enhancement to kexec/kdump. It implements the following features:

- Backup/restore memory used by the original kernel before/after kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call a program in physical mode (with paging turned off). This can be used to call BIOS code under Linux.

kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be downloaded from the following URLs:

source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and returning:

1. Compile and install the patched kernel with the following options selected:

   CONFIG_X86_32=y
   CONFIG_KEXEC=y

2. Build the patched kexec-tools or download the pre-built one.

3. Build a physical mode executable, named for example "phy_mode".

4. Boot the kernel compiled in step 1.

5. Load the physical mode executable with /sbin/kexec. The shell command line can be as follows:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call the physical mode executable with the following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory, one shadow backup page (source page) is allocated for each page used by the kexeced code image (destination page). At kexec_load time, the image of the kexeced code is loaded into the source pages; before executing it, the destination pages and the source pages are swapped, so the contents of the destination pages are backed up. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too.

The C ABI (calling convention) is used as the communication protocol between the kernel and the called code.
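The page swap in the implementation notes can be modelled in user space. This is a sketch under assumptions: 4 KiB pages (`MODEL_PAGE_SIZE`), ordinary buffers standing in for physical pages, and hypothetical function names; it is not the patch's implementation. Because a swap is its own inverse, running the same pass before the jump and after the jump back is what backs up and then restores both sets of pages.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_PAGE_SIZE 4096 /* assumed page size for this sketch */

/* Exchange the contents of two pages through a bounce buffer. */
static void swap_page(unsigned char *a, unsigned char *b)
{
	unsigned char tmp[MODEL_PAGE_SIZE];

	memcpy(tmp, a, MODEL_PAGE_SIZE);
	memcpy(a, b, MODEL_PAGE_SIZE);
	memcpy(b, tmp, MODEL_PAGE_SIZE);
}

/*
 * src[i] (source page) holds the kexeced image page that must run at
 * dst[i] (destination page); dst[i] holds the original kernel's data.
 * One pass puts the image in place and parks the original data in
 * src[i]; a second pass restores both.
 */
static void swap_image_pages(unsigned char **src, unsigned char **dst,
			     size_t n)
{
	for (size_t i = 0; i < n; i++)
		swap_page(src[i], dst[i]);
}
```

Applying swap_image_pages() twice leaves every page as it started, which is why no separately reserved memory region is needed for the backup.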
A flag named KEXEC_PRESERVE_CONTEXT is added to sys_kexec_load to indicate that the loaded kernel image is used for jumping back.

ChangeLog:

v10:
- Device state save/restore related code is split into another patch because it depends on the device hibernation/restore callbacks and is prone to change.
- The C ABI (calling convention) is used as the communication protocol between the kernel and the called code.
- Code cleanup: CPU state save/restore code goes into relocate_kernel().

v9:
- pm_mutex is locked during kexec jump to avoid potential conflicts between kexec jump and suspend/resume/hibernation.
- Split /dev/oldmem writing and the kimagecore patch out; keep only the core function.

v8:
- Split the kexec jump patchset from the kexec based hibernation patchset.
- Merge the various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT because there is no need for such fine-grained control.
- Delete the variable argument based "kernel to kernel" communication mechanism from the basic kexec jump patchset.

v7:
- Refactor kexec jump to be a command driven programming model.
- Use kexec_lock to do synchronization.

v6:
- Refactor kexec jump to be a general facility to call real mode code.

v5:
- A flag (KEXEC_JUMP_BACK) is added to indicate that the loaded kernel image is used for jumping back. The reboot command for jumping back is removed. This interface is more stable (proposed by Eric Biederman).
- NX bit handling support for kexec is added.
- Merge machine_kexec and machine_kexec_jump; remove the NO_RET attribute from machine_kexec.
- Pass the jump back entry to the kexeced kernel via the kernel command line (parsed by a user space tool via /proc/cmdline instead of by the kernel). The original corresponding boot parameter and sysfs code are removed.

v4:
- The two reboot commands are merged back into one because the underlying implementation is the same.
- Jumping without reserving memory is implemented. As a side effect, jumping in both directions is implemented.
- A jump back protocol is defined and documented. The original kernel and the kexeced kernel are more independent of each other.
- The CPU state save/restore code is merged into relocate_kernel.S.

v3:
- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot commands to reflect the different functions.
- Documentation is added for the added kernel parameters.
- /sys/kernel/kexec_jump_buf_pfn is made writable; it is used for memory image restoring.
- Console restoring after jumping back is implemented.

v2:
- The kexec jump implementation is put into the kexec/kdump framework instead of the software suspend framework. The device and CPU state save/restore code of software suspend is called when needed.
- The same code path is used both for kexecing a new kernel and for jumping back to the original kernel.

Now, only the i386 architecture is supported. The patchset is based on Linux kernel 2.6.26-rc5-mm1, and has been tested on IBM T42.

Signe
Re: [PATCH -mm] kexec jump -v9
On Tue, 2008-05-27 at 18:15 -0400, Vivek Goyal wrote:
[...]
> > But, because IOAPIC may need to be in original state during
> > suspend/resume, so it is not appropriate to call disable_IO_APIC() in
> > ioapic_suspend(). So I think we can call disable_IO_APIC() in new
> > hibernation/restore callback.
>
> My hunch is suspend/resume will still work if we put this call in
> ioapic_suspend() but I would not recommend that. suspend/resume does
> not need to put IOAPIC in legacy mode.
>
> I am not sure what is "new hibernation/restore callback"? Are you
> referring to new patches from Rafel?

Yes. Rafael has a new patch that separates the suspend and hibernation device callbacks:

http://kerneltrap.org/Linux/Separating_Suspend_and_Hibernation

> I think this issue is specifc to kexec and kjump so probably we should
> not tweaking any suspend/resume related bit.
>
> How about calling disable_IO_APIC() in kexec_jump()? We can probably even
> optimize it by calling it only when we are transitioning into new image for
> the first time and not for subsquent transitions (by keeping some kind of
> count in kimage). This is little hackish but, should work...

Yes. This issue is kexec/kjump specific, so we can call it in kexec_jump(). Maybe we also need to call something else in native_machine_shutdown()?

BTW: I have a new version, -v10: http://lkml.org/lkml/2008/5/22/106. Do you have time to review it?

Best Regards,
Huang Ying
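Vivek's idea of disabling the IO-APIC only on the first transition into the new image can be sketched as a counter check. This is a hedged user-space model: `jump_count` is a hypothetical field (struct kimage has no such member in these patches), and disable_IO_APIC() is replaced by a stub that only counts its calls. As posted, the patch calls disable_IO_APIC() on every jump with preserve_context set.

```c
#include <stdbool.h>

/* Hypothetical subset of struct kimage: only what this sketch needs. */
struct kimage_model {
	bool preserve_context;
	unsigned int jump_count; /* hypothetical: jumps made into the image */
};

static unsigned int ioapic_disable_calls; /* how often the stub ran */

/* Stand-in for disable_IO_APIC(); here it only records the call. */
static void disable_IO_APIC_stub(void)
{
	ioapic_disable_calls++;
}

/*
 * Put the IO-APIC into legacy mode only on the first jump into the
 * image. jump_count is bumped only for preserve_context images,
 * because && short-circuits when preserve_context is false.
 */
static void machine_kexec_model(struct kimage_model *image)
{
	if (image->preserve_context && image->jump_count++ == 0)
		disable_IO_APIC_stub();
}
```

Whether skipping the call on later jumps is actually safe depends on what state the other kernel leaves the IO-APIC in, which is why the thread treats this as a possible optimization rather than a requirement.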
Re: [PATCH -mm] kexec jump -v9
On Thu, 2008-05-15 at 20:51 -0400, Vivek Goyal wrote:
> On Thu, May 15, 2008 at 01:41:50PM +0800, Huang, Ying wrote:
> > Hi, Vivek,
> >
> > On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote:
> > [...]
> > > Ok, I have done some testing on this patch. Currently I have just
> > > tested switching back and forth between two kernels and it is working for
> > > me.
> > >
> > > Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. Few
> > > comments/questions are inline.
> >
> > It seems that for LAPIC and IOAPIC, there is
> > lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(),
> > which will be called before/after kexec jump through
> > device_power_down()/device_power_up(). So, the mechanism for
> > LAPIC/IOAPIC is there, we may need to check the corresponding
> > implementation.
> >
>
> ioapic_suspend() is not putting APICs in Legacy mode and that's why
> we are seeing the issue. It only saves the IOAPIC routing table entries
> and these entries are restored during ioapic_resume().
>
> But I think somebody has to put APICs in legacy mode for normal
> hibernation also. Not sure who does it. May be BIOS, so that during
> resume, second kernel can get the timer interrupts.

As for IOAPIC legacy mode, is it related to the following code, which sets the routing table entry for the i8259?

void disable_IO_APIC(void)
{
	/*
	 * Clear the IO-APIC before rebooting:
	 */
	clear_IO_APIC();

	/*
	 * If the i8259 is routed through an IOAPIC
	 * Put that IOAPIC in virtual wire mode
	 * so legacy interrupts can be delivered.
	 */
	if (ioapic_i8259.pin != -1) {
		struct IO_APIC_route_entry entry;

		memset(&entry, 0, sizeof(entry));
		entry.mask            = 0; /* Enabled */
		entry.trigger         = 0; /* Edge */
		entry.irr             = 0;
		entry.polarity        = 0; /* High */
		entry.delivery_status = 0;
		entry.dest_mode       = 0; /* Physical */
		entry.delivery_mode   = dest_ExtINT; /* ExtInt */
		entry.vector          = 0;
		entry.dest.physical.physical_dest =
			GET_APIC_ID(apic_read(APIC_ID));

		/*
		 * Add it to the IO-APIC irq-routing table:
		 */
		ioapic_write_entry(ioapic_i8259.apic, ioapic_i8259.pin, entry);
	}
	disconnect_bsp_APIC(ioapic_i8259.pin != -1);
}

But because the IOAPIC may need to stay in its original state during suspend/resume, it is not appropriate to call disable_IO_APIC() in ioapic_suspend(). So I think we can call disable_IO_APIC() in the new hibernation/restore callback. Am I right?

Best Regards,
Huang Ying
[PATCH 1/2] kexec jump -v10: kexec jump (resend, fix subject, please ignore previous one)
This patch provides an enhancement to kexec/kdump. It implements the following features:

- Backup/restore memory used by the original kernel before/after kexec.

- Save/restore CPU state before/after kexec.

The features of this patch can be used as a general method to call a program in physical mode (with paging turned off). This can be used to call BIOS code under Linux.

kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be downloaded from the following URLs:

source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2
patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2
binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10

Usage example of calling some physical mode code and returning:

1. Compile and install the patched kernel with the following options selected:

   CONFIG_X86_32=y
   CONFIG_KEXEC=y

2. Build the patched kexec-tools or download the pre-built one.

3. Build a physical mode executable, named for example "phy_mode".

4. Boot the kernel compiled in step 1.

5. Load the physical mode executable with /sbin/kexec. The shell command line can be as follows:

   /sbin/kexec --load-preserve-context --args-none phy_mode

6. Call the physical mode executable with the following shell command line:

   /sbin/kexec -e

Implementation point:

To support jumping without reserving memory, one shadow backup page (source page) is allocated for each page used by the kexeced code image (destination page). At kexec_load time, the image of the kexeced code is loaded into the source pages; before executing it, the destination pages and the source pages are swapped, so the contents of the destination pages are backed up. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too.

The C ABI (calling convention) is used as the communication protocol between the kernel and the called code.

A flag named KEXEC_PRESERVE_CONTEXT is added to sys_kexec_load to indicate that the loaded kernel image is used for jumping back.

ChangeLog:

v10:
- Device state save/restore related code is split into another patch because it depends on the device hibernation/restore callbacks and is prone to change.
- The C ABI (calling convention) is used as the communication protocol between the kernel and the called code.
- Code cleanup: CPU state save/restore code goes into relocate_kernel().

v9:
- pm_mutex is locked during kexec jump to avoid potential conflicts between kexec jump and suspend/resume/hibernation.
- Split /dev/oldmem writing and the kimagecore patch out; keep only the core function.

v8:
- Split the kexec jump patchset from the kexec based hibernation patchset.
- Merge the various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT because there is no need for such fine-grained control.
- Delete the variable argument based "kernel to kernel" communication mechanism from the basic kexec jump patchset.

v7:
- Refactor kexec jump to be a command driven programming model.
- Use kexec_lock to do synchronization.

v6:
- Refactor kexec jump to be a general facility to call real mode code.

v5:
- A flag (KEXEC_JUMP_BACK) is added to indicate that the loaded kernel image is used for jumping back. The reboot command for jumping back is removed. This interface is more stable (proposed by Eric Biederman).
- NX bit handling support for kexec is added.
- Merge machine_kexec and machine_kexec_jump; remove the NO_RET attribute from machine_kexec.
- Pass the jump back entry to the kexeced kernel via the kernel command line (parsed by a user space tool via /proc/cmdline instead of by the kernel). The original corresponding boot parameter and sysfs code are removed.

v4:
- The two reboot commands are merged back into one because the underlying implementation is the same.
- Jumping without reserving memory is implemented. As a side effect, jumping in both directions is implemented.
- A jump back protocol is defined and documented. The original kernel and the kexeced kernel are more independent of each other.
- The CPU state save/restore code is merged into relocate_kernel.S.

v3:
- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot commands to reflect the different functions.
- Documentation is added for the added kernel parameters.
- /sys/kernel/kexec_jump_buf_pfn is made writable; it is used for memory image restoring.
- Console restoring after jumping back is implemented.

v2:
- The kexec jump implementation is put into the kexec/kdump framework instead of the software suspend framework. The device and CPU state save/restore code of software suspend is called when needed.
- The same code path is used both for kexecing a new kernel and for jumping back to the original kernel.

Now, only the i386 architecture is supported. The patchset is based on Linux kernel 2.6.26-rc3, and has been tested on IBM T42.

Signed-off-by
[PATCH 2/2] kexec jump -v10: save/restore device state
d is limited, a hibernation image with many segments may not be loadable. This is planned to be eliminated by adding a new flag to sys_kexec_load so that an image can be loaded with multiple sys_kexec_load invocations.

ChangeLog:

v10:
- Split from the original kexec_jump patch.

Now, only the i386 architecture is supported. The patchset is based on Linux kernel 2.6.26-rc3, and has been tested on IBM T42 with ACPI on and off.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>
---
 include/linux/suspend.h |    2 ++
 kernel/kexec.c          |   43 ++-
 kernel/power/power.h    |    2 --
 3 files changed, 44 insertions(+), 3 deletions(-)

--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -25,6 +25,10 @@
 #include
 #include
 #include
+#include
+#include
+#include
+#include
 #include
 #include
@@ -1427,8 +1431,34 @@ module_init(crash_save_vmcoreinfo_init)
 int kexec_jump(struct kimage *image)
 {
+	int error = 0;
+
+	mutex_lock(&pm_mutex);
 	if (image->preserve_context) {
+		pm_prepare_console();
+		error = freeze_processes();
+		if (error) {
+			error = -EBUSY;
+			goto Exit;
+		}
+		suspend_console();
+		error = device_suspend(PMSG_FREEZE);
+		if (error)
+			goto Resume_console;
+		error = disable_nonboot_cpus();
+		if (error)
+			goto Resume_devices;
 		local_irq_disable();
+		/* At this point, device_suspend() has been called,
+		 * but *not* device_power_down(). We *must*
+		 * device_power_down() now.  Otherwise, drivers for
+		 * some devices (e.g. interrupt controllers) become
+		 * desynchronized with the actual state of the
+		 * hardware at resume time, and evil weirdness ensues.
+		 */
+		error = device_power_down(PMSG_FREEZE);
+		if (error)
+			goto Enable_irqs;
 		save_processor_state();
 	}
@@ -1436,7 +1466,18 @@ int kexec_jump(struct kimage *image)
 	if (image->preserve_context) {
 		restore_processor_state();
+		device_power_up();
+ Enable_irqs:
 		local_irq_enable();
+		enable_nonboot_cpus();
+ Resume_devices:
+		device_resume();
+ Resume_console:
+		resume_console();
+		thaw_processes();
+ Exit:
+		pm_restore_console();
 	}
-	return 0;
+	mutex_unlock(&pm_mutex);
+	return error;
 }
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -53,8 +53,6 @@ extern int hibernation_platform_enter(vo
 extern int pfn_is_nosave(unsigned long);
-extern struct mutex pm_mutex;
-
 #define power_attr(_name) \
 static struct kobj_attribute _name##_attr = {	\
 	.attr	= {				\
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -266,4 +266,6 @@ static inline void register_nosave_regio
 }
 #endif
+extern struct mutex pm_mutex;
+
 #endif /* _LINUX_SUSPEND_H */
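The segment limit mentioned at the top of this patch means a large hibernation image would have to be split across several sys_kexec_load calls. The multi-stage flag does not exist yet, so the following is only a sketch of the bookkeeping such a load would need. `plan_stages()` is a hypothetical helper, and the per-call limit of 16 is an assumption based on the value of KEXEC_SEGMENT_MAX in kernel headers of this era.

```c
#include <stddef.h>

/*
 * Assumed per-call segment limit (KEXEC_SEGMENT_MAX is 16 in kernel
 * headers of this era; check the headers you actually build against).
 */
#define SEGMENTS_PER_CALL 16

/*
 * Plan how a hypothetical multi-stage load would split nr_segments
 * across several sys_kexec_load() calls. sizes[i] receives the number
 * of segments passed in call i. Returns the number of calls needed,
 * or 0 if there is nothing to load or max_calls is too small.
 */
static size_t plan_stages(size_t nr_segments, size_t *sizes,
			  size_t max_calls)
{
	/* Round up: a partial batch still needs its own call. */
	size_t calls = (nr_segments + SEGMENTS_PER_CALL - 1) / SEGMENTS_PER_CALL;

	if (calls == 0 || calls > max_calls)
		return 0;
	for (size_t i = 0; i < calls; i++) {
		size_t left = nr_segments - i * SEGMENTS_PER_CALL;

		sizes[i] = left < SEGMENTS_PER_CALL ? left : SEGMENTS_PER_CALL;
	}
	return calls;
}
```

With 40 segments this plans three calls of 16, 16 and 8 segments; only the last call would need to mark the image as complete.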
[PATCH] kexec jump -v10: kexec jump
This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. 
A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. ChangeLog: v10: - Device state save/restore related code is split into another patch because it depends on devices hibernation/restore callback and prone to be changed. - C ABI (calling convention) is used as communication protocol between kernel and called code. - Code cleanup: CPU state save/restore code goes in relocate_kernel(). v9: - pm_mutex is locked during kexec jump to avoid potential conflict between kexec jump and suspend/resume/hibernation. - Split /dev/oldmem writing and kimagecore patch out, keep only the core function. v8: - Split kexec jump patchset from kexec based hibernation patchset. - Merge various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT because there is no need for such subtle control. - Delete variable argument based "kernel to kernel" communication mechanism from basic kexec jump patchset. v7: - Refactor kexec jump to be a command driven programming model. - Use kexec_lock to do synchronization. v6: - Refactor kexec jump to be a general facility to call real mode code. v5: - A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel image is used for jumping back. The reboot command for jumping back is removed. This interface is more stable (proposed by Eric Biederman). - NX bit handling support for kexec is added. - Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute from machine_kexec. - Passing jump back entry to kexeced kernel via kernel command line (parsed by user space tool via /proc/cmdline instead of kernel). Original corresponding boot parameter and sysfs code is removed. v4: - Two reboot command are merged back to one because the underlying implementation is same. - Jumping without reserving memory is implemented. As a side effect, two direction jumping is implemented. - A jump back protocol is defined and documented. 
The original kernel and the kexeced kernel are more independent of each other. - The CPU state save/restore code is merged into relocate_kernel.S. v3: - The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot commands to reflect the different functions. - Documentation is added for the new kernel parameters. - /sys/kernel/kexec_jump_buf_pfn is made writable; it is used for memory image restoring. - Console restoring after jumping back is implemented. v2: - The kexec jump implementation is put into the kexec/kdump framework instead of the software suspend framework. The device and CPU state save/restore code of software suspend is called when needed. - The same code path is used both for kexecing a new kernel and for jumping back to the original kernel. Currently, only the i386 architecture is supported. The patchset is based on Linux kernel 2.6.26-rc3, and has been tested on an IBM T42. Signed-off-by
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, 2008-05-15 at 19:55 -0700, Eric W. Biederman wrote: > "Huang, Ying" <[EMAIL PROTECTED]> writes: > > > The disadvantage of this solution is that kernel B must know it is > > original kernel (A) or kexeced kernel (B). Different code should be used > > by kernel A and kernel B. And after jump from A to B, jump from B to A, > > when jump from A to B again, kernel A must use different code from the > > first time. > > I don't know what the case is for keeping two kernels in memory and switching > between them. This can be used to save the memory image of kernel B and to accelerate hibernation. The real boot of kernel B is only needed the first time. > I suspect a small piece of trampoline code between the two kernels could > handle the case. (i.e. purgatory pays attention). > > That is a fundamental aspect of the design. A general purpose infrastructure > with trampoline code to adapt it to whatever situation comes up. It is possible to use purgatory to deal with this problem. Jump from kernel A to kernel B: jump to the entry of purgatory (purgatory_entry); purgatory saves the return address (kexec_jump_back_entry_A); purgatory sets kexec_jump_back_entry for kernel B to a code segment in purgatory, say kexec_jump_back_entry_A_for_B; purgatory jumps to the entry point of kernel B. Jump from kernel B to kernel A: jump to purgatory (kexec_jump_back_entry_A_for_B); purgatory saves the return address (kexec_jump_back_entry_B); purgatory returns to kernel A (kexec_jump_back_entry_A). Jump from kernel A to kernel B again: jump to the entry of purgatory (purgatory_entry); purgatory saves the return address (kexec_jump_back_entry_A); purgatory jumps to kexec_jump_back_entry_B. The disadvantage of this solution is that some information is saved in purgatory (kexec_jump_back_entry_A, kexec_jump_back_entry_B), so purgatory must be saved too when saving the memory image of kernel A or kernel B. Purgatory can be seen as a part of kernel B, but it is a little tricky to think of it as a part of kernel A too.
Best Regards, Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, 2008-05-15 at 19:25 -0700, Eric W. Biederman wrote: > "Huang, Ying" <[EMAIL PROTECTED]> writes: > > > On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote: > > [...] > >> 2) After we figure out our address read the stack pointer from > >>a fixed location and simply set it. (This is my preference) > > > > Just for confirmation (My English is poor). > > > > Do you mean that kernel A just read the stack top as re-entry point, > > regardless of whether it is return address or argument 1? > > What I was thinking was: > > In kernel A() > > relocate_new_kernel: > > ... > > call *%eax > > kexec_jump_back_entry: > /* This code should be PIC so figure out where we are */ > call 1f > 1: > popl %edi > subl $(1b - relocate_kernel), %edi > > /* Setup a safe stack */ > leal PAGE_SIZE(%edi), %esp > ... > > > Then in purgatory we can read the address of kexec_jump_back_entry > by examining 0(%esp) and export it in whatever fashion is sane. > > However we reach kexec_jump_back_entry we should be fine. I think it is reasonable to enable jumping back and forth more than once, so the following should be possible: 1. Jump from A to B (actually a jump to purgatory, which triggers the boot of B) 2. Jump from B to A 3. Jump from A to B again (a jump to the kexec_jump_back_entry of B) 4. Jump from B to A ... So it should also be possible to get the re-entry point of kernel B in the kexec_jump_back_entry of kernel A. Therefore I think that in kexec_jump_back_entry, the caller's stack should be checked to get the re-entry point of the peer. And the stack state differs depending on where control comes from: relocate_new_kernel() or a return. Best Regards, Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, 2008-05-15 at 22:00 -0400, Vivek Goyal wrote: [...] > IMHO, this kind of make more sense to me when keeping C function like > semantics in mind. > > Both the cases can be treated like calls to functions (calling BIOS function > and jumping to kernel B). The basic difference between two cases is the > re-entry point. In BIOS function case, we always re-enter the function at the > start but in case of kernel B, except first entry, all other entries happen > at a run time determined address, which needs to be communicated to kernel A. > > I would think that second kernel B just should execute "ret" and new entry > address of kernel B is passed to kernel A through %eax (return value of > function). The disadvantage of this solution is that a kernel must know whether it is the original kernel (A) or the kexeced kernel (B), and different code must be used by kernel A and kernel B. Moreover, after jumping from A to B and then from B to A, when jumping from A to B again, kernel A must use different code than it did the first time. Best Regards, Huang Ying
Re: [PATCH -mm] kexec jump -v9
On Thu, 2008-05-15 at 21:51 -0400, Vivek Goyal wrote: > On Fri, May 16, 2008 at 09:48:34AM +0800, Huang, Ying wrote: > > On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote: > > [...] > > > Ok, You want to make BIOS calls. We already do that using vm86 mode and > > > use bios real mode interrupts. So why do we need this interface? Or, IOW, > > > how is this interface better? > > > > It can call code in 32-bit physical mode in addition to real mode. So It > > can be used to call EFI runtime service, especially call EFI 64 runtime > > service under 32-bit kernel or vice versa. > > > > The main purpose of kexec jump is for hibernation. But I think if the > > effort is small, why not support general 32-bit physical mode code call > > at same time. > > > > In general what's the environment requirements for EFI runtime > services? I mean, just that processor should be in protected mode with > paging disabled or one need to stop all other cpus and devices and then make > the call (as we are doing in this case?). Putting the processor in protected mode with paging disabled is sufficient. In one of the previous kexec jump versions, I provided options to choose which state is saved (whether to stop other CPUs, whether to stop devices). I agree that we should now focus on kexec based hibernation. But I think it is reasonable to keep this possibility open with minimal effort. Best Regards, Huang Ying
Re: [PATCH -mm] kexec jump -v9
On Thu, 2008-05-15 at 18:35 -0700, Eric W. Biederman wrote: > Vivek Goyal <[EMAIL PROTECTED]> writes: > > > ioapic_suspend() is not putting APICs in Legacy mode and that's why > > we are seeing the issue. It only saves the IOAPIC routing table entries > > and these entries are restored during ioapic_resume(). > > > > But I think somebody has to put APICs in legacy mode for normal > > hibernation also. Not sure who does it. May be BIOS, so that during > > resume, second kernel can get the timer interrupts. > > I doubt anything cares in the suspend to ram case. There should just > be a small BIOS trampoline to get back to linux when the processor > restarts. And you don't need interrupts for any of that. As far as I know, in suspend to RAM, interrupts are used as wake-up events, for example the keyboard interrupt. Best Regards, Huang Ying
Re: [PATCH -mm] kexec jump -v9
On Thu, 2008-05-15 at 16:09 -0400, Vivek Goyal wrote: [...] > Ok, You want to make BIOS calls. We already do that using vm86 mode and > use bios real mode interrupts. So why do we need this interface? Or, IOW, > how is this interface better? It can call code in 32-bit physical mode in addition to real mode, so it can be used to call EFI runtime services, in particular 64-bit EFI runtime services under a 32-bit kernel or vice versa. The main purpose of kexec jump is hibernation. But if the effort is small, why not support calling general 32-bit physical mode code at the same time? Best Regards, Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Thu, 2008-05-15 at 11:39 -0700, Eric W. Biederman wrote: [...] > 2) After we figure out our address read the stack pointer from > a fixed location and simply set it. (This is my preference) Just to confirm (my English is poor): do you mean that kernel A just reads the stack top as the re-entry point, regardless of whether it is a return address or argument 1? Best Regards, Huang Ying
Re: [PATCH -mm] kexec jump -v9
Hi, Vivek, On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote: [...] > Ok, I have done some testing on this patch. Currently I have just > tested switching back and forth between two kernels and it is working for > me. > > Just that I had to put LAPIC and IOAPIC in legacy mode for it to work. Few > comments/questions are inline. It seems that for the LAPIC and IOAPIC there are lapic_suspend()/lapic_resume() and ioapic_suspend()/ioapic_resume(), which are called before/after kexec jump through device_power_down()/device_power_up(). So the mechanism for the LAPIC/IOAPIC is already there; we may need to check the corresponding implementation. Best Regards, Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote: [...] > Then as a preliminary design let's plan on this. > > - Pass the rentry point as the return address (using the C ABI). > We may want to load the stack pointer etc so we can act as > a direct entry point for new code. There are some issues with passing the entry point as the return address. The kexec jump (or kexec with return) is used for: - switching between the original kernel (A) and the kexeced kernel (B); - calling some code (such as BIOS code) in physical mode. 1) When calling code in physical mode, the called code can use a simple return to get back to kernel A. So after returning to kernel A there is no return address on the stack; instead, argument 1 is at the top of the stack. 2) When switching back from kernel B to kernel A, kernel B calls the jump back entry of kernel A using the C ABI. So the return address is at the top of the stack, and kernel A gets the jump back entry of kernel B from that return address. Because the stack state differs between 1) and 2), the jump back entry of kernel A must distinguish between them. Possible solutions are as follows: a) Before kernel A calls physical mode code or kernel B, it sets argument 1 to a magic number that cannot be a return address (such as -1). The jump back entry of kernel A can then check whether the stack top is argument 1 or a return address. b) Distinguish by return value. For example, the called physical mode code must return 0, while kernel B must set %eax to some other number. c) Use different entry points for 1) and 2). The two entry points are derived from the return address, for example: entry1 = return_address; entry2 = return_address & ~0xfff; /* page aligned */ entry1 is used by physical mode code; entry2 is used by kernel B. Which one is better? Or is there some other solution? Best Regards, Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Wed, 2008-05-14 at 14:43 -0700, Eric W. Biederman wrote: > "Huang, Ying" <[EMAIL PROTECTED]> writes: > > >> So, IMHO, for first simple implementation, we don't have to pass around > >> any data between kernels except entry point. (Please correct me if I am > >> wrong). Lets get that implementation in first and then we can get rest > >> of the pieces in place. > > > > Yes. Kernel entry/re-entry point is the only information need to be > > communicated between kernels for just switching between them. So we can > > focus on kexec jump patch firstly. > > Then as a preliminary design let's plan on this. > > - Pass the rentry point as the return address (using the C ABI). > We may want to load the stack pointer etc so we can act as > a direct entry point for new code. OK, I will try to do this. > - Look at passing a pointer to the mapping of pages that the kexec > trampoline uses in arg1 of the C ABI. Largely the format is defacto > fixed anyway because we need to pass the structure from C to > assembly. Do you mean passing image->head to the purgatory of /sbin/kexec using arg1 of the C ABI? > Using the standard C ABI makes things much it much easier to pick > a calling convention, and to document it. Yes. Best Regards, Huang Ying
Re: [PATCH -mm] kexec jump -v9
On Wed, 2008-05-14 at 16:52 -0400, Vivek Goyal wrote: [...] > Ok, I have done some testing on this patch. Currently I have just > tested switching back and forth between two kernels and it is working for > me. Thanks. [...] > > +/* > > + * Entry point for jumping back from kexeced kernel, the paging is > > + * turned off. > > + */ > > +kexec_jump_back_entry: > > + call 1f > > +1: > > + popl %ebx > > + subl $(1b - kexec_relocate_page), %ebx > > + movl %edi, KJUMP_ENTRY_OFF(%ebx) > > + movl CP_VA_CONTROL_PAGE(%ebx), %edi > > + lea STACK_TOP(%ebx), %esp > > + movl CP_PA_SWAP_PAGE(%ebx), %eax > > + movl CP_PA_BACKUP_PAGES_MAP(%ebx), %edx > > + pushl %eax > > + pushl %edx > > + call swap_pages > > + addl $8, %esp > > + movl CP_PA_PGD(%ebx), %eax > > + movl %eax, %cr3 > > + movl %cr0, %eax > > + orl $(1<<31), %eax > > + movl %eax, %cr0 > > + lea STACK_TOP(%edi), %esp > > + movl %edi, %eax > > + addl $(virtual_mapped - kexec_relocate_page), %eax > > + pushl %eax > > + ret > > Upon re-entering the kernel, what happens to GDT table? So gdtr will be > pointing to GDT of other kernel (which is not there as pages have been > swapped)? Do we need to reload the gdtr upon re-entering the kernel. After re-entering the kernel and returning from machine_kexec, restore_processor_state() is called, which restores the GDTR and other CPU state such as the FPU and IDT. > [..] > > @@ -197,8 +282,54 @@ identity_mapped: > > xorl %eax, %eax > > movl %eax, %cr3 > > > > + movl CP_PA_SWAP_PAGE(%edi), %eax > > + pushl %eax > > + pushl %ebx > > + call swap_pages > > + addl $8, %esp > > + > > + /* To be certain of avoiding problems with self-modifying code > > + * I need to execute a serializing instruction here. > > + * So I flush the TLB, it's handy, and not processor dependent.
> > + */ > > + xorl %eax, %eax > > + movl %eax, %cr3 > > + > > + /* set all of the registers to known values */ > > + /* leave %esp alone */ > > + > > + movl KJUMP_MAGIC_OFF(%edi), %eax > > + cmpl $KJUMP_MAGIC_NUMBER, %eax > > + jz 1f > > + xorl %edi, %edi > > + xorl %eax, %eax > > + xorl %ebx, %ebx > > + xorl %ecx, %ecx > > + xorl %edx, %edx > > + xorl %esi, %esi > > + xorl %ebp, %ebp > > + ret > > +1: > > + popl %edx > > + movl CP_PA_SWAP_PAGE(%edi), %esp > > + addl $PAGE_SIZE_asm, %esp > > + pushl %edx > > +2: > > + call *%edx > > > + movl %edi, %edx > > + popl %edi > > + pushl %edx > > + jmp 2b > > + > > What does above piece of code do? Looks like redundant for switching > between the kernels? After call *%edx, we never return here. Instead > we come back to "kexec_jump_back_entry"? For switching between the kernels, this is redundant. Another original feature of kexec jump is calling code in physical mode, and this code provides a C ABI to the called code. Now Eric suggests using a C ABI compatible mode to pass the jump back entry point too, that is, using the return address on the stack instead of %edi. I think that is reasonable. Maybe we can revise this code to be compatible with the C ABI and provide a convenient interface for both the kernel and other physical mode code. > [..] > > --- /dev/null > > +++ b/Documentation/i386/jump_back_protocol.txt > > @@ -0,0 +1,66 @@ > > + THE LINUX/I386 JUMP BACK PROTOCOL > > + - > > + > > + Huang Ying <[EMAIL PROTECTED]> > > + Last update 2007-12-19 > > + > > +Currently, the following versions of the jump back protocol exist. > > + > > +Protocol 1.00: Jumping between original kernel and kexeced kernel > > + support. Calling ordinary C function support.
> > + > > + > > +*** JUMP BACK ENTRY > > + > > +At jump back entry of callee, the CPU must be in 32-bit protected mode > > +with paging disabled; the CS, DS, ES and SS must be 4G flat segments; > > +CS must have execute/read permission, and DS, ES and SS must have > > +read/write permission; interrupt must be disabled; the contents of > > +registers and corresponding memory must be as follow: > > + > > +Offset/Size Meaning > > + > > +%edi Real
Re: [PATCH -mm] kexec jump -v9
On Wed, 2008-05-14 at 15:30 -0700, Eric W. Biederman wrote: [...] > > > > + if (image->preserve_context) { > > + KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER; > > + if (kexec_jump_save_cpu(control_page)) { > > + image->start = KJUMP_ENTRY(control_page); > > + return; > > Tricky, and I expect unnecessary. > We should be able to just have relocate_new_kernel return? OK, I will check this. Maybe we can move the CPU state saving code into relocate_new_kernel. [...] > > -static void kernel_kexec(void) > > +static int kernel_kexec(void) > > { > > + int ret = -ENOSYS; > > #ifdef CONFIG_KEXEC > > - struct kimage *image; > > - image = xchg(&kexec_image, NULL); > > - if (!image) > > - return; > > - kernel_restart_prepare(NULL); > > - printk(KERN_EMERG "Starting new kernel\n"); > > - machine_shutdown(); > > - machine_kexec(image); > > + if (xchg(&kexec_lock, 1)) > > + return -EBUSY; > > + if (!kexec_image) { > > + ret = -EINVAL; > > + goto unlock; > > + } > > + if (!kexec_image->preserve_context) { > > + kernel_restart_prepare(NULL); > > + printk(KERN_EMERG "Starting new kernel\n"); > > + machine_shutdown(); > > + } > > + ret = kexec_jump(kexec_image); > > +unlock: > > + xchg(&kexec_lock, 0); > > #endif > > Ugh. No. Not sharing the shutdown methods with reboot and > the normal kexec path looks like a recipe for failure to me. > > This looks like where we really need to have the conversation. > What methods do we use to shutdown the system. > > My take on the situation is this. For proper handling we > need driver device_detach and device_reattach methods. > > With the following semantics. The device_detach methods > will disable DMA and place the hardware in a sane state > from which the device driver can reclaim and reinitialize it, > but the hardware will not be touched. > > device_reattach reattaches the driver to the hardware. Yes. The current device PM callbacks are not suitable for hibernation (kexec based or original). I think we can collaborate with Rafael J.
Wysocki on the new device driver hibernation callbacks. > So looking at this patch I see two very productive directions > we can go. > 1) A patch that just fixes up the kexec infrastructure code > so it implements the swap page and provides the kernel > reentry point. And doesn't handle the upper layer > user interface portion. > > 2) A patch that renames device_shutdown to device_detach. > And starts implementing the driver hooks needed from > a resumable kexec. OK. I can split the patch into two patches. Best Regards, Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
On Tue, 2008-05-13 at 22:56 -0400, Vivek Goyal wrote: [...] > > > > > Last time I tried the patches (V9) and kexec jump did not work for me. I > > > was not getting timer interrupts in second kernel. Then I had to put > > > LAPIC and IOAPIC in legacy mode and then at one way jump started working. > > > I am not sure how the next kernel boots for you without putting APICs > > > in legacy mode. (Yet to make returning back to original kernel work > > > using V9). > > > > Can normal kexec (without kexec jump) works without putting LAPIC and > > IOAPIC in legacy mode? Does this mean we should put LAPIC and IOAPIC > > into legacy mode before kexec and restore them after? > > > > We do put LAPIC and IOAPIC in legacy mode in normal kexec. Look at > disable_IO_APIC() in native_machine_shutdown(). So I think we shall > have to do the same thing in kexec jump code too. OK. I will look at this. > I went through above mail thread again where we were discussing what all > information need to be passed between kernels. > > Last time we enumerated three things. > > - kernel entry/re-entry point for switch between kernels. > - backup pages map for core filtering > - Probably ELF core notes for saving hibernated image. > > I think if we just implement the functionality so that one can switch > back and forth between kernels (no hibernated image saving),then we probably > need to pass around only kernel entry/re-entry point and nothing else and in > your patches I think you are already doing using %edi. Yes. > So, IMHO, for first simple implementation, we don't have to pass around > any data between kernels except entry point. (Please correct me if I am > wrong). Lets get that implementation in first and then we can get rest > of the pieces in place. Yes. The kernel entry/re-entry point is the only information that needs to be communicated between kernels just for switching between them. So we can focus on the kexec jump patch first.
Best Regards, Huang Ying
Re: [PATCH] kexec based hibernation: a prototype of kexec multi-stage load
Hi, Vivek, On Tue, 2008-05-13 at 01:34 -0400, Vivek Goyal wrote: > On Mon, May 12, 2008 at 02:40:41PM +0800, Huang, Ying wrote: > > This patch implements a prototype of kexec multi-stage load. With this > > patch, the "backup pages map" can be passed to kexeced kernel via > > /sbin/kexec; and the sys_kexec_load can be used to load large > > hibernated image with huge number of segments. > > > > > > Hi Huang, > > Had a quick look at the patch. Will review in detail soon. Had few > thoughts. > > In general, these patches are on top of previous kexec jump patches. > It would be good if you could repost your updated patches so that > I can apply the patches and and get some testing going. The kexec jump patch v9 is sufficient for this patch to work. I have no newer version of the kexec jump patch so far. > Last time I tried the patches (V9) and kexec jump did not work for me. I > was not getting timer interrupts in second kernel. Then I had to put > LAPIC and IOAPIC in legacy mode and then at one way jump started working. > I am not sure how the next kernel boots for you without putting APICs > in legacy mode. (Yet to make returning back to original kernel work > using V9). Does normal kexec (without kexec jump) work without putting the LAPIC and IOAPIC in legacy mode? Does this mean we should put the LAPIC and IOAPIC into legacy mode before kexec and restore them afterwards? The kexec jump patch works well on my IBM T42. But it seems that the IOAPIC is disabled in the BIOS, so I can only use the i8259 and the LAPIC on this machine. > > In kexec based hibernation, resuming from disk is implemented as > > loading the hibernated disk image with sys_kexec_load(). But unlike > > the normal kexec load, the hibernated image may have huge number of > > segments. So multi-stage loading is necessary for kexec load based > > resuming from disk implementation. > > I understand that hibernated images are huge. But why do we require > multi stage loading? I knew there was a maximum segment limit in kexec.
> But I think we can change that limit. Anything else prevents us from > loading large images in one go? There are two reasons for multi-stage loading: - Passing the backup pages map from the original kernel (A) to the kexeced kernel (B), because it is not known before loading. We have discussed this before in: http://lkml.org/lkml/2008/3/12/308 http://lkml.org/lkml/2008/3/14/59 http://lkml.org/lkml/2008/3/21/299 - Loading a large hibernated image. The hibernated image can be not only large but also discontinuous. For example, if the physical memory size is 4G and every other page is free, the nearly 2G of used memory is split into single-page segments (about 512K segments with 4 KiB pages). Loading that many segments in one go is impossible, so multi-stage loading is necessary. And if the hibernated image is compressed, it is also very difficult to load it in one go because of the anonymous pages needed. > > And, multi-stage loading is also > > necessary for parameter passing from original kernel to kexeced kernel > > because some information such as "backup pages map" is not available > > before loading. > > > > > > Four stages are defined: > > > > - KS_start: start stage; begin a new kexec loading; there must be only > > one KS_start stage in one kexec loading. > > > > - KS_mid: middle stage; continue load some segments; there may be many > > or zero KS_mid stages in one kexec loading; follows a KS_start or > > KS_mid stage. > > > > - KS_final: final stage; finish a kexec loading; there must be only > > one KS_final stage in one kexec loading; follows a KS_start or > > KS_mid stage. > > > > - KS_full: back compatible with original loading semantics, finish all > > work of a kexec loading in one KS_full stage. > > > > > > Overlapping between pages of different segments is allowed to support > > "parameter passing". > > > > > > During loading, a hash table mapped from destination page to source > > page is used instead of original linear mapping > > implementation.
> > Because the hibernated image may be very large (up to > > near the size of physical memory), it is very time-consuming to search > > a source page given the destination page, which is used to check > > whether an newly allocated page is in the range of allocated > > destination pages. > > This seems to be an optimization of kexec so that it becomes efficient > in loading large images (containing large number of segments). Probably > this can be a separate patch. If desired, I can split it into a separate patch. > IMHO, we can just first write a minimal patch where one can just switch > between kernels. Once that patch is
[PATCH] kexec based hibernation: a prototype of kexec multi-stage load
This patch implements a prototype of kexec multi-stage load. With this patch, the "backup pages map" can be passed to the kexeced kernel via /sbin/kexec, and sys_kexec_load can be used to load a large hibernated image with a huge number of segments. In kexec based hibernation, resuming from disk is implemented as loading the hibernated disk image with sys_kexec_load(). But unlike a normal kexec load, the hibernated image may have a huge number of segments, so multi-stage loading is necessary for a resume-from-disk implementation based on kexec load. Multi-stage loading is also necessary for passing parameters from the original kernel to the kexeced kernel, because some information, such as the "backup pages map", is not available before loading. Four stages are defined: - KS_start: start stage; begins a new kexec load; there must be only one KS_start stage in one kexec load. - KS_mid: middle stage; continues loading some segments; there may be zero or more KS_mid stages in one kexec load; follows a KS_start or KS_mid stage. - KS_final: final stage; finishes a kexec load; there must be only one KS_final stage in one kexec load; follows a KS_start or KS_mid stage. - KS_full: backward compatible with the original loading semantics; finishes all the work of a kexec load in one KS_full stage. Overlapping between pages of different segments is allowed to support "parameter passing". During loading, a hash table mapping destination pages to source pages is used instead of the original linear mapping implementation. Because the hibernated image may be very large (up to nearly the size of physical memory), searching the linear mapping for the source page of a given destination page is very time-consuming, and that search is needed to check whether a newly allocated page is in the range of allocated destination pages. The original mapping is only used by assembly code to swap the page contents.
This map is also exported to user space via /proc/kexec_pgmap, so that /sbin/kexec can use it to construct the "backup pages map" parameter for kexeced kernel. This patch is based on Linux kernel 2.6.25 and kexec_jump patch, and has been tested on an IBM T42. Signed-off-by: Huang Ying <[EMAIL PROTECTED]> --- include/linux/kexec.h | 19 + kernel/kexec.c| 608 +- 2 files changed, 478 insertions(+), 149 deletions(-) --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -29,6 +29,10 @@ #include #include #include +#include +#include +#include +#include #include #include @@ -107,43 +111,29 @@ int kexec_should_crash(struct task_struc */ #define KIMAGE_NO_DEST (-1UL) +#define KIMAGE_HASH_BITS 10 +#define KIMAGE_PGTABLE_SIZE (1 << KIMAGE_HASH_BITS) + +struct pgmap { + struct hlist_node hlist; + unsigned long dst_pfn; + unsigned long src_pfn; +}; + static int kimage_is_destination_range(struct kimage *image, unsigned long start, unsigned long end); static struct page *kimage_alloc_page(struct kimage *image, gfp_t gfp_mask, unsigned long dest); -static int do_kimage_alloc(struct kimage **rimage, unsigned long entry, - unsigned long nr_segments, -struct kexec_segment __user *segments) +static int kimage_copy_segments(struct kimage *image, + unsigned long nr_segments, + struct kexec_segment __user *segments) { size_t segment_bytes; - struct kimage *image; unsigned long i; int result; - /* Allocate a controlling structure */ - result = -ENOMEM; - image = kzalloc(sizeof(*image), GFP_KERNEL); - if (!image) - goto out; - - image->head = 0; - image->entry = &image->head; - image->last_entry = &image->head; - image->control_page = ~0; /* By default this does not apply */ - image->start = entry; - image->type = KEXEC_TYPE_DEFAULT; - - /* Initialize the list of control pages */ - INIT_LIST_HEAD(&image->control_pages); - - /* Initialize the list of destination pages */ - INIT_LIST_HEAD(&image->dest_pages); - - /* Initialize the list of unuseable pages */ - INIT_LIST_HEAD(&image->unuseable_pages); 
- /* Read in the segments */ image->nr_segments = nr_segments; segment_bytes = nr_segments * sizeof(*segments); @@ -210,6 +200,44 @@ static int do_kimage_alloc(struct kimage } result = 0; + out: + return result; +} + +static int do_kimage_alloc(struct kimage **rimage, unsigned long entry, + unsigned long nr_segments, + struct kexec_segment __user *segments) +{ + struct kimage *image; + int resul