Re: [PATCH 0/3 -mm] kexec jump -v8
> kernel (bzImage or vmlinux) or resuming an already booted kernel. Kexec
> tools should determine that and set up the entry point accordingly
> (might require some purgatory changes to take care of the transition
> while jumping to the resumed hibernated image).

Yes. This is a good idea. I will do it. And it is more useful for invoking some code in physical mode. The procedure is something as follows:

1. Load some code executing in physical mode via kexec --load-preserve-context.
2. Set up the parameters via amending /proc/kimgcore.
3. Execute the code in physical mode via kexec -e.
4. Get the result via reading /proc/kimgcore.
5. Set up another group of parameters via amending /proc/kimgcore.
...

> This seems to be extended functionality. If your focus is kexec based
> hibernation then I would think of initially keeping the implementation
> simple and keeping patches small. Make kexec based hibernation work and
> then extend functionality for other purposes.

I think this is an important use case of kexec jump besides kexec based hibernation. But it can be separated from the initial kexec jump patchset if necessary.

[...]

> > The main issue of this mechanism is that it is a kernel-to-kernel
> > communication mechanism, while Eric Biederman thinks we should use only
> > user-to-user communication mechanisms. And he is not persuaded yet.
> > Because kernel operations such as re-initializing/re-constructing
> > /proc/vmcore, etc. are needed for kexec jump or resuming, I think a
> > kernel-to-kernel mechanism may be needed. But I don't know whether Eric
> > Biederman will agree with this.
>
> Hmm... Personally I am more inclined to exchanging information between
> the two kernels on the setup page, in a standard format (using ELF
> headers etc). This information can be prepared by kexec-tools in user
> space and be modified by purgatory (during transition) to reflect the
> swapped pages. Alternatively, one can modify this setup page info from
> user space through some /proc/kimgcore like interface. I prefer the
> first one...
Some information (such as the backup pages map) is not available when /sbin/kexec is executed, so there should be a method to pass such information from the original kernel to purgatory. So why not exchange the information between the two kernels via the setup page directly (no purgatory needed)?

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH -mm] x86 boot : export boot_params via debugfs for debugging
This patch exports the boot parameters via debugfs for debugging. The files added are as follows:

boot_params/data    : binary file for struct boot_params
boot_params/version : boot protocol version

This patch is based on 2.6.24-rc5-mm1 and has been tested on i386 and x86_64 platforms.

This patch is based on Peter Anvin's proposal.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/Kconfig.debug      |    7 +
 arch/x86/kernel/Makefile_32 |    3 +-
 arch/x86/kernel/Makefile_64 |    2 -
 arch/x86/kernel/kdebugfs.c  |   65 ++++++++++++++++++++++++++++++++++++++
 arch/x86/kernel/setup64.c   |    4 ++
 arch/x86/kernel/setup_32.c  |    4 ++
 6 files changed, 83 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/arch/x86/kernel/kdebugfs.c
@@ -0,0 +1,65 @@
+/*
+ * Architecture specific debugfs files
+ *
+ * Copyright (C) 2007, Intel Corp.
+ *	Huang Ying <[EMAIL PROTECTED]>
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/stat.h>
+#include <linux/init.h>
+
+#include <asm/setup.h>
+
+#ifdef CONFIG_DEBUG_BOOT_PARAMS
+static struct debugfs_blob_wrapper boot_params_blob = {
+	.data = &boot_params,
+	.size = sizeof(boot_params),
+};
+
+static int __init boot_params_kdebugfs_init(void)
+{
+	int error;
+	struct dentry *dbp, *version, *data;
+
+	dbp = debugfs_create_dir("boot_params", NULL);
+	if (!dbp) {
+		error = -ENOMEM;
+		goto err_return;
+	}
+	version = debugfs_create_x16("version", S_IRUGO, dbp,
+				     &boot_params.hdr.version);
+	if (!version) {
+		error = -ENOMEM;
+		goto err_dir;
+	}
+	data = debugfs_create_blob("data", S_IRUGO, dbp,
+				   &boot_params_blob);
+	if (!data) {
+		error = -ENOMEM;
+		goto err_version;
+	}
+	return 0;
+err_version:
+	debugfs_remove(version);
+err_dir:
+	debugfs_remove(dbp);
+err_return:
+	return error;
+}
+#endif
+
+static int __init arch_kdebugfs_init(void)
+{
+	int error = 0;
+
+#ifdef CONFIG_DEBUG_BOOT_PARAMS
+	error = boot_params_kdebugfs_init();
+#endif
+
+	return error;
+}
+
+arch_initcall(arch_kdebugfs_init);
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -112,4 +112,11 @@ config IOMMU_LEAK
 	  Add a simple leak tracer to the IOMMU code. This is useful when you
 	  are debugging a buggy device driver that leaks IOMMU mappings.
 
+config DEBUG_BOOT_PARAMS
+	bool "Debug boot parameters"
+	depends on DEBUG_KERNEL
+	depends on DEBUG_FS
+	help
+	  This option will cause struct boot_params to be exported via debugfs.
+
 endmenu
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -8,7 +8,8 @@ CPPFLAGS_vmlinux.lds += -Ui386
 obj-y	:= process_32.o signal_32.o entry_32.o traps_32.o irq_32.o \
 		time_32.o ioport_32.o ldt.o setup_32.o i8259_32.o sys_i386_32.o \
 		pci-dma_32.o i386_ksyms_32.o i387_32.o bootflag.o e820_32.o\
-		quirks.o i8237.o topology.o alternative.o i8253.o tsc_32.o rtc.o
+		quirks.o i8237.o topology.o alternative.o i8253.o tsc_32.o \
+		rtc.o kdebugfs.o
 
 obj-y				+= ptrace.o
 obj-y				+= ds.o
--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -10,7 +10,7 @@ obj-y	:= process_64.o signal_64.o entry_
 		x8664_ksyms_64.o i387_64.o syscall_64.o vsyscall_64.o \
 		setup64.o bootflag.o e820_64.o reboot_64.o quirks.o i8237.o \
 		pci-dma_64.o pci-nommu_64.o alternative.o hpet.o tsc_64.o bugs_64.o \
-		i8253.o rtc.o
+		i8253.o rtc.o kdebugfs.o
 
 obj-y				+= ptrace.o
 obj-y				+= ds.o
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,11 @@
 #include <asm/sections.h>
 #include <asm/setup.h>
 
+#ifndef CONFIG_DEBUG_BOOT_PARAMS
 struct boot_params __initdata boot_params;
+#else
+struct boot_params boot_params;
+#endif
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,11 @@ unsigned long saved_videomode;
 
 static char __initdata command_line[COMMAND_LINE_SIZE];
 
+#ifndef CONFIG_DEBUG_BOOT_PARAMS
 struct boot_params __initdata boot_params;
+#else
+struct boot_params boot_params;
+#endif
 
 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;
[PATCH -mm 0/2] kexec/i386: kexec page table code clean up
This patchset cleans up the page table setup code of kexec on i386.

This patchset is based on 2.6.24-rc5-mm1 and has been tested on i386 with/without PAE enabled.

Best Regards,
Huang Ying
[PATCH -mm 1/2] kexec/i386: kexec page table code clean up - add arch_kimage
This patch adds an architecture specific struct arch_kimage into struct kimage. Pointers to the page table pages used by kexec are added to struct arch_kimage. The page table pages are dynamically allocated in machine_kexec_prepare instead of statically from the BSS segment. This will save up to 20k of memory when no kexec image is loaded.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c |   68 ++++++++++++++++++++++---------
 include/asm-x86/kexec_32.h         |   12 ++++
 include/linux/kexec.h              |    4 ++
 3 files changed, 63 insertions(+), 21 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -11,6 +11,7 @@
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/numa.h>
+#include <linux/gfp.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -21,15 +22,6 @@
 #include <asm/desc.h>
 #include <asm/system.h>
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
 	struct Xgt_desc_struct curidt;
@@ -72,6 +64,28 @@ static void load_segments(void)
 #undef __STR
 }
 
+static void alloc_page_tables(struct kimage *image)
+{
+	image->arch_kimage.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+	image->arch_kimage.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch_kimage.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+	image->arch_kimage.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch_kimage.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+}
+
+static void free_page_tables(struct kimage *image)
+{
+	free_page((unsigned long)image->arch_kimage.pgd);
+#ifdef CONFIG_X86_PAE
+	free_page((unsigned long)image->arch_kimage.pmd0);
+	free_page((unsigned long)image->arch_kimage.pmd1);
+#endif
+	free_page((unsigned long)image->arch_kimage.pte0);
+	free_page((unsigned long)image->arch_kimage.pte1);
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -83,10 +97,21 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+	alloc_page_tables(image);
+	if (!image->arch_kimage.pgd ||
+#ifdef CONFIG_X86_PAE
+	    !image->arch_kimage.pmd0 ||
+	    !image->arch_kimage.pmd1 ||
+#endif
+	    !image->arch_kimage.pte0 ||
+	    !image->arch_kimage.pte1) {
+		free_page_tables(image);
+		return -ENOMEM;
+	}
 	return 0;
 }
 
@@ -96,6 +121,7 @@ int machine_kexec_prepare(struct kimage
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+	free_page_tables(image);
 }
 
 /*
@@ -115,18 +141,18 @@ NORET_TYPE void machine_kexec(struct kim
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
-	page_list[PA_PGD] = __pa(kexec_pgd);
-	page_list[VA_PGD] = (unsigned long)kexec_pgd;
+	page_list[PA_PGD] = __pa(image->arch_kimage.pgd);
+	page_list[VA_PGD] = (unsigned long)image->arch_kimage.pgd;
 #ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(kexec_pmd0);
-	page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-	page_list[PA_PMD_1] = __pa(kexec_pmd1);
-	page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(kexec_pte0);
-	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-	page_list[PA_PTE_1] = __pa(kexec_pte1);
-	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+	page_list[PA_PMD_0] = __pa(image->arch_kimage.pmd0);
+	page_list[VA_PMD_0] = (unsigned long)image->arch_kimage.pmd0;
+	page_list[PA_PMD_1] = __pa(image->arch_kimage.pmd1);
+	page_list[VA_PMD_1] = (unsigned long)image->arch_kimage.pmd1;
+#endif
+	page_list[PA_PTE_0] = __pa(image->arch_kimage.pte0);
+	page_list[VA_PTE_0] = (unsigned long)image->arch_kimage.pte0;
+	page_list[PA_PTE_1] = __pa(image->arch_kimage.pte1);
+	page_list[VA_PTE_1] = (unsigned long)image->arch_kimage.pte1;
 
 	/* The segment registers are funny things, they have both a
 	 * visible and an invisible part.  Whenever the visible part is
--- a/include/asm-x86/kexec_32.h
+++ b/include/asm-x86/kexec_32.h
@@ -94,6 +94,18 @@ relocate_kernel(unsigned long indirectio
 		unsigned long start_address,
 		unsigned int has_pae) ATTRIB_NORET;
 
+#define ARCH_HAS_ARCH_KIMAGE
+
+struct arch_kimage {
+	pgd_t *pgd;
+#ifdef CONFIG_X86_PAE
+	pmd_t *pmd0;
+	pmd_t *pmd1
[PATCH -mm 2/2] kexec/i386: kexec page table code clean up - page table setup in C
This patch transforms the kexec page table setup code from assembler code to C code in machine_kexec_prepare. This improves readability and reduces the line count.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/machine_kexec_32.c   |   50 ++++++++++++-----
 arch/x86/kernel/relocate_kernel_32.S |  114 -----------------------------------
 include/asm-x86/kexec_32.h           |   18 -----
 3 files changed, 40 insertions(+), 142 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -86,6 +86,42 @@ static void free_page_tables(struct kima
 	free_page((unsigned long)image->arch_kimage.pte1);
 }
 
+static void page_table_set_one(pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+			       unsigned long vaddr, unsigned long paddr)
+{
+	pud_t *pud;
+
+	pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+	if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+		set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+	pud = pud_offset(pgd, vaddr);
+	pmd = pmd_offset(pud, vaddr);
+	if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+		set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+	pte = pte_offset_kernel(pmd, vaddr);
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void prepare_page_tables(struct kimage *image)
+{
+	void *control_page;
+	pmd_t *pmd = 0;
+
+	control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch_kimage.pmd0;
+#endif
+	page_table_set_one(image->arch_kimage.pgd, pmd, image->arch_kimage.pte0,
+			   (unsigned long)relocate_kernel, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch_kimage.pmd1;
+#endif
+	page_table_set_one(image->arch_kimage.pgd, pmd, image->arch_kimage.pte1,
+			   __pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -98,6 +134,7 @@ static void free_page_tables(struct kima
  * later.
  *
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
@@ -112,6 +149,7 @@ int machine_kexec_prepare(struct kimage
 		free_page_tables(image);
 		return -ENOMEM;
 	}
+	prepare_page_tables(image);
 	return 0;
 }
 
@@ -140,19 +178,7 @@ NORET_TYPE void machine_kexec(struct kim
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);
 
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
-	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
 	page_list[PA_PGD] = __pa(image->arch_kimage.pgd);
-	page_list[VA_PGD] = (unsigned long)image->arch_kimage.pgd;
-#ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(image->arch_kimage.pmd0);
-	page_list[VA_PMD_0] = (unsigned long)image->arch_kimage.pmd0;
-	page_list[PA_PMD_1] = __pa(image->arch_kimage.pmd1);
-	page_list[VA_PMD_1] = (unsigned long)image->arch_kimage.pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(image->arch_kimage.pte0);
-	page_list[VA_PTE_0] = (unsigned long)image->arch_kimage.pte0;
-	page_list[PA_PTE_1] = __pa(image->arch_kimage.pte1);
-	page_list[VA_PTE_1] = (unsigned long)image->arch_kimage.pte1;
 
 	/* The segment registers are funny things, they have both a
 	 * visible and an invisible part.  Whenever the visible part is
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -16,126 +16,12 @@
 
 #define PTR(x) (x << 2)
 #define PAGE_ALIGNED (1 << PAGE_SHIFT)
-#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
-#define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */
 
 	.text
 	.align PAGE_ALIGNED
 	.globl relocate_kernel
 relocate_kernel:
 	movl	8(%esp), %ebp /* list of pages */
-
-#ifdef CONFIG_X86_PAE
-	/* map the control page at its virtual address */
-
-	movl	PTR(VA_PGD)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0xc0000000, %eax
-	shrl	$27, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PMD_0)(%ebp), %edx
-	orl	$PAE_PGD_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PMD_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x3fe00000, %eax
-	shrl	$18, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PTE_0)(%ebp), %edx
-	orl	$PAGE_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PTE_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x001ff000, %eax
-	shrl	$9, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_CONTROL_PAGE)(%ebp), %edx
-	orl	$PAGE_ATTR, %edx
-	movl	%edx, (%eax)
-
-	/* identity map the control page at its physical address */
-
-	movl	PTR(VA_PGD)(%ebp), %edi
-	movl	PTR
Re: [PATCH -mm 1/2] kexec/i386: kexec page table code clean up - add arch_kimage
On Wed, 2008-01-09 at 20:14 -0500, Vivek Goyal wrote:
[...]
> > +static void alloc_page_tables(struct kimage *image)
> > +{
>
> This is too generic a name. How about something like
> arch_alloc_kexec_page_tables()?

OK, I will change it.

> > +	image->arch_kimage.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
> > +#ifdef CONFIG_X86_PAE
> > +	image->arch_kimage.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
> > +	image->arch_kimage.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
> > +#endif
> > +	image->arch_kimage.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
> > +	image->arch_kimage.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
> > +}
> > +
> > +static void free_page_tables(struct kimage *image)
> > +{
>
> How about arch_free_kexec_page_tables()?

OK, I will change it.

> > +	free_page((unsigned long)image->arch_kimage.pgd);
> > +#ifdef CONFIG_X86_PAE
> > +	free_page((unsigned long)image->arch_kimage.pmd0);
> > +	free_page((unsigned long)image->arch_kimage.pmd1);
> > +#endif
> > +	free_page((unsigned long)image->arch_kimage.pte0);
> > +	free_page((unsigned long)image->arch_kimage.pte1);
> > +}
> > +
> >  /*
> >   * A architecture hook called to validate the
> >   * proposed image and prepare the control pages
> > @@ -83,10 +97,21 @@ static void load_segments(void)
> >   * reboot code buffer to allow us to avoid allocations
> >   * later.
> >   *
> > - * Currently nothing.
> > + * - Allocate page tables
> >   */
> >  int machine_kexec_prepare(struct kimage *image)
> >  {
> > +	alloc_page_tables(image);
> > +	if (!image->arch_kimage.pgd ||
> > +#ifdef CONFIG_X86_PAE
> > +	    !image->arch_kimage.pmd0 ||
> > +	    !image->arch_kimage.pmd1 ||
> > +#endif
> > +	    !image->arch_kimage.pte0 ||
> > +	    !image->arch_kimage.pte1) {
> > +		free_page_tables(image);
> > +		return -ENOMEM;
>
> I think this error handling can be done in alloc_page_tables() itself
> and the following will look neater.

OK, I will change it.

Best Regards,
Huang Ying
Re: [PATCH -mm 2/2] kexec/i386: kexec page table code clean up - page table setup in C
On Wed, 2008-01-09 at 20:05 -0500, Vivek Goyal wrote:
> On Wed, Jan 09, 2008 at 10:57:50AM +0800, Huang, Ying wrote:
> > This patch transforms the kexec page table setup code from assembler
> > code to C code in machine_kexec_prepare. This improves readability and
> > reduces the line count.
>
> I think this will create issues for Xen. Initially page table setup was
> in C, but Xen guests could not modify the page tables. I think Xen folks
> implemented a hypercall where they passed all the page table pages and
> the control pages, and then the hypervisor executed the control page
> (which in turn set up the page tables). I think that's why the page
> table setup code is on the control page in assembly.
>
> You might want to go through the Xen kexec implementation and dig
> through the kexec mailing list archive.

OK, I will check the Xen kexec implementation.

> CCing Magnus and Horms. They had done the page table related changes
> for Xen.

Best Regards,
Huang Ying
[PATCH -mm 2/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap - remove boot_ioremap
This patch replaces the boot_ioremap invocations with bt_ioremap and removes the boot_ioremap implementation.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/Kconfig              |    4 -
 arch/x86/kernel/srat_32.c     |    8 +--
 arch/x86/mm/Makefile_32       |    1 
 arch/x86/mm/boot_ioremap_32.c |  100 ------------------------------------
 include/asm-x86/efi.h         |    8 ---
 5 files changed, 5 insertions(+), 116 deletions(-)

--- a/arch/x86/kernel/srat_32.c
+++ b/arch/x86/kernel/srat_32.c
@@ -57,8 +57,6 @@ static struct node_memory_chunk_s node_m
 static int num_memory_chunks;		/* total number of memory chunks */
 static u8 __initdata apicid_to_pxm[MAX_APICID];
 
-extern void * boot_ioremap(unsigned long, unsigned long);
-
 /* Identify CPU proximity domains */
 static void __init parse_cpu_affinity_structure(char *p)
 {
@@ -299,7 +297,7 @@ int __init get_memcfg_from_srat(void)
 	}
 
 	rsdt = (struct acpi_table_rsdt *)
-	    boot_ioremap(rsdp->rsdt_physical_address, sizeof(struct acpi_table_rsdt));
+	    bt_ioremap(rsdp->rsdt_physical_address, sizeof(struct acpi_table_rsdt));
 
 	if (!rsdt) {
 		printk(KERN_WARNING
@@ -339,11 +337,11 @@ int __init get_memcfg_from_srat(void)
 	for (i = 0; i < tables; i++) {
 		/* Map in header, then map in full table length. */
 		header = (struct acpi_table_header *)
-			boot_ioremap(saved_rsdt.table.table_offset_entry[i], sizeof(struct acpi_table_header));
+			bt_ioremap(saved_rsdt.table.table_offset_entry[i], sizeof(struct acpi_table_header));
 		if (!header)
 			break;
 		header = (struct acpi_table_header *)
-			boot_ioremap(saved_rsdt.table.table_offset_entry[i], header->length);
+			bt_ioremap(saved_rsdt.table.table_offset_entry[i], header->length);
 		if (!header)
 			break;
--- a/arch/x86/mm/boot_ioremap_32.c
+++ /dev/null
@@ -1,100 +0,0 @@
-/*
- * arch/i386/mm/boot_ioremap.c
- *
- * Re-map functions for early boot-time before paging_init() when the
- * boot-time pagetables are still in use
- *
- * Written by Dave Hansen <[EMAIL PROTECTED]>
- */
-
-
-/*
- * We need to use the 2-level pagetable functions, but CONFIG_X86_PAE
- * keeps that from happening.  If anyone has a better way, I'm listening.
- *
- * boot_pte_t is defined only if this all works correctly
- */
-
-#undef CONFIG_X86_PAE
-#undef CONFIG_PARAVIRT
-#include <asm/page.h>
-#include <asm/pgtable.h>
-#include <asm/tlbflush.h>
-#include <linux/init.h>
-#include <linux/stddef.h>
-
-/*
- * I'm cheating here.  It is known that the two boot PTE pages are
- * allocated next to each other.  I'm pretending that they're just
- * one big array.
- */
-
-#define BOOT_PTE_PTRS (PTRS_PER_PTE*2)
-
-static unsigned long boot_pte_index(unsigned long vaddr)
-{
-	return __pa(vaddr) >> PAGE_SHIFT;
-}
-
-static inline boot_pte_t* boot_vaddr_to_pte(void *address)
-{
-	boot_pte_t* boot_pg = (boot_pte_t*)pg0;
-	return &boot_pg[boot_pte_index((unsigned long)address)];
-}
-
-/*
- * This is only for a caller who is clever enough to page-align
- * phys_addr and virtual_source, and who also has a preference
- * about which virtual address from which to steal ptes
- */
-static void __boot_ioremap(unsigned long phys_addr, unsigned long nrpages,
-		    void* virtual_source)
-{
-	boot_pte_t* pte;
-	int i;
-	char *vaddr = virtual_source;
-
-	pte = boot_vaddr_to_pte(virtual_source);
-	for (i=0; i < nrpages; i++, phys_addr += PAGE_SIZE, pte++) {
-		set_pte(pte, pfn_pte(phys_addr >> PAGE_SHIFT, PAGE_KERNEL));
-		__flush_tlb_one((unsigned long) &vaddr[i*PAGE_SIZE]);
-	}
-}
-
-/* the virtual space we're going to remap comes from this array */
-#define BOOT_IOREMAP_PAGES 4
-#define BOOT_IOREMAP_SIZE (BOOT_IOREMAP_PAGES*PAGE_SIZE)
-static __initdata char boot_ioremap_space[BOOT_IOREMAP_SIZE]
-		   __attribute__ ((aligned (PAGE_SIZE)));
-
-/*
- * This only applies to things which need to ioremap before paging_init()
- * bt_ioremap() and plain ioremap() are both useless at this point.
- *
- * When used, we're still using the boot-time pagetables, which only
- * have 2 PTE pages mapping the first 8MB
- *
- * There is no unmap.  The boot-time PTE pages aren't used after boot.
- * If you really want the space back, just remap it yourself.
- * boot_ioremap(ioremap_space-PAGE_OFFSET, BOOT_IOREMAP_SIZE)
- */
-__init void* boot_ioremap(unsigned long phys_addr, unsigned long size)
-{
-	unsigned long last_addr, offset;
-	unsigned int nrpages;
-
-	last_addr = phys_addr + size - 1;
-
-	/* page align the requested address */
-	offset = phys_addr & ~PAGE_MASK;
-	phys_addr &= PAGE_MASK;
-	size = PAGE_ALIGN(last_addr) - phys_addr;
-
-	nrpages = size >> PAGE_SHIFT
[PATCH -mm 0/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap
This patchset replaces boot_ioremap with an enhanced version of bt_ioremap and renames bt_ioremap to early_ioremap. This removes 12k from the .init.data segment and increases the amount of memory that can be remapped before paging_init to 64k.

This patchset is based on linux-2.6.24-rc5-mm1 + efi-split-efi-tables-parsing-code-from-efi-runtime-service-support-code.patch. It has been tested on i386 with PAE on/off.

Best Regards,
Huang Ying
[PATCH -mm 1/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap - enhance bt_ioremap
This patch makes bt_ioremap usable before paging_init by providing an early implementation of set_fixmap that works before paging_init. This allows boot_ioremap to be replaced by bt_ioremap.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/setup_32.c |    1 
 arch/x86/mm/init_32.c      |    2 +
 arch/x86/mm/ioremap_32.c   |   87 +++++++++++++++++++++++++++++++++++++--
 include/asm-x86/io_32.h    |    3 +
 4 files changed, 91 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -208,6 +208,89 @@ void iounmap(volatile void __iomem *addr
 }
 EXPORT_SYMBOL(iounmap);
 
+static __initdata int after_paging_init;
+static __initdata unsigned long bm_pte[1024]
+	__attribute__((aligned(PAGE_SIZE)));
+
+static inline unsigned long * __init bt_ioremap_pgd(unsigned long addr)
+{
+	return (unsigned long *)swapper_pg_dir + ((addr >> 22) & 1023);
+}
+
+static inline unsigned long * __init bt_ioremap_pte(unsigned long addr)
+{
+	return bm_pte + ((addr >> PAGE_SHIFT) & 1023);
+}
+
+void __init bt_ioremap_init(void)
+{
+	unsigned long *pgd;
+
+	pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+	*pgd = __pa(bm_pte) | _PAGE_TABLE;
+	memset(bm_pte, 0, sizeof(bm_pte));
+	BUG_ON(pgd != bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_END)));
+}
+
+void __init bt_ioremap_clear(void)
+{
+	unsigned long *pgd;
+
+	pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+	*pgd = 0;
+	__flush_tlb_all();
+}
+
+void __init bt_ioremap_reset(void)
+{
+	enum fixed_addresses idx;
+	unsigned long *pte, phys, addr;
+
+	after_paging_init = 1;
+	for (idx = FIX_BTMAP_BEGIN; idx >= FIX_BTMAP_END; idx--) {
+		addr = fix_to_virt(idx);
+		pte = bt_ioremap_pte(addr);
+		if (*pte & _PAGE_PRESENT) {
+			phys = *pte & PAGE_MASK;
+			set_fixmap(idx, phys);
+		}
+	}
+}
+
+static void __init __bt_set_fixmap(enum fixed_addresses idx,
+				   unsigned long phys, pgprot_t flags)
+{
+	unsigned long *pte, addr = __fix_to_virt(idx);
+
+	if (idx >= __end_of_fixed_addresses) {
+		BUG();
+		return;
+	}
+	pte = bt_ioremap_pte(addr);
+	if (pgprot_val(flags))
+		*pte = (phys & PAGE_MASK) | pgprot_val(flags);
+	else
+		*pte = 0;
+	__flush_tlb_one(addr);
+}
+
+static inline void __init bt_set_fixmap(enum fixed_addresses idx,
+					unsigned long phys)
+{
+	if (after_paging_init)
+		set_fixmap(idx, phys);
+	else
+		__bt_set_fixmap(idx, phys, PAGE_KERNEL);
+}
+
+static inline void __init bt_clear_fixmap(enum fixed_addresses idx)
+{
+	if (after_paging_init)
+		clear_fixmap(idx);
+	else
+		__bt_set_fixmap(idx, 0, __pgprot(0));
+}
+
 void __init *bt_ioremap(unsigned long phys_addr, unsigned long size)
 {
 	unsigned long offset, last_addr;
@@ -244,7 +327,7 @@ void __init *bt_ioremap(unsigned long ph
 	 */
 	idx = FIX_BTMAP_BEGIN;
 	while (nrpages > 0) {
-		set_fixmap(idx, phys_addr);
+		bt_set_fixmap(idx, phys_addr);
 		phys_addr += PAGE_SIZE;
 		--idx;
 		--nrpages;
@@ -267,7 +350,7 @@ void __init bt_iounmap(void *addr, unsig
 	idx = FIX_BTMAP_BEGIN;
 	while (nrpages > 0) {
-		clear_fixmap(idx);
+		bt_clear_fixmap(idx);
 		--idx;
 		--nrpages;
 	}
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -423,9 +423,11 @@ static void __init pagetable_init (void)
 	 * Fixed mappings, only the page table structure has to be
 	 * created - mappings will be set by set_fixmap():
 	 */
+	bt_ioremap_clear();
 	vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
 	end = (FIXADDR_TOP + PMD_SIZE - 1) & PMD_MASK;
 	page_table_range_init(vaddr, end, pgd_base);
+	bt_ioremap_reset();
 
 	permanent_kmaps_init(pgd_base);
--- a/include/asm-x86/io_32.h
+++ b/include/asm-x86/io_32.h
@@ -130,6 +130,9 @@ extern void iounmap(volatile void __iome
  * mappings, before the real ioremap() is functional.
  * A boot-time mapping is currently limited to at most 16 pages.
  */
+extern void bt_ioremap_init(void);
+extern void bt_ioremap_clear(void);
+extern void bt_ioremap_reset(void);
 extern void *bt_ioremap(unsigned long offset, unsigned long size);
 extern void bt_iounmap(void *addr, unsigned long size);
 extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -624,6 +624,7 @@ void __init setup_arch(char **cmdline_p)
 	memcpy(&boot_cpu_data, &new_cpu_data, sizeof(new_cpu_data
[PATCH -mm 3/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap - rename bt_ioremap to early_ioremap
This patch renames bt_ioremap to early_ioremap, the name used on x86_64. This makes it easier to merge the i386 and x86_64 versions.

Signed-off-by: Huang Ying <[EMAIL PROTECTED]>

---
 arch/x86/kernel/efi.c        |   16 ++++----
 arch/x86/kernel/efi_32.c     |    2 +-
 arch/x86/kernel/efi_tables.c |   12 ++--
 arch/x86/kernel/setup_32.c   |    2 +-
 arch/x86/kernel/srat_32.c    |    6 +++---
 arch/x86/mm/init_32.c        |    4 ++--
 arch/x86/mm/ioremap_32.c     |   38 +++++++++----------
 include/asm-x86/dmi.h        |    7 ++-
 include/asm-x86/efi.h        |    8 ++++
 include/asm-x86/io_32.h      |   16 ++++----
 10 files changed, 50 insertions(+), 61 deletions(-)

--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -212,36 +212,36 @@ static __initdata int after_paging_init;
 static __initdata unsigned long bm_pte[1024]
 	__attribute__((aligned(PAGE_SIZE)));
 
-static inline unsigned long * __init bt_ioremap_pgd(unsigned long addr)
+static inline unsigned long * __init early_ioremap_pgd(unsigned long addr)
 {
 	return (unsigned long *)swapper_pg_dir + ((addr >> 22) & 1023);
 }
 
-static inline unsigned long * __init bt_ioremap_pte(unsigned long addr)
+static inline unsigned long * __init early_ioremap_pte(unsigned long addr)
 {
 	return bm_pte + ((addr >> PAGE_SHIFT) & 1023);
 }
 
-void __init bt_ioremap_init(void)
+void __init early_ioremap_init(void)
 {
 	unsigned long *pgd;
 
-	pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+	pgd = early_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
 	*pgd = __pa(bm_pte) | _PAGE_TABLE;
 	memset(bm_pte, 0, sizeof(bm_pte));
-	BUG_ON(pgd != bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_END)));
+	BUG_ON(pgd != early_ioremap_pgd(fix_to_virt(FIX_BTMAP_END)));
 }
 
-void __init bt_ioremap_clear(void)
+void __init early_ioremap_clear(void)
 {
 	unsigned long *pgd;
 
-	pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+	pgd = early_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
 	*pgd = 0;
 	__flush_tlb_all();
 }
 
-void __init bt_ioremap_reset(void)
+void __init early_ioremap_reset(void)
 {
 	enum fixed_addresses idx;
 	unsigned long *pte, phys, addr;
 
@@ -249,7 +249,7 @@ void __init bt_ioremap_reset(void)
 	after_paging_init = 1;
 	for (idx = FIX_BTMAP_BEGIN; idx >= FIX_BTMAP_END; idx--) {
 		addr = fix_to_virt(idx);
-		pte = bt_ioremap_pte(addr);
+		pte = early_ioremap_pte(addr);
 		if (*pte & _PAGE_PRESENT) {
 			phys = *pte & PAGE_MASK;
 			set_fixmap(idx, phys);
@@ -257,7 +257,7 @@ void __init bt_ioremap_reset(void)
 	}
 }
 
-static void __init __bt_set_fixmap(enum fixed_addresses idx,
+static void __init __early_set_fixmap(enum fixed_addresses idx,
 			unsigned long phys, pgprot_t flags)
 {
 	unsigned long *pte, addr = __fix_to_virt(idx);
@@ -266,7 +266,7 @@ static void __init __bt_set_fixmap(enum
 		BUG();
 		return;
 	}
-	pte = bt_ioremap_pte(addr);
+	pte = early_ioremap_pte(addr);
 	if (pgprot_val(flags))
 		*pte = (phys & PAGE_MASK) | pgprot_val(flags);
 	else
@@ -274,24 +274,24 @@ static void __init __bt_set_fixmap(enum
 	__flush_tlb_one(addr);
 }
 
-static inline void __init bt_set_fixmap(enum fixed_addresses idx,
+static inline void __init early_set_fixmap(enum fixed_addresses idx,
 					unsigned long phys)
 {
 	if (after_paging_init)
 		set_fixmap(idx, phys);
 	else
-		__bt_set_fixmap(idx, phys, PAGE_KERNEL);
+		__early_set_fixmap(idx, phys, PAGE_KERNEL);
 }
 
-static inline void __init bt_clear_fixmap(enum fixed_addresses idx)
+static inline void __init early_clear_fixmap(enum fixed_addresses idx)
 {
 	if (after_paging_init)
 		clear_fixmap(idx);
 	else
-		__bt_set_fixmap(idx, 0, __pgprot(0));
+		__early_set_fixmap(idx, 0, __pgprot(0));
 }
 
-void __init *bt_ioremap(unsigned long phys_addr, unsigned long size)
+void __init *early_ioremap(unsigned long phys_addr, unsigned long size)
 {
 	unsigned long offset, last_addr;
 	unsigned int nrpages;
@@ -327,7 +327,7 @@ void __init *bt_ioremap(unsigned long ph
 	 */
 	idx = FIX_BTMAP_BEGIN;
 	while (nrpages > 0) {
-		bt_set_fixmap(idx, phys_addr);
+		early_set_fixmap(idx, phys_addr);
 		phys_addr += PAGE_SIZE;
 		--idx;
 		--nrpages;
@@ -335,7 +335,7 @@ void __init *bt_ioremap(unsigned long ph
 	return (void*) (offset + fix_to_virt(FIX_BTMAP_BEGIN));
 }
 
-void __init bt_iounmap(void *addr, unsigned long size)
+void __init early_iounmap(void *addr, unsigned long size)
 {
 	unsigned long virt_addr
[PATCH -mm 1/2 -v2] kexec/i386: kexec page table code clean up - add arch_kimage
This patch adds an architecture specific struct arch_kimage into struct kimage. Pointers to the page table pages used by kexec are added to struct arch_kimage. The page table pages are dynamically allocated in machine_kexec_prepare instead of statically from the BSS segment. This will save up to 20k of memory when no kexec image is loaded.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/machine_kexec_32.c |   70 +
 include/asm-x86/kexec_32.h         |   12 ++
 include/linux/kexec.h              |    4 ++
 3 files changed, 64 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -11,6 +11,7 @@
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/numa.h>
+#include <linux/gfp.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -21,15 +22,6 @@
 #include <asm/desc.h>
 #include <asm/system.h>

-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
 	struct Xgt_desc_struct curidt;
@@ -72,6 +64,39 @@ static void load_segments(void)
 #undef __STR
 }

+static void machine_kexec_free_page_tables(struct kimage *image)
+{
+	free_page((unsigned long)image->arch_kimage.pgd);
+#ifdef CONFIG_X86_PAE
+	free_page((unsigned long)image->arch_kimage.pmd0);
+	free_page((unsigned long)image->arch_kimage.pmd1);
+#endif
+	free_page((unsigned long)image->arch_kimage.pte0);
+	free_page((unsigned long)image->arch_kimage.pte1);
+}
+
+static int machine_kexec_alloc_page_tables(struct kimage *image)
+{
+	image->arch_kimage.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+	image->arch_kimage.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch_kimage.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+	image->arch_kimage.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	image->arch_kimage.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+	if (!image->arch_kimage.pgd ||
+#ifdef CONFIG_X86_PAE
+	    !image->arch_kimage.pmd0 ||
+	    !image->arch_kimage.pmd1 ||
+#endif
+	    !image->arch_kimage.pte0 ||
+	    !image->arch_kimage.pte1) {
+		machine_kexec_free_page_tables(image);
+		return -ENOMEM;
+	}
+	return 0;
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -83,11 +108,11 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
-	return 0;
+	return machine_kexec_alloc_page_tables(image);
 }

 /*
@@ -96,6 +121,7 @@
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+	machine_kexec_free_page_tables(image);
 }

 /*
@@ -115,18 +141,18 @@ NORET_TYPE void machine_kexec(struct kim
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
 	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
-	page_list[PA_PGD] = __pa(kexec_pgd);
-	page_list[VA_PGD] = (unsigned long)kexec_pgd;
+	page_list[PA_PGD] = __pa(image->arch_kimage.pgd);
+	page_list[VA_PGD] = (unsigned long)image->arch_kimage.pgd;
 #ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(kexec_pmd0);
-	page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-	page_list[PA_PMD_1] = __pa(kexec_pmd1);
-	page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(kexec_pte0);
-	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-	page_list[PA_PTE_1] = __pa(kexec_pte1);
-	page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+	page_list[PA_PMD_0] = __pa(image->arch_kimage.pmd0);
+	page_list[VA_PMD_0] = (unsigned long)image->arch_kimage.pmd0;
+	page_list[PA_PMD_1] = __pa(image->arch_kimage.pmd1);
+	page_list[VA_PMD_1] = (unsigned long)image->arch_kimage.pmd1;
+#endif
+	page_list[PA_PTE_0] = __pa(image->arch_kimage.pte0);
+	page_list[VA_PTE_0] = (unsigned long)image->arch_kimage.pte0;
+	page_list[PA_PTE_1] = __pa(image->arch_kimage.pte1);
+	page_list[VA_PTE_1] = (unsigned long)image->arch_kimage.pte1;

 	/* The segment registers are funny things, they have both a
 	 * visible and an invisible part. Whenever the visible part is
--- a/include/asm-x86/kexec_32.h
+++ b/include/asm-x86/kexec_32.h
@@ -94,6 +94,18 @@ relocate_kernel(unsigned long indirectio
 		unsigned long start_address,
 		unsigned int has_pae) ATTRIB_NORET;

+#define ARCH_HAS_ARCH_KIMAGE
+
+struct
[PATCH -mm 0/2 -v2] kexec/i386: kexec page table code clean up
This patchset cleans up the page table setup code of kexec on i386. It is based on 2.6.24-rc5-mm1 and has been tested on i386 with and without PAE enabled.

v2:

- Renamed some functions, e.g. alloc_page_tables to machine_kexec_alloc_page_tables.
- Cleaned up error processing for machine_kexec_alloc_page_tables.

Best Regards,
Huang Ying
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
[PATCH -mm 2/2 -v2] kexec/i386: kexec page table code clean up - page table setup in C
This patch transforms the kexec page table setup code from assembler into C code in machine_kexec_prepare. This improves readability and reduces the line count.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/machine_kexec_32.c   |   59 ++
 arch/x86/kernel/relocate_kernel_32.S |  114 ---
 include/asm-x86/kexec_32.h           |   18 -
 3 files changed, 48 insertions(+), 143 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -97,6 +97,45 @@ static int machine_kexec_alloc_page_tabl
 	return 0;
 }

+static void machine_kexec_page_table_set_one(
+	pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+	unsigned long vaddr, unsigned long paddr)
+{
+	pud_t *pud;
+
+	pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+	if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+		set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+	pud = pud_offset(pgd, vaddr);
+	pmd = pmd_offset(pud, vaddr);
+	if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+		set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+	pte = pte_offset_kernel(pmd, vaddr);
+	set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void machine_kexec_prepare_page_tables(struct kimage *image)
+{
+	void *control_page;
+	pmd_t *pmd = 0;
+
+	control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch_kimage.pmd0;
+#endif
+	machine_kexec_page_table_set_one(
+		image->arch_kimage.pgd, pmd, image->arch_kimage.pte0,
+		(unsigned long)relocate_kernel, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+	pmd = image->arch_kimage.pmd1;
+#endif
+	machine_kexec_page_table_set_one(
+		image->arch_kimage.pgd, pmd, image->arch_kimage.pte1,
+		__pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -109,10 +148,16 @@ static int machine_kexec_alloc_page_tabl
  * later.
  *
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
-	return machine_kexec_alloc_page_tables(image);
+	int error;
+
+	error = machine_kexec_alloc_page_tables(image);
+	if (error)
+		return error;
+	machine_kexec_prepare_page_tables(image);
+	return 0;
 }

 /*
@@ -140,19 +185,7 @@ NORET_TYPE void machine_kexec(struct kim
 	memcpy(control_page, relocate_kernel, PAGE_SIZE);

 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
-	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
 	page_list[PA_PGD] = __pa(image->arch_kimage.pgd);
-	page_list[VA_PGD] = (unsigned long)image->arch_kimage.pgd;
-#ifdef CONFIG_X86_PAE
-	page_list[PA_PMD_0] = __pa(image->arch_kimage.pmd0);
-	page_list[VA_PMD_0] = (unsigned long)image->arch_kimage.pmd0;
-	page_list[PA_PMD_1] = __pa(image->arch_kimage.pmd1);
-	page_list[VA_PMD_1] = (unsigned long)image->arch_kimage.pmd1;
-#endif
-	page_list[PA_PTE_0] = __pa(image->arch_kimage.pte0);
-	page_list[VA_PTE_0] = (unsigned long)image->arch_kimage.pte0;
-	page_list[PA_PTE_1] = __pa(image->arch_kimage.pte1);
-	page_list[VA_PTE_1] = (unsigned long)image->arch_kimage.pte1;

 	/* The segment registers are funny things, they have both a
 	 * visible and an invisible part. Whenever the visible part is
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -16,126 +16,12 @@
 #define PTR(x) (x << 2)
 #define PAGE_ALIGNED (1 << PAGE_SHIFT)
-#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
-#define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */

 	.text
 	.align PAGE_ALIGNED
 	.globl relocate_kernel
 relocate_kernel:
 	movl	8(%esp), %ebp /* list of pages */
-
-#ifdef CONFIG_X86_PAE
-	/* map the control page at its virtual address */
-
-	movl	PTR(VA_PGD)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0xc0000000, %eax
-	shrl	$27, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PMD_0)(%ebp), %edx
-	orl	$PAE_PGD_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PMD_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x3fe00000, %eax
-	shrl	$18, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_PTE_0)(%ebp), %edx
-	orl	$PAGE_ATTR, %edx
-	movl	%edx, (%eax)
-
-	movl	PTR(VA_PTE_0)(%ebp), %edi
-	movl	PTR(VA_CONTROL_PAGE)(%ebp), %eax
-	andl	$0x001ff000, %eax
-	shrl	$9, %eax
-	addl	%edi, %eax
-
-	movl	PTR(PA_CONTROL_PAGE)(%ebp), %edx
-	orl	$PAGE_ATTR, %edx
-	movl	%edx, (%eax)
-
-	/* identity map the control page at its physical
Re: [PATCH -mm 0/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap
On Tue, 2008-01-15 at 09:44 +0100, Ingo Molnar wrote:

* Huang, Ying [EMAIL PROTECTED] wrote:

This patchset replaces boot_ioremap with an enhanced version of bt_ioremap and renames bt_ioremap to early_ioremap. This removes 12k from the .init.data segment and increases the size of memory that can be re-mapped before paging_init() to 64k.

in latest x86.git#mm there's an early_ioremap() introduced as part of the PAT series - available on both 32-bit and 64-bit. Could you take a look at it and use that if it's OK for your purposes?

After checking the early_ioremap() implementation in arch/x86/kernel/setup_32.c, I found that it is a duplicate of the bt_ioremap() implementation in arch/x86/mm/ioremap_32.c. Both implementations use set_fixmap(), so they can be used only after paging_init(). The early_ioremap() implementation provided in this patchset works as follows:

- Enhance bt_ioremap, making it usable before paging_init() via a dedicated PTE page.
- Rename bt_ioremap to early_ioremap.

So I think maybe we should replace the early_ioremap() implementation in the PAT series with that of this series.

Best Regards,
Huang Ying
Re: [BUGFIX] x86_64: NX bit handling in change_page_attr
On Wed, 2007-09-12 at 15:35 +0200, Andi Kleen wrote:

Index: linux-2.6.23-rc2-mm2/arch/x86_64/mm/pageattr.c
===
--- linux-2.6.23-rc2-mm2.orig/arch/x86_64/mm/pageattr.c	2007-08-17 12:50:25.0 +0800
+++ linux-2.6.23-rc2-mm2/arch/x86_64/mm/pageattr.c	2007-08-17 12:50:48.0 +0800
@@ -147,6 +147,7 @@
 		split = split_large_page(address, prot, ref_prot2);
 		if (!split)
 			return -ENOMEM;
+		pgprot_val(ref_prot2) &= ~_PAGE_NX;
 		set_pte(kpte, mk_pte(split, ref_prot2));
 		kpte_page = split;
 	}

What happened with this? Still valid? The bug is probably latent there, but I don't think it can affect anything in the kernel, because nothing in the kernel should change NX status as far as I know. Where did you see it?

I found the problem while working on EFI runtime service support, where the EFI runtime code (from firmware) needs to be mapped without the NX bit set.

Anyways I would prefer to only clear the PMD NX when NX status actually changes on the PTE. Can you do that change?

This change is sufficient for Intel CPUs: because the NX bit of the PTE is still there, no page will be made executable unless it is set explicitly through the PTE. For AMD CPUs, will the page be made executable if the NX bit of the PMD is cleared and the NX bit of the PTE is set? If so, I will do the change as you said.

Anyways; it's really not very important.

It is needed for EFI runtime service support.

Best Regards,
Huang Ying
[RFC -mm 1/2] i386/x86_64 boot: setup data
This patch adds a field to the real-mode kernel header: a 64-bit physical pointer to a NULL-terminated singly linked list of struct setup_data. This is used to define a more extensible boot parameter passing mechanism.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/i386/Kconfig            |    3 ---
 arch/i386/boot/header.S      |    6 ++
 arch/i386/kernel/setup.c     |   20
 arch/x86_64/kernel/setup.c   |   19 +++
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h        |    7 +++
 6 files changed, 67 insertions(+), 3 deletions(-)

Index: linux-2.6.23-rc4/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc4.orig/include/asm-i386/bootparam.h	2007-09-17 14:18:24.0 +0800
+++ linux-2.6.23-rc4/include/asm-i386/bootparam.h	2007-09-17 15:02:33.0 +0800
@@ -9,6 +9,17 @@
 #include <asm/ist.h>
 #include <video/edid.h>

+/* setup data types */
+#define SETUP_NONE	0
+
+/* extensible setup data list node */
+struct setup_data {
+	u64 next;
+	u32 type;
+	u32 len;
+	u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
 	u8 setup_sects;
 	u16 root_flags;
@@ -41,6 +52,10 @@
 	u32 initrd_addr_max;
 	u32 kernel_alignment;
 	u8 relocatable_kernel;
+	u8 _pad2[3];
+	u32 cmdline_size;
+	u32 _pad3;
+	u64 setup_data;
 } __attribute__((packed));

 struct sys_desc_table {

Index: linux-2.6.23-rc4/arch/i386/boot/header.S
===
--- linux-2.6.23-rc4.orig/arch/i386/boot/header.S	2007-09-17 14:17:32.0 +0800
+++ linux-2.6.23-rc4/arch/i386/boot/header.S	2007-09-17 14:18:32.0 +0800
@@ -214,6 +214,12 @@
 					#added with boot protocol
 					#version 2.06

+pad4:		.long 0
+
+setup_data:	.quad 0			# 64-bit physical pointer to
+					# single linked list of
+					# struct setup_data
+
 # End of setup header #

 	.section ".inittext", "ax"

Index: linux-2.6.23-rc4/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc4.orig/arch/x86_64/kernel/setup.c	2007-09-17 14:18:23.0 +0800
+++ linux-2.6.23-rc4/arch/x86_64/kernel/setup.c	2007-09-17 15:02:33.0 +0800
@@ -221,6 +221,23 @@
 		ebda_size = 64*1024;
 }

+void __init parse_setup_data(void)
+{
+	struct setup_data *setup_data;
+	unsigned long pa_setup_data;
+
+	pa_setup_data = boot_params.hdr.setup_data;
+	while (pa_setup_data) {
+		setup_data = early_ioremap(pa_setup_data, PAGE_SIZE);
+		switch (setup_data->type) {
+		default:
+			break;
+		}
+		pa_setup_data = setup_data->next;
+		early_iounmap(setup_data, PAGE_SIZE);
+	}
+}
+
 void __init setup_arch(char **cmdline_p)
 {
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -256,6 +273,8 @@
 	strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;

+	parse_setup_data();
+
 	parse_early_param();

 	finish_e820_parsing();

Index: linux-2.6.23-rc4/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc4.orig/arch/i386/kernel/setup.c	2007-09-17 14:18:23.0 +0800
+++ linux-2.6.23-rc4/arch/i386/kernel/setup.c	2007-09-17 14:18:32.0 +0800
@@ -496,6 +496,23 @@
 	return machine_specific_memory_setup();
 }

+void __init parse_setup_data(void)
+{
+	struct setup_data *setup_data;
+	unsigned long pa_setup_data, pa_next;
+
+	pa_setup_data = boot_params.hdr.setup_data;
+	while (pa_setup_data) {
+		setup_data = boot_ioremap(pa_setup_data, PAGE_SIZE);
+		pa_next = setup_data->next;
+		switch (setup_data->type) {
+		default:
+			break;
+		}
+		pa_setup_data = pa_next;
+	}
+}
+
 /*
  * Determine if we were loaded by an EFI loader. If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -544,6 +561,9 @@
 	rd_prompt = ((boot_params.hdr.ram_size & RAMDISK_PROMPT_FLAG) != 0);
 	rd_doload = ((boot_params.hdr.ram_size & RAMDISK_LOAD_FLAG) != 0);
 #endif
+
+	parse_setup_data();
+
 	ARCH_SETUP
 	if (efi_enabled)
 		efi_init();

Index: linux-2.6.23-rc4/include/asm-i386/io.h
[RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol
This patch defines a 32-bit boot protocol and adds the corresponding documentation.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 boot.txt |  105 ++-
 1 file changed, 104 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc4/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc4.orig/Documentation/i386/boot.txt	2007-09-17 11:22:32.0 +0800
+++ linux-2.6.23-rc4/Documentation/i386/boot.txt	2007-09-17 11:34:10.0 +0800
@@ -2,7 +2,7 @@

 		H. Peter Anvin [EMAIL PROTECTED]

-		Last update 2007-05-23
+		Last update 2007-09-14

 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention. This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06:	(Kernel 2.6.22) Added a field that contains the size of
 		the boot command line

+Protocol 2.07:	(Kernel 2.6.23) Added a field of 64-bit physical
+		pointer to single linked list of struct setup_data.
+		Added 32-bit boot protocol.

 MEMORY LAYOUT
@@ -168,6 +171,9 @@
 0234/1	2.05+	relocatable_kernel	Whether kernel is relocatable or not
 0235/3	N/A	pad2		Unused
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+023c/4	N/A	pad3		Unused
+0240/8	2.07+	setup_data	64-bit physical pointer to linked list
+				of struct setup_data

 (1) For backwards compatibility, if the setup_sects field contains 0, the
     real value is 4.
@@ -480,6 +486,36 @@
 cmdline_size characters. With protocol version 2.05 and earlier, the
 maximum size was 255.

+Field name:	setup_data
+Type:		write (obligatory)
+Offset/size:	0x240/8
+Protocol:	2.07+
+
+  The 64-bit physical pointer to a NULL-terminated single linked list
+  of struct setup_data. This is used to define a more extensible boot
+  parameter passing mechanism. The definition of struct setup_data is
+  as follows:
+
+	struct setup_data {
+		u64 next;
+		u32 type;
+		u32 len;
+		u8 data[0];
+	} __attribute__((packed));
+
+  Where next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0); type is used to
+  identify the contents of data; len is the length of the data field;
+  data holds the real payload.
+
+  With this field, adding a new boot parameter written by the
+  bootloader no longer requires a new field in the real mode header;
+  adding a new setup_data type is sufficient. But adding a new boot
+  parameter read by the bootloader still requires a new field.
+
+  TODO: Where is the safe place to place the linked list of struct
+  setup_data?
+
 THE KERNEL COMMAND LINE
@@ -753,3 +789,70 @@
 After completing your hook, you should jump to the address that was in
 this field before your boot loader overwrote it (relocated, if
 appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machines with some new BIOS other than legacy BIOS, such as EFI,
+LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in the
+kernel based on legacy BIOS cannot be used, so a 32-bit boot protocol
+needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. But it is not necessary to load all the
+real-mode code; just the first 4K bytes, traditionally known as the
+zero page, are needed.
+
+In addition to reading/modifying/writing the kernel header of the zero
+page as in the 16-bit boot protocol, the boot loader should also fill
+the following additional fields of the zero page.
+
+Offset	Type		Description
+------	----		-----------
+0	32 bytes	struct screen_info, SCREEN_INFO
+			ATTENTION, overlaps the following !!!
+2	unsigned short	EXT_MEM_K, extended memory size in Kb (from int 0x15)
+0x20	unsigned short	CL_MAGIC, commandline magic number (=0xA33F)
+0x22	unsigned short	CL_OFFSET, commandline offset
+			Address of commandline is calculated:
+			0x90000 + contents of CL_OFFSET
+			(only taken, when CL_MAGIC = 0xA33F)
+0x40	20 bytes	struct apm_bios_info, APM_BIOS_INFO
+0x60	16 bytes	Intel SpeedStep (IST) BIOS support information
+0x80	16 bytes	hd0-disk-parameter from intvector 0x41
+0x90	16 bytes	hd1-disk-parameter from intvector 0x46
+
+0xa0	16 bytes	System description table truncated to 16 bytes.
+			(struct sys_desc_table_struct)
+0xb0 - 0x13f		Free. Add more parameters here if you really need them.
+0x140 - 0x1be		EDID_INFO Video
[RFC -mm 0/2] i386/x86_64 boot: 32-bit boot protocol
For machines with some new BIOS other than legacy BIOS, such as EFI, LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in the kernel based on legacy BIOS cannot be used, so a 32-bit boot protocol needs to be defined. This patchset defines a 32-bit boot protocol for i386 and x86_64. A linked-list based boot parameter passing mechanism is also added to improve extensibility. The patchset has been tested against the 2.6.23-rc4-mm1 kernel on x86_64.

This patchset is based on the proposal of Peter Anvin.

Known issues:

1. Where is it safe to place the linked list of setup_data? Because the length of the linked list of setup_data is variable, it cannot be copied into the BSS segment of the kernel the way the zero page is. We must find a safe place for it, where it will not be overwritten by the kernel during boot. The i386 kernel will overwrite some pages after _end. The x86_64 kernel will overwrite some pages from 0x1000 on.

EFI64 runtime service support is the first user of the 32-bit boot protocol and the boot parameter passing mechanism. To demonstrate their usage, the EFI64 runtime service patch is also appended to this mail.
Best Regards,
Huang Ying

---
 Documentation/i386/boot.txt            |   15
 Documentation/x86_64/boot-options.txt  |   12
 arch/x86_64/Kconfig                    |   11
 arch/x86_64/kernel/Makefile            |    1
 arch/x86_64/kernel/efi.c               |  597 +
 arch/x86_64/kernel/efi_callwrap.S      |   69 +++
 arch/x86_64/kernel/reboot.c            |   20 -
 arch/x86_64/kernel/setup.c             |   14
 arch/x86_64/kernel/time.c              |   48 +-
 include/asm-i386/bootparam.h           |   10
 include/asm-x86_64/efi.h               |   18
 include/asm-x86_64/eficallwrap.h       |   33 +
 include/asm-x86_64/emergency-restart.h |    9
 include/asm-x86_64/fixmap.h            |    3
 include/asm-x86_64/time.h              |    7
 15 files changed, 842 insertions(+), 25 deletions(-)

Index: linux-2.6.23-rc4/include/asm-x86_64/eficallwrap.h
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc4/include/asm-x86_64/eficallwrap.h	2007-09-17 15:03:47.0 +0800
@@ -0,0 +1,33 @@
+/*
+ * Copyright (C) 2007 Intel Corp
+ *	Bibo Mao [EMAIL PROTECTED]
+ *	Huang Ying [EMAIL PROTECTED]
+ *
+ * Function calling ABI conversion from SYSV to Windows for x86_64
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#ifndef __ASM_X86_64_EFICALLWRAP_H
+#define __ASM_X86_64_EFICALLWRAP_H
+
+extern efi_status_t lin2win0(void *fp);
+extern efi_status_t lin2win1(void *fp, u64 arg1);
+extern efi_status_t lin2win2(void *fp, u64 arg1, u64 arg2);
+extern efi_status_t lin2win3(void *fp, u64 arg1, u64 arg2, u64 arg3);
+extern efi_status_t lin2win4(void *fp, u64 arg1, u64 arg2, u64 arg3, u64 arg4);
+extern efi_status_t lin2win5(void *fp, u64 arg1, u64 arg2, u64 arg3,
+			     u64 arg4, u64 arg5);
+extern efi_status_t lin2win6(void *fp, u64 arg1, u64 arg2, u64 arg3,
+			     u64 arg4, u64 arg5, u64 arg6);
+
+#endif

Index: linux-2.6.23-rc4/arch/x86_64/kernel/efi.c
===
--- /dev/null	1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc4/arch/x86_64/kernel/efi.c	2007-09-17 15:03:47.0 +0800
@@ -0,0 +1,597 @@
+/*
+ * Extensible Firmware Interface
+ *
+ * Based on Extensible Firmware Interface Specification version 1.0
+ *
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999 Walt Drummond [EMAIL PROTECTED]
+ * Copyright (C) 1999-2002 Hewlett-Packard Co.
+ *	David Mosberger-Tang [EMAIL PROTECTED]
+ *	Stephane Eranian [EMAIL PROTECTED]
+ * Copyright (C) 2005-2008 Intel Co.
+ *	Fenghua Yu [EMAIL PROTECTED]
+ *	Bibo Mao [EMAIL PROTECTED]
+ *	Chandramouli Narayanan [EMAIL PROTECTED]
+ *
+ * Code to convert EFI to E820 map has been implemented in elilo bootloader
+ * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table
+ * is setup appropriately for EFI runtime code.
+ * - mouli 06/14/2007.
+ *
+ * All EFI Runtime Services are not implemented yet as EFI only
+ * supports physical mode addressing on SoftSDV. This is to be fixed
+ * in a future version. --drummond 1999-07-20
+ *
+ * Implemented EFI runtime services and virtual mode calls. --davidm
+ *
+ * Goutham Rao: [EMAIL PROTECTED]
+ *	Skip non-WB memory and ignore empty memory ranges
Re: [RFC -mm 0/2] i386/x86_64 boot: 32-bit boot protocol
On Mon, 2007-09-17 at 10:40 +0200, Andi Kleen wrote:

On Monday 17 September 2007 10:26:12 Huang, Ying wrote:

For machines with some new BIOS other than legacy BIOS, such as EFI, LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in the kernel based on legacy BIOS cannot be used, so a 32-bit boot protocol needs to be defined.

The patch doesn't seem to be what you advertise in the description. Can you start with a patch that just implements the new boot protocol parsing for better review? The EFI code should be all in separate patches.

-Andi

The real contents of the 32-bit boot protocol patch are in two other mails, with the titles:

[RFC -mm 1/2] i386/x86_64 boot: setup data
[RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

The EFI patch in this mail is just an example of 32-bit boot protocol usage.

Best Regards,
Huang Ying
Re: [RFC -mm 1/2] i386/x86_64 boot: setup data
On Mon, 2007-09-17 at 08:30 -0700, H. Peter Anvin wrote:

Huang, Ying wrote:

This patch adds a field to the real-mode kernel header: a 64-bit physical pointer to a NULL-terminated singly linked list of struct setup_data. This is used to define a more extensible boot parameter passing mechanism.

You MUST NOT add a field like this without changing the version number, and, since you expect to enter the kernel at the PM entrypoint, you better *CHECK* that version number before ever descending down the chain.

I forgot to change the version number in boot/head.S. I will add it. And I will add a version number check before descending down the chain.

Best Regards,
Huang Ying
Re: [RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol
On Mon, 2007-09-17 at 08:29 -0700, H. Peter Anvin wrote:

Huang, Ying wrote:

This patch defines a 32-bit boot protocol and adds the corresponding documentation.

+
+In addition to reading/modifying/writing the kernel header of the zero
+page as in the 16-bit boot protocol, the boot loader should also fill
+the following additional fields of the zero page.
+
+Offset	Type		Description
+------	----		-----------
+0	32 bytes	struct screen_info, SCREEN_INFO
+			ATTENTION, overlaps the following !!!
+2	unsigned short	EXT_MEM_K, extended memory size in Kb (from int 0x15)
+0x20	unsigned short	CL_MAGIC, commandline magic number (=0xA33F)
+0x22	unsigned short	CL_OFFSET, commandline offset
+			Address of commandline is calculated:
+			0x90000 + contents of CL_OFFSET
+			(only taken, when CL_MAGIC = 0xA33F)
+0x40	20 bytes	struct apm_bios_info, APM_BIOS_INFO
+0x60	16 bytes	Intel SpeedStep (IST) BIOS support information
+0x80	16 bytes	hd0-disk-parameter from intvector 0x41
+0x90	16 bytes	hd1-disk-parameter from intvector 0x46
+
+0xa0	16 bytes	System description table truncated to 16 bytes.
+			(struct sys_desc_table_struct)
+0xb0 - 0x13f		Free. Add more parameters here if you really need them.
+0x140 - 0x1be		EDID_INFO Video mode setup
+
+0x1c4	unsigned long	EFI system table pointer
+0x1c8	unsigned long	EFI memory descriptor size
+0x1cc	unsigned long	EFI memory descriptor version
+0x1d0	unsigned long	EFI memory descriptor map pointer
+0x1d4	unsigned long	EFI memory descriptor map size
+0x1e0	unsigned long	ALT_MEM_K, alternative mem check, in Kb
+0x1e4	unsigned long	Scratch field for the kernel setup code
+0x1e8	char		number of entries in E820MAP (below)
+0x1e9	unsigned char	number of entries in EDDBUF (below)
+0x1ea	unsigned char	number of entries in EDD_MBR_SIG_BUFFER (below)
+0x290 - 0x2cf		EDD_MBR_SIG_BUFFER (edd.S)
+0x2d0 - 0xd00		E820MAP
+0xd00 - 0xeff		EDDBUF (edd.S) for disk signature read sector
+0xd00 - 0xeeb		EDDBUF (edd.S) for edd data
+
+After loading and setting up the zero page, the boot loader can load
+the 32/64-bit kernel in the same way as in the 16-bit boot protocol.
+
+In the 32-bit boot protocol, the kernel is started by jumping to the
+32-bit kernel entry point, which is the start address of the loaded
+32/64-bit kernel.
+
+At entry, the CPU must be in 32-bit protected mode with paging
+disabled; the CS and DS must be 4G flat segments; %esi holds the base
+address of the zero page; %esp, %ebp, %edi should be zero.

This is just replicating the zero-page.txt document, which can best be described as a total lie -- compare with the actual structure.

OK, I will check the actual structure and change the document accordingly.

Best Regards,
Huang Ying
Re: [RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol
On Mon, 2007-09-17 at 18:48 -0700, H. Peter Anvin wrote: Huang, Ying wrote: OK, I will check the actual structure, and change the document accordingly. The best would probably be to fix zero-page.txt (and probably rename it something saner.) Does the patch appended with the mail seems better? If it is desired, I can move the zero page description into zero-page.txt, and refer to it in 32-bit boot protocol description. I delete the hd0_info and hd1_info from the zero page. If it is undesired, I will move them back. The field in zero page is fairly complex (such as struct edd_info). Do you think it is necessary to document every field inside the first level field, until the primary data type? Or we just provide the C struct name? Best Regards, Huang Ying --- Index: linux-2.6.23-rc4/Documentation/i386/boot.txt === --- linux-2.6.23-rc4.orig/Documentation/i386/boot.txt 2007-09-18 10:40:34.0 +0800 +++ linux-2.6.23-rc4/Documentation/i386/boot.txt2007-09-18 10:46:13.0 +0800 @@ -2,7 +2,7 @@ H. Peter Anvin [EMAIL PROTECTED] - Last update 2007-05-23 + Last update 2007-09-14 On the i386 platform, the Linux kernel uses a rather complicated boot convention. This has evolved partially due to historical aspects, as @@ -42,6 +42,9 @@ Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of the boot command line +Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical + pointer to single linked list of struct setup_data. + Added 32-bit boot protocol. MEMORY LAYOUT @@ -168,6 +171,9 @@ 0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not 0235/3 N/A pad2Unused 0238/4 2.06+ cmdline_sizeMaximum size of the kernel command line +023c/4 N/A pad3Unused +0240/8 2.07+ setup_data 64-bit physical pointer to linked list + of struct setup_data (1) For backwards compatibility, if the setup_sects field contains 0, the real value is 4. @@ -480,6 +486,36 @@ cmdline_size characters. With protocol version 2.05 and earlier, the maximum size was 255. 
+Field name:	setup_data
+Type:		write (obligatory)
+Offset/size:	0x240/8
+Protocol:	2.07+
+
+  The 64-bit physical pointer to a NULL-terminated singly linked list
+  of struct setup_data. This is used to define a more extensible boot
+  parameter passing mechanism. struct setup_data is defined as
+  follows:
+
+  struct setup_data {
+          u64 next;
+          u32 type;
+          u32 len;
+          u8  data[0];
+  } __attribute__((packed));
+
+  Here, next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0); type identifies
+  the contents of data; len is the length of the data field; and data
+  holds the real payload.
+
+  With this field, adding a new boot parameter that is written by the
+  boot loader no longer requires adding a new field to the real-mode
+  header; adding a new setup_data type is sufficient. Adding a new
+  boot parameter that is read by the boot loader, however, still
+  requires a new field.
+
+  TODO: Where is a safe place to put the linked list of struct
+  setup_data?
+

THE KERNEL COMMAND LINE

@@ -753,3 +789,57 @@
 After completing your hook, you should jump to the address that was
 in this field before your boot loader overwrote it (relocated, if
 appropriate.)
+
+
+		SETUP DATA TYPES
+
+
+		32-bit BOOT PROTOCOL
+
+For machines with a firmware interface other than the legacy BIOS,
+such as EFI or LinuxBIOS, and for kexec, the 16-bit real-mode setup
+code in the kernel, which depends on the legacy BIOS, can not be
+used, so a 32-bit boot protocol needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the
+kernel header at offset 0x01f1. However, it is not necessary to load
+all of the real-mode code; only the first 4K bytes, traditionally
+known as the zero page, are needed.
+
+In addition to reading/modifying/writing the kernel header of the
+zero page as in the 16-bit boot protocol, the boot loader should also
+fill the following additional fields of the zero page:
+
+Offset	Proto	Name		Meaning
+/Size
+
+000/040	2.07+	screen_info	Text mode or frame buffer information
+				(struct screen_info)
+040/014	2.07+	apm_bios_info	APM BIOS information (struct apm_bios_info)
+060/010	2.07+	ist_info	Intel SpeedStep (IST) BIOS support information
+				(struct ist_info)
+0A0/010	2.07+	sys_desc_table	System description table
+				(struct sys_desc_table)
+140/080	2.07+	edid_info	Video mode setup
[PATCH -mm -v2 1/2] i386/x86_64 boot: setup data
This patch adds a field to the real-mode kernel header: a 64-bit physical pointer to a NULL-terminated singly linked list of struct setup_data. This is used as a more extensible boot parameter passing mechanism. This patch has been tested against the 2.6.23-rc6-mm1 kernel on x86_64. It is based on the proposal of Peter Anvin.

Known issues:

1. Where is it safe to place the linked list of setup_data? Because the length of the linked list of setup_data is variable, it can not be copied into the BSS segment of the kernel as the zero page is. We must find a safe place for it, where it will not be overwritten by the kernel during boot-up. The i386 kernel will overwrite some pages after _end. The x86_64 kernel will overwrite some pages from 0x1000 on.

ChangeLog:

v2:
- Increase the boot protocol version number.
- Check the version number before parsing setup_data.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/i386/Kconfig            |    3 ---
 arch/i386/boot/header.S      |    8 +++-
 arch/i386/kernel/setup.c     |   22 ++
 arch/x86_64/kernel/setup.c   |   21 +
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h        |    7 +++
 6 files changed, 72 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc6/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc6.orig/include/asm-i386/bootparam.h	2007-09-19 10:00:06.0 +0800
+++ linux-2.6.23-rc6/include/asm-i386/bootparam.h	2007-09-19 10:00:08.0 +0800
@@ -9,6 +9,17 @@
 #include <asm/ist.h>
 #include <video/edid.h>
 
+/* setup data types */
+#define SETUP_NONE	0
+
+/* extensible setup data list node */
+struct setup_data {
+	u64 next;
+	u32 type;
+	u32 len;
+	u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
 	u8 setup_sects;
 	u16 root_flags;
@@ -41,6 +52,10 @@
 	u32 initrd_addr_max;
 	u32 kernel_alignment;
 	u8 relocatable_kernel;
+	u8 _pad2[3];
+	u32 cmdline_size;
+	u32 _pad3;
+	u64 setup_data;
 } __attribute__((packed));
 
 struct sys_desc_table {

Index: linux-2.6.23-rc6/arch/i386/boot/header.S
===
--- linux-2.6.23-rc6.orig/arch/i386/boot/header.S	2007-09-11 10:50:29.0 +0800
+++
linux-2.6.23-rc6/arch/i386/boot/header.S	2007-09-19 10:00:09.0 +0800
@@ -119,7 +119,7 @@
 # Part 2 of the header, from the old setup.S
 		.ascii	"HdrS"		# header signature
-		.word	0x0206		# header version number (>= 0x0105)
+		.word	0x0207		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 		.globl	realmode_swtch
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
@@ -214,6 +214,12 @@
 					# added with boot protocol
 					# version 2.06
+pad4:		.long	0
+
+setup_data:	.quad	0		# 64-bit physical pointer to
+					# single linked list of
+					# struct setup_data
+
 # End of setup header #
 
 	.section ".inittext", "ax"

Index: linux-2.6.23-rc6/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/x86_64/kernel/setup.c	2007-09-19 10:00:00.0 +0800
+++ linux-2.6.23-rc6/arch/x86_64/kernel/setup.c	2007-09-19 10:00:09.0 +0800
@@ -221,6 +221,25 @@
 		ebda_size = 64*1024;
 }
 
+void __init parse_setup_data(void)
+{
+	struct setup_data *setup_data;
+	unsigned long pa_setup_data;
+
+	if (boot_params.hdr.version < 0x0207)
+		return;
+	pa_setup_data = boot_params.hdr.setup_data;
+	while (pa_setup_data) {
+		setup_data = early_ioremap(pa_setup_data, PAGE_SIZE);
+		switch (setup_data->type) {
+		default:
+			break;
+		}
+		pa_setup_data = setup_data->next;
+		early_iounmap(setup_data, PAGE_SIZE);
+	}
+}
+
 void __init setup_arch(char **cmdline_p)
 {
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -256,6 +275,8 @@
 	strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;
 
+	parse_setup_data();
+
 	parse_early_param();
 
 	finish_e820_parsing();

Index: linux-2.6.23-rc6/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/i386/kernel/setup.c	2007-09-19 09:59:59.0 +0800
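The node layout added by this patch can be exercised entirely in user space. The following sketch is illustrative only: make_node() and walk_chain() are hypothetical helpers, not part of the patch, and a real boot loader would place the nodes at known physical addresses rather than use malloc(). It builds a chain and walks it the same way parse_setup_data() does, following next until it is 0.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Mirrors the struct added by the patch. */
struct setup_data {
    uint64_t next;   /* physical address of the next node; 0 terminates */
    uint32_t type;   /* identifies the contents of data[] */
    uint32_t len;    /* length of data[] in bytes */
    uint8_t  data[]; /* payload */
};

/* Allocate one node holding 'len' bytes of payload.  Here the pointer
 * value itself stands in for the node's "physical" address. */
static struct setup_data *make_node(uint32_t type, const void *payload,
                                    uint32_t len)
{
    struct setup_data *sd = malloc(sizeof(*sd) + len);
    sd->next = 0;
    sd->type = type;
    sd->len = len;
    memcpy(sd->data, payload, len);
    return sd;
}

/* Walk the chain exactly as parse_setup_data() does: follow 'next'
 * until it is 0, dispatching on 'type'.  Returns the node count. */
static unsigned walk_chain(const struct setup_data *sd)
{
    unsigned n = 0;
    while (sd) {
        n++;
        sd = (const struct setup_data *)(uintptr_t)sd->next;
    }
    return n;
}
```

Note that the kernel side only ever reads next, type and len before dispatching on type; that is what makes the format extensible, since unknown types are simply skipped.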
[PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol
This patch defines a 32-bit boot protocol and adds corresponding document. It is based on the proposal of Peter Anvin. Known issues: - The hd0_info and hd1_info are deleted from the zero page. Additional work should be done for this? Or this is unnecessary (because no new fields will be added to zero page)? - The fields in zero page are fairly complex (such as struct edd_info). Is it necessary to document every field inside the first level fields, until the primary data type? Or is it sufficient to provide the C struct name only? ChangeLog: -- v2 -- - Revise zero page description according to the source code and move them to zero-page.txt. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- boot.txt | 70 +++ zero-page.txt | 127 -- 2 files changed, 97 insertions(+), 100 deletions(-) Index: linux-2.6.23-rc6/Documentation/i386/boot.txt === --- linux-2.6.23-rc6.orig/Documentation/i386/boot.txt 2007-09-11 10:50:29.0 +0800 +++ linux-2.6.23-rc6/Documentation/i386/boot.txt2007-09-19 10:00:18.0 +0800 @@ -2,7 +2,7 @@ H. Peter Anvin [EMAIL PROTECTED] - Last update 2007-05-23 + Last update 2007-09-18 On the i386 platform, the Linux kernel uses a rather complicated boot convention. This has evolved partially due to historical aspects, as @@ -42,6 +42,9 @@ Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of the boot command line +Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical + pointer to single linked list of struct setup_data. + Added 32-bit boot protocol. MEMORY LAYOUT @@ -168,6 +171,9 @@ 0234/1 2.05+ relocatable_kernel Whether kernel is relocatable or not 0235/3 N/A pad2Unused 0238/4 2.06+ cmdline_sizeMaximum size of the kernel command line +023c/4 N/A pad3Unused +0240/8 2.07+ setup_data 64-bit physical pointer to linked list + of struct setup_data (1) For backwards compatibility, if the setup_sects field contains 0, the real value is 4. @@ -480,6 +486,36 @@ cmdline_size characters. 
With protocol version 2.05 and earlier, the maximum size was 255. +Field name:setup_data +Type: write (obligatory) +Offset/size: 0x240/8 +Protocol: 2.07+ + + The 64-bit physical pointer to NULL terminated single linked list of + struct setup_data. This is used to define a more extensible boot + parameters passing mechanism. The definition of struct setup_data is + as follow: + + struct setup_data { + u64 next; + u32 type; + u32 len; + u8 data[0]; + } __attribute__((packed)); + + Where, the next is a 64-bit physical pointer to the next node of + linked list, the next field of the last node is 0; the type is used + to identify the contents of data; the len is the length of data + field; the data holds the real payload. + + With this field, to add a new boot parameter written by bootloader, + it is not needed to add a new field to real mode header, just add a + new setup_data type is sufficient. But to add a new boot parameter + read by bootloader, it is still needed to add a new field. + + TODO: Where is the safe place to place the linked list of struct + setup_data? + THE KERNEL COMMAND LINE @@ -753,3 +789,35 @@ After completing your hook, you should jump to the address that was in this field before your boot loader overwrote it (relocated, if appropriate.) + + + SETUP DATA TYPES + + + 32-bit BOOT PROTOCOL + +For machine with some new BIOS other than legacy BIOS, such as EFI, +LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in kernel +based on legacy BIOS can not be used, so a 32-bit boot protocol need +to be defined. + +In 32-bit boot protocol, the first step in loading a Linux kernel +should still be to load the real-mode code and then examine the kernel +header at offset 0x01f1. But, it is not necessary to load all +real-mode code, just first 4K bytes traditionally known as zero page +is needed. 
+ +In addition to read/modify/write kernel header of the zero page as +that of 16-bit boot protocol, the boot loader should also fill the +additional fields of the zero page as that described in zero-page.txt. + +After loading and setuping the zero page, the boot loader can load the +32/64-bit kernel in the same way as that of 16-bit boot protocol. + +In 32-bit boot protocol, the kernel is started by jumping to the +32-bit kernel entry point, which is the start address of loaded +32/64-bit kernel. + +At entry, the CPU must be in 32-bit protected mode with paging +disabled; the CS and DS must be 4G flat segments
Re: [PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol
On Tue, 2007-09-18 at 22:30 -0700, H. Peter Anvin wrote: Huang, Ying wrote: Known issues: - The hd0_info and hd1_info are deleted from the zero page. Additional work should be done for this? Or this is unnecessary (because no new fields will be added to zero page)? For backwards compatibility, they should be marked as there for the short-medium term so we don't reuse them for whatever reason. OK, I will add them back. Best Regards, Huang Ying - To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm -v3 1/2] i386/x86_64 boot: setup data
This patch adds a field to the real-mode kernel header: a 64-bit physical pointer to a NULL-terminated singly linked list of struct setup_data. This is used as a more extensible boot parameter passing mechanism. This patch has been tested against the 2.6.23-rc6-mm1 kernel on x86_64. It is based on the proposal of Peter Anvin.

Known issues:

1. Where is it safe to place the linked list of setup_data? Because the length of the linked list of setup_data is variable, it can not be copied into the BSS segment of the kernel as the zero page is. We must find a safe place for it, where it will not be overwritten by the kernel during boot-up. The i386 kernel will overwrite some pages after _end. The x86_64 kernel will overwrite some pages from 0x1000 on.

ChangeLog:

v2:
- Increase the boot protocol version number.
- Check the version number before parsing setup_data.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/i386/Kconfig            |    3 ---
 arch/i386/boot/header.S      |    8 +++-
 arch/i386/kernel/setup.c     |   22 ++
 arch/x86_64/kernel/setup.c   |   21 +
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h        |    7 +++
 6 files changed, 72 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc6/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc6.orig/include/asm-i386/bootparam.h	2007-09-19 10:22:02.0 +0800
+++ linux-2.6.23-rc6/include/asm-i386/bootparam.h	2007-09-19 16:41:57.0 +0800
@@ -9,6 +9,17 @@
 #include <asm/ist.h>
 #include <video/edid.h>
 
+/* setup data types */
+#define SETUP_NONE	0
+
+/* extensible setup data list node */
+struct setup_data {
+	u64 next;
+	u32 type;
+	u32 len;
+	u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
 	u8 setup_sects;
 	u16 root_flags;
@@ -41,6 +52,10 @@
 	u32 initrd_addr_max;
 	u32 kernel_alignment;
 	u8 relocatable_kernel;
+	u8 _pad2[3];
+	u32 cmdline_size;
+	u32 _pad3;
+	u64 setup_data;
} __attribute__((packed));
 
 struct sys_desc_table {

Index: linux-2.6.23-rc6/arch/i386/boot/header.S
===
--- linux-2.6.23-rc6.orig/arch/i386/boot/header.S	2007-09-19 10:22:02.0 +0800
+++
linux-2.6.23-rc6/arch/i386/boot/header.S	2007-09-19 10:47:34.0 +0800
@@ -119,7 +119,7 @@
 # Part 2 of the header, from the old setup.S
 		.ascii	"HdrS"		# header signature
-		.word	0x0206		# header version number (>= 0x0105)
+		.word	0x0207		# header version number (>= 0x0105)
 					# or else old loadlin-1.5 will fail)
 		.globl	realmode_swtch
 realmode_swtch:	.word	0, 0		# default_switch, SETUPSEG
@@ -214,6 +214,12 @@
 					# added with boot protocol
 					# version 2.06
+pad4:		.long	0
+
+setup_data:	.quad	0		# 64-bit physical pointer to
+					# single linked list of
+					# struct setup_data
+
 # End of setup header #
 
 	.section ".inittext", "ax"

Index: linux-2.6.23-rc6/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/x86_64/kernel/setup.c	2007-09-19 10:22:02.0 +0800
+++ linux-2.6.23-rc6/arch/x86_64/kernel/setup.c	2007-09-19 16:41:57.0 +0800
@@ -221,6 +221,25 @@
 		ebda_size = 64*1024;
 }
 
+void __init parse_setup_data(void)
+{
+	struct setup_data *setup_data;
+	unsigned long pa_setup_data;
+
+	if (boot_params.hdr.version < 0x0207)
+		return;
+	pa_setup_data = boot_params.hdr.setup_data;
+	while (pa_setup_data) {
+		setup_data = early_ioremap(pa_setup_data, PAGE_SIZE);
+		switch (setup_data->type) {
+		default:
+			break;
+		}
+		pa_setup_data = setup_data->next;
+		early_iounmap(setup_data, PAGE_SIZE);
+	}
+}
+
 void __init setup_arch(char **cmdline_p)
 {
 	printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -256,6 +275,8 @@
 	strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;
 
+	parse_setup_data();
+
 	parse_early_param();
 
 	finish_e820_parsing();

Index: linux-2.6.23-rc6/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/i386/kernel/setup.c	2007-09-19 10:22:02.0 +0800
[PATCH -mm -v3 2/2] i386/x86_64 boot: document for 32 bit boot protocol
This patch defines a 32-bit boot protocol and adds the corresponding documentation. It is based on the proposal of Peter Anvin.

Known issues:

- The fields in the zero page are fairly complex (such as struct edd_info). Is it necessary to document every field inside the first-level fields, down to the primary data types? Or is it sufficient to provide the C struct name only?

ChangeLog:

v3:
- Move hd0_info and hd1_info back to the zero page for compatibility.

v2:
- Revise the zero page description according to the source code and move it to zero-page.txt.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 boot.txt      |   70 +++
 zero-page.txt |  129 +-
 2 files changed, 99 insertions(+), 100 deletions(-)

Index: linux-2.6.23-rc6/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc6.orig/Documentation/i386/boot.txt	2007-09-19 16:45:23.0 +0800
+++ linux-2.6.23-rc6/Documentation/i386/boot.txt	2007-09-19 16:45:27.0 +0800
@@ -2,7 +2,7 @@
 				H. Peter Anvin [EMAIL PROTECTED]
-				Last update 2007-05-23
+				Last update 2007-09-18
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention. This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06:	(Kernel 2.6.22) Added a field that contains the size of
 		the boot command line
+Protocol 2.07:	(Kernel 2.6.23) Added a field with a 64-bit physical
+		pointer to a singly linked list of struct setup_data.
+		Added the 32-bit boot protocol.
 
 MEMORY LAYOUT
@@ -168,6 +171,9 @@
 0234/1	2.05+	relocatable_kernel	Whether kernel is relocatable or not
 0235/3	N/A	pad2		Unused
 0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+023c/4	N/A	pad3		Unused
+0240/8	2.07+	setup_data	64-bit physical pointer to linked list
+				of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0,
     the real value is 4.
@@ -480,6 +486,36 @@
 cmdline_size characters. With protocol version 2.05 and earlier, the
 maximum size was 255.
+Field name:	setup_data
+Type:		write (obligatory)
+Offset/size:	0x240/8
+Protocol:	2.07+
+
+  The 64-bit physical pointer to a NULL-terminated singly linked list
+  of struct setup_data. This is used to define a more extensible boot
+  parameter passing mechanism. struct setup_data is defined as
+  follows:
+
+  struct setup_data {
+          u64 next;
+          u32 type;
+          u32 len;
+          u8  data[0];
+  } __attribute__((packed));
+
+  Here, next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0); type identifies
+  the contents of data; len is the length of the data field; and data
+  holds the real payload.
+
+  With this field, adding a new boot parameter that is written by the
+  boot loader no longer requires adding a new field to the real-mode
+  header; adding a new setup_data type is sufficient. Adding a new
+  boot parameter that is read by the boot loader, however, still
+  requires a new field.
+
+  TODO: Where is a safe place to put the linked list of struct
+  setup_data?
+

THE KERNEL COMMAND LINE

@@ -753,3 +789,35 @@
 After completing your hook, you should jump to the address that was
 in this field before your boot loader overwrote it (relocated, if
 appropriate.)
+
+
+		SETUP DATA TYPES
+
+
+		32-bit BOOT PROTOCOL
+
+For machines with a firmware interface other than the legacy BIOS,
+such as EFI or LinuxBIOS, and for kexec, the 16-bit real-mode setup
+code in the kernel, which depends on the legacy BIOS, can not be
+used, so a 32-bit boot protocol needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the
+kernel header at offset 0x01f1. However, it is not necessary to load
+all of the real-mode code; only the first 4K bytes, traditionally
+known as the zero page, are needed.
+
+In addition to reading/modifying/writing the kernel header of the
+zero page as in the 16-bit boot protocol, the boot loader should also
+fill the additional fields of the zero page, as described in
+zero-page.txt.
+
+After loading and setting up the zero page, the boot loader can load
+the 32/64-bit kernel in the same way as in the 16-bit boot protocol.
+
+In the 32-bit boot protocol, the kernel is started by jumping to the
+32-bit kernel entry point, which is the start address of the loaded
+32/64-bit kernel.
+
+At entry, the CPU must be in 32-bit protected mode with paging
+disabled; CS and DS must be 4G flat segments; %esi holds the base
+address of the zero page; and %esp, %ebp and %edi should be zero.

Index: linux-2.6.23
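Because a loader may be older or newer than the kernel it boots, any use of setup_data should be gated on the header fields described above. Below is a minimal userspace sketch of that check; the "HdrS" signature at offset 0x202 and the version field at 0x206 are the standard offsets from this document, while boot_protocol_supports_setup_data() itself is a hypothetical helper, not kernel or kexec-tools code.

```c
#include <string.h>

/* The boot protocol stores a little-endian 2.xx version number at
 * offset 0x206 of the real-mode header, right after the "HdrS"
 * signature at offset 0x202. */
static int boot_protocol_supports_setup_data(const unsigned char *zero_page)
{
    unsigned version;

    if (memcmp(zero_page + 0x202, "HdrS", 4) != 0)
        return 0;               /* no 2.00+ header at all */
    version = zero_page[0x206] | (zero_page[0x207] << 8);
    return version >= 0x0207;   /* setup_data appeared in protocol 2.07 */
}
```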
[RFC][PATCH 0/2 -mm] kexec based hibernation -v3
/proc/vmcore .
   cp /sys/kernel/kexec_jump_back_entry .

9. Shut down or reboot the hibernating kernel (kernel B).

10. Boot a kernel (kernel C) compiled for hibernating/restore usage on the root file system /dev/hdb, in the memory range of kernel B. For example, the following kernel command line parameters can be used:

   root=/dev/hdb single memmap=exactmap [EMAIL PROTECTED] [EMAIL PROTECTED]

11. In the restore kernel (kernel C), the memory image of kernel A can be restored as follows:

   cp kexec_jump_back_entry /sys/kernel/kexec_jump_back_entry
   krestore vmcore

12. Jump back to the hibernated kernel (kernel A):

   kexec -b

Best Regards,
Huang Ying
[RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump
This patch implements the functionality of jumping between the kexeced kernel and the original kernel. A new reboot command named LINUX_REBOOT_CMD_KJUMP is defined to trigger jumping to (executing) the new kernel and jumping back to the original kernel.

To support jumping between two kernels, before jumping to (executing) the new kernel and before jumping back to the original kernel, the devices are put into a quiescent state (to be fully implemented), and the state of the devices and CPU is saved. After jumping back from the kexeced kernel and after jumping to the new kernel, the state of the devices and CPU is restored accordingly. The device/CPU state save/restore code of software suspend is called to implement the corresponding functions.

To support jumping without preserving memory, one shadow backup page is allocated for each page used by the new (kexeced) kernel. During kexec_load, the image of the new kernel is loaded into the shadow pages, and before executing, the original pages and the shadow pages are swapped, so the contents of the original pages are backed up. Before jumping to the new (kexeced) kernel and after jumping back to the original kernel, the original pages and the shadow pages are swapped too.

A jump back protocol is defined and documented.

Known issues:

- A field is added to the Linux kernel real-mode header. This is temporary, and should be replaced after the 32-bit boot protocol and setup data patches are accepted.
- The suspend method of each device is used to put the device into a quiescent state. But if ACPI is enabled, this will also put devices into a low power state, which prevents the new kernel from booting. So ACPI must be disabled both in the original kernel and in the kexeced kernel. This is planned to be resolved after the suspend and hibernate methods are separated for devices, as proposed earlier on LKML.
- The NX (non-executable) bit should be turned off for the control page if available.

ChangeLog:

2007/9/19:
1. The two reboot commands are merged back into one again because the underlying implementation is the same.
2. Jumping without preserving memory is implemented. As a side effect, two-direction jumping is implemented.
3. A jump back protocol is defined and documented. The original kernel and the kexeced kernel are more independent of each other.
4. The CPU state save/restore code is merged into relocate_kernel.S.

2007/8/24:
1. The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot commands to reflect the different functions.
2. Documentation is added for the added kernel parameters.
3. /sys/kernel/kexec_jump_buf_pfn is made writable; it is used for memory image restoring.
4. Console restoring after jumping back is implemented.

2007/7/15:
1. The kexec jump implementation is put into the kexec/kdump framework instead of the software suspend framework. The device and CPU state save/restore code of software suspend is called when needed.
2. The same code path is used both for kexecing a new kernel and for jumping back to the original kernel.
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/i386/jump_back_protocol.txt |   81 
 arch/i386/Kconfig                         |    7 +
 arch/i386/boot/header.S                   |    2 
 arch/i386/kernel/machine_kexec.c          |   77 +---
 arch/i386/kernel/relocate_kernel.S        |  187 ++
 arch/i386/kernel/setup.c                  |    3 
 include/asm-i386/bootparam.h              |    3 
 include/asm-i386/kexec.h                  |   48 ++-
 include/linux/kexec.h                     |    9 +
 include/linux/reboot.h                    |    2 
 kernel/kexec.c                            |   59 +
 kernel/ksysfs.c                           |   17 ++
 kernel/power/Kconfig                      |    2 
 kernel/sys.c                              |    8 +
 14 files changed, 463 insertions(+), 42 deletions(-)

Index: linux-2.6.23-rc6/arch/i386/kernel/machine_kexec.c
===
--- linux-2.6.23-rc6.orig/arch/i386/kernel/machine_kexec.c	2007-09-20 11:24:25.0 +0800
+++ linux-2.6.23-rc6/arch/i386/kernel/machine_kexec.c	2007-09-20 11:24:31.0 +0800
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/setup.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -98,23 +99,23 @@
 {
 }
 
-/*
- * Do not allocate memory (or fail in any way) in machine_kexec().
- * We are past the point of no return, committed to rebooting now.
- */
-NORET_TYPE void machine_kexec(struct kimage *image)
+static NORET_TYPE void __machine_kexec(struct kimage *image,
+				       void *control_page) ATTRIB_NORET;
+
+static NORET_TYPE void __machine_kexec(struct kimage *image,
+				       void *control_page)
 {
 	unsigned long page_list[PAGES_NR
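The backup-by-swap scheme described in the notes above (one shadow page per page used by the new kernel, with the two swapped before the jump and swapped again after the jump back) can be illustrated with a small userspace sketch. This is hypothetical illustration, not the kernel code: dest_page and shadow_page stand in for a destination page and its shadow.

```c
#include <stddef.h>

#define PAGE_BYTES 4096

static unsigned char dest_page[PAGE_BYTES];   /* page the new kernel uses */
static unsigned char shadow_page[PAGE_BYTES]; /* its shadow backup page  */

/* Swap the contents of a destination page and its shadow page.  After
 * one call the destination holds the new image and the shadow holds
 * the backup of the old contents; calling it a second time restores
 * the original state.  One pass both backs up and installs. */
static void swap_pages(unsigned char *dst, unsigned char *src)
{
    for (size_t i = 0; i < PAGE_BYTES; i++) {
        unsigned char tmp = dst[i];
        dst[i] = src[i];
        src[i] = tmp;
    }
}
```

The symmetry is the point: the same swap performed before jumping to the kexeced kernel and after jumping back leaves the original kernel's pages exactly as they were.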
[RFC][PATCH 2/2 -mm] kexec based hibernation -v3: kexec restore
This patch adds writing support for /dev/oldmem. This is used to restore the memory contents of a hibernated system.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/i386/kernel/crash_dump.c |   27 +++
 drivers/char/mem.c            |   32 
 include/linux/crash_dump.h    |    2 ++
 3 files changed, 61 insertions(+)

Index: linux-2.6.23-rc4/arch/i386/kernel/crash_dump.c
===
--- linux-2.6.23-rc4.orig/arch/i386/kernel/crash_dump.c	2007-09-11 16:52:14.0 +0800
+++ linux-2.6.23-rc4/arch/i386/kernel/crash_dump.c	2007-09-20 09:48:10.0 +0800
@@ -58,6 +58,33 @@
 	return csize;
 }
 
+ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
+			  size_t csize, unsigned long offset, int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	if (!userbuf) {
+		vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+		memcpy(vaddr + offset, buf, csize);
+	} else {
+		if (!kdump_buf_page) {
+			printk(KERN_WARNING "Kdump: Kdump buffer page not"
+			       " allocated\n");
+			return -EFAULT;
+		}
+		if (copy_from_user(kdump_buf_page, buf, csize))
+			return -EFAULT;
+		vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+		memcpy(vaddr + offset, kdump_buf_page, csize);
+	}
+	kunmap_atomic(vaddr, KM_PTE0);
+
+	return csize;
+}
+
 static int __init kdump_buf_page_init(void)
 {
 	int ret = 0;

Index: linux-2.6.23-rc4/include/linux/crash_dump.h
===
--- linux-2.6.23-rc4.orig/include/linux/crash_dump.h	2007-09-11 16:52:14.0 +0800
+++ linux-2.6.23-rc4/include/linux/crash_dump.h	2007-09-20 09:48:10.0 +0800
@@ -11,6 +11,8 @@
 extern unsigned long long elfcorehdr_addr;
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
 				unsigned long, int);
+extern ssize_t write_oldmem_page(unsigned long, const char *, size_t,
+				 unsigned long, int);
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;

Index: linux-2.6.23-rc4/drivers/char/mem.c
===
--- linux-2.6.23-rc4.orig/drivers/char/mem.c	2007-09-11 16:52:14.0 +0800
+++ linux-2.6.23-rc4/drivers/char/mem.c	2007-09-20 09:48:10.0 +0800
@@ -348,6 +348,37 @@
 	}
 	return read;
 }
+
+/*
+ * Write memory corresponding to the old kernel.
+ */
+static ssize_t write_oldmem(struct file *file, const char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	unsigned long pfn, offset;
+	size_t write = 0, csize;
+	int rc = 0;
+
+	while (count) {
+		pfn = *ppos / PAGE_SIZE;
+		if (pfn > saved_max_pfn)
+			return write;
+
+		offset = (unsigned long)(*ppos % PAGE_SIZE);
+		if (count > PAGE_SIZE - offset)
+			csize = PAGE_SIZE - offset;
+		else
+			csize = count;
+		rc = write_oldmem_page(pfn, buf, csize, offset, 1);
+		if (rc < 0)
+			return rc;
+		buf += csize;
+		*ppos += csize;
+		write += csize;
+		count -= csize;
+	}
+	return write;
+}
 #endif
 
 extern long vread(char *buf, char *addr, unsigned long count);
@@ -783,6 +814,7 @@
 #ifdef CONFIG_CRASH_DUMP
 static const struct file_operations oldmem_fops = {
 	.read = read_oldmem,
+	.write = write_oldmem,
 	.open = open_oldmem,
 };
 #endif
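The kernel side of this patch splits each write on page boundaries before handing the pieces to write_oldmem_page(). That chunking arithmetic can be checked in isolation; next_chunk() below is an illustrative userspace re-statement of the loop body, with PAGE_SIZE_SIM standing in for PAGE_SIZE.

```c
#include <stddef.h>

#define PAGE_SIZE_SIM 4096UL

/* Given the current file position and the bytes remaining, return how
 * many bytes the next iteration of write_oldmem()'s loop would write:
 * at most up to the next page boundary. */
static size_t next_chunk(unsigned long ppos, size_t count)
{
    unsigned long offset = ppos % PAGE_SIZE_SIM;

    if (count > PAGE_SIZE_SIM - offset)
        return PAGE_SIZE_SIM - offset;
    return count;
}
```

A write that starts mid-page therefore produces a short first chunk, then whole pages, then a short tail, which is exactly what write_oldmem_page() expects per call.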
Re: [RFC][PATCH 2/2 -mm] kexec based hibernation -v3: kexec restore
On Thu, 2007-09-20 at 10:15 +0200, Pavel Machek wrote: This patch adds writing support for /dev/oldmem. This is used to restore the memory contents of hibernated system. Signed-off-by: Huang Ying [EMAIL PROTECTED] ACK. (And this can even go in before the patch #1, right?) Yes. This patch does not depend on patch #1. Best Regards, Huang Ying
Could you please merge the x86_64 EFI boot support patchset?
Hi, Linus,

Could you please merge the following patchset:

[PATCH 0/2 -v3] x86_64 EFI boot support
[PATCH 1/2 -v3] x86_64 EFI boot support: EFI frame buffer driver
[PATCH 2/2 -v3] x86_64 EFI boot support: EFI boot document

The patchset has been in the -mm tree since 2.6.23-rc2-mm2. Andrew Morton had suggested that it be merged into 2.6.24 during the early merge window of 2.6.24. It was not merged because the 32-bit boot protocol had not been done at that time. Now the 32-bit boot protocol has been merged into 2.6.24, and this patch has been in the x86 patch queue.

I know that it is a little late for this patchset to be merged into 2.6.24. But this patchset is very simple, just adding a framebuffer driver, so it is impossible for this patchset to break anything. And this patchset will be helpful for people who have machines with UEFI 64 firmware instead of a legacy BIOS.

Best Regards,
Huang Ying
[PATCH 1/3 -mm] kexec based hibernation -v6: kexec jump
This patch implements the functionality of jumping between the kexeced kernel and the original kernel.

To support jumping between two kernels, before jumping to (executing) the new kernel and before jumping back to the original kernel, the devices are put into a quiescent state, and the state of the devices and CPU is saved. After jumping back from the kexeced kernel and after jumping to the new kernel, the state of the devices and CPU is restored accordingly. The device/CPU state save/restore code of software suspend is called to implement the corresponding functions.

To support jumping without reserving memory, one shadow backup page (source page) is allocated for each page used by the new (kexeced) kernel (destination page). During kexec_load, the image of the new kernel is loaded into the source pages, and before executing, the destination pages and the source pages are swapped, so the contents of the destination pages are backed up. Before jumping to the new (kexeced) kernel and after jumping back to the original kernel, the destination pages and the source pages are swapped too.

A jump back protocol for kexec is defined and documented. It is an extension of the ordinary function calling protocol, so the facility provided by this patch can be used to call an ordinary C function in real mode.

A set of flags for sys_kexec_load is added to control which states are saved/restored before/after the real mode code executes. For example, you can specify that the device state and FPU state be saved/restored before/after the real mode code executes.
Signed-off-by: Huang Ying [EMAIL PROTECTED] --- Documentation/i386/jump_back_protocol.txt | 103 ++ arch/powerpc/kernel/machine_kexec.c |2 arch/ppc/kernel/machine_kexec.c |2 arch/sh/kernel/machine_kexec.c|2 arch/x86/kernel/machine_kexec_32.c| 88 +--- arch/x86/kernel/machine_kexec_64.c|2 arch/x86/kernel/relocate_kernel_32.S | 214 +++--- include/asm-x86/kexec_32.h| 39 - include/linux/kexec.h | 39 - kernel/kexec.c| 131 ++ kernel/power/Kconfig |2 kernel/sys.c | 27 ++- 12 files changed, 585 insertions(+), 66 deletions(-) --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -20,6 +20,7 @@ #include asm/cpufeature.h #include asm/desc.h #include asm/system.h +#include asm/cacheflush.h #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) static u32 kexec_pgd[1024] PAGE_ALIGNED; @@ -83,10 +84,14 @@ static void load_segments(void) * reboot code buffer to allow us to avoid allocations * later. * - * Currently nothing. + * Turn off NX bit for control page. */ int machine_kexec_prepare(struct kimage *image) { + if (nx_enabled) { + change_page_attr(image-control_code_page, 1, PAGE_KERNEL_EXEC); + global_flush_tlb(); + } return 0; } @@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage */ void machine_kexec_cleanup(struct kimage *image) { + if (nx_enabled) { + change_page_attr(image-control_code_page, 1, PAGE_KERNEL); + global_flush_tlb(); + } +} + +void machine_kexec(struct kimage *image) +{ + machine_kexec_call(image, NULL, 0); } /* * Do not allocate memory (or fail in any way) in machine_kexec(). * We are past the point of no return, committed to rebooting now. 
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
+			unsigned int argc, va_list args)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
+	asmlinkage NORET_TYPE void
+	(*relocate_kernel_ptr)(unsigned long indirection_page,
+			       unsigned long control_page,
+			       unsigned long start_address,
+			       unsigned int has_pae) ATTRIB_NORET;
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 
 	control_page = page_address(image->control_code_page);
-	memcpy(control_page, relocate_kernel, PAGE_SIZE);
+	memcpy(control_page, relocate_page, PAGE_SIZE/2);
+	KCALL_MAGIC(control_page) = 0;
+	if (image->preserve_cpu) {
+		unsigned int i;
+		KCALL_MAGIC(control_page) = KCALL_MAGIC_NUMBER;
+		KCALL_ARGC(control_page) = argc;
+		for (i = 0; i < argc; i++)
+			KCALL_ARGS(control_page)[i] = \
+				va_arg(args, unsigned long);
+
+		if (kexec_call_save_cpu(control_page)) {
+			image->start = KCALL_ENTRY(control_page);
+			if (ret)
+				*ret = KCALL_ARGS(control_page)[0
[PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume
This patch implements kexec based hibernate/resume, built on the facility provided by kexec jump. The ACPI methods are called in the environment specified by the ACPI specification. Two new reboot commands are added to trigger hibernate/resume.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 include/linux/kexec.h   |    5 +
 include/linux/reboot.h  |    2
 include/linux/suspend.h |    9 ++
 kernel/power/disk.c     |  155 
 kernel/sys.c            |   42 +
 5 files changed, 212 insertions(+), 1 deletion(-)

--- a/kernel/power/disk.c
+++ b/kernel/power/disk.c
@@ -21,6 +21,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <linux/kexec.h>
 
 #include "power.h"
@@ -438,6 +439,160 @@ int hibernate(void)
 	return error;
 }
 
+#ifdef CONFIG_KEXEC
+static void kexec_hibernate_power_down(void)
+{
+	switch (hibernation_mode) {
+	case HIBERNATION_TEST:
+	case HIBERNATION_TESTPROC:
+		break;
+	case HIBERNATION_REBOOT:
+		machine_restart(NULL);
+		break;
+	case HIBERNATION_PLATFORM:
+		if (!hibernation_ops)
+			break;
+		hibernation_ops->enter();
+		/* We should never get here */
+		while (1);
+		break;
+	case HIBERNATION_SHUTDOWN:
+		machine_power_off();
+		break;
+	}
+	machine_halt();
+	/*
+	 * Valid image is on the disk, if we continue we risk serious data
+	 * corruption after resume.
+	 */
+	printk(KERN_CRIT "Please power me down manually\n");
+	while (1);
+}
+
+int kexec_hibernate(struct kimage *image)
+{
+	int error;
+	int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM);
+	unsigned long cmd_ret;
+
+	mutex_lock(&pm_mutex);
+
+	pm_prepare_console();
+	suspend_console();
+
+	error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
+	if (error)
+		goto Resume_console;
+
+	error = platform_start(platform_mode);
+	if (error)
+		goto Resume_console;
+
+	error = device_suspend(PMSG_FREEZE);
+	if (error)
+		goto Resume_console;
+
+	error = platform_pre_snapshot(platform_mode);
+	if (error)
+		goto Resume_devices;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Resume_devices;
+	local_irq_disable();
+	/* At this point, device_suspend() has been called, but *not*
+	 * device_power_down(). We *must* device_power_down() now.
+	 * Otherwise, drivers for some devices (e.g. interrupt
+	 * controllers) become desynchronized with the actual state of
+	 * the hardware at resume time, and evil weirdness ensues.
+	 */
+	error = device_power_down(PMSG_FREEZE);
+	if (error)
+		goto Enable_irqs;
+
+	save_processor_state();
+	error = machine_kexec_jump(image, &cmd_ret,
+				   KJUMP_CMD_HIBERNATE_WRITE_IMAGE);
+	restore_processor_state();
+
+	if (cmd_ret == KJUMP_CMD_HIBERNATE_POWER_DOWN)
+		kexec_hibernate_power_down();
+
+	platform_leave(platform_mode);
+
+	/* NOTE: device_power_up() is just a resume() for devices
+	 * that suspended with irqs off ... no overall powerup.
+	 */
+	device_power_up();
+ Enable_irqs:
+	local_irq_enable();
+	enable_nonboot_cpus();
+ Resume_devices:
+	platform_finish(platform_mode);
+	device_resume();
+ Resume_console:
+	pm_notifier_call_chain(PM_POST_HIBERNATION);
+	resume_console();
+	pm_restore_console();
+	mutex_unlock(&pm_mutex);
+	return error;
+}
+
+int kexec_resume(struct kimage *image)
+{
+	int error;
+	int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM);
+
+	mutex_lock(&pm_mutex);
+
+	pm_prepare_console();
+	suspend_console();
+
+	error = device_suspend(PMSG_PRETHAW);
+	if (error)
+		goto Resume_console;
+
+	error = platform_pre_restore(platform_mode);
+	if (error)
+		goto Resume_devices;
+
+	error = disable_nonboot_cpus();
+	if (error)
+		goto Resume_devices;
+	local_irq_disable();
+	/* At this point, device_suspend() has been called, but *not*
+	 * device_power_down(). We *must* device_power_down() now.
+	 * Otherwise, drivers for some devices (e.g. interrupt controllers)
+	 * become desynchronized with the actual state of the hardware
+	 * at resume time, and evil weirdness ensues.
+	 */
+	error = device_power_down(PMSG_PRETHAW);
+	if (error)
+		goto Enable_irqs;
+
+	save_processor_state();
+	error = machine_kexec_jump(image, NULL, KJUMP_CMD_HIBERNATE_RESUME
[PATCH 2/3 -mm] kexec based hibernation -v6: kexec restore
This patch adds writing support for /dev/oldmem. This is used to restore the memory contents of the hibernated system.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/crash_dump_32.c |   27 +++
 drivers/char/mem.c              |   32 
 include/linux/crash_dump.h      |    2 ++
 3 files changed, 61 insertions(+)

--- a/arch/x86/kernel/crash_dump_32.c
+++ b/arch/x86/kernel/crash_dump_32.c
@@ -59,6 +59,33 @@ ssize_t copy_oldmem_page(unsigned long p
 	return csize;
 }
 
+ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
+			  size_t csize, unsigned long offset, int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	if (!userbuf) {
+		vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+		memcpy(vaddr + offset, buf, csize);
+	} else {
+		if (!kdump_buf_page) {
+			printk(KERN_WARNING "Kdump: Kdump buffer page not"
+			       " allocated\n");
+			return -EFAULT;
+		}
+		if (copy_from_user(kdump_buf_page, buf, csize))
+			return -EFAULT;
+		vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+		memcpy(vaddr + offset, kdump_buf_page, csize);
+	}
+	kunmap_atomic(vaddr, KM_PTE0);
+
+	return csize;
+}
+
 static int __init kdump_buf_page_init(void)
 {
 	int ret = 0;

--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -11,6 +11,8 @@
 extern unsigned long long elfcorehdr_addr;
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
 				unsigned long, int);
+extern ssize_t write_oldmem_page(unsigned long, const char *, size_t,
+				 unsigned long, int);
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;

--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file *
 	}
 	return read;
 }
+
+/*
+ * Write memory corresponding to the old kernel.
+ */
+static ssize_t write_oldmem(struct file *file, const char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	unsigned long pfn, offset;
+	size_t write = 0, csize;
+	int rc = 0;
+
+	while (count) {
+		pfn = *ppos / PAGE_SIZE;
+		if (pfn > saved_max_pfn)
+			return write;
+
+		offset = (unsigned long)(*ppos % PAGE_SIZE);
+		if (count > PAGE_SIZE - offset)
+			csize = PAGE_SIZE - offset;
+		else
+			csize = count;
+		rc = write_oldmem_page(pfn, buf, csize, offset, 1);
+		if (rc < 0)
+			return rc;
+		buf += csize;
+		*ppos += csize;
+		write += csize;
+		count -= csize;
+	}
+	return write;
+}
 #endif
 
 extern long vread(char *buf, char *addr, unsigned long count);
@@ -783,6 +814,7 @@ static const struct file_operations full
 #ifdef CONFIG_CRASH_DUMP
 static const struct file_operations oldmem_fops = {
 	.read = read_oldmem,
+	.write = write_oldmem,
 	.open = open_oldmem,
 };
 #endif
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
[PATCH 0/3 -mm] kexec based hibernation -v6
jump_back_param | cut -d '=' -f 2`
    vmcoreinfo_size=`grep arg4 jump_back_param | cut -d '=' -f 2`
    ./makedumpfile -D -E -d 16 -o [EMAIL PROTECTED] -j `cat kexec_jump_back_entry` -M `cat backup_pages_map_root_entry` /proc/vmcore dump.elf

10. Enter the ACPI S4 state with the following command line:

    kexec -e -c 0x6b630002

    The hibernating kernel (kernel B) will jump back to the hibernated kernel again with a special command (0x6b630002: hibernate shut down), and the hibernated kernel (kernel A) will enter the ACPI S4 state.

11. Boot a kernel (kernel C) compiled for hibernating/resuming usage on the root file system /dev/hdb in the memory range of kernel B. For example, the following kernel command line parameters can be used:

    root=/dev/hdb single memmap=exactmap [EMAIL PROTECTED] [EMAIL PROTECTED]

12. In the resuming kernel (kernel C), the memory image of kernel A can be restored as follows:

    krestore dump.elf

13. Resume the hibernated kernel (kernel A):

    kexec --load-jump-back-helper --jump-back-entry=`cat kexec_jump_back_entry`
    kexec --resume

    The resuming kernel (kernel C) will jump back to the hibernated kernel (kernel A), and the necessary ACPI methods will be executed.

Known issues:

- The suspend/resume callbacks of device drivers are used to put devices into a quiescent state. This will unnecessarily (possibly harmfully) put devices into a low power state. This is intended to be solved by separating the device quiesce/unquiesce callbacks from the device suspend/resume callbacks.
- The memory image of the hibernated kernel must be saved in a separate partition not used by the hibernated kernel. This is planned to be solved by making the hibernating/resuming kernel work on an initramfs and writing the memory image to a file in a partition used by the hibernated kernel through a block list instead of ordinary file system operations.
- The setup of hibernate/resume is fairly complex. I will continue working on simplifying it.

TODO:

- Implement sys_kexec_store, that is, store the memory image of the kexeced kernel.
- Write the memory image to a file through a block list instead of ordinary file system operations.
- Simplify hibernate/resume setup.
- Resume from hibernation with a bootloader.

ChangeLog:

v6:
- Add ACPI support.
- Refactor kexec jump to be a general facility to call real mode code.

v5:
- A flag (KEXEC_JUMP_BACK) is added to indicate that the loaded kernel image is used for jumping back. The reboot command for jumping back is removed. This interface is more stable (proposed by Eric Biederman).
- NX bit handling support for kexec is added.
- Merge machine_kexec and machine_kexec_jump, remove the NO_RET attribute from machine_kexec.
- Pass the jump back entry to the kexeced kernel via the kernel command line (parsed by a user space tool via /proc/cmdline instead of by the kernel). The original corresponding boot parameter and sysfs code is removed.

v4:
- The two reboot commands are merged back into one because the underlying implementation is the same.
- Jumping without reserving memory is implemented. As a side effect, two-direction jumping is implemented.
- A jump back protocol is defined and documented. The original kernel and the kexeced kernel are more independent of each other.
- The CPU state save/restore code is merged into relocate_kernel.S.

v3:
- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot commands to reflect the different functions.
- Documentation is added for the added kernel parameters.
- /sys/kernel/kexec_jump_buf_pfn is made writable; it is used for memory image restoring.
- Console restoring after jumping back is implemented.
- Writing support is added for /dev/oldmem, to restore the memory contents of the hibernated system.

v2:
- The kexec jump implementation is put into the kexec/kdump framework instead of the software suspend framework. The device and CPU state save/restore code of software suspend is called when needed.
- The same code path is used both for kexecing a new kernel and for jumping back to the original kernel.
Best Regards,
Huang Ying
Re: [PATCH -mm -v3] x86 boot : export boot_params via sysfs
On Mon, 2007-12-17 at 21:34 -0700, Eric W. Biederman wrote:

> H. Peter Anvin [EMAIL PROTECTED] writes:
> > This is directly analogous to how we treat identity information in IDE, or PCI configuration space -- some fields are pre-digested, but the entire raw information is also available.
> Add to that a totally unchanged value can just be easier to get correct. Still the kexec code as much as it can should not look there, as we may get the same basic information in a couple of different ways. EFI memmap vs. e820 for example. If/when that is the case /sbin/kexec should get the information and spit it out into whatever format makes sense for the destination kernel. My sense is just passing through values is brittleness where we don't want it. However I think being able to get at the raw boot information overall sounds useful. I just don't know if it is generally useful or just useful when debugging bootloaders though.

If struct boot_params as a whole is useless for kexec, I can move it to debugfs, because kexec is the only normal user now. Then which fields of struct boot_params do you think are useful for kexec? Referring to include/asm-x86/bootparam.h:

- edid_info?
- e820_entries and e820_map? (maybe useful for kdump)
- EDD related fields (eddbuf, edd_mbr_sig_buffer, etc.)?
- split fields down to fundamental types?

Best Regards,
Huang Ying
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Tue, 2007-12-11 at 02:27 -0700, Eric W. Biederman wrote:

> Huang, Ying [EMAIL PROTECTED] writes:
> > On Mon, 2007-12-10 at 19:25 -0700, Eric W. Biederman wrote:
> > > Huang, Ying [EMAIL PROTECTED] writes:
[...]
> > > > /*
> > > >  * Do not allocate memory (or fail in any way) in machine_kexec().
> > > >  * We are past the point of no return, committed to rebooting now.
> > > >  */
> > > > -NORET_TYPE void machine_kexec(struct kimage *image)
> > > > +int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
> > > > +			unsigned int argc, va_list args)
> > > > {
> > > Why do we need var arg support? Can't we do that with a shim we load from user space?
> > If all parameters are provided in user space, the usage model may be as follows:
> >
> > - sys_kexec_load() /* with executable/data/parameters(A) loaded */
> > - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(A) */
> > - /* jump back */
> > - sys_kexec_load() /* with executable/data/parameters(B) loaded */
> > - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with parameters(B) */
> > - /* jump back */
> >
> > That is, the kexec image should be reloaded if the parameters are different, and no state can be preserved in the kexec image. This is OK for the original kexec implementation, because there is no jumping back. But for kexec with jumping back, another usage model may be useful too:
> >
> > - sys_kexec_load() /* with executable/data loaded */
> > - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(A)) /* execute physical mode code with parameters(A) */
> > - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(B)) /* execute physical mode code with parameters(B) */
> >
> > This way the kexec image need not be reloaded, and the state of the kexec image can be preserved across several invocations.
> Interesting. We wind up preserving the code in between invocations. I don't know about your particular issue, but I can see that clearly we need a way to read values back from our target image. And if we can read everything back one way to proceed is to read everything out modify it and then write it back.
> Amending a kexec image that is already stored may also make sense. I'm not convinced that the var arg parameters make sense, but you added them because of a real need. The kexec function is split into two separate calls so that we can unmount the filesystem the kexec image comes from before actually doing the kexec.

My real issue is that I need a kind of kernel to kernel communication method. The var args are just a convenient way to pass an array of unsigned longs between two kernels. The reason is as follows.

The kexec based hibernating process is as follows:

 h1.  put devices in quiescent state
 h2.  save devices/CPU state
 h3.  jump to kexeced kernel (kernel B)
*h4.  normal kernel boot of kernel B
*h5.  save devices/CPU state
*h6.  jump back to original kernel (kernel A)
 h7.  restore devices/CPU state
 h8.  put devices in quiescent state
 h9.  put devices in low power state
 h10. execute necessary ACPI method (prepare to sleep)
 h11. save devices/CPU state
 h12. jump to kernel B
*h13. execute necessary ACPI method (wake up)
*h14. restore devices/CPU state
*h15. put devices in normal power state
*h16. write memory image of kernel A into disk
*h17. put system into ACPI S4 state

The kexec based resuming process is as follows:

*r1. boot the resuming kernel (kernel C)
*r2. restore the memory image of kernel A
*r3. put devices in quiescent state
*r4. execute necessary ACPI method (prepare to resume)
*r5. jump to kernel A
 r6. execute necessary ACPI method (wake up)
 r7. restore devices/CPU state

Here, lines beginning with * are executed in kernel B or kernel C; the others are executed in kernel A. Kernel A needs to distinguish between h7 and r6, while kernel B/C needs to distinguish between *h13 and a normal jump back. Different kernel actions need to be taken depending on the action of the peer kernel. Now, this is solved by kernel-kernel communication: a command word is passed to the peer kernel to indicate the required action.
I remember you have said before that you think it is better to use only user space to user space communication between kernel A and kernel B. This is OK for normal kexec. But if the kexec jump is used for multiple functions with early kernel actions involved (normal kexec jump, kexec jump to hibernate, kexec jump to resume), it is necessary to use kernel to kernel communication.

The var args in the patch are just an array of unsigned longs; the interface can be expressed as follows too:

int kexec_call(struct kimage *image, unsigned long *ret,
	       unsigned int argc, unsigned long argv[]);

The var args version is as follows:

int kexec_call(struct kimage *image, unsigned long *ret,
	       unsigned int argc, ...);

Best Regards,
Huang Ying
[PATCH 0/3 -mm] kexec jump -v8
kernel as you want to via the following shell command line:

/sbin/kexec -e

Known issues:

- The suspend/resume callbacks of device drivers are used to put devices into a quiescent state. This will unnecessarily (possibly harmfully) put devices into a low power state. This is intended to be solved by separating the device quiesce/unquiesce callbacks from the device suspend/resume callbacks.

ChangeLog:

v8:
- Split the kexec jump patchset from the kexec based hibernation patchset.
- Add writing support to kimgcore. This can be used as a communication method between the kexeced kernel and the original kernel.
- Merge the various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT flag because there is no need for such subtle control.
- Delete the variable argument based kernel to kernel communication mechanism from the basic kexec jump patchset.

v7:
- Add an interface to dump the loaded kexec_image, which may contain the memory image of the kexeced system. This is used to accelerate kexec based hibernation.
- Refactor kexec jump to be a command driven programming model.
- Adjust ACPI support to mimic the ACPI support of u/swsusp.
- Use kexec_lock to do synchronization.

v6:
- Add ACPI support.
- Refactor kexec jump to be a general facility to call real mode code.

v5:
- A flag (KEXEC_JUMP_BACK) is added to indicate that the loaded kernel image is used for jumping back. The reboot command for jumping back is removed. This interface is more stable (proposed by Eric Biederman).
- NX bit handling support for kexec is added.
- Merge machine_kexec and machine_kexec_jump, remove the NO_RET attribute from machine_kexec.
- Pass the jump back entry to the kexeced kernel via the kernel command line (parsed by a user space tool via /proc/cmdline instead of by the kernel). The original corresponding boot parameter and sysfs code is removed.

v4:
- The two reboot commands are merged back into one because the underlying implementation is the same.
- Jumping without reserving memory is implemented. As a side effect, two-direction jumping is implemented.
- A jump back protocol is defined and documented. The original kernel and the kexeced kernel are more independent of each other.
- The CPU state save/restore code is merged into relocate_kernel.S.

v3:
- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two reboot commands to reflect the different functions.
- Documentation is added for the added kernel parameters.
- /sys/kernel/kexec_jump_buf_pfn is made writable; it is used for memory image restoring.
- Console restoring after jumping back is implemented.
- Writing support is added for /dev/oldmem, to restore the memory contents of the hibernated system.

v2:
- The kexec jump implementation is put into the kexec/kdump framework instead of the software suspend framework. The device and CPU state save/restore code of software suspend is called when needed.
- The same code path is used both for kexecing a new kernel and for jumping back to the original kernel.

Best Regards,
Huang Ying
[PATCH 2/3 -mm] kexec jump -v8 : add write support to oldmem device
This patch adds writing support for /dev/oldmem. This can be used to:

- Communicate between the original kernel and the kexeced kernel by writing to some pages of the original kernel.
- Restore the memory contents of a hibernated system in kexec based hibernation.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/crash_dump_32.c |   27 +++
 drivers/char/mem.c              |   32 
 include/linux/crash_dump.h      |    2 ++
 3 files changed, 61 insertions(+)

--- a/arch/x86/kernel/crash_dump_32.c
+++ b/arch/x86/kernel/crash_dump_32.c
@@ -59,6 +59,33 @@ ssize_t copy_oldmem_page(unsigned long p
 	return csize;
 }
 
+ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
+			  size_t csize, unsigned long offset, int userbuf)
+{
+	void *vaddr;
+
+	if (!csize)
+		return 0;
+
+	if (!userbuf) {
+		vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+		memcpy(vaddr + offset, buf, csize);
+	} else {
+		if (!kdump_buf_page) {
+			printk(KERN_WARNING "Kdump: Kdump buffer page not"
+			       " allocated\n");
+			return -EFAULT;
+		}
+		if (copy_from_user(kdump_buf_page, buf, csize))
+			return -EFAULT;
+		vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+		memcpy(vaddr + offset, kdump_buf_page, csize);
+	}
+	kunmap_atomic(vaddr, KM_PTE0);
+
+	return csize;
+}
+
 static int __init kdump_buf_page_init(void)
 {
 	int ret = 0;

--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -11,6 +11,8 @@
 extern unsigned long long elfcorehdr_addr;
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
 				unsigned long, int);
+extern ssize_t write_oldmem_page(unsigned long, const char *, size_t,
+				 unsigned long, int);
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;

--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file *
 	}
 	return read;
 }
+
+/*
+ * Write memory corresponding to the old kernel.
+ */
+static ssize_t write_oldmem(struct file *file, const char __user *buf,
+			    size_t count, loff_t *ppos)
+{
+	unsigned long pfn, offset;
+	size_t write = 0, csize;
+	int rc = 0;
+
+	while (count) {
+		pfn = *ppos / PAGE_SIZE;
+		if (pfn > saved_max_pfn)
+			return write;
+
+		offset = (unsigned long)(*ppos % PAGE_SIZE);
+		if (count > PAGE_SIZE - offset)
+			csize = PAGE_SIZE - offset;
+		else
+			csize = count;
+		rc = write_oldmem_page(pfn, buf, csize, offset, 1);
+		if (rc < 0)
+			return rc;
+		buf += csize;
+		*ppos += csize;
+		write += csize;
+		count -= csize;
+	}
+	return write;
+}
 #endif
 
 extern long vread(char *buf, char *addr, unsigned long count);
@@ -783,6 +814,7 @@ static const struct file_operations full
 #ifdef CONFIG_CRASH_DUMP
 static const struct file_operations oldmem_fops = {
 	.read = read_oldmem,
+	.write = write_oldmem,
 	.open = open_oldmem,
 };
 #endif
[PATCH 1/3 -mm] kexec jump -v8 : kexec jump basic
This patch implements the functionality of jumping between the kexeced kernel and the original kernel. To support jumping between the two kernels, before jumping to (executing) the new kernel and before jumping back to the original kernel, the devices are put into a quiescent state and the state of the devices and CPU is saved. After jumping back from the kexeced kernel and after jumping to the new kernel, the state of the devices and CPU is restored accordingly. The device/CPU state save/restore code of software suspend is reused to implement this.

To support jumping without reserving memory, one shadow backup page (source page) is allocated for each page used by the new (kexeced) kernel (destination page). During kexec_load, the image of the new kernel is loaded into the source pages, and before execution the destination pages and the source pages are swapped, so the contents of the destination pages are backed up. The destination pages and the source pages are swapped again before jumping to the new (kexeced) kernel and after jumping back to the original kernel.

A jump back protocol for kexec is defined and documented. It is an extension of the ordinary function calling protocol, so the facility provided by this patch can be used to call an ordinary C function in physical mode. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back.
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/i386/jump_back_protocol.txt |   66 ++
 arch/powerpc/kernel/machine_kexec.c       |    2
 arch/ppc/kernel/machine_kexec.c           |    2
 arch/sh/kernel/machine_kexec.c            |    2
 arch/x86/kernel/machine_kexec_32.c        |   39 +-
 arch/x86/kernel/machine_kexec_64.c        |    2
 arch/x86/kernel/relocate_kernel_32.S      |  194 ++
 include/asm-x86/kexec_32.h                |   34 -
 include/linux/kexec.h                     |   14 +-
 kernel/kexec.c                            |   65 +-
 kernel/power/Kconfig                      |    2
 kernel/sys.c                              |   35 +++--
 12 files changed, 403 insertions(+), 54 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/cacheflush.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -83,10 +84,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Turn off NX bit for control page.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+	if (nx_enabled) {
+		change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC);
+		global_flush_tlb();
+	}
 	return 0;
 }
 
@@ -96,25 +101,45 @@ int machine_kexec_prepare(struct kimage
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+	if (nx_enabled) {
+		change_page_attr(image->control_code_page, 1, PAGE_KERNEL);
+		global_flush_tlb();
+	}
 }
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
 	unsigned long page_list[PAGES_NR];
 	void *control_page;
+	asmlinkage NORET_TYPE void
+	(*relocate_kernel_ptr)(unsigned long indirection_page,
+			       unsigned long control_page,
+			       unsigned long start_address,
+			       unsigned int has_pae) ATTRIB_NORET;
 
 	/* Interrupts aren't acceptable while we reboot */
 	local_irq_disable();
 
 	control_page = page_address(image->control_code_page);
-	memcpy(control_page, relocate_kernel, PAGE_SIZE);
+	memcpy(control_page, relocate_page, PAGE_SIZE/2);
+	KJUMP_MAGIC(control_page) = 0;
+	if (image->preserve_context) {
+		KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
+		if (kexec_jump_save_cpu(control_page)) {
+			image->start = KJUMP_ENTRY(control_page);
+			return;
+		}
+	}
+
+	relocate_kernel_ptr = control_page +
+		((void *)relocate_kernel - (void *)relocate_page);
 
 	page_list[PA_CONTROL_PAGE] = __pa(control_page);
-	page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+	page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
 	page_list[PA_PGD] = __pa(kexec_pgd);
 	page_list[VA_PGD] = (unsigned long)kexec_pgd;
 #ifdef CONFIG_X86_PAE
@@ -127,6 +152,7 @@ NORET_TYPE void machine_kexec(struct kim
 	page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
 	page_list[PA_PTE_1] = __pa
[PATCH 3/3 -mm] kexec jump -v8 : access memory image of kexec_image
This patch adds a file in the proc file system to access the loaded kexec_image, which may contain the memory image of the kexeced system. This can be used to:

- Communicate between the original kernel and the kexeced kernel by writing to some pages of the original kernel.
- Communicate between the original kernel and the kexeced kernel by reading the memory image of the kexeced kernel, amending the image, and reloading the amended image.
- Accelerate booting of the kexeced kernel. If you have a memory image of the kexeced kernel, no normal boot process is needed to jump to the kexeced kernel: just load the memory image and jump to the point where you left off last time in the kexeced kernel.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 fs/proc/Makefile      |    1
 fs/proc/kimgcore.c    |  277 ++
 fs/proc/proc_misc.c   |    6 +
 include/linux/kexec.h |    7 +
 kernel/kexec.c        |    5
 5 files changed, 291 insertions(+), 5 deletions(-)

--- /dev/null
+++ b/fs/proc/kimgcore.c
@@ -0,0 +1,277 @@
+/*
+ * fs/proc/kimgcore.c - Interface for accessing the loaded
+ * kexec_image, which may contain the memory image of kexeced system.
+ * Heavily borrowed from fs/proc/kcore.c
+ *
+ * Copyright (C) 2007, Intel Corp.
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/user.h>
+#include <linux/elf.h>
+#include <linux/init.h>
+#include <linux/kexec.h>
+#include <linux/io.h>
+#include <linux/highmem.h>
+#include <linux/page-flags.h>
+#include <asm/uaccess.h>
+
+struct proc_dir_entry *proc_root_kimgcore;
+
+static u32 kimgcore_size;
+
+static char *elfcorebuf;
+static size_t elfcorebuf_sz;
+
+static void *buf_page;
+
+static ssize_t kimage_copy_to_user(struct kimage *image, char __user *buf,
+				   unsigned long offset, size_t count)
+{
+	kimage_entry_t *ptr, entry;
+	unsigned long off = 0, offinp, trunk;
+	struct page *page;
+	void *vaddr;
+
+	for_each_kimage_entry(image, ptr, entry) {
+		if (!(entry & IND_SOURCE))
+			continue;
+		if (off + PAGE_SIZE > offset) {
+			offinp = offset - off;
+			if (count > PAGE_SIZE - offinp)
+				trunk = PAGE_SIZE - offinp;
+			else
+				trunk = count;
+			page = pfn_to_page(entry >> PAGE_SHIFT);
+			if (PageHighMem(page)) {
+				vaddr = kmap(page);
+				memcpy(buf_page, vaddr+offinp, trunk);
+				kunmap(page);
+				vaddr = buf_page;
+			} else
+				vaddr = __va(entry & PAGE_MASK) + offinp;
+			if (copy_to_user(buf, vaddr, trunk))
+				return -EFAULT;
+			buf += trunk;
+			offset += trunk;
+			count -= trunk;
+			if (!count)
+				break;
+		}
+		off += PAGE_SIZE;
+	}
+	return count;
+}
+
+static ssize_t kimage_copy_from_user(struct kimage *image,
+				     const char __user *buf,
+				     unsigned long offset,
+				     size_t count)
+{
+	kimage_entry_t *ptr, entry;
+	unsigned long off = 0, offinp, trunk;
+	struct page *page;
+	void *vaddr;
+
+	for_each_kimage_entry(image, ptr, entry) {
+		if (!(entry & IND_SOURCE))
+			continue;
+		if (off + PAGE_SIZE > offset) {
+			offinp = offset - off;
+			if (count > PAGE_SIZE - offinp)
+				trunk = PAGE_SIZE - offinp;
+			else
+				trunk = count;
+			page = pfn_to_page(entry >> PAGE_SHIFT);
+			if (PageHighMem(page))
+				vaddr = buf_page;
+			else
+				vaddr = __va(entry & PAGE_MASK) + offinp;
+			if (copy_from_user(vaddr, buf, trunk))
+				return -EFAULT;
+			if (PageHighMem(page)) {
+				vaddr = kmap(page);
+				memcpy(vaddr+offinp, buf_page, trunk);
+				kunmap(page);
+			}
+			buf += trunk;
+			offset += trunk;
+			count -= trunk;
+			if (!count)
+				break;
+		}
+		off += PAGE_SIZE;
+	}
+	return count;
+}
+
+static ssize_t read_kimgcore(struct file *file, char __user
Re: [PATCH 0/3 -mm] kexec jump -v8
On Fri, 2007-12-21 at 19:35 +1100, Nigel Cunningham wrote: Hi. Huang, Ying wrote: This patchset provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used both by the original kernel and the kexeced kernel. Why the kexeced kernel as well? The memory range used by kexeced kernel is also the usable memory range in original kernel. Maybe should be: backup/restore memory used by both the original kernel and the kexeced kernel. My English is poor. [...] The features of this patchset can be used as follow: - Kernel/system debug through making system snapshot. You can make system snapshot, jump back, do some thing and make another system snapshot. Are you somehow recording all the filesystem changes after the first snapshot? If not, this is pointless (you'll end up with filesystem corruption). This snapshot is not used for restore/resume. It is just used for debugging. You can check the system state with these snapshots. So I think it is useful even without recording filesystem changes. [...] - Cooperative multi-kernel/system. With kexec jump, you can switch between several kernels/systems quickly without boot process except the first time. This appears like swap a whole kernel/system out/in. How is this useful to the end user? I am not sure how useful is this. Maybe I can run a Redhat and a Debian on my machine and switch between them. Best Regards, Huang Ying -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/3 -mm] kexec jump -v8 : add write support to oldmem device
On Dec 21, 2007 6:17 PM, Pavel Machek [EMAIL PROTECTED] wrote: Hi! This patch adds writing support for /dev/oldmem. This can be used to - Communicate between original kernel and kexeced kernel through write to some pages in original kernel. - Restore the memory contents of hibernated system in kexec based hibernation. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- a/arch/x86/kernel/crash_dump_32.c +++ b/arch/x86/kernel/crash_dump_32.c +ssize_t write_oldmem_page(unsigned long pfn, const char *buf, + size_t csize, unsigned long offset, int userbuf) --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file * } return read; } + +/* + * Write memory corresponding to the old kernel. + */ +static ssize_t write_oldmem(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ ... + rc = write_oldmem_page(pfn, buf, csize, offset, 1); I believe this is going to break compilation on non-32bit machines. Yes, I will fix this. Best Regards, Huang Ying -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH -mm] x86_64 EFI runtime service support : Calling convention fix (resend, cc LKML)
In the EFI calling convention, %xmm0 - %xmm5 are specified as scratch registers (UEFI Specification 2.1, section 2.3.4.2). To conform to the EFI specification, this patch saves/restores the %xmm0 - %xmm5 registers before/after invoking an EFI runtime service. At the same time, the stack is aligned to 16 bytes, and the TS bit in CR0 is cleared/restored so that SSE2 can be used in EFI runtime services. This patch is based on 2.6.24-rc4-mm1. It has been tested on Intel platforms with 64-bit UEFI 2.0 firmware. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/efi_stub_64.S | 71 +- 1 file changed, 56 insertions(+), 15 deletions(-) --- a/arch/x86/kernel/efi_stub_64.S +++ b/arch/x86/kernel/efi_stub_64.S @@ -8,61 +8,102 @@ #include <linux/linkage.h> +#define SAVE_XMM \ + mov %rsp, %rax; \ + subq $0x70, %rsp; \ + and $~0xf, %rsp;\ + mov %rax, (%rsp); \ + mov %cr0, %rax; \ + clts; \ + mov %rax, 0x8(%rsp);\ + movaps %xmm0, 0x60(%rsp); \ + movaps %xmm1, 0x50(%rsp); \ + movaps %xmm2, 0x40(%rsp); \ + movaps %xmm3, 0x30(%rsp); \ + movaps %xmm4, 0x20(%rsp); \ + movaps %xmm5, 0x10(%rsp) + +#define RESTORE_XMM\ + movaps 0x60(%rsp), %xmm0; \ + movaps 0x50(%rsp), %xmm1; \ + movaps 0x40(%rsp), %xmm2; \ + movaps 0x30(%rsp), %xmm3; \ + movaps 0x20(%rsp), %xmm4; \ + movaps 0x10(%rsp), %xmm5; \ + mov 0x8(%rsp), %rsi;\ + mov %rsi, %cr0; \ + mov (%rsp), %rsp + ENTRY(efi_call0) - subq $40, %rsp + SAVE_XMM + subq $32, %rsp call *%rdi - addq $40, %rsp + addq $32, %rsp + RESTORE_XMM ret ENTRY(efi_call1) - subq $40, %rsp + SAVE_XMM + subq $32, %rsp mov %rsi, %rcx call *%rdi - addq $40, %rsp + addq $32, %rsp + RESTORE_XMM ret ENTRY(efi_call2) - subq $40, %rsp + SAVE_XMM + subq $32, %rsp mov %rsi, %rcx call *%rdi - addq $40, %rsp + addq $32, %rsp + RESTORE_XMM ret ENTRY(efi_call3) - subq $40, %rsp + SAVE_XMM + subq $32, %rsp mov %rcx, %r8 mov %rsi, %rcx call *%rdi - addq $40, %rsp + addq $32, %rsp + RESTORE_XMM ret ENTRY(efi_call4) - subq $40, %rsp + SAVE_XMM + subq $32, %rsp mov %r8, %r9 mov %rcx, %r8 mov 
%rsi, %rcx call *%rdi - addq $40, %rsp + addq $32, %rsp + RESTORE_XMM ret ENTRY(efi_call5) - subq $40, %rsp + SAVE_XMM + subq $48, %rsp mov %r9, 32(%rsp) mov %r8, %r9 mov %rcx, %r8 mov %rsi, %rcx call *%rdi - addq $40, %rsp + addq $48, %rsp + RESTORE_XMM ret ENTRY(efi_call6) - subq $56, %rsp - mov 56+8(%rsp), %rax + SAVE_XMM + mov (%rsp), %rax + mov 8(%rax), %rax + subq $48, %rsp mov %r9, 32(%rsp) mov %rax, 40(%rsp) mov %r8, %r9 mov %rcx, %r8 mov %rsi, %rcx call *%rdi - addq $56, %rsp + addq $48, %rsp + RESTORE_XMM ret -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
[PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore
This patch adds a file in the proc file system to access the loaded kexec_image, which may contain the memory image of the kexeced system. This can be used by kexec based hibernation to create a file image of the hibernating kernel, so that a kernel booting process is not needed for each hibernation. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- fs/proc/Makefile | 1 fs/proc/kimgcore.c | 204 ++ fs/proc/proc_misc.c | 5 + include/linux/kexec.h | 7 + kernel/kexec.c | 5 - 5 files changed, 217 insertions(+), 5 deletions(-) --- /dev/null +++ b/fs/proc/kimgcore.c @@ -0,0 +1,204 @@ +/* + * fs/proc/kimgcore.c - Interface for accessing the loaded + * kexec_image, which may contain the memory image of the kexeced system. + * Heavily borrowed from fs/proc/kcore.c + * + * Copyright (C) 2007, Intel Corp. + * Huang Ying [EMAIL PROTECTED] + * + * This file is released under the GPLv2 + */ + +#include <linux/mm.h> +#include <linux/proc_fs.h> +#include <linux/user.h> +#include <linux/elf.h> +#include <linux/init.h> +#include <linux/kexec.h> +#include <linux/io.h> +#include <linux/highmem.h> +#include <asm/uaccess.h> + +struct proc_dir_entry *proc_root_kimgcore; + +static u32 kimgcore_size; + +static char *elfcorebuf; +static size_t elfcorebuf_sz; + +static void *buf_page; + +static ssize_t kimage_copy_to_user(struct kimage *image, char __user *buf, + unsigned long offset, size_t count) +{ + kimage_entry_t *ptr, entry; + unsigned long off = 0, offinp, trunk; + struct page *page; + void *vaddr; + + for_each_kimage_entry(image, ptr, entry) { + if (!(entry & IND_SOURCE)) + continue; + if (off + PAGE_SIZE > offset) { + offinp = offset - off; + if (count > PAGE_SIZE - offinp) + trunk = PAGE_SIZE - offinp; + else + trunk = count; + page = pfn_to_page(entry >> PAGE_SHIFT); + if (PageHighMem(page)) { + vaddr = kmap(page); + memcpy(buf_page, vaddr+offinp, trunk); + kunmap(page); + vaddr = buf_page; + } else + vaddr = __va(entry & PAGE_MASK) + offinp; + if (copy_to_user(buf, vaddr, trunk)) + return -EFAULT; + buf += trunk; + offset += 
trunk; + count -= trunk; + if (!count) + break; + } + off += PAGE_SIZE; + } + return count; +} + +static ssize_t read_kimgcore(struct file *file, char __user *buffer, +size_t buflen, loff_t *fpos) +{ + size_t acc = 0; + size_t tsz; + ssize_t ssz; + + if (buflen == 0 || *fpos >= kimgcore_size) + return 0; + + /* trim buflen to not go beyond EOF */ + if (buflen > kimgcore_size - *fpos) + buflen = kimgcore_size - *fpos; + /* Read ELF core header */ + if (*fpos < elfcorebuf_sz) { + tsz = elfcorebuf_sz - *fpos; + if (buflen < tsz) + tsz = buflen; + if (copy_to_user(buffer, elfcorebuf + *fpos, tsz)) + return -EFAULT; + buflen -= tsz; + *fpos += tsz; + buffer += tsz; + acc += tsz; + + /* leave now if filled buffer already */ + if (buflen == 0) + return acc; + } + + ssz = kimage_copy_to_user(kexec_image, buffer, + *fpos - elfcorebuf_sz, buflen); + if (ssz < 0) + return ssz; + + *fpos += (buflen - ssz); + acc += (buflen - ssz); + + return acc; +} + +static int init_kimgcore(void) +{ + Elf64_Ehdr *ehdr; + Elf64_Phdr *phdr; + struct kexec_segment *seg; + Elf64_Off off; + unsigned long i; + + elfcorebuf_sz = sizeof(Elf64_Ehdr) + + kexec_image->nr_segments * sizeof(Elf64_Phdr); + elfcorebuf = kzalloc(elfcorebuf_sz, GFP_KERNEL); + if (!elfcorebuf) + return -ENOMEM; + ehdr = (Elf64_Ehdr *)elfcorebuf; + memcpy(ehdr->e_ident, ELFMAG, SELFMAG); + ehdr->e_ident[EI_CLASS] = ELFCLASS64; + ehdr->e_ident[EI_DATA] = ELFDATA2LSB; + ehdr->e_ident[EI_VERSION] = EV_CURRENT; + ehdr->e_ident[EI_OSABI] = ELFOSABI_NONE; + memset(ehdr->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD); + ehdr->e_type = ET_CORE; + ehdr->e_machine = ELF_ARCH; + ehdr->e_version = EV_CURRENT; + ehdr->e_entry = kexec_image->start; + ehdr->e_phoff = sizeof
[PATCH 2/4 -mm] kexec based hibernation -v7 : kexec restore
This patch adds writing support for /dev/oldmem. This is used to restore the memory contents of a hibernated system. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/crash_dump_32.c | 27 +++ drivers/char/mem.c | 32 include/linux/crash_dump.h | 2 ++ 3 files changed, 61 insertions(+) --- a/arch/x86/kernel/crash_dump_32.c +++ b/arch/x86/kernel/crash_dump_32.c @@ -59,6 +59,33 @@ ssize_t copy_oldmem_page(unsigned long p return csize; } +ssize_t write_oldmem_page(unsigned long pfn, const char *buf, + size_t csize, unsigned long offset, int userbuf) +{ + void *vaddr; + + if (!csize) + return 0; + + if (!userbuf) { + vaddr = kmap_atomic_pfn(pfn, KM_PTE0); + memcpy(vaddr + offset, buf, csize); + } else { + if (!kdump_buf_page) { + printk(KERN_WARNING "Kdump: Kdump buffer page not allocated\n"); + return -EFAULT; + } + if (copy_from_user(kdump_buf_page, buf, csize)) + return -EFAULT; + vaddr = kmap_atomic_pfn(pfn, KM_PTE0); + memcpy(vaddr + offset, kdump_buf_page, csize); + } + kunmap_atomic(vaddr, KM_PTE0); + + return csize; +} + static int __init kdump_buf_page_init(void) { int ret = 0; --- a/include/linux/crash_dump.h +++ b/include/linux/crash_dump.h @@ -11,6 +11,8 @@ extern unsigned long long elfcorehdr_addr; extern ssize_t copy_oldmem_page(unsigned long, char *, size_t, unsigned long, int); +extern ssize_t write_oldmem_page(unsigned long, const char *, size_t, +unsigned long, int); extern const struct file_operations proc_vmcore_operations; extern struct proc_dir_entry *proc_vmcore; --- a/drivers/char/mem.c +++ b/drivers/char/mem.c @@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file * } return read; } + +/* + * Write memory corresponding to the old kernel. 
+ */ +static ssize_t write_oldmem(struct file *file, const char __user *buf, + size_t count, loff_t *ppos) +{ + unsigned long pfn, offset; + size_t write = 0, csize; + int rc = 0; + + while (count) { + pfn = *ppos / PAGE_SIZE; + if (pfn > saved_max_pfn) + return write; + + offset = (unsigned long)(*ppos % PAGE_SIZE); + if (count > PAGE_SIZE - offset) + csize = PAGE_SIZE - offset; + else + csize = count; + rc = write_oldmem_page(pfn, buf, csize, offset, 1); + if (rc < 0) + return rc; + buf += csize; + *ppos += csize; + write += csize; + count -= csize; + } + return write; +} #endif extern long vread(char *buf, char *addr, unsigned long count); @@ -783,6 +814,7 @@ static const struct file_operations full #ifdef CONFIG_CRASH_DUMP static const struct file_operations oldmem_fops = { .read = read_oldmem, + .write = write_oldmem, .open = open_oldmem, }; #endif
[PATCH 0/4 -mm] kexec based hibernation -v7
. Load the memory image of hibernating kernel with following shell command line: kexec -l --args-none --flags=0x3e kimgcore 7. Start the real hibernating process with following shell command line: kexec -e -c 0x6b630001 The hibernating kernel will write the memory image of hibernated kernel and go to ACPI S4 state automatically. 8. Boot kernel (kernel C) compiled for hibernating/resuming usage in memory range of kernel B. The go_to_resume should be specified in kernel command line to trigger the resuming process automatically. For example, the following kernel command line parameters can be used: memmap=exactmap [EMAIL PROTECTED] [EMAIL PROTECTED] mem=16M go_to_resume khdev=3:7 The initramfs should be used too. In GRUB, this can be specified with following grub command: initrd /boot/rootfs.gz The resuming kernel will restore the memory image of hibernated kernel and jump back to hibernated kernel automatically. Known issues: - The suspend/resume callback of device drivers are used to put devices into quiescent state. This will unnecessarily (possibly harmfully) put devices into low power state. This is intended to be solved by separating device quiesce/unquiesce callback from the device suspend/resume callback. - The memory image of hibernated kernel must be saved in a separate partition not used by hibernated kernel. This is planned to be solved through making hibernating/resuming kernel write the memory image to a file in partition used by hibernated kernel through block list instead. - The hibernating/resuming code are duplicated with current u/swsusp code. They will be merged when kexec based hibernation goes more stable. - The setup of hibernate/resume is fairly complex. I will continue working on simplifying. TODO: - Write the memory image to a file through block list instead of ordinary file system operating. - Merge duplicated code between kexec based hibernation and u/swsusp. - Simplify hibernate/resume setup. - Resume from hibernation with bootloader. 
ChangeLog: v7: - Add an interface to dump the loaded kexec_image, which may contains the memory image of kexeced system. This is used to accelerate kexec based hibernation. - Refactor kexec jump to be a command driven programming model. - Adjust ACPI support to mimic the ACPI support of u/swsusp. - Use kexec_lock to do synchronization. v6: - Add ACPI support. - Refactor kexec jump to be a general facility to call real mode code. v5: - A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel image is used for jumping back. The reboot command for jumping back is removed. This interface is more stable (proposed by Eric Biederman). - NX bit handling support for kexec is added. - Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute from machine_kexec. - Passing jump back entry to kexeced kernel via kernel command line (parsed by user space tool via /proc/cmdline instead of kernel). Original corresponding boot parameter and sysfs code is removed. v4: - Two reboot command are merged back to one because the underlying implementation is same. - Jumping without reserving memory is implemented. As a side effect, two direction jumping is implemented. - A jump back protocol is defined and documented. The original kernel and kexeced kernel are more independent from each other. - The CPU state save/restore code are merged into relocate_kernel.S. v3: - The reboot command LINUX_REBOOT_CMD_KJUMP is split into to two reboot command to reflect the different function. - Document is added for added kernel parameters. - /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for memory image restoring. - Console restoring after jumping back is implemented. - Writing support is added for /dev/oldmem, to restore memory contents of hibernated system. v2: - The kexec jump implementation is put into the kexec/kdump framework instead of software suspend framework. The device and CPU state save/restore code of software suspend is called when needed. 
- The same code path is used both for kexecing a new kernel and for jumping back to the original kernel. Best Regards, Huang Ying
[PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
This patch implements the functionality of jumping between the kexeced kernel and the original kernel. To support jumping between two kernels, before jumping to (executing) the new kernel and before jumping back to the original kernel, the devices are put into a quiescent state, and the state of the devices and CPU is saved. After jumping back from the kexeced kernel and after jumping to the new kernel, the state of the devices and CPU is restored accordingly. The device/CPU state save/restore code of software suspend is called to implement the corresponding functions. To support jumping without reserving memory, one shadow backup page (source page) is allocated for each page used by the new (kexeced) kernel (destination page). During kexec_load, the image of the new kernel is loaded into the source pages, and before executing, the destination pages and the source pages are swapped, so the contents of the destination pages are backed up. Before jumping to the new (kexeced) kernel and after jumping back to the original kernel, the destination pages and the source pages are swapped too. A jump back protocol for kexec is defined and documented. It is an extension to the ordinary function calling protocol, so the facility provided by this patch can also be used to call an ordinary C function in real mode. A set of flags for sys_kexec_load is added to control which states are saved/restored before/after the real mode code executes. For example, you can specify that the device state and FPU state be saved/restored before/after the real mode code executes. The state (excluding CPU state) save/restore code can be overridden based on the command parameter of kexec jump, because more states need to be saved/restored by hibernating/resuming. 
Signed-off-by: Huang Ying [EMAIL PROTECTED] --- Documentation/i386/jump_back_protocol.txt | 103 ++ arch/powerpc/kernel/machine_kexec.c | 2 arch/ppc/kernel/machine_kexec.c | 2 arch/sh/kernel/machine_kexec.c | 2 arch/x86/kernel/machine_kexec_32.c | 88 +--- arch/x86/kernel/machine_kexec_64.c | 2 arch/x86/kernel/relocate_kernel_32.S | 214 +++--- include/asm-x86/kexec_32.h | 39 - include/linux/kexec.h | 40 + kernel/kexec.c | 188 ++ kernel/power/Kconfig | 2 kernel/sys.c | 35 +++- 12 files changed, 648 insertions(+), 69 deletions(-) --- a/arch/x86/kernel/machine_kexec_32.c +++ b/arch/x86/kernel/machine_kexec_32.c @@ -20,6 +20,7 @@ #include <asm/cpufeature.h> #include <asm/desc.h> #include <asm/system.h> +#include <asm/cacheflush.h> #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE))) static u32 kexec_pgd[1024] PAGE_ALIGNED; @@ -83,10 +84,14 @@ static void load_segments(void) * reboot code buffer to allow us to avoid allocations * later. * - * Currently nothing. + * Turn off NX bit for control page. */ int machine_kexec_prepare(struct kimage *image) { + if (nx_enabled) { + change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC); + global_flush_tlb(); + } return 0; } @@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage */ void machine_kexec_cleanup(struct kimage *image) { + if (nx_enabled) { + change_page_attr(image->control_code_page, 1, PAGE_KERNEL); + global_flush_tlb(); + } +} + +void machine_kexec(struct kimage *image) +{ + machine_kexec_call(image, NULL, 0); } /* * Do not allocate memory (or fail in any way) in machine_kexec(). * We are past the point of no return, committed to rebooting now. 
*/ -NORET_TYPE void machine_kexec(struct kimage *image) +int machine_kexec_vcall(struct kimage *image, unsigned long *ret, +unsigned int argc, va_list args) { unsigned long page_list[PAGES_NR]; void *control_page; + asmlinkage NORET_TYPE void + (*relocate_kernel_ptr)(unsigned long indirection_page, + unsigned long control_page, + unsigned long start_address, + unsigned int has_pae) ATTRIB_NORET; /* Interrupts aren't acceptable while we reboot */ local_irq_disable(); control_page = page_address(image->control_code_page); - memcpy(control_page, relocate_kernel, PAGE_SIZE); + memcpy(control_page, relocate_page, PAGE_SIZE/2); + KCALL_MAGIC(control_page) = 0; + if (image->preserve_cpu) { + unsigned int i; + KCALL_MAGIC(control_page) = KCALL_MAGIC_NUMBER; + KCALL_ARGC(control_page) = argc; + for (i = 0; i < argc; i++) + KCALL_ARGS(control_page)[i] = \ + va_arg(args, unsigned long
[PATCH 3/4 -mm] kexec based hibernation -v7 : kexec hibernate/resume
This patch implements kexec based hibernate/resume. This is based on the facility provided by kexec_jump. The state save/restore code of the ordinary kexec_jump is overridden by hibernate/resume-specific code. The ACPI methods are called in the specified environment to conform to the ACPI specification. A new reboot command is added to go to the ACPI S4 state from user space. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- include/linux/kexec.h | 4 include/linux/reboot.h | 1 include/linux/suspend.h | 1 kernel/power/disk.c | 244 +++- kernel/sys.c | 5 5 files changed, 251 insertions(+), 4 deletions(-) --- a/kernel/power/disk.c +++ b/kernel/power/disk.c @@ -21,6 +21,7 @@ #include <linux/console.h> #include <linux/cpu.h> #include <linux/freezer.h> +#include <linux/kexec.h> #include "power.h" @@ -365,13 +366,13 @@ int hibernation_platform_enter(void) } /** - * power_down - Shut the machine down for hibernation. + * hibernate_power_down - Shut the machine down for hibernation. * * Use the platform driver, if configured so; otherwise try * to power off or reboot. 
*/ -static void power_down(void) +void hibernate_power_down(void) { switch (hibernation_mode) { case HIBERNATION_TEST: @@ -461,7 +462,7 @@ int hibernate(void) error = swsusp_write(flags); swsusp_free(); if (!error) - power_down(); + hibernate_power_down(); } else { pr_debug(PM: Image restored successfully.\n); swsusp_free(); @@ -478,6 +479,243 @@ int hibernate(void) return error; } +#ifdef CONFIG_KEXEC +static int kexec_snapshot(struct notifier_block *nb, + unsigned long cmd, void *arg) +{ + int error; + int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM); + + if (cmd != KJUMP_CMD_HIBERNATE_WRITE_IMAGE) + return NOTIFY_DONE; + + pm_prepare_console(); + + error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE); + if (error) + goto Exit; + + error = freeze_processes(); + if (error) { + error = -EBUSY; + goto Exit; + } + + if (hibernation_test(TEST_FREEZER) || + hibernation_testmode(HIBERNATION_TESTPROC)) { + error = -EAGAIN; + goto Resume_process; + } + + error = platform_start(platform_mode); + if (error) + goto Resume_process; + + suspend_console(); + error = device_suspend(PMSG_FREEZE); + if (error) + goto Resume_console; + + if (hibernation_test(TEST_DEVICES)) { + error = -EAGAIN; + goto Resume_devices; + } + + error = platform_pre_snapshot(platform_mode); + if (error) + goto Resume_devices; + + if (hibernation_test(TEST_PLATFORM)) { + error = -EAGAIN; + goto Resume_devices; + } + + error = disable_nonboot_cpus(); + if (error) + goto Resume_devices; + + if (hibernation_test(TEST_CPUS) || + hibernation_testmode(HIBERNATION_TEST)) { + error = -EAGAIN; + goto Enable_cpus; + } + + local_irq_disable(); + /* At this point, device_suspend() has been called, but *not* +* device_power_down(). We *must* device_power_down() now. +* Otherwise, drivers for some devices (e.g. interrupt +* controllers) become desynchronized with the actual state of +* the hardware at resume time, and evil weirdness ensues. 
+*/ + error = device_power_down(PMSG_FREEZE); + if (error) + goto Enable_irqs; + + if (hibernation_test(TEST_CORE)) { + error = -EAGAIN; + goto Power_up; + } + + return NOTIFY_STOP; + + Power_up: + device_power_up(); + Enable_irqs: + local_irq_enable(); + Enable_cpus: + enable_nonboot_cpus(); + Resume_devices: + platform_finish(platform_mode); + device_resume(); + Resume_console: + resume_console(); + Resume_process: + thaw_processes(); + Exit: + pm_notifier_call_chain(PM_POST_HIBERNATION); + pm_restore_console(); + return notifier_from_errno(error); +} + +static int kexec_prepare_write_image(struct notifier_block *nb, +unsigned long cmd, void *arg) +{ + int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM); + + if (cmd != KJUMP_CMD_HIBERNATE_WRITE_IMAGE) + return NOTIFY_DONE; + + device_power_up(); + local_irq_enable(); + enable_nonboot_cpus(); + platform_finish(platform_mode); + device_resume(); + resume_console(); + thaw_processes(); + pm_restore_console(); + return
Re: [PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore
On Dec 7, 2007 8:33 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Friday, 7 of December 2007, Huang, Ying wrote: This patch adds a file in the proc file system to access the loaded kexec_image, which may contain the memory image of the kexeced system. This can be used by kexec based hibernation to create a file image of the hibernating kernel, so that a kernel booting process is not needed for each hibernation. Hm, I'm not sure what you mean. Can you explain a bit, please? The normal kexec based hibernation procedure is as follows: 1. kexec_load the kernel image and initramfs 2. jump to hibernating kernel 3. the normal boot process of kexeced kernel 4. jump back to hibernated kernel 5. execute ACPI methods 6. jump to hibernating kernel 7. write memory image of hibernated kernel 8. go to ACPI S4 state With kimgcore: A. Prepare a memory image of the hibernation kernel: A.1 kexec_load the kernel image and initramfs A.2 jump to hibernating kernel A.3 the normal boot process of kexeced kernel A.4 jump back to hibernated kernel A.5 save the memory image of hibernating kernel via kimgcore The normal hibernate process is then as follows: 1. kexec_load the kimgcore of the hibernating kernel 2. jump to the hibernating kernel 3. execute ACPI methods 4. jump to hibernating kernel 5. write memory image of hibernated kernel 6. go to ACPI S4 state So the boot process of the hibernating kernel is needed only once, unless the hardware configuration is changed. Best Regards, Huang Ying
Re: [PATCH 3/4 -mm] kexec based hibernation -v7 : kexec hibernate/resume
On Dec 7, 2007 8:52 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote: On Friday, 7 of December 2007, Huang, Ying wrote: This patch implements kexec based hibernate/resume. This is based on the facility provided by kexec_jump. The state save/restore code of the ordinary kexec_jump is overridden by hibernate/resume-specific code. Can you explain in more details how this works? Two blocking notifier chains named kjump_chain_pre and kjump_chain_post are defined; the basic procedure of kexec jump is as follows: call functions in kjump_chain_pre jump to peer kernel call functions in kjump_chain_post The command is the first parameter of the functions in the chain. If the command is handled by a function, the function will execute and stop the chain (return NOTIFY_STOP); otherwise it will do nothing (return NOTIFY_DONE). If no function has interest in the command, the default behavior will be executed (kexec_vcall_pre, kexec_vcall_post). So for each command the procedure is as follows: KJUMP_CMD_HIBERNATE_WRITE_IMAGE: [chain] kexec_snapshot jump to kexeced kernel [chain] kexec_prepare_write_image /* in kexeced kernel */ KJUMP_HIBERNATE_RESUME: [chain] kexec_prepare_resume /* in kexeced kernel */ jump to kexec kernel [chain] kexec_resume The ACPI methods are called in the specified environment to conform to the ACPI specification. A new reboot command is added to go to the ACPI S4 state from user space. Well, I still don't like the amount of duplicated code introduced by this patch. Yes, there is too much duplicated code. It should be merged. But I want to delay the merging until the kexec based hibernation code becomes more stable. Also, IMO it should be using the mutual exclusion mechanisms used by the existing hibernation code, ie. pm_mutex and the snapshot_device_available atomic variable. Now the kexec_lock is used as a mutex between kexec related operations. 
It seems reasonable to use pm_mutex and maybe snapshot_device_available to eliminate potential conflicts between kexec based hibernation and u/swsusp. Best Regards, Huang Ying
Re: [PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume
On Mon, 2007-11-19 at 19:22 +0100, Rafael J. Wysocki wrote: +#ifdef CONFIG_KEXEC +static void kexec_hibernate_power_down(void) +{ + switch (hibernation_mode) { + case HIBERNATION_TEST: + case HIBERNATION_TESTPROC: + break; + case HIBERNATION_REBOOT: + machine_restart(NULL); + break; + case HIBERNATION_PLATFORM: + if (!hibernation_ops) + break; + hibernation_ops->enter(); hibernation_platform_enter() should be used here (as of the current mainline). The power_down will be called with interrupts disabled, devices suspended, and non-boot CPUs disabled. But the latest hibernation_platform_enter calls device_suspend, disable_nonboot_cpus, etc. So, I use hibernation_ops->enter() directly instead of hibernation_platform_enter(). + /* We should never get here */ + while (1); + break; + case HIBERNATION_SHUTDOWN: + machine_power_off(); + break; + } + machine_halt(); + /* +* Valid image is on the disk, if we continue we risk serious data +* corruption after resume. +*/ + printk(KERN_CRIT "Please power me down manually\n"); + while (1); +} Hm, what's the difference between the above function and power_down(), actually? Same as above. + +int kexec_hibernate(struct kimage *image) +{ + int error; + int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM); + unsigned long cmd_ret; + + mutex_lock(&pm_mutex); + + pm_prepare_console(); + suspend_console(); + + error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE); + if (error) + goto Resume_console; + + error = platform_start(platform_mode); + if (error) + goto Resume_console; + + error = device_suspend(PMSG_FREEZE); + if (error) + goto Resume_console; + + error = platform_pre_snapshot(platform_mode); + if (error) + goto Resume_devices; + + error = disable_nonboot_cpus(); + if (error) + goto Resume_devices; I wonder if it's viable to merge the above with hibernate() and hibernation_snapshot() somehow, to avoid code duplication? Yes. Most of the code is duplicated. 
But there is one advantage to not merging them: power_down can be called with IRQs disabled, which makes it possible to eliminate the freezer. I think it is possible to merge the two implementations. I will try to do it. Apart from the above, there's some new debug code to be added to disk.c in 2.6.25. It's in the ACPI test tree right now and you can get it as individual patches from: http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.24-rc2/patches/ (patches 10-12). Please base your changes on top of that. OK, I will do it. Best Regards, Huang Ying
Re: [PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume
On Tue, 2007-11-20 at 03:24 +0100, Rafael J. Wysocki wrote: On Tuesday, 20 of November 2007, Huang, Ying wrote: On Mon, 2007-11-19 at 19:22 +0100, Rafael J. Wysocki wrote:

	+#ifdef CONFIG_KEXEC
	+static void kexec_hibernate_power_down(void)
	+{
	+	switch (hibernation_mode) {
	+	case HIBERNATION_TEST:
	+	case HIBERNATION_TESTPROC:
	+		break;
	+	case HIBERNATION_REBOOT:
	+		machine_restart(NULL);
	+		break;
	+	case HIBERNATION_PLATFORM:
	+		if (!hibernation_ops)
	+			break;
	+		hibernation_ops->enter();

hibernation_platform_enter() should be used here (as of the current mainline).

The power_down will be called with interrupts disabled, devices suspended and non-boot CPUs disabled. But the latest hibernation_platform_enter() calls device_suspend(), disable_nonboot_cpus() etc. itself. So I use hibernation_ops->enter() directly instead of hibernation_platform_enter().

Hm, you need to call device_power_down(PMSG_SUSPEND) before hibernation_ops->enter(). Also, all of the ACPI global calls need to be carried out before that and the devices should be suspended rather than shut down in that case. That's why hibernation_platform_enter() has been introduced, BTW.

Situation is a little different between u/swsusp and khibernation.
u/swsusp:

	platform_start();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_pre_snapshot();
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_FREEZE);
	/* create snapshot */
	device_power_up();
	local_irq_enable();
	enable_nonboot_cpus();
	platform_finish();
	device_resume();
	resume_console();
	/* write the image out */
	hibernation_ops->start();
	suspend_console();
	device_suspend(PMSG_SUSPEND);
	hibernation_ops->prepare();
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_SUSPEND);
	hibernation_ops->enter();

khibernation:

	suspend_console();
	platform_start();
	device_suspend(PMSG_FREEZE);
	platform_pre_snapshot();
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_FREEZE);
	/* jump to kexeced (hibernating) kernel */

	/* in kexeced kernel */
	device_power_up();
	local_irq_enable();
	enable_nonboot_cpus();
	device_resume();
	resume_console();
	/* write the image */
	suspend_console();
	device_suspend(PMSG_FREEZE);
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_FREEZE);
	/* jump to original (hibernated) kernel */

	/* in original kernel */
	hibernation_ops->enter();

The differences are:

- In u/swsusp, the ACPI methods are executed twice: before writing out the image and after writing out the image.
- After writing out the image, PMSG_SUSPEND is used instead of PMSG_FREEZE.

Some questions:

- What is the difference between PMSG_SUSPEND and PMSG_FREEZE?
- Should the ACPI methods be executed once or twice, according to the ACPI specification?

Best Regards,
Huang Ying
Re: [PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume
On Wed, 2007-11-21 at 01:00 +0100, Rafael J. Wysocki wrote: On Tuesday, 20 of November 2007, Huang, Ying wrote: On Tue, 2007-11-20 at 03:24 +0100, Rafael J. Wysocki wrote: On Tuesday, 20 of November 2007, Huang, Ying wrote: On Mon, 2007-11-19 at 19:22 +0100, Rafael J. Wysocki wrote:

	+#ifdef CONFIG_KEXEC
	+static void kexec_hibernate_power_down(void)
	+{
	+	switch (hibernation_mode) {
	+	case HIBERNATION_TEST:
	+	case HIBERNATION_TESTPROC:
	+		break;
	+	case HIBERNATION_REBOOT:
	+		machine_restart(NULL);
	+		break;
	+	case HIBERNATION_PLATFORM:
	+		if (!hibernation_ops)
	+			break;
	+		hibernation_ops->enter();

hibernation_platform_enter() should be used here (as of the current mainline).

The power_down will be called with interrupts disabled, devices suspended and non-boot CPUs disabled. But the latest hibernation_platform_enter() calls device_suspend(), disable_nonboot_cpus() etc. itself. So I use hibernation_ops->enter() directly instead of hibernation_platform_enter().

Hm, you need to call device_power_down(PMSG_SUSPEND) before hibernation_ops->enter(). Also, all of the ACPI global calls need to be carried out before that and the devices should be suspended rather than shut down in that case. That's why hibernation_platform_enter() has been introduced, BTW.

Situation is a little different between u/swsusp and khibernation.
u/swsusp:

	platform_start();
	suspend_console();
	device_suspend(PMSG_FREEZE);
	platform_pre_snapshot();
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_FREEZE);
	/* create snapshot */
	device_power_up();
	local_irq_enable();
	enable_nonboot_cpus();
	platform_finish();
	device_resume();
	resume_console();
	/* write the image out */
	hibernation_ops->start();
	suspend_console();
	device_suspend(PMSG_SUSPEND);
	hibernation_ops->prepare();
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_SUSPEND);
	hibernation_ops->enter();

khibernation:

	suspend_console();
	platform_start();
	device_suspend(PMSG_FREEZE);
	platform_pre_snapshot();
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_FREEZE);
	/* jump to kexeced (hibernating) kernel */

	/* in kexeced kernel */
	device_power_up();
	local_irq_enable();
	enable_nonboot_cpus();

You should call platform_finish() here, or device_resume() will not work appropriately on some systems. However, after platform_finish() has been executed, the ACPI firmware will assume that the hibernation has been canceled, so you need to tell it that you'd like to go into the low power state after all.

	device_resume();
	resume_console();
	/* write the image */

For this reason, you have to call hibernation_ops->start() once again and the other functions like in the swsusp case, in that order.

	suspend_console();
	device_suspend(PMSG_FREEZE);
	disable_nonboot_cpus();
	local_irq_disable();
	device_power_down(PMSG_FREEZE);
	/* jump to original (hibernated) kernel */

This looks too fragile to my eyes. Why don't you call hibernation_ops->enter() directly from the kexeced kernel?

I don't know whether there is any ACPI global state inside the Linux kernel. So I restricted all ACPI method calls to the original kernel. If ACPI global state in the Linux kernel is not an issue, I can call hibernation_ops->enter() directly in the kexeced kernel.
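The constraint implicit in the sequences above (every suspend-side step taken before a jump must be undone in reverse, LIFO order on the other side) can be checked with a small userland sketch. This is only an illustration of the ordering rule; the step labels and helper names are hypothetical, not kernel API:

```c
#include <string.h>

#define MAX_STEPS 8

/* A recorded sequence of bracket steps, e.g. "devices", "irq", "power". */
struct seq {
	const char *step[MAX_STEPS];
	int n;
};

/* Returns 1 iff the undo sequence is exactly the setup sequence reversed,
 * i.e. device_power_up() undoes device_power_down() last-in-first-out. */
static int undo_is_reversed(const struct seq *setup, const struct seq *undo)
{
	if (setup->n != undo->n)
		return 0;
	for (int i = 0; i < setup->n; i++)
		if (strcmp(setup->step[i], undo->step[undo->n - 1 - i]) != 0)
			return 0;
	return 1;
}
```

For the khibernation path above, the setup list is console/devices/cpus/irq/power and the kexeced kernel's undo list must be power/irq/cpus/devices/console.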
Best Regards,
Huang Ying
[PATCH -mm 3/4 -v6] x86_64 EFI runtime service support: document for EFI runtime services
This patch adds documentation for the EFI x86_64 runtime services support.

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 Documentation/x86_64/boot-options.txt |    9 -
 Documentation/x86_64/uefi.txt         |    9 +
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -110,7 +110,7 @@ Idle loop

 Rebooting

-   reboot=b[ios] | t[riple] | k[bd] | a[cpi] [, [w]arm | [c]old]
+   reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] [, [w]arm | [c]old]
    bios   Use the CPU reboot vector for warm reset
    warm   Don't set the cold reboot flag
    cold   Set the cold reboot flag
@@ -119,6 +119,9 @@ Rebooting
    acpi   Use the ACPI RESET_REG in the FADT. If ACPI is not configured or
           the ACPI reset does not work, the reboot path attempts the reset
           using the keyboard controller.
+   efi    Use efi reset_system runtime service. If EFI is not configured or the
+          EFI reset does not work, the reboot path attempts the reset using
+          the keyboard controller.

    Using warm reset will be much faster especially on big memory systems
    because the BIOS will not go through the memory check.
@@ -303,4 +306,8 @@ Debugging

    newfallback: use new unwinder but fall back to old if it gets stuck (default)

+EFI
+
+   noefi	Disable EFI support
+
 Miscellaneous

--- a/Documentation/x86_64/uefi.txt
+++ b/Documentation/x86_64/uefi.txt
@@ -19,6 +19,10 @@ Mechanics:
 - Build the kernel with the following configuration.
	CONFIG_FB_EFI=y
	CONFIG_FRAMEBUFFER_CONSOLE=y
+  If EFI runtime services are expected, the following configuration should
+  be selected.
+	CONFIG_EFI=y
+	CONFIG_EFI_VARS=y or m	# optional
 - Create a VFAT partition on the disk
 - Copy the following to the VFAT partition:
	elilo bootloader with x86_64 support, elilo configuration file,
	kernel image built in first step and corresponding initrd. Instructions
	on building elilo and its dependencies can be found in the elilo
	sourceforge project.
 - Boot to EFI shell and invoke elilo choosing the kernel image built
   in first step.
+- If some or all EFI runtime services don't work, you can try the following
+  kernel command line parameters to turn off some or all EFI runtime
+  services.
+	noefi		turn off all EFI runtime services
+	reboot_type=k	turn off the EFI reboot runtime service
[PATCH -mm 0/4 -v6] x86_64 EFI runtime service support
The following patchset adds EFI/UEFI (Unified Extensible Firmware Interface) runtime services support to the x86_64 architecture. The patchset has been tested against the 2.6.24-rc3-mm1 kernel on Intel platforms with 64-bit EFI1.10 and UEFI2.0 firmware. Because the duplicated code between efi_32.c and efi_64.c is removed, the patchset has also been tested on an Intel platform with 32-bit EFI firmware.

v6:
- Fix a bug about runtime service memory mapping.
- Rebase on 2.6.24-rc3-mm1.

v5:
- Remove the duplicated code between efi_32.c and efi_64.c.
- Rename lin2winx to efi_callx.
- Make the EFI time runtime service default to off.
- Use different bootloader signatures for EFI32 and EFI64, so that the kernel can know whether the underlying EFI firmware is 64-bit or 32-bit.

v4:
- EFI boot parameters are extended for 64-bit EFI in a 32-bit EFI compatible way.
- Add EFI runtime services document.

v3:
- Remove E820_RUNTIME_CODE; the EFI memory map is used to deal with the EFI runtime code area.
- The method used to make the EFI runtime code area executable is changed:
  a. Before page allocation is usable, the PMD of the direct mapping is changed temporarily before and after each EFI call.
  b. After page allocation is usable, change_page_attr_addr is used to change the corresponding page attribute.
- Use fixmap to map the EFI memory mapped IO memory area to make kexec workable.
- Add a kernel command line option "noefi" to make it possible to turn off EFI runtime services support.
- Function pointers are used for the EFI time runtime service.
- The EFI reboot runtime service is embedded into the framework of reboot_type.
- A kernel command line option "noefi_time" is added to make it possible to fall back to the CMOS based implementation.

v2:
- The EFI call wrapper is re-implemented in assembler.
Best Regards,
Huang Ying
[PATCH -mm 1/4 -v6] x86_64 EFI runtime service support: EFI basic runtime service support
This patch adds basic runtime services support for EFI x86_64 system. The main file of the patch is the addition of efi_64.c for x86_64. This file is modeled after the EFI IA32 avatar. EFI runtime services initialization are implemented in efi_64.c. Some x86_64 specifics are worth noting here. On x86_64, parameters passed to EFI firmware services need to follow the EFI calling convention. For this purpose, a set of functions named efi_callx (x is the number of parameters) are implemented. EFI function calls are wrapped before calling the firmware service. The duplicated code between efi_32.c and efi_64.c is placed in efi.c to remove them from efi_32.c. Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED] Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/Kconfig |2 arch/x86/kernel/Makefile_64 |1 arch/x86/kernel/efi.c | 484 ++ arch/x86/kernel/efi_64.c | 171 ++ arch/x86/kernel/efi_stub_64.S | 68 + arch/x86/kernel/setup_64.c| 17 + include/asm-x86/bootparam.h |5 include/asm-x86/efi.h | 70 ++ include/asm-x86/fixmap_64.h |3 9 files changed, 817 insertions(+), 4 deletions(-) --- /dev/null +++ b/arch/x86/kernel/efi_64.c @@ -0,0 +1,171 @@ +/* + * x86_64 specific EFI support functions + * Based on Extensible Firmware Interface Specification version 1.0 + * + * Copyright (C) 2005-2008 Intel Co. + * Fenghua Yu [EMAIL PROTECTED] + * Bibo Mao [EMAIL PROTECTED] + * Chandramouli Narayanan [EMAIL PROTECTED] + * Huang Ying [EMAIL PROTECTED] + * + * Code to convert EFI to E820 map has been implemented in elilo bootloader + * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table + * is setup appropriately for EFI runtime code. + * - mouli 06/14/2007. 
+ * + */ + +#include linux/kernel.h +#include linux/init.h +#include linux/mm.h +#include linux/types.h +#include linux/spinlock.h +#include linux/bootmem.h +#include linux/ioport.h +#include linux/module.h +#include linux/efi.h +#include linux/uaccess.h +#include linux/io.h +#include linux/reboot.h + +#include asm/setup.h +#include asm/page.h +#include asm/e820.h +#include asm/pgtable.h +#include asm/tlbflush.h +#include asm/cacheflush.h +#include asm/proto.h +#include asm/efi.h + +static pgd_t save_pgd __initdata; +static unsigned long efi_flags __initdata; +/* efi_lock protects efi physical mode call */ +static __initdata DEFINE_SPINLOCK(efi_lock); + +static int __init setup_noefi(char *arg) +{ + efi_enabled = 0; + return 0; +} +early_param(noefi, setup_noefi); + +static void __init early_mapping_set_exec(unsigned long start, + unsigned long end, + int executable) +{ + pte_t *kpte; + + while (start end) { + kpte = lookup_address((unsigned long)__va(start)); + BUG_ON(!kpte); + if (executable) + set_pte(kpte, pte_mkexec(*kpte)); + else + set_pte(kpte, __pte((pte_val(*kpte) | _PAGE_NX) \ + __supported_pte_mask)); + if (pte_huge(*kpte)) + start = (start + PMD_SIZE) PMD_MASK; + else + start = (start + PAGE_SIZE) PAGE_MASK; + } +} + +static void __init early_runtime_code_mapping_set_exec(int executable) +{ + efi_memory_desc_t *md; + void *p; + + /* Make EFI runtime service code area executable */ + for (p = memmap.map; p memmap.map_end; p += memmap.desc_size) { + md = p; + if (md-type == EFI_RUNTIME_SERVICES_CODE) { + unsigned long end; + end = md-phys_addr + (md-num_pages PAGE_SHIFT); + early_mapping_set_exec(md-phys_addr, end, executable); + } + } +} + +void __init efi_call_phys_prelog(void) __acquires(efi_lock) +{ + unsigned long vaddress; + + /* +* Lock sequence is different from normal case because +* efi_flags is global +*/ + spin_lock(efi_lock); + local_irq_save(efi_flags); + early_runtime_code_mapping_set_exec(1); + vaddress = (unsigned long)__va(0x0UL); + 
pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL)); + set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress)); + global_flush_tlb(); +} + +void __init efi_call_phys_epilog(void) __releases(efi_lock) +{ + /* +* After the lock is released, the original page table is restored. +*/ + set_pgd(pgd_offset_k(0x0UL), save_pgd); + early_runtime_code_mapping_set_exec(0); + global_flush_tlb(); + local_irq_restore(efi_flags); + spin_unlock(efi_lock); +} + +/* + * We need to map the EFI memory map again after init_memory_mapping(). + */ +void __init efi_map_memmap
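The address stepping in early_mapping_set_exec() above is easy to misread once the operators are garbled: after touching a PTE, the loop advances to the next 2 MiB PMD boundary if the mapping is a huge page, otherwise to the next 4 KiB page boundary. A userland sketch of just that arithmetic (constants mirror x86_64; the function itself is illustrative, not the kernel code):

```c
/* Sketch of the next-address computation in early_mapping_set_exec().
 * For a huge page: start = (start + PMD_SIZE) & PMD_MASK;
 * otherwise:       start = (start + PAGE_SIZE) & PAGE_MASK; */
#define PAGE_SIZE 0x1000UL              /* 4 KiB */
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define PMD_SIZE  0x200000UL            /* 2 MiB */
#define PMD_MASK  (~(PMD_SIZE - 1))

static unsigned long next_addr(unsigned long start, int huge)
{
	if (huge)
		return (start + PMD_SIZE) & PMD_MASK;   /* next 2 MiB boundary */
	return (start + PAGE_SIZE) & PAGE_MASK;         /* next 4 KiB boundary */
}
```

Note that an unaligned start inside a huge page still lands on the next PMD boundary, which is why the mask is applied after the add.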
[PATCH -mm 2/4 -v6] x86_64 EFI runtime service support: EFI runtime services
This patch adds support for several EFI runtime services for EFI x86_64 system. The EFI support for emergency_restart is added. Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED] Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/reboot_64.c | 20 +--- include/asm-x86/emergency-restart.h |9 + 2 files changed, 22 insertions(+), 7 deletions(-) --- a/arch/x86/kernel/reboot_64.c +++ b/arch/x86/kernel/reboot_64.c @@ -9,6 +9,7 @@ #include linux/pm.h #include linux/kdebug.h #include linux/sched.h +#include linux/efi.h #include acpi/reboot.h #include asm/io.h #include asm/delay.h @@ -27,20 +28,17 @@ void (*pm_power_off)(void); EXPORT_SYMBOL(pm_power_off); static long no_idt[3]; -static enum { - BOOT_TRIPLE = 't', - BOOT_KBD = 'k', - BOOT_ACPI = 'a' -} reboot_type = BOOT_KBD; +enum reboot_type reboot_type = BOOT_KBD; static int reboot_mode = 0; int reboot_force; -/* reboot=t[riple] | k[bd] [, [w]arm | [c]old] +/* reboot=t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] warm Don't set the cold reboot flag cold Set the cold reboot flag triple Force a triple fault (init) kbdUse the keyboard controller. cold reset (default) acpi Use the RESET_REG in the FADT + efiUse efi reset_system runtime service force Avoid anything that could hang. */ static int __init reboot_setup(char *str) @@ -59,6 +57,7 @@ static int __init reboot_setup(char *str case 'a': case 'b': case 'k': + case 'e': reboot_type = *str; break; case 'f': @@ -151,7 +150,14 @@ void machine_emergency_restart(void) acpi_reboot(); reboot_type = BOOT_KBD; break; - } + + case BOOT_EFI: + if (efi_enabled) + efi.reset_system(reboot_mode ? 
EFI_RESET_WARM : EFI_RESET_COLD,
+				 EFI_SUCCESS, 0, NULL);
+		reboot_type = BOOT_KBD;
+		break;
+	}
 	}
 }

--- a/include/asm-x86/emergency-restart.h
+++ b/include/asm-x86/emergency-restart.h
@@ -1,6 +1,15 @@
 #ifndef _ASM_EMERGENCY_RESTART_H
 #define _ASM_EMERGENCY_RESTART_H

+enum reboot_type {
+	BOOT_TRIPLE = 't',
+	BOOT_KBD = 'k',
+	BOOT_ACPI = 'a',
+	BOOT_EFI = 'e'
+};
+
+extern enum reboot_type reboot_type;
+
 extern void machine_emergency_restart(void);

 #endif /* _ASM_EMERGENCY_RESTART_H */
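The reboot= handling that this patch extends examines only the first letter of each comma-separated token, with 'e' now selecting the EFI reset path. A userland sketch of that parsing logic (parse_reboot_type is a hypothetical name; in the kernel this lives in reboot_setup()):

```c
#include <string.h>

enum reboot_type { BOOT_TRIPLE = 't', BOOT_KBD = 'k', BOOT_ACPI = 'a', BOOT_EFI = 'e' };

/* Sketch of reboot= option parsing: look at the first character of each
 * comma-separated token; mode letters pick reboot_type, 'w'/'c' set the
 * warm/cold flag.  Default is the keyboard controller, as in the patch. */
static enum reboot_type parse_reboot_type(const char *str, int *warm)
{
	enum reboot_type type = BOOT_KBD;
	const char *comma;

	while (*str) {
		switch (*str) {
		case 'w': *warm = 1; break;
		case 'c': *warm = 0; break;
		case 't': case 'k': case 'a': case 'e':
			type = (enum reboot_type)*str;
			break;
		}
		comma = strchr(str, ',');       /* skip to the next token */
		if (!comma)
			break;
		str = comma + 1;
	}
	return type;
}
```

So "reboot=e,warm" selects the EFI reset path with the warm flag set, and an unrecognized token leaves the keyboard-controller default in place.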
[PATCH -mm 4/4 -v6] x86_64 EFI runtime service support: remove duplicated code from efi_32.c
This patch removes the duplicated code between efi_32.c and efi.c. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/Makefile_32 |2 arch/x86/kernel/e820_32.c |5 arch/x86/kernel/efi_32.c| 430 arch/x86/kernel/setup_32.c | 11 - include/asm-x86/efi.h | 42 5 files changed, 47 insertions(+), 443 deletions(-) --- a/arch/x86/kernel/Makefile_32 +++ b/arch/x86/kernel/Makefile_32 @@ -35,7 +35,7 @@ obj-$(CONFIG_KPROBES) += kprobes_32.o obj-$(CONFIG_MODULES) += module_32.o obj-y += sysenter_32.o vsyscall_32.o obj-$(CONFIG_ACPI_SRAT)+= srat_32.o -obj-$(CONFIG_EFI) += efi_32.o efi_stub_32.o +obj-$(CONFIG_EFI) += efi.o efi_32.o efi_stub_32.o obj-$(CONFIG_DOUBLEFAULT) += doublefault_32.o obj-$(CONFIG_VM86) += vm86_32.o obj-$(CONFIG_EARLY_PRINTK) += early_printk.o --- a/arch/x86/kernel/efi_32.c +++ b/arch/x86/kernel/efi_32.c @@ -39,21 +39,8 @@ #include asm/desc.h #include asm/tlbflush.h -#define EFI_DEBUG 0 #define PFXEFI: -extern efi_status_t asmlinkage efi_call_phys(void *, ...); - -struct efi efi; -EXPORT_SYMBOL(efi); -static struct efi efi_phys; -struct efi_memory_map memmap; - -/* - * We require an early boot_ioremap mapping mechanism initially - */ -extern void * boot_ioremap(unsigned long, unsigned long); - /* * To make EFI call EFI runtime service in physical addressing mode we need * prelog/epilog before/after the invocation to disable interrupt, to @@ -65,7 +52,7 @@ static unsigned long efi_rt_eflags; static DEFINE_SPINLOCK(efi_rt_lock); static pgd_t efi_bak_pg_dir_pointer[2]; -static void efi_call_phys_prelog(void) __acquires(efi_rt_lock) +void efi_call_phys_prelog(void) __acquires(efi_rt_lock) { unsigned long cr4; unsigned long temp; @@ -108,7 +95,7 @@ static void efi_call_phys_prelog(void) _ load_gdt(gdt_descr); } -static void efi_call_phys_epilog(void) __releases(efi_rt_lock) +void efi_call_phys_epilog(void) __releases(efi_rt_lock) { unsigned long cr4; struct Xgt_desc_struct gdt_descr; @@ -138,87 +125,6 @@ static void efi_call_phys_epilog(void) _ 
spin_unlock(efi_rt_lock); } -static efi_status_t -phys_efi_set_virtual_address_map(unsigned long memory_map_size, -unsigned long descriptor_size, -u32 descriptor_version, -efi_memory_desc_t *virtual_map) -{ - efi_status_t status; - - efi_call_phys_prelog(); - status = efi_call_phys(efi_phys.set_virtual_address_map, -memory_map_size, descriptor_size, -descriptor_version, virtual_map); - efi_call_phys_epilog(); - return status; -} - -static efi_status_t -phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) -{ - efi_status_t status; - - efi_call_phys_prelog(); - status = efi_call_phys(efi_phys.get_time, tm, tc); - efi_call_phys_epilog(); - return status; -} - -inline int efi_set_rtc_mmss(unsigned long nowtime) -{ - int real_seconds, real_minutes; - efi_status_tstatus; - efi_time_t eft; - efi_time_cap_t cap; - - spin_lock(efi_rt_lock); - status = efi.get_time(eft, cap); - spin_unlock(efi_rt_lock); - if (status != EFI_SUCCESS) - panic(Ooops, efitime: can't read time!\n); - real_seconds = nowtime % 60; - real_minutes = nowtime / 60; - - if (((abs(real_minutes - eft.minute) + 15)/30) 1) - real_minutes += 30; - real_minutes %= 60; - - eft.minute = real_minutes; - eft.second = real_seconds; - - if (status != EFI_SUCCESS) { - printk(Ooops: efitime: can't read time!\n); - return -1; - } - return 0; -} -/* - * This is used during kernel init before runtime - * services have been remapped and also during suspend, therefore, - * we'll need to call both in physical and virtual modes. 
- */ -inline unsigned long efi_get_time(void) -{ - efi_status_t status; - efi_time_t eft; - efi_time_cap_t cap; - - if (efi.get_time) { - /* if we are in virtual mode use remapped function */ - status = efi.get_time(eft, cap); - } else { - /* we are in physical mode */ - status = phys_efi_get_time(eft, cap); - } - - if (status != EFI_SUCCESS) - printk(Oops: efitime: can't read time status: 0x%lx\n,status); - - return mktime(eft.year, eft.month, eft.day, eft.hour, - eft.minute, eft.second); -} - int is_available_memory(efi_memory_desc_t * md) { if (!(md-attribute EFI_MEMORY_WB)) @@ -250,24 +156,6 @@ void __init efi_map_memmap(void) memmap.map_end = memmap.map
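The minute arithmetic in the quoted efi_set_rtc_mmss() is compact enough to misread, especially with the operators lost in the archive. Here is the same computation as a standalone function, mirroring the quoted expression term for term (an illustrative userland form; the intent, preserving a possible half-hour offset between RTC and system time, is as in the original):

```c
#include <stdlib.h>

/* Sketch of the minute rounding in the quoted efi_set_rtc_mmss():
 * if the RTC's stored minute differs from the new minute by more than
 * roughly 15 minutes, nudge the value we write back by 30. */
static int rtc_minutes(int now_minutes, int rtc_minute)
{
	int real_minutes = now_minutes;

	if (((abs(real_minutes - rtc_minute) + 15) / 30) & 1)
		real_minutes += 30;
	return real_minutes % 60;
}
```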
Re: [PATCH -mm 1/4 -v6] x86_64 EFI runtime service support: EFI basic runtime service support
for posterity +*/ + c16 = tmp = efi_early_ioremap(efi.systab-fw_vendor, 2); + if (c16) { + for (i = 0; i sizeof(vendor) *c16; ++i) + vendor[i] = *c16++; + vendor[i] = '\0'; + } else + printk(KERN_ERR Could not map the firmware vendor!\n); That would be a very confusing error message to any poor soul who received it. Please consider prefixing all such things with (say) efi: . I will do it. +/* + * This function will switch the EFI runtime services to virtual mode. + * Essentially, look through the EFI memmap and map every region that + * has the runtime attribute bit set in its memory descriptor and update + * that memory descriptor with the virtual address obtained from ioremap(). + * This enables the runtime services to be called without having to + * thunk back into physical mode for every invocation. + */ +void __init efi_enter_virtual_mode(void) +{ + efi_memory_desc_t *md; + efi_status_t status; + unsigned long end; + void *p; + + efi.systab = NULL; + for (p = memmap.map; p memmap.map_end; p += memmap.desc_size) { + md = p; + if (!(md-attribute EFI_MEMORY_RUNTIME)) + continue; + if ((md-attribute EFI_MEMORY_WB) + (((md-phys_addr + (md-num_pagesEFI_PAGE_SHIFT)) + PAGE_SHIFT) end_pfn_map)) + md-virt_addr = (unsigned long)__va(md-phys_addr); + else + md-virt_addr = (unsigned long) + efi_ioremap(md-phys_addr, + md-num_pages EFI_PAGE_SHIFT); + if (!md-virt_addr) + printk(KERN_ERR ioremap of 0x%llX failed!\n, + (unsigned long long)md-phys_addr); + end = md-phys_addr + (md-num_pages EFI_PAGE_SHIFT); + if ((md-phys_addr = (unsigned long)efi_phys.systab) + ((unsigned long)efi_phys.systab end)) + efi.systab = (efi_system_table_t *)(unsigned long) + (md-virt_addr - md-phys_addr + +(unsigned long)efi_phys.systab); + } + + BUG_ON(!efi.systab); + + status = phys_efi_set_virtual_address_map( + memmap.desc_size * memmap.nr_map, + memmap.desc_size, + memmap.desc_version, + memmap.phys_map); + + if (status != EFI_SUCCESS) { + printk(KERN_ALERT You are screwed! 
This came over when you copied the original file. This patchset would be a decent opportunity to de-stupid these messages. Frankly. I will do it. And I will recheck all messages. Best Regards, Huang Ying
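The systab fix-up in efi_enter_virtual_mode() quoted above is a plain pointer relocation: once a runtime region is remapped, any physical address that falls inside it moves by the region's phys-to-virt delta. A minimal sketch of that computation (the struct and names here are illustrative, not the kernel's efi_memory_desc_t):

```c
#include <stdint.h>

/* A remapped runtime region: [phys_addr, phys_addr + size) is now
 * reachable at virt_addr. */
struct region {
	uint64_t phys_addr;
	uint64_t virt_addr;
	uint64_t size;
};

/* Relocate a physical pointer into the region's virtual mapping,
 * exactly as efi.systab is computed from efi_phys.systab above.
 * Only valid if phys_ptr lies inside the region. */
static uint64_t relocate(const struct region *md, uint64_t phys_ptr)
{
	return md->virt_addr - md->phys_addr + phys_ptr;
}
```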
[PATCH -mm] x86_64 EFI runtime service support: EFI basic runtime service support fixes
This patch fixes several issues of the x86_64 EFI basic runtime service support patch per comments from Andrew Morton.

- Delete efi_lock, because the code is used during early system boot, before SMP is initialized. global_flush_tlb() is changed to __flush_tlb_all() for the same reason.
- Revise some messages.
- Turn on debug by default.
- Remove unnecessary memset of static variables.

This patch has been tested against the 2.6.24-rc3-mm1 kernel on Intel platforms with 64-bit EFI1.10 and UEFI2.0 firmware.

Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 arch/x86/kernel/efi.c    |   24 ++--
 arch/x86/kernel/efi_64.c |   18 +-
 2 files changed, 15 insertions(+), 27 deletions(-)

--- a/arch/x86/kernel/efi_64.c
+++ b/arch/x86/kernel/efi_64.c
@@ -39,8 +39,6 @@

 static pgd_t save_pgd __initdata;
 static unsigned long efi_flags __initdata;
-/* efi_lock protects efi physical mode call */
-static __initdata DEFINE_SPINLOCK(efi_lock);

 static int __init setup_noefi(char *arg)
 {
@@ -86,33 +84,27 @@ static void __init early_runtime_code_ma
 	}
 }

-void __init efi_call_phys_prelog(void) __acquires(efi_lock)
+void __init efi_call_phys_prelog(void)
 {
 	unsigned long vaddress;

-	/*
-	 * Lock sequence is different from normal case because
-	 * efi_flags is global
-	 */
-	spin_lock(&efi_lock);
 	local_irq_save(efi_flags);
 	early_runtime_code_mapping_set_exec(1);
 	vaddress = (unsigned long)__va(0x0UL);
 	pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
 	set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
-	global_flush_tlb();
+	__flush_tlb_all();
 }

-void __init efi_call_phys_epilog(void) __releases(efi_lock)
+void __init efi_call_phys_epilog(void)
 {
 	/*
	 * After the lock is released, the original page table is restored.
*/ set_pgd(pgd_offset_k(0x0UL), save_pgd); early_runtime_code_mapping_set_exec(0); - global_flush_tlb(); + __flush_tlb_all(); local_irq_restore(efi_flags); - spin_unlock(efi_lock); } /* @@ -143,7 +135,7 @@ void __init runtime_code_page_mkexec(voi md-num_pages, PAGE_KERNEL_EXEC); } - global_flush_tlb(); + __flush_tlb_all(); } void __iomem * __init efi_ioremap(unsigned long offset, --- a/arch/x86/kernel/efi.c +++ b/arch/x86/kernel/efi.c @@ -41,7 +41,8 @@ #include asm/efi.h #include asm/time.h -#define EFI_DEBUG 0 +#define EFI_DEBUG 1 +#define PFXEFI: int efi_enabled; EXPORT_SYMBOL(efi_enabled); @@ -214,7 +215,7 @@ static void __init print_efi_memmap(void p memmap.map_end; p += memmap.desc_size, i++) { md = p; - printk(KERN_INFO mem%02u: type=%u, attr=0x%llx, + printk(KERN_INFO PFX mem%02u: type=%u, attr=0x%llx, range=[0x%016llx-0x%016llx) (%lluMB)\n, i, md-type, md-attribute, md-phys_addr, md-phys_addr + (md-num_pages EFI_PAGE_SHIFT), @@ -232,9 +233,6 @@ void __init efi_init(void) int i = 0; void *tmp; - memset(efi, 0, sizeof(efi)); - memset(efi_phys, 0, sizeof(efi_phys)); - #ifdef CONFIG_X86_32 efi_phys.systab = (efi_system_table_t *)boot_params.efi_info.efi_systab; memmap.phys_map = (void *)boot_params.efi_info.efi_memmap; @@ -254,7 +252,7 @@ void __init efi_init(void) efi.systab = efi_early_ioremap((unsigned long)efi_phys.systab, sizeof(efi_system_table_t)); if (efi.systab == NULL) - printk(KERN_ERR Woah! Couldn't map the EFI systema table.\n); + printk(KERN_ERR Couldn't map the EFI system table!\n); memcpy(efi_systab, efi.systab, sizeof(efi_system_table_t)); efi_early_iounmap(efi.systab, sizeof(efi_system_table_t)); efi.systab = efi_systab; @@ -263,11 +261,10 @@ void __init efi_init(void) * Verify the EFI Table */ if (efi.systab-hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE) - printk(KERN_ERR Woah! 
EFI system table - signature incorrect\n); + printk(KERN_ERR EFI system table signature incorrect!\n); if ((efi.systab-hdr.revision 16) == 0) printk(KERN_ERR Warning: EFI system table version - %d.%02d, expected 1.00 or greater\n, + %d.%02d, expected 1.00 or greater!\n, efi.systab-hdr.revision 16, efi.systab-hdr.revision 0x); @@ -280,7 +277,7 @@ void __init efi_init(void) vendor[i] = *c16++; vendor[i] = '\0'; } else - printk(KERN_ERR Could not map the firmware vendor!\n); + printk(KERN_ERR PFX Could
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Mon, 2007-12-10 at 14:55 -0500, Vivek Goyal wrote: On Fri, Dec 07, 2007 at 03:53:30PM +, Huang, Ying wrote:

This patch implements the functionality of jumping between the kexeced kernel and the original kernel.

Hi,

I am just going through your patches and trying to understand them. I don't understand many things. Asking is easy, so here you go...

To support jumping between two kernels, before jumping to (executing) the new kernel and jumping back to the original kernel, the devices are put into a quiescent state, and the state of the devices and CPU is saved. After jumping back from the kexeced kernel and jumping to the new kernel, the state of the devices and CPU is restored accordingly. The device/CPU state save/restore code of software suspend is called to implement the corresponding functions.

I need jumping back to restore an already hibernated kernel image? Can you please tell a little more about jumping back and why it is needed?

Now, jumping back is used to implement kexec based hibernation, which uses kexec/kdump to save the memory image of the hibernated kernel during hibernating, and uses /dev/oldmem to restore the memory image of the hibernated kernel and jump back to the hibernated kernel to continue running. Other usage models may include:

- Dump the system memory image then continue to run, that is, get a memory snapshot of the system while it is running.
- Cooperative multi-tasking of different OSes. You can load another OS (B) from the current OS (A), and jump between the two OSes as needed.
- Call some code (such as firmware, etc.) in physical mode.

To support jumping without reserving memory, one shadow backup page (source page) is allocated for each page used by the new (kexeced) kernel (destination page). When doing kexec_load, the image of the new kernel is loaded into the source pages, and before executing, the destination pages and the source pages are swapped, so the contents of the destination pages are backed up.
Before jumping to the new (kexeced) kernel and after jumping back to the original kernel, the destination pages and the source pages are swapped too.

Ok, so due to swapping of source and destination pages the first kernel's data is still preserved. How do I get the dynamic memory required for the second kernel's boot (without overwriting the first kernel's data)?

All dynamic memory required for the second kernel should be loaded by sys_kexec_load in the first kernel. For example, not only should the Linux kernel be loaded at 1M, the memory 0~16M (excluding the kernel) should be loaded (all zero) by /sbin/kexec via sys_kexec_load too.

A jump back protocol for kexec is defined and documented. It is an extension to the ordinary function calling protocol. So, the facility provided by this patch can be used to call an ordinary C function in real mode.

A set of flags for sys_kexec_load is added to control which states are saved/restored before/after the real-mode code executes. For example, you can specify that the device state and FPU state are saved/restored before/after the real-mode code executes. The state (excluding CPU state) save/restore code can be overridden based on the command parameter of kexec jump, because more states need to be saved/restored by hibernating/resuming.
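The source/destination page swap described above can be sketched in userland C. The real implementation does this in relocate_kernel with the MMU in a known state; here PAGE_SZ and the buffer-based approach are purely illustrative:

```c
#include <string.h>

#define PAGE_SZ 4096

/* Swap one shadow (source) page with its destination page.  After the
 * swap, the destination holds the new kernel's page and the shadow page
 * holds the original kernel's data, so nothing is lost across the jump. */
static void swap_pages(unsigned char *src, unsigned char *dst)
{
	unsigned char tmp[PAGE_SZ];

	memcpy(tmp, dst, PAGE_SZ);  /* back up destination (old kernel data) */
	memcpy(dst, src, PAGE_SZ);  /* install the new kernel's page */
	memcpy(src, tmp, PAGE_SZ);  /* old data now lives in the shadow page */
}
```

Running the same swap again before jumping back restores both pages, which is exactly why the swap is repeated on each transition.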
Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 Documentation/i386/jump_back_protocol.txt |  103 ++
 arch/powerpc/kernel/machine_kexec.c       |    2
 arch/ppc/kernel/machine_kexec.c           |    2
 arch/sh/kernel/machine_kexec.c            |    2
 arch/x86/kernel/machine_kexec_32.c        |   88 +---
 arch/x86/kernel/machine_kexec_64.c        |    2
 arch/x86/kernel/relocate_kernel_32.S      |  214 +++---
 include/asm-x86/kexec_32.h                |   39 -
 include/linux/kexec.h                     |   40 +
 kernel/kexec.c                            |  188 ++
 kernel/power/Kconfig                      |    2
 kernel/sys.c                              |   35 +++-
 12 files changed, 648 insertions(+), 69 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/cacheflush.h>

 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -83,10 +84,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Turn off NX bit for control page.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+	if (nx_enabled) {
+		change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC);
+		global_flush_tlb();
+	}
 	return 0;
 }

@@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage
  */
 void machine_kexec_cleanup(struct kimage *image
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Mon, 2007-12-10 at 17:31 -0500, Vivek Goyal wrote: [..]

-#define KEXEC_ON_CRASH  0x0001
-#define KEXEC_ARCH_MASK 0x
+#define KEXEC_ON_CRASH         0x0001
+#define KEXEC_PRESERVE_CPU     0x0002
+#define KEXEC_PRESERVE_CPU_EXT 0x0004
+#define KEXEC_SINGLE_CPU       0x0008
+#define KEXEC_PRESERVE_DEVICE  0x0010
+#define KEXEC_PRESERVE_CONSOLE 0x0020

Hi, Why do we need so many different flags for preserving different types of state (CPU, CPU_EXT, device, console)? To keep things simple, can't we create just one flag, KEXEC_PRESERVE_CONTEXT, which will indicate any special action required for preserving the previous kernel's context so that one can switch back to the old kernel? Yes. There are too many flags, especially when we have no users of these flags now. It is better to use one flag such as KEXEC_PRESERVE_CONTEXT now, and create the other required flags when really needed. Best Regards, Huang Ying -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Mon, 2007-12-10 at 19:25 -0700, Eric W. Biederman wrote: Huang, Ying [EMAIL PROTECTED] writes: [...]

 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
+			unsigned int argc, va_list args)
 {

Why do we need var arg support? Can't we do that with a shim we load from user space? If all parameters are provided in user space, the usage model may be as follows:
- sys_kexec_load()	/* with executable/data/parameters(A) loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,)	/* execute physical mode code with parameters(A) */
- /* jump back */
- sys_kexec_load()	/* with executable/data/parameters(B) loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,)	/* execute physical mode code with parameters(B) */
- /* jump back */
That is, the kexec image must be re-loaded whenever the parameters differ, and no state can be preserved in the kexec image. This is OK for the original kexec implementation, because there is no jumping back. But for kexec with jumping back, another usage model may be useful too:
- sys_kexec_load()	/* with executable/data loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(A))	/* execute physical mode code with parameters(A) */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(B))	/* execute physical mode code with parameters(B) */
This way the kexec image need not be re-loaded, and the state of the kexec image can be preserved across several invocations. Another usage model that may be useful is invoking the kexec image (such as firmware) from kernel space:
- kmalloc the needed memory and load the firmware image (if needed)
- sys_kexec_load() with a fake image (one segment with size 0); the entry point of the fake image is the entry point of the firmware image.
- kexec_call(fake_image, ...)
/* maybe change entry point if needed */ This way, some kernel code can invoke the firmware in physical mode just like invoking an ordinary function. [...]

-	/* The segment registers are funny things, they have both a
-	 * visible and an invisible part. Whenever the visible part is
-	 * set to a specific selector, the invisible part is loaded
-	 * with from a table in memory. At no other time is the
-	 * descriptor table in memory accessed.
-	 *
-	 * I take advantage of this here by force loading the
-	 * segments, before I zap the gdt with an invalid value.
-	 */
-	load_segments();
-	/* The gdt & idt are now invalid.
-	 * If you want to load them you must set up your own idt & gdt.
-	 */
-	set_gdt(phys_to_virt(0), 0);
-	set_idt(phys_to_virt(0), 0);
+	if (image->preserve_cpu_ext) {
+		/* The segment registers are funny things, they have
+		 * both a visible and an invisible part. Whenever the
+		 * visible part is set to a specific selector, the
+		 * invisible part is loaded with from a table in
+		 * memory. At no other time is the descriptor table
+		 * in memory accessed.
+		 *
+		 * I take advantage of this here by force loading the
+		 * segments, before I zap the gdt with an invalid
+		 * value.
+		 */
+		load_segments();
+		/* The gdt & idt are now invalid. If you want to load
+		 * them you must set up your own idt & gdt.
+		 */
+		set_gdt(phys_to_virt(0), 0);
+		set_idt(phys_to_virt(0), 0);
+	}

We can't keep the same idt and gdt, as the pages they are on will be overwritten/reused. So explicitly stomping on them sounds better, so they never work. We can restore them on kernel reentry. The original idea behind this code is: if the kexec image claims that it does not need preservation of extended CPU state (such as FPU/MMX/GDT/LDT/IDT/CS/DS/ES/FS/GS/SS, etc.), the IDT/GDT/CS/DS/ES/FS/GS/SS are not touched by the kexec image code, so the segment registers need not be set. But this is not clear. At least more description should be provided for each preserve flag.
 	/* now call it */
-	relocate_kernel((unsigned long)image->head, (unsigned long)page_list,
-			image->start, cpu_has_pae);
+	relocate_kernel_ptr((unsigned long)image->head,
+			    (unsigned long)page_list,
+			    image->start, cpu_has_pae);

Why rename relocate_kernel? Ah, I see. You need to make it into a pointer again. The crazy "don't stop the pgd" support strikes again. It used to be named rnk. You mean I should change the function pointer name to rnk to keep consistency? I found rnk in the IA64 implementation. Best Regards, Huang Ying
[PATCH -mm] x86 boot : Use E820 memory map on EFI 32 platform
Because the EFI memory map is converted to an e820 memory map in the bootloader, the EFI memory map handling code can be removed as a cleanup. This patch is based on 2.6.24-rc4-mm1 and has been tested on Intel 32-bit platforms with EFI 32 and UEFI 32 firmware.

Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 arch/x86/kernel/e820_32.c  |  117 +++
 arch/x86/kernel/efi_32.c   |  150 -
 arch/x86/kernel/setup_32.c |   16 +---
 arch/x86/mm/init_32.c      |   18 -
 include/asm-x86/e820_32.h  |    2
 5 files changed, 16 insertions(+), 287 deletions(-)

--- a/arch/x86/kernel/e820_32.c
+++ b/arch/x86/kernel/e820_32.c
@@ -7,7 +7,6 @@
 #include <linux/kexec.h>
 #include <linux/module.h>
 #include <linux/mm.h>
-#include <linux/efi.h>
 #include <linux/pfn.h>
 #include <linux/uaccess.h>
 #include <linux/suspend.h>
@@ -181,7 +180,7 @@ static void __init probe_roms(void)
  * Request address space for all standard RAM and ROM resources
  * and also for regions reported as reserved by the e820.
  */
-void __init legacy_init_iomem_resources(struct resource *code_resource,
+void __init init_iomem_resources(struct resource *code_resource,
 		struct resource *data_resource,
 		struct resource *bss_resource)
 {
@@ -261,19 +260,17 @@ void __init add_memory_region(unsigned l
 {
 	int x;

-	if (!efi_enabled) {
-		x = e820.nr_map;
+	x = e820.nr_map;

-		if (x == E820MAX) {
-			printk(KERN_ERR "Ooops! Too many entries in the memory map!\n");
-			return;
-		}
-
-		e820.map[x].addr = start;
-		e820.map[x].size = size;
-		e820.map[x].type = type;
-		e820.nr_map++;
+	if (x == E820MAX) {
+		printk(KERN_ERR "Ooops! Too many entries in the memory map!\n");
+		return;
 	}
+
+	e820.map[x].addr = start;
+	e820.map[x].size = size;
+	e820.map[x].type = type;
+	e820.nr_map++;
 } /* add_memory_region */

 /*
@@ -489,29 +486,6 @@ int __init copy_e820_map(struct e820entr
 }

 /*
- * Callback for efi_memory_walk.
- */
-static int __init
-efi_find_max_pfn(unsigned long start, unsigned long end, void *arg)
-{
-	unsigned long *max_pfn = arg, pfn;
-
-	if (start < end) {
-		pfn = PFN_UP(end - 1);
-		if (pfn > *max_pfn)
-			*max_pfn = pfn;
-	}
-	return 0;
-}
-
-static int __init
-efi_memory_present_wrapper(unsigned long start, unsigned long end, void *arg)
-{
-	memory_present(0, PFN_UP(start), PFN_DOWN(end));
-	return 0;
-}
-
-/*
  * Find the highest page frame number we have available
  */
 void __init find_max_pfn(void)
@@ -519,11 +493,6 @@ void __init find_max_pfn(void)
 	int i;

 	max_pfn = 0;
-	if (efi_enabled) {
-		efi_memmap_walk(efi_find_max_pfn, &max_pfn);
-		efi_memmap_walk(efi_memory_present_wrapper, NULL);
-		return;
-	}

 	for (i = 0; i < e820.nr_map; i++) {
 		unsigned long start, end;
@@ -541,34 +510,12 @@ void __init find_max_pfn(void)
 	}
 }

 /*
- * Free all available memory for boot time allocation. Used
- * as a callback function by efi_memory_walk()
- */
-
-static int __init
-free_available_memory(unsigned long start, unsigned long end, void *arg)
-{
-	/* check max_low_pfn */
-	if (start >= (max_low_pfn << PAGE_SHIFT))
-		return 0;
-	if (end >= (max_low_pfn << PAGE_SHIFT))
-		end = max_low_pfn << PAGE_SHIFT;
-	if (start < end)
-		free_bootmem(start, end - start);
-
-	return 0;
-}
-/*
  * Register fully available low RAM pages with the bootmem allocator.
  */
 void __init register_bootmem_low_pages(unsigned long max_low_pfn)
 {
 	int i;

-	if (efi_enabled) {
-		efi_memmap_walk(free_available_memory, NULL);
-		return;
-	}

 	for (i = 0; i < e820.nr_map; i++) {
 		unsigned long curr_pfn, last_pfn, size;
 		/*
@@ -676,56 +623,12 @@ void __init print_memory_map(char *who)
 	}
 }

-static __init __always_inline void efi_limit_regions(unsigned long long size)
-{
-	unsigned long long current_addr = 0;
-	efi_memory_desc_t *md, *next_md;
-	void *p, *p1;
-	int i, j;
-
-	j = 0;
-	p1 = memmap.map;
-	for (p = p1, i = 0; p < memmap.map_end; p += memmap.desc_size, i++) {
-		md = p;
-		next_md = p1;
-		current_addr = md->phys_addr +
-			PFN_PHYS(md->num_pages);
-		if (is_available_memory(md)) {
-			if (md->phys_addr >= size) continue;
-			memcpy(next_md, md, memmap.desc_size);
-			if (current_addr >= size) {
-				next_md->num_pages
[PATCH -mm] x86 boot : export boot_params via sysfs
This patch exports the boot parameters via sysfs. This can be used for debugging and kexec. The files added are as follows:
/sys/kernel/boot_params/data : binary file for struct boot_params
/sys/kernel/boot_params/version : boot protocol version
This patch is based on 2.6.24-rc4-mm1 and has been tested on i386 and x86_64 platforms.

Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 arch/x86/kernel/Makefile_32 |    1
 arch/x86/kernel/Makefile_64 |    1
 arch/x86/kernel/ksysfs.c    |   94
 arch/x86/kernel/setup64.c   |    2
 arch/x86/kernel/setup_32.c  |    2
 5 files changed, 98 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -39,6 +39,7 @@ obj-$(CONFIG_X86_VSMP)		+= vsmp_64.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_AUDIT)		+= audit_64.o
 obj-$(CONFIG_EFI)		+= efi.o efi_64.o efi_stub_64.o
+obj-$(CONFIG_SYSFS)		+= ksysfs.o
 obj-$(CONFIG_MODULES)		+= module_64.o
 obj-$(CONFIG_PCI)		+= early-quirks.o

--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>

-struct boot_params __initdata boot_params;
+struct boot_params boot_params;

 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;

--- /dev/null
+++ b/arch/x86/kernel/ksysfs.c
@@ -0,0 +1,94 @@
+/*
+ * arch/i386/ksysfs.c - architecture specific sysfs attributes in /sys/kernel
+ *
+ * Copyright (C) 2007, Intel Corp.
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/mm.h>
+
+#include <asm/setup.h>
+
+static ssize_t boot_params_version_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "0x%04x\n", boot_params.hdr.version);
+}
+
+static struct kobj_attribute boot_params_version_attr = {
+	.attr = {
+		.name = "version",
+		.mode = S_IRUGO,
+	},
+	.show = boot_params_version_show,
+};
+
+static struct attribute *boot_params_attrs[] = {
+	&boot_params_version_attr.attr,
+	NULL
+};
+
+static struct attribute_group boot_params_attr_group = {
+	.attrs = boot_params_attrs,
+};
+
+static ssize_t boot_params_data_read(struct kobject *kobj,
+				     struct bin_attribute *bin_attr,
+				     char *buf, loff_t off, size_t count)
+{
+	memcpy(buf, (void *)&boot_params + off, count);
+	return count;
+}
+
+static struct bin_attribute boot_params_data_attr = {
+	.attr = {
+		.name = "data",
+		.mode = S_IRUGO,
+	},
+	.read = boot_params_data_read,
+	.size = sizeof(boot_params),
+};
+
+static int __init boot_params_ksysfs_init(void)
+{
+	int error;
+	struct kobject *boot_params_kobj;
+
+	boot_params_kobj = kobject_create_and_register("boot_params",
+						       kernel_kobj);
+	if (!boot_params_kobj) {
+		error = -ENOMEM;
+		goto err_return;
+	}
+	error = sysfs_create_group(boot_params_kobj,
+				   &boot_params_attr_group);
+	if (error)
+		goto err_boot_params_subsys_unregister;
+	error = sysfs_create_bin_file(boot_params_kobj,
+				      &boot_params_data_attr);
+	if (error)
+		goto err_boot_params_subsys_unregister;
+	return 0;
+err_boot_params_subsys_unregister:
+	kobject_unregister(boot_params_kobj);
+err_return:
+	return error;
+}
+
+static int __init arch_ksysfs_init(void)
+{
+	int error;
+
+	error = boot_params_ksysfs_init();
+
+	return error;
+}
+
+arch_initcall(arch_ksysfs_init);

--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -44,6 +44,7 @@ obj-$(CONFIG_EARLY_PRINTK)	+= early_prin
 obj-$(CONFIG_HPET_TIMER)	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
+obj-$(CONFIG_SYSFS)		+= ksysfs.o
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o

--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,7 @@ unsigned long saved_videomode;

 static char __initdata command_line[COMMAND_LINE_SIZE];

-struct boot_params __initdata boot_params;
+struct boot_params boot_params;

 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;
Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump
On Tue, 2007-12-11 at 02:27 -0700, Eric W. Biederman wrote: Huang, Ying [EMAIL PROTECTED] writes: On Mon, 2007-12-10 at 19:25 -0700, Eric W. Biederman wrote: Why do we need var arg support? Can't we do that with a shim we load from user space? [...] This way the kexec image need not be re-loaded, and the state of the kexec image can be preserved across several invocations. Interesting. We wind up preserving the code in between invocations. I don't know about your particular issue, but I can see that clearly we need a way to read values back from our target image. And if we can read everything back, one way to proceed is to read everything out, modify it, and then write it back.
Amending a kexec image that is already stored may also make sense. I'm not convinced that the var arg parameters make sense, but you added them because of a real need. The kexec function is split into two separate calls so that we can unmount the filesystem the kexec image comes from before actually doing the kexec. Yes. Reading/modifying the loaded kexec image is another way to do the necessary communication between the first kernel and the second kernel. In fact, patch [4/4] of this series, with title "[PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore", provides an ELF core file in /proc (/proc/kimgcore) to read the loaded kexec image. The writing function can be added easily. But I think communication between the first kernel and the second kernel via reading/modifying the loaded kernel image is not a very convenient way. The usage model would be as follows:
- sys_kexec_load()	/* with executable/data loaded */
- modify the loaded kexec image to set parameters (A)
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,)	/* execute physical mode code with parameters(A) */
- in the physical mode code, check parameters A and execute accordingly
- modify the loaded kexec image to set parameters (B)
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,)	/* execute physical mode code with parameters(B) */
- in the physical mode code, check parameters B and execute accordingly
There are some issues with this usage model:
- Some parameters in the kernel need to be exported (such as the kimage->head, to let the second kernel read the memory contents of the backed-up memory).
- The physical mode code invoker (the first kernel) needs to know where to write the parameters. Either a common protocol or case-by-case protocols should be defined. For example, the memory address after the entry point of the kexec image is a good candidate. But for the Linux kernel there are two types of entry point, the jump back entry or purgatory, so maybe a different protocol should be defined for each type of entry point.
- For the user space of the second kernel to get the parameters, an interface (maybe a file in /proc or /sys) should be provided to export the parameters to user space.
So I think the current parameter passing mechanism may be simpler and more convenient (defined in Documentation/i386/jump_back_protocol.txt in the patch). There is only one user of var args now, but I think it is simple to implement and may be used by others. If extensive user space shutdown or startup is needed, I will argue that doing the work in the sys_reboot call is the wrong place to do it. Although if a jump back is happening we should not need much restart. Now, the user space is not shut down or started up across kexec/jump back; the sys_reboot call is just used to trigger the kexec/jump back. Maybe sys_reboot is not the right place to do this. Can you recommend a more
[PATCH -mm -v2] x86 boot : export boot_params via sysfs
This patch exports the boot parameters via sysfs. This can be used for debugging and kexec. The files added are as follows:
/sys/kernel/boot_params/data : binary file for struct boot_params
/sys/kernel/boot_params/version : boot protocol version
This patch is based on 2.6.24-rc4-mm1 and has been tested on i386 and x86_64 platforms. This patch is based on Peter Anvin's proposal.

v2:
- Add documentation in Documentation/ABI.

Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 Documentation/ABI/testing/sysfs-kernel-boot_params |   14 +++
 arch/x86/kernel/Makefile_32                        |    1
 arch/x86/kernel/Makefile_64                        |    1
 arch/x86/kernel/ksysfs.c                           |   89 +
 arch/x86/kernel/setup64.c                          |    2
 arch/x86/kernel/setup_32.c                         |    2
 6 files changed, 107 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -39,6 +39,7 @@ obj-$(CONFIG_X86_VSMP)		+= vsmp_64.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_AUDIT)		+= audit_64.o
 obj-$(CONFIG_EFI)		+= efi.o efi_64.o efi_stub_64.o
+obj-$(CONFIG_SYSFS)		+= ksysfs.o
 obj-$(CONFIG_MODULES)		+= module_64.o
 obj-$(CONFIG_PCI)		+= early-quirks.o

--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>

-struct boot_params __initdata boot_params;
+struct boot_params boot_params;

 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;

--- /dev/null
+++ b/arch/x86/kernel/ksysfs.c
@@ -0,0 +1,89 @@
+/*
+ * Architecture specific sysfs attributes in /sys/kernel
+ *
+ * Copyright (C) 2007, Intel Corp.
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/mm.h>
+
+#include <asm/setup.h>
+
+static ssize_t boot_params_version_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "0x%04x\n", boot_params.hdr.version);
+}
+
+static struct kobj_attribute boot_params_version_attr =
+	__ATTR(version, S_IRUGO, boot_params_version_show, NULL);
+
+static struct attribute *boot_params_attrs[] = {
+	&boot_params_version_attr.attr,
+	NULL
+};
+
+static struct attribute_group boot_params_attr_group = {
+	.attrs = boot_params_attrs,
+};
+
+static ssize_t boot_params_data_read(struct kobject *kobj,
+				     struct bin_attribute *bin_attr,
+				     char *buf, loff_t off, size_t count)
+{
+	memcpy(buf, (void *)&boot_params + off, count);
+	return count;
+}
+
+static struct bin_attribute boot_params_data_attr = {
+	.attr = {
+		.name = "data",
+		.mode = S_IRUGO,
+	},
+	.read = boot_params_data_read,
+	.size = sizeof(boot_params),
+};
+
+static int __init boot_params_ksysfs_init(void)
+{
+	int error;
+	struct kobject *boot_params_kobj;
+
+	boot_params_kobj = kobject_create_and_register("boot_params",
+						       kernel_kobj);
+	if (!boot_params_kobj) {
+		error = -ENOMEM;
+		goto err_return;
+	}
+	error = sysfs_create_group(boot_params_kobj,
+				   &boot_params_attr_group);
+	if (error)
+		goto err_boot_params_subsys_unregister;
+	error = sysfs_create_bin_file(boot_params_kobj,
+				      &boot_params_data_attr);
+	if (error)
+		goto err_boot_params_subsys_unregister;
+	return 0;
+err_boot_params_subsys_unregister:
+	kobject_unregister(boot_params_kobj);
+err_return:
+	return error;
+}
+
+static int __init arch_ksysfs_init(void)
+{
+	int error;
+
+	error = boot_params_ksysfs_init();
+
+	return error;
+}
+
+arch_initcall(arch_ksysfs_init);

--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -44,6 +44,7 @@ obj-$(CONFIG_EARLY_PRINTK)	+= early_prin
 obj-$(CONFIG_HPET_TIMER)	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
+obj-$(CONFIG_SYSFS)		+= ksysfs.o
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o

--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,7 @@ unsigned long saved_videomode;

 static char __initdata command_line[COMMAND_LINE_SIZE];

-struct boot_params __initdata boot_params;
+struct boot_params boot_params;

 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;

--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-boot_params
@@ -0,0
[PATCH -mm] i386 EFI runtime service support : fixes in sync with x86_64 support
This patch fixes several issues in the i386 EFI basic runtime service support, in sync with fixes to the x86_64 support.
- Delete efi_rt_lock, because this code is used during early system boot, before SMP is initialized.
- Change local_flush_tlb() to __flush_tlb_all() to flush global page mappings.
- Clean up includes.
- Revise the Kconfig description.
- Enable the noefi kernel parameter on i386.
This patch has been tested against the 2.6.24-rc5-mm1 kernel on Intel platforms with 32-bit EFI 1.10 and UEFI 2.0 firmware.

Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 Documentation/kernel-parameters.txt   |    2 ++
 Documentation/x86_64/boot-options.txt |    4
 arch/x86/Kconfig                      |   19 ---
 arch/x86/kernel/efi.c                 |    7 +++
 arch/x86/kernel/efi_32.c              |   25 +
 arch/x86/kernel/efi_64.c              |    7 ---
 arch/x86/kernel/setup_32.c            |    6 +++---
 7 files changed, 25 insertions(+), 45 deletions(-)

--- a/arch/x86/kernel/efi_32.c
+++ b/arch/x86/kernel/efi_32.c
@@ -20,27 +20,15 @@
  */
 #include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/mm.h>
 #include <linux/types.h>
-#include <linux/time.h>
-#include <linux/spinlock.h>
-#include <linux/bootmem.h>
 #include <linux/ioport.h>
-#include <linux/module.h>
 #include <linux/efi.h>
-#include <linux/kexec.h>

-#include <asm/setup.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
-#include <asm/processor.h>
-#include <asm/desc.h>
 #include <asm/tlbflush.h>

-#define PFX		"EFI: "
-
 /*
  * To make EFI call EFI runtime service in physical addressing mode we need
  * prelog/epilog before/after the invocation to disable interrupt, to
@@ -49,16 +37,14 @@
  */

 static unsigned long efi_rt_eflags;
-static DEFINE_SPINLOCK(efi_rt_lock);
 static pgd_t efi_bak_pg_dir_pointer[2];

-void efi_call_phys_prelog(void) __acquires(efi_rt_lock)
+void efi_call_phys_prelog(void)
 {
 	unsigned long cr4;
 	unsigned long temp;
 	struct Xgt_desc_struct gdt_descr;

-	spin_lock(&efi_rt_lock);
 	local_irq_save(efi_rt_eflags);

 	/*
@@ -88,14 +74,14 @@ void efi_call_phys_prelog(void) __acquir
 	/*
 	 * After the lock is released, the original page table is restored.
 	 */
-	local_flush_tlb();
+	__flush_tlb_all();

 	gdt_descr.address = __pa(get_cpu_gdt_table(0));
 	gdt_descr.size = GDT_SIZE - 1;
 	load_gdt(&gdt_descr);
 }

-void efi_call_phys_epilog(void) __releases(efi_rt_lock)
+void efi_call_phys_epilog(void)
 {
 	unsigned long cr4;
 	struct Xgt_desc_struct gdt_descr;
@@ -119,10 +105,9 @@ void efi_call_phys_epilog(void) __releas
 	/*
 	 * After the lock is released, the original page table is restored.
 	 */
-	local_flush_tlb();
+	__flush_tlb_all();

 	local_irq_restore(efi_rt_eflags);
-	spin_unlock(&efi_rt_lock);
 }

 /*
@@ -135,7 +120,7 @@ void __init efi_map_memmap(void)
 		memmap.map = bt_ioremap((unsigned long) memmap.phys_map,
 					(memmap.nr_map * memmap.desc_size));
 	if (memmap.map == NULL)
-		printk(KERN_ERR PFX "Could not remap the EFI memmap!\n");
+		printk(KERN_ERR "Could not remap the EFI memmap!\n");

 	memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);
 }

--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -55,6 +55,13 @@ struct efi_memory_map memmap;
 struct efi efi_phys __initdata;
 static efi_system_table_t efi_systab __initdata;

+static int __init setup_noefi(char *arg)
+{
+	efi_enabled = 0;
+	return 0;
+}
+early_param("noefi", setup_noefi);
+
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
 {
 	return efi_call_virt2(get_time, tm, tc);
 }

--- a/arch/x86/kernel/efi_64.c
+++ b/arch/x86/kernel/efi_64.c
@@ -40,13 +40,6 @@
 static pgd_t save_pgd __initdata;
 static unsigned long efi_flags __initdata;

-static int __init setup_noefi(char *arg)
-{
-	efi_enabled = 0;
-	return 0;
-}
-early_param("noefi", setup_noefi);
-
 static void __init early_mapping_set_exec(unsigned long start,
 					  unsigned long end,
 					  int executable)

--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -651,9 +651,6 @@ void __init setup_arch(char **cmdline_p)
 	printk(KERN_INFO "BIOS-provided physical RAM map:\n");
 	print_memory_map(memory_setup());

-	if (efi_enabled)
-		efi_init();
-
 	copy_edd();

 	if (!boot_params.hdr.root_flags)
@@ -680,6 +677,9 @@ void __init setup_arch(char **cmdline_p)
 	strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
 	*cmdline_p = command_line;

+	if (efi_enabled)
+		efi_init();
+
 	max_low_pfn = setup_memory();

 #ifdef CONFIG_VMI

--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1001,21 +1001,18 @@ config MTRR
[PATCH -mm -v3] x86 boot : export boot_params via sysfs
This patch exports the boot parameters via sysfs. This can be used for debugging and kexec. The files added are as follows:
/sys/kernel/boot_params/data : binary file for struct boot_params
/sys/kernel/boot_params/version : boot protocol version
This patch is based on 2.6.24-rc5-mm1 and has been tested on i386 and x86_64 platforms. This patch is based on Peter Anvin's proposal.

v3:
- Use updated API: kobject_create_and_add.
v2:
- Add documentation in Documentation/ABI.

Signed-off-by: Huang Ying [EMAIL PROTECTED]
---
 Documentation/ABI/testing/sysfs-kernel-boot_params |   14 +++
 arch/x86/kernel/Makefile_32                        |    1
 arch/x86/kernel/Makefile_64                        |    1
 arch/x86/kernel/ksysfs.c                           |   88 +
 arch/x86/kernel/setup64.c                          |    2
 arch/x86/kernel/setup_32.c                         |    2
 6 files changed, 106 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -40,6 +40,7 @@ obj-$(CONFIG_X86_VSMP)		+= vsmp_64.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_AUDIT)		+= audit_64.o
 obj-$(CONFIG_EFI)		+= efi.o efi_64.o efi_stub_64.o
+obj-$(CONFIG_SYSFS)		+= ksysfs.o
 obj-$(CONFIG_MODULES)		+= module_64.o
 obj-$(CONFIG_PCI)		+= early-quirks.o

--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,7 @@
 #include <asm/sections.h>
 #include <asm/setup.h>

-struct boot_params __initdata boot_params;
+struct boot_params boot_params;

 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;

--- /dev/null
+++ b/arch/x86/kernel/ksysfs.c
@@ -0,0 +1,88 @@
+/*
+ * Architecture specific sysfs attributes in /sys/kernel
+ *
+ * Copyright (C) 2007, Intel Corp.
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/mm.h>
+
+#include <asm/setup.h>
+
+static ssize_t boot_params_version_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "0x%04x\n", boot_params.hdr.version);
+}
+
+static struct kobj_attribute boot_params_version_attr =
+	__ATTR(version, S_IRUGO, boot_params_version_show, NULL);
+
+static struct attribute *boot_params_attrs[] = {
+	&boot_params_version_attr.attr,
+	NULL
+};
+
+static struct attribute_group boot_params_attr_group = {
+	.attrs = boot_params_attrs,
+};
+
+static ssize_t boot_params_data_read(struct kobject *kobj,
+				     struct bin_attribute *bin_attr,
+				     char *buf, loff_t off, size_t count)
+{
+	memcpy(buf, (void *)&boot_params + off, count);
+	return count;
+}
+
+static struct bin_attribute boot_params_data_attr = {
+	.attr = {
+		.name = "data",
+		.mode = S_IRUGO,
+	},
+	.read = boot_params_data_read,
+	.size = sizeof(boot_params),
+};
+
+static int __init boot_params_ksysfs_init(void)
+{
+	int error;
+	struct kobject *boot_params_kobj;
+
+	boot_params_kobj = kobject_create_and_add("boot_params", kernel_kobj);
+	if (!boot_params_kobj) {
+		error = -ENOMEM;
+		goto err_return;
+	}
+	error = sysfs_create_group(boot_params_kobj,
+				   &boot_params_attr_group);
+	if (error)
+		goto err_boot_params_subsys_unregister;
+	error = sysfs_create_bin_file(boot_params_kobj,
+				      &boot_params_data_attr);
+	if (error)
+		goto err_boot_params_subsys_unregister;
+	return 0;
+err_boot_params_subsys_unregister:
+	kobject_unregister(boot_params_kobj);
+err_return:
+	return error;
+}
+
+static int __init arch_ksysfs_init(void)
+{
+	int error;
+
+	error = boot_params_ksysfs_init();
+
+	return error;
+}
+
+arch_initcall(arch_ksysfs_init);

--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -45,6 +45,7 @@ obj-$(CONFIG_EARLY_PRINTK)	+= early_prin
 obj-$(CONFIG_HPET_TIMER)	+= hpet.o
 obj-$(CONFIG_K8_NB)		+= k8.o
 obj-$(CONFIG_MGEODE_LX)		+= geode_32.o mfgpt_32.o
+obj-$(CONFIG_SYSFS)		+= ksysfs.o
 obj-$(CONFIG_VMI)		+= vmi_32.o vmiclock_32.o
 obj-$(CONFIG_PARAVIRT)		+= paravirt_32.o

--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,7 @@ unsigned long saved_videomode;

 static char __initdata command_line[COMMAND_LINE_SIZE];

-struct boot_params __initdata boot_params;
+struct boot_params boot_params;

 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;

--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-boot_params
@@ -0,0 +1,14
Re: [PATCH 0/3 -mm] kexec jump -v8
On Wed, 2007-12-26 at 20:57 -0500, Vivek Goyal wrote: [...] 9. Now, you are in the original kernel again. You can read/write the memory image of the kexeced kernel via /proc/kimgcore. Why do we need two interfaces, /proc/vmcore and /proc/kimgcore? Can't we have just one, say /proc/vmcore? Irrespective of what kernel you are in, /proc/vmcore gives you access to the memory of the kernel which was previously booted. In theory we can kexec another kernel even in a kexeced kernel, that is, in kernel A kexec kernel B, and in kernel B kexec another kernel C. In this situation, both /proc/vmcore and /proc/kimgcore have valid contents. So I think it may be better to keep two interfaces. In fact, the current kexec jump implementation uses a dummy jump-back helper image in the kexeced kernel to jump back to the original kernel. The jump-back helper image has no PT_LOAD segment; it is used to provide a struct kimage (including control page, swap page) and an entry point to jump back. Best Regards, Huang Ying -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/3 -mm] kexec jump -v8
On Thu, 2007-12-27 at 13:12 -0500, Vivek Goyal wrote: On Thu, Dec 27, 2007 at 10:33:13AM +0800, Huang, Ying wrote: On Wed, 2007-12-26 at 20:57 -0500, Vivek Goyal wrote: [...] 9. Now, you are in the original kernel again. You can read/write the memory image of the kexeced kernel via /proc/kimgcore. Why do we need two interfaces, /proc/vmcore and /proc/kimgcore? Can't we have just one, say /proc/vmcore? Irrespective of what kernel you are in, /proc/vmcore gives you access to the memory of the kernel which was previously booted. In theory we can kexec another kernel even in a kexeced kernel, that is, in kernel A kexec kernel B, and in kernel B kexec another kernel C. In this situation, both /proc/vmcore and /proc/kimgcore have valid contents. So I think it may be better to keep two interfaces. In those situations I think only one interface is better. For example, the above will be broken if somebody kexecs 4 kernels: A---B---C---D I don't think the two interfaces will be broken if somebody kexecs 4 kernels. For example, when kexecing D from C, /proc/vmcore holds the contents of B and /proc/kimgcore the contents of D. To jump back from C to B, D is unloaded, and a jump-back helper image is loaded. I think a better option might be a stack-like situation: a kernel shows you only the previous kernel's memory contents through the /proc/vmcore interface. So if I am in kernel D, I see only kernel C's memory image. To see kernel B's memory image, one shall have to go back to kernel C. Maybe it is not sufficient to show only the previous kernel's memory contents. In kernel C, you may need to access the memory image of kernel B and the memory image of kernel D. That is, /proc/vmcore is the memory image of the previous kernel, and /proc/kimgcore is the memory image of the next kernel. 
Best Regards, Huang Ying
Re: [PATCH 0/3 -mm] kexec jump -v8
On Fri, 2007-12-28 at 16:33 -0500, Vivek Goyal wrote: On Fri, Dec 21, 2007 at 03:33:19PM +0800, Huang, Ying wrote: This patchset provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used both by the original kernel and the kexeced kernel. - Jumping between the original kernel and the kexeced kernel. - Read/write memory image of the kexeced kernel in the original kernel and write memory image of the original kernel in the kexeced kernel. This can be used as a communication method between the kexeced kernel and the original kernel. The features of this patchset can be used as follows: - Kernel/system debug through making a system snapshot. You can make a system snapshot, jump back, do something and make another system snapshot. How do you differentiate between whether a core is resumable or not? IOW, how do you know whether the generated /proc/vmcore has been generated after a real crash and hence can't be resumed (using krestore), or has been generated for hibernation/debug purposes and can be resumed? I think you might have to add an extra ELF NOTE to vmcore which can help decide whether a kernel memory snapshot is resumable or not. The current solution is as follows: 1. The original kernel will set %edi to the jump-back entry if resumable and set %edi to 0 if not. 2. The purgatory of the loaded kernel will check %edi; if it is not zero, the string jump_back_entry=<jump back entry> will be appended to the kernel command line. 3. In the kexeced kernel, if there is jump_back_entry=<jump back entry> in /proc/cmdline, the previous kernel is resumable, otherwise not. As for the ELF NOTE, in fact the ELF NOTE does not work for a resumable kernel. Because the contents of the source page and the destination page are swapped during kexec, and the kernel accesses the destination page directly while parsing the ELF NOTE. All memory that is swapped needs to be accessed via the backup pages map (image->head). 
I think this information can be exchanged between the two kernels via the kernel command line or /proc/kimgcore. [..] 2. Build an initramfs image containing kexec-tools, or download the pre-built initramfs image, called rootfs.gz in the following text. 3. Boot the kernel compiled in step 1. 4. Load the kernel compiled in step 1 with /sbin/kexec. If you want to use the krestore tool, --elf64-core-headers should be specified on the command line of /sbin/kexec. The shell command line can be as follows: /sbin/kexec --load-jump-back /boot/bzImage --mem-min=0x10 --mem-max=0xff --elf64-core-headers --initrd=rootfs.gz How about a different name like --load-preserve-context? This will just mean that kexec needs to preserve the context while kexecing to the image being loaded. The combination of --load-jump-back and --load-jump-back-helper is becoming a little confusing. Yes, this is better. I will change it. 5. Boot the kexeced kernel with the following shell command line: /sbin/kexec -e 6. The kexeced kernel will boot as a normal kexec. In the kexeced kernel the memory image of the original kernel can be read via /proc/vmcore or /dev/oldmem, and can be written via /dev/oldmem. You can save/restore/modify it as you want to. Restoring a hibernated image using /dev/oldmem should be easy and I think one should be able to launch it back using --load-jump-back-helper. Yes, I think so too. The current implementation of krestore restores the hibernated image using /dev/oldmem. And the hibernated image can be launched using --load-jump-back-helper. How do you restore an already kexeced kernel? For example, if I have two kernels A and B. A is the one which will hibernate and B will be used to store the hibernated kernel. I think as per the procedure one needs to first boot into kernel B and then jump back to kernel A. This will make the image of B available in /proc/kimgcore. If I save /proc/kimgcore to disk and want to jump back to it, how do I do it? I guess I need to kexec again using --load-jump-back and not restore using krestore? 
The image of B is made as you said. And it can be restored as follows: /sbin/kexec -l --args-none --flags=0x2 kimgcore /sbin/kexec -e That is, the image of B is loaded as an ordinary ELF file. An option to /sbin/kexec named --flags is added to specify the KEXEC_PRESERVE_CONTEXT flag for sys_kexec_load. This has been tested. 7. Prepare jumping back from the kexeced kernel with the following shell command lines: jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep kexec_jump_back_entry | cut -d '=' -f 2` /sbin/kexec --load-jump-back-helper=$jump_back_entry How about decoupling the entry point from --load-jump-back-helper? We can introduce a separate option for the entry point. Something like: kexec --load-jump-back-helper --entry=$jump_back_entry Maybe we can generalize the --entry so that a user can override the entry point of the normal
[PATCH -mm] EFI : Split EFI tables parsing code from EFI runtime service support code
This patch split EFI tables parsing code from EFI runtime service support code. This makes ACPI support and DMI support on EFI platform not depend on EFI runtime service support. Both EFI32 and EFI64 tables parsing functions are provided on i386 and x86_64. This makes it possible to use EFI information in i386 kernel on x86_64 with EFI64 firmware or in x86_64 kernel on x86_64 with EFI32 firmware. This patch is based on 2.6.24-rc5-mm1 and has been tested for following combinations: i386 kernel on EFI 32 i386 kernel on EFI 64 x86_64 kernel on EFI 32 x86_64 kernel on EFI 64 ia64 kernel on EFI 64 Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/ia64/kernel/acpi.c |6 - arch/ia64/kernel/efi.c | 30 arch/ia64/kernel/setup.c |2 arch/ia64/sn/kernel/setup.c |4 - arch/x86/Kconfig |4 - arch/x86/kernel/Makefile_32 |3 arch/x86/kernel/Makefile_64 |2 arch/x86/kernel/efi.c| 111 +++-- arch/x86/kernel/efi_tables.c | 144 +++ arch/x86/kernel/setup_32.c |9 ++ arch/x86/kernel/setup_64.c |9 ++ drivers/acpi/osl.c | 11 +-- drivers/firmware/dmi_scan.c |7 +- drivers/firmware/efivars.c | 53 --- drivers/firmware/pcdp.c |6 - include/asm-ia64/setup.h |5 + include/asm-ia64/sn/sn_sal.h |2 include/asm-x86/efi.h|7 ++ include/asm-x86/setup.h |9 ++ include/linux/efi.h | 64 --- 20 files changed, 331 insertions(+), 157 deletions(-) --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -212,6 +212,16 @@ typedef struct { unsigned long table; } efi_config_table_t; +struct efi_config_table64 { + efi_guid_t guid; + u64 table; +}; + +struct efi_config_table32 { + efi_guid_t guid; + u32 table; +}; + #define EFI_SYSTEM_TABLE_SIGNATURE ((u64)0x5453595320494249ULL) typedef struct { @@ -230,6 +240,39 @@ typedef struct { unsigned long tables; } efi_system_table_t; +struct efi_system_table64 { + efi_table_hdr_t hdr; + u64 fw_vendor; + u32 fw_revision; + u32 _pad1; + u64 con_in_handle; + u64 con_in; + u64 con_out_handle; + u64 con_out; + u64 stderr_handle; + u64 stderr; + u64 runtime; + u64 boottime; + u64 
nr_tables; + u64 tables; +}; + +struct efi_system_table32 { + efi_table_hdr_t hdr; + u32 fw_vendor; + u32 fw_revision; + u32 con_in_handle; + u32 con_in; + u32 con_out_handle; + u32 con_out; + u32 stderr_handle; + u32 stderr; + u32 runtime; + u32 boottime; + u32 nr_tables; + u32 tables; +}; + struct efi_memory_map { void *phys_map; void *map; @@ -246,14 +289,6 @@ struct efi_memory_map { */ extern struct efi { efi_system_table_t *systab; /* EFI system table */ - unsigned long mps; /* MPS table */ - unsigned long acpi; /* ACPI table (IA64 ext 0.71) */ - unsigned long acpi20; /* ACPI table (ACPI 2.0) */ - unsigned long smbios; /* SM BIOS table */ - unsigned long sal_systab; /* SAL system table */ - unsigned long boot_info;/* boot info table */ - unsigned long hcdp; /* HCDP table */ - unsigned long uga; /* UGA table */ efi_get_time_t *get_time; efi_set_time_t *set_time; efi_get_wakeup_time_t *get_wakeup_time; @@ -266,6 +301,19 @@ extern struct efi { efi_set_virtual_address_map_t *set_virtual_address_map; } efi; +struct efi_tables { + unsigned long mps; /* MPS table */ + unsigned long acpi; /* ACPI table (IA64 ext 0.71) */ + unsigned long acpi20; /* ACPI table (ACPI 2.0) */ + unsigned long smbios; /* SM BIOS table */ + unsigned long sal_systab; /* SAL system table */ + unsigned long boot_info;/* boot info table */ + unsigned long hcdp; /* HCDP table */ + unsigned long uga; /* UGA table */ +}; + +extern struct efi_tables efi_tables; + static inline int efi_guidcmp (efi_guid_t left, efi_guid_t right) { --- /dev/null +++ b/arch/x86/kernel/efi_tables.c @@ -0,0 +1,144 @@ +/* + * EFI tables parsing functions + * + * Copyright (C) 2007 Intel Co. + * Huang Ying [EMAIL PROTECTED] + * + * This file is released under the GPLv2. 
+ */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/efi.h> +#include <linux/io.h> + +#include <asm/setup.h> +#include <asm/efi.h> + +struct efi_tables efi_tables; +EXPORT_SYMBOL(efi_tables); + +#define EFI_TABLE_PARSE(bt)\ +static void __init efi_tables_parse ## bt(void
Re: [PATCH -mm] EFI : Split EFI tables parsing code from EFI runtime service support code
On Sun, 2007-12-30 at 15:28 +0100, Ingo Molnar wrote: * Huang, Ying [EMAIL PROTECTED] wrote: +struct efi_tables efi_tables; +EXPORT_SYMBOL(efi_tables); +enum bios_type bios_type = BIOS_LEGACY; +EXPORT_SYMBOL(bios_type); please make all the new exports EXPORT_SYMBOL_GPL(). OK, I will change it. Best Regards, Huang Ying
Re: [PATCH 0/3 -mm] kexec jump -v8
be added to setup page. 4. Before one kernel jumps to another kernel, the parameters are prepared by the current kernel. 5. One kernel can check the parameters of another kernel by reading /proc/vmcore or /proc/kimgcore. 6. When the memory image is saved in a file, the parameters of the hibernated kernel can be checked by reading memory location jump_back_entry + 0x800. You can check the details of this mechanism in my previous patch with the title: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump The main issue of this mechanism is that it is a kernel-to-kernel communication mechanism, while Eric Biederman thinks we should use only a user-to-user communication mechanism. And he is not persuaded now. Because kernel operations such as re-initializing/re-constructing /proc/vmcore, etc. are needed for kexec jump or resuming, I think a kernel-to-kernel mechanism may be needed. But I don't know if Eric Biederman will agree with this. Best Regards, Huang Ying
[PATCH -mm -v2] EFI : Split EFI tables parsing code from EFI runtime service support code
This patch split EFI tables parsing code from EFI runtime service support code. This makes ACPI support and DMI support on EFI platform not depend on EFI runtime service support. Both EFI32 and EFI64 tables parsing functions are provided on i386 and x86_64. This makes it possible to use EFI information in i386 kernel on x86_64 with EFI64 firmware or in x86_64 kernel on x86_64 with EFI32 firmware. This patch is based on 2.6.24-rc5-mm1 and has been tested for following combinations: i386 kernel on EFI 32 i386 kernel on EFI 64 x86_64 kernel on EFI 32 x86_64 kernel on EFI 64 ia64 kernel on EFI 64 ChangeLog v2: - Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL. Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/ia64/kernel/acpi.c |6 - arch/ia64/kernel/efi.c | 30 arch/ia64/kernel/setup.c |2 arch/ia64/sn/kernel/setup.c |4 - arch/x86/Kconfig |4 - arch/x86/kernel/Makefile_32 |3 arch/x86/kernel/Makefile_64 |2 arch/x86/kernel/efi.c| 115 -- arch/x86/kernel/efi_tables.c | 144 +++ arch/x86/kernel/setup_32.c |9 ++ arch/x86/kernel/setup_64.c |9 ++ drivers/acpi/osl.c | 11 +-- drivers/firmware/dmi_scan.c |7 +- drivers/firmware/efivars.c | 53 --- drivers/firmware/pcdp.c |6 - include/asm-ia64/setup.h |5 + include/asm-ia64/sn/sn_sal.h |2 include/asm-x86/efi.h|7 ++ include/asm-x86/setup.h |9 ++ include/linux/efi.h | 64 --- 20 files changed, 333 insertions(+), 159 deletions(-) --- a/include/linux/efi.h +++ b/include/linux/efi.h @@ -212,6 +212,16 @@ typedef struct { unsigned long table; } efi_config_table_t; +struct efi_config_table64 { + efi_guid_t guid; + u64 table; +}; + +struct efi_config_table32 { + efi_guid_t guid; + u32 table; +}; + #define EFI_SYSTEM_TABLE_SIGNATURE ((u64)0x5453595320494249ULL) typedef struct { @@ -230,6 +240,39 @@ typedef struct { unsigned long tables; } efi_system_table_t; +struct efi_system_table64 { + efi_table_hdr_t hdr; + u64 fw_vendor; + u32 fw_revision; + u32 _pad1; + u64 con_in_handle; + u64 con_in; + u64 con_out_handle; + u64 con_out; + u64 stderr_handle; + 
u64 stderr; + u64 runtime; + u64 boottime; + u64 nr_tables; + u64 tables; +}; + +struct efi_system_table32 { + efi_table_hdr_t hdr; + u32 fw_vendor; + u32 fw_revision; + u32 con_in_handle; + u32 con_in; + u32 con_out_handle; + u32 con_out; + u32 stderr_handle; + u32 stderr; + u32 runtime; + u32 boottime; + u32 nr_tables; + u32 tables; +}; + struct efi_memory_map { void *phys_map; void *map; @@ -246,14 +289,6 @@ struct efi_memory_map { */ extern struct efi { efi_system_table_t *systab; /* EFI system table */ - unsigned long mps; /* MPS table */ - unsigned long acpi; /* ACPI table (IA64 ext 0.71) */ - unsigned long acpi20; /* ACPI table (ACPI 2.0) */ - unsigned long smbios; /* SM BIOS table */ - unsigned long sal_systab; /* SAL system table */ - unsigned long boot_info;/* boot info table */ - unsigned long hcdp; /* HCDP table */ - unsigned long uga; /* UGA table */ efi_get_time_t *get_time; efi_set_time_t *set_time; efi_get_wakeup_time_t *get_wakeup_time; @@ -266,6 +301,19 @@ extern struct efi { efi_set_virtual_address_map_t *set_virtual_address_map; } efi; +struct efi_tables { + unsigned long mps; /* MPS table */ + unsigned long acpi; /* ACPI table (IA64 ext 0.71) */ + unsigned long acpi20; /* ACPI table (ACPI 2.0) */ + unsigned long smbios; /* SM BIOS table */ + unsigned long sal_systab; /* SAL system table */ + unsigned long boot_info;/* boot info table */ + unsigned long hcdp; /* HCDP table */ + unsigned long uga; /* UGA table */ +}; + +extern struct efi_tables efi_tables; + static inline int efi_guidcmp (efi_guid_t left, efi_guid_t right) { --- /dev/null +++ b/arch/x86/kernel/efi_tables.c @@ -0,0 +1,144 @@ +/* + * EFI tables parsing functions + * + * Copyright (C) 2007 Intel Co. + * Huang Ying [EMAIL PROTECTED] + * + * This file is released under the GPLv2. 
+ */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/efi.h> +#include <linux/io.h> + +#include <asm/setup.h> +#include <asm/efi.h> + +struct efi_tables efi_tables; +EXPORT_SYMBOL_GPL(efi_tables); + +#define EFI_TABLE_PARSE(bt)\ +static void __init efi_tables_parse ## bt(void
[PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
This patch adds basic runtime services support for EFI x86_64 system. The main file of the patch is the addition of efi.c for x86_64. This file is modeled after the EFI IA32 avatar. EFI runtime services initialization are implemented in efi.c. Some x86_64 specifics are worth noting here. On x86_64, parameters passed to UEFI firmware services need to follow the UEFI calling convention. For this purpose, a set of functions named lin2winx (x is the number of parameters) are implemented. EFI function calls are wrapped before calling the firmware service. Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED] Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/Makefile_64 |1 arch/x86/kernel/efi_64.c | 593 ++ arch/x86/kernel/efi_callwrap_64.S | 69 arch/x86/kernel/setup_64.c| 15 arch/x86_64/Kconfig | 11 include/asm-x86/bootparam.h |5 include/asm-x86/efi_64.h |8 include/asm-x86/eficallwrap_64.h | 33 ++ include/asm-x86/fixmap_64.h |3 9 files changed, 735 insertions(+), 3 deletions(-) Index: linux-2.6.24-rc1/include/asm-x86/eficallwrap_64.h === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.24-rc1/include/asm-x86/eficallwrap_64.h 2007-10-25 13:58:18.0 +0800 @@ -0,0 +1,33 @@ +/* + * Copyright (C) 2007 Intel Corp + * Bibo Mao [EMAIL PROTECTED] + * Huang Ying [EMAIL PROTECTED] + * + * Function calling ABI conversion from SYSV to Windows for x86_64 + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + */ + +#ifndef __ASM_X86_64_EFICALLWRAP_H +#define __ASM_X86_64_EFICALLWRAP_H + +extern efi_status_t lin2win0(void *fp); +extern efi_status_t lin2win1(void *fp, u64 arg1); +extern efi_status_t lin2win2(void *fp, u64 arg1, u64 arg2); +extern efi_status_t lin2win3(void *fp, u64 arg1, u64 arg2, u64 arg3); +extern efi_status_t lin2win4(void *fp, u64 arg1, u64 arg2, u64 arg3, u64 arg4); +extern efi_status_t lin2win5(void *fp, u64 arg1, u64 arg2, u64 arg3, +u64 arg4, u64 arg5); +extern efi_status_t lin2win6(void *fp, u64 arg1, u64 arg2, u64 arg3, +u64 arg4, u64 arg5, u64 arg6); + +#endif Index: linux-2.6.24-rc1/arch/x86/kernel/efi_64.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.24-rc1/arch/x86/kernel/efi_64.c 2007-10-25 14:51:41.0 +0800 @@ -0,0 +1,593 @@ +/* + * Extensible Firmware Interface + * + * Based on Extensible Firmware Interface Specification version 1.0 + * + * Copyright (C) 1999 VA Linux Systems + * Copyright (C) 1999 Walt Drummond [EMAIL PROTECTED] + * Copyright (C) 1999-2002 Hewlett-Packard Co. + * David Mosberger-Tang [EMAIL PROTECTED] + * Stephane Eranian [EMAIL PROTECTED] + * Copyright (C) 2005-2008 Intel Co. + * Fenghua Yu [EMAIL PROTECTED] + * Bibo Mao [EMAIL PROTECTED] + * Chandramouli Narayanan [EMAIL PROTECTED] + * Huang Ying [EMAIL PROTECTED] + * + * Code to convert EFI to E820 map has been implemented in elilo bootloader + * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table + * is setup appropriately for EFI runtime code. + * - mouli 06/14/2007. + * + * All EFI Runtime Services are not implemented yet as EFI only + * supports physical mode addressing on SoftSDV. This is to be fixed + * in a future version. --drummond 1999-07-20 + * + * Implemented EFI runtime services and virtual mode calls. --davidm + * + * Goutham Rao: [EMAIL PROTECTED] + * Skip non-WB memory and ignore empty memory ranges. 
+ */ + +#include <linux/kernel.h> +#include <linux/init.h> +#include <linux/mm.h> +#include <linux/types.h> +#include <linux/time.h> +#include <linux/spinlock.h> +#include <linux/bootmem.h> +#include <linux/ioport.h> +#include <linux/module.h> +#include <linux/efi.h> +#include <linux/uaccess.h> +#include <linux/io.h> +#include <linux/reboot.h> + +#include <asm/setup.h> +#include <asm/bootparam.h> +#include <asm/page.h> +#include <asm/e820.h> +#include <asm/pgtable.h> +#include <asm/tlbflush.h> +#include <asm/cacheflush.h> +#include <asm/proto.h> +#include <asm/eficallwrap_64.h> +#include <asm/efi_64.h> +#include <asm/time_64.h> + +int efi_enabled; +EXPORT_SYMBOL(efi_enabled); + +struct efi efi; +EXPORT_SYMBOL(efi); + +struct efi_memory_map memmap; + +struct efi efi_phys __initdata; +static efi_system_table_t efi_systab __initdata
[PATCH 2/3 -v4] x86_64 EFI runtime service support: EFI runtime services
This patch adds support for several EFI runtime services for EFI x86_64 system. The EFI support for emergency_restart and RTC clock is added. The EFI based implementation and legacy BIOS or CMOS based implementation are put in separate functions and can be chosen with kernel boot options. Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED] Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/reboot_64.c | 19 +- arch/x86/kernel/time_64.c | 48 include/asm-x86/emergency-restart.h |8 ++ include/asm-x86/time_64.h |7 + 4 files changed, 60 insertions(+), 22 deletions(-) Index: linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c === --- linux-2.6.24-rc1.orig/arch/x86/kernel/reboot_64.c 2007-10-25 11:25:29.0 +0800 +++ linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c2007-10-25 11:25:38.0 +0800 @@ -9,6 +9,7 @@ #include linux/pm.h #include linux/kdebug.h #include linux/sched.h +#include linux/efi.h #include asm/io.h #include asm/delay.h #include asm/desc.h @@ -26,18 +27,16 @@ EXPORT_SYMBOL(pm_power_off); static long no_idt[3]; -static enum { - BOOT_TRIPLE = 't', - BOOT_KBD = 'k' -} reboot_type = BOOT_KBD; +enum reboot_type reboot_type = BOOT_KBD; static int reboot_mode = 0; int reboot_force; -/* reboot=t[riple] | k[bd] [, [w]arm | [c]old] +/* reboot=t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] warm Don't set the cold reboot flag cold Set the cold reboot flag triple Force a triple fault (init) kbdUse the keyboard controller. cold reset (default) + efiUse efi reset_system runtime service force Avoid anything that could hang. */ static int __init reboot_setup(char *str) @@ -55,6 +54,7 @@ case 't': case 'b': case 'k': + case 'e': reboot_type = *str; break; case 'f': @@ -142,7 +142,14 @@ reboot_type = BOOT_KBD; break; - } + + case BOOT_EFI: + if (efi_enabled) + efi.reset_system(reboot_mode ? 
EFI_RESET_WARM : EFI_RESET_COLD, +EFI_SUCCESS, 0, NULL); + reboot_type = BOOT_KBD; + break; + } } } Index: linux-2.6.24-rc1/arch/x86/kernel/time_64.c === --- linux-2.6.24-rc1.orig/arch/x86/kernel/time_64.c 2007-10-25 11:25:29.0 +0800 +++ linux-2.6.24-rc1/arch/x86/kernel/time_64.c 2007-10-25 11:25:38.0 +0800 @@ -25,6 +25,7 @@ #include linux/notifier.h #include linux/cpu.h #include linux/kallsyms.h +#include linux/efi.h #include linux/acpi.h #include linux/clockchips.h @@ -45,12 +46,19 @@ #include asm/mpspec.h #include asm/nmi.h #include asm/vgtod.h +#include asm/time_64.h DEFINE_SPINLOCK(rtc_lock); EXPORT_SYMBOL(rtc_lock); volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES; +static int set_rtc_mmss(unsigned long nowtime); +static unsigned long read_cmos_clock(void); + +unsigned long (*get_wallclock)(void) = read_cmos_clock; +int (*set_wallclock)(unsigned long nowtime) = set_rtc_mmss; + unsigned long profile_pc(struct pt_regs *regs) { unsigned long pc = instruction_pointer(regs); @@ -84,13 +92,6 @@ unsigned char control, freq_select; /* - * IRQs are disabled when we're called from the timer interrupt, - * no need for spin_lock_irqsave() - */ - - spin_lock(rtc_lock); - -/* * Tell the clock it's being set and stop it. 
*/ @@ -138,14 +139,23 @@ CMOS_WRITE(control, RTC_CONTROL); CMOS_WRITE(freq_select, RTC_FREQ_SELECT); - spin_unlock(rtc_lock); - return retval; } int update_persistent_clock(struct timespec now) { - return set_rtc_mmss(now.tv_sec); + int retval; + +/* + * IRQs are disabled when we're called from the timer interrupt, + * no need for spin_lock_irqsave() + */ + + spin_lock(rtc_lock); + retval = set_wallclock(now.tv_sec); + spin_unlock(rtc_lock); + + return retval; } static irqreturn_t timer_event_interrupt(int irq, void *dev_id) @@ -157,14 +167,11 @@ return IRQ_HANDLED; } -unsigned long read_persistent_clock(void) +static unsigned long read_cmos_clock(void) { unsigned int year, mon, day, hour, min, sec; - unsigned long flags; unsigned century = 0; - spin_lock_irqsave(rtc_lock, flags); - do { sec = CMOS_READ(RTC_SECONDS); min = CMOS_READ(RTC_MINUTES); @@ -179,8 +186,6 @@ #endif } while (sec != CMOS_READ(RTC_SECONDS)); - spin_unlock_irqrestore(rtc_lock, flags
[PATCH 3/3 -v4] x86_64 EFI runtime service support: document for EFI runtime services
This patch adds document for EFI x86_64 runtime services support. --- boot-options.txt | 12 +++- uefi.txt | 10 ++ 2 files changed, 21 insertions(+), 1 deletion(-) Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED] Signed-off-by: Huang Ying [EMAIL PROTECTED] Index: linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt === --- linux-2.6.24-rc1.orig/Documentation/x86_64/boot-options.txt 2007-10-25 13:58:14.0 +0800 +++ linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt 2007-10-25 13:58:18.0 +0800 @@ -110,12 +110,15 @@ Rebooting - reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old] + reboot=b[ios] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] bios Use the CPU reboot vector for warm reset warm Don't set the cold reboot flag cold Set the cold reboot flag triple Force a triple fault (init) kbdUse the keyboard controller. cold reset (default) + efiUse efi reset_system runtime service. If EFI is not configured or the + EFI reset does not work, the reboot path attempts the reset using + the keyboard controller. Using warm reset will be much faster especially on big memory systems because the BIOS will not go through the memory check. @@ -300,4 +303,11 @@ newfallback: use new unwinder but fall back to old if it gets stuck (default) +EFI + + noefiDisable EFI support + + noefi_time Disable EFI time runtime service, programming CMOS + hardware directly + Miscellaneous Index: linux-2.6.24-rc1/Documentation/x86_64/uefi.txt === --- linux-2.6.24-rc1.orig/Documentation/x86_64/uefi.txt 2007-10-25 13:58:18.0 +0800 +++ linux-2.6.24-rc1/Documentation/x86_64/uefi.txt 2007-10-25 13:58:18.0 +0800 @@ -19,6 +19,10 @@ - Build the kernel with the following configuration. CONFIG_FB_EFI=y CONFIG_FRAMEBUFFER_CONSOLE=y + If EFI runtime services are expected, the following configuration should + be selected. 
+ CONFIG_EFI=y + CONFIG_EFI_VARS=y or m # optional - Create a VFAT partition on the disk - Copy the following to the VFAT partition: elilo bootloader with x86_64 support, elilo configuration file, @@ -27,3 +31,9 @@ can be found in the elilo sourceforge project. - Boot to EFI shell and invoke elilo choosing the kernel image built in first step. +- If some or all EFI runtime services don't work, you can try the following + kernel command line parameters to turn off some or all EFI runtime + services. + noefi turn off all EFI runtime services + noefi_time turn off EFI time runtime service + reboot_type=k turn off EFI reboot runtime service
[PATCH 0/3 -v4] x86_64 EFI runtime service support
The following set of patches adds EFI/UEFI (Unified Extensible Firmware Interface) runtime services support to the x86_64 architecture. The patches have been tested against the 2.6.24-rc1 kernel on Intel platforms with EFI 1.10 and UEFI 2.0 firmware. v4: - EFI boot parameters are extended for 64-bit EFI in a 32-bit EFI compatible way. - Add EFI runtime services document. v3: - Remove E820_RUNTIME_CODE; the EFI memory map is used to deal with the EFI runtime code area. - The method used to make the EFI runtime code area executable is changed: a. Before page allocation is usable, the PMD of the direct mapping is changed temporarily before and after each EFI call. b. After page allocation is usable, change_page_attr_addr is used to change the corresponding page attribute. - Use fixmap to map the EFI memory mapped IO area to make kexec workable. - Add a kernel command line option noefi to make it possible to turn off EFI runtime services support. - Function pointers are used for the EFI time runtime service. - The EFI reboot runtime service is embedded into the framework of reboot_type. - A kernel command line option noefi_time is added to make it possible to fall back to the CMOS based implementation. v2: - The EFI callwrapper is re-implemented in assembler. Best Regards, Huang Ying
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Thu, 2007-10-25 at 18:09 +0200, Thomas Gleixner wrote:
> > EFI runtime services initialization is implemented in efi.c. Some
> > x86_64 specifics are worth noting here. On x86_64, parameters passed
> > to UEFI firmware services need to follow the UEFI calling convention.
> > For this purpose, a set of functions named lin2winx (x is the number
> > of parameters) is implemented. EFI function calls are wrapped before
> > calling the firmware service.
>
> Why does this need to be called lin2win? We do not call Windows, we
> call EFI services, so please use a naming convention which is related
> to the functionality of the code.
>
> > + * Function calling ABI conversion from SYSV to Windows for x86_64
>
> Again, these are wrappers to access EFI and not Windows.

EFI uses the Windows x86_64 calling convention. The lin2win may be a more
general naming convention that can be used for some other code (the
NDISwrapper?) in the future. Do you agree?

Best Regards,
Huang Ying
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Thu, 2007-10-25 at 11:01 -0600, Eric W. Biederman wrote:
> > +static efi_status_t __init phys_efi_set_virtual_address_map(
> > +	unsigned long memory_map_size,
> > +	unsigned long descriptor_size,
> > +	u32 descriptor_version,
> > +	efi_memory_desc_t *virtual_map)
> > +{
> > +	efi_status_t status;
> > +
> > +	efi_call_phys_prelog();
> > +	status = lin2win4((void *)efi_phys.set_virtual_address_map,
> > +			  (u64)memory_map_size, (u64)descriptor_size,
> > +			  (u64)descriptor_version, (u64)virtual_map);
> > +	efi_call_phys_epilog();
> > +	return status;
> > +}
>
> So you still have this piece of code which makes a kernel using EFI not
> compatible with kexec. But you are still supporting a physical call
> mode for EFI. If you are going to do this, can we please just remove
> the hacks that make the EFI physical call mode early-boot only and just
> always use that mode. Depending on weird call-once functions like
> efi_set_virtual_address_map makes me very uncomfortable.

The kexec issue is solved in the same way as on IA-64. The EFI runtime
code and data memory areas are mapped with the identity mapping, so they
will have the same virtual addresses across kexec. The memory mapped I/O
area used by EFI runtime services is mapped with fixmap, so it will have
the same virtual address across kexec too. And the
efi_set_virtual_address_map runtime service function will be skipped in
the kexeced kernel (set to a nop by /sbin/kexec). So the kexeced kernel
can use the virtual mode of EFI from kernel bootstrap on.

Best Regards,
Huang Ying
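The "call once" concern can be made concrete with a toy model (not the kernel code): the UEFI spec allows SetVirtualAddressMap exactly once per boot, so a kexec'ed kernel cannot repeat it; with identity mapping the second kernel can simply skip the call because the "virtual" addresses it would install equal the physical ones. The names and status values below are illustrative.

```c
#include <stdint.h>

#define EFI_SUCCESS     0UL
#define EFI_UNSUPPORTED 3UL   /* illustrative status value */

struct toy_firmware {
    uintptr_t runtime_code;   /* address the firmware jumps through */
    int       virtual_mode;   /* set once, never cleared */
};

/* Models EFI's SetVirtualAddressMap(): legal exactly once. */
unsigned long toy_set_virtual_address_map(struct toy_firmware *fw,
                                          uintptr_t phys_to_virt_offset)
{
    if (fw->virtual_mode)
        return EFI_UNSUPPORTED;              /* a second call is rejected */
    fw->runtime_code += phys_to_virt_offset; /* relocate firmware pointers */
    fw->virtual_mode = 1;
    return EFI_SUCCESS;
}
```

With an offset of 0 (identity mapping), skipping the call in the kexec'ed kernel leaves every pointer valid, which is the trick described in the reply above.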
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Thu, 2007-10-25 at 11:06 -0600, Eric W. Biederman wrote:
> Arjan van de Ven [EMAIL PROTECTED] writes:
> > On Thu, 25 Oct 2007 10:55:44 -0600 [EMAIL PROTECTED] (Eric W.
> > Biederman) wrote:
> > > I don't think there is a compelling case for us to use any efi
> > > services at this time
> >
> > I would almost agree with this if it wasn't for the 1 call that OS
> > installers need to tell EFI about bootloader stuff; I've cc'd Peter
> > Jones since he'll know better what OS installers need; if they don't
> > need it after all...
>
> Yes. I think that is usage of the variable service. Although I don't
> know if that is actually needed. Support for the variable service is
> not implemented in this patchset.

Support for the variable service has been implemented in this patchset.
The interfaces are:

	efi.get_variable
	efi.get_next_variable
	efi.set_variable

And a sysfs interface (/sys/firmware/efi/vars) is provided in
drivers/firmware/efivars.c, which depends on this patchset to work.

Best Regards,
Huang Ying
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Thu, 2007-10-25 at 11:30 -0600, Eric W. Biederman wrote:
> H. Peter Anvin [EMAIL PROTECTED] writes:
> > Andi Kleen wrote:
> > > > Especially for accessing the real time clock, which has a well
> > > > defined hardware interface, going through EFI and an additional
> > > > software emulation layer looks like asking for trouble.
> > > I agree it's pointless for the hardware clock, but EFI also offers
> > > services to write some data to the CMOS RAM, which could be very
> > > useful to save oops data over reboot. I don't think this can be
> > > done safely otherwise without BIOS cooperation.
> > The ability to scurry away even a small amount of data without
> > relying on the disk system is highly desirable. Think next-boot type
> > information.
>
> Yes. If that were to be the justifying case, and if that was what the
> code was implementing, I could see the point. However, this point was
> already made in an earlier review, and still this patchset doesn't
> include that functionality; it still includes the code to disable
> direct hardware access for no seemingly sane reason.

The EFI variable runtime service is included in this patchset. The EFI
time runtime service is selectable via a kernel command line parameter
now. If desired, it can be disabled by default and only enabled when
specified on the kernel command line. I think the time runtime service
may be useful if the underlying hardware is changed silently by some
vendor.

Best Regards,
Huang Ying
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Thu, 2007-10-25 at 13:36 -0700, H. Peter Anvin wrote:
> Eric W. Biederman wrote:
> > Ying claimed that GOP requires EFI runtime services. Is that not
> > true? None of the EFI framebuffer patches that I saw used EFI runtime
> > services.
>
> Ying, could you please clarify this situation?
>
> (Eric: do note that there are two EFI framebuffer standards, UGA and
> GOP. Apparently UGA is obsolete and we have always been at war with GOP
> at the moment.)

The EFI framebuffer doesn't depend on EFI runtime services. It only
depends on the kernel boot parameters (screen_info).

Best Regards,
Huang Ying
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Thu, 2007-10-25 at 15:29 -0700, H. Peter Anvin wrote:
> Eric W. Biederman wrote:
> > H. Peter Anvin [EMAIL PROTECTED] writes:
> > > Eric W. Biederman wrote:
> > > > Ying claimed that GOP requires EFI runtime services. Is that not
> > > > true? None of the EFI framebuffer patches that I saw used EFI
> > > > runtime services.
> > > Ying, could you please clarify this situation?
> > > (Eric: do note that there are two EFI framebuffer standards, UGA
> > > and GOP. Apparently UGA is obsolete and we have always been at war
> > > with GOP at the moment.)
> > Peter, please look back in your email archives to yesterday and see
> > Ying's patch:
> > [PATCH 1/2 -v2 resend] x86_64 EFI boot support: EFI frame buffer driver
> > All of the data the GOP needs is acquired through a query made by the
> > bootloader and passed through screen_info.
>
> Then I fully agree with your assessment.

The EFI framebuffer doesn't depend on EFI runtime services. But the EFI
variable service depends on the EFI runtime services, and most people
think it is useful. It can be used to:

- Provide a standard method to communicate with the BIOS, such as
  specifying the boot device or bootloader for the next boot.
- Provide a standard method to write the OOPS information to flash.

To improve the reliability of OOPS information writing, the virtual mode
of EFI should be used. And by mapping all memory areas used by EFI to the
same virtual addresses across kexec, EFI can work with kexec under
virtual mode just like on IA-64. So, I think the EFI runtime service is
useful and it does not break anything. But the code duplication between
efi_32.c and efi_64.c should be eliminated, and I will work on this.

Best Regards,
Huang Ying
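The variable-service use cases above (next-boot information, OOPS records) boil down to named key/value storage in firmware NVRAM. A hedged sketch of the GetVariable/SetVariable semantics follows; the names, sizes, and return codes are invented for illustration and are not the EFI API.

```c
#include <string.h>

#define TOY_VARS 4

struct toy_var {
    char name[16];
    char data[32];
    int  used;
};

static struct toy_var toy_store[TOY_VARS];

/* Create or overwrite a variable; returns 0 on success, -1 if the store is full. */
int toy_set_variable(const char *name, const char *data)
{
    int i, free_slot = -1;

    for (i = 0; i < TOY_VARS; i++) {
        if (toy_store[i].used && strcmp(toy_store[i].name, name) == 0)
            break;                       /* overwrite an existing entry */
        if (!toy_store[i].used && free_slot < 0)
            free_slot = i;
    }
    if (i == TOY_VARS) {
        if (free_slot < 0)
            return -1;                   /* store full */
        i = free_slot;
    }
    strncpy(toy_store[i].name, name, sizeof(toy_store[i].name) - 1);
    strncpy(toy_store[i].data, data, sizeof(toy_store[i].data) - 1);
    toy_store[i].used = 1;
    return 0;
}

/* Look a variable up by name; returns 0 and copies the data out, or -1. */
int toy_get_variable(const char *name, char *out, size_t out_len)
{
    int i;

    for (i = 0; i < TOY_VARS; i++) {
        if (toy_store[i].used && strcmp(toy_store[i].name, name) == 0) {
            strncpy(out, toy_store[i].data, out_len - 1);
            out[out_len - 1] = '\0';
            return 0;
        }
    }
    return -1;
}
```

The real service additionally scopes every name with a vendor GUID and attribute flags; this sketch only shows the lookup-by-name shape that efivars exposes through sysfs.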
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Thu, 2007-10-25 at 18:09 +0200, Thomas Gleixner wrote:
> On Thu, 25 Oct 2007, Huang, Ying wrote:
> > This patch adds basic runtime services support for EFI x86_64
> > systems. The main file of the patch is the addition of efi.c for
> > x86_64. This file is modeled after the EFI IA32 avatar.
>
> modeled means copied and modified, right? This is wrong. I compared
> efi_32.c and efi_64.c and a large amount of the code is simply the
> same. The small details can be sorted out by two sets of macros/inline
> functions easily. Please fix this up.

Yes. There is a lot of duplicated code between efi_32.c and efi_64.c, and
it should be merged. But some code is different between efi_32.c and
efi_64.c. For example, there are different implementations of
efi_call_phys_prelog in the two files, and there is an implementation of
efi_memmap_walk only in efi_32.c, not in efi_64.c. Three possible schemes
are as follows:

- One efi.c, with EFI 32/64 specific code inside the corresponding
  #ifdef/#endif.
- Three files: efi.c, efi_32.c, efi_64.c. Common code goes in efi.c, and
  EFI 32/64 specific code goes in efi_32/64.c. This will make some
  variables and functions external instead of static.
- Three files: efi.c, efi_32.c, efi_64.c. Common code goes in efi.c, EFI
  32/64 specific code goes in efi_32/64.c, and efi.c includes
  efi_32/64.c according to the architecture.

Which one is preferred? Or should I take another scheme?

Best Regards,
Huang Ying
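The first scheme above can be sketched in a few lines: the 32/64-bit differences sit under #ifdef while the common code stays shared and calls an arch hook. CONFIG_X86_64 stands in for the real kernel config symbol, and the function bodies are stand-ins, not kernel code.

```c
/* Sketch of scheme 1: one file, arch differences under #ifdef. */
#ifdef CONFIG_X86_64
static const char *efi_arch_variant(void) { return "efi_64"; }
#else
static const char *efi_arch_variant(void) { return "efi_32"; }
#endif

/* Common code: identical for both architectures, calls the arch hook. */
const char *efi_describe(void)
{
    return efi_arch_variant();
}
```

Schemes 2 and 3 would move the two #ifdef halves into efi_32.c and efi_64.c; scheme 2 then needs the hook to be external, while scheme 3 keeps it static by textually including the arch file from efi.c.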
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Fri, 2007-10-26 at 10:48 +0200, Thomas Gleixner wrote:
> > EFI uses the Windows x86_64 calling convention. The lin2win may be a
> > more general naming convention that can be used for some other code
> > (the NDISwrapper?) in the future. Do you agree?
>
> I agree not at all. I do not care whether the EFI creators smoked the
> Windows-crackpipe or some other hallucinogen when they decided to use
> this calling convention. We definitely do not want to think about
> NDISwrapper or any other Windows related hackery in the kernel.

OK, I will change the name to something like lin2efi.

> I still do not understand why we need all this EFI hackery at all,
> aside of the possible usage for saving a crash dump on FLASH, which we
> could do directly from the kernel as well.

Asking every user to set up a crash dump environment is a bit difficult,
because some configuration, such as reserving memory and loading the
crash dump kernel, must be done. But saving OOPS information in FLASH via
the EFI variable runtime service is quite simple, with no configuration
required. That is, there could be more bug reports with OOPS information.
I think this is useful.

Best Regards,
Huang Ying
Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support
On Fri, 2007-10-26 at 12:31 +0100, Alan Cox wrote:
> On Fri, 26 Oct 2007 09:03:11 +0800 Huang, Ying [EMAIL PROTECTED] wrote:
> > On Thu, 2007-10-25 at 18:09 +0200, Thomas Gleixner wrote:
> > > > EFI runtime services initialization is implemented in efi.c. Some
> > > > x86_64 specifics are worth noting here. On x86_64, parameters
> > > > passed to UEFI firmware services need to follow the UEFI calling
> > > > convention. For this purpose, a set of functions named lin2winx
> > > > (x is the number of parameters) is implemented. EFI function
> > > > calls are wrapped before calling the firmware service.
> > > Why does this need to be called lin2win? We do not call Windows, we
> > > call EFI services, so please use a naming convention which is
> > > related to the functionality of the code.
> > > > + * Function calling ABI conversion from SYSV to Windows for x86_64
> > > Again, these are wrappers to access EFI and not Windows.
> > EFI uses the Windows x86_64 calling convention. The lin2win may be a
> > more general naming convention that can be used for some other code
> > (the NDISwrapper?) in the future. Do you agree?
>
> The SYSV description is wrong as well. SYSV has no calling convention.
> I think you mean iABI or iBCS2?

The SYSV description comes from the following document:
http://www.x86-64.org/documentation/abi-0.98.pdf

> Whats wrong with following the pattern of other calls like syscall(...)
> and just having eficall()?

Yes. This is better.

Best Regards,
Huang Ying
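Whatever the family ends up being called (lin2winx, efi_callx, eficall), its shape is one wrapper per arity, each taking a service pointer plus that many arguments. In the real patch these are assembly stubs that also convert the SysV AMD64 calling convention to the EFI (Microsoft x86_64) one; plain C can only show the cast-and-call shape, so the sketch below is illustrative only, with an invented stand-in service.

```c
typedef unsigned long efi_status;

typedef efi_status (*efi_fn1)(unsigned long);
typedef efi_status (*efi_fn2)(unsigned long, unsigned long);

/* One wrapper per arity, mirroring efi_call1..efi_call6 in the patch. */
efi_status efi_call1(efi_fn1 fn, unsigned long a1)
{
    return fn(a1);
}

efi_status efi_call2(efi_fn2 fn, unsigned long a1, unsigned long a2)
{
    return fn(a1, a2);
}

/* Stand-in firmware service used only for demonstration. */
efi_status toy_service_add(unsigned long a, unsigned long b)
{
    return a + b;
}
```

The real conversion moves the first arguments from rdi/rsi/rdx/rcx into rcx/rdx/r8/r9 and reserves 32 bytes of shadow stack space, which is why the kernel versions are written in assembler rather than C.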
[PATCH 4/4 -v5] x86_64 EFI runtime service support: remove duplicated code from efi_32.c
This patch removes the duplicated code between efi_32.c and efi.c. --- arch/x86/kernel/Makefile_32 |2 arch/x86/kernel/e820_32.c |5 arch/x86/kernel/efi_32.c| 430 arch/x86/kernel/setup_32.c | 11 - include/asm-x86/efi.h | 37 +++ 5 files changed, 42 insertions(+), 443 deletions(-) Signed-off-by: Huang Ying [EMAIL PROTECTED] Index: linux-2.6.24-rc1/arch/x86/kernel/Makefile_32 === --- linux-2.6.24-rc1.orig/arch/x86/kernel/Makefile_32 2007-10-30 11:05:57.0 +0800 +++ linux-2.6.24-rc1/arch/x86/kernel/Makefile_322007-10-30 11:10:33.0 +0800 @@ -34,7 +34,7 @@ obj-$(CONFIG_MODULES) += module_32.o obj-y += sysenter_32.o vsyscall_32.o obj-$(CONFIG_ACPI_SRAT)+= srat_32.o -obj-$(CONFIG_EFI) += efi_32.o efi_stub_32.o +obj-$(CONFIG_EFI) += efi.o efi_32.o efi_stub_32.o obj-$(CONFIG_DOUBLEFAULT) += doublefault_32.o obj-$(CONFIG_VM86) += vm86_32.o obj-$(CONFIG_EARLY_PRINTK) += early_printk.o Index: linux-2.6.24-rc1/arch/x86/kernel/efi_32.c === --- linux-2.6.24-rc1.orig/arch/x86/kernel/efi_32.c 2007-10-30 11:05:57.0 +0800 +++ linux-2.6.24-rc1/arch/x86/kernel/efi_32.c 2007-10-30 11:10:33.0 +0800 @@ -39,21 +39,8 @@ #include asm/desc.h #include asm/tlbflush.h -#define EFI_DEBUG 0 #define PFXEFI: -extern efi_status_t asmlinkage efi_call_phys(void *, ...); - -struct efi efi; -EXPORT_SYMBOL(efi); -static struct efi efi_phys; -struct efi_memory_map memmap; - -/* - * We require an early boot_ioremap mapping mechanism initially - */ -extern void * boot_ioremap(unsigned long, unsigned long); - /* * To make EFI call EFI runtime service in physical addressing mode we need * prelog/epilog before/after the invocation to disable interrupt, to @@ -65,7 +52,7 @@ static DEFINE_SPINLOCK(efi_rt_lock); static pgd_t efi_bak_pg_dir_pointer[2]; -static void efi_call_phys_prelog(void) __acquires(efi_rt_lock) +void efi_call_phys_prelog(void) __acquires(efi_rt_lock) { unsigned long cr4; unsigned long temp; @@ -108,7 +95,7 @@ load_gdt(gdt_descr); } -static void efi_call_phys_epilog(void) __releases(efi_rt_lock) 
+void efi_call_phys_epilog(void) __releases(efi_rt_lock) { unsigned long cr4; struct Xgt_desc_struct gdt_descr; @@ -138,87 +125,6 @@ spin_unlock(efi_rt_lock); } -static efi_status_t -phys_efi_set_virtual_address_map(unsigned long memory_map_size, -unsigned long descriptor_size, -u32 descriptor_version, -efi_memory_desc_t *virtual_map) -{ - efi_status_t status; - - efi_call_phys_prelog(); - status = efi_call_phys(efi_phys.set_virtual_address_map, -memory_map_size, descriptor_size, -descriptor_version, virtual_map); - efi_call_phys_epilog(); - return status; -} - -static efi_status_t -phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc) -{ - efi_status_t status; - - efi_call_phys_prelog(); - status = efi_call_phys(efi_phys.get_time, tm, tc); - efi_call_phys_epilog(); - return status; -} - -inline int efi_set_rtc_mmss(unsigned long nowtime) -{ - int real_seconds, real_minutes; - efi_status_tstatus; - efi_time_t eft; - efi_time_cap_t cap; - - spin_lock(efi_rt_lock); - status = efi.get_time(eft, cap); - spin_unlock(efi_rt_lock); - if (status != EFI_SUCCESS) - panic(Ooops, efitime: can't read time!\n); - real_seconds = nowtime % 60; - real_minutes = nowtime / 60; - - if (((abs(real_minutes - eft.minute) + 15)/30) 1) - real_minutes += 30; - real_minutes %= 60; - - eft.minute = real_minutes; - eft.second = real_seconds; - - if (status != EFI_SUCCESS) { - printk(Ooops: efitime: can't read time!\n); - return -1; - } - return 0; -} -/* - * This is used during kernel init before runtime - * services have been remapped and also during suspend, therefore, - * we'll need to call both in physical and virtual modes. 
- */ -inline unsigned long efi_get_time(void) -{ - efi_status_t status; - efi_time_t eft; - efi_time_cap_t cap; - - if (efi.get_time) { - /* if we are in virtual mode use remapped function */ - status = efi.get_time(eft, cap); - } else { - /* we are in physical mode */ - status = phys_efi_get_time(eft, cap); - } - - if (status != EFI_SUCCESS) - printk(Oops: efitime: can't read time status: 0x%lx\n,status
[PATCH 3/4 -v5] x86_64 EFI runtime service support: document for EFI runtime services
This patch adds a document for EFI x86_64 runtime services support.

---
 boot-options.txt |   11 ++-
 uefi.txt         |    9 +
 2 files changed, 19 insertions(+), 1 deletion(-)

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

Index: linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt
===================================================================
--- linux-2.6.24-rc1.orig/Documentation/x86_64/boot-options.txt	2007-10-30 10:15:00.0 +0800
+++ linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt	2007-10-30 10:23:52.0 +0800
@@ -110,12 +110,15 @@
 Rebooting
 
-   reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old]
+   reboot=b[ios] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
    bios	  Use the CPU reboot vector for warm reset
    warm   Don't set the cold reboot flag
    cold   Set the cold reboot flag
    triple Force a triple fault (init)
    kbd    Use the keyboard controller. cold reset (default)
+   efi    Use efi reset_system runtime service. If EFI is not configured or the
+          EFI reset does not work, the reboot path attempts the reset using
+          the keyboard controller.
 
    Using warm reset will be much faster especially on big memory
    systems because the BIOS will not go through the memory check.
@@ -300,4 +303,10 @@
    newfallback: use new unwinder but fall back to old if it gets stuck (default)
 
+EFI
+
+  noefi        Disable EFI support
+
+  efi_time=on  Enable EFI time runtime service
+
 Miscellaneous

Index: linux-2.6.24-rc1/Documentation/x86_64/uefi.txt
===================================================================
--- linux-2.6.24-rc1.orig/Documentation/x86_64/uefi.txt	2007-10-30 10:15:00.0 +0800
+++ linux-2.6.24-rc1/Documentation/x86_64/uefi.txt	2007-10-30 10:25:39.0 +0800
@@ -19,6 +19,10 @@
 - Build the kernel with the following configuration.
	CONFIG_FB_EFI=y
	CONFIG_FRAMEBUFFER_CONSOLE=y
+  If EFI runtime services are expected, the following configuration should
+  be selected.
+	CONFIG_EFI=y
+	CONFIG_EFI_VARS=y or m	# optional
 - Create a VFAT partition on the disk
 - Copy the following to the VFAT partition:
	elilo bootloader with x86_64 support, elilo configuration file,
@@ -27,3 +31,8 @@
   can be found in the elilo sourceforge project.
 - Boot to EFI shell and invoke elilo choosing the kernel image built
   in first step.
+- If some or all EFI runtime services don't work, you can try following
+  kernel command line parameters to turn off some or all EFI runtime
+  services.
+	noefi		turn off all EFI runtime services
+	reboot_type=k	turn off EFI reboot runtime service
[PATCH 2/4 -v5] x86_64 EFI runtime service support: EFI runtime services
This patch adds support for several EFI runtime services for EFI x86_64 system. The EFI support for emergency_restart and RTC clock is added. The EFI based implementation and legacy BIOS or CMOS based implementation are put in separate functions and can be chosen with kernel boot options. Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED] Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/reboot_64.c | 19 +- arch/x86/kernel/time_64.c | 47 +++- include/asm-x86/emergency-restart.h |8 ++ include/asm-x86/time.h | 47 +++- include/asm-x86/time_32.h | 44 + include/asm-x86/time_64.h |7 + 6 files changed, 107 insertions(+), 65 deletions(-) Index: linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c === --- linux-2.6.24-rc1.orig/arch/x86/kernel/reboot_64.c 2007-10-30 10:15:03.0 +0800 +++ linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c2007-10-30 10:22:00.0 +0800 @@ -9,6 +9,7 @@ #include linux/pm.h #include linux/kdebug.h #include linux/sched.h +#include linux/efi.h #include asm/io.h #include asm/delay.h #include asm/desc.h @@ -26,18 +27,16 @@ EXPORT_SYMBOL(pm_power_off); static long no_idt[3]; -static enum { - BOOT_TRIPLE = 't', - BOOT_KBD = 'k' -} reboot_type = BOOT_KBD; +enum reboot_type reboot_type = BOOT_KBD; static int reboot_mode = 0; int reboot_force; -/* reboot=t[riple] | k[bd] [, [w]arm | [c]old] +/* reboot=t[riple] | k[bd] | e[fi] [, [w]arm | [c]old] warm Don't set the cold reboot flag cold Set the cold reboot flag triple Force a triple fault (init) kbdUse the keyboard controller. cold reset (default) + efiUse efi reset_system runtime service force Avoid anything that could hang. */ static int __init reboot_setup(char *str) @@ -55,6 +54,7 @@ case 't': case 'b': case 'k': + case 'e': reboot_type = *str; break; case 'f': @@ -142,7 +142,14 @@ reboot_type = BOOT_KBD; break; - } + + case BOOT_EFI: + if (efi_enabled) + efi.reset_system(reboot_mode ? 
EFI_RESET_WARM : EFI_RESET_COLD, +EFI_SUCCESS, 0, NULL); + reboot_type = BOOT_KBD; + break; + } } } Index: linux-2.6.24-rc1/arch/x86/kernel/time_64.c === --- linux-2.6.24-rc1.orig/arch/x86/kernel/time_64.c 2007-10-30 10:15:03.0 +0800 +++ linux-2.6.24-rc1/arch/x86/kernel/time_64.c 2007-10-30 10:22:04.0 +0800 @@ -45,12 +45,19 @@ #include asm/mpspec.h #include asm/nmi.h #include asm/vgtod.h +#include asm/time.h DEFINE_SPINLOCK(rtc_lock); EXPORT_SYMBOL(rtc_lock); volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES; +static int set_rtc_mmss(unsigned long nowtime); +static unsigned long read_cmos_clock(void); + +unsigned long (*get_wallclock)(void) = read_cmos_clock; +int (*set_wallclock)(unsigned long nowtime) = set_rtc_mmss; + unsigned long profile_pc(struct pt_regs *regs) { unsigned long pc = instruction_pointer(regs); @@ -84,13 +91,6 @@ unsigned char control, freq_select; /* - * IRQs are disabled when we're called from the timer interrupt, - * no need for spin_lock_irqsave() - */ - - spin_lock(rtc_lock); - -/* * Tell the clock it's being set and stop it. 
*/ @@ -138,14 +138,23 @@ CMOS_WRITE(control, RTC_CONTROL); CMOS_WRITE(freq_select, RTC_FREQ_SELECT); - spin_unlock(rtc_lock); - return retval; } int update_persistent_clock(struct timespec now) { - return set_rtc_mmss(now.tv_sec); + int retval; + +/* + * IRQs are disabled when we're called from the timer interrupt, + * no need for spin_lock_irqsave() + */ + + spin_lock(rtc_lock); + retval = set_wallclock(now.tv_sec); + spin_unlock(rtc_lock); + + return retval; } static irqreturn_t timer_event_interrupt(int irq, void *dev_id) @@ -157,14 +166,11 @@ return IRQ_HANDLED; } -unsigned long read_persistent_clock(void) +static unsigned long read_cmos_clock(void) { unsigned int year, mon, day, hour, min, sec; - unsigned long flags; unsigned century = 0; - spin_lock_irqsave(rtc_lock, flags); - do { sec = CMOS_READ(RTC_SECONDS); min = CMOS_READ(RTC_MINUTES); @@ -179,8 +185,6 @@ #endif } while (sec != CMOS_READ(RTC_SECONDS)); - spin_unlock_irqrestore(rtc_lock, flags
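The heart of this patch is a function-pointer switch: get_wallclock/set_wallclock default to the CMOS implementations and are repointed to the EFI ones when EFI is enabled. A standalone sketch of that indirection follows; the names mirror the patch, but the bodies returning fixed values are stand-ins for the real hardware/firmware accessors.

```c
int efi_enabled;   /* would be set during EFI detection at boot */

unsigned long read_cmos_clock(void) { return 100; }  /* stand-in */
unsigned long efi_get_time(void)    { return 200; }  /* stand-in */

/* Defaults to the legacy CMOS path, exactly as in the patch. */
unsigned long (*get_wallclock)(void) = read_cmos_clock;

/* Called once at time init, after EFI detection has run. */
void wallclock_init(void)
{
    if (efi_enabled)
        get_wallclock = efi_get_time;
}

/* Generic code never needs to know which backend is active. */
unsigned long read_persistent_clock(void)
{
    return get_wallclock();
}
```

The same pattern covers reboot: reboot_type gains a BOOT_EFI case that tries efi.reset_system and then falls back to BOOT_KBD if EFI is not enabled or the reset returns.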
[PATCH 0/4 -v5] x86_64 EFI runtime service support
The following patchset adds EFI/UEFI (Unified Extensible Firmware Interface) runtime services support to the x86_64 architecture. The patchset has been tested against the 2.6.24-rc1 kernel on Intel platforms with 64-bit EFI 1.10 and UEFI 2.0 firmware. Because the duplicated code between efi_32.c and efi_64.c is removed, the patchset is also tested on an Intel platform with 32-bit EFI firmware.

v5:
- Remove the duplicated code between efi_32.c and efi_64.c.
- Rename lin2winx to efi_callx.
- Make the EFI time runtime service default to off.
- Use different bootloader signatures for EFI32 and EFI64, so that the kernel can know whether the underlying EFI firmware is 64-bit or 32-bit.

v4:
- EFI boot parameters are extended for 64-bit EFI in a 32-bit EFI compatible way.
- Add EFI runtime services document.

v3:
- Remove E820_RUNTIME_CODE; the EFI memory map is used to deal with the EFI runtime code area.
- The method used to make the EFI runtime code area executable is changed:
  a. Before page allocation is usable, the PMD of the direct mapping is changed temporarily before and after each EFI call.
  b. After page allocation is usable, change_page_attr_addr is used to change the corresponding page attribute.
- Use fixmap to map the EFI memory mapped I/O area to make kexec workable.
- Add a kernel command line option "noefi" to make it possible to turn off EFI runtime services support.
- Function pointers are used for the EFI time runtime service.
- The EFI reboot runtime service is embedded into the framework of reboot_type.
- A kernel command line option "noefi_time" is added to make it possible to fall back to the CMOS based implementation.

v2:
- The EFI call wrapper is re-implemented in assembler.

Best Regards,
Huang Ying
[PATCH 1/4 -v5] x86_64 EFI runtime service support: EFI basic runtime service support
This patch adds basic runtime services support for EFI x86_64 system. The main file of the patch is the addition of efi_64.c for x86_64. This file is modeled after the EFI IA32 avatar. EFI runtime services initialization are implemented in efi_64.c. Some x86_64 specifics are worth noting here. On x86_64, parameters passed to EFI firmware services need to follow the EFI calling convention. For this purpose, a set of functions named efi_callx (x is the number of parameters) are implemented. EFI function calls are wrapped before calling the firmware service. The duplicated code between efi_32.c and efi_64.c is placed in efi.c to remove them from efi_32.c. Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED] Signed-off-by: Huang Ying [EMAIL PROTECTED] --- arch/x86/kernel/Makefile_64 |1 arch/x86/kernel/efi.c | 483 ++ arch/x86/kernel/efi_64.c | 181 +++ arch/x86/kernel/efi_stub_64.S | 68 + arch/x86/kernel/setup_64.c| 17 + arch/x86_64/Kconfig | 11 include/asm-x86/bootparam.h |5 include/asm-x86/efi.h | 70 ++ include/asm-x86/fixmap_64.h |3 9 files changed, 836 insertions(+), 3 deletions(-) Index: linux-2.6.24-rc1/arch/x86/kernel/efi_64.c === --- /dev/null 1970-01-01 00:00:00.0 + +++ linux-2.6.24-rc1/arch/x86/kernel/efi_64.c 2007-10-30 10:07:57.0 +0800 @@ -0,0 +1,181 @@ +/* + * x86_64 specific EFI support functions + * Based on Extensible Firmware Interface Specification version 1.0 + * + * Copyright (C) 2005-2008 Intel Co. + * Fenghua Yu [EMAIL PROTECTED] + * Bibo Mao [EMAIL PROTECTED] + * Chandramouli Narayanan [EMAIL PROTECTED] + * Huang Ying [EMAIL PROTECTED] + * + * Code to convert EFI to E820 map has been implemented in elilo bootloader + * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table + * is setup appropriately for EFI runtime code. + * - mouli 06/14/2007. 
+ * + */ + +#include linux/kernel.h +#include linux/init.h +#include linux/mm.h +#include linux/types.h +#include linux/spinlock.h +#include linux/bootmem.h +#include linux/ioport.h +#include linux/module.h +#include linux/efi.h +#include linux/uaccess.h +#include linux/io.h +#include linux/reboot.h + +#include asm/setup.h +#include asm/page.h +#include asm/e820.h +#include asm/pgtable.h +#include asm/tlbflush.h +#include asm/cacheflush.h +#include asm/proto.h +#include asm/efi.h + +int efi_time __initdata; + +static pgd_t save_pgd __initdata; +static unsigned long efi_flags __initdata; +/* efi_lock protects efi physical mode call */ +static __initdata DEFINE_SPINLOCK(efi_lock); + +static int __init setup_noefi(char *arg) +{ + efi_enabled = 0; + return 0; +} +early_param(noefi, setup_noefi); + +static int __init setup_efi_time(char *arg) +{ + if (arg !strcmp(on, arg)) + efi_time = 1; + return 0; +} +early_param(efi_time, setup_efi_time); + +static void __init early_mapping_set_exec(unsigned long start, + unsigned long end, + int executable) +{ + pte_t *kpte; + + while (start end) { + kpte = lookup_address((unsigned long)__va(start)); + BUG_ON(!kpte); + if (executable) + set_pte(kpte, pte_mkexec(*kpte)); + else + set_pte(kpte, __pte((pte_val(*kpte) | _PAGE_NX) \ + __supported_pte_mask)); + if (pte_huge(*kpte)) + start = (start + PMD_SIZE) PMD_MASK; + else + start = (start + PAGE_SIZE) PAGE_MASK; + } +} + +static void __init early_runtime_code_mapping_set_exec(int executable) +{ + efi_memory_desc_t *md; + void *p; + + /* Make EFI runtime service code area executable */ + for (p = memmap.map; p memmap.map_end; p += memmap.desc_size) { + md = p; + if (md-type == EFI_RUNTIME_SERVICES_CODE) { + unsigned long end; + end = md-phys_addr + (md-num_pages PAGE_SHIFT); + early_mapping_set_exec(md-phys_addr, end, executable); + } + } +} + +void __init efi_call_phys_prelog(void) __acquires(efi_lock) +{ + unsigned long vaddress; + + /* +* Lock sequence is different from normal 
case because +* efi_flags is global +*/ + spin_lock(efi_lock); + local_irq_save(efi_flags); + early_runtime_code_mapping_set_exec(1); + vaddress = (unsigned long)__va(0x0UL); + pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL)); + set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress)); + global_flush_tlb(); +} + +void __init efi_call_phys_epilog(void) __releases
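The prelog/epilog pair above brackets every physical-mode EFI call: take the lock, disable interrupts, install the identity mapping, make the call, then undo everything in reverse order. A toy model of that bracket follows, with plain ints standing in for the spinlock, the saved IRQ flags, and the page-table swap; none of this is the kernel code.

```c
int efi_lock_held, irqs_disabled, phys_mapping_installed;

void toy_call_phys_prelog(void)
{
    efi_lock_held = 1;            /* spin_lock(&efi_lock) */
    irqs_disabled = 1;            /* local_irq_save(efi_flags) */
    phys_mapping_installed = 1;   /* swap in the identity-mapped pgd */
}

void toy_call_phys_epilog(void)
{
    phys_mapping_installed = 0;   /* restore the saved pgd */
    irqs_disabled = 0;            /* local_irq_restore(efi_flags) */
    efi_lock_held = 0;            /* spin_unlock(&efi_lock) */
}

/* A physical-mode service call only works inside the bracket. */
unsigned long toy_phys_efi_call(void)
{
    unsigned long ok;

    toy_call_phys_prelog();
    ok = phys_mapping_installed;  /* firmware runs under the 1:1 mapping */
    toy_call_phys_epilog();
    return ok;
}
```

The reverse ordering in the epilog matters in the real code too: the mapping must be restored before the lock is dropped, or another CPU could take the lock and find stale state.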
Re: [PATCH 2/4 -v5] x86_64 EFI runtime service support: EFI runtime services
On Tue, 2007-10-30 at 15:58, Denys Vlasenko wrote:
> On Tuesday 30 October 2007 05:55, Huang, Ying wrote:
> > +static inline unsigned long native_get_wallclock(void)
> > +{
> > +	unsigned long retval;
> > +
> > +	if (efi_enabled)
> > +		retval = efi_get_time();
> > +	else
> > +		retval = mach_get_cmos_time();
> > +
> > +	return retval;
> > +}
>
> mach_get_cmos_time() is itself an inline, and a _large_ one (~20 LOC
> with macro and function calls). efi_get_time() is an inline too,
> although a strange one: it is declared inline *only* in the efi.c file:
>
>	inline unsigned long efi_get_time(void)
>
> (yes, just inline, not static/extern), while efi.h has a normal extern
> for it:
>
>	extern unsigned long efi_get_time(void);
>
> Is it supposed to be like that?

efi_get_time is no longer inline in this patch. See efi.c of this patch.

Best Regards,
Huang Ying
Re: [PATCH 0/2 -v2 resend] x86_64 EFI boot support
Can this patchset be merged into the mainline kernel?

This patchset has been in the -mm tree since 2.6.23-rc2-mm2. Andrew Morton suggested merging it into 2.6.24 during the early 2.6.24 merge window. It was not merged into mainline at that time because the 32-bit boot protocol work had not been completed. Now that the 32-bit boot protocol has been merged into mainline, can this patchset be merged as well?

Best Regards,
Huang Ying
Re: [PATCH -mm -v5 0/3] i386/x86_64 boot: 32-bit boot protocol
On Wed, 2007-10-17 at 11:24 +0200, Andi Kleen wrote:
> > Can you tell me what that early reservation interface is? What I find
> > in x86_64 that does early memory allocation is alloc_low_page, which
> > gets a non-conflicting memory area through the e820 map.
>
> It's a new interface I only recently wrote:
>
> ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/early-reserve
>
> Then you can use early_reserve() and the e820 allocator will not touch it.
>
> > Because setup data is allocated by the bootloader or the kernel's
> > 16-bit setup code, and the e820 map is created there too, the memory
> > area used by setup data can be marked as reserved in the e820 map by
> > the bootloader or the 16-bit setup code. This way, it will not be
> > overwritten by the kernel. Do you think this works?
>
> It has a little of a chicken'n'egg problem because the e820 map will
> actually be in the area you want to reserve. But it might work too.
> Boot data is normally copied before other allocations in head64.c.
> If you do variable-size boot data that might not work though. And it
> might be a little fragile overall.

Although variable-size boot data (such as setup data) can be reserved via early_reserve or the e820 map, it may still conflict with hard-coded memory areas used by the kernel. This means the boot loader would have to know which hard-coded memory areas the kernel uses.

Another possible solution is as follows:

1. The bootloader allocates memory for setup data, simply avoiding the
   memory area after the kernel's load address.
2. In the very early stage of kernel boot (head64.c), copy all the setup
   data to the memory area after _end, and reserve that area with
   early_reserve (or bad_addr for old code).

In this solution, the only memory area that is unsafe for setup data from the bootloader is the area after _end, and the kernel can use hard-coded memory areas without the risk of conflicting with setup data from the bootloader. Do you think this solution is better?
Best Regards,
Huang Ying