Re: [PATCH 0/3 -mm] kexec jump -v8

2008-01-07 Thread Huang, Ying
 kernel
 (bzImage or vmlinux) or resuming an already booted kernel. Kexec tools
 should determine that and set up the entry point accordingly (this might
 require some purgatory changes to take care of the transition while jumping
 to resume a hibernated image).

Yes. This is a good idea. I will do it.

  Moreover, it is useful for invoking some code in physical mode. The
  procedure is something like the following:
  
  1. Load some code to be executed in physical mode via kexec 
  --load-preserve-context.
  2. Set up the parameters by amending /proc/kimgcore.
  3. Execute the code in physical mode via kexec -e.
  4. Get the result by reading /proc/kimgcore.
  5. Set up another group of parameters by amending /proc/kimgcore.
  ...
 
 This seems to be extended functionality. If your focus is kexec-based
 hibernation then I would think of initially keeping the implementation
 simple and keeping the patches small. Make kexec-based hibernation work
 and then extend the functionality for other purposes.

I think this is an important use case of kexec jump besides kexec-based
hibernation. But it can be separated from the initial kexec jump
patchset if necessary.
 
[...]
  The main issue with this mechanism is that it is a kernel-to-kernel
  communication mechanism, while Eric Biederman thinks we should use only
  user-to-user communication mechanisms. And he is not persuaded yet.
  
  Because kernel operations such as re-initializing/re-constructing
  /proc/vmcore, etc. are needed for kexec jump or resuming, I think a
  kernel-to-kernel mechanism may be needed. But I don't know whether Eric
  Biederman will agree with this.
 
 Hmm... Personally I am more inclined towards exchanging information between
 the two kernels on the setup page, in a standard format (using ELF headers etc.).
 This information can be prepared by kexec-tools in user space and be
 modified by purgatory (during the transition) to reflect the swapped pages.
 Alternatively, one can modify this setup page info from user space through
 some /proc/kimgcore-like interface.  I prefer the first one...

Some information (such as the backup pages map) is not available
when /sbin/kexec is executed, so there should be a method to pass such
information from the original kernel to purgatory. So why not exchange the
information between the two kernels via the setup page directly (with no
need for purgatory)?

Best Regards,
Huang Ying



[PATCH -mm] x86 boot : export boot_params via debugfs for debugging

2008-01-07 Thread Huang, Ying
This patch exports the boot parameters via debugfs for debugging.

The files added are as follows:

boot_params/data    : binary file for struct boot_params
boot_params/version : boot protocol version

This patch is based on 2.6.24-rc5-mm1 and has been tested on i386 and
x86_64 platforms.

This patch is based on Peter Anvin's proposal.
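
For illustration, a minimal user-space reader for these files might look
like the sketch below (hypothetical example code, not part of the patch;
it assumes debugfs is mounted at the conventional /sys/kernel/debug):

  /* Hypothetical sketch: dump the exported boot parameters from debugfs.
   * Assumes debugfs is mounted at /sys/kernel/debug. */
  #include <stdio.h>

  int main(void)
  {
          FILE *f;
          char version[32];
          unsigned char buf[4096];  /* struct boot_params occupies one 4K page */
          size_t n;

          f = fopen("/sys/kernel/debug/boot_params/version", "r");
          if (!f || !fgets(version, sizeof(version), f))
                  return 1;
          printf("boot protocol version: %s", version);
          fclose(f);

          f = fopen("/sys/kernel/debug/boot_params/data", "r");
          if (!f)
                  return 1;
          n = fread(buf, 1, sizeof(buf), f);
          printf("read %zu bytes of struct boot_params\n", n);
          fclose(f);
          return 0;
  }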

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/Kconfig.debug  |7 
 arch/x86/kernel/Makefile_32 |3 +-
 arch/x86/kernel/Makefile_64 |2 -
 arch/x86/kernel/kdebugfs.c  |   65 
 arch/x86/kernel/setup64.c   |4 ++
 arch/x86/kernel/setup_32.c  |4 ++
 6 files changed, 83 insertions(+), 2 deletions(-)

--- /dev/null
+++ b/arch/x86/kernel/kdebugfs.c
@@ -0,0 +1,65 @@
+/*
+ * Architecture specific debugfs files
+ *
+ * Copyright (C) 2007, Intel Corp.
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/debugfs.h>
+#include <linux/stat.h>
+#include <linux/init.h>
+
+#include <asm/setup.h>
+
+#ifdef CONFIG_DEBUG_BOOT_PARAMS
+static struct debugfs_blob_wrapper boot_params_blob = {
+   .data = &boot_params,
+   .size = sizeof(boot_params),
+};
+
+static int __init boot_params_kdebugfs_init(void)
+{
+   int error;
+   struct dentry *dbp, *version, *data;
+
+   dbp = debugfs_create_dir("boot_params", NULL);
+   if (!dbp) {
+   error = -ENOMEM;
+   goto err_return;
+   }
+   version = debugfs_create_x16("version", S_IRUGO, dbp,
+                                &boot_params.hdr.version);
+   if (!version) {
+   error = -ENOMEM;
+   goto err_dir;
+   }
+   data = debugfs_create_blob("data", S_IRUGO, dbp,
+                              &boot_params_blob);
+   if (!data) {
+   error = -ENOMEM;
+   goto err_version;
+   }
+   return 0;
+err_version:
+   debugfs_remove(version);
+err_dir:
+   debugfs_remove(dbp);
+err_return:
+   return error;
+}
+#endif
+
+static int __init arch_kdebugfs_init(void)
+{
+   int error = 0;
+
+#ifdef CONFIG_DEBUG_BOOT_PARAMS
+   error = boot_params_kdebugfs_init();
+#endif
+
+   return error;
+}
+
+arch_initcall(arch_kdebugfs_init);
--- a/arch/x86/Kconfig.debug
+++ b/arch/x86/Kconfig.debug
@@ -112,4 +112,11 @@ config IOMMU_LEAK
  Add a simple leak tracer to the IOMMU code. This is useful when you
  are debugging a buggy device driver that leaks IOMMU mappings.
 
+config DEBUG_BOOT_PARAMS
+   bool "Debug boot parameters"
+   depends on DEBUG_KERNEL
+   depends on DEBUG_FS
+   help
+ This option will cause struct boot_params to be exported via debugfs.
+
 endmenu
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -8,7 +8,8 @@ CPPFLAGS_vmlinux.lds += -Ui386
 obj-y  := process_32.o signal_32.o entry_32.o traps_32.o irq_32.o \
time_32.o ioport_32.o ldt.o setup_32.o i8259_32.o sys_i386_32.o 
\
pci-dma_32.o i386_ksyms_32.o i387_32.o bootflag.o e820_32.o\
-   quirks.o i8237.o topology.o alternative.o i8253.o tsc_32.o rtc.o
+   quirks.o i8237.o topology.o alternative.o i8253.o tsc_32.o \
+   rtc.o kdebugfs.o
 
 obj-y  += ptrace.o
 obj-y  += ds.o
--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -10,7 +10,7 @@ obj-y := process_64.o signal_64.o entry_
x8664_ksyms_64.o i387_64.o syscall_64.o vsyscall_64.o \
setup64.o bootflag.o e820_64.o reboot_64.o quirks.o i8237.o \
pci-dma_64.o pci-nommu_64.o alternative.o hpet.o tsc_64.o 
bugs_64.o \
-   i8253.o rtc.o
+   i8253.o rtc.o kdebugfs.o
 
 obj-y  += ptrace.o
 obj-y  += ds.o
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,11 @@
 #include <asm/sections.h>
 #include <asm/setup.h>
 
+#ifndef CONFIG_DEBUG_BOOT_PARAMS
 struct boot_params __initdata boot_params;
+#else
+struct boot_params boot_params;
+#endif
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,11 @@ unsigned long saved_videomode;
 
 static char __initdata command_line[COMMAND_LINE_SIZE];
 
+#ifndef CONFIG_DEBUG_BOOT_PARAMS
 struct boot_params __initdata boot_params;
+#else
+struct boot_params boot_params;
+#endif
 
 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;



[PATCH -mm 0/2] kexec/i386: kexec page table code clean up

2008-01-08 Thread Huang, Ying
This patchset cleans up page table setup code of kexec on i386.

This patchset is based on 2.6.24-rc5-mm1 and has been tested on i386
with/without PAE enabled.

Best Regards,
Huang Ying



[PATCH -mm 1/2] kexec/i386: kexec page table code clean up - add arch_kimage

2008-01-08 Thread Huang, Ying
This patch adds an architecture-specific struct arch_kimage to struct
kimage. Three pointers to the page table pages used by kexec are added to
struct arch_kimage. The page table pages are dynamically allocated in
machine_kexec_prepare instead of statically from the BSS segment. This
saves up to 20k of memory when no kexec image is loaded.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/machine_kexec_32.c |   68 +
 include/asm-x86/kexec_32.h |   12 ++
 include/linux/kexec.h  |4 ++
 3 files changed, 63 insertions(+), 21 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -11,6 +11,7 @@
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/numa.h>
+#include <linux/gfp.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -21,15 +22,6 @@
 #include <asm/desc.h>
 #include <asm/system.h>
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
struct Xgt_desc_struct curidt;
@@ -72,6 +64,28 @@ static void load_segments(void)
 #undef __STR
 }
 
+static void alloc_page_tables(struct kimage *image)
+{
+   image->arch_kimage.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+   image->arch_kimage.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch_kimage.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+   image->arch_kimage.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch_kimage.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+}
+
+static void free_page_tables(struct kimage *image)
+{
+   free_page((unsigned long)image->arch_kimage.pgd);
+#ifdef CONFIG_X86_PAE
+   free_page((unsigned long)image->arch_kimage.pmd0);
+   free_page((unsigned long)image->arch_kimage.pmd1);
+#endif
+   free_page((unsigned long)image->arch_kimage.pte0);
+   free_page((unsigned long)image->arch_kimage.pte1);
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -83,10 +97,21 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+   alloc_page_tables(image);
+   if (!image->arch_kimage.pgd ||
+#ifdef CONFIG_X86_PAE
+   !image->arch_kimage.pmd0 ||
+   !image->arch_kimage.pmd1 ||
+#endif
+   !image->arch_kimage.pte0 ||
+   !image->arch_kimage.pte1) {
+   free_page_tables(image);
+   return -ENOMEM;
+   }
return 0;
 }
 
@@ -96,6 +121,7 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+   free_page_tables(image);
 }
 
 /*
@@ -115,18 +141,18 @@ NORET_TYPE void machine_kexec(struct kim
 
page_list[PA_CONTROL_PAGE] = __pa(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
-   page_list[PA_PGD] = __pa(kexec_pgd);
-   page_list[VA_PGD] = (unsigned long)kexec_pgd;
+   page_list[PA_PGD] = __pa(image->arch_kimage.pgd);
+   page_list[VA_PGD] = (unsigned long)image->arch_kimage.pgd;
 #ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(kexec_pmd0);
-   page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-   page_list[PA_PMD_1] = __pa(kexec_pmd1);
-   page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(kexec_pte0);
-   page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-   page_list[PA_PTE_1] = __pa(kexec_pte1);
-   page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+   page_list[PA_PMD_0] = __pa(image->arch_kimage.pmd0);
+   page_list[VA_PMD_0] = (unsigned long)image->arch_kimage.pmd0;
+   page_list[PA_PMD_1] = __pa(image->arch_kimage.pmd1);
+   page_list[VA_PMD_1] = (unsigned long)image->arch_kimage.pmd1;
+#endif
+   page_list[PA_PTE_0] = __pa(image->arch_kimage.pte0);
+   page_list[VA_PTE_0] = (unsigned long)image->arch_kimage.pte0;
+   page_list[PA_PTE_1] = __pa(image->arch_kimage.pte1);
+   page_list[VA_PTE_1] = (unsigned long)image->arch_kimage.pte1;
 
/* The segment registers are funny things, they have both a
 * visible and an invisible part.  Whenever the visible part is
--- a/include/asm-x86/kexec_32.h
+++ b/include/asm-x86/kexec_32.h
@@ -94,6 +94,18 @@ relocate_kernel(unsigned long indirectio
unsigned long start_address,
unsigned int has_pae) ATTRIB_NORET;
 
+#define ARCH_HAS_ARCH_KIMAGE
+
+struct arch_kimage {
+   pgd_t *pgd;
+#ifdef CONFIG_X86_PAE
+   pmd_t *pmd0;
+   pmd_t *pmd1

[PATCH -mm 2/2] kexec/i386: kexec page table code clean up - page table setup in C

2008-01-08 Thread Huang, Ying
This patch transforms the kexec page table setup code from assembler
code to C code in machine_kexec_prepare. This improves readability and
reduces the line count.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/machine_kexec_32.c   |   50 +++
 arch/x86/kernel/relocate_kernel_32.S |  114 ---
 include/asm-x86/kexec_32.h   |   18 -
 3 files changed, 40 insertions(+), 142 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -86,6 +86,42 @@ static void free_page_tables(struct kima
	free_page((unsigned long)image->arch_kimage.pte1);
 }
 
+static void page_table_set_one(pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+  unsigned long vaddr, unsigned long paddr)
+{
+   pud_t *pud;
+
+   pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+   if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+   set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+   pud = pud_offset(pgd, vaddr);
+   pmd = pmd_offset(pud, vaddr);
+   if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+   set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+   pte = pte_offset_kernel(pmd, vaddr);
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void prepare_page_tables(struct kimage *image)
+{
+   void *control_page;
+   pmd_t *pmd = 0;
+
+   control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch_kimage.pmd0;
+#endif
+   page_table_set_one(image->arch_kimage.pgd, pmd, image->arch_kimage.pte0,
+  (unsigned long)relocate_kernel, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch_kimage.pmd1;
+#endif
+   page_table_set_one(image->arch_kimage.pgd, pmd, image->arch_kimage.pte1,
+  __pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -98,6 +134,7 @@ static void free_page_tables(struct kima
  * later.
  *
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
@@ -112,6 +149,7 @@ int machine_kexec_prepare(struct kimage 
free_page_tables(image);
return -ENOMEM;
}
+   prepare_page_tables(image);
return 0;
 }
 
@@ -140,19 +178,7 @@ NORET_TYPE void machine_kexec(struct kim
memcpy(control_page, relocate_kernel, PAGE_SIZE);
 
page_list[PA_CONTROL_PAGE] = __pa(control_page);
-   page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
page_list[PA_PGD] = __pa(image-arch_kimage.pgd);
-   page_list[VA_PGD] = (unsigned long)image-arch_kimage.pgd;
-#ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(image-arch_kimage.pmd0);
-   page_list[VA_PMD_0] = (unsigned long)image-arch_kimage.pmd0;
-   page_list[PA_PMD_1] = __pa(image-arch_kimage.pmd1);
-   page_list[VA_PMD_1] = (unsigned long)image-arch_kimage.pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(image-arch_kimage.pte0);
-   page_list[VA_PTE_0] = (unsigned long)image-arch_kimage.pte0;
-   page_list[PA_PTE_1] = __pa(image-arch_kimage.pte1);
-   page_list[VA_PTE_1] = (unsigned long)image-arch_kimage.pte1;
 
/* The segment registers are funny things, they have both a
 * visible and an invisible part.  Whenever the visible part is
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -16,126 +16,12 @@
 
 #define PTR(x) (x  2)
 #define PAGE_ALIGNED (1  PAGE_SHIFT)
-#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
-#define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */
 
.text
.align PAGE_ALIGNED
.globl relocate_kernel
 relocate_kernel:
movl8(%esp), %ebp /* list of pages */
-
-#ifdef CONFIG_X86_PAE
-   /* map the control page at its virtual address */
-
-   movlPTR(VA_PGD)(%ebp), %edi
-   movlPTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl$0xc000, %eax
-   shrl$27, %eax
-   addl%edi, %eax
-
-   movlPTR(PA_PMD_0)(%ebp), %edx
-   orl $PAE_PGD_ATTR, %edx
-   movl%edx, (%eax)
-
-   movlPTR(VA_PMD_0)(%ebp), %edi
-   movlPTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl$0x3fe0, %eax
-   shrl$18, %eax
-   addl%edi, %eax
-
-   movlPTR(PA_PTE_0)(%ebp), %edx
-   orl $PAGE_ATTR, %edx
-   movl%edx, (%eax)
-
-   movlPTR(VA_PTE_0)(%ebp), %edi
-   movlPTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl$0x001ff000, %eax
-   shrl$9, %eax
-   addl%edi, %eax
-
-   movlPTR(PA_CONTROL_PAGE)(%ebp), %edx
-   orl $PAGE_ATTR, %edx
-   movl%edx, (%eax)
-
-   /* identity map the control page at its physical address */
-
-   movlPTR(VA_PGD)(%ebp), %edi
-   movlPTR

Re: [PATCH -mm 1/2] kexec/i386: kexec page table code clean up - add arch_kimage

2008-01-09 Thread Huang, Ying
On Wed, 2008-01-09 at 20:14 -0500, Vivek Goyal wrote:
[...]
   
  +static void alloc_page_tables(struct kimage *image)
  +{
 
 This is too generic a name. How about something like
 arch_alloc_kexec_page_tables()

OK, I will change it.

  +   image-arch_kimage.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
  +#ifdef CONFIG_X86_PAE
  +   image-arch_kimage.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
  +   image-arch_kimage.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
  +#endif
  +   image-arch_kimage.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
  +   image-arch_kimage.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
  +}
  +
  +static void free_page_tables(struct kimage *image)
  +{
 
 How about arch_free_kexec_page_tables()

OK, I will change it.

  +   free_page((unsigned long)image-arch_kimage.pgd);
  +#ifdef CONFIG_X86_PAE
  +   free_page((unsigned long)image-arch_kimage.pmd0);
  +   free_page((unsigned long)image-arch_kimage.pmd1);
  +#endif
  +   free_page((unsigned long)image-arch_kimage.pte0);
  +   free_page((unsigned long)image-arch_kimage.pte1);
  +}
  +
   /*
* A architecture hook called to validate the
* proposed image and prepare the control pages
  @@ -83,10 +97,21 @@ static void load_segments(void)
* reboot code buffer to allow us to avoid allocations
* later.
*
  - * Currently nothing.
  + * - Allocate page tables
*/
   int machine_kexec_prepare(struct kimage *image)
   {
  +   alloc_page_tables(image);
  +   if (!image-arch_kimage.pgd ||
  +#ifdef CONFIG_X86_PAE
  +   !image-arch_kimage.pmd0 ||
  +   !image-arch_kimage.pmd1 ||
  +#endif
  +   !image-arch_kimage.pte0 ||
  +   !image-arch_kimage.pte1) {
  +   free_page_tables(image);
  +   return -ENOMEM;
 
 I think this error handling can be done in alloc_page_tables() itself and
 following will look neater.

OK, I will change it.

Best Regards,
Huang Ying



Re: [PATCH -mm 2/2] kexec/i386: kexec page table code clean up - page table setup in C

2008-01-09 Thread Huang, Ying
On Wed, 2008-01-09 at 20:05 -0500, Vivek Goyal wrote:
 On Wed, Jan 09, 2008 at 10:57:50AM +0800, Huang, Ying wrote:
  This patch transforms the kexec page table setup code from assembler
  code to C code in machine_kexec_prepare. This improves readability and
  reduces the line count.
  
 
 I think this will create issues for Xen. Initially page table setup
 was in C but Xen Guests could not modify the page tables. I think Xen
 folks implemented a hypercall where they passed all the page table pages
 and the control pages and then hypervisor executed the control page(which
 in turn setup the page tables). I think that's why page table setup
 code is on the control page in assembly.
 
 You might want to go through Xen kexec implementation and dig through
 kexec mailing list archive.

OK, I will check the Xen kexec implementation.

 CCing Magnus and Horms. They had done the page tables related changes
 for Xen.

Best Regards,
Huang Ying



[PATCH -mm 2/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap - remove boot_ioremap

2008-01-14 Thread Huang, Ying
This patch replaces boot_ioremap invocations with bt_ioremap and
removes the boot_ioremap implementation.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/Kconfig  |4 -
 arch/x86/kernel/srat_32.c |8 +--
 arch/x86/mm/Makefile_32   |1 
 arch/x86/mm/boot_ioremap_32.c |  100 --
 include/asm-x86/efi.h |8 ---
 5 files changed, 5 insertions(+), 116 deletions(-)

--- a/arch/x86/kernel/srat_32.c
+++ b/arch/x86/kernel/srat_32.c
@@ -57,8 +57,6 @@ static struct node_memory_chunk_s node_m
 static int num_memory_chunks;  /* total number of memory chunks */
 static u8 __initdata apicid_to_pxm[MAX_APICID];
 
-extern void * boot_ioremap(unsigned long, unsigned long);
-
 /* Identify CPU proximity domains */
 static void __init parse_cpu_affinity_structure(char *p)
 {
@@ -299,7 +297,7 @@ int __init get_memcfg_from_srat(void)
}
 
rsdt = (struct acpi_table_rsdt *)
-		boot_ioremap(rsdp->rsdt_physical_address, sizeof(struct acpi_table_rsdt));
+		bt_ioremap(rsdp->rsdt_physical_address, sizeof(struct acpi_table_rsdt));
 
if (!rsdt) {
printk(KERN_WARNING
@@ -339,11 +337,11 @@ int __init get_memcfg_from_srat(void)
	for (i = 0; i < tables; i++) {
/* Map in header, then map in full table length. */
header = (struct acpi_table_header *)
-			boot_ioremap(saved_rsdt.table.table_offset_entry[i], sizeof(struct acpi_table_header));
+			bt_ioremap(saved_rsdt.table.table_offset_entry[i], sizeof(struct acpi_table_header));
if (!header)
break;
header = (struct acpi_table_header *)
-			boot_ioremap(saved_rsdt.table.table_offset_entry[i], header->length);
+			bt_ioremap(saved_rsdt.table.table_offset_entry[i], header->length);
if (!header)
break;
 
--- a/arch/x86/mm/boot_ioremap_32.c
+++ /dev/null
@@ -1,100 +0,0 @@
-/*
- * arch/i386/mm/boot_ioremap.c
- *
- * Re-map functions for early boot-time before paging_init() when the
- * boot-time pagetables are still in use
- *
- * Written by Dave Hansen [EMAIL PROTECTED]
- */
-
-
-/*
- * We need to use the 2-level pagetable functions, but CONFIG_X86_PAE
- * keeps that from happening.  If anyone has a better way, I'm listening.
- *
- * boot_pte_t is defined only if this all works correctly
- */
-
-#undef CONFIG_X86_PAE
-#undef CONFIG_PARAVIRT
-#include asm/page.h
-#include asm/pgtable.h
-#include asm/tlbflush.h
-#include linux/init.h
-#include linux/stddef.h
-
-/*
- * I'm cheating here.  It is known that the two boot PTE pages are
- * allocated next to each other.  I'm pretending that they're just
- * one big array.
- */
-
-#define BOOT_PTE_PTRS (PTRS_PER_PTE*2)
-
-static unsigned long boot_pte_index(unsigned long vaddr)
-{
-   return __pa(vaddr)  PAGE_SHIFT;
-}
-
-static inline boot_pte_t* boot_vaddr_to_pte(void *address)
-{
-   boot_pte_t* boot_pg = (boot_pte_t*)pg0;
-   return boot_pg[boot_pte_index((unsigned long)address)];
-}
-
-/*
- * This is only for a caller who is clever enough to page-align
- * phys_addr and virtual_source, and who also has a preference
- * about which virtual address from which to steal ptes
- */
-static void __boot_ioremap(unsigned long phys_addr, unsigned long nrpages,
-   void* virtual_source)
-{
-   boot_pte_t* pte;
-   int i;
-   char *vaddr = virtual_source;
-
-   pte = boot_vaddr_to_pte(virtual_source);
-   for (i=0; i  nrpages; i++, phys_addr += PAGE_SIZE, pte++) {
-   set_pte(pte, pfn_pte(phys_addrPAGE_SHIFT, PAGE_KERNEL));
-   __flush_tlb_one((unsigned long) vaddr[i*PAGE_SIZE]);
-   }
-}
-
-/* the virtual space we're going to remap comes from this array */
-#define BOOT_IOREMAP_PAGES 4
-#define BOOT_IOREMAP_SIZE (BOOT_IOREMAP_PAGES*PAGE_SIZE)
-static __initdata char boot_ioremap_space[BOOT_IOREMAP_SIZE]
-  __attribute__ ((aligned (PAGE_SIZE)));
-
-/*
- * This only applies to things which need to ioremap before paging_init()
- * bt_ioremap() and plain ioremap() are both useless at this point.
- *
- * When used, we're still using the boot-time pagetables, which only
- * have 2 PTE pages mapping the first 8MB
- *
- * There is no unmap.  The boot-time PTE pages aren't used after boot.
- * If you really want the space back, just remap it yourself.
- * boot_ioremap(ioremap_space-PAGE_OFFSET, BOOT_IOREMAP_SIZE)
- */
-__init void* boot_ioremap(unsigned long phys_addr, unsigned long size)
-{
-   unsigned long last_addr, offset;
-   unsigned int nrpages;
-
-   last_addr = phys_addr + size - 1;
-
-   /* page align the requested address */
-   offset = phys_addr  ~PAGE_MASK;
-   phys_addr = PAGE_MASK;
-   size = PAGE_ALIGN(last_addr) - phys_addr;
-
-   nrpages = size  PAGE_SHIFT

[PATCH -mm 0/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap

2008-01-14 Thread Huang, Ying
This patchset replaces boot_ioremap with an enhanced version of
bt_ioremap and renames bt_ioremap to early_ioremap. This removes
12k from the .init.data segment and increases the size of memory that can
be re-mapped before paging_init to 64k.
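
As an illustration of how the enhanced interface can be used, an early
caller could map a firmware table before paging_init() roughly as in the
sketch below (a hypothetical example: the table layout and parse_fw_table
are invented, only the bt_ioremap()/bt_iounmap() prototypes come from
patch 1/3):

  /* Hypothetical early-boot caller: map a firmware table header before
   * paging_init(), read its length, remap with the full length, unmap. */
  #include <linux/types.h>
  #include <linux/init.h>
  #include <asm/io.h>

  struct fw_table_header {          /* invented layout, for illustration only */
          char signature[4];
          u32  length;
  };

  static int __init parse_fw_table(unsigned long phys)
  {
          struct fw_table_header *hdr;
          u32 length;

          hdr = bt_ioremap(phys, sizeof(*hdr));  /* works before paging_init() */
          if (!hdr)
                  return -1;
          length = hdr->length;
          bt_iounmap(hdr, sizeof(*hdr));

          hdr = bt_ioremap(phys, length);        /* must fit in the 64k window */
          if (!hdr)
                  return -1;
          /* ... parse the table here ... */
          bt_iounmap(hdr, length);
          return 0;
  }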

This patchset is based on linux-2.6.24-rc5-mm1 +
efi-split-efi-tables-parsing-code-from-efi-runtime-service-support-code.patch.
It has been tested on i386 with PAE on/off.

Best Regards,
Huang Ying



[PATCH -mm 1/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap - enhance bt_ioremap

2008-01-14 Thread Huang, Ying
This patch makes bt_ioremap usable before paging_init by providing an
early implementation of set_fixmap that works before paging_init. This
allows boot_ioremap to be replaced by bt_ioremap.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/setup_32.c |1 
 arch/x86/mm/init_32.c  |2 +
 arch/x86/mm/ioremap_32.c   |   87 +++--
 include/asm-x86/io_32.h|3 +
 4 files changed, 91 insertions(+), 2 deletions(-)

--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -208,6 +208,89 @@ void iounmap(volatile void __iomem *addr
 }
 EXPORT_SYMBOL(iounmap);
 
+static __initdata int after_paging_init;
+static __initdata unsigned long bm_pte[1024]
+   __attribute__((aligned(PAGE_SIZE)));
+
+static inline unsigned long * __init bt_ioremap_pgd(unsigned long addr)
+{
+   return (unsigned long *)swapper_pg_dir + ((addr >> 22) & 1023);
+}
+
+static inline unsigned long * __init bt_ioremap_pte(unsigned long addr)
+{
+   return bm_pte + ((addr >> PAGE_SHIFT) & 1023);
+}
+
+void __init bt_ioremap_init(void)
+{
+   unsigned long *pgd;
+
+   pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+   *pgd = __pa(bm_pte) | _PAGE_TABLE;
+   memset(bm_pte, 0, sizeof(bm_pte));
+   BUG_ON(pgd != bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_END)));
+}
+
+void __init bt_ioremap_clear(void)
+{
+   unsigned long *pgd;
+
+   pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+   *pgd = 0;
+   __flush_tlb_all();
+}
+
+void __init bt_ioremap_reset(void)
+{
+   enum fixed_addresses idx;
+   unsigned long *pte, phys, addr;
+
+   after_paging_init = 1;
+   for (idx = FIX_BTMAP_BEGIN; idx >= FIX_BTMAP_END; idx--) {
+   addr = fix_to_virt(idx);
+   pte = bt_ioremap_pte(addr);
+   if (*pte & _PAGE_PRESENT) {
+   phys = *pte & PAGE_MASK;
+   set_fixmap(idx, phys);
+   }
+   }
+}
+
+static void __init __bt_set_fixmap(enum fixed_addresses idx,
+  unsigned long phys, pgprot_t flags)
+{
+   unsigned long *pte, addr = __fix_to_virt(idx);
+
+   if (idx >= __end_of_fixed_addresses) {
+   BUG();
+   return;
+   }
+   pte = bt_ioremap_pte(addr);
+   if (pgprot_val(flags))
+   *pte = (phys & PAGE_MASK) | pgprot_val(flags);
+   else
+   *pte = 0;
+   __flush_tlb_one(addr);
+}
+
+static inline void __init bt_set_fixmap(enum fixed_addresses idx,
+   unsigned long phys)
+{
+   if (after_paging_init)
+   set_fixmap(idx, phys);
+   else
+   __bt_set_fixmap(idx, phys, PAGE_KERNEL);
+}
+
+static inline void __init bt_clear_fixmap(enum fixed_addresses idx)
+{
+   if (after_paging_init)
+   clear_fixmap(idx);
+   else
+   __bt_set_fixmap(idx, 0, __pgprot(0));
+}
+
 void __init *bt_ioremap(unsigned long phys_addr, unsigned long size)
 {
unsigned long offset, last_addr;
@@ -244,7 +327,7 @@ void __init *bt_ioremap(unsigned long ph
 */
idx = FIX_BTMAP_BEGIN;
	while (nrpages > 0) {
-   set_fixmap(idx, phys_addr);
+   bt_set_fixmap(idx, phys_addr);
phys_addr += PAGE_SIZE;
--idx;
--nrpages;
@@ -267,7 +350,7 @@ void __init bt_iounmap(void *addr, unsig
 
idx = FIX_BTMAP_BEGIN;
	while (nrpages > 0) {
-   clear_fixmap(idx);
+   bt_clear_fixmap(idx);
--idx;
--nrpages;
}
--- a/arch/x86/mm/init_32.c
+++ b/arch/x86/mm/init_32.c
@@ -423,9 +423,11 @@ static void __init pagetable_init (void)
 * Fixed mappings, only the page table structure has to be
 * created - mappings will be set by set_fixmap():
 */
+   bt_ioremap_clear();
	vaddr = __fix_to_virt(__end_of_fixed_addresses - 1) & PMD_MASK;
	end = (FIXADDR_TOP + PMD_SIZE - 1) & PMD_MASK;
page_table_range_init(vaddr, end, pgd_base);
+   bt_ioremap_reset();
 
permanent_kmaps_init(pgd_base);
 
--- a/include/asm-x86/io_32.h
+++ b/include/asm-x86/io_32.h
@@ -130,6 +130,9 @@ extern void iounmap(volatile void __iome
  * mappings, before the real ioremap() is functional.
  * A boot-time mapping is currently limited to at most 16 pages.
  */
+extern void bt_ioremap_init(void);
+extern void bt_ioremap_clear(void);
+extern void bt_ioremap_reset(void);
 extern void *bt_ioremap(unsigned long offset, unsigned long size);
 extern void bt_iounmap(void *addr, unsigned long size);
 extern void __iomem *fix_ioremap(unsigned idx, unsigned long phys);
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -624,6 +624,7 @@ void __init setup_arch(char **cmdline_p)
memcpy(boot_cpu_data, new_cpu_data, sizeof(new_cpu_data

[PATCH -mm 3/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap - rename bt_ioremap to early_ioremap

2008-01-14 Thread Huang, Ying
This patch renames bt_ioremap to early_ioremap, the name already used on
x86_64. This makes it easier to merge the i386 and x86_64 usage.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/efi.c|   16 
 arch/x86/kernel/efi_32.c |2 +-
 arch/x86/kernel/efi_tables.c |   12 ++--
 arch/x86/kernel/setup_32.c   |2 +-
 arch/x86/kernel/srat_32.c|6 +++---
 arch/x86/mm/init_32.c|4 ++--
 arch/x86/mm/ioremap_32.c |   38 +++---
 include/asm-x86/dmi.h|7 ++-
 include/asm-x86/efi.h|8 
 include/asm-x86/io_32.h  |   16 
 10 files changed, 50 insertions(+), 61 deletions(-)

--- a/arch/x86/mm/ioremap_32.c
+++ b/arch/x86/mm/ioremap_32.c
@@ -212,36 +212,36 @@ static __initdata int after_paging_init;
 static __initdata unsigned long bm_pte[1024]
__attribute__((aligned(PAGE_SIZE)));
 
-static inline unsigned long * __init bt_ioremap_pgd(unsigned long addr)
+static inline unsigned long * __init early_ioremap_pgd(unsigned long addr)
 {
	return (unsigned long *)swapper_pg_dir + ((addr >> 22) & 1023);
 }
 
-static inline unsigned long * __init bt_ioremap_pte(unsigned long addr)
+static inline unsigned long * __init early_ioremap_pte(unsigned long addr)
 {
	return bm_pte + ((addr >> PAGE_SHIFT) & 1023);
 }
 
-void __init bt_ioremap_init(void)
+void __init early_ioremap_init(void)
 {
unsigned long *pgd;
 
-   pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+   pgd = early_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
*pgd = __pa(bm_pte) | _PAGE_TABLE;
memset(bm_pte, 0, sizeof(bm_pte));
-   BUG_ON(pgd != bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_END)));
+   BUG_ON(pgd != early_ioremap_pgd(fix_to_virt(FIX_BTMAP_END)));
 }
 
-void __init bt_ioremap_clear(void)
+void __init early_ioremap_clear(void)
 {
unsigned long *pgd;
 
-   pgd = bt_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
+   pgd = early_ioremap_pgd(fix_to_virt(FIX_BTMAP_BEGIN));
*pgd = 0;
__flush_tlb_all();
 }
 
-void __init bt_ioremap_reset(void)
+void __init early_ioremap_reset(void)
 {
enum fixed_addresses idx;
unsigned long *pte, phys, addr;
@@ -249,7 +249,7 @@ void __init bt_ioremap_reset(void)
after_paging_init = 1;
	for (idx = FIX_BTMAP_BEGIN; idx >= FIX_BTMAP_END; idx--) {
		addr = fix_to_virt(idx);
-		pte = bt_ioremap_pte(addr);
+		pte = early_ioremap_pte(addr);
		if (*pte & _PAGE_PRESENT) {
			phys = *pte & PAGE_MASK;
set_fixmap(idx, phys);
@@ -257,7 +257,7 @@ void __init bt_ioremap_reset(void)
}
 }
 
-static void __init __bt_set_fixmap(enum fixed_addresses idx,
+static void __init __early_set_fixmap(enum fixed_addresses idx,
   unsigned long phys, pgprot_t flags)
 {
unsigned long *pte, addr = __fix_to_virt(idx);
@@ -266,7 +266,7 @@ static void __init __bt_set_fixmap(enum 
BUG();
return;
}
-   pte = bt_ioremap_pte(addr);
+   pte = early_ioremap_pte(addr);
if (pgprot_val(flags))
	*pte = (phys & PAGE_MASK) | pgprot_val(flags);
else
@@ -274,24 +274,24 @@ static void __init __bt_set_fixmap(enum 
__flush_tlb_one(addr);
 }
 
-static inline void __init bt_set_fixmap(enum fixed_addresses idx,
+static inline void __init early_set_fixmap(enum fixed_addresses idx,
unsigned long phys)
 {
if (after_paging_init)
set_fixmap(idx, phys);
else
-   __bt_set_fixmap(idx, phys, PAGE_KERNEL);
+   __early_set_fixmap(idx, phys, PAGE_KERNEL);
 }
 
-static inline void __init bt_clear_fixmap(enum fixed_addresses idx)
+static inline void __init early_clear_fixmap(enum fixed_addresses idx)
 {
if (after_paging_init)
clear_fixmap(idx);
else
-   __bt_set_fixmap(idx, 0, __pgprot(0));
+   __early_set_fixmap(idx, 0, __pgprot(0));
 }
 
-void __init *bt_ioremap(unsigned long phys_addr, unsigned long size)
+void __init *early_ioremap(unsigned long phys_addr, unsigned long size)
 {
unsigned long offset, last_addr;
unsigned int nrpages;
@@ -327,7 +327,7 @@ void __init *bt_ioremap(unsigned long ph
 */
idx = FIX_BTMAP_BEGIN;
	while (nrpages > 0) {
-   bt_set_fixmap(idx, phys_addr);
+   early_set_fixmap(idx, phys_addr);
phys_addr += PAGE_SIZE;
--idx;
--nrpages;
@@ -335,7 +335,7 @@ void __init *bt_ioremap(unsigned long ph
return (void*) (offset + fix_to_virt(FIX_BTMAP_BEGIN));
 }
 
-void __init bt_iounmap(void *addr, unsigned long size)
+void __init early_iounmap(void *addr, unsigned long size)
 {
unsigned long virt_addr

[PATCH -mm 1/2 -v2] kexec/i386: kexec page table code clean up - add arch_kimage

2008-01-14 Thread Huang, Ying
This patch adds an architecture-specific struct arch_kimage to struct
kimage. Three pointers to the page table pages used by kexec are added to
struct arch_kimage. The page table pages are dynamically allocated in
machine_kexec_prepare instead of statically from the BSS segment. This
saves up to 20k of memory when no kexec image is loaded.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/machine_kexec_32.c |   70 +
 include/asm-x86/kexec_32.h |   12 ++
 include/linux/kexec.h  |4 ++
 3 files changed, 64 insertions(+), 22 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -11,6 +11,7 @@
 #include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/numa.h>
+#include <linux/gfp.h>
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
@@ -21,15 +22,6 @@
 #include <asm/desc.h>
 #include <asm/system.h>
 
-#define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
-static u32 kexec_pgd[1024] PAGE_ALIGNED;
-#ifdef CONFIG_X86_PAE
-static u32 kexec_pmd0[1024] PAGE_ALIGNED;
-static u32 kexec_pmd1[1024] PAGE_ALIGNED;
-#endif
-static u32 kexec_pte0[1024] PAGE_ALIGNED;
-static u32 kexec_pte1[1024] PAGE_ALIGNED;
-
 static void set_idt(void *newidt, __u16 limit)
 {
struct Xgt_desc_struct curidt;
@@ -72,6 +64,39 @@ static void load_segments(void)
 #undef __STR
 }
 
+static void machine_kexec_free_page_tables(struct kimage *image)
+{
+   free_page((unsigned long)image->arch_kimage.pgd);
+#ifdef CONFIG_X86_PAE
+   free_page((unsigned long)image->arch_kimage.pmd0);
+   free_page((unsigned long)image->arch_kimage.pmd1);
+#endif
+   free_page((unsigned long)image->arch_kimage.pte0);
+   free_page((unsigned long)image->arch_kimage.pte1);
+}
+
+static int machine_kexec_alloc_page_tables(struct kimage *image)
+{
+   image->arch_kimage.pgd = (pgd_t *)get_zeroed_page(GFP_KERNEL);
+#ifdef CONFIG_X86_PAE
+   image->arch_kimage.pmd0 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch_kimage.pmd1 = (pmd_t *)get_zeroed_page(GFP_KERNEL);
+#endif
+   image->arch_kimage.pte0 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   image->arch_kimage.pte1 = (pte_t *)get_zeroed_page(GFP_KERNEL);
+   if (!image->arch_kimage.pgd ||
+#ifdef CONFIG_X86_PAE
+   !image->arch_kimage.pmd0 ||
+   !image->arch_kimage.pmd1 ||
+#endif
+   !image->arch_kimage.pte0 ||
+   !image->arch_kimage.pte1) {
+   machine_kexec_free_page_tables(image);
+   return -ENOMEM;
+   }
+   return 0;
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -83,11 +108,11 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * - Allocate page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
-   return 0;
+   return machine_kexec_alloc_page_tables(image);
 }
 
 /*
@@ -96,6 +121,7 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+   machine_kexec_free_page_tables(image);
 }
 
 /*
@@ -115,18 +141,18 @@ NORET_TYPE void machine_kexec(struct kim
 
page_list[PA_CONTROL_PAGE] = __pa(control_page);
page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
-   page_list[PA_PGD] = __pa(kexec_pgd);
-   page_list[VA_PGD] = (unsigned long)kexec_pgd;
+   page_list[PA_PGD] = __pa(image->arch_kimage.pgd);
+   page_list[VA_PGD] = (unsigned long)image->arch_kimage.pgd;
 #ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(kexec_pmd0);
-   page_list[VA_PMD_0] = (unsigned long)kexec_pmd0;
-   page_list[PA_PMD_1] = __pa(kexec_pmd1);
-   page_list[VA_PMD_1] = (unsigned long)kexec_pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(kexec_pte0);
-   page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
-   page_list[PA_PTE_1] = __pa(kexec_pte1);
-   page_list[VA_PTE_1] = (unsigned long)kexec_pte1;
+   page_list[PA_PMD_0] = __pa(image->arch_kimage.pmd0);
+   page_list[VA_PMD_0] = (unsigned long)image->arch_kimage.pmd0;
+   page_list[PA_PMD_1] = __pa(image->arch_kimage.pmd1);
+   page_list[VA_PMD_1] = (unsigned long)image->arch_kimage.pmd1;
+#endif
+   page_list[PA_PTE_0] = __pa(image->arch_kimage.pte0);
+   page_list[VA_PTE_0] = (unsigned long)image->arch_kimage.pte0;
+   page_list[PA_PTE_1] = __pa(image->arch_kimage.pte1);
+   page_list[VA_PTE_1] = (unsigned long)image->arch_kimage.pte1;
 
/* The segment registers are funny things, they have both a
 * visible and an invisible part.  Whenever the visible part is
--- a/include/asm-x86/kexec_32.h
+++ b/include/asm-x86/kexec_32.h
@@ -94,6 +94,18 @@ relocate_kernel(unsigned long indirectio
unsigned long start_address,
unsigned int has_pae) ATTRIB_NORET;
 
+#define ARCH_HAS_ARCH_KIMAGE
+
+struct

[PATCH -mm 0/2 -v2] kexec/i386: kexec page table code clean up

2008-01-14 Thread Huang, Ying
This patchset cleans up page table setup code of kexec on i386.

This patchset is based on 2.6.24-rc5-mm1 and has been tested on i386
with/without PAE enabled.


v2:

- Rename some functions, such as alloc_page_tables ->
  machine_kexec_alloc_page_tables, etc.

- Clean up error processing for machine_kexec_alloc_page_tables.


Best Regards,
Huang Ying



[PATCH -mm 2/2 -v2] kexec/i386: kexec page table code clean up - page table setup in C

2008-01-14 Thread Huang, Ying
This patch transforms the kexec page table setup code from assembler
code to C code in machine_kexec_prepare. This improves readability and
reduces the line count.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/machine_kexec_32.c   |   59 ++
 arch/x86/kernel/relocate_kernel_32.S |  114 ---
 include/asm-x86/kexec_32.h   |   18 -
 3 files changed, 48 insertions(+), 143 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -97,6 +97,45 @@ static int machine_kexec_alloc_page_tabl
return 0;
 }
 
+static void machine_kexec_page_table_set_one(
+   pgd_t *pgd, pmd_t *pmd, pte_t *pte,
+   unsigned long vaddr, unsigned long paddr)
+{
+   pud_t *pud;
+
+   pgd += pgd_index(vaddr);
+#ifdef CONFIG_X86_PAE
+   if (!(pgd_val(*pgd) & _PAGE_PRESENT))
+   set_pgd(pgd, __pgd(__pa(pmd) | _PAGE_PRESENT));
+#endif
+   pud = pud_offset(pgd, vaddr);
+   pmd = pmd_offset(pud, vaddr);
+   if (!(pmd_val(*pmd) & _PAGE_PRESENT))
+   set_pmd(pmd, __pmd(__pa(pte) | _PAGE_TABLE));
+   pte = pte_offset_kernel(pmd, vaddr);
+   set_pte(pte, pfn_pte(paddr >> PAGE_SHIFT, PAGE_KERNEL_EXEC));
+}
+
+static void machine_kexec_prepare_page_tables(struct kimage *image)
+{
+   void *control_page;
+   pmd_t *pmd = 0;
+
+   control_page = page_address(image->control_code_page);
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch_kimage.pmd0;
+#endif
+   machine_kexec_page_table_set_one(
+   image->arch_kimage.pgd, pmd, image->arch_kimage.pte0,
+   (unsigned long)relocate_kernel, __pa(control_page));
+#ifdef CONFIG_X86_PAE
+   pmd = image->arch_kimage.pmd1;
+#endif
+   machine_kexec_page_table_set_one(
+   image->arch_kimage.pgd, pmd, image->arch_kimage.pte1,
+   __pa(control_page), __pa(control_page));
+}
+
 /*
  * A architecture hook called to validate the
  * proposed image and prepare the control pages
@@ -109,10 +148,16 @@ static int machine_kexec_alloc_page_tabl
  * later.
  *
  * - Allocate page tables
+ * - Setup page tables
  */
 int machine_kexec_prepare(struct kimage *image)
 {
-   return machine_kexec_alloc_page_tables(image);
+   int error;
+   error = machine_kexec_alloc_page_tables(image);
+   if (error)
+   return error;
+   machine_kexec_prepare_page_tables(image);
+   return 0;
 }
 
 /*
@@ -140,19 +185,7 @@ NORET_TYPE void machine_kexec(struct kim
memcpy(control_page, relocate_kernel, PAGE_SIZE);
 
page_list[PA_CONTROL_PAGE] = __pa(control_page);
-   page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
page_list[PA_PGD] = __pa(image-arch_kimage.pgd);
-   page_list[VA_PGD] = (unsigned long)image-arch_kimage.pgd;
-#ifdef CONFIG_X86_PAE
-   page_list[PA_PMD_0] = __pa(image-arch_kimage.pmd0);
-   page_list[VA_PMD_0] = (unsigned long)image-arch_kimage.pmd0;
-   page_list[PA_PMD_1] = __pa(image-arch_kimage.pmd1);
-   page_list[VA_PMD_1] = (unsigned long)image-arch_kimage.pmd1;
-#endif
-   page_list[PA_PTE_0] = __pa(image-arch_kimage.pte0);
-   page_list[VA_PTE_0] = (unsigned long)image-arch_kimage.pte0;
-   page_list[PA_PTE_1] = __pa(image-arch_kimage.pte1);
-   page_list[VA_PTE_1] = (unsigned long)image-arch_kimage.pte1;
 
/* The segment registers are funny things, they have both a
 * visible and an invisible part.  Whenever the visible part is
--- a/arch/x86/kernel/relocate_kernel_32.S
+++ b/arch/x86/kernel/relocate_kernel_32.S
@@ -16,126 +16,12 @@
 
 #define PTR(x) (x  2)
 #define PAGE_ALIGNED (1  PAGE_SHIFT)
-#define PAGE_ATTR 0x63 /* _PAGE_PRESENT|_PAGE_RW|_PAGE_ACCESSED|_PAGE_DIRTY */
-#define PAE_PGD_ATTR 0x01 /* _PAGE_PRESENT */
 
.text
.align PAGE_ALIGNED
.globl relocate_kernel
 relocate_kernel:
movl8(%esp), %ebp /* list of pages */
-
-#ifdef CONFIG_X86_PAE
-   /* map the control page at its virtual address */
-
-   movlPTR(VA_PGD)(%ebp), %edi
-   movlPTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl$0xc000, %eax
-   shrl$27, %eax
-   addl%edi, %eax
-
-   movlPTR(PA_PMD_0)(%ebp), %edx
-   orl $PAE_PGD_ATTR, %edx
-   movl%edx, (%eax)
-
-   movlPTR(VA_PMD_0)(%ebp), %edi
-   movlPTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl$0x3fe0, %eax
-   shrl$18, %eax
-   addl%edi, %eax
-
-   movlPTR(PA_PTE_0)(%ebp), %edx
-   orl $PAGE_ATTR, %edx
-   movl%edx, (%eax)
-
-   movlPTR(VA_PTE_0)(%ebp), %edi
-   movlPTR(VA_CONTROL_PAGE)(%ebp), %eax
-   andl$0x001ff000, %eax
-   shrl$9, %eax
-   addl%edi, %eax
-
-   movlPTR(PA_CONTROL_PAGE)(%ebp), %edx
-   orl $PAGE_ATTR, %edx
-   movl%edx, (%eax)
-
-   /* identity map the control page at its physical

Re: [PATCH -mm 0/3] i386 boot: replace boot_ioremap with enhanced bt_ioremap

2008-01-15 Thread Huang, Ying
On Tue, 2008-01-15 at 09:44 +0100, Ingo Molnar wrote:
 * Huang, Ying [EMAIL PROTECTED] wrote:
 
  This patchset replaces boot_ioremap with a enhanced version of 
  bt_ioremap and renames the bt_ioremap to early_ioremap. This reduces 
  12k from .init.data segment and increases the size of memory that can 
  be re-mapped before paging_init to 64k.
 
 in latest x86.git#mm there's an early_ioremap() introduced as part of 
 the PAT series - available on both 32-bit and 64-bit. Could you take a 
 look at it and use that if it's OK for your purposes?

After checking the early_ioremap() implementation in
arch/x86/kernel/setup_32.c, I found that it is a duplicate of the
bt_ioremap() implementation in arch/x86/mm/ioremap_32.c. Both
implementations use set_fixmap(), so they can be used only after
paging_init().

The early_ioremap implementation provided in this patchset works as
follows:

- Enhance bt_ioremap, making it usable before paging_init() via a
  dedicated PTE page.
- Rename bt_ioremap to early_ioremap.

So I think maybe we should replace the early_ioremap() implementation in
PAT series with that of this series.

Best Regards,
Huang Ying



Re: [BUGFIX] x86_64: NX bit handling in change_page_attr

2007-09-12 Thread Huang, Ying
On Wed, 2007-09-12 at 15:35 +0200, Andi Kleen wrote:
   Index: linux-2.6.23-rc2-mm2/arch/x86_64/mm/pageattr.c
   ===
   --- linux-2.6.23-rc2-mm2.orig/arch/x86_64/mm/pageattr.c   2007-08-17
   12:50:25.0 +0800 +++
   linux-2.6.23-rc2-mm2/arch/x86_64/mm/pageattr.c2007-08-17
   12:50:48.0 +0800 @@ -147,6 +147,7 @@
 split = split_large_page(address, prot, ref_prot2);
 if (!split)
 return -ENOMEM;
   + pgprot_val(ref_prot2) &= ~_PAGE_NX;
 set_pte(kpte, mk_pte(split, ref_prot2));
 kpte_page = split;
 }
 
  What happened with this?  Still valid?
 
 The bug is probably latent there, but I don't think it can affect anything
 in the kernel because nothing in the kernel should change NX status
 as far as I know.
 
 Where did you see it? 

I found the problem while working on EFI runtime service support, where
the EFI runtime code (from firmware) needs to be mapped without the NX bit
set.

 Anyways I would prefer to only clear the PMD NX when NX status actually
 changes on the PTE. Can you do that change?

This change is sufficient for Intel CPUs, because the NX bit of the PTE is
still there: no page will be made executable unless it has been set
explicitly through the PTE. For AMD CPUs, will the page be made executable
if the NX bit of the PMD is cleared and the NX bit of the PTE is set? If
so, I will make the change as you said.

 Anyways; it's really not very important.

It is needed for EFI runtime service support.

Best Regards,
Huang Ying


[RFC -mm 1/2] i386/x86_64 boot: setup data

2007-09-17 Thread Huang, Ying
This patch adds a field holding a 64-bit physical pointer to a
NULL-terminated singly linked list of struct setup_data to the real-mode
kernel header. This is used to define a more extensible boot parameter
passing mechanism.
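
For illustration, a boot loader using this mechanism could chain two nodes
as in the sketch below (hypothetical example code: the struct layout
matches this patch, but the SETUP_EXAMPLE type value and the buffer
placement are invented for illustration):

  /* Hypothetical boot-loader-side sketch: build a two-node setup_data list
   * and point the kernel header field (offset 0x240) at its head. */
  #include <stdint.h>
  #include <string.h>

  struct setup_data {
          uint64_t next;   /* physical address of the next node, 0 terminates */
          uint32_t type;   /* identifies the payload in data[] */
          uint32_t len;    /* length of data[] in bytes */
          uint8_t  data[];
  };

  #define SETUP_EXAMPLE 0x1000     /* invented type value, for illustration */

  /* node0/node1 point at memory the boot loader has reserved for the nodes,
   * node0_phys/node1_phys are their physical addresses, and hdr_setup_data
   * points at the setup_data field of the real-mode kernel header. */
  static void chain_setup_data(struct setup_data *node0, uint64_t node0_phys,
                               struct setup_data *node1, uint64_t node1_phys,
                               uint64_t *hdr_setup_data,
                               const void *payload, uint32_t len)
  {
          node1->next = 0;                 /* last node terminates the list */
          node1->type = SETUP_EXAMPLE;
          node1->len  = len;
          memcpy(node1->data, payload, len);

          node0->next = node1_phys;        /* physical pointer to next node */
          node0->type = SETUP_EXAMPLE;
          node0->len  = len;
          memcpy(node0->data, payload, len);

          *hdr_setup_data = node0_phys;    /* list head goes into the header */
  }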

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/i386/Kconfig|3 ---
 arch/i386/boot/header.S  |6 ++
 arch/i386/kernel/setup.c |   20 
 arch/x86_64/kernel/setup.c   |   19 +++
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h|7 +++
 6 files changed, 67 insertions(+), 3 deletions(-)

Index: linux-2.6.23-rc4/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc4.orig/include/asm-i386/bootparam.h  2007-09-17 
14:18:24.0 +0800
+++ linux-2.6.23-rc4/include/asm-i386/bootparam.h   2007-09-17 
15:02:33.0 +0800
@@ -9,6 +9,17 @@
 #include <asm/ist.h>
 #include <video/edid.h>
 
+/* setup data types */
+#define SETUP_NONE 0
+
+/* extensible setup data list node */
+struct setup_data {
+   u64 next;
+   u32 type;
+   u32 len;
+   u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
u8  setup_sects;
u16 root_flags;
@@ -41,6 +52,10 @@
u32 initrd_addr_max;
u32 kernel_alignment;
u8  relocatable_kernel;
+   u8  _pad2[3];
+   u32 cmdline_size;
+   u32 _pad3;
+   u64 setup_data;
 } __attribute__((packed));
 
 struct sys_desc_table {
Index: linux-2.6.23-rc4/arch/i386/boot/header.S
===
--- linux-2.6.23-rc4.orig/arch/i386/boot/header.S   2007-09-17 
14:17:32.0 +0800
+++ linux-2.6.23-rc4/arch/i386/boot/header.S2007-09-17 14:18:32.0 
+0800
@@ -214,6 +214,12 @@
 #added with boot protocol
 #version 2.06
 
+pad4:  .long 0
+
+setup_data:.quad 0 # 64-bit physical pointer to
+   # single linked list of
+   # struct setup_data
+
 # End of setup header #
 
.section .inittext, ax
Index: linux-2.6.23-rc4/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc4.orig/arch/x86_64/kernel/setup.c2007-09-17 
14:18:23.0 +0800
+++ linux-2.6.23-rc4/arch/x86_64/kernel/setup.c 2007-09-17 15:02:33.0 
+0800
@@ -221,6 +221,23 @@
ebda_size = 64*1024;
 }
 
+void __init parse_setup_data(void)
+{
+   struct setup_data *setup_data;
+   unsigned long pa_setup_data;
+
+   pa_setup_data = boot_params.hdr.setup_data;
+   while (pa_setup_data) {
+   setup_data = early_ioremap(pa_setup_data, PAGE_SIZE);
+   switch (setup_data->type) {
+   default:
+   break;
+   }
+   pa_setup_data = setup_data->next;
+   early_iounmap(setup_data, PAGE_SIZE);
+   }
+}
+
 void __init setup_arch(char **cmdline_p)
 {
	printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -256,6 +273,8 @@
strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
 
+   parse_setup_data();
+
parse_early_param();
 
finish_e820_parsing();
Index: linux-2.6.23-rc4/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc4.orig/arch/i386/kernel/setup.c  2007-09-17 
14:18:23.0 +0800
+++ linux-2.6.23-rc4/arch/i386/kernel/setup.c   2007-09-17 14:18:32.0 
+0800
@@ -496,6 +496,23 @@
return machine_specific_memory_setup();
 }
 
+void __init parse_setup_data(void)
+{
+   struct setup_data *setup_data;
+   unsigned long pa_setup_data, pa_next;
+
+   pa_setup_data = boot_params.hdr.setup_data;
+   while (pa_setup_data) {
+   setup_data = boot_ioremap(pa_setup_data, PAGE_SIZE);
+   pa_next = setup_data->next;
+   switch (setup_data->type) {
+   default:
+   break;
+   }
+   pa_setup_data = pa_next;
+   }
+}
+
 /*
  * Determine if we were loaded by an EFI loader.  If so, then we have also been
  * passed the efi memmap, systab, etc., so we should use these data structures
@@ -544,6 +561,9 @@
	rd_prompt = ((boot_params.hdr.ram_size & RAMDISK_PROMPT_FLAG) != 0);
	rd_doload = ((boot_params.hdr.ram_size & RAMDISK_LOAD_FLAG) != 0);
 #endif
+
+   parse_setup_data();
+
ARCH_SETUP
if (efi_enabled)
efi_init();
Index: linux-2.6.23-rc4/include/asm-i386/io.h

[RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-17 Thread Huang, Ying
This patch defines a 32-bit boot protocol and adds the corresponding
documentation.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 boot.txt |  105 ++-
 1 file changed, 104 insertions(+), 1 deletion(-)

Index: linux-2.6.23-rc4/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc4.orig/Documentation/i386/boot.txt   2007-09-17 
11:22:32.0 +0800
+++ linux-2.6.23-rc4/Documentation/i386/boot.txt2007-09-17 
11:34:10.0 +0800
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin [EMAIL PROTECTED]
-   Last update 2007-05-23
+   Last update 2007-09-14
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
the boot command line
 
+Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical
+   pointer to single linked list of struct setup_data.
+   Added 32-bit boot protocol.
 
  MEMORY LAYOUT
 
@@ -168,6 +171,9 @@
 0234/1 2.05+   relocatable_kernel Whether kernel is relocatable or not
 0235/3 N/A pad2Unused
 0238/4 2.06+   cmdline_sizeMaximum size of the kernel command line
+023c/4 N/A pad3Unused
+0240/8 2.07+   setup_data  64-bit physical pointer to linked list
+   of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -480,6 +486,36 @@
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:setup_data
+Type:  write (obligatory)
+Offset/size:   0x240/8
+Protocol:  2.07+
+
+  The 64-bit physical pointer to a NULL-terminated singly linked list of
+  struct setup_data. This is used to define a more extensible boot
+  parameter passing mechanism. The definition of struct setup_data is
+  as follows:
+
+  struct setup_data {
+ u64 next;
+ u32 type;
+ u32 len;
+ u8  data[0];
+  } __attribute__((packed));
+
+  Here, next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0); type is used
+  to identify the contents of data; len is the length of the data
+  field; data holds the real payload.
+
+  With this field, adding a new boot parameter written by the bootloader
+  no longer requires adding a new field to the real-mode header; adding a
+  new setup_data type is sufficient. But adding a new boot parameter
+  read by the bootloader still requires adding a new field.
+
+  TODO: Where is the safe place to place the linked list of struct
+   setup_data?
+
 
  THE KERNEL COMMAND LINE
 
@@ -753,3 +789,70 @@
After completing your hook, you should jump to the address
that was in this field before your boot loader overwrote it
(relocated, if appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machines with some new BIOS other than legacy BIOS, such as EFI,
+LinuxBIOS, etc., and for kexec, the 16-bit real-mode setup code in the
+kernel, which is based on legacy BIOS, cannot be used, so a 32-bit boot
+protocol needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. However, it is not necessary to load all of the
+real-mode code; just the first 4K bytes, traditionally known as the zero
+page, are needed.
+
+In addition to reading/modifying/writing the kernel header in the zero
+page, as with the 16-bit boot protocol, the boot loader should also fill
+the following additional fields of the zero page.
+
+Offset TypeDescription
+--     ---
+0  32 bytesstruct screen_info, SCREEN_INFO
+   ATTENTION, overlaps the following !!!
+2  unsigned short  EXT_MEM_K, extended memory size in Kb (from int 0x15)
+ 0x20  unsigned short  CL_MAGIC, commandline magic number (=0xA33F)
+ 0x22  unsigned short  CL_OFFSET, commandline offset
+   Address of commandline is calculated:
+ 0x9 + contents of CL_OFFSET
+   (only taken, when CL_MAGIC = 0xA33F)
+ 0x40  20 bytesstruct apm_bios_info, APM_BIOS_INFO
+ 0x60  16 bytesIntel SpeedStep (IST) BIOS support information
+ 0x80  16 byteshd0-disk-parameter from intvector 0x41
+ 0x90  16 byteshd1-disk-parameter from intvector 0x46
+
+ 0xa0  16 bytesSystem description table truncated to 16 bytes.
+   ( struct sys_desc_table_struct )
+ 0xb0 - 0x13f  Free. Add more parameters here if you really need them.
+ 0x140- 0x1be  EDID_INFO Video

[RFC -mm 0/2] i386/x86_64 boot: 32-bit boot protocol

2007-09-17 Thread Huang, Ying
For machines with some new BIOS other than legacy BIOS, such as EFI,
LinuxBIOS, etc., and for kexec, the 16-bit real-mode setup code in the
kernel, which is based on legacy BIOS, cannot be used, so a 32-bit boot
protocol needs to be defined.

This patchset defines a 32-bit boot protocol for i386 and x86_64. A
linked-list-based boot parameter passing mechanism is also added to
improve extensibility.

The patchset has been tested against the 2.6.23-rc4-mm1 kernel on x86_64.

This patchset is based on Peter Anvin's proposal.

Known Issues:

1. Where is it safe to place the linked list of setup_data?

Because the length of the linked list of setup_data is variable, it
cannot be copied into the BSS segment of the kernel as the zero page
is. We must find a safe place for it, where it will not be overwritten
by the kernel during boot. The i386 kernel will overwrite some pages
after _end. The x86_64 kernel will overwrite some pages from 0x1000 on.

EFI64 runtime service support is the first user of the 32-bit boot
protocol and the boot parameter passing mechanism. To demonstrate their
usage, the EFI64 runtime service patch is also appended to this mail.

Best Regards,
Huang Ying

---

 Documentation/i386/boot.txt|   15 
 Documentation/x86_64/boot-options.txt  |   12 
 arch/x86_64/Kconfig|   11 
 arch/x86_64/kernel/Makefile|1 
 arch/x86_64/kernel/efi.c   |  597 +
 arch/x86_64/kernel/efi_callwrap.S  |   69 +++
 arch/x86_64/kernel/reboot.c|   20 -
 arch/x86_64/kernel/setup.c |   14 
 arch/x86_64/kernel/time.c  |   48 +-
 include/asm-i386/bootparam.h   |   10 
 include/asm-x86_64/efi.h   |   18 
 include/asm-x86_64/eficallwrap.h   |   33 +
 include/asm-x86_64/emergency-restart.h |9 
 include/asm-x86_64/fixmap.h|3 
 include/asm-x86_64/time.h  |7 
 15 files changed, 842 insertions(+), 25 deletions(-)

Index: linux-2.6.23-rc4/include/asm-x86_64/eficallwrap.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc4/include/asm-x86_64/eficallwrap.h   2007-09-17 
15:03:47.0 +0800
@@ -0,0 +1,33 @@
+/*
+ *  Copyright (C) 2007 Intel Corp
+ * Bibo Mao [EMAIL PROTECTED]
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ *  Function calling ABI conversion from SYSV to Windows for x86_64
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ *
+ */
+
+#ifndef __ASM_X86_64_EFICALLWRAP_H
+#define __ASM_X86_64_EFICALLWRAP_H
+
+extern efi_status_t lin2win0(void *fp);
+extern efi_status_t lin2win1(void *fp, u64 arg1);
+extern efi_status_t lin2win2(void *fp, u64 arg1, u64 arg2);
+extern efi_status_t lin2win3(void *fp, u64 arg1, u64 arg2, u64 arg3);
+extern efi_status_t lin2win4(void *fp, u64 arg1, u64 arg2, u64 arg3, u64 arg4);
+extern efi_status_t lin2win5(void *fp, u64 arg1, u64 arg2, u64 arg3,
+u64 arg4, u64 arg5);
+extern efi_status_t lin2win6(void *fp, u64 arg1, u64 arg2, u64 arg3,
+u64 arg4, u64 arg5, u64 arg6);
+
+#endif
Index: linux-2.6.23-rc4/arch/x86_64/kernel/efi.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.23-rc4/arch/x86_64/kernel/efi.c   2007-09-17 15:03:47.0 
+0800
@@ -0,0 +1,597 @@
+/*
+ * Extensible Firmware Interface
+ *
+ * Based on Extensible Firmware Interface Specification version 1.0
+ *
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999 Walt Drummond [EMAIL PROTECTED]
+ * Copyright (C) 1999-2002 Hewlett-Packard Co.
+ * David Mosberger-Tang [EMAIL PROTECTED]
+ * Stephane Eranian [EMAIL PROTECTED]
+ * Copyright (C) 2005-2008 Intel Co.
+ * Fenghua Yu [EMAIL PROTECTED]
+ * Bibo Mao [EMAIL PROTECTED]
+ * Chandramouli Narayanan [EMAIL PROTECTED]
+ *
+ * Code to convert EFI to E820 map has been implemented in elilo bootloader
+ * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table
+ * is setup appropriately for EFI runtime code.
+ * - mouli 06/14/2007.
+ *
+ * All EFI Runtime Services are not implemented yet as EFI only
+ * supports physical mode addressing on SoftSDV. This is to be fixed
+ * in a future version.  --drummond 1999-07-20
+ *
+ * Implemented EFI runtime services and virtual mode calls.  --davidm
+ *
+ * Goutham Rao: [EMAIL PROTECTED]
+ * Skip non-WB memory and ignore empty memory ranges

Re: [RFC -mm 0/2] i386/x86_64 boot: 32-bit boot protocol

2007-09-17 Thread Huang, Ying
On Mon, 2007-09-17 at 10:40 +0200, Andi Kleen wrote:
 On Monday 17 September 2007 10:26:12 Huang, Ying wrote:
  For machine with some new BIOS other than legacy BIOS, such as EFI,
  LinuxBIOS, etc, and kexec, the 16-bit real mode setup code in kernel
  based on legacy BIOS can not be used, so a 32-bit boot protocol need
  to be defined.
 
 The patch doesn't seem to be what you advertise in the description.
 Can you start with a patch that just implements the new boot protocol
 parsing for better review? The EFI code should be all in separate 
 patches.
 -Andi

The real contents of the 32-bit boot protocol patch are in another 2 mails
with the titles:

[RFC -mm 1/2] i386/x86_64 boot: setup data
[RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

The EFI patch in this mail is just an example of 32-bit boot protocol
usage.

Best Regards,
Huang Ying


Re: [RFC -mm 1/2] i386/x86_64 boot: setup data

2007-09-17 Thread Huang, Ying
On Mon, 2007-09-17 at 08:30 -0700, H. Peter Anvin wrote:
 Huang, Ying wrote:
  This patch add a field of 64-bit physical pointer to NULL terminated
  single linked list of struct setup_data to real-mode kernel
  header. This is used to define a more extensible boot parameters
  passing mechanism.
 
 You MUST NOT add a field like this without changing the version number,
 and, since you expect to enter the kernel at the PM entrypoint, you
 better *CHECK* that version number before ever descending down the chain.
 

I forgot to change the version number in boot/head.S. I will add it. And
I will add version number checking before descending down the chain.

Best Regards,
Huang Ying


Re: [RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-17 Thread Huang, Ying
On Mon, 2007-09-17 at 08:29 -0700, H. Peter Anvin wrote:
 Huang, Ying wrote:
  This patch defines a 32-bit boot protocol and adds corresponding
  document.
  +
  +In addition to read/modify/write kernel header of the zero page as
  +that of 16-bit boot protocol, the boot loader should fill the
  +following additional fields of the zero page too.
  +
  +Offset TypeDescription
  +--     ---
  +0  32 bytesstruct screen_info, SCREEN_INFO
  +   ATTENTION, overlaps the following !!!
  +2  unsigned short  EXT_MEM_K, extended memory size in Kb (from int 
  0x15)
  + 0x20  unsigned short  CL_MAGIC, commandline magic number (=0xA33F)
  + 0x22  unsigned short  CL_OFFSET, commandline offset
  +   Address of commandline is calculated:
  + 0x9 + contents of CL_OFFSET
  +   (only taken, when CL_MAGIC = 0xA33F)
  + 0x40  20 bytesstruct apm_bios_info, APM_BIOS_INFO
  + 0x60  16 bytesIntel SpeedStep (IST) BIOS support information
  + 0x80  16 byteshd0-disk-parameter from intvector 0x41
  + 0x90  16 byteshd1-disk-parameter from intvector 0x46
  +
  + 0xa0  16 bytesSystem description table truncated to 16 bytes.
  +   ( struct sys_desc_table_struct )
  + 0xb0 - 0x13f  Free. Add more parameters here if you really 
  need them.
  + 0x140- 0x1be  EDID_INFO Video mode setup
  +
  +0x1c4  unsigned long   EFI system table pointer
  +0x1c8  unsigned long   EFI memory descriptor size
  +0x1cc  unsigned long   EFI memory descriptor version
  +0x1d0  unsigned long   EFI memory descriptor map pointer
  +0x1d4  unsigned long   EFI memory descriptor map size
  +0x1e0  unsigned long   ALT_MEM_K, alternative mem check, in Kb
  +0x1e4  unsigned long   Scratch field for the kernel setup code
  +0x1e8  charnumber of entries in E820MAP (below)
  +0x1e9  unsigned char   number of entries in EDDBUF (below)
  +0x1ea  unsigned char   number of entries in EDD_MBR_SIG_BUFFER (below)
  +0x290 - 0x2cf  EDD_MBR_SIG_BUFFER (edd.S)
  +0x2d0 - 0xd00  E820MAP
  +0xd00 - 0xeff  EDDBUF (edd.S) for disk signature read sector
  +0xd00 - 0xeeb  EDDBUF (edd.S) for edd data
  +
  +After loading and setuping the zero page, the boot loader can load the
  +32/64-bit kernel in the same way as that of 16-bit boot protocol.
  +
  +In 32-bit boot protocol, the kernel is started by jumping to the
  +32-bit kernel entry point, which is the start address of loaded
  +32/64-bit kernel.
  +
  +At entry, the CPU must be in 32-bit protected mode with paging
  +disabled; the CS and DS must be 4G flat segments; %esi holds the base
  +address of the zero page; %esp, %ebp, %edi should be zero.
 
 This is just replicating the zero-page.txt document, which can best be
 described as a total lie -- compare with the actual structure.

OK, I will check the actual structure, and change the document
accordingly.

Best Regards,
Huang Ying


Re: [RFC -mm 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-17 Thread Huang, Ying
On Mon, 2007-09-17 at 18:48 -0700, H. Peter Anvin wrote:
 Huang, Ying wrote:
  
  OK, I will check the actual structure, and change the document
  accordingly.
  
 
 The best would probably be to fix zero-page.txt (and probably rename it
 something saner.)

Does the patch appended to this mail seem better?

If it is desired, I can move the zero page description into
zero-page.txt, and refer to it in 32-bit boot protocol description.

I deleted the hd0_info and hd1_info from the zero page. If that is
undesired, I will move them back.

The fields in the zero page are fairly complex (such as struct
edd_info). Do you think it is necessary to document every field inside
the first-level fields, down to the primary data types? Or should we
just provide the C struct name?

Best Regards,
Huang Ying

---

Index: linux-2.6.23-rc4/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc4.orig/Documentation/i386/boot.txt   2007-09-18 
10:40:34.0 +0800
+++ linux-2.6.23-rc4/Documentation/i386/boot.txt2007-09-18 
10:46:13.0 +0800
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin [EMAIL PROTECTED]
-   Last update 2007-05-23
+   Last update 2007-09-14
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
the boot command line
 
+Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical
+   pointer to single linked list of struct setup_data.
+   Added 32-bit boot protocol.
 
  MEMORY LAYOUT
 
@@ -168,6 +171,9 @@
0234/1	2.05+	relocatable_kernel	Whether kernel is relocatable or not
0235/3	N/A	pad2		Unused
0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+023c/4	N/A	pad3		Unused
+0240/8	2.07+	setup_data	64-bit physical pointer to linked list
+			of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -480,6 +486,36 @@
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:	setup_data
+Type:		write (obligatory)
+Offset/size:	0x240/8
+Protocol:	2.07+
+
+  The 64-bit physical pointer to NULL terminated single linked list of
+  struct setup_data. This is used to define a more extensible boot
+  parameters passing mechanism. The definition of struct setup_data is
+  as follows:
+
+  struct setup_data {
+ u64 next;
+ u32 type;
+ u32 len;
+ u8  data[0];
+  } __attribute__((packed));
+
+  Here, next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0); type identifies
+  the contents of data; len is the length of the data field; data
+  holds the real payload.
+
+  With this field, adding a new boot parameter written by the
+  bootloader no longer requires a new field in the real-mode header;
+  defining a new setup_data type is sufficient. Adding a new boot
+  parameter read by the bootloader still requires a new field.
+
+  TODO: Where is the safe place to place the linked list of struct
+   setup_data?
+
 
  THE KERNEL COMMAND LINE
 
@@ -753,3 +789,57 @@
After completing your hook, you should jump to the address
that was in this field before your boot loader overwrote it
(relocated, if appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machines with some new BIOS other than legacy BIOS, such as EFI,
+LinuxBIOS, etc., and for kexec, the 16-bit real-mode setup code in the
+kernel based on legacy BIOS cannot be used, so a 32-bit boot protocol
+needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. However, it is not necessary to load all of
+the real-mode code; only the first 4K bytes, traditionally known as
+the zero page, are needed.
+
+In addition to reading/modifying/writing the kernel header of the zero
+page as in the 16-bit boot protocol, the boot loader should also fill
+the following additional fields of the zero page.
+
+Offset	Proto	Name		Meaning
+/Size
+
+000/040	2.07+	screen_info	Text mode or frame buffer information
+				(struct screen_info)
+040/014	2.07+	apm_bios_info	APM BIOS information (struct apm_bios_info)
+060/010	2.07+	ist_info	Intel SpeedStep (IST) BIOS support information
+				(struct ist_info)
+0A0/010	2.07+	sys_desc_table	System description table
+				(struct sys_desc_table)
+140/080	2.07+	edid_info	Video mode setup

[PATCH -mm -v2 1/2] i386/x86_64 boot: setup data

2007-09-18 Thread Huang, Ying
This patch adds a field holding a 64-bit physical pointer to a
NULL-terminated singly linked list of struct setup_data to the
real-mode kernel header. This is used as a more extensible boot
parameter passing mechanism.
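
As an illustration only (not part of this patch), here is a
bootloader-side sketch of how such a list could be built.
setup_data_alloc() and virt_to_phys_addr() are hypothetical helpers
standing in for whatever allocator and address translation a given
bootloader uses:

/* Hypothetical bootloader-side sketch: prepend one node to the
 * setup_data list whose head is later written into the setup_data
 * field of the real-mode header. */
#include <stdint.h>
#include <string.h>

struct setup_data {
	uint64_t next;	/* physical address of the next node, 0 terminates */
	uint32_t type;	/* identifies the contents of data[] */
	uint32_t len;	/* length of data[] in bytes */
	uint8_t  data[0];
} __attribute__((packed));

/* hypothetical bootloader helpers */
extern void *setup_data_alloc(uint32_t size);
extern uint64_t virt_to_phys_addr(void *p);

static void prepend_setup_data(uint64_t *head_pa, uint32_t type,
			       const void *payload, uint32_t len)
{
	struct setup_data *sd = setup_data_alloc(sizeof(*sd) + len);

	sd->type = type;
	sd->len = len;
	memcpy(sd->data, payload, len);
	sd->next = *head_pa;		/* new node points at the old head */
	*head_pa = virt_to_phys_addr(sd);	/* becomes the new list head */
}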

This patch has been tested against 2.6.23-rc6-mm1 kernel on x86_64. It
is based on the proposal of Peter Anvin.


Known Issues:

1. Where is it safe to place the linked list of setup_data?
Because the length of the linked list of setup_data is variable, it
cannot be copied into the BSS segment of the kernel as the zero page
is. We must find a safe place for it, where it will not be
overwritten by the kernel during boot. The i386 kernel will
overwrite some pages after _end. The x86_64 kernel will overwrite some
pages from 0x1000 on.


ChangeLog:

-- v2 --

- Increase the boot protocol version number.
- Check version number before parsing setup_data.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/i386/Kconfig|3 ---
 arch/i386/boot/header.S  |8 +++-
 arch/i386/kernel/setup.c |   22 ++
 arch/x86_64/kernel/setup.c   |   21 +
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h|7 +++
 6 files changed, 72 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc6/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc6.orig/include/asm-i386/bootparam.h  2007-09-19 
10:00:06.0 +0800
+++ linux-2.6.23-rc6/include/asm-i386/bootparam.h   2007-09-19 
10:00:08.0 +0800
@@ -9,6 +9,17 @@
 #include <asm/ist.h>
 #include <video/edid.h>
 
+/* setup data types */
+#define SETUP_NONE 0
+
+/* extensible setup data list node */
+struct setup_data {
+   u64 next;
+   u32 type;
+   u32 len;
+   u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
u8  setup_sects;
u16 root_flags;
@@ -41,6 +52,10 @@
u32 initrd_addr_max;
u32 kernel_alignment;
u8  relocatable_kernel;
+   u8  _pad2[3];
+   u32 cmdline_size;
+   u32 _pad3;
+   u64 setup_data;
 } __attribute__((packed));
 
 struct sys_desc_table {
Index: linux-2.6.23-rc6/arch/i386/boot/header.S
===
--- linux-2.6.23-rc6.orig/arch/i386/boot/header.S   2007-09-11 
10:50:29.0 +0800
+++ linux-2.6.23-rc6/arch/i386/boot/header.S2007-09-19 10:00:09.0 
+0800
@@ -119,7 +119,7 @@
# Part 2 of the header, from the old setup.S
 
 	.ascii	"HdrS"		# header signature
-	.word	0x0206		# header version number (>= 0x0105)
+	.word	0x0207		# header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
@@ -214,6 +214,12 @@
 #added with boot protocol
 #version 2.06
 
+pad4:  .long 0
+
+setup_data:.quad 0 # 64-bit physical pointer to
+   # single linked list of
+   # struct setup_data
+
 # End of setup header #
 
.section .inittext, ax
Index: linux-2.6.23-rc6/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/x86_64/kernel/setup.c2007-09-19 
10:00:00.0 +0800
+++ linux-2.6.23-rc6/arch/x86_64/kernel/setup.c 2007-09-19 10:00:09.0 
+0800
@@ -221,6 +221,25 @@
ebda_size = 64*1024;
 }
 
+void __init parse_setup_data(void)
+{
+   struct setup_data *setup_data;
+   unsigned long pa_setup_data;
+
+	if (boot_params.hdr.version < 0x0207)
+		return;
+	pa_setup_data = boot_params.hdr.setup_data;
+	while (pa_setup_data) {
+		setup_data = early_ioremap(pa_setup_data, PAGE_SIZE);
+		switch (setup_data->type) {
+		default:
+			break;
+		}
+		pa_setup_data = setup_data->next;
+   early_iounmap(setup_data, PAGE_SIZE);
+   }
+}
+
 void __init setup_arch(char **cmdline_p)
 {
printk(KERN_INFO Command line: %s\n, boot_command_line);
@@ -256,6 +275,8 @@
strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
 
+   parse_setup_data();
+
parse_early_param();
 
finish_e820_parsing();
Index: linux-2.6.23-rc6/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/i386/kernel/setup.c  2007-09-19 
09:59:59.0 +0800

[PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-18 Thread Huang, Ying
This patch defines a 32-bit boot protocol and adds corresponding
document. It is based on the proposal of Peter Anvin.


Known issues:

- The hd0_info and hd1_info are deleted from the zero page. Should
  additional work be done for this? Or is this unnecessary (because no
  new fields will be added to the zero page)?

- The fields in the zero page are fairly complex (such as struct
  edd_info). Is it necessary to document every field inside the
  first-level fields, down to the primary data types? Or is it
  sufficient to provide the C struct name only?


ChangeLog:

-- v2 --

- Revise zero page description according to the source code and move
  them to zero-page.txt.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 boot.txt  |   70 +++
 zero-page.txt |  127 --
 2 files changed, 97 insertions(+), 100 deletions(-)

Index: linux-2.6.23-rc6/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc6.orig/Documentation/i386/boot.txt   2007-09-11 
10:50:29.0 +0800
+++ linux-2.6.23-rc6/Documentation/i386/boot.txt2007-09-19 
10:00:18.0 +0800
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin [EMAIL PROTECTED]
-   Last update 2007-05-23
+   Last update 2007-09-18
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
the boot command line
 
+Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical
+   pointer to single linked list of struct setup_data.
+   Added 32-bit boot protocol.
 
  MEMORY LAYOUT
 
@@ -168,6 +171,9 @@
0234/1	2.05+	relocatable_kernel	Whether kernel is relocatable or not
0235/3	N/A	pad2		Unused
0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+023c/4	N/A	pad3		Unused
+0240/8	2.07+	setup_data	64-bit physical pointer to linked list
+			of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -480,6 +486,36 @@
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:	setup_data
+Type:		write (obligatory)
+Offset/size:	0x240/8
+Protocol:	2.07+
+
+  The 64-bit physical pointer to NULL terminated single linked list of
+  struct setup_data. This is used to define a more extensible boot
+  parameters passing mechanism. The definition of struct setup_data is
+  as follow:
+
+  struct setup_data {
+ u64 next;
+ u32 type;
+ u32 len;
+ u8  data[0];
+  } __attribute__((packed));
+
+  Here, next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0); type identifies
+  the contents of data; len is the length of the data field; data
+  holds the real payload.
+
+  With this field, adding a new boot parameter written by the
+  bootloader no longer requires a new field in the real-mode header;
+  defining a new setup_data type is sufficient. Adding a new boot
+  parameter read by the bootloader still requires a new field.
+
+  TODO: Where is the safe place to place the linked list of struct
+   setup_data?
+
 
  THE KERNEL COMMAND LINE
 
@@ -753,3 +789,35 @@
After completing your hook, you should jump to the address
that was in this field before your boot loader overwrote it
(relocated, if appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machines with some new BIOS other than legacy BIOS, such as EFI,
+LinuxBIOS, etc., and for kexec, the 16-bit real-mode setup code in the
+kernel based on legacy BIOS cannot be used, so a 32-bit boot protocol
+needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. However, it is not necessary to load all of
+the real-mode code; only the first 4K bytes, traditionally known as
+the zero page, are needed.
+
+In addition to reading/modifying/writing the kernel header of the zero
+page as in the 16-bit boot protocol, the boot loader should also fill
+the additional fields of the zero page as described in zero-page.txt.
+
+After loading and setting up the zero page, the boot loader can load
+the 32/64-bit kernel in the same way as in the 16-bit boot protocol.
+
+In 32-bit boot protocol, the kernel is started by jumping to the
+32-bit kernel entry point, which is the start address of loaded
+32/64-bit kernel.
+
+At entry, the CPU must be in 32-bit protected mode with paging
+disabled; the CS and DS must be 4G flat segments

Re: [PATCH -mm -v2 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-18 Thread Huang, Ying
On Tue, 2007-09-18 at 22:30 -0700, H. Peter Anvin wrote:
 Huang, Ying wrote:
  Known issues:
  
  - The hd0_info and hd1_info are deleted from the zero page. Additional
work should be done for this? Or this is unnecessary (because no new
fields will be added to zero page)?
  
 
 For backwards compatibility, they should be marked as there for the
 short-medium term so we don't reuse them for whatever reason.

OK, I will add them back.

Best Regards,
Huang Ying


[PATCH -mm -v3 1/2] i386/x86_64 boot: setup data

2007-09-19 Thread Huang, Ying
This patch adds a field holding a 64-bit physical pointer to a
NULL-terminated singly linked list of struct setup_data to the
real-mode kernel header. This is used as a more extensible boot
parameter passing mechanism.

This patch has been tested against 2.6.23-rc6-mm1 kernel on x86_64. It
is based on the proposal of Peter Anvin.


Known Issues:

1. Where is it safe to place the linked list of setup_data?
Because the length of the linked list of setup_data is variable, it
cannot be copied into the BSS segment of the kernel as the zero page
is. We must find a safe place for it, where it will not be
overwritten by the kernel during boot. The i386 kernel will
overwrite some pages after _end. The x86_64 kernel will overwrite some
pages from 0x1000 on.


ChangeLog:

-- v2 --

- Increase the boot protocol version number.
- Check version number before parsing setup_data.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/i386/Kconfig|3 ---
 arch/i386/boot/header.S  |8 +++-
 arch/i386/kernel/setup.c |   22 ++
 arch/x86_64/kernel/setup.c   |   21 +
 include/asm-i386/bootparam.h |   15 +++
 include/asm-i386/io.h|7 +++
 6 files changed, 72 insertions(+), 4 deletions(-)

Index: linux-2.6.23-rc6/include/asm-i386/bootparam.h
===
--- linux-2.6.23-rc6.orig/include/asm-i386/bootparam.h  2007-09-19 
10:22:02.0 +0800
+++ linux-2.6.23-rc6/include/asm-i386/bootparam.h   2007-09-19 
16:41:57.0 +0800
@@ -9,6 +9,17 @@
 #include <asm/ist.h>
 #include <video/edid.h>
 
+/* setup data types */
+#define SETUP_NONE 0
+
+/* extensible setup data list node */
+struct setup_data {
+   u64 next;
+   u32 type;
+   u32 len;
+   u8 data[0];
+} __attribute__((packed));
+
 struct setup_header {
u8  setup_sects;
u16 root_flags;
@@ -41,6 +52,10 @@
u32 initrd_addr_max;
u32 kernel_alignment;
u8  relocatable_kernel;
+   u8  _pad2[3];
+   u32 cmdline_size;
+   u32 _pad3;
+   u64 setup_data;
 } __attribute__((packed));
 
 struct sys_desc_table {
Index: linux-2.6.23-rc6/arch/i386/boot/header.S
===
--- linux-2.6.23-rc6.orig/arch/i386/boot/header.S   2007-09-19 
10:22:02.0 +0800
+++ linux-2.6.23-rc6/arch/i386/boot/header.S2007-09-19 10:47:34.0 
+0800
@@ -119,7 +119,7 @@
# Part 2 of the header, from the old setup.S
 
 	.ascii	"HdrS"		# header signature
-	.word	0x0206		# header version number (>= 0x0105)
+	.word	0x0207		# header version number (>= 0x0105)
# or else old loadlin-1.5 will fail)
.globl realmode_swtch
 realmode_swtch:.word   0, 0# default_switch, SETUPSEG
@@ -214,6 +214,12 @@
 #added with boot protocol
 #version 2.06
 
+pad4:  .long 0
+
+setup_data:.quad 0 # 64-bit physical pointer to
+   # single linked list of
+   # struct setup_data
+
 # End of setup header #
 
.section .inittext, ax
Index: linux-2.6.23-rc6/arch/x86_64/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/x86_64/kernel/setup.c2007-09-19 
10:22:02.0 +0800
+++ linux-2.6.23-rc6/arch/x86_64/kernel/setup.c 2007-09-19 16:41:57.0 
+0800
@@ -221,6 +221,25 @@
ebda_size = 64*1024;
 }
 
+void __init parse_setup_data(void)
+{
+   struct setup_data *setup_data;
+   unsigned long pa_setup_data;
+
+	if (boot_params.hdr.version < 0x0207)
+		return;
+	pa_setup_data = boot_params.hdr.setup_data;
+	while (pa_setup_data) {
+		setup_data = early_ioremap(pa_setup_data, PAGE_SIZE);
+		switch (setup_data->type) {
+		default:
+			break;
+		}
+		pa_setup_data = setup_data->next;
+   early_iounmap(setup_data, PAGE_SIZE);
+   }
+}
+
 void __init setup_arch(char **cmdline_p)
 {
printk(KERN_INFO Command line: %s\n, boot_command_line);
@@ -256,6 +275,8 @@
strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
 
+   parse_setup_data();
+
parse_early_param();
 
finish_e820_parsing();
Index: linux-2.6.23-rc6/arch/i386/kernel/setup.c
===
--- linux-2.6.23-rc6.orig/arch/i386/kernel/setup.c  2007-09-19 
10:22:02.0 +0800

[PATCH -mm -v3 2/2] i386/x86_64 boot: document for 32 bit boot protocol

2007-09-19 Thread Huang, Ying
This patch defines a 32-bit boot protocol and adds corresponding
document. It is based on the proposal of Peter Anvin.


Known issues:

- The fields in the zero page are fairly complex (such as struct
  edd_info). Is it necessary to document every field inside the
  first-level fields, down to the primary data types? Or is it
  sufficient to provide the C struct name only?


ChangeLog:

-- v3 --

- Move hd0_info and hd1_info back to zero page for compatibility.

-- v2 --

- Revise zero page description according to the source code and move
  them to zero-page.txt.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 boot.txt  |   70 +++
 zero-page.txt |  129 +-
 2 files changed, 99 insertions(+), 100 deletions(-)

Index: linux-2.6.23-rc6/Documentation/i386/boot.txt
===
--- linux-2.6.23-rc6.orig/Documentation/i386/boot.txt   2007-09-19 
16:45:23.0 +0800
+++ linux-2.6.23-rc6/Documentation/i386/boot.txt2007-09-19 
16:45:27.0 +0800
@@ -2,7 +2,7 @@
 
 
H. Peter Anvin [EMAIL PROTECTED]
-   Last update 2007-05-23
+   Last update 2007-09-18
 
 On the i386 platform, the Linux kernel uses a rather complicated boot
 convention.  This has evolved partially due to historical aspects, as
@@ -42,6 +42,9 @@
 Protocol 2.06: (Kernel 2.6.22) Added a field that contains the size of
the boot command line
 
+Protocol 2.07: (kernel 2.6.23) Added a field of 64-bit physical
+   pointer to single linked list of struct setup_data.
+   Added 32-bit boot protocol.
 
  MEMORY LAYOUT
 
@@ -168,6 +171,9 @@
0234/1	2.05+	relocatable_kernel	Whether kernel is relocatable or not
0235/3	N/A	pad2		Unused
0238/4	2.06+	cmdline_size	Maximum size of the kernel command line
+023c/4	N/A	pad3		Unused
+0240/8	2.07+	setup_data	64-bit physical pointer to linked list
+			of struct setup_data
 
 (1) For backwards compatibility, if the setup_sects field contains 0, the
 real value is 4.
@@ -480,6 +486,36 @@
   cmdline_size characters. With protocol version 2.05 and earlier, the
   maximum size was 255.
 
+Field name:	setup_data
+Type:		write (obligatory)
+Offset/size:	0x240/8
+Protocol:	2.07+
+
+  The 64-bit physical pointer to NULL terminated single linked list of
+  struct setup_data. This is used to define a more extensible boot
+  parameters passing mechanism. The definition of struct setup_data is
+  as follow:
+
+  struct setup_data {
+ u64 next;
+ u32 type;
+ u32 len;
+ u8  data[0];
+  } __attribute__((packed));
+
+  Here, next is a 64-bit physical pointer to the next node of the
+  linked list (the next field of the last node is 0); type identifies
+  the contents of data; len is the length of the data field; data
+  holds the real payload.
+
+  With this field, adding a new boot parameter written by the
+  bootloader no longer requires a new field in the real-mode header;
+  defining a new setup_data type is sufficient. Adding a new boot
+  parameter read by the bootloader still requires a new field.
+
+  TODO: Where is the safe place to place the linked list of struct
+   setup_data?
+
 
  THE KERNEL COMMAND LINE
 
@@ -753,3 +789,35 @@
After completing your hook, you should jump to the address
that was in this field before your boot loader overwrote it
(relocated, if appropriate.)
+
+
+ SETUP DATA TYPES
+
+
+ 32-bit BOOT PROTOCOL
+
+For machines with some new BIOS other than legacy BIOS, such as EFI,
+LinuxBIOS, etc., and for kexec, the 16-bit real-mode setup code in the
+kernel based on legacy BIOS cannot be used, so a 32-bit boot protocol
+needs to be defined.
+
+In the 32-bit boot protocol, the first step in loading a Linux kernel
+should still be to load the real-mode code and then examine the kernel
+header at offset 0x01f1. However, it is not necessary to load all of
+the real-mode code; only the first 4K bytes, traditionally known as
+the zero page, are needed.
+
+In addition to reading/modifying/writing the kernel header of the zero
+page as in the 16-bit boot protocol, the boot loader should also fill
+the additional fields of the zero page as described in zero-page.txt.
+
+After loading and setting up the zero page, the boot loader can load
+the 32/64-bit kernel in the same way as in the 16-bit boot protocol.
+
+In 32-bit boot protocol, the kernel is started by jumping to the
+32-bit kernel entry point, which is the start address of loaded
+32/64-bit kernel.
+
+At entry, the CPU must be in 32-bit protected mode with paging
+disabled; the CS and DS must be 4G flat segments; %esi holds the base
+address of the zero page; %esp, %ebp, %edi should be zero.
Index: linux-2.6.23

[RFC][PATCH 0/2 -mm] kexec based hibernation -v3

2007-09-19 Thread Huang, Ying
 /proc/vmcore .
   cp /sys/kernel/kexec_jump_back_entry .

9. Shutdown or reboot in hibernating kernel (kernel B).

10. Boot kernel (kernel C) compiled for hibernating/restore usage on
the root file system /dev/hdb in memory range of kernel B.

For example, the following kernel command line parameters can be
used:

root=/dev/hdb single memmap=exactmap [EMAIL PROTECTED] [EMAIL PROTECTED]

11. In the restore kernel (kernel C), the memory image of kernel A can be
restored as follows:

cp kexec_jump_back_entry /sys/kernel/kexec_jump_back_entry
krestore vmcore

12. Jump back to hibernated kernel (kernel A)

kexec -b

Best Regards,
Huang Ying


[RFC][PATCH 1/2 -mm] kexec based hibernation -v3: kexec jump

2007-09-19 Thread Huang, Ying
This patch implements the functionality of jumping between the kexeced
kernel and the original kernel.

A new reboot command named LINUX_REBOOT_CMD_KJUMP is defined to
trigger the jumping to (executing) the new kernel and jumping back to
the original kernel.

To support jumping between two kernels, before jumping to (executing)
the new kernel and before jumping back to the original kernel, the
devices are put into a quiescent state (to be fully implemented), and
the state of the devices and CPU is saved. After jumping back from the
kexeced kernel and after jumping to the new kernel, the state of the
devices and CPU is restored accordingly. The device/CPU state
save/restore code of software suspend is called to implement the
corresponding functions.

To support jumping without preserving memory, one shadow backup page
is allocated for each page used by the new (kexeced) kernel. During
kexec_load, the image of the new kernel is loaded into the shadow
pages, and before executing, the original pages and the shadow pages
are swapped, so the contents of the original pages are backed up.
Before jumping to the new (kexeced) kernel and after jumping back to
the original kernel, the original pages and the shadow pages are
swapped too.
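
To make the swap idea concrete, here is a minimal sketch (not the
patch code, which does this per page from relocate_kernel) of
exchanging the contents of an original page and its shadow backup
page:

/* Illustrative only: swap the contents of an original page and its
 * shadow page through a temporary buffer, so the shadow ends up
 * holding the original contents.  Doing the same swap again restores
 * the original contents, which is why the swap is performed both
 * before jumping to the kexeced kernel and after jumping back. */
#include <string.h>

#define PAGE_SIZE 4096

static void swap_page_contents(void *orig, void *shadow)
{
	unsigned char tmp[PAGE_SIZE];

	memcpy(tmp, orig, PAGE_SIZE);
	memcpy(orig, shadow, PAGE_SIZE);
	memcpy(shadow, tmp, PAGE_SIZE);
}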

A jump back protocol is defined and documented.


Known issues

- A field is added to Linux kernel real-mode header. This is
  temporary, and should be replaced after the 32-bit boot protocol and
  setup data patches are accepted.

- The suspend method of a device is used to put the device in a
  quiescent state. But if ACPI is enabled this will also put devices
  into a low power state, which prevents the new kernel from booting.
  So, ACPI must be disabled both in the original kernel and in the
  kexeced kernel. This is planned to be resolved after the suspend
  method and hibernate method are separated for devices, as proposed
  earlier on LKML.

- The NX (non-executable) bit should be turned off for the control
  page if available.


ChangeLog

-- 2007/9/19 --

1. Two reboot commands are merged back into one again because the
   underlying implementation is the same.

2. Jumping without preserving memory is implemented. As a side effect,
   two-direction jumping is implemented.

3. A jump back protocol is defined and documented. The original kernel
   and the kexeced kernel are more independent of each other.

4. The CPU state save/restore code are merged into relocate_kernel.S.

-- 2007/8/24 --

1. The reboot command LINUX_REBOOT_CMD_KJUMP is split into two
   reboot commands to reflect the different functions.

2. Document is added for added kernel parameters.

3. /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
   memory image restoring.

4. Console restoring after jumping back is implemented.

-- 2007/7/15 --

1. The kexec jump implementation is put into the kexec/kdump framework
   instead of software suspend framework. The device and CPU state
   save/restore code of software suspend is called when needed.

2. The same code path is used for both kexec a new kernel and jump
   back to original kernel.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 Documentation/i386/jump_back_protocol.txt |   81 
 arch/i386/Kconfig |7 +
 arch/i386/boot/header.S   |2 
 arch/i386/kernel/machine_kexec.c  |   77 +---
 arch/i386/kernel/relocate_kernel.S|  187 ++
 arch/i386/kernel/setup.c  |3 
 include/asm-i386/bootparam.h  |3 
 include/asm-i386/kexec.h  |   48 ++-
 include/linux/kexec.h |9 +
 include/linux/reboot.h|2 
 kernel/kexec.c|   59 +
 kernel/ksysfs.c   |   17 ++
 kernel/power/Kconfig  |2 
 kernel/sys.c  |8 +
 14 files changed, 463 insertions(+), 42 deletions(-)

Index: linux-2.6.23-rc6/arch/i386/kernel/machine_kexec.c
===
--- linux-2.6.23-rc6.orig/arch/i386/kernel/machine_kexec.c  2007-09-20 
11:24:25.0 +0800
+++ linux-2.6.23-rc6/arch/i386/kernel/machine_kexec.c   2007-09-20 
11:24:31.0 +0800
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/setup.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -98,23 +99,23 @@
 {
 }
 
-/*
- * Do not allocate memory (or fail in any way) in machine_kexec().
- * We are past the point of no return, committed to rebooting now.
- */
-NORET_TYPE void machine_kexec(struct kimage *image)
+static NORET_TYPE void __machine_kexec(struct kimage *image,
+  void *control_page) ATTRIB_NORET;
+
+static NORET_TYPE void __machine_kexec(struct kimage *image,
+  void *control_page)
 {
unsigned long page_list[PAGES_NR

[RFC][PATCH 2/2 -mm] kexec based hibernation -v3: kexec restore

2007-09-19 Thread Huang, Ying
This patch adds write support for /dev/oldmem. This is used to
restore the memory contents of the hibernated system.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/i386/kernel/crash_dump.c |   27 +++
 drivers/char/mem.c|   32 
 include/linux/crash_dump.h|2 ++
 3 files changed, 61 insertions(+)

Index: linux-2.6.23-rc4/arch/i386/kernel/crash_dump.c
===
--- linux-2.6.23-rc4.orig/arch/i386/kernel/crash_dump.c 2007-09-11 
16:52:14.0 +0800
+++ linux-2.6.23-rc4/arch/i386/kernel/crash_dump.c  2007-09-20 
09:48:10.0 +0800
@@ -58,6 +58,33 @@
return csize;
 }
 
+ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
+ size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   if (!userbuf) {
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, buf, csize);
+   } else {
+   if (!kdump_buf_page) {
+			printk(KERN_WARNING "Kdump: Kdump buffer page not"
+			       " allocated\n");
+   return -EFAULT;
+   }
+   if (copy_from_user(kdump_buf_page, buf, csize))
+   return -EFAULT;
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, kdump_buf_page, csize);
+   }
+   kunmap_atomic(vaddr, KM_PTE0);
+
+   return csize;
+}
+
 static int __init kdump_buf_page_init(void)
 {
int ret = 0;
Index: linux-2.6.23-rc4/include/linux/crash_dump.h
===
--- linux-2.6.23-rc4.orig/include/linux/crash_dump.h2007-09-11 
16:52:14.0 +0800
+++ linux-2.6.23-rc4/include/linux/crash_dump.h 2007-09-20 09:48:10.0 
+0800
@@ -11,6 +11,8 @@
 extern unsigned long long elfcorehdr_addr;
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
+extern ssize_t write_oldmem_page(unsigned long, const char *, size_t,
+unsigned long, int);
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;
 
Index: linux-2.6.23-rc4/drivers/char/mem.c
===
--- linux-2.6.23-rc4.orig/drivers/char/mem.c2007-09-11 16:52:14.0 
+0800
+++ linux-2.6.23-rc4/drivers/char/mem.c 2007-09-20 09:48:10.0 +0800
@@ -348,6 +348,37 @@
}
return read;
 }
+
+/*
+ * Write memory corresponding to the old kernel.
+ */
+static ssize_t write_oldmem(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   unsigned long pfn, offset;
+   size_t write = 0, csize;
+   int rc = 0;
+
+   while (count) {
+   pfn = *ppos / PAGE_SIZE;
+		if (pfn > saved_max_pfn)
+   return write;
+
+   offset = (unsigned long)(*ppos % PAGE_SIZE);
+		if (count > PAGE_SIZE - offset)
+   csize = PAGE_SIZE - offset;
+   else
+   csize = count;
+   rc = write_oldmem_page(pfn, buf, csize, offset, 1);
+		if (rc < 0)
+   return rc;
+   buf += csize;
+   *ppos += csize;
+   write += csize;
+   count -= csize;
+   }
+   return write;
+}
 #endif
 
 extern long vread(char *buf, char *addr, unsigned long count);
@@ -783,6 +814,7 @@
 #ifdef CONFIG_CRASH_DUMP
 static const struct file_operations oldmem_fops = {
.read   = read_oldmem,
+   .write  = write_oldmem,
.open   = open_oldmem,
 };
 #endif


Re: [RFC][PATCH 2/2 -mm] kexec based hibernation -v3: kexec restore

2007-09-20 Thread Huang, Ying
On Thu, 2007-09-20 at 10:15 +0200, Pavel Machek wrote:
  This patch adds writing support for /dev/oldmem. This is used to
  restore the memory contents of hibernated system.
  
  Signed-off-by: Huang Ying [EMAIL PROTECTED]
 
 ACK. (And this can even go in before the patch #1, right?)

Yes. This patch does not depend on patch #1.

Best Regards,
Huang Ying


Could you please merge the x86_64 EFI boot support patchset?

2007-11-11 Thread Huang, Ying
Hi, Linus,

Could you please merge the following patchset:

[PATCH 0/2 -v3] x86_64 EFI boot support
[PATCH 1/2 -v3] x86_64 EFI boot support: EFI frame buffer driver
[PATCH 2/2 -v3] x86_64 EFI boot support: EFI boot document

The patchset has been in the -mm tree from 2.6.23-rc2-mm2 on. Andrew Morton
had suggested it be merged into 2.6.24 during the early merge window of
2.6.24. It was not merged because the 32-bit boot protocol had not been
done at that time.

Now, the 32-bit boot protocol has been merged into 2.6.24. And this
patch has been in x86 patch queue.

I know that it is a little late for this patchset to be merged into
2.6.24. But this patchset is very simple, it just adds a framebuffer
driver, so it is impossible for this patchset to break anything. And
this patchset will be helpful for people who have machines with UEFI 64
firmware instead of legacy BIOS.

Best Regards,
Huang Ying


[PATCH 1/3 -mm] kexec based hibernation -v6: kexec jump

2007-11-18 Thread Huang, Ying
This patch implements the functionality of jumping between the kexeced
kernel and the original kernel.

To support jumping between two kernels, before jumping to (executing)
the new kernel and before jumping back to the original kernel, the
devices are put into a quiescent state, and the state of the devices
and CPU is saved. After jumping back from the kexeced kernel and after
jumping to the new kernel, the state of the devices and CPU is
restored accordingly. The device/CPU state save/restore code of
software suspend is called to implement the corresponding functions.
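
For illustration, a hedged sketch of that sequence in kernel-style C
(error handling trimmed; the machine_kexec_jump() signature and the
cmd argument shown here are illustrative, the real entry point and
flags are defined by the patches in this series):

/* Sketch only (kernel context assumed): quiesce devices, save CPU
 * state, run the loaded image, restore everything on return. */
static int kexec_jump_sketch(struct kimage *image, unsigned long cmd)
{
	unsigned long ret = 0;
	int error;

	local_irq_disable();
	error = device_power_down(PMSG_FREEZE);	/* devices quiescent */
	if (!error) {
		save_processor_state();		/* CPU state saved */
		error = machine_kexec_jump(image, &ret, cmd);
		restore_processor_state();	/* CPU state restored */
		device_power_up();		/* devices powered back up */
	}
	local_irq_enable();
	return error;
}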

To support jumping without reserving memory, one shadow backup page
(source page) is allocated for each page used by the new (kexeced)
kernel (destination page). During kexec_load, the image of the new
kernel is loaded into the source pages, and before executing, the
destination pages and the source pages are swapped, so the contents of
the destination pages are backed up. Before jumping to the new
(kexeced) kernel and after jumping back to the original kernel, the
destination pages and the source pages are swapped too.

A jump back protocol for kexec is defined and documented. It is an
extension to the ordinary function calling protocol. So, the facility
provided by this patch can be used to call an ordinary C function in
real mode.

A set of flags for sys_kexec_load is added to control which state is
saved/restored before/after the real mode code executes. For example,
you can specify that the device state and FPU state be saved/restored
before/after the real mode code executes.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/i386/jump_back_protocol.txt |  103 ++
 arch/powerpc/kernel/machine_kexec.c   |2 
 arch/ppc/kernel/machine_kexec.c   |2 
 arch/sh/kernel/machine_kexec.c|2 
 arch/x86/kernel/machine_kexec_32.c|   88 +---
 arch/x86/kernel/machine_kexec_64.c|2 
 arch/x86/kernel/relocate_kernel_32.S  |  214 +++---
 include/asm-x86/kexec_32.h|   39 -
 include/linux/kexec.h |   39 -
 kernel/kexec.c|  131 ++
 kernel/power/Kconfig  |2 
 kernel/sys.c  |   27 ++-
 12 files changed, 585 insertions(+), 66 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/cacheflush.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -83,10 +84,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Turn off NX bit for control page.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+   if (nx_enabled) {
+		change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC);
+   global_flush_tlb();
+   }
return 0;
 }
 
@@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+   if (nx_enabled) {
+		change_page_attr(image->control_code_page, 1, PAGE_KERNEL);
+   global_flush_tlb();
+   }
+}
+
+void machine_kexec(struct kimage *image)
+{
+   machine_kexec_call(image, NULL, 0);
 }
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
+unsigned int argc, va_list args)
 {
unsigned long page_list[PAGES_NR];
void *control_page;
+   asmlinkage NORET_TYPE void
+   (*relocate_kernel_ptr)(unsigned long indirection_page,
+  unsigned long control_page,
+  unsigned long start_address,
+  unsigned int has_pae) ATTRIB_NORET;
 
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
 
control_page = page_address(image-control_code_page);
-   memcpy(control_page, relocate_kernel, PAGE_SIZE);
+   memcpy(control_page, relocate_page, PAGE_SIZE/2);
+   KCALL_MAGIC(control_page) = 0;
 
+	if (image->preserve_cpu) {
+   unsigned int i;
+   KCALL_MAGIC(control_page) = KCALL_MAGIC_NUMBER;
+   KCALL_ARGC(control_page) = argc;
+   for (i = 0; i  argc; i++)
+   KCALL_ARGS(control_page)[i] = \
+   va_arg(args, unsigned long);
+
+   if (kexec_call_save_cpu(control_page)) {
+			image->start = KCALL_ENTRY(control_page);
+   if (ret)
+   *ret = KCALL_ARGS(control_page)[0

[PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume

2007-11-18 Thread Huang, Ying
This patch implements kexec based hibernate/resume. This is based on
the facility provided by kexec_jump. The ACPI methods are called in the
specified environments to conform to the ACPI specification. Two new
reboot commands are added to trigger hibernate/resume.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 include/linux/kexec.h   |5 +
 include/linux/reboot.h  |2 
 include/linux/suspend.h |9 ++
 kernel/power/disk.c |  155 
 kernel/sys.c|   42 +
 5 files changed, 212 insertions(+), 1 deletion(-)

--- a/kernel/power/disk.c
+++ b/kernel/power/disk.c
@@ -21,6 +21,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <linux/kexec.h>
 
 #include "power.h"
 
@@ -438,6 +439,160 @@ int hibernate(void)
return error;
 }
 
+#ifdef CONFIG_KEXEC
+static void kexec_hibernate_power_down(void)
+{
+   switch (hibernation_mode) {
+   case HIBERNATION_TEST:
+   case HIBERNATION_TESTPROC:
+   break;
+   case HIBERNATION_REBOOT:
+   machine_restart(NULL);
+   break;
+   case HIBERNATION_PLATFORM:
+   if (!hibernation_ops)
+   break;
		hibernation_ops->enter();
+   /* We should never get here */
+   while (1);
+   break;
+   case HIBERNATION_SHUTDOWN:
+   machine_power_off();
+   break;
+   }
+   machine_halt();
+   /*
+* Valid image is on the disk, if we continue we risk serious data
+* corruption after resume.
+*/
+   printk(KERN_CRIT Please power me down manually\n);
+   while (1);
+}
+
+int kexec_hibernate(struct kimage *image)
+{
+   int error;
+   int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM);
+   unsigned long cmd_ret;
+
+   mutex_lock(pm_mutex);
+
+   pm_prepare_console();
+   suspend_console();
+
+   error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
+   if (error)
+   goto Resume_console;
+
+   error = platform_start(platform_mode);
+   if (error)
+   goto Resume_console;
+
+   error = device_suspend(PMSG_FREEZE);
+   if (error)
+   goto Resume_console;
+
+   error = platform_pre_snapshot(platform_mode);
+   if (error)
+   goto Resume_devices;
+
+   error = disable_nonboot_cpus();
+   if (error)
+   goto Resume_devices;
+   local_irq_disable();
+   /* At this point, device_suspend() has been called, but *not*
+* device_power_down(). We *must* device_power_down() now.
+* Otherwise, drivers for some devices (e.g. interrupt
+* controllers) become desynchronized with the actual state of
+* the hardware at resume time, and evil weirdness ensues.
+*/
+   error = device_power_down(PMSG_FREEZE);
+   if (error)
+   goto Enable_irqs;
+
+   save_processor_state();
+   error = machine_kexec_jump(image, cmd_ret,
+  KJUMP_CMD_HIBERNATE_WRITE_IMAGE);
+   restore_processor_state();
+
+   if (cmd_ret == KJUMP_CMD_HIBERNATE_POWER_DOWN)
+   kexec_hibernate_power_down();
+
+   platform_leave(platform_mode);
+
+   /* NOTE:  device_power_up() is just a resume() for devices
+* that suspended with irqs off ... no overall powerup.
+*/
+   device_power_up();
+ Enable_irqs:
+   local_irq_enable();
+   enable_nonboot_cpus();
+ Resume_devices:
+   platform_finish(platform_mode);
+   device_resume();
+ Resume_console:
+   pm_notifier_call_chain(PM_POST_HIBERNATION);
+   resume_console();
+   pm_restore_console();
+   mutex_unlock(pm_mutex);
+   return error;
+}
+
+int kexec_resume(struct kimage *image)
+{
+   int error;
+   int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM);
+
+   mutex_lock(pm_mutex);
+
+   pm_prepare_console();
+   suspend_console();
+
+   error = device_suspend(PMSG_PRETHAW);
+   if (error)
+   goto Resume_console;
+
+   error = platform_pre_restore(platform_mode);
+   if (error)
+   goto Resume_devices;
+
+   error = disable_nonboot_cpus();
+   if (error)
+   goto Resume_devices;
+   local_irq_disable();
+   /* At this point, device_suspend() has been called, but *not*
+* device_power_down(). We *must* device_power_down() now.
+* Otherwise, drivers for some devices (e.g. interrupt controllers)
+* become desynchronized with the actual state of the hardware
+* at resume time, and evil weirdness ensues.
+*/
+   error = device_power_down(PMSG_PRETHAW);
+   if (error)
+   goto Enable_irqs;
+
+   save_processor_state();
+   error = machine_kexec_jump(image, NULL, KJUMP_CMD_HIBERNATE_RESUME

[PATCH 2/3 -mm] kexec based hibernation -v6: kexec restore

2007-11-18 Thread Huang, Ying
This patch adds write support for /dev/oldmem. This is used to
restore the memory contents of the hibernated system.
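
For illustration only, a user-space sketch of how a restore tool might
use this write support. The file offset into /dev/oldmem corresponds
to the physical address, so a saved page is written back by seeking to
pfn * PAGE_SIZE; the flat page buffer used here is an assumption, not
the actual krestore image format:

/* Illustrative only: write one saved page back into the hibernated
 * kernel's memory through /dev/oldmem. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define PAGE_SIZE 4096

static int restore_page(int oldmem_fd, uint64_t pfn, const void *page)
{
	off_t off = (off_t)pfn * PAGE_SIZE;	/* physical address */

	if (lseek(oldmem_fd, off, SEEK_SET) != off)
		return -1;
	if (write(oldmem_fd, page, PAGE_SIZE) != PAGE_SIZE)
		return -1;
	return 0;
}

int main(void)
{
	int fd = open("/dev/oldmem", O_WRONLY);
	char page[PAGE_SIZE] = { 0 };	/* contents would come from the saved image */

	if (fd < 0) {
		perror("open /dev/oldmem");
		return 1;
	}
	/* restore physical page frame 0x1234 as an example */
	if (restore_page(fd, 0x1234, page))
		perror("restore_page");
	close(fd);
	return 0;
}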

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/crash_dump_32.c |   27 +++
 drivers/char/mem.c  |   32 
 include/linux/crash_dump.h  |2 ++
 3 files changed, 61 insertions(+)

--- a/arch/x86/kernel/crash_dump_32.c
+++ b/arch/x86/kernel/crash_dump_32.c
@@ -59,6 +59,33 @@ ssize_t copy_oldmem_page(unsigned long p
return csize;
 }
 
+ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
+ size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   if (!userbuf) {
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, buf, csize);
+   } else {
+   if (!kdump_buf_page) {
+			printk(KERN_WARNING "Kdump: Kdump buffer page not"
+			       " allocated\n");
+   return -EFAULT;
+   }
+   if (copy_from_user(kdump_buf_page, buf, csize))
+   return -EFAULT;
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, kdump_buf_page, csize);
+   }
+   kunmap_atomic(vaddr, KM_PTE0);
+
+   return csize;
+}
+
 static int __init kdump_buf_page_init(void)
 {
int ret = 0;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -11,6 +11,8 @@
 extern unsigned long long elfcorehdr_addr;
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
+extern ssize_t write_oldmem_page(unsigned long, const char *, size_t,
+unsigned long, int);
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;
 
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file *
}
return read;
 }
+
+/*
+ * Write memory corresponding to the old kernel.
+ */
+static ssize_t write_oldmem(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   unsigned long pfn, offset;
+   size_t write = 0, csize;
+   int rc = 0;
+
+   while (count) {
+   pfn = *ppos / PAGE_SIZE;
+		if (pfn > saved_max_pfn)
+   return write;
+
+   offset = (unsigned long)(*ppos % PAGE_SIZE);
+		if (count > PAGE_SIZE - offset)
+   csize = PAGE_SIZE - offset;
+   else
+   csize = count;
+   rc = write_oldmem_page(pfn, buf, csize, offset, 1);
+		if (rc < 0)
+   return rc;
+   buf += csize;
+   *ppos += csize;
+   write += csize;
+   count -= csize;
+   }
+   return write;
+}
 #endif
 
 extern long vread(char *buf, char *addr, unsigned long count);
@@ -783,6 +814,7 @@ static const struct file_operations full
 #ifdef CONFIG_CRASH_DUMP
 static const struct file_operations oldmem_fops = {
.read   = read_oldmem,
+   .write  = write_oldmem,
.open   = open_oldmem,
 };
 #endif


[PATCH 0/3 -mm] kexec based hibernation -v6

2007-11-18 Thread Huang, Ying
 jump_back_param | cut -d '=' -f 2`
   vmcoreinfo_size=`grep arg4 jump_back_param | cut -d '=' -f 2`
   ./makedumpfile -D -E -d 16 -o [EMAIL PROTECTED] -j `cat 
kexec_jump_back_entry` -M `cat backup_pages_map_root_entry` /proc/vmcore 
dump.elf

10. Enter ACPI S4 state with the following command line:

kexec -e -c 0x6b630002

The hibernating kernel (kernel B) will jump back to the hibernated
kernel again with a special command (0x6b630002: hibernate shut
down), and the hibernated kernel (kernel A) will enter the ACPI S4
state.

11. Boot kernel (kernel C) compiled for hibernating/resuming usage on
the root file system /dev/hdb in memory range of kernel B.

For example, the following kernel command line parameters can be
used:

root=/dev/hdb single memmap=exactmap [EMAIL PROTECTED] [EMAIL PROTECTED]

12. In the resuming kernel (kernel C), the memory image of kernel A can be
restored as follows:

krestore dump.elf

13. Resume the hibernated kernel (kernel A)

kexec --load-jump-back-helper --jump-back-entry=`cat kexec_jump_back_entry`
kexec --resume

The resuming kernel (kernel C) will jump back to the hibernated
kernel (kernel A), and the necessary ACPI methods will be executed.


Known issues:

- The suspend/resume callbacks of device drivers are used to put
  devices into a quiescent state. This will unnecessarily (possibly
  harmfully) put devices into a low power state. This is intended to
  be solved by separating a device quiesce/unquiesce callback from the
  device suspend/resume callback.

- The memory image of the hibernated kernel must be saved in a separate
  partition not used by the hibernated kernel. This is planned to be
  solved by making the hibernating/resuming kernel work on an initramfs
  and writing the memory image to a file in a partition used by the
  hibernated kernel through a block list instead of ordinary file
  system operations.

- The setup of hibernate/resume is fairly complex. I will continue
  working on simplifying it.


TODO:

- Implement sys_kexec_store, that is, store the memory image of
  kexeced kernel.

- Write the memory image to a file through a block list instead of
  ordinary file system operations.

- Simplify hibernate/resume setup.

- Resume from hibernation with bootloader.


ChangeLog:

v6:

- Add ACPI support.

- Refactor kexec jump to be a general facility to call real mode code.

v5:

- A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel
  image is used for jumping back. The reboot command for jumping back
  is removed. This interface is more stable (proposed by Eric
  Biederman).

- NX bit handling support for kexec is added.

- Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute
  from machine_kexec.

- Passing jump back entry to kexeced kernel via kernel command line
  (parsed by user space tool via /proc/cmdline instead of
  kernel). Original corresponding boot parameter and sysfs code is
  removed.

v4:

- Two reboot commands are merged back into one because the underlying
  implementation is the same.

- Jumping without reserving memory is implemented. As a side effect,
  two direction jumping is implemented.

- A jump back protocol is defined and documented. The original kernel
  and kexeced kernel are more independent from each other.

- The CPU state save/restore code are merged into relocate_kernel.S.

v3:

- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two
  reboot commands to reflect the different functions.

- Document is added for added kernel parameters.

- /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
  memory image restoring.

- Console restoring after jumping back is implemented.

- Writing support is added for /dev/oldmem, to restore memory contents
  of hibernated system.

v2:

- The kexec jump implementation is put into the kexec/kdump framework
  instead of software suspend framework. The device and CPU state
  save/restore code of software suspend is called when needed.

- The same code path is used for both kexec a new kernel and jump back
  to original kernel.

Best Regards,
Huang Ying


Re: [PATCH -mm -v3] x86 boot : export boot_params via sysfs

2007-12-17 Thread Huang, Ying
On Mon, 2007-12-17 at 21:34 -0700, Eric W. Biederman wrote:
 H. Peter Anvin [EMAIL PROTECTED] writes:
 
  This is directly analogous to how we treat identity information in IDE, or 
  PCI
  configuration space -- some fields are pre-digested, but the entire raw
  information is also available.
 
 Add to that a totally unchanged value can just be easier to get correct.
 
 Still the kexec code as much as it can should not look there, as we may
 get the same basic information in a couple of different ways.
 
 EFI memmap vs. e820 for example.  If/when that is the case /sbin/kexec
 should get the information and spit it out into whatever format makes
 sense for the destination kernel.  My sense is just passing through
 values is brittleness where we don't want it.
 
 However I think being able to get at the raw boot information overall
 sounds useful.  I just don't know if it is generally useful or just
 useful when debugging bootloaders though.

If struct boot_params as a whole is useless for kexec, I can move it to
debugfs, because kexec is the only normal user now. Then which fields of
struct boot_params do you think are useful for kexec?

Refer to include/asm-x86/bootparam.h

edid_info?
e820_entries and e820_map?  maybe useful for kdump
edd related fields (eddbuf, edd_mbr_sig_buffer, etc)? split fields until 
fundamental types?

Best Regards,
Huang Ying



Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump

2007-12-18 Thread Huang, Ying
On Tue, 2007-12-11 at 02:27 -0700, Eric W. Biederman wrote:
 Huang, Ying [EMAIL PROTECTED] writes:
 
  On Mon, 2007-12-10 at 19:25 -0700, Eric W. Biederman wrote:
  Huang, Ying [EMAIL PROTECTED] writes:
  [...]
/*
 * Do not allocate memory (or fail in any way) in machine_kexec().
 * We are past the point of no return, committed to rebooting now.
 */
   -NORET_TYPE void machine_kexec(struct kimage *image)
   +int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
   + unsigned int argc, va_list args)
{
  
  Why do we need var arg support?
  Can't we do that with a shim we load from user space?
 
  If all parameters are provided in user space, the usage model may be as
  follow:
 
  - sys_kexec_load() /* with executable/data/parameters(A) loaded */
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with
  parameters(A)*/
  - /* jump back */
  - sys_kexec_load() /* with executable/data/parameters(B) loaded */
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with
  parameters(B)*/
  - /* jump back */
 
  That is, the kexec image should be re-loaded if the parameters are
  different, and there can be no state reserved in kexec image. This is OK
  for original kexec implementation, because there is no jumping back.
  But, for kexec with jumping back, another usage model may be useful too.
 
  - sys_kexec_load() /* with executable/data loaded */
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(A)) /* execute physical 
  mode
  code with parameters(A)*/
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(B)) /* execute physical 
  mode
  code with parameters(B)*/
 
  This way the kexec image need not to be re-loaded, and the state of
  kexec image can be reserved across several invoking.
 
 Interesting.  We wind up preserving the code in between invocations.
 
 I don't know about your particular issue, but I can see that clearly
 we need a way to read values back from our target image.
 
 And if we can read everything back one way to proceed is to read
 everything out modify it and then write it back.
 
 Amending a kexec image that is already stored may also make sense.
 
 I'm not convinced that the var arg parameters make sense, but you
 added them because of a real need.
 
 The kexec function is split into two separate calls so that we can
 unmount the filesystem the kexec image comes from before actually
 doing the kexec.

My real issue is that I need a kind of kernel-to-kernel communication
method. The var args are just a convenient way to pass an array of
unsigned longs between two kernels. The reason is as follows:

The kexec based hibernating process is as follow:

h1. put devices in quiescent state
h2. save devices/CPU state
h3. jump to kexeced kernel (kernel B)
*h4. normal kernel boot of kernel B
*h5. save devices/CPU state
*h6. jump back to original kernel (kernel A)
h7. restore devices/CPU state
h8. put devices in quiescent state
h9. put devices in low power state
h10. execute necessary ACPI method (prepare to sleep)
h11. save devices/CPU state
h12. jump to kernel B
*h13. execute necessary ACPI method (wake up)
*h14. restore devices/CPU state
*h15. put devices in normal power state
*h16. write memory image of kernel A into disk
*h17. put system into ACPI S4 state

The kexec based resuming process is as follow:

*r1. boot the resuming kernel (kernel C)
*r2. restore the memory image of kernel A
*r3. put devices in quiescent state
*r4. execute necessary ACPI method (prepare to resume)
*r5. jump to kernel A
r6. execute necessary ACPI method (wake up)
r7. restore devices/CPU state

Where, line begin with * is executed in kernel B and kernel C, others
are executed in kernel A.

Kernel A needs to distinguish between h7 and r6, while kernel B/C needs
to distinguish between *h13 and a normal jump back. The kernel action to
take depends on the action of the peer kernel. Now, this is solved by
kernel-to-kernel communication: a command word is passed to the peer
kernel to indicate the action required (a rough sketch follows below).
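
As a purely illustrative sketch (the dispatch helper and its name are
invented for this example and are not code from the patchset; the constant
values are taken from the command lines quoted elsewhere in these threads,
and the mapping of values to names is an assumption), the receiving side of
such a command word could look like:

#define KJUMP_CMD_HIBERNATE_WRITE_IMAGE	0x6b630001	/* assumed mapping */
#define KJUMP_CMD_HIBERNATE_SHUTDOWN	0x6b630002	/* "hibernate shutdown" */

/* Hypothetical dispatch in the kernel that receives the jump. */
void kjump_dispatch(unsigned long cmd)
{
	switch (cmd) {
	case KJUMP_CMD_HIBERNATE_WRITE_IMAGE:
		/* the peer asked us (kernel B) to write kernel A's image to disk */
		break;
	case KJUMP_CMD_HIBERNATE_SHUTDOWN:
		/* the peer asked us to enter the ACPI S4 state */
		break;
	default:
		/* plain kexec jump: nothing special to do */
		break;
	}
}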

I remember you have said before that you think it is better to use only
user space to user space communication between kernel A and kernel B.
This is OK for normal kexec. But if the kexec jump is used for multiple
functions with early kernel action involved (normal kexec jump, kexec
jump to hibernate, kexec jump to resume), it is necessary to use kernel
to kernel communication.

The var args in the patch are just an array of unsigned longs; they can be
expressed as follows too.

int kexec_call(struct kimage *image, unsigned long *ret,
	       unsigned int argc, unsigned long argv[]);

The var args version is as follows.

int kexec_call(struct kimage *image, unsigned long *ret,
	       unsigned int argc, ...);

Best Regards,
Huang Ying


[PATCH 0/3 -mm] kexec jump -v8

2007-12-20 Thread Huang, Ying
 kernel as you
want to via the following shell command line:

/sbin/kexec -e


Known issues:

- The suspend/resume callbacks of device drivers are used to put
  devices into a quiescent state. This will unnecessarily (possibly
  harmfully) put devices into a low power state. This is intended to be
  solved by separating device quiesce/unquiesce callbacks from the
  device suspend/resume callbacks.


ChangeLog:

v8:

- Split kexec jump patchset from kexec based hibernation patchset.

- Add writing support to kimgcore. This can be used as a communication
  method between kexeced kernel and original kernel.

- Merge the various KEXEC_PRESERVE_* flags into one KEXEC_PRESERVE_CONTEXT
  because there is no need for such fine-grained control.

- Delete variable argument based kernel to kernel communication
  mechanism from basic kexec jump patchset.

v7:

- Add an interface to dump the loaded kexec_image, which may contain
  the memory image of the kexeced system. This is used to accelerate kexec
  based hibernation.

- Refactor kexec jump to be a command driven programming model.

- Adjust ACPI support to mimic the ACPI support of u/swsusp.

- Use kexec_lock to do synchronization.

v6:

- Add ACPI support.

- Refactor kexec jump to be a general facility to call real mode code.

v5:

- A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel
  image is used for jumping back. The reboot command for jumping back
  is removed. This interface is more stable (proposed by Eric
  Biederman).

- NX bit handling support for kexec is added.

- Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute
  from machine_kexec.

- Passing jump back entry to kexeced kernel via kernel command line
  (parsed by user space tool via /proc/cmdline instead of
  kernel). Original corresponding boot parameter and sysfs code is
  removed.

v4:

- Two reboot commands are merged back into one because the underlying
  implementation is the same.

- Jumping without reserving memory is implemented. As a side effect,
  two direction jumping is implemented.

- A jump back protocol is defined and documented. The original kernel
  and kexeced kernel are more independent from each other.

- The CPU state save/restore code are merged into relocate_kernel.S.

v3:

- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two
  reboot commands to reflect the different functions.

- Document is added for added kernel parameters.

- /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
  memory image restoring.

- Console restoring after jumping back is implemented.

- Writing support is added for /dev/oldmem, to restore memory contents
  of hibernated system.

v2:

- The kexec jump implementation is put into the kexec/kdump framework
  instead of software suspend framework. The device and CPU state
  save/restore code of software suspend is called when needed.

- The same code path is used for both kexec a new kernel and jump back
  to original kernel.


Best Regards,
Huang Ying



[PATCH 2/3 -mm] kexec jump -v8 : add write support to oldmem device

2007-12-20 Thread Huang, Ying
This patch adds writing support for /dev/oldmem. This can be used to

- Communicate between original kernel and kexeced kernel through write
  to some pages in original kernel.

- Restore the memory contents of the hibernated system in kexec based
  hibernation (a user-space sketch follows below).
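
For illustration only, restoring a single page of the old kernel's memory
through the new write path could look like the user-space sketch below.
This is not part of the patch; the helper name is made up, and the only
assumption it relies on is the patch's addressing scheme where the file
offset into /dev/oldmem is pfn * PAGE_SIZE.

#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define PAGE_SIZE 4096

/* Write one page of old-kernel memory back; returns 0 on success. */
int restore_old_page(uint64_t pfn, const void *buf)
{
	int fd = open("/dev/oldmem", O_WRONLY);

	if (fd < 0)
		return -1;
	/* write_oldmem() in the patch splits the request into per-page chunks */
	if (pwrite(fd, buf, PAGE_SIZE, (off_t)pfn * PAGE_SIZE) != PAGE_SIZE) {
		close(fd);
		return -1;
	}
	close(fd);
	return 0;
}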

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/crash_dump_32.c |   27 +++
 drivers/char/mem.c  |   32 
 include/linux/crash_dump.h  |2 ++
 3 files changed, 61 insertions(+)

--- a/arch/x86/kernel/crash_dump_32.c
+++ b/arch/x86/kernel/crash_dump_32.c
@@ -59,6 +59,33 @@ ssize_t copy_oldmem_page(unsigned long p
return csize;
 }
 
+ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
+ size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   if (!userbuf) {
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, buf, csize);
+   } else {
+   if (!kdump_buf_page) {
+   printk(KERN_WARNING "Kdump: Kdump buffer page not allocated\n");
+   return -EFAULT;
+   }
+   if (copy_from_user(kdump_buf_page, buf, csize))
+   return -EFAULT;
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, kdump_buf_page, csize);
+   }
+   kunmap_atomic(vaddr, KM_PTE0);
+
+   return csize;
+}
+
 static int __init kdump_buf_page_init(void)
 {
int ret = 0;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -11,6 +11,8 @@
 extern unsigned long long elfcorehdr_addr;
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
+extern ssize_t write_oldmem_page(unsigned long, const char *, size_t,
+unsigned long, int);
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;
 
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file *
}
return read;
 }
+
+/*
+ * Write memory corresponding to the old kernel.
+ */
+static ssize_t write_oldmem(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   unsigned long pfn, offset;
+   size_t write = 0, csize;
+   int rc = 0;
+
+   while (count) {
+   pfn = *ppos / PAGE_SIZE;
+   if (pfn > saved_max_pfn)
+   return write;
+
+   offset = (unsigned long)(*ppos % PAGE_SIZE);
+   if (count > PAGE_SIZE - offset)
+   csize = PAGE_SIZE - offset;
+   else
+   csize = count;
+   rc = write_oldmem_page(pfn, buf, csize, offset, 1);
+   if (rc < 0)
+   return rc;
+   buf += csize;
+   *ppos += csize;
+   write += csize;
+   count -= csize;
+   }
+   return write;
+}
 #endif
 
 extern long vread(char *buf, char *addr, unsigned long count);
@@ -783,6 +814,7 @@ static const struct file_operations full
 #ifdef CONFIG_CRASH_DUMP
 static const struct file_operations oldmem_fops = {
.read   = read_oldmem,
+   .write  = write_oldmem,
.open   = open_oldmem,
 };
 #endif



[PATCH 1/3 -mm] kexec jump -v8 : kexec jump basic

2007-12-20 Thread Huang, Ying
This patch implements the functionality of jumping between the kexeced
kernel and the original kernel.

To support jumping between two kernels, before jumping to (executing)
the new kernel and before jumping back to the original kernel, the
devices are put into a quiescent state and the state of the devices and
CPU is saved. After jumping back from the kexeced kernel and after
jumping to the new kernel, the state of the devices and CPU is restored
accordingly. The device/CPU state save/restore code of software suspend
is called to implement the corresponding functions.

To support jumping without reserving memory, one shadow backup page
(source page) is allocated for each page used by the new (kexeced)
kernel (destination page). When kexec_load is done, the image of the new
kernel is loaded into the source pages, and before executing, the
destination pages and the source pages are swapped, so the contents of
the destination pages are backed up. Before jumping to the new (kexeced)
kernel and after jumping back to the original kernel, the destination
pages and the source pages are swapped too. A rough sketch of the swap
follows.
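
As a sketch of the idea only (plain C, not the relocate_kernel assembly
that actually performs this in the patch; names are invented for the
example), the swap amounts to exchanging two sets of pages byte by byte:

/* Swap every destination page with its shadow source page. */
void swap_page_sets(unsigned char **dst, unsigned char **src,
		    unsigned long nr_pages, unsigned long page_size)
{
	unsigned long i, j;

	for (i = 0; i < nr_pages; i++) {
		for (j = 0; j < page_size; j++) {
			unsigned char tmp = dst[i][j];

			dst[i][j] = src[i][j];
			src[i][j] = tmp;
		}
	}
}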

A jump back protocol for kexec is defined and documented. It is an
extension to ordinary function calling protocol. So, the facility
provided by this patch can be used to call ordinary C function in
physical mode.

A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to
indicate that the loaded kernel image is used for jumping back.
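
For illustration, the user-space call sequence this flag implies might look
roughly like the sketch below. This is only a sketch of what /sbin/kexec
would do: segment setup is omitted, the header locations are assumed (real
kexec-tools carries its own copies of these definitions), and the helper
names are invented for the example.

#include <unistd.h>
#include <sys/syscall.h>
#include <sys/reboot.h>
#include <linux/kexec.h>
#include <linux/reboot.h>

/* Load an image that we intend to jump back from. */
long load_preserve_context(unsigned long entry, unsigned long nr_segments,
			   struct kexec_segment *segments)
{
	return syscall(__NR_kexec_load, entry, nr_segments, segments,
		       KEXEC_ARCH_DEFAULT | KEXEC_PRESERVE_CONTEXT);
}

/* Trigger the jump; devices are quiesced and CPU state saved by the kernel. */
void jump_to_image(void)
{
	reboot(LINUX_REBOOT_CMD_KEXEC);
}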

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/i386/jump_back_protocol.txt |   66 ++
 arch/powerpc/kernel/machine_kexec.c   |2 
 arch/ppc/kernel/machine_kexec.c   |2 
 arch/sh/kernel/machine_kexec.c|2 
 arch/x86/kernel/machine_kexec_32.c|   39 +-
 arch/x86/kernel/machine_kexec_64.c|2 
 arch/x86/kernel/relocate_kernel_32.S  |  194 ++
 include/asm-x86/kexec_32.h|   34 -
 include/linux/kexec.h |   14 +-
 kernel/kexec.c|   65 +-
 kernel/power/Kconfig  |2 
 kernel/sys.c  |   35 +++--
 12 files changed, 403 insertions(+), 54 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/cacheflush.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -83,10 +84,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Turn off NX bit for control page.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+   if (nx_enabled) {
+   change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC);
+   global_flush_tlb();
+   }
return 0;
 }
 
@@ -96,25 +101,45 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+   if (nx_enabled) {
+   change_page_attr(image->control_code_page, 1, PAGE_KERNEL);
+   global_flush_tlb();
+   }
 }
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+void machine_kexec(struct kimage *image)
 {
unsigned long page_list[PAGES_NR];
void *control_page;
+   asmlinkage NORET_TYPE void
+   (*relocate_kernel_ptr)(unsigned long indirection_page,
+  unsigned long control_page,
+  unsigned long start_address,
+  unsigned int has_pae) ATTRIB_NORET;
 
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
 
control_page = page_address(image->control_code_page);
-   memcpy(control_page, relocate_kernel, PAGE_SIZE);
+   memcpy(control_page, relocate_page, PAGE_SIZE/2);
+   KJUMP_MAGIC(control_page) = 0;
 
+   if (image->preserve_context) {
+   KJUMP_MAGIC(control_page) = KJUMP_MAGIC_NUMBER;
+   if (kexec_jump_save_cpu(control_page)) {
+   image->start = KJUMP_ENTRY(control_page);
+   return;
+   }
+   }
+
+   relocate_kernel_ptr = control_page +
+   ((void *)relocate_kernel - (void *)relocate_page);
page_list[PA_CONTROL_PAGE] = __pa(control_page);
-   page_list[VA_CONTROL_PAGE] = (unsigned long)relocate_kernel;
+   page_list[VA_CONTROL_PAGE] = (unsigned long)control_page;
page_list[PA_PGD] = __pa(kexec_pgd);
page_list[VA_PGD] = (unsigned long)kexec_pgd;
 #ifdef CONFIG_X86_PAE
@@ -127,6 +152,7 @@ NORET_TYPE void machine_kexec(struct kim
page_list[VA_PTE_0] = (unsigned long)kexec_pte0;
page_list[PA_PTE_1] = __pa

[PATCH 3/3 -mm] kexec jump -v8 : access memory image of kexec_image

2007-12-20 Thread Huang, Ying
This patch adds a file in the proc file system to access the loaded
kexec_image, which may contain the memory image of the kexeced
system. This can be used to:

- Communicate between original kernel and kexeced kernel through write
  to some pages in original kernel.

- Communicate between original kernel and kexeced kernel through read
  memory image of kexeced kernel, amend the image, and reload the
  amended image.

- Accelerate boot of the kexeced kernel. If you have a memory image of
  the kexeced kernel, you do not need a normal boot process to jump to
  the kexeced kernel; just load the memory image and jump to the point
  where you left off last time in the kexeced kernel. A small usage
  sketch follows below.
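
As a usage illustration only (simple user-space C, nothing from the patch;
the function name and buffer size are arbitrary), saving the loaded image so
it can later be handed back to kexec could look like:

#include <stdio.h>

/* Copy /proc/kimgcore into a regular file (e.g. "kimgcore") for later reuse. */
int save_kimgcore(const char *path)
{
	char buf[65536];
	size_t n;
	int ret = -1;
	FILE *in = fopen("/proc/kimgcore", "rb");
	FILE *out = fopen(path, "wb");

	if (in && out) {
		while ((n = fread(buf, 1, sizeof(buf), in)) > 0)
			fwrite(buf, 1, n, out);
		ret = 0;
	}
	if (in)
		fclose(in);
	if (out)
		fclose(out);
	return ret;
}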

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 fs/proc/Makefile  |1 
 fs/proc/kimgcore.c|  277 ++
 fs/proc/proc_misc.c   |6 +
 include/linux/kexec.h |7 +
 kernel/kexec.c|5 
 5 files changed, 291 insertions(+), 5 deletions(-)

--- /dev/null
+++ b/fs/proc/kimgcore.c
@@ -0,0 +1,277 @@
+/*
+ * fs/proc/kimgcore.c - Interface for accessing the loaded
+ * kexec_image, which may contains the memory image of kexeced system.
+ * Heavily borrowed from fs/proc/kcore.c
+ *
+ * Copyright (C) 2007, Intel Corp.
+ *  Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/user.h>
+#include <linux/elf.h>
+#include <linux/init.h>
+#include <linux/kexec.h>
+#include <linux/io.h>
+#include <linux/highmem.h>
+#include <linux/page-flags.h>
+#include <asm/uaccess.h>
+
+struct proc_dir_entry *proc_root_kimgcore;
+
+static u32 kimgcore_size;
+
+static char *elfcorebuf;
+static size_t elfcorebuf_sz;
+
+static void *buf_page;
+
+static ssize_t kimage_copy_to_user(struct kimage *image, char __user *buf,
+  unsigned long offset, size_t count)
+{
+   kimage_entry_t *ptr, entry;
+   unsigned long off = 0, offinp, trunk;
+   struct page *page;
+   void *vaddr;
+
+   for_each_kimage_entry(image, ptr, entry) {
+   if (!(entry & IND_SOURCE))
+   continue;
+   if (off + PAGE_SIZE > offset) {
+   offinp = offset - off;
+   if (count > PAGE_SIZE - offinp)
+   trunk = PAGE_SIZE - offinp;
+   else
+   trunk = count;
+   page = pfn_to_page(entry >> PAGE_SHIFT);
+   if (PageHighMem(page)) {
+   vaddr = kmap(page);
+   memcpy(buf_page, vaddr+offinp, trunk);
+   kunmap(page);
+   vaddr = buf_page;
+   } else
+   vaddr = __va(entry & PAGE_MASK) + offinp;
+   if (copy_to_user(buf, vaddr, trunk))
+   return -EFAULT;
+   buf += trunk;
+   offset += trunk;
+   count -= trunk;
+   if (!count)
+   break;
+   }
+   off += PAGE_SIZE;
+   }
+   return count;
+}
+
+static ssize_t kimage_copy_from_user(struct kimage *image,
+const char __user *buf,
+unsigned long offset,
+size_t count)
+{
+   kimage_entry_t *ptr, entry;
+   unsigned long off = 0, offinp, trunk;
+   struct page *page;
+   void *vaddr;
+
+   for_each_kimage_entry(image, ptr, entry) {
+   if (!(entry & IND_SOURCE))
+   continue;
+   if (off + PAGE_SIZE > offset) {
+   offinp = offset - off;
+   if (count > PAGE_SIZE - offinp)
+   trunk = PAGE_SIZE - offinp;
+   else
+   trunk = count;
+   page = pfn_to_page(entry >> PAGE_SHIFT);
+   if (PageHighMem(page))
+   vaddr = buf_page;
+   else
+   vaddr = __va(entry & PAGE_MASK) + offinp;
+   if (copy_from_user(vaddr, buf, trunk))
+   return -EFAULT;
+   if (PageHighMem(page)) {
+   vaddr = kmap(page);
+   memcpy(vaddr+offinp, buf_page, trunk);
+   kunmap(page);
+   }
+   buf += trunk;
+   offset += trunk;
+   count -= trunk;
+   if (!count)
+   break;
+   }
+   off += PAGE_SIZE;
+   }
+   return count;
+}
+
+static ssize_t read_kimgcore(struct file *file, char __user

Re: [PATCH 0/3 -mm] kexec jump -v8

2007-12-21 Thread Huang, Ying
On Fri, 2007-12-21 at 19:35 +1100, Nigel Cunningham wrote:
 Hi.
 
 Huang, Ying wrote:
  This patchset provides an enhancement to kexec/kdump. It implements
  the following features:
  
  - Backup/restore memory used both by the original kernel and the
kexeced kernel.
 
 Why the kexeced kernel as well?

The memory range used by the kexeced kernel is also a usable memory range
in the original kernel. Maybe it should be: backup/restore memory used by
both the original kernel and the kexeced kernel. My English is poor.

 [...]
 
  The features of this patchset can be used as follow:
  
  - Kernel/system debug through making system snapshot. You can make
system snapshot, jump back, do some thing and make another system
snapshot.
 
 Are you somehow recording all the filesystem changes after the first
 snapshot? If not, this is pointless (you'll end up with filesystem
 corruption).

This snapshot is not used for restore/resume. It is just used for
debugging. You can check the system state with these snapshots. So I
think it is useful even without recording filesystem changes.

 [...]
 
  - Cooperative multi-kernel/system. With kexec jump, you can switch
between several kernels/systems quickly without boot process except
the first time. This appears like swap a whole kernel/system out/in.
 
 How is this useful to the end user?

I am not sure how useful this is. Maybe I can run a Red Hat and a Debian
on my machine and switch between them.

Best Regards,
Huang Ying



Re: [PATCH 2/3 -mm] kexec jump -v8 : add write support to oldmem device

2007-12-21 Thread huang ying
On Dec 21, 2007 6:17 PM, Pavel Machek [EMAIL PROTECTED] wrote:
 Hi!

  This patch adds writing support for /dev/oldmem. This can be used to
 
  - Communicate between original kernel and kexeced kernel through write
to some pages in original kernel.
 
  - Restore the memory contents of hibernated system in kexec based
hibernation.
 
  Signed-off-by: Huang Ying [EMAIL PROTECTED]
 
  --- a/arch/x86/kernel/crash_dump_32.c
  +++ b/arch/x86/kernel/crash_dump_32.c
  +ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
  +   size_t csize, unsigned long offset, int
  userbuf)

  --- a/drivers/char/mem.c
  +++ b/drivers/char/mem.c
  @@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file *
}
return read;
   }
  +
  +/*
  + * Write memory corresponding to the old kernel.
  + */
  +static ssize_t write_oldmem(struct file *file, const char __user *buf,
  + size_t count, loff_t *ppos)
  +{
 ...
  + rc = write_oldmem_page(pfn, buf, csize, offset, 1);

 I believe this is going to break compilation on non-32bit
 machines.

Yes, I will fix this.

Best Regards,
Huang Ying


[PATCH -mm] x86_64 EFI runtime service support : Calling convention fix (resend, cc LKML)

2007-12-05 Thread Huang, Ying
In the EFI calling convention, %xmm0 - %xmm5 are specified as scratch
registers (UEFI Specification 2.1, section 2.3.4.2). To conform to the EFI
specification, this patch saves/restores the %xmm0 - %xmm5 registers
before/after invoking an EFI runtime service. At the same time, the stack
is aligned to 16 bytes, and TS in CR0 is cleared/restored to make it
possible to use SSE2 in EFI runtime services.
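
For context, a caller goes through these stubs roughly as sketched below.
This is only an illustration: the wrapper function is invented for the
example, and the efi_call2 declaration is assumed from the corresponding
header rather than quoted from it.

#include <linux/types.h>

/* Assumed declaration: the stub takes the EFI function pointer first and
 * shuffles the remaining arguments into the Microsoft x86_64 convention
 * (%rcx, %rdx, %r8, %r9, then stack) that EFI firmware expects. */
extern u64 efi_call2(void *fp, u64 arg1, u64 arg2);

/* Hypothetical wrapper: get_time_fp would be the runtime GetTime pointer. */
u64 sample_call_get_time(void *get_time_fp, void *tm, void *tc)
{
	return efi_call2(get_time_fp, (u64)tm, (u64)tc);
}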

This patch is based on 2.6.24-rc4-mm1. And it has been tested on Intel
platforms with 64-bit UEFI 2.0 firmware.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/efi_stub_64.S |   71 +-
 1 file changed, 56 insertions(+), 15 deletions(-)

--- a/arch/x86/kernel/efi_stub_64.S
+++ b/arch/x86/kernel/efi_stub_64.S
@@ -8,61 +8,102 @@
 
 #include <linux/linkage.h>
 
+#define SAVE_XMM   \
+   mov %rsp, %rax; \
+   subq $0x70, %rsp;   \
+   and $~0xf, %rsp;\
+   mov %rax, (%rsp);   \
+   mov %cr0, %rax; \
+   clts;   \
+   mov %rax, 0x8(%rsp);\
+   movaps %xmm0, 0x60(%rsp);   \
+   movaps %xmm1, 0x50(%rsp);   \
+   movaps %xmm2, 0x40(%rsp);   \
+   movaps %xmm3, 0x30(%rsp);   \
+   movaps %xmm4, 0x20(%rsp);   \
+   movaps %xmm5, 0x10(%rsp)
+
+#define RESTORE_XMM\
+   movaps 0x60(%rsp), %xmm0;   \
+   movaps 0x50(%rsp), %xmm1;   \
+   movaps 0x40(%rsp), %xmm2;   \
+   movaps 0x30(%rsp), %xmm3;   \
+   movaps 0x20(%rsp), %xmm4;   \
+   movaps 0x10(%rsp), %xmm5;   \
+   mov 0x8(%rsp), %rsi;\
+   mov %rsi, %cr0; \
+   mov (%rsp), %rsp
+
 ENTRY(efi_call0)
-   subq $40, %rsp
+   SAVE_XMM
+   subq $32, %rsp
call *%rdi
-   addq $40, %rsp
+   addq $32, %rsp
+   RESTORE_XMM
ret
 
 ENTRY(efi_call1)
-   subq $40, %rsp
+   SAVE_XMM
+   subq $32, %rsp
mov  %rsi, %rcx
call *%rdi
-   addq $40, %rsp
+   addq $32, %rsp
+   RESTORE_XMM
ret
 
 ENTRY(efi_call2)
-   subq $40, %rsp
+   SAVE_XMM
+   subq $32, %rsp
mov  %rsi, %rcx
call *%rdi
-   addq $40, %rsp
+   addq $32, %rsp
+   RESTORE_XMM
ret
 
 ENTRY(efi_call3)
-   subq $40, %rsp
+   SAVE_XMM
+   subq $32, %rsp
mov  %rcx, %r8
mov  %rsi, %rcx
call *%rdi
-   addq $40, %rsp
+   addq $32, %rsp
+   RESTORE_XMM
ret
 
 ENTRY(efi_call4)
-   subq $40, %rsp
+   SAVE_XMM
+   subq $32, %rsp
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
call *%rdi
-   addq $40, %rsp
+   addq $32, %rsp
+   RESTORE_XMM
ret
 
 ENTRY(efi_call5)
-   subq $40, %rsp
+   SAVE_XMM
+   subq $48, %rsp
mov %r9, 32(%rsp)
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
call *%rdi
-   addq $40, %rsp
+   addq $48, %rsp
+   RESTORE_XMM
ret
 
 ENTRY(efi_call6)
-   subq $56, %rsp
-   mov 56+8(%rsp), %rax
+   SAVE_XMM
+   mov (%rsp), %rax
+   mov 8(%rax), %rax
+   subq $48, %rsp
mov %r9, 32(%rsp)
mov %rax, 40(%rsp)
mov %r8, %r9
mov %rcx, %r8
mov %rsi, %rcx
call *%rdi
-   addq $56, %rsp
+   addq $48, %rsp
+   RESTORE_XMM
ret


[PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore

2007-12-06 Thread Huang, Ying
This patch adds a file in the proc file system to access the loaded
kexec_image, which may contain the memory image of the kexeced
system. This can be used by kexec based hibernation to create a file
image of the hibernating kernel, so that a kernel boot process is not
needed for each hibernation.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 fs/proc/Makefile  |1 
 fs/proc/kimgcore.c|  204 ++
 fs/proc/proc_misc.c   |5 +
 include/linux/kexec.h |7 +
 kernel/kexec.c|5 -
 5 files changed, 217 insertions(+), 5 deletions(-)

--- /dev/null
+++ b/fs/proc/kimgcore.c
@@ -0,0 +1,204 @@
+/*
+ * fs/proc/kimgcore.c - Interface for accessing the loaded
+ * kexec_image, which may contains the memory image of kexeced system.
+ * Heavily borrowed from fs/proc/kcore.c
+ *
+ * Copyright (C) 2007, Intel Corp.
+ *  Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/mm.h>
+#include <linux/proc_fs.h>
+#include <linux/user.h>
+#include <linux/elf.h>
+#include <linux/init.h>
+#include <linux/kexec.h>
+#include <linux/io.h>
+#include <linux/highmem.h>
+#include <asm/uaccess.h>
+
+struct proc_dir_entry *proc_root_kimgcore;
+
+static u32 kimgcore_size;
+
+static char *elfcorebuf;
+static size_t elfcorebuf_sz;
+
+static void *buf_page;
+
+static ssize_t kimage_copy_to_user(struct kimage *image, char __user *buf,
+ unsigned long offset, size_t count)
+{
+   kimage_entry_t *ptr, entry;
+   unsigned long off = 0, offinp, trunk;
+   struct page *page;
+   void *vaddr;
+
+   for_each_kimage_entry(image, ptr, entry) {
+   if (!(entry & IND_SOURCE))
+   continue;
+   if (off + PAGE_SIZE > offset) {
+   offinp = offset - off;
+   if (count > PAGE_SIZE - offinp)
+   trunk = PAGE_SIZE - offinp;
+   else
+   trunk = count;
+   page = pfn_to_page(entry >> PAGE_SHIFT);
+   if (PageHighMem(page)) {
+   vaddr = kmap(page);
+   memcpy(buf_page, vaddr+offinp, trunk);
+   kunmap(page);
+   vaddr = buf_page;
+   } else
+   vaddr = __va(entry & PAGE_MASK) + offinp;
+   if (copy_to_user(buf, vaddr, trunk))
+   return -EFAULT;
+   buf += trunk;
+   offset += trunk;
+   count -= trunk;
+   if (!count)
+   break;
+   }
+   off += PAGE_SIZE;
+   }
+   return count;
+}
+
+static ssize_t read_kimgcore(struct file *file, char __user *buffer,
+size_t buflen, loff_t *fpos)
+{
+   size_t acc = 0;
+   size_t tsz;
+   ssize_t ssz;
+
+   if (buflen == 0 || *fpos >= kimgcore_size)
+   return 0;
+
+   /* trim buflen to not go beyond EOF */
+   if (buflen > kimgcore_size - *fpos)
+   buflen = kimgcore_size - *fpos;
+   /* Read ELF core header */
+   if (*fpos < elfcorebuf_sz) {
+   tsz = elfcorebuf_sz - *fpos;
+   if (buflen < tsz)
+   tsz = buflen;
+   if (copy_to_user(buffer, elfcorebuf + *fpos, tsz))
+   return -EFAULT;
+   buflen -= tsz;
+   *fpos += tsz;
+   buffer += tsz;
+   acc += tsz;
+
+   /* leave now if filled buffer already */
+   if (buflen == 0)
+   return acc;
+   }
+
+   ssz = kimage_copy_to_user(kexec_image, buffer,
+ *fpos - elfcorebuf_sz, buflen);
+   if (ssz < 0)
+   return ssz;
+
+   *fpos += (buflen - ssz);
+   acc += (buflen - ssz);
+
+   return acc;
+}
+
+static int init_kimgcore(void)
+{
+   Elf64_Ehdr *ehdr;
+   Elf64_Phdr *phdr;
+   struct kexec_segment *seg;
+   Elf64_Off off;
+   unsigned long i;
+
+   elfcorebuf_sz = sizeof(Elf64_Ehdr) +
+   kexec_image->nr_segments * sizeof(Elf64_Phdr);
+   elfcorebuf = kzalloc(elfcorebuf_sz, GFP_KERNEL);
+   if (!elfcorebuf)
+   return -ENOMEM;
+   ehdr = (Elf64_Ehdr *)elfcorebuf;
+   memcpy(ehdr->e_ident, ELFMAG, SELFMAG);
+   ehdr->e_ident[EI_CLASS] = ELFCLASS64;
+   ehdr->e_ident[EI_DATA]  = ELFDATA2LSB;
+   ehdr->e_ident[EI_VERSION] = EV_CURRENT;
+   ehdr->e_ident[EI_OSABI] = ELFOSABI_NONE;
+   memset(ehdr->e_ident+EI_PAD, 0, EI_NIDENT-EI_PAD);
+   ehdr->e_type = ET_CORE;
+   ehdr->e_machine = ELF_ARCH;
+   ehdr->e_version = EV_CURRENT;
+   ehdr->e_entry = kexec_image->start;
+   ehdr->e_phoff = sizeof

[PATCH 2/4 -mm] kexec based hibernation -v7 : kexec restore

2007-12-06 Thread Huang, Ying
This patch adds writing support for /dev/oldmem. This is used to
restore the memory contents of hibernated system.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/crash_dump_32.c |   27 +++
 drivers/char/mem.c  |   32 
 include/linux/crash_dump.h  |2 ++
 3 files changed, 61 insertions(+)

--- a/arch/x86/kernel/crash_dump_32.c
+++ b/arch/x86/kernel/crash_dump_32.c
@@ -59,6 +59,33 @@ ssize_t copy_oldmem_page(unsigned long p
return csize;
 }
 
+ssize_t write_oldmem_page(unsigned long pfn, const char *buf,
+ size_t csize, unsigned long offset, int userbuf)
+{
+   void  *vaddr;
+
+   if (!csize)
+   return 0;
+
+   if (!userbuf) {
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, buf, csize);
+   } else {
+   if (!kdump_buf_page) {
+   printk(KERN_WARNING "Kdump: Kdump buffer page not allocated\n");
+   return -EFAULT;
+   }
+   if (copy_from_user(kdump_buf_page, buf, csize))
+   return -EFAULT;
+   vaddr = kmap_atomic_pfn(pfn, KM_PTE0);
+   memcpy(vaddr + offset, kdump_buf_page, csize);
+   }
+   kunmap_atomic(vaddr, KM_PTE0);
+
+   return csize;
+}
+
 static int __init kdump_buf_page_init(void)
 {
int ret = 0;
--- a/include/linux/crash_dump.h
+++ b/include/linux/crash_dump.h
@@ -11,6 +11,8 @@
 extern unsigned long long elfcorehdr_addr;
 extern ssize_t copy_oldmem_page(unsigned long, char *, size_t,
unsigned long, int);
+extern ssize_t write_oldmem_page(unsigned long, const char *, size_t,
+unsigned long, int);
 extern const struct file_operations proc_vmcore_operations;
 extern struct proc_dir_entry *proc_vmcore;
 
--- a/drivers/char/mem.c
+++ b/drivers/char/mem.c
@@ -348,6 +348,37 @@ static ssize_t read_oldmem(struct file *
}
return read;
 }
+
+/*
+ * Write memory corresponding to the old kernel.
+ */
+static ssize_t write_oldmem(struct file *file, const char __user *buf,
+   size_t count, loff_t *ppos)
+{
+   unsigned long pfn, offset;
+   size_t write = 0, csize;
+   int rc = 0;
+
+   while (count) {
+   pfn = *ppos / PAGE_SIZE;
+   if (pfn > saved_max_pfn)
+   return write;
+
+   offset = (unsigned long)(*ppos % PAGE_SIZE);
+   if (count > PAGE_SIZE - offset)
+   csize = PAGE_SIZE - offset;
+   else
+   csize = count;
+   rc = write_oldmem_page(pfn, buf, csize, offset, 1);
+   if (rc < 0)
+   return rc;
+   buf += csize;
+   *ppos += csize;
+   write += csize;
+   count -= csize;
+   }
+   return write;
+}
 #endif
 
 extern long vread(char *buf, char *addr, unsigned long count);
@@ -783,6 +814,7 @@ static const struct file_operations full
 #ifdef CONFIG_CRASH_DUMP
 static const struct file_operations oldmem_fops = {
.read   = read_oldmem,
+   .write  = write_oldmem,
.open   = open_oldmem,
 };
 #endif


[PATCH 0/4 -mm] kexec based hibernation -v7

2007-12-06 Thread Huang, Ying
6. Load the memory image of the hibernating kernel with the following shell
   command line:

   kexec -l --args-none --flags=0x3e kimgcore

7. Start the real hibernating process with following shell command line:

   kexec -e -c 0x6b630001

   The hibernating kernel will write the memory image of hibernated
   kernel and go to ACPI S4 state automatically.

8. Boot kernel (kernel C) compiled for hibernating/resuming usage in
   memory range of kernel B.

   The go_to_resume should be specified in kernel command line to
   trigger the resuming process automatically. For example, the
   following kernel command line parameters can be used:

   memmap=exactmap [EMAIL PROTECTED] [EMAIL PROTECTED] mem=16M go_to_resume 
khdev=3:7

   The initramfs should be used too. In GRUB, this can be specified
   with following grub command:

   initrd /boot/rootfs.gz

   The resuming kernel will restore the memory image of hibernated
   kernel and jump back to hibernated kernel automatically.


Known issues:

- The suspend/resume callbacks of device drivers are used to put
  devices into a quiescent state. This will unnecessarily (possibly
  harmfully) put devices into a low power state. This is intended to be
  solved by separating device quiesce/unquiesce callbacks from the
  device suspend/resume callbacks.

- The memory image of the hibernated kernel must be saved in a separate
  partition not used by the hibernated kernel. This is planned to be
  solved through making the hibernating/resuming kernel write the memory
  image to a file in a partition used by the hibernated kernel through a
  block list instead.

- The hibernating/resuming code is duplicated with the current u/swsusp
  code. It will be merged when kexec based hibernation becomes more
  stable.

- The setup of hibernate/resume is fairly complex. I will continue
  working on simplifying.


TODO:

- Write the memory image to a file through block list instead of
  ordinary file system operating.

- Merge duplicated code between kexec based hibernation and u/swsusp.

- Simplify hibernate/resume setup.

- Resume from hibernation with bootloader.


ChangeLog:

v7:

- Add an interface to dump the loaded kexec_image, which may contain
  the memory image of the kexeced system. This is used to accelerate kexec
  based hibernation.

- Refactor kexec jump to be a command driven programming model.

- Adjust ACPI support to mimic the ACPI support of u/swsusp.

- Use kexec_lock to do synchronization.

v6:

- Add ACPI support.

- Refactor kexec jump to be a general facility to call real mode code.

v5:

- A flag (KEXEC_JUMP_BACK) is added to indicate the loaded kernel
  image is used for jumping back. The reboot command for jumping back
  is removed. This interface is more stable (proposed by Eric
  Biederman).

- NX bit handling support for kexec is added.

- Merge machine_kexec and machine_kexec_jump, remove NO_RET attribute
  from machine_kexec.

- Passing jump back entry to kexeced kernel via kernel command line
  (parsed by user space tool via /proc/cmdline instead of
  kernel). Original corresponding boot parameter and sysfs code is
  removed.

v4:

- Two reboot commands are merged back into one because the underlying
  implementation is the same.

- Jumping without reserving memory is implemented. As a side effect,
  two direction jumping is implemented.

- A jump back protocol is defined and documented. The original kernel
  and kexeced kernel are more independent from each other.

- The CPU state save/restore code are merged into relocate_kernel.S.

v3:

- The reboot command LINUX_REBOOT_CMD_KJUMP is split into two
  reboot commands to reflect the different functions.

- Document is added for added kernel parameters.

- /sys/kernel/kexec_jump_buf_pfn is made writable, it is used for
  memory image restoring.

- Console restoring after jumping back is implemented.

- Writing support is added for /dev/oldmem, to restore memory contents
  of hibernated system.

v2:

- The kexec jump implementation is put into the kexec/kdump framework
  instead of software suspend framework. The device and CPU state
  save/restore code of software suspend is called when needed.

- The same code path is used for both kexec a new kernel and jump back
  to original kernel.

Best Regards,
Huang Ying


[PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump

2007-12-06 Thread Huang, Ying
This patch implements the functionality of jumping between the kexeced
kernel and the original kernel.

To support jumping between two kernels, before jumping to (executing)
the new kernel and before jumping back to the original kernel, the
devices are put into a quiescent state and the state of the devices and
CPU is saved. After jumping back from the kexeced kernel and after
jumping to the new kernel, the state of the devices and CPU is restored
accordingly. The device/CPU state save/restore code of software suspend
is called to implement the corresponding functions.

To support jumping without reserving memory, one shadow backup page
(source page) is allocated for each page used by the new (kexeced)
kernel (destination page). When kexec_load is done, the image of the new
kernel is loaded into the source pages, and before executing, the
destination pages and the source pages are swapped, so the contents of
the destination pages are backed up. Before jumping to the new (kexeced)
kernel and after jumping back to the original kernel, the destination
pages and the source pages are swapped too.

A jump back protocol for kexec is defined and documented. It is an
extension to ordinary function calling protocol. So, the facility
provided by this patch can be used to call ordinary C function in real
mode.

A set of flags for sys_kexec_load is added to control which state is
saved/restored before/after the real mode code executes. For example, you
can specify that the device state and FPU state are saved/restored
before/after the real mode code executes.

The state (excluding CPU state) save/restore code can be overridden
based on the command parameter of kexec jump, because more states
need to be saved/restored by hibernating/resuming.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/i386/jump_back_protocol.txt |  103 ++
 arch/powerpc/kernel/machine_kexec.c   |2 
 arch/ppc/kernel/machine_kexec.c   |2 
 arch/sh/kernel/machine_kexec.c|2 
 arch/x86/kernel/machine_kexec_32.c|   88 +---
 arch/x86/kernel/machine_kexec_64.c|2 
 arch/x86/kernel/relocate_kernel_32.S  |  214 +++---
 include/asm-x86/kexec_32.h|   39 -
 include/linux/kexec.h |   40 +
 kernel/kexec.c|  188 ++
 kernel/power/Kconfig  |2 
 kernel/sys.c  |   35 +++-
 12 files changed, 648 insertions(+), 69 deletions(-)

--- a/arch/x86/kernel/machine_kexec_32.c
+++ b/arch/x86/kernel/machine_kexec_32.c
@@ -20,6 +20,7 @@
 #include <asm/cpufeature.h>
 #include <asm/desc.h>
 #include <asm/system.h>
+#include <asm/cacheflush.h>
 
 #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
 static u32 kexec_pgd[1024] PAGE_ALIGNED;
@@ -83,10 +84,14 @@ static void load_segments(void)
  * reboot code buffer to allow us to avoid allocations
  * later.
  *
- * Currently nothing.
+ * Turn off NX bit for control page.
  */
 int machine_kexec_prepare(struct kimage *image)
 {
+   if (nx_enabled) {
+   change_page_attr(image->control_code_page, 1, PAGE_KERNEL_EXEC);
+   global_flush_tlb();
+   }
return 0;
 }
 
@@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage 
  */
 void machine_kexec_cleanup(struct kimage *image)
 {
+   if (nx_enabled) {
+   change_page_attr(image->control_code_page, 1, PAGE_KERNEL);
+   global_flush_tlb();
+   }
+}
+
+void machine_kexec(struct kimage *image)
+{
+   machine_kexec_call(image, NULL, 0);
 }
 
 /*
  * Do not allocate memory (or fail in any way) in machine_kexec().
  * We are past the point of no return, committed to rebooting now.
  */
-NORET_TYPE void machine_kexec(struct kimage *image)
+int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
+unsigned int argc, va_list args)
 {
unsigned long page_list[PAGES_NR];
void *control_page;
+   asmlinkage NORET_TYPE void
+   (*relocate_kernel_ptr)(unsigned long indirection_page,
+  unsigned long control_page,
+  unsigned long start_address,
+  unsigned int has_pae) ATTRIB_NORET;
 
/* Interrupts aren't acceptable while we reboot */
local_irq_disable();
 
control_page = page_address(image->control_code_page);
-   memcpy(control_page, relocate_kernel, PAGE_SIZE);
+   memcpy(control_page, relocate_page, PAGE_SIZE/2);
+   KCALL_MAGIC(control_page) = 0;
 
+   if (image->preserve_cpu) {
+   unsigned int i;
+   KCALL_MAGIC(control_page) = KCALL_MAGIC_NUMBER;
+   KCALL_ARGC(control_page) = argc;
+   for (i = 0; i < argc; i++)
+   KCALL_ARGS(control_page)[i] = \
+   va_arg(args, unsigned long

[PATCH 3/4 -mm] kexec based hibernation -v7 : kexec hibernate/resume

2007-12-06 Thread Huang, Ying
This patch implements kexec based hibernate/resume. This is based on
the facility provided by kexec_jump. The states save/restore code of
ordinary kexec_jump is overridden by hibernate/resume specific
code. The ACPI methods are called at specified environment to conform
the ACPI specification. A new reboot command is added to go to ACPI S4
state from user space.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 include/linux/kexec.h   |4 
 include/linux/reboot.h  |1 
 include/linux/suspend.h |1 
 kernel/power/disk.c |  244 +++-
 kernel/sys.c|5 
 5 files changed, 251 insertions(+), 4 deletions(-)

--- a/kernel/power/disk.c
+++ b/kernel/power/disk.c
@@ -21,6 +21,7 @@
 #include <linux/console.h>
 #include <linux/cpu.h>
 #include <linux/freezer.h>
+#include <linux/kexec.h>
 
 #include "power.h"
 
@@ -365,13 +366,13 @@ int hibernation_platform_enter(void)
 }
 
 /**
- * power_down - Shut the machine down for hibernation.
+ * hibernate_power_down - Shut the machine down for hibernation.
  *
  * Use the platform driver, if configured so; otherwise try
  * to power off or reboot.
  */
 
-static void power_down(void)
+void hibernate_power_down(void)
 {
switch (hibernation_mode) {
case HIBERNATION_TEST:
@@ -461,7 +462,7 @@ int hibernate(void)
error = swsusp_write(flags);
swsusp_free();
if (!error)
-   power_down();
+   hibernate_power_down();
} else {
pr_debug("PM: Image restored successfully.\n");
swsusp_free();
@@ -478,6 +479,243 @@ int hibernate(void)
return error;
 }
 
+#ifdef CONFIG_KEXEC
+static int kexec_snapshot(struct notifier_block *nb,
+ unsigned long cmd, void *arg)
+{
+   int error;
+   int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM);
+
+   if (cmd != KJUMP_CMD_HIBERNATE_WRITE_IMAGE)
+   return NOTIFY_DONE;
+
+   pm_prepare_console();
+
+   error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
+   if (error)
+   goto Exit;
+
+   error = freeze_processes();
+   if (error) {
+   error = -EBUSY;
+   goto Exit;
+   }
+
+   if (hibernation_test(TEST_FREEZER) ||
+   hibernation_testmode(HIBERNATION_TESTPROC)) {
+   error = -EAGAIN;
+   goto Resume_process;
+   }
+
+   error = platform_start(platform_mode);
+   if (error)
+   goto Resume_process;
+
+   suspend_console();
+   error = device_suspend(PMSG_FREEZE);
+   if (error)
+   goto Resume_console;
+
+   if (hibernation_test(TEST_DEVICES)) {
+   error = -EAGAIN;
+   goto Resume_devices;
+   }
+
+   error = platform_pre_snapshot(platform_mode);
+   if (error)
+   goto Resume_devices;
+
+   if (hibernation_test(TEST_PLATFORM)) {
+   error = -EAGAIN;
+   goto Resume_devices;
+   }
+
+   error = disable_nonboot_cpus();
+   if (error)
+   goto Resume_devices;
+
+   if (hibernation_test(TEST_CPUS) ||
+   hibernation_testmode(HIBERNATION_TEST)) {
+   error = -EAGAIN;
+   goto Enable_cpus;
+   }
+
+   local_irq_disable();
+   /* At this point, device_suspend() has been called, but *not*
+* device_power_down(). We *must* device_power_down() now.
+* Otherwise, drivers for some devices (e.g. interrupt
+* controllers) become desynchronized with the actual state of
+* the hardware at resume time, and evil weirdness ensues.
+*/
+   error = device_power_down(PMSG_FREEZE);
+   if (error)
+   goto Enable_irqs;
+
+   if (hibernation_test(TEST_CORE)) {
+   error = -EAGAIN;
+   goto Power_up;
+   }
+
+   return NOTIFY_STOP;
+
+ Power_up:
+   device_power_up();
+ Enable_irqs:
+   local_irq_enable();
+ Enable_cpus:
+   enable_nonboot_cpus();
+ Resume_devices:
+   platform_finish(platform_mode);
+   device_resume();
+ Resume_console:
+   resume_console();
+ Resume_process:
+   thaw_processes();
+ Exit:
+   pm_notifier_call_chain(PM_POST_HIBERNATION);
+   pm_restore_console();
+   return notifier_from_errno(error);
+}
+
+static int kexec_prepare_write_image(struct notifier_block *nb,
+unsigned long cmd, void *arg)
+{
+   int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM);
+
+   if (cmd != KJUMP_CMD_HIBERNATE_WRITE_IMAGE)
+   return NOTIFY_DONE;
+
+   device_power_up();
+   local_irq_enable();
+   enable_nonboot_cpus();
+   platform_finish(platform_mode);
+   device_resume();
+   resume_console();
+   thaw_processes();
+   pm_restore_console();
+   return

Re: [PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore

2007-12-07 Thread huang ying
On Dec 7, 2007 8:33 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 On Friday, 7 of December 2007, Huang, Ying wrote:
  This patch adds a file in proc file system to access the loaded
  kexec_image, which may contains the memory image of kexeced
  system. This can be used by kexec based hibernation to create a file
  image of hibernating kernel, so that a kernel booting process is not
  needed for each hibernating.

 Hm, I'm not sure what you mean.

 Can you explain a bit, please?

The normal kexec based hibernation procedure is as follows:

1. kexec_load the kernel image and initramfs
2. jump to hibernating kernel
3. the normal boot process of kexeced kernel
4. jump back to hibernated kernel
5. execute ACPI methods
6. jump to hibernating kernel
7. write memory image of hibernated kernel
8. go to ACPI S4 state

With kimgcore:

A. Prepare a memory image of hibernation kernel:

A.1 kexec_load the kernel image and initramfs
A.2 jump to hibernating kernel
A.3 the normal boot process of kexeced kernel
A.4 jump back to hibernated kernel
A.5 save the memory image of hibernating kernel via kimgcore

The normal hibernate process is as follows:

1. kexec load the kimgcore of the hibernating kernel
2. jump to the hibernating kernel
3. execute ACPI methods
4. jump to hibernating kernel
5. write memory image of hibernated kernel
6. go to ACPI S4 state

So the boot process of the hibernating kernel needs to happen only once
unless the hardware configuration is changed.

Best Regards,
Huang Ying


Re: [PATCH 3/4 -mm] kexec based hibernation -v7 : kexec hibernate/resume

2007-12-07 Thread huang ying
On Dec 7, 2007 8:52 PM, Rafael J. Wysocki [EMAIL PROTECTED] wrote:
 On Friday, 7 of December 2007, Huang, Ying wrote:
  This patch implements kexec based hibernate/resume. This is based on
  the facility provided by kexec_jump. The states save/restore code of
  ordinary kexec_jump is overridden by hibernate/resume specific
  code.

 Can you explain in more details how this works?

Two blocking notifier chains named kjump_chain_pre and kjump_chain_post
are defined; the basic procedure of kexec jump is as follows:

call functions in kjump_chain_pre
jump to peer kernel
call functions in kjump_chain_post

A command is passed as a parameter to the functions in the chain. If a
command is processed by a function, the function will execute and stop
the chain (return NOTIFY_STOP); otherwise it will do nothing (return
NOTIFY_DONE). If no function is interested in the command, the default
behavior will be executed (kexec_vcall_pre, kexec_vcall_post). A sketch
of such a callback follows the command listing below.

So for each command the procedure is as follows:

KJUMP_CMD_HIBERNATE_WRITE_IMAGE:
[chain] kexec_snapshot
jump to kexeced kernel
[chain] kexec_prepare_write_image /* in kexeced kernel */

KJUMP_HIBERNATE_RESUME:
[chain] kexec_prepare_resume /* in kexeced kernel */
jump to kexec kernel
[chain] kexec_resume
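
To make the shape concrete, a chain callback would look roughly like the
sketch below. It is modelled on kexec_snapshot() from the patch, but the
callback itself and the registration call at the end are assumptions about
how the chains are wired up, not quoted code.

#include <linux/notifier.h>

/* Handle one command; let every other command fall through to the default. */
static int sample_kjump_pre(struct notifier_block *nb, unsigned long cmd,
			    void *arg)
{
	if (cmd != KJUMP_CMD_HIBERNATE_WRITE_IMAGE)
		return NOTIFY_DONE;	/* not ours, keep walking the chain */

	/* quiesce devices, disable non-boot CPUs, ... as kexec_snapshot() does */
	return NOTIFY_STOP;		/* handled, stop the chain here */
}

static struct notifier_block sample_kjump_nb = {
	.notifier_call = sample_kjump_pre,
};

/* Presumably registered with something like:
 *	blocking_notifier_chain_register(&kjump_chain_pre, &sample_kjump_nb);
 * (kjump_chain_pre is the chain named in this mail, not an upstream symbol.)
 */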

  The ACPI methods are called at specified environment to conform
  the ACPI specification. A new reboot command is added to go to ACPI S4
  state from user space.

 Well, I still don't like the amount of duplicated code introduced by this 
 patch.

Yes, there is too much duplicated code. It should be merged. But I
want to delay the merging until the kexec based hibernation code becomes
more stable.

 Also, IMO it should be using the mutual exclusion mechanisms used by the
 existing hibernation code, ie. pm_mutex and the snapshot_device_available
 atomic variable.

Now the kexec_lock is used as a mutex between kexec related
operations. It seems reasonable to use pm_mutex and maybe
snapshot_device_available to eliminate potential conflict between
kexec based hibernation and u/swsusp.

Best Regards,
Huang Ying


Re: [PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume

2007-11-19 Thread Huang, Ying
On Mon, 2007-11-19 at 19:22 +0100, Rafael J. Wysocki wrote:
  +#ifdef CONFIG_KEXEC
  +static void kexec_hibernate_power_down(void)
  +{
  +   switch (hibernation_mode) {
  +   case HIBERNATION_TEST:
  +   case HIBERNATION_TESTPROC:
  +   break;
  +   case HIBERNATION_REBOOT:
  +   machine_restart(NULL);
  +   break;
  +   case HIBERNATION_PLATFORM:
  +   if (!hibernation_ops)
  +   break;
+   hibernation_ops->enter();
 
 hibernation_platform_enter() should be used here (as of the current mainline).

power_down will be called with interrupts disabled, devices suspended,
and non-boot CPUs disabled. But the latest hibernation_platform_enter
calls device_suspend, disable_nonboot_cpus, etc. So, I use
hibernation_ops->enter() directly instead of
hibernation_platform_enter().

  +   /* We should never get here */
  +   while (1);
  +   break;
  +   case HIBERNATION_SHUTDOWN:
  +   machine_power_off();
  +   break;
  +   }
  +   machine_halt();
  +   /*
  +* Valid image is on the disk, if we continue we risk serious data
  +* corruption after resume.
  +*/
+   printk(KERN_CRIT "Please power me down manually\n");
  +   while (1);
  +}
 
 Hm, what's the difference between the above function and power_down(),
 actually?

Same as above.

  +
  +int kexec_hibernate(struct kimage *image)
  +{
  +   int error;
  +   int platform_mode = (hibernation_mode == HIBERNATION_PLATFORM);
  +   unsigned long cmd_ret;
  +
  +   mutex_lock(pm_mutex);
  +
  +   pm_prepare_console();
  +   suspend_console();
  +
  +   error = pm_notifier_call_chain(PM_HIBERNATION_PREPARE);
  +   if (error)
  +   goto Resume_console;
  +
  +   error = platform_start(platform_mode);
  +   if (error)
  +   goto Resume_console;
  +
  +   error = device_suspend(PMSG_FREEZE);
  +   if (error)
  +   goto Resume_console;
  +
  +   error = platform_pre_snapshot(platform_mode);
  +   if (error)
  +   goto Resume_devices;
  +
  +   error = disable_nonboot_cpus();
  +   if (error)
  +   goto Resume_devices;
 
 I wonder if it's viable to merge the above with hibernate() and
 hibernation_snapshot() somehow, to avoid code duplication?

Yes. Most of the code is duplicated. But there is one advantage to not
merging them: power_down can be called with IRQs disabled, which makes it
possible to eliminate the freezer.

I think it is possible to merge the two implementations. I will try to do
it.

 Apart from the above, there's some new debug code to be added to disk.c
 in 2.6.25.  It's in the ACPI test tree right now and you can get it as
 individual patches from:
 http://www.sisk.pl/kernel/hibernation_and_suspend/2.6.24-rc2/patches/
 (patches 10-12).
 
 Please base your changes on top of that.

OK, I will do it.

Best Regards,
Huang Ying


Re: [PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume

2007-11-19 Thread Huang, Ying
On Tue, 2007-11-20 at 03:24 +0100, Rafael J. Wysocki wrote:
 On Tuesday, 20 of November 2007, Huang, Ying wrote:
  On Mon, 2007-11-19 at 19:22 +0100, Rafael J. Wysocki wrote:
+#ifdef CONFIG_KEXEC
+static void kexec_hibernate_power_down(void)
+{
+   switch (hibernation_mode) {
+   case HIBERNATION_TEST:
+   case HIBERNATION_TESTPROC:
+   break;
+   case HIBERNATION_REBOOT:
+   machine_restart(NULL);
+   break;
+   case HIBERNATION_PLATFORM:
+   if (!hibernation_ops)
+   break;
+   hibernation_ops-enter();
   
   hibernation_platform_enter() should be used here (as of the current 
   mainline).
  
  The power_down will be called with interrupts disabled, devices suspended
  and non-boot CPUs disabled. But the latest hibernation_platform_enter()
  itself calls device_suspend(), disable_nonboot_cpus() and similar
  functions. So I use hibernation_ops->enter() directly instead of
  hibernation_platform_enter().
 
 Hm, you need to call device_power_down(PMSG_SUSPEND) before
 hibernation_ops-enter().
 
 Also, all of the ACPI global calls need to be carried out before that and the
 devices should be suspended rather than shut down in that case.
 
 That's why hibernation_platform_enter() has been introduced, BTW.

The situation is a little different between u/swsusp and khibernation.

u/swsusp:

platform_start();
suspend_console();
device_suspend(PMSG_FREEZE);
platform_pre_snapshot();
disable_nonboot_cpus();
local_irq_disable();
device_power_down(PMSG_FREEZE);
/* create snapshot */
device_power_up();
local_irq_enable();
enable_nonboot_cpus();
platform_finish();
device_resume();
resume_console();
/* write the image out */
hibernation_ops-start();
suspend_console();
device_suspend(PMSG_SUSPEND);
hibernation_ops-prepare();
disable_nonboot_cpus();
local_irq_disable();
device_power_down(PMSG_SUSPEND);
hibernation_ops-enter();

khibernation:

suspend_console();
platform_start();
device_suspend(PMSG_FREEZE);
platform_pre_snapshot();
disable_nonboot_cpus();
local_irq_disable();
device_power_down(PMSG_FREEZE);
/* jump to kexeced (hibernating) kernel */
/* in kexeced kernel */
device_power_up();
local_irq_enable();
enable_nonboot_cpus();
device_resume();
resume_console();
/* write the image */
suspend_console();
device_suspend(PMSG_FREEZE);
disable_nonboot_cpus();
local_irq_disable();
device_power_down(PMSG_FREEZE);
/* jump to original (hibernated) kernel */
/* in original kernel */
hibernation_ops-enter();

The differences are:

- In u/swsusp, the ACPI methods are executed twice: once before writing
out the image and once after writing it out.
- After writing out the image, PMSG_SUSPEND is used instead of
PMSG_FREEZE.

Some questions:

- What is the difference between PMSG_SUSPEND and PMSG_FREEZE?
- Should the ACPI methods be executed once or twice, according to the
ACPI specification?
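
For reference, both are plain pm_message_t values that differ only in
the event code; a from-memory sketch of the pm.h definitions (the exact
initializer syntax may differ):

	#define PMSG_FREEZE	((struct pm_message){ .event = PM_EVENT_FREEZE, })
	#define PMSG_SUSPEND	((struct pm_message){ .event = PM_EVENT_SUSPEND, })

So the real question is which event the driver callbacks should see at
this point.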

Best Regards,
Huang Ying


Re: [PATCH 3/3 -mm] kexec based hibernation -v6: kexec hibernate/resume

2007-11-21 Thread Huang, Ying
On Wed, 2007-11-21 at 01:00 +0100, Rafael J. Wysocki wrote:
 On Tuesday, 20 of November 2007, Huang, Ying wrote:
  On Tue, 2007-11-20 at 03:24 +0100, Rafael J. Wysocki wrote:
   On Tuesday, 20 of November 2007, Huang, Ying wrote:
On Mon, 2007-11-19 at 19:22 +0100, Rafael J. Wysocki wrote:
  +#ifdef CONFIG_KEXEC
  +static void kexec_hibernate_power_down(void)
  +{
  +   switch (hibernation_mode) {
  +   case HIBERNATION_TEST:
  +   case HIBERNATION_TESTPROC:
  +   break;
  +   case HIBERNATION_REBOOT:
  +   machine_restart(NULL);
  +   break;
  +   case HIBERNATION_PLATFORM:
  +   if (!hibernation_ops)
  +   break;
  +   hibernation_ops-enter();
 
 hibernation_platform_enter() should be used here (as of the current 
 mainline).

The power_down will be called with interrupts disabled, devices suspended
and non-boot CPUs disabled. But the latest hibernation_platform_enter()
itself calls device_suspend(), disable_nonboot_cpus() and similar
functions. So I use hibernation_ops->enter() directly instead of
hibernation_platform_enter().
   
   Hm, you need to call device_power_down(PMSG_SUSPEND) before
   hibernation_ops-enter().
   
   Also, all of the ACPI global calls need to be carried out before that and 
   the
   devices should be suspended rather than shut down in that case.
   
   That's why hibernation_platform_enter() has been introduced, BTW.
  
  The situation is a little different between u/swsusp and khibernation.
  
  u/swsusp:
  
  platform_start();
  suspend_console();
  device_suspend(PMSG_FREEZE);
  platform_pre_snapshot();
  disable_nonboot_cpus();
  local_irq_disable();
  device_power_down(PMSG_FREEZE);
  /* create snapshot */
  device_power_up();
  local_irq_enable();
  enable_nonboot_cpus();
  platform_finish();
  device_resume();
  resume_console();
  /* write the image out */
  hibernation_ops-start();
  suspend_console();
  device_suspend(PMSG_SUSPEND);
  hibernation_ops-prepare();
  disable_nonboot_cpus();
  local_irq_disable();
  device_power_down(PMSG_SUSPEND);
  hibernation_ops-enter();
  
  khibernation:
  
  suspend_console();
  platform_start();
  device_suspend(PMSG_FREEZE);
  platform_pre_snapshot();
  disable_nonboot_cpus();
  local_irq_disable();
  device_power_down(PMSG_FREEZE);
  /* jump to kexeced (hibernating) kernel */
  /* in kexeced kernel */
  device_power_up();
  local_irq_enable();
  enable_nonboot_cpus();
 
 You should call platform_finish() here, or device_resume() will not work
 appropriately on some systems.
 
 However, after platform_finish() has been executed, the ACPI firmware will
 assume that the hibernation has been canceled, so you need to tell it that
 you'd like to go into the low power state after all.
 
  device_resume();
  resume_console();
  /* write the image */
 
 For this reason, you have to call hibernation_ops-start() once again
 and the other functions like in the swsusp case, in that order.
 
  suspend_console();
  device_suspend(PMSG_FREEZE);
  disable_nonboot_cpus();
  local_irq_disable();
  device_power_down(PMSG_FREEZE);
  /* jump to original (hibernated) kernel */
 
 This looks too fragile to my eyes.
 
 Why don't you call hibernation_ops-enter() directly from the kexeced 
 kernel?

I don't know whether there is ACPI global state inside the Linux kernel.
So I restricted all ACPI method calls to the original kernel. If ACPI
global state in the Linux kernel is not an issue, I can call
hibernation_ops->enter() directly in the kexeced kernel.

Best Regards,
Huang Ying


[PATCH -mm 3/4 -v6] x86_64 EFI runtime service support: document for EFI runtime services

2007-11-26 Thread Huang, Ying
This patch adds document for EFI x86_64 runtime services support.

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/x86_64/boot-options.txt |9 -
 Documentation/x86_64/uefi.txt |9 +
 2 files changed, 17 insertions(+), 1 deletion(-)

--- a/Documentation/x86_64/boot-options.txt
+++ b/Documentation/x86_64/boot-options.txt
@@ -110,7 +110,7 @@ Idle loop
 
 Rebooting
 
-   reboot=b[ios] | t[riple] | k[bd] | a[cpi] [, [w]arm | [c]old]
+   reboot=b[ios] | t[riple] | k[bd] | a[cpi] | e[fi] [, [w]arm | [c]old]
bios  Use the CPU reboot vector for warm reset
warm   Don't set the cold reboot flag
cold   Set the cold reboot flag
@@ -119,6 +119,9 @@ Rebooting
acpi   Use the ACPI RESET_REG in the FADT. If ACPI is not configured or the
   ACPI reset does not work, the reboot path attempts the reset using
   the keyboard controller.
+   efi    Use efi reset_system runtime service. If EFI is not configured or the
+   EFI reset does not work, the reboot path attempts the reset using
+   the keyboard controller.
 
Using warm reset will be much faster especially on big memory
systems because the BIOS will not go through the memory check.
@@ -303,4 +306,8 @@ Debugging
newfallback: use new unwinder but fall back to old if it gets
stuck (default)
 
+EFI
+
+  noefi   Disable EFI support
+
 Miscellaneous
--- a/Documentation/x86_64/uefi.txt
+++ b/Documentation/x86_64/uefi.txt
@@ -19,6 +19,10 @@ Mechanics:
 - Build the kernel with the following configuration.
CONFIG_FB_EFI=y
CONFIG_FRAMEBUFFER_CONSOLE=y
+  If EFI runtime services are expected, the following configuration should
+  be selected.
+   CONFIG_EFI=y
+   CONFIG_EFI_VARS=y or m  # optional
 - Create a VFAT partition on the disk
 - Copy the following to the VFAT partition:
elilo bootloader with x86_64 support, elilo configuration file,
@@ -27,3 +31,8 @@ Mechanics:
can be found in the elilo sourceforge project.
 - Boot to EFI shell and invoke elilo choosing the kernel image built
   in first step.
+- If some or all EFI runtime services don't work, you can try following
+  kernel command line parameters to turn off some or all EFI runtime
+  services.
+   noefi   turn off all EFI runtime services
+   reboot_type=k   turn off EFI reboot runtime service


[PATCH -mm 0/4 -v6] x86_64 EFI runtime service support

2007-11-26 Thread Huang, Ying
Following patchset adds EFI/UEFI (Unified Extensible Firmware
Interface) runtime services support to x86_64 architecture.

The patchset has been tested against the 2.6.24-rc3-mm1 kernel on Intel
platforms with 64-bit EFI 1.10 and UEFI 2.0 firmware. Because the
duplicated code between efi_32.c and efi_64.c is removed, the patchset
was also tested on an Intel platform with 32-bit EFI firmware.


v6:

- Fix a bug about runtime service memory mapping.

- Rebase on 2.6.24-rc3-mm1

v5:

- Remove the duplicated code between efi_32.c and efi_64.c.

- Rename lin2winx to efi_callx.

- Make EFI time runtime service default to off.

- Use different bootloader signatures for EFI32 and EFI64, so that the
  kernel can know whether the underlying EFI firmware is 64-bit or
  32-bit.

v4:

- EFI boot parameters are extended for 64-bit EFI in a 32-bit EFI
  compatible way.

- Add EFI runtime services document.

v3:

- Remove E820_RUNTIME_CODE, the EFI memory map is used to deal with
  EFI runtime code area.

- The method used to make EFI runtime code area executable is change:

  a. Before page allocation is usable, the PMD of direct mapping is
 changed temporarily before and after each EFI call.

  b. After page allocation is usable, change_page_attr_addr is used to
 change corresponding page attribute.

- Use fixmap to map EFI memory mapped IO memory area to make kexec
  workable.

- Add a kernel command line option noefi to make it possible to turn
  off EFI runtime services support.

- Function pointers are used for EFI time runtime service.

- EFI reboot runtime service is embedded into the framework of
  reboot_type.

- A kernel command line option noefi_time is added to make it
  possible to fall back to CMOS based implementation.

v2:

- The EFI callwrapper is re-implemented in assembler.


Best Regards,
Huang Ying


[PATCH -mm 1/4 -v6] x86_64 EFI runtime service support: EFI basic runtime service support

2007-11-26 Thread Huang, Ying
This patch adds basic runtime services support for EFI x86_64
systems. The main part of the patch is the addition of efi_64.c for
x86_64. This file is modeled after the EFI IA32 version. EFI runtime
services initialization is implemented in efi_64.c. Some x86_64
specifics are worth noting here. On x86_64, parameters passed to EFI
firmware services need to follow the EFI calling convention. For this
purpose, a set of functions named efi_callx (where x is the number of
parameters) is implemented; every EFI function call is wrapped by one of
them before calling the firmware service. The code that was duplicated
between efi_32.c and efi_64.c is moved into efi.c to remove it from efi_32.c.
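
For illustration, a sketch of how one of the wrapped physical-mode calls
looks when built on these stubs (simplified: the argument casts and the
exact efi_call2 prototype are assumptions here, the real stubs live in
efi_stub_64.S):

	static efi_status_t phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
	{
		efi_status_t status;

		/* switch to the 1:1 physical mapping, IRQs off */
		efi_call_phys_prelog();
		/* efi_call2: assembly stub that marshals two arguments
		 * according to the EFI x86_64 calling convention */
		status = efi_call2(efi_phys.get_time, (u64)tm, (u64)tc);
		/* restore the kernel page table and IRQ state */
		efi_call_phys_epilog();
		return status;
	}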

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/Kconfig  |2 
 arch/x86/kernel/Makefile_64   |1 
 arch/x86/kernel/efi.c |  484 ++
 arch/x86/kernel/efi_64.c  |  171 ++
 arch/x86/kernel/efi_stub_64.S |   68 +
 arch/x86/kernel/setup_64.c|   17 +
 include/asm-x86/bootparam.h   |5 
 include/asm-x86/efi.h |   70 ++
 include/asm-x86/fixmap_64.h   |3 
 9 files changed, 817 insertions(+), 4 deletions(-)

--- /dev/null
+++ b/arch/x86/kernel/efi_64.c
@@ -0,0 +1,171 @@
+/*
+ * x86_64 specific EFI support functions
+ * Based on Extensible Firmware Interface Specification version 1.0
+ *
+ * Copyright (C) 2005-2008 Intel Co.
+ * Fenghua Yu [EMAIL PROTECTED]
+ * Bibo Mao [EMAIL PROTECTED]
+ * Chandramouli Narayanan [EMAIL PROTECTED]
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * Code to convert EFI to E820 map has been implemented in elilo bootloader
+ * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table
+ * is setup appropriately for EFI runtime code.
+ * - mouli 06/14/2007.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/efi.h>
+#include <linux/uaccess.h>
+#include <linux/io.h>
+#include <linux/reboot.h>
+
+#include <asm/setup.h>
+#include <asm/page.h>
+#include <asm/e820.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
+#include <asm/proto.h>
+#include <asm/efi.h>
+
+static pgd_t save_pgd __initdata;
+static unsigned long efi_flags __initdata;
+/* efi_lock protects efi physical mode call */
+static __initdata DEFINE_SPINLOCK(efi_lock);
+
+static int __init setup_noefi(char *arg)
+{
+   efi_enabled = 0;
+   return 0;
+}
+early_param(noefi, setup_noefi);
+
+static void __init early_mapping_set_exec(unsigned long start,
+ unsigned long end,
+ int executable)
+{
+   pte_t *kpte;
+
+   while (start < end) {
+   kpte = lookup_address((unsigned long)__va(start));
+   BUG_ON(!kpte);
+   if (executable)
+   set_pte(kpte, pte_mkexec(*kpte));
+   else
+   set_pte(kpte, __pte((pte_val(*kpte) | _PAGE_NX) & \
+   __supported_pte_mask));
+   if (pte_huge(*kpte))
+   start = (start + PMD_SIZE) & PMD_MASK;
+   else
+   start = (start + PAGE_SIZE) & PAGE_MASK;
+   }
+}
+
+static void __init early_runtime_code_mapping_set_exec(int executable)
+{
+   efi_memory_desc_t *md;
+   void *p;
+
+   /* Make EFI runtime service code area executable */
+   for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+   md = p;
+   if (md->type == EFI_RUNTIME_SERVICES_CODE) {
+   unsigned long end;
+   end = md->phys_addr + (md->num_pages << PAGE_SHIFT);
+   early_mapping_set_exec(md->phys_addr, end, executable);
+   }
+   }
+}
+
+void __init efi_call_phys_prelog(void) __acquires(efi_lock)
+{
+   unsigned long vaddress;
+
+   /*
+* Lock sequence is different from normal case because
+* efi_flags is global
+*/
+   spin_lock(&efi_lock);
+   local_irq_save(efi_flags);
+   early_runtime_code_mapping_set_exec(1);
+   vaddress = (unsigned long)__va(0x0UL);
+   pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
+   set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
+   global_flush_tlb();
+}
+
+void __init efi_call_phys_epilog(void) __releases(efi_lock)
+{
+   /*
+* After the lock is released, the original page table is restored.
+*/
+   set_pgd(pgd_offset_k(0x0UL), save_pgd);
+   early_runtime_code_mapping_set_exec(0);
+   global_flush_tlb();
+   local_irq_restore(efi_flags);
+   spin_unlock(&efi_lock);
+}
+
+/*
+ * We need to map the EFI memory map again after init_memory_mapping().
+ */
+void __init efi_map_memmap

[PATCH -mm 2/4 -v6] x86_64 EFI runtime service support: EFI runtime services

2007-11-26 Thread Huang, Ying
This patch adds support for several EFI runtime services for EFI
x86_64 system.

The EFI support for emergency_restart is added.

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/reboot_64.c |   20 +---
 include/asm-x86/emergency-restart.h |9 +
 2 files changed, 22 insertions(+), 7 deletions(-)

--- a/arch/x86/kernel/reboot_64.c
+++ b/arch/x86/kernel/reboot_64.c
@@ -9,6 +9,7 @@
 #include linux/pm.h
 #include linux/kdebug.h
 #include linux/sched.h
+#include linux/efi.h
 #include acpi/reboot.h
 #include asm/io.h
 #include asm/delay.h
@@ -27,20 +28,17 @@ void (*pm_power_off)(void);
 EXPORT_SYMBOL(pm_power_off);
 
 static long no_idt[3];
-static enum { 
-   BOOT_TRIPLE = 't',
-   BOOT_KBD = 'k',
-   BOOT_ACPI = 'a'
-} reboot_type = BOOT_KBD;
+enum reboot_type reboot_type = BOOT_KBD;
 static int reboot_mode = 0;
 int reboot_force;
 
-/* reboot=t[riple] | k[bd] [, [w]arm | [c]old]
+/* reboot=t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
warm   Don't set the cold reboot flag
cold   Set the cold reboot flag
triple Force a triple fault (init)
kbdUse the keyboard controller. cold reset (default)
acpi   Use the RESET_REG in the FADT
+   efi    Use efi reset_system runtime service
force  Avoid anything that could hang.
  */ 
 static int __init reboot_setup(char *str)
@@ -59,6 +57,7 @@ static int __init reboot_setup(char *str
case 'a':
case 'b':
case 'k':
+   case 'e':
reboot_type = *str;
break;
case 'f':
@@ -151,7 +150,14 @@ void machine_emergency_restart(void)
acpi_reboot();
reboot_type = BOOT_KBD;
break;
-   }  
+
+   case BOOT_EFI:
+   if (efi_enabled)
+   efi.reset_system(reboot_mode ? EFI_RESET_WARM : EFI_RESET_COLD,
+    EFI_SUCCESS, 0, NULL);
+   reboot_type = BOOT_KBD;
+   break;
+   }
}  
 }
 
--- a/include/asm-x86/emergency-restart.h
+++ b/include/asm-x86/emergency-restart.h
@@ -1,6 +1,15 @@
 #ifndef _ASM_EMERGENCY_RESTART_H
 #define _ASM_EMERGENCY_RESTART_H
 
+enum reboot_type {
+   BOOT_TRIPLE = 't',
+   BOOT_KBD = 'k',
+   BOOT_ACPI = 'a',
+   BOOT_EFI = 'e'
+};
+
+extern enum reboot_type reboot_type;
+
 extern void machine_emergency_restart(void);
 
 #endif /* _ASM_EMERGENCY_RESTART_H */


[PATCH -mm 4/4 -v6] x86_64 EFI runtime service support: remove duplicated code from efi_32.c

2007-11-26 Thread Huang, Ying
This patch removes the duplicated code between efi_32.c and efi.c.

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/Makefile_32 |2 
 arch/x86/kernel/e820_32.c   |5 
 arch/x86/kernel/efi_32.c|  430 
 arch/x86/kernel/setup_32.c  |   11 -
 include/asm-x86/efi.h   |   42 
 5 files changed, 47 insertions(+), 443 deletions(-)

--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -35,7 +35,7 @@ obj-$(CONFIG_KPROBES) += kprobes_32.o
 obj-$(CONFIG_MODULES)  += module_32.o
 obj-y  += sysenter_32.o vsyscall_32.o
 obj-$(CONFIG_ACPI_SRAT)+= srat_32.o
-obj-$(CONFIG_EFI)  += efi_32.o efi_stub_32.o
+obj-$(CONFIG_EFI)  += efi.o efi_32.o efi_stub_32.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault_32.o
 obj-$(CONFIG_VM86) += vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
--- a/arch/x86/kernel/efi_32.c
+++ b/arch/x86/kernel/efi_32.c
@@ -39,21 +39,8 @@
 #include asm/desc.h
 #include asm/tlbflush.h
 
-#define EFI_DEBUG  0
 #define PFX		"EFI: "
 
-extern efi_status_t asmlinkage efi_call_phys(void *, ...);
-
-struct efi efi;
-EXPORT_SYMBOL(efi);
-static struct efi efi_phys;
-struct efi_memory_map memmap;
-
-/*
- * We require an early boot_ioremap mapping mechanism initially
- */
-extern void * boot_ioremap(unsigned long, unsigned long);
-
 /*
  * To make EFI call EFI runtime service in physical addressing mode we need
  * prelog/epilog before/after the invocation to disable interrupt, to
@@ -65,7 +52,7 @@ static unsigned long efi_rt_eflags;
 static DEFINE_SPINLOCK(efi_rt_lock);
 static pgd_t efi_bak_pg_dir_pointer[2];
 
-static void efi_call_phys_prelog(void) __acquires(efi_rt_lock)
+void efi_call_phys_prelog(void) __acquires(efi_rt_lock)
 {
unsigned long cr4;
unsigned long temp;
@@ -108,7 +95,7 @@ static void efi_call_phys_prelog(void) _
load_gdt(gdt_descr);
 }
 
-static void efi_call_phys_epilog(void) __releases(efi_rt_lock)
+void efi_call_phys_epilog(void) __releases(efi_rt_lock)
 {
unsigned long cr4;
struct Xgt_desc_struct gdt_descr;
@@ -138,87 +125,6 @@ static void efi_call_phys_epilog(void) _
spin_unlock(efi_rt_lock);
 }
 
-static efi_status_t
-phys_efi_set_virtual_address_map(unsigned long memory_map_size,
-unsigned long descriptor_size,
-u32 descriptor_version,
-efi_memory_desc_t *virtual_map)
-{
-   efi_status_t status;
-
-   efi_call_phys_prelog();
-   status = efi_call_phys(efi_phys.set_virtual_address_map,
-memory_map_size, descriptor_size,
-descriptor_version, virtual_map);
-   efi_call_phys_epilog();
-   return status;
-}
-
-static efi_status_t
-phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
-{
-   efi_status_t status;
-
-   efi_call_phys_prelog();
-   status = efi_call_phys(efi_phys.get_time, tm, tc);
-   efi_call_phys_epilog();
-   return status;
-}
-
-inline int efi_set_rtc_mmss(unsigned long nowtime)
-{
-   int real_seconds, real_minutes;
-   efi_status_tstatus;
-   efi_time_t  eft;
-   efi_time_cap_t  cap;
-
-   spin_lock(efi_rt_lock);
-   status = efi.get_time(eft, cap);
-   spin_unlock(efi_rt_lock);
-   if (status != EFI_SUCCESS)
-   panic(Ooops, efitime: can't read time!\n);
-   real_seconds = nowtime % 60;
-   real_minutes = nowtime / 60;
-
-   if (((abs(real_minutes - eft.minute) + 15)/30)  1)
-   real_minutes += 30;
-   real_minutes %= 60;
-
-   eft.minute = real_minutes;
-   eft.second = real_seconds;
-
-   if (status != EFI_SUCCESS) {
-   printk(Ooops: efitime: can't read time!\n);
-   return -1;
-   }
-   return 0;
-}
-/*
- * This is used during kernel init before runtime
- * services have been remapped and also during suspend, therefore,
- * we'll need to call both in physical and virtual modes.
- */
-inline unsigned long efi_get_time(void)
-{
-   efi_status_t status;
-   efi_time_t eft;
-   efi_time_cap_t cap;
-
-   if (efi.get_time) {
-   /* if we are in virtual mode use remapped function */
-   status = efi.get_time(eft, cap);
-   } else {
-   /* we are in physical mode */
-   status = phys_efi_get_time(eft, cap);
-   }
-
-   if (status != EFI_SUCCESS)
-   printk(Oops: efitime: can't read time status: 0x%lx\n,status);
-
-   return mktime(eft.year, eft.month, eft.day, eft.hour,
-   eft.minute, eft.second);
-}
-
 int is_available_memory(efi_memory_desc_t * md)
 {
if (!(md-attribute  EFI_MEMORY_WB))
@@ -250,24 +156,6 @@ void __init efi_map_memmap(void)
memmap.map_end = memmap.map

Re: [PATCH -mm 1/4 -v6] x86_64 EFI runtime service support: EFI basic runtime service support

2007-11-27 Thread Huang, Ying
 for posterity
  +*/
  +   c16 = tmp = efi_early_ioremap(efi.systab-fw_vendor, 2);
  +   if (c16) {
  +   for (i = 0; i  sizeof(vendor)  *c16; ++i)
  +   vendor[i] = *c16++;
  +   vendor[i] = '\0';
  +   } else
  +   printk(KERN_ERR Could not map the firmware vendor!\n);
 
 That would be a very confusing error message to any poor soul who received
 it.  Please consider prefixing all such things with (say) efi: .

I will do it.

  +/*
  + * This function will switch the EFI runtime services to virtual mode.
  + * Essentially, look through the EFI memmap and map every region that
  + * has the runtime attribute bit set in its memory descriptor and update
  + * that memory descriptor with the virtual address obtained from ioremap().
  + * This enables the runtime services to be called without having to
  + * thunk back into physical mode for every invocation.
  + */
  +void __init efi_enter_virtual_mode(void)
  +{
  +   efi_memory_desc_t *md;
  +   efi_status_t status;
  +   unsigned long end;
  +   void *p;
  +
  +   efi.systab = NULL;
  +   for (p = memmap.map; p  memmap.map_end; p += memmap.desc_size) {
  +   md = p;
  +   if (!(md-attribute  EFI_MEMORY_RUNTIME))
  +   continue;
  +   if ((md-attribute  EFI_MEMORY_WB) 
  +   (((md-phys_addr + (md-num_pagesEFI_PAGE_SHIFT)) 
  + PAGE_SHIFT)  end_pfn_map))
  +   md-virt_addr = (unsigned long)__va(md-phys_addr);
  +   else
  +   md-virt_addr = (unsigned long)
  +   efi_ioremap(md-phys_addr,
  +   md-num_pages  EFI_PAGE_SHIFT);
  +   if (!md-virt_addr)
  +   printk(KERN_ERR ioremap of 0x%llX failed!\n,
  +  (unsigned long long)md-phys_addr);
  +   end = md-phys_addr + (md-num_pages  EFI_PAGE_SHIFT);
  +   if ((md-phys_addr = (unsigned long)efi_phys.systab) 
  +   ((unsigned long)efi_phys.systab  end))
  +   efi.systab = (efi_system_table_t *)(unsigned long)
  +   (md-virt_addr - md-phys_addr +
  +(unsigned long)efi_phys.systab);
  +   }
  +
  +   BUG_ON(!efi.systab);
  +
  +   status = phys_efi_set_virtual_address_map(
  +   memmap.desc_size * memmap.nr_map,
  +   memmap.desc_size,
  +   memmap.desc_version,
  +   memmap.phys_map);
  +
  +   if (status != EFI_SUCCESS) {
  +   printk(KERN_ALERT You are screwed! 
 
 This came over when you copied the original file.  This patchset would be a
 decent opportunity to de-stupid these messages.  Frankly.

I will do it. And I will recheck all messages.

Best Regards,
Huang Ying


[PATCH -mm] x86_64 EFI runtime service support: EFI basic runtime service support fixes

2007-11-28 Thread Huang, Ying
This patch fixes several issues in the x86_64 EFI basic runtime service
support patch, per comments from Andrew Morton.

- Delete efi_lock because it is only used during early system boot,
  before SMP is initialized. global_flush_tlb() is changed to
  __flush_tlb_all() for the same reason.

- Revise some messages.

- Turn on debug by default.

- Remove unnecessary memset of static variable.


This patch has been tested against 2.6.24-rc3-mm1 kernel on Intel
platforms with 64-bit EFI1.10 and UEFI2.0 firmware.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/efi.c|   24 ++--
 arch/x86/kernel/efi_64.c |   18 +-
 2 files changed, 15 insertions(+), 27 deletions(-)

--- a/arch/x86/kernel/efi_64.c
+++ b/arch/x86/kernel/efi_64.c
@@ -39,8 +39,6 @@
 
 static pgd_t save_pgd __initdata;
 static unsigned long efi_flags __initdata;
-/* efi_lock protects efi physical mode call */
-static __initdata DEFINE_SPINLOCK(efi_lock);
 
 static int __init setup_noefi(char *arg)
 {
@@ -86,33 +84,27 @@ static void __init early_runtime_code_ma
}
 }
 
-void __init efi_call_phys_prelog(void) __acquires(efi_lock)
+void __init efi_call_phys_prelog(void)
 {
unsigned long vaddress;
 
-   /*
-* Lock sequence is different from normal case because
-* efi_flags is global
-*/
-   spin_lock(efi_lock);
local_irq_save(efi_flags);
early_runtime_code_mapping_set_exec(1);
vaddress = (unsigned long)__va(0x0UL);
pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
-   global_flush_tlb();
+   __flush_tlb_all();
 }
 
-void __init efi_call_phys_epilog(void) __releases(efi_lock)
+void __init efi_call_phys_epilog(void)
 {
/*
 * After the lock is released, the original page table is restored.
 */
set_pgd(pgd_offset_k(0x0UL), save_pgd);
early_runtime_code_mapping_set_exec(0);
-   global_flush_tlb();
+   __flush_tlb_all();
local_irq_restore(efi_flags);
-   spin_unlock(efi_lock);
 }
 
 /*
@@ -143,7 +135,7 @@ void __init runtime_code_page_mkexec(voi
  md-num_pages,
  PAGE_KERNEL_EXEC);
}
-   global_flush_tlb();
+   __flush_tlb_all();
 }
 
 void __iomem * __init efi_ioremap(unsigned long offset,
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -41,7 +41,8 @@
 #include asm/efi.h
 #include asm/time.h
 
-#define EFI_DEBUG  0
+#define EFI_DEBUG  1
+#define PFX		"EFI: "
 
 int efi_enabled;
 EXPORT_SYMBOL(efi_enabled);
@@ -214,7 +215,7 @@ static void __init print_efi_memmap(void
 p  memmap.map_end;
 p += memmap.desc_size, i++) {
md = p;
-   printk(KERN_INFO mem%02u: type=%u, attr=0x%llx, 
+   printk(KERN_INFO PFX mem%02u: type=%u, attr=0x%llx, 
range=[0x%016llx-0x%016llx) (%lluMB)\n,
i, md-type, md-attribute, md-phys_addr,
md-phys_addr + (md-num_pages  EFI_PAGE_SHIFT),
@@ -232,9 +233,6 @@ void __init efi_init(void)
int i = 0;
void *tmp;
 
-   memset(efi, 0, sizeof(efi));
-   memset(efi_phys, 0, sizeof(efi_phys));
-
 #ifdef CONFIG_X86_32
efi_phys.systab = (efi_system_table_t *)boot_params.efi_info.efi_systab;
memmap.phys_map = (void *)boot_params.efi_info.efi_memmap;
@@ -254,7 +252,7 @@ void __init efi_init(void)
efi.systab = efi_early_ioremap((unsigned long)efi_phys.systab,
   sizeof(efi_system_table_t));
if (efi.systab == NULL)
-   printk(KERN_ERR Woah! Couldn't map the EFI systema table.\n);
+   printk(KERN_ERR Couldn't map the EFI system table!\n);
memcpy(efi_systab, efi.systab, sizeof(efi_system_table_t));
efi_early_iounmap(efi.systab, sizeof(efi_system_table_t));
efi.systab = efi_systab;
@@ -263,11 +261,10 @@ void __init efi_init(void)
 * Verify the EFI Table
 */
if (efi.systab-hdr.signature != EFI_SYSTEM_TABLE_SIGNATURE)
-   printk(KERN_ERR Woah! EFI system table 
-  signature incorrect\n);
+   printk(KERN_ERR EFI system table signature incorrect!\n);
if ((efi.systab-hdr.revision  16) == 0)
printk(KERN_ERR Warning: EFI system table version 
-  %d.%02d, expected 1.00 or greater\n,
+  %d.%02d, expected 1.00 or greater!\n,
   efi.systab-hdr.revision  16,
   efi.systab-hdr.revision  0x);
 
@@ -280,7 +277,7 @@ void __init efi_init(void)
vendor[i] = *c16++;
vendor[i] = '\0';
} else
-   printk(KERN_ERR Could not map the firmware vendor!\n);
+   printk(KERN_ERR PFX Could

Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump

2007-12-10 Thread Huang, Ying
On Mon, 2007-12-10 at 14:55 -0500, Vivek Goyal wrote:
 On Fri, Dec 07, 2007 at 03:53:30PM +, Huang, Ying wrote:
  This patch implements the functionality of jumping between the kexeced
  kernel and the original kernel.
  
 
 Hi,
 
 I am just going through your patches and trying to understand it. Don't
 understand many things. Asking is easy so here you go...
 
  To support jumping between two kernels, before jumping to (executing)
  the new kernel and jumping back to the original kernel, the devices
  are put into quiescent state, and the state of devices and CPU is
  saved. After jumping back from kexeced kernel and jumping to the new
  kernel, the state of devices and CPU are restored accordingly. The
  devices/CPU state save/restore code of software suspend is called to
  implement corresponding function.
  
 
 I need jumping back to restore a already hibernated kernel image? Can
 you please tell little more about jumping back and why it is needed?

Now, the jumping back is used to implement kexec based hibernation,
which uses kexec/kdump to save the memory image of the hibernated kernel
while hibernating, and uses /dev/oldmem to restore the memory image of
the hibernated kernel and jump back to it so that it can continue to
run.

Other possible usage models include:

- Dump the system memory image and then continue to run, that is, take a
memory snapshot of the system while it is running.
- Cooperative multi-tasking of different OSes. You can load another OS (B)
from the current OS (A), and jump between the two OSes as needed.
- Call some code (such as firmware, etc.) in physical mode.

  To support jumping without reserving memory. One shadow backup page
  (source page) is allocated for each page used by new (kexeced) kernel
  (destination page). When do kexec_load, the image of new kernel is
  loaded into source pages, and before executing, the destination pages
  and the source pages are swapped, so the contents of destination pages
  are backupped. Before jumping to the new (kexeced) kernel and after
  jumping back to the original kernel, the destination pages and the
  source pages are swapped too.
  
 
 Ok, so due to swapping of source and destination pages first kernel's data
 is still preserved.  How do I get the dynamic memory required for second
 kernel boot (without writing first kernel's data)?

All memory required by the second kernel should be loaded via
sys_kexec_load() in the first kernel. For example, not only should the
Linux kernel be loaded at 1M, the memory 0~16M (excluding the kernel)
should also be loaded (as all zeros) by /sbin/kexec via sys_kexec_load(),
roughly as in the sketch below.
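
A rough user-space sketch of what that means for /sbin/kexec (illustrative
only: error handling, page alignment and purgatory are omitted, the struct
is declared locally for the example, and the segment layout is just one
possibility):

	#include <unistd.h>
	#include <sys/syscall.h>

	struct kexec_segment {
		const void *buf;	/* source buffer in user space */
		size_t bufsz;
		const void *mem;	/* destination physical address */
		size_t memsz;
	};

	/* A zero-filled segment below the kernel plus the kernel image at 1M.
	 * Every destination page gets a source (shadow) page, so the original
	 * kernel's contents there can be swapped out and back on jump. */
	int load_example(void *kernel_buf, size_t kernel_size,
			 unsigned long entry, unsigned long flags)
	{
		static char zeros[0x100000];	/* only 0-1M here, for brevity */
		struct kexec_segment segs[2] = {
			{ zeros,      sizeof(zeros), (void *)0x0,      sizeof(zeros) },
			{ kernel_buf, kernel_size,   (void *)0x100000, kernel_size },
		};

		return syscall(SYS_kexec_load, entry, 2, segs, flags);
	}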

  A jump back protocol for kexec is defined and documented. It is an
  extension to ordinary function calling protocol. So, the facility
  provided by this patch can be used to call ordinary C function in real
  mode.
  
  A set of flags for sys_kexec_load are added to control which state are
  saved/restored before/after real mode code executing. For example, you
  can specify the device state and FPU state are saved/restored
  before/after real mode code executing.
  
  The states (exclude CPU state) save/restore code can be overridden
  based on the command parameter of kexec jump. Because more states
  need to be saved/restored by hibernating/resuming.
  
  Signed-off-by: Huang Ying [EMAIL PROTECTED]
  
  ---
   Documentation/i386/jump_back_protocol.txt |  103 ++
   arch/powerpc/kernel/machine_kexec.c   |2 
   arch/ppc/kernel/machine_kexec.c   |2 
   arch/sh/kernel/machine_kexec.c|2 
   arch/x86/kernel/machine_kexec_32.c|   88 +---
   arch/x86/kernel/machine_kexec_64.c|2 
   arch/x86/kernel/relocate_kernel_32.S  |  214 
  +++---
   include/asm-x86/kexec_32.h|   39 -
   include/linux/kexec.h |   40 +
   kernel/kexec.c|  188 ++
   kernel/power/Kconfig  |2 
   kernel/sys.c  |   35 +++-
   12 files changed, 648 insertions(+), 69 deletions(-)
  
  --- a/arch/x86/kernel/machine_kexec_32.c
  +++ b/arch/x86/kernel/machine_kexec_32.c
  @@ -20,6 +20,7 @@
   #include asm/cpufeature.h
   #include asm/desc.h
   #include asm/system.h
  +#include asm/cacheflush.h
   
   #define PAGE_ALIGNED __attribute__ ((__aligned__(PAGE_SIZE)))
   static u32 kexec_pgd[1024] PAGE_ALIGNED;
  @@ -83,10 +84,14 @@ static void load_segments(void)
* reboot code buffer to allow us to avoid allocations
* later.
*
  - * Currently nothing.
  + * Turn off NX bit for control page.
*/
   int machine_kexec_prepare(struct kimage *image)
   {
  +   if (nx_enabled) {
  +   change_page_attr(image-control_code_page, 1, PAGE_KERNEL_EXEC);
  +   global_flush_tlb();
  +   }
  return 0;
   }
   
  @@ -96,25 +101,59 @@ int machine_kexec_prepare(struct kimage 
*/
   void machine_kexec_cleanup(struct kimage *image

Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump

2007-12-10 Thread Huang, Ying
On Mon, 2007-12-10 at 17:31 -0500, Vivek Goyal wrote:
 [..]
   
  -#define KEXEC_ON_CRASH  0x0001
  -#define KEXEC_ARCH_MASK 0x
  +#define KEXEC_ON_CRASH 0x0001
  +#define KEXEC_PRESERVE_CPU 0x0002
  +#define KEXEC_PRESERVE_CPU_EXT 0x0004
  +#define KEXEC_SINGLE_CPU   0x0008
  +#define KEXEC_PRESERVE_DEVICE  0x0010
  +#define KEXEC_PRESERVE_CONSOLE 0x0020
 
 Hi,
 
 Why do we need so many different flags for preserving different types
 of state (CPU, CPU_EXT, Device, console) ? To keep things simple,
 can't we can create just one flag KEXEC_PRESERVE_CONTEXT, which will
 indicate any special action required for preserving the previous kernel's
 context so that one can swith back to old kernel?

Yes. There are too many flags, especially when we have no users of these
flags yet. It is better to use one flag such as KEXEC_PRESERVE_CONTEXT
now, and create the other required flags only when they are really needed.
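
A sketch of what the collapsed interface could look like (the value of
the new flag is illustrative only, not from a merged header):

	/* include/linux/kexec.h */
	#define KEXEC_ON_CRASH		0x00000001
	#define KEXEC_PRESERVE_CONTEXT	0x00000002	/* keep enough state to jump back */
	#define KEXEC_ARCH_MASK		0xffff0000

	/* user space, loading an image it intends to jump back from: */
	sys_kexec_load(entry, nr_segments, segments,
		       KEXEC_PRESERVE_CONTEXT | KEXEC_ARCH_DEFAULT);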

Best Regards,
Huang Ying


Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump

2007-12-10 Thread Huang, Ying
On Mon, 2007-12-10 at 19:25 -0700, Eric W. Biederman wrote:
 Huang, Ying [EMAIL PROTECTED] writes:
[...]
   /*
* Do not allocate memory (or fail in any way) in machine_kexec().
* We are past the point of no return, committed to rebooting now.
*/
  -NORET_TYPE void machine_kexec(struct kimage *image)
  +int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
  +unsigned int argc, va_list args)
   {
 
 Why do we need var arg support?
 Can't we do that with a shim we load from user space?

If all parameters are provided in user space, the usage model may be as
follow:

- sys_kexec_load() /* with executable/data/parameters(A) loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with 
parameters(A)*/
- /* jump back */
- sys_kexec_load() /* with executable/data/parameters(B) loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with 
parameters(B)*/
- /* jump back */

That is, the kexec image has to be re-loaded whenever the parameters
change, and no state can be preserved in the kexec image. This is OK for
the original kexec implementation, because there is no jumping back.
But, for kexec with jumping back, another usage model may be useful too.

- sys_kexec_load() /* with executable/data loaded */
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(A)) /* execute physical mode 
code with parameters(A)*/
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(B)) /* execute physical mode 
code with parameters(B)*/

This way the kexec image need not be re-loaded, and the state of the
kexec image can be preserved across several invocations.
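
In user space the second model could look roughly like this (a sketch:
passing per-invocation parameters through the reboot() argument pointer
is exactly the extension being discussed, not an existing interface, and
struct kjump_params is made up for the example):

	#include <unistd.h>
	#include <sys/syscall.h>
	#include <linux/reboot.h>

	/* hypothetical parameter block, layout defined by the loaded image */
	struct kjump_params { unsigned long op, arg; };

	static long kexec_exec(struct kjump_params *p)
	{
		return syscall(SYS_reboot, LINUX_REBOOT_MAGIC1,
			       LINUX_REBOOT_MAGIC2, LINUX_REBOOT_CMD_KEXEC, p);
	}

	int main(void)
	{
		struct kjump_params a = { .op = 1, .arg = 0x10 };
		struct kjump_params b = { .op = 2, .arg = 0x20 };

		/* sys_kexec_load() was already called once to load the image */
		kexec_exec(&a);	/* run physical mode code with parameters A */
		kexec_exec(&b);	/* run again with parameters B, state preserved */
		return 0;
	}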


Another usage model that may be useful is invoking the kexec image (such
as firmware) from kernel space.

- kmalloc() the needed memory and load the firmware image (if needed)
- sys_kexec_load() with a fake image (one segment with size 0); the
entry point of the fake image is the entry point of the firmware image.
- kexec_call(fake_image, ...) /* maybe change the entry point if needed */

This way, some kernel code can invoke the firmware in physical mode just
like invoking an ordinary function.
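
A sketch of that in-kernel use; kexec_call() is the interface this series
proposes, so the prototype below follows the discussion rather than any
final API, and the error handling is only illustrative:

	/* Call a physical-mode firmware blob already loaded through a fake
	 * kimage, passing two arguments and reading back one result. */
	static int call_firmware_example(struct kimage *fake_image,
					 unsigned long arg1, unsigned long arg2)
	{
		unsigned long ret;
		int error;

		/* the entry point was set to the firmware entry at load time */
		error = kexec_call(fake_image, &ret, 2, arg1, arg2);
		if (error)
			return error;

		return ret ? -EIO : 0;
	}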

[...]
  -   /* The segment registers are funny things, they have both a
  -* visible and an invisible part.  Whenever the visible part is
  -* set to a specific selector, the invisible part is loaded
  -* with from a table in memory.  At no other time is the
  -* descriptor table in memory accessed.
  -*
  -* I take advantage of this here by force loading the
  -* segments, before I zap the gdt with an invalid value.
  -*/
  -   load_segments();
  -   /* The gdt  idt are now invalid.
  -* If you want to load them you must set up your own idt  gdt.
  -*/
  -   set_gdt(phys_to_virt(0),0);
  -   set_idt(phys_to_virt(0),0);
  +   if (image-preserve_cpu_ext) {
  +   /* The segment registers are funny things, they have
  +* both a visible and an invisible part.  Whenever the
  +* visible part is set to a specific selector, the
  +* invisible part is loaded with from a table in
  +* memory.  At no other time is the descriptor table
  +* in memory accessed.
  +*
  +* I take advantage of this here by force loading the
  +* segments, before I zap the gdt with an invalid
  +* value.
  +*/
  +   load_segments();
  +   /* The gdt  idt are now invalid.  If you want to load
  +* them you must set up your own idt  gdt.
  +*/
  +   set_gdt(phys_to_virt(0), 0);
  +   set_idt(phys_to_virt(0), 0);
  +   }
 
 We can't keep the same idt and gdt as the pages they are on will be
 overwritten/reused.  So explictily stomping on them sounds better
 so they never work.  We can restore them on kernel reentry.

The original idea behind this code is:

If the kexec image claims that it does not need the extensive CPU state
(such as FPU/MMX/GDT/LDT/IDT/CS/DS/ES/FS/GS/SS etc.) to be preserved,
the IDT/GDT/CS/DS/ES/FS/GS/SS are not touched by the kexec image code,
so the segment registers need not be set.

But this is not clear. At least more description should be provided for
each preserve flag.

  /* now call it */
  -   relocate_kernel((unsigned long)image-head, (unsigned long)page_list,
  -   image-start, cpu_has_pae);
  +   relocate_kernel_ptr((unsigned long)image-head,
  +   (unsigned long)page_list,
  +   image-start, cpu_has_pae);
 
 Why rename relocate_kernel?
 Ah.  I see.  You need to make it into a pointer again.  The crazy don't
 stop the pgd support strikes again.  It used to be named rnk.

Do you mean I should change the function pointer name to rnk to keep
consistency? I found rnk in the IA64 implementation.

Best Regards,
Huang Ying

[PATCH -mm] x86 boot : Use E820 memory map on EFI 32 platform

2007-12-11 Thread Huang, Ying
Because the EFI memory map is converted to an e820 memory map in the
bootloader, the EFI memory map handling code is removed as a cleanup.

This patch is based on 2.6.24-rc4-mm1 and has been tested on Intel
32-bit platform with EFI 32 and UEFI 32 firmware.
 
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/e820_32.c  |  117 +++
 arch/x86/kernel/efi_32.c   |  150 -
 arch/x86/kernel/setup_32.c |   16 +---
 arch/x86/mm/init_32.c  |   18 -
 include/asm-x86/e820_32.h  |2 
 5 files changed, 16 insertions(+), 287 deletions(-)

--- a/arch/x86/kernel/e820_32.c
+++ b/arch/x86/kernel/e820_32.c
@@ -7,7 +7,6 @@
 #include linux/kexec.h
 #include linux/module.h
 #include linux/mm.h
-#include linux/efi.h
 #include linux/pfn.h
 #include linux/uaccess.h
 #include linux/suspend.h
@@ -181,7 +180,7 @@ static void __init probe_roms(void)
  * Request address space for all standard RAM and ROM resources
  * and also for regions reported as reserved by the e820.
  */
-void __init legacy_init_iomem_resources(struct resource *code_resource,
+void __init init_iomem_resources(struct resource *code_resource,
struct resource *data_resource,
struct resource *bss_resource)
 {
@@ -261,19 +260,17 @@ void __init add_memory_region(unsigned l
 {
int x;
 
-   if (!efi_enabled) {
-   x = e820.nr_map;
+   x = e820.nr_map;
 
-   if (x == E820MAX) {
-   printk(KERN_ERR Ooops! Too many entries in the memory 
map!\n);
-   return;
-   }
-
-   e820.map[x].addr = start;
-   e820.map[x].size = size;
-   e820.map[x].type = type;
-   e820.nr_map++;
+   if (x == E820MAX) {
+   printk(KERN_ERR Ooops! Too many entries in the memory map!\n);
+   return;
}
+
+   e820.map[x].addr = start;
+   e820.map[x].size = size;
+   e820.map[x].type = type;
+   e820.nr_map++;
 } /* add_memory_region */
 
 /*
@@ -489,29 +486,6 @@ int __init copy_e820_map(struct e820entr
 }
 
 /*
- * Callback for efi_memory_walk.
- */
-static int __init
-efi_find_max_pfn(unsigned long start, unsigned long end, void *arg)
-{
-   unsigned long *max_pfn = arg, pfn;
-
-   if (start  end) {
-   pfn = PFN_UP(end -1);
-   if (pfn  *max_pfn)
-   *max_pfn = pfn;
-   }
-   return 0;
-}
-
-static int __init
-efi_memory_present_wrapper(unsigned long start, unsigned long end, void *arg)
-{
-   memory_present(0, PFN_UP(start), PFN_DOWN(end));
-   return 0;
-}
-
-/*
  * Find the highest page frame number we have available
  */
 void __init find_max_pfn(void)
@@ -519,11 +493,6 @@ void __init find_max_pfn(void)
int i;
 
max_pfn = 0;
-   if (efi_enabled) {
-   efi_memmap_walk(efi_find_max_pfn, max_pfn);
-   efi_memmap_walk(efi_memory_present_wrapper, NULL);
-   return;
-   }
 
for (i = 0; i  e820.nr_map; i++) {
unsigned long start, end;
@@ -541,34 +510,12 @@ void __init find_max_pfn(void)
 }
 
 /*
- * Free all available memory for boot time allocation.  Used
- * as a callback function by efi_memory_walk()
- */
-
-static int __init
-free_available_memory(unsigned long start, unsigned long end, void *arg)
-{
-   /* check max_low_pfn */
-   if (start = (max_low_pfn  PAGE_SHIFT))
-   return 0;
-   if (end = (max_low_pfn  PAGE_SHIFT))
-   end = max_low_pfn  PAGE_SHIFT;
-   if (start  end)
-   free_bootmem(start, end - start);
-
-   return 0;
-}
-/*
  * Register fully available low RAM pages with the bootmem allocator.
  */
 void __init register_bootmem_low_pages(unsigned long max_low_pfn)
 {
int i;
 
-   if (efi_enabled) {
-   efi_memmap_walk(free_available_memory, NULL);
-   return;
-   }
for (i = 0; i  e820.nr_map; i++) {
unsigned long curr_pfn, last_pfn, size;
/*
@@ -676,56 +623,12 @@ void __init print_memory_map(char *who)
}
 }
 
-static __init __always_inline void efi_limit_regions(unsigned long long size)
-{
-   unsigned long long current_addr = 0;
-   efi_memory_desc_t *md, *next_md;
-   void *p, *p1;
-   int i, j;
-
-   j = 0;
-   p1 = memmap.map;
-   for (p = p1, i = 0; p  memmap.map_end; p += memmap.desc_size, i++) {
-   md = p;
-   next_md = p1;
-   current_addr = md-phys_addr +
-   PFN_PHYS(md-num_pages);
-   if (is_available_memory(md)) {
-   if (md-phys_addr = size) continue;
-   memcpy(next_md, md, memmap.desc_size);
-   if (current_addr = size) {
-   next_md-num_pages

[PATCH -mm] x86 boot : export boot_params via sysfs

2007-12-11 Thread Huang, Ying
This patch exports the boot parameters via sysfs. This can be used for
debugging and kexec.

The files added are as follow:

/sys/kernel/boot_params/data: binary file for struct boot_params
/sys/kernel/boot_params/version : boot protocol version

This patch is based on 2.6.24-rc4-mm1 and has been tested on i386 and
x86_64 platforms.
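
For quick verification, a minimal user-space reader of the two files
(illustrative only):

	#include <stdio.h>

	int main(void)
	{
		char version[32] = "";
		FILE *f;

		/* boot protocol version, e.g. "0x0206" */
		f = fopen("/sys/kernel/boot_params/version", "r");
		if (f) {
			if (fgets(version, sizeof(version), f))
				printf("boot protocol version: %s", version);
			fclose(f);
		}

		/* raw struct boot_params; just report its size here */
		f = fopen("/sys/kernel/boot_params/data", "rb");
		if (f) {
			fseek(f, 0, SEEK_END);
			printf("boot_params data: %ld bytes\n", ftell(f));
			fclose(f);
		}
		return 0;
	}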

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/x86/kernel/Makefile_32 |1 
 arch/x86/kernel/Makefile_64 |1 
 arch/x86/kernel/ksysfs.c|   94 
 arch/x86/kernel/setup64.c   |2 
 arch/x86/kernel/setup_32.c  |2 
 5 files changed, 98 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -39,6 +39,7 @@ obj-$(CONFIG_X86_VSMP)+= vsmp_64.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit_64.o
 obj-$(CONFIG_EFI)  += efi.o efi_64.o efi_stub_64.o
+obj-$(CONFIG_SYSFS)+= ksysfs.o
 
 obj-$(CONFIG_MODULES)  += module_64.o
 obj-$(CONFIG_PCI)  += early-quirks.o
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,7 @@
 #include asm/sections.h
 #include asm/setup.h
 
-struct boot_params __initdata boot_params;
+struct boot_params boot_params;
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
--- /dev/null
+++ b/arch/x86/kernel/ksysfs.c
@@ -0,0 +1,94 @@
+/*
+ * arch/i386/ksysfs.c - architecture specific sysfs attributes in /sys/kernel
+ *
+ * Copyright (C) 2007, Intel Corp.
+ *  Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/mm.h>
+
+#include <asm/setup.h>
+
+static ssize_t boot_params_version_show(struct kobject *kobj,
+   struct kobj_attribute *attr, char *buf)
+{
+   return sprintf(buf, "0x%04x\n", boot_params.hdr.version);
+}
+
+static struct kobj_attribute boot_params_version_attr = {
+   .attr = {
+   .name = version,
+   .mode = S_IRUGO,
+   },
+   .show = boot_params_version_show,
+};
+
+static struct attribute *boot_params_attrs[] = {
+   boot_params_version_attr.attr,
+   NULL
+};
+
+static struct attribute_group boot_params_attr_group = {
+   .attrs = boot_params_attrs,
+};
+
+static ssize_t boot_params_data_read(struct kobject *kobj,
+struct bin_attribute *bin_attr,
+char *buf, loff_t off, size_t count)
+{
+   memcpy(buf, (void *)boot_params + off, count);
+   return count;
+}
+
+static struct bin_attribute boot_params_data_attr = {
+   .attr = {
+   .name = data,
+   .mode = S_IRUGO,
+   },
+   .read = boot_params_data_read,
+   .size = sizeof(boot_params),
+};
+
+static int __init boot_params_ksysfs_init(void)
+{
+   int error;
+   struct kobject *boot_params_kobj;
+
+   boot_params_kobj = kobject_create_and_register(boot_params,
+  kernel_kobj);
+   if (!boot_params_kobj) {
+   error = -ENOMEM;
+   goto err_return;
+   }
+   error = sysfs_create_group(boot_params_kobj,
+  boot_params_attr_group);
+   if (error)
+   goto err_boot_params_subsys_unregister;
+   error = sysfs_create_bin_file(boot_params_kobj,
+ boot_params_data_attr);
+   if (error)
+   goto err_boot_params_subsys_unregister;
+   return 0;
+err_boot_params_subsys_unregister:
+   kobject_unregister(boot_params_kobj);
+err_return:
+   return error;
+}
+
+static int __init arch_ksysfs_init(void)
+{
+   int error;
+
+   error = boot_params_ksysfs_init();
+
+   return error;
+}
+
+arch_initcall(arch_ksysfs_init);
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -44,6 +44,7 @@ obj-$(CONFIG_EARLY_PRINTK)+= early_prin
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_MGEODE_LX)+= geode_32.o mfgpt_32.o
+obj-$(CONFIG_SYSFS)+= ksysfs.o
 
 obj-$(CONFIG_VMI)  += vmi_32.o vmiclock_32.o
 obj-$(CONFIG_PARAVIRT) += paravirt_32.o
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,7 @@ unsigned long saved_videomode;
 
 static char __initdata command_line[COMMAND_LINE_SIZE];
 
-struct boot_params __initdata boot_params;
+struct boot_params boot_params;
 
 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;

Re: [PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump

2007-12-11 Thread Huang, Ying
On Tue, 2007-12-11 at 02:27 -0700, Eric W. Biederman wrote:
 Huang, Ying [EMAIL PROTECTED] writes:
 
  On Mon, 2007-12-10 at 19:25 -0700, Eric W. Biederman wrote:
  Huang, Ying [EMAIL PROTECTED] writes:
  [...]
/*
 * Do not allocate memory (or fail in any way) in machine_kexec().
 * We are past the point of no return, committed to rebooting now.
 */
   -NORET_TYPE void machine_kexec(struct kimage *image)
   +int machine_kexec_vcall(struct kimage *image, unsigned long *ret,
   + unsigned int argc, va_list args)
{
  
  Why do we need var arg support?
  Can't we do that with a shim we load from user space?
 
  If all parameters are provided in user space, the usage model may be as
  follow:
 
  - sys_kexec_load() /* with executable/data/parameters(A) loaded */
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with
  parameters(A)*/
  - /* jump back */
  - sys_kexec_load() /* with executable/data/parameters(B) loaded */
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with
  parameters(B)*/
  - /* jump back */
 
  That is, the kexec image should be re-loaded if the parameters are
  different, and there can be no state reserved in kexec image. This is OK
  for original kexec implementation, because there is no jumping back.
  But, for kexec with jumping back, another usage model may be useful too.
 
  - sys_kexec_load() /* with executable/data loaded */
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(A)) /* execute physical 
  mode
  code with parameters(A)*/
  - sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,parameters(B)) /* execute physical 
  mode
  code with parameters(B)*/
 
  This way the kexec image need not to be re-loaded, and the state of
  kexec image can be reserved across several invoking.
 
 Interesting.  We wind up preserving the code in between invocations.
 
 I don't know about your particular issue, but I can see that clearly
 we need a way to read values back from our target image.
 
 And if we can read everything back one way to proceed is to read
 everything out modify it and then write it back.
 
 Amending a kexec image that is already stored may also make sense.
 
 I'm not convinced that the var arg parameters make sense, but you
 added them because of a real need.
 
 The kexec function is split into two separate calls so that we can
 unmount the filesystem the kexec image comes from before actually
 doing the kexec.

Yes. Reading/modifying the loaded kexec image is another way to do the
necessary communication between the first kernel and the second kernel.
In fact, patch [4/4] of this series, with title:

[PATCH 4/4 -mm] kexec based hibernation -v7 : kimgcore

provides an ELF core file in /proc (/proc/kimgcore) to read the loaded
kexec image. A write function can be added easily.

But I think communication between the first kernel and the second kernel
via reading/modifying the loaded kernel image is not a very convenient
way. The usage model would be as follows:

- sys_kexec_load() /* with executable/data loaded */
- modify the loaded kexec image to set the parameters (A)
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with 
parameters(A)*/
- In the physical mode code, check parameters A and execute accordingly
- modify the loaded kexec image to set the parameters (B)
- sys_reboot(,,LINUX_REBOOT_CMD_KEXEC,) /* execute physical mode code with 
parameters(B)*/
- In the physical mode code, check parameters B and execute accordingly

There are some issues with this usage model:

- Some parameters in the kernel need to be exported (such as
kimage->head, to let the second kernel read the memory contents of the
backed-up memory).

- The physical mode code invoker (the first kernel) needs to know where
to write the parameters. A common protocol, or a protocol defined case
by case, has to be specified. For example, the memory just after the
entry point of the kexec image is a good candidate. But for the Linux
kernel there are two types of entry point, the jump back entry and
purgatory, so different protocols may have to be defined for these two
types of entry point.

- The user space of the second kernel needs a way to get the parameters,
so an interface (maybe a file in /proc or /sys) should be provided to
export them to user space.

So I think the current parameter passing mechanism may be simpler and
more convenient (defined in Documentation/i386/jump_back_protocol.txt in
the patch).

There is only one user of var args now. But I think it is simple to
implement and may be used by others.

 If extensive user space shutdown or startup is needed I will argue
 that doing the work in the sys_reboot call is the wrong place to
 do it.  Although if a jump back is happening we should not need
 much restart.

Now, user space is not shut down or started up across kexec/jump back;
the sys_reboot call is just used to trigger the kexec/jump back.
Maybe sys_reboot is not the right place to do this. Can you recommend
a more

[PATCH -mm -v2] x86 boot : export boot_params via sysfs

2007-12-12 Thread Huang, Ying
This patch exports the boot parameters via sysfs. This can be used for
debugging and kexec.

The files added are as follow:

/sys/kernel/boot_params/data: binary file for struct boot_params
/sys/kernel/boot_params/version : boot protocol version

This patch is based on 2.6.24-rc4-mm1 and has been tested on i386 and
x86_64 platforms.

This patch is based on the Peter Anvin's proposal.


v2:

- Add document in Document/ABI.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/ABI/testing/sysfs-kernel-boot_params |   14 +++
 arch/x86/kernel/Makefile_32|1 
 arch/x86/kernel/Makefile_64|1 
 arch/x86/kernel/ksysfs.c   |   89 +
 arch/x86/kernel/setup64.c  |2 
 arch/x86/kernel/setup_32.c |2 
 6 files changed, 107 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -39,6 +39,7 @@ obj-$(CONFIG_X86_VSMP)+= vsmp_64.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit_64.o
 obj-$(CONFIG_EFI)  += efi.o efi_64.o efi_stub_64.o
+obj-$(CONFIG_SYSFS)+= ksysfs.o
 
 obj-$(CONFIG_MODULES)  += module_64.o
 obj-$(CONFIG_PCI)  += early-quirks.o
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,7 @@
 #include asm/sections.h
 #include asm/setup.h
 
-struct boot_params __initdata boot_params;
+struct boot_params boot_params;
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
--- /dev/null
+++ b/arch/x86/kernel/ksysfs.c
@@ -0,0 +1,89 @@
+/*
+ * Architecture specific sysfs attributes in /sys/kernel
+ *
+ * Copyright (C) 2007, Intel Corp.
+ *  Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/mm.h>
+
+#include <asm/setup.h>
+
+static ssize_t boot_params_version_show(struct kobject *kobj,
+   struct kobj_attribute *attr, char *buf)
+{
+   return sprintf(buf, "0x%04x\n", boot_params.hdr.version);
+}
+
+static struct kobj_attribute boot_params_version_attr =
+   __ATTR(version, S_IRUGO, boot_params_version_show, NULL);
+
+static struct attribute *boot_params_attrs[] = {
+   boot_params_version_attr.attr,
+   NULL
+};
+
+static struct attribute_group boot_params_attr_group = {
+   .attrs = boot_params_attrs,
+};
+
+static ssize_t boot_params_data_read(struct kobject *kobj,
+				     struct bin_attribute *bin_attr,
+				     char *buf, loff_t off, size_t count)
+{
+	memcpy(buf, (void *)&boot_params + off, count);
+	return count;
+}
+
+static struct bin_attribute boot_params_data_attr = {
+	.attr = {
+		.name = "data",
+		.mode = S_IRUGO,
+	},
+	.read = boot_params_data_read,
+	.size = sizeof(boot_params),
+};
+
+static int __init boot_params_ksysfs_init(void)
+{
+   int error;
+   struct kobject *boot_params_kobj;
+
+	boot_params_kobj = kobject_create_and_register("boot_params",
+						       kernel_kobj);
+   if (!boot_params_kobj) {
+   error = -ENOMEM;
+   goto err_return;
+   }
+	error = sysfs_create_group(boot_params_kobj,
+				   &boot_params_attr_group);
+   if (error)
+   goto err_boot_params_subsys_unregister;
+	error = sysfs_create_bin_file(boot_params_kobj,
+				      &boot_params_data_attr);
+   if (error)
+   goto err_boot_params_subsys_unregister;
+   return 0;
+err_boot_params_subsys_unregister:
+   kobject_unregister(boot_params_kobj);
+err_return:
+   return error;
+}
+
+static int __init arch_ksysfs_init(void)
+{
+   int error;
+
+   error = boot_params_ksysfs_init();
+
+   return error;
+}
+
+arch_initcall(arch_ksysfs_init);
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -44,6 +44,7 @@ obj-$(CONFIG_EARLY_PRINTK)+= early_prin
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_MGEODE_LX)+= geode_32.o mfgpt_32.o
+obj-$(CONFIG_SYSFS)+= ksysfs.o
 
 obj-$(CONFIG_VMI)  += vmi_32.o vmiclock_32.o
 obj-$(CONFIG_PARAVIRT) += paravirt_32.o
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,7 @@ unsigned long saved_videomode;
 
 static char __initdata command_line[COMMAND_LINE_SIZE];
 
-struct boot_params __initdata boot_params;
+struct boot_params boot_params;
 
 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-boot_params
@@ -0,0

[PATCH -mm] i386 EFI runtime service support : fixes in sync with x86_64 support

2007-12-13 Thread Huang, Ying
This patch fixes several issues of i386 EFI basic runtime service
support according to fixes of x86_64 support.

- Delete efi_rt_lock because it is used during system early boot,
  before SMP is initialized.

- Change local_flush_tlb() to __flush_tlb_all() to flush global page
  mapping.

- Clean up includes.

- Revise Kconfig description.

- Enable noefi kernel parameter on i386.


This patch has been tested against 2.6.24-rc5-mm1 kernel on Intel
platforms with 32-bit EFI1.10 and UEFI2.0 firmware.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/kernel-parameters.txt   |2 ++
 Documentation/x86_64/boot-options.txt |4 
 arch/x86/Kconfig  |   19 ---
 arch/x86/kernel/efi.c |7 +++
 arch/x86/kernel/efi_32.c  |   25 +
 arch/x86/kernel/efi_64.c  |7 ---
 arch/x86/kernel/setup_32.c|6 +++---
 7 files changed, 25 insertions(+), 45 deletions(-)

--- a/arch/x86/kernel/efi_32.c
+++ b/arch/x86/kernel/efi_32.c
@@ -20,27 +20,15 @@
  */
 
 #include <linux/kernel.h>
-#include <linux/init.h>
-#include <linux/mm.h>
 #include <linux/types.h>
-#include <linux/time.h>
-#include <linux/spinlock.h>
-#include <linux/bootmem.h>
 #include <linux/ioport.h>
-#include <linux/module.h>
 #include <linux/efi.h>
-#include <linux/kexec.h>
 
-#include <asm/setup.h>
 #include <asm/io.h>
 #include <asm/page.h>
 #include <asm/pgtable.h>
-#include <asm/processor.h>
-#include <asm/desc.h>
 #include <asm/tlbflush.h>
 
-#define PFX		"EFI: "
-
 /*
  * To make EFI call EFI runtime service in physical addressing mode we need
  * prelog/epilog before/after the invocation to disable interrupt, to
@@ -49,16 +37,14 @@
  */
 
 static unsigned long efi_rt_eflags;
-static DEFINE_SPINLOCK(efi_rt_lock);
 static pgd_t efi_bak_pg_dir_pointer[2];
 
-void efi_call_phys_prelog(void) __acquires(efi_rt_lock)
+void efi_call_phys_prelog(void)
 {
unsigned long cr4;
unsigned long temp;
struct Xgt_desc_struct gdt_descr;
 
-	spin_lock(&efi_rt_lock);
local_irq_save(efi_rt_eflags);
 
/*
@@ -88,14 +74,14 @@ void efi_call_phys_prelog(void) __acquir
/*
 * After the lock is released, the original page table is restored.
 */
-   local_flush_tlb();
+   __flush_tlb_all();
 
gdt_descr.address = __pa(get_cpu_gdt_table(0));
gdt_descr.size = GDT_SIZE - 1;
 	load_gdt(&gdt_descr);
 }
 
-void efi_call_phys_epilog(void) __releases(efi_rt_lock)
+void efi_call_phys_epilog(void)
 {
unsigned long cr4;
struct Xgt_desc_struct gdt_descr;
@@ -119,10 +105,9 @@ void efi_call_phys_epilog(void) __releas
/*
 * After the lock is released, the original page table is restored.
 */
-   local_flush_tlb();
+   __flush_tlb_all();
 
local_irq_restore(efi_rt_eflags);
-	spin_unlock(&efi_rt_lock);
 }
 
 /*
@@ -135,7 +120,7 @@ void __init efi_map_memmap(void)
memmap.map = bt_ioremap((unsigned long) memmap.phys_map,
(memmap.nr_map * memmap.desc_size));
if (memmap.map == NULL)
-	printk(KERN_ERR PFX "Could not remap the EFI memmap!\n");
+	printk(KERN_ERR "Could not remap the EFI memmap!\n");
 
memmap.map_end = memmap.map + (memmap.nr_map * memmap.desc_size);
 }
--- a/arch/x86/kernel/efi.c
+++ b/arch/x86/kernel/efi.c
@@ -55,6 +55,13 @@ struct efi_memory_map memmap;
 struct efi efi_phys __initdata;
 static efi_system_table_t efi_systab __initdata;
 
+static int __init setup_noefi(char *arg)
+{
+   efi_enabled = 0;
+   return 0;
+}
+early_param("noefi", setup_noefi);
+
 static efi_status_t virt_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
 {
return efi_call_virt2(get_time, tm, tc);
--- a/arch/x86/kernel/efi_64.c
+++ b/arch/x86/kernel/efi_64.c
@@ -40,13 +40,6 @@
 static pgd_t save_pgd __initdata;
 static unsigned long efi_flags __initdata;
 
-static int __init setup_noefi(char *arg)
-{
-   efi_enabled = 0;
-   return 0;
-}
-early_param("noefi", setup_noefi);
-
 static void __init early_mapping_set_exec(unsigned long start,
  unsigned long end,
  int executable)
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -651,9 +651,6 @@ void __init setup_arch(char **cmdline_p)
printk(KERN_INFO BIOS-provided physical RAM map:\n);
print_memory_map(memory_setup());
 
-   if (efi_enabled)
-   efi_init();
-
copy_edd();
 
if (!boot_params.hdr.root_flags)
@@ -680,6 +677,9 @@ void __init setup_arch(char **cmdline_p)
strlcpy(command_line, boot_command_line, COMMAND_LINE_SIZE);
*cmdline_p = command_line;
 
+   if (efi_enabled)
+   efi_init();
+
max_low_pfn = setup_memory();
 
 #ifdef CONFIG_VMI
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -1001,21 +1001,18 @@ config MTRR

[PATCH -mm -v3] x86 boot : export boot_params via sysfs

2007-12-14 Thread Huang, Ying
This patch exports the boot parameters via sysfs. This can be used for
debugging and kexec.

The files added are as follow:

/sys/kernel/boot_params/data: binary file for struct boot_params
/sys/kernel/boot_params/version : boot protocol version

This patch is based on 2.6.24-rc5-mm1 and has been tested on i386 and
x86_64 platforms.

This patch is based on Peter Anvin's proposal.


v3:

- Use updated API: kobject_create_and_add.

v2:

- Add documentation in Documentation/ABI.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 Documentation/ABI/testing/sysfs-kernel-boot_params |   14 +++
 arch/x86/kernel/Makefile_32|1 
 arch/x86/kernel/Makefile_64|1 
 arch/x86/kernel/ksysfs.c   |   88 +
 arch/x86/kernel/setup64.c  |2 
 arch/x86/kernel/setup_32.c |2 
 6 files changed, 106 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/Makefile_64
+++ b/arch/x86/kernel/Makefile_64
@@ -40,6 +40,7 @@ obj-$(CONFIG_X86_VSMP)+= vsmp_64.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit_64.o
 obj-$(CONFIG_EFI)  += efi.o efi_64.o efi_stub_64.o
+obj-$(CONFIG_SYSFS)+= ksysfs.o
 
 obj-$(CONFIG_MODULES)  += module_64.o
 obj-$(CONFIG_PCI)  += early-quirks.o
--- a/arch/x86/kernel/setup64.c
+++ b/arch/x86/kernel/setup64.c
@@ -24,7 +24,7 @@
 #include asm/sections.h
 #include asm/setup.h
 
-struct boot_params __initdata boot_params;
+struct boot_params boot_params;
 
 cpumask_t cpu_initialized __cpuinitdata = CPU_MASK_NONE;
 
--- /dev/null
+++ b/arch/x86/kernel/ksysfs.c
@@ -0,0 +1,88 @@
+/*
+ * Architecture specific sysfs attributes in /sys/kernel
+ *
+ * Copyright (C) 2007, Intel Corp.
+ *  Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2
+ */
+
+#include <linux/kobject.h>
+#include <linux/string.h>
+#include <linux/sysfs.h>
+#include <linux/init.h>
+#include <linux/stat.h>
+#include <linux/mm.h>
+
+#include <asm/setup.h>
+
+static ssize_t boot_params_version_show(struct kobject *kobj,
+					struct kobj_attribute *attr, char *buf)
+{
+	return sprintf(buf, "0x%04x\n", boot_params.hdr.version);
+}
+
+static struct kobj_attribute boot_params_version_attr =
+	__ATTR(version, S_IRUGO, boot_params_version_show, NULL);
+
+static struct attribute *boot_params_attrs[] = {
+	&boot_params_version_attr.attr,
+	NULL
+};
+
+static struct attribute_group boot_params_attr_group = {
+   .attrs = boot_params_attrs,
+};
+
+static ssize_t boot_params_data_read(struct kobject *kobj,
+				     struct bin_attribute *bin_attr,
+				     char *buf, loff_t off, size_t count)
+{
+	memcpy(buf, (void *)&boot_params + off, count);
+	return count;
+}
+
+static struct bin_attribute boot_params_data_attr = {
+	.attr = {
+		.name = "data",
+		.mode = S_IRUGO,
+	},
+	.read = boot_params_data_read,
+	.size = sizeof(boot_params),
+};
+
+static int __init boot_params_ksysfs_init(void)
+{
+   int error;
+   struct kobject *boot_params_kobj;
+
+	boot_params_kobj = kobject_create_and_add("boot_params", kernel_kobj);
+   if (!boot_params_kobj) {
+   error = -ENOMEM;
+   goto err_return;
+   }
+	error = sysfs_create_group(boot_params_kobj,
+				   &boot_params_attr_group);
+   if (error)
+   goto err_boot_params_subsys_unregister;
+	error = sysfs_create_bin_file(boot_params_kobj,
+				      &boot_params_data_attr);
+   if (error)
+   goto err_boot_params_subsys_unregister;
+   return 0;
+err_boot_params_subsys_unregister:
+   kobject_unregister(boot_params_kobj);
+err_return:
+   return error;
+}
+
+static int __init arch_ksysfs_init(void)
+{
+   int error;
+
+   error = boot_params_ksysfs_init();
+
+   return error;
+}
+
+arch_initcall(arch_ksysfs_init);
--- a/arch/x86/kernel/Makefile_32
+++ b/arch/x86/kernel/Makefile_32
@@ -45,6 +45,7 @@ obj-$(CONFIG_EARLY_PRINTK)+= early_prin
 obj-$(CONFIG_HPET_TIMER)   += hpet.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_MGEODE_LX)+= geode_32.o mfgpt_32.o
+obj-$(CONFIG_SYSFS)+= ksysfs.o
 
 obj-$(CONFIG_VMI)  += vmi_32.o vmiclock_32.o
 obj-$(CONFIG_PARAVIRT) += paravirt_32.o
--- a/arch/x86/kernel/setup_32.c
+++ b/arch/x86/kernel/setup_32.c
@@ -194,7 +194,7 @@ unsigned long saved_videomode;
 
 static char __initdata command_line[COMMAND_LINE_SIZE];
 
-struct boot_params __initdata boot_params;
+struct boot_params boot_params;
 
 #if defined(CONFIG_EDD) || defined(CONFIG_EDD_MODULE)
 struct edd edd;
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-kernel-boot_params
@@ -0,0 +1,14

Re: [PATCH 0/3 -mm] kexec jump -v8

2007-12-26 Thread Huang, Ying
On Wed, 2007-12-26 at 20:57 -0500, Vivek Goyal wrote:
[...]
  9. Now, you are in the original kernel again. You can read/write the
 memory image of kexeced kernel via /proc/kimgcore.
  
 
 Why do we need two interfaces, /proc/vmcore and /proc/kimgcore? Can't
 we have just one say /proc/vmcore. Irrespective of what kernel you are
 in /proc/vmcore gives you the access to the memory of kernel which was
 previously booted.

In theory we can kexec another kernel even in a kexeced kernel, that is,
in kernel A kexec kernel B, and in kernel B kexec another kernel C. In
this situation, both /proc/vmcore and /proc/kimgcore have valid contents.
So I think it may be better to keep the two interfaces.

In fact, the current kexec jump implementation uses a dummy jump back
helper image in the kexeced kernel to jump back to the original kernel.
The jump back helper image has no PT_LOAD segment; it is used to provide
a struct kimage (including the control page and swap page) and an entry
point to jump back.

Best Regards,
Huang Ying


Re: [PATCH 0/3 -mm] kexec jump -v8

2007-12-27 Thread Huang, Ying
On Thu, 2007-12-27 at 13:12 -0500, Vivek Goyal wrote:
 On Thu, Dec 27, 2007 at 10:33:13AM +0800, Huang, Ying wrote:
  On Wed, 2007-12-26 at 20:57 -0500, Vivek Goyal wrote:
  [...]
9. Now, you are in the original kernel again. You can read/write the
   memory image of kexeced kernel via /proc/kimgcore.

   
   Why do we need two interfaces, /proc/vmcore and /proc/kimgcore? Can't
   we have just one say /proc/vmcore. Irrespective of what kernel you are
   in /proc/vmcore gives you the access to the memory of kernel which was
   previously booted.
  
  In theory we can kexec another kernel even in a kexeced kernel, that is,
  in kernel A kexec kernel B, and in kernel B kexec another kernel C. In
  this situation, both /proc/vmcore and /proc/kimgcore has valid contents.
  So I think, it may be better to keep two interfaces.
  
 
 In those situations I think only one interface is better. For example, 
 above will be broken if somebody kexec 4 kernels.
 
 A--B---C---D

I don't think the two interfaces will be broken if somebody kexecs 4
kernels. For example, when kexecing D from C, /proc/vmcore holds the
contents of B and /proc/kimgcore holds the contents of D. To jump back
from C to B, D is unloaded and a jump back helper image is loaded.

 I think better option might be if it is stack like situation. A kernel
 shows you only the previous kernel's memory contents through /proc/vmcore
 interface. So If I am in kernel D, I see only kernel C's memory image.
 To see kernel B's memory image, one shall have to go back to kernel C.

Maybe it is not sufficient to show only the previous kernel's memory
contents. In kernel C, you may need to access the memory image of
kernel B and the memory image of kernel D.

That is, /proc/vmcore is the memory image of the previous kernel,
and /proc/kimgcore is the memory image of the next kernel.

Best Regards,
Huang Ying



Re: [PATCH 0/3 -mm] kexec jump -v8

2007-12-28 Thread Huang, Ying
On Fri, 2007-12-28 at 16:33 -0500, Vivek Goyal wrote:
 On Fri, Dec 21, 2007 at 03:33:19PM +0800, Huang, Ying wrote:
  This patchset provides an enhancement to kexec/kdump. It implements
  the following features:
  
  - Backup/restore memory used both by the original kernel and the
kexeced kernel.
  
  - Jumping between the original kernel and the kexeced kernel.
  
  - Read/write memory image of the kexeced kernel in the original kernel
and write memory image of the original kernel in the kexeced
kernel. This can be used as a communication method between the
kexeced kernel and the original kernel.
  
  
  The features of this patchset can be used as follow:
  
  - Kernel/system debug through making system snapshot. You can make
system snapshot, jump back, do some thing and make another system
snapshot.
  
 
 How do you differentiate between whether a core is resumable or not.
 IOW, how do you know the generated /proc/vmcore has been generated after
 a real crash hence can't be resumed (using krestore) or it has been 
 generated because of hibernation/debug purposes and can be resumed?
 
 I think you might have to add an extra ELF NOTE to vmcore which can help
 decide whether kernel memory snapshot is resumable or not.

The current solution is as follows:

1. The original kernel will set %edi to the jump back entry if resumable
and set %edi to 0 if not.

2. The purgatory of the loaded kernel will check %edi; if it is not zero,
the string "jump_back_entry=<jump back entry>" will be appended to the
kernel command line.

3. In the kexeced kernel, if there is "jump_back_entry=<jump back entry>"
in /proc/cmdline, the previous kernel is resumable; otherwise it is not.
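
As a rough illustration of step 3 (not taken from the patchset; the exact
parameter name differs between versions, e.g. kexec_jump_back_entry in
the shell example later in this thread), user space in the kexeced kernel
could locate the entry like this:

#include <stdio.h>
#include <string.h>

/* Illustration only: find the jump back entry (if any) on the kernel
 * command line of the kexeced kernel. */
int main(void)
{
	char cmdline[4096], *p;
	FILE *f = fopen("/proc/cmdline", "r");

	if (!f || !fgets(cmdline, sizeof(cmdline), f))
		return 1;
	fclose(f);

	p = strstr(cmdline, "jump_back_entry=");
	if (!p) {
		printf("previous kernel is not resumable\n");
		return 0;
	}
	p += strlen("jump_back_entry=");
	printf("jump back entry: %.*s\n", (int)strcspn(p, " \n"), p);
	return 0;
}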

As for the ELF NOTE: in fact, an ELF NOTE does not work for a resumable
kernel, because the contents of the source pages and destination pages
are swapped during kexec, and the kernel accesses the destination pages
directly when parsing the ELF NOTE. All memory that is swapped needs to
be accessed via the backup pages map (image->head). I think this
information can be exchanged between the two kernels via the kernel
command line or /proc/kimgcore.

 [..]
  2. Build an initramfs image contains kexec-tool, or download the
 pre-built initramfs image, called rootfs.gz in following text.
  
  3. Boot kernel compiled in step 1.
  
  4. Load kernel compiled in step 1 with /sbin/kexec. If You want to use
 krestore tool, the --elf64-core-headers should be specified in
 command line of /sbin/kexec. The shell command line can be as
 follow:
  
 /sbin/kexec --load-jump-back /boot/bzImage --mem-min=0x10
   --mem-max=0xff --elf64-core-headers --initrd=rootfs.gz
  
 
 How about a different name like --load-preserve-context. This will
 just mean that kexec need to preserve the context while kexeing to 
 image being loaded. Combination of --load-jump-back and
 --load-jump-back-helper is becoming little confusing.

Yes, this is better. I will change it.

  5. Boot the kexeced kernel with following shell command line:
  
 /sbin/kexec -e
  
  6. The kexeced kernel will boot as normal kexec. In kexeced kernel the
 memory image of original kernel can read via /proc/vmcore or
 /dev/oldmem, and can be written via /dev/oldmem. You can
 save/restore/modify it as you want to.
  
 
 Restoring a hibernated image using /dev/oldmem should be easy and I 
 think one should be able to launch it back using --load-jump-back-helper.

Yes. I think so too. The current implementation of krestore restores
the hibernated image using /dev/oldmem, and the hibernated image can be
launched using --load-jump-back-helper.

 How do you restore already kexeced kernel? For example if I got two
 kernels A and B. A is the one which will hibernate and B will be used
 to store the hibernated kernel. I think as per the procedure one needs
 to first boot into kernel B and then jump back to kernel A. This will
 make image of B available in /proc/kimgcore. If I save /proc/kimgcore
 to disk and want to jump back to it, how do I do it? I guess I need
 to kexec again using --load-jump-back and not restore using krestore?

The image of B is made as you said. And it can be restored as follows:

/sbin/kexec -l --args-none --flags=0x2 kimgcore
/sbin/kexec -e

That is, the image of B is loaded as an ordinary ELF file. An option
to /sbin/kexec named --flags is added to specify the
KEXEC_PRESERVE_CONTEXT flag for sys_kexec_load. This has been tested.

  7. Prepare jumping back from kexeced kernel with following shell
 command lines:
  
 jump_back_entry=`cat /proc/cmdline | tr ' ' '\n' | grep 
  kexec_jump_back_entry | cut -d '='`
 /sbin/kexec --load-jump-back-helper=$jump_back_entry
  
 
 How about decoupling entry point from --load-jump-back-helper. We can
 introduce a separate option for entry point. Something like.
 
 kexec --load-jump-back-helper --entry=$jump_back_entry
 
 May be we can generalize the --entry so that a user can override the 
 entry point of the normal

[PATCH -mm] EFI : Split EFI tables parsing code from EFI runtime service support code

2007-12-28 Thread Huang, Ying
This patch splits the EFI tables parsing code from the EFI runtime
service support code. This makes ACPI support and DMI support on EFI
platforms not depend on EFI runtime service support. Both EFI32 and
EFI64 tables parsing functions are provided on i386 and x86_64. This
makes it possible to use EFI information in an i386 kernel on x86_64
with EFI64 firmware or in an x86_64 kernel on x86_64 with EFI32 firmware.

This patch is based on 2.6.24-rc5-mm1 and has been tested for
following combinations:

i386   kernel on EFI 32
i386   kernel on EFI 64
x86_64 kernel on EFI 32
x86_64 kernel on EFI 64
ia64   kernel on EFI 64

Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/ia64/kernel/acpi.c  |6 -
 arch/ia64/kernel/efi.c   |   30 
 arch/ia64/kernel/setup.c |2 
 arch/ia64/sn/kernel/setup.c  |4 -
 arch/x86/Kconfig |4 -
 arch/x86/kernel/Makefile_32  |3 
 arch/x86/kernel/Makefile_64  |2 
 arch/x86/kernel/efi.c|  111 +++--
 arch/x86/kernel/efi_tables.c |  144 +++
 arch/x86/kernel/setup_32.c   |9 ++
 arch/x86/kernel/setup_64.c   |9 ++
 drivers/acpi/osl.c   |   11 +--
 drivers/firmware/dmi_scan.c  |7 +-
 drivers/firmware/efivars.c   |   53 ---
 drivers/firmware/pcdp.c  |6 -
 include/asm-ia64/setup.h |5 +
 include/asm-ia64/sn/sn_sal.h |2 
 include/asm-x86/efi.h|7 ++
 include/asm-x86/setup.h  |9 ++
 include/linux/efi.h  |   64 ---
 20 files changed, 331 insertions(+), 157 deletions(-)

--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -212,6 +212,16 @@ typedef struct {
unsigned long table;
 } efi_config_table_t;
 
+struct efi_config_table64 {
+   efi_guid_t guid;
+   u64 table;
+};
+
+struct efi_config_table32 {
+   efi_guid_t guid;
+   u32 table;
+};
+
 #define EFI_SYSTEM_TABLE_SIGNATURE ((u64)0x5453595320494249ULL)
 
 typedef struct {
@@ -230,6 +240,39 @@ typedef struct {
unsigned long tables;
 } efi_system_table_t;
 
+struct efi_system_table64 {
+   efi_table_hdr_t hdr;
+   u64 fw_vendor;
+   u32 fw_revision;
+   u32 _pad1;
+   u64 con_in_handle;
+   u64 con_in;
+   u64 con_out_handle;
+   u64 con_out;
+   u64 stderr_handle;
+   u64 stderr;
+   u64 runtime;
+   u64 boottime;
+   u64 nr_tables;
+   u64 tables;
+};
+
+struct efi_system_table32 {
+   efi_table_hdr_t hdr;
+   u32 fw_vendor;
+   u32 fw_revision;
+   u32 con_in_handle;
+   u32 con_in;
+   u32 con_out_handle;
+   u32 con_out;
+   u32 stderr_handle;
+   u32 stderr;
+   u32 runtime;
+   u32 boottime;
+   u32 nr_tables;
+   u32 tables;
+};
+
 struct efi_memory_map {
void *phys_map;
void *map;
@@ -246,14 +289,6 @@ struct efi_memory_map {
  */
 extern struct efi {
efi_system_table_t *systab; /* EFI system table */
-   unsigned long mps;  /* MPS table */
-   unsigned long acpi; /* ACPI table  (IA64 ext 0.71) */
-   unsigned long acpi20;   /* ACPI table  (ACPI 2.0) */
-   unsigned long smbios;   /* SM BIOS table */
-   unsigned long sal_systab;   /* SAL system table */
-   unsigned long boot_info;/* boot info table */
-   unsigned long hcdp; /* HCDP table */
-   unsigned long uga;  /* UGA table */
efi_get_time_t *get_time;
efi_set_time_t *set_time;
efi_get_wakeup_time_t *get_wakeup_time;
@@ -266,6 +301,19 @@ extern struct efi {
efi_set_virtual_address_map_t *set_virtual_address_map;
 } efi;
 
+struct efi_tables {
+   unsigned long mps;  /* MPS table */
+   unsigned long acpi; /* ACPI table  (IA64 ext 0.71) */
+   unsigned long acpi20;   /* ACPI table  (ACPI 2.0) */
+   unsigned long smbios;   /* SM BIOS table */
+   unsigned long sal_systab;   /* SAL system table */
+   unsigned long boot_info;/* boot info table */
+   unsigned long hcdp; /* HCDP table */
+   unsigned long uga;  /* UGA table */
+};
+
+extern struct efi_tables efi_tables;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
--- /dev/null
+++ b/arch/x86/kernel/efi_tables.c
@@ -0,0 +1,144 @@
+/*
+ * EFI tables parsing functions
+ *
+ * Copyright (C) 2007 Intel Co.
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/efi.h>
+#include <linux/io.h>
+
+#include <asm/setup.h>
+#include <asm/efi.h>
+
+struct efi_tables efi_tables;
+EXPORT_SYMBOL(efi_tables);
+
+#define EFI_TABLE_PARSE(bt)\
+static void __init efi_tables_parse ## bt(void

Re: [PATCH -mm] EFI : Split EFI tables parsing code from EFI runtime service support code

2008-01-02 Thread Huang, Ying
On Sun, 2007-12-30 at 15:28 +0100, Ingo Molnar wrote:
 * Huang, Ying [EMAIL PROTECTED] wrote:
 
  +struct efi_tables efi_tables;
  +EXPORT_SYMBOL(efi_tables);
 
  +enum bios_type bios_type = BIOS_LEGACY;
  +EXPORT_SYMBOL(bios_type);
 
 please make all the new exports EXPORT_SYMBOL_GPL().

OK, I will change it.

Best Regards,
Huang Ying


Re: [PATCH 0/3 -mm] kexec jump -v8

2008-01-03 Thread Huang, Ying
 be added to the setup page.
4. Before one kernel jumps to another kernel, the parameters are prepared
by the current kernel.
5. One kernel can check the parameters of another kernel by
reading /proc/vmcore or /proc/kimgcore.
6. When the memory image is saved in a file, the parameters of the
hibernated kernel can be checked by reading memory location
jump_back_entry + 0x800.

You can check the details of this mechanism in my previous patch with
title:

[PATCH 1/4 -mm] kexec based hibernation -v7 : kexec jump

The main issue of this mechanism is that: it is a kernel-to-kernel
communication mechanism, while Eric Biederman thinks we should use only
user-to-user communication mechanism. And he is not persuaded now.

Because kernel operations such as re-initialize/re-construct
the /proc/vmcore, etc are needed for kexec jump or resuming. I think a
kernel-to-kernel mechanism may be needed. But I don't know if Eric
Biederman will agree with this.

Best Regards,
Huang Ying



[PATCH -mm -v2] EFI : Split EFI tables parsing code from EFI runtime service support code

2008-01-03 Thread Huang, Ying
This patch splits the EFI tables parsing code from the EFI runtime
service support code. This makes ACPI support and DMI support on EFI
platforms not depend on EFI runtime service support. Both EFI32 and
EFI64 tables parsing functions are provided on i386 and x86_64. This
makes it possible to use EFI information in an i386 kernel on x86_64
with EFI64 firmware or in an x86_64 kernel on x86_64 with EFI32 firmware.

This patch is based on 2.6.24-rc5-mm1 and has been tested for
following combinations:

i386   kernel on EFI 32
i386   kernel on EFI 64
x86_64 kernel on EFI 32
x86_64 kernel on EFI 64
ia64   kernel on EFI 64


ChangeLog

v2:

- Change EXPORT_SYMBOL to EXPORT_SYMBOL_GPL.


Signed-off-by: Huang Ying [EMAIL PROTECTED]

---
 arch/ia64/kernel/acpi.c  |6 -
 arch/ia64/kernel/efi.c   |   30 
 arch/ia64/kernel/setup.c |2 
 arch/ia64/sn/kernel/setup.c  |4 -
 arch/x86/Kconfig |4 -
 arch/x86/kernel/Makefile_32  |3 
 arch/x86/kernel/Makefile_64  |2 
 arch/x86/kernel/efi.c|  115 --
 arch/x86/kernel/efi_tables.c |  144 +++
 arch/x86/kernel/setup_32.c   |9 ++
 arch/x86/kernel/setup_64.c   |9 ++
 drivers/acpi/osl.c   |   11 +--
 drivers/firmware/dmi_scan.c  |7 +-
 drivers/firmware/efivars.c   |   53 ---
 drivers/firmware/pcdp.c  |6 -
 include/asm-ia64/setup.h |5 +
 include/asm-ia64/sn/sn_sal.h |2 
 include/asm-x86/efi.h|7 ++
 include/asm-x86/setup.h  |9 ++
 include/linux/efi.h  |   64 ---
 20 files changed, 333 insertions(+), 159 deletions(-)

--- a/include/linux/efi.h
+++ b/include/linux/efi.h
@@ -212,6 +212,16 @@ typedef struct {
unsigned long table;
 } efi_config_table_t;
 
+struct efi_config_table64 {
+   efi_guid_t guid;
+   u64 table;
+};
+
+struct efi_config_table32 {
+   efi_guid_t guid;
+   u32 table;
+};
+
 #define EFI_SYSTEM_TABLE_SIGNATURE ((u64)0x5453595320494249ULL)
 
 typedef struct {
@@ -230,6 +240,39 @@ typedef struct {
unsigned long tables;
 } efi_system_table_t;
 
+struct efi_system_table64 {
+   efi_table_hdr_t hdr;
+   u64 fw_vendor;
+   u32 fw_revision;
+   u32 _pad1;
+   u64 con_in_handle;
+   u64 con_in;
+   u64 con_out_handle;
+   u64 con_out;
+   u64 stderr_handle;
+   u64 stderr;
+   u64 runtime;
+   u64 boottime;
+   u64 nr_tables;
+   u64 tables;
+};
+
+struct efi_system_table32 {
+   efi_table_hdr_t hdr;
+   u32 fw_vendor;
+   u32 fw_revision;
+   u32 con_in_handle;
+   u32 con_in;
+   u32 con_out_handle;
+   u32 con_out;
+   u32 stderr_handle;
+   u32 stderr;
+   u32 runtime;
+   u32 boottime;
+   u32 nr_tables;
+   u32 tables;
+};
+
 struct efi_memory_map {
void *phys_map;
void *map;
@@ -246,14 +289,6 @@ struct efi_memory_map {
  */
 extern struct efi {
efi_system_table_t *systab; /* EFI system table */
-   unsigned long mps;  /* MPS table */
-   unsigned long acpi; /* ACPI table  (IA64 ext 0.71) */
-   unsigned long acpi20;   /* ACPI table  (ACPI 2.0) */
-   unsigned long smbios;   /* SM BIOS table */
-   unsigned long sal_systab;   /* SAL system table */
-   unsigned long boot_info;/* boot info table */
-   unsigned long hcdp; /* HCDP table */
-   unsigned long uga;  /* UGA table */
efi_get_time_t *get_time;
efi_set_time_t *set_time;
efi_get_wakeup_time_t *get_wakeup_time;
@@ -266,6 +301,19 @@ extern struct efi {
efi_set_virtual_address_map_t *set_virtual_address_map;
 } efi;
 
+struct efi_tables {
+   unsigned long mps;  /* MPS table */
+   unsigned long acpi; /* ACPI table  (IA64 ext 0.71) */
+   unsigned long acpi20;   /* ACPI table  (ACPI 2.0) */
+   unsigned long smbios;   /* SM BIOS table */
+   unsigned long sal_systab;   /* SAL system table */
+   unsigned long boot_info;/* boot info table */
+   unsigned long hcdp; /* HCDP table */
+   unsigned long uga;  /* UGA table */
+};
+
+extern struct efi_tables efi_tables;
+
 static inline int
 efi_guidcmp (efi_guid_t left, efi_guid_t right)
 {
--- /dev/null
+++ b/arch/x86/kernel/efi_tables.c
@@ -0,0 +1,144 @@
+/*
+ * EFI tables parsing functions
+ *
+ * Copyright (C) 2007 Intel Co.
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * This file is released under the GPLv2.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/efi.h>
+#include <linux/io.h>
+
+#include <asm/setup.h>
+#include <asm/efi.h>
+
+struct efi_tables efi_tables;
+EXPORT_SYMBOL_GPL(efi_tables);
+
+#define EFI_TABLE_PARSE(bt)\
+static void __init efi_tables_parse ## bt(void

[PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
This patch adds basic runtime services support for EFI x86_64
systems. The main file of the patch is the addition of efi.c for
x86_64. This file is modeled after the EFI IA32 avatar. EFI runtime
services initialization is implemented in efi.c. Some x86_64
specifics are worth noting here. On x86_64, parameters passed to UEFI
firmware services need to follow the UEFI calling convention. For this
purpose, a set of functions named lin2winx (where x is the number of
parameters) is implemented. EFI function calls are wrapped before
calling the firmware service.
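
As a sketch only (the exact call sites live in efi_64.c of this patch and
may differ in detail), a two-argument EFI service would typically be
invoked through such a wrapper like this:

/* Illustration only -- not the exact code in efi_64.c.  A two-argument
 * EFI runtime service is called through the lin2win2 wrapper, which
 * re-marshals the SysV arguments into the UEFI (Microsoft x64) calling
 * convention before transferring control to the firmware. */
static efi_status_t example_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
{
	return lin2win2((void *)efi.get_time, (u64)tm, (u64)tc);
}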

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/x86/kernel/Makefile_64   |1 
 arch/x86/kernel/efi_64.c  |  593 ++
 arch/x86/kernel/efi_callwrap_64.S |   69 
 arch/x86/kernel/setup_64.c|   15 
 arch/x86_64/Kconfig   |   11 
 include/asm-x86/bootparam.h   |5 
 include/asm-x86/efi_64.h  |8 
 include/asm-x86/eficallwrap_64.h  |   33 ++
 include/asm-x86/fixmap_64.h   |3 
 9 files changed, 735 insertions(+), 3 deletions(-)

Index: linux-2.6.24-rc1/include/asm-x86/eficallwrap_64.h
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.24-rc1/include/asm-x86/eficallwrap_64.h   2007-10-25 
13:58:18.0 +0800
@@ -0,0 +1,33 @@
+/*
+ *  Copyright (C) 2007 Intel Corp
+ * Bibo Mao [EMAIL PROTECTED]
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ *  Function calling ABI conversion from SYSV to Windows for x86_64
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ *  GNU General Public License for more details.
+ *
+ */
+
+#ifndef __ASM_X86_64_EFICALLWRAP_H
+#define __ASM_X86_64_EFICALLWRAP_H
+
+extern efi_status_t lin2win0(void *fp);
+extern efi_status_t lin2win1(void *fp, u64 arg1);
+extern efi_status_t lin2win2(void *fp, u64 arg1, u64 arg2);
+extern efi_status_t lin2win3(void *fp, u64 arg1, u64 arg2, u64 arg3);
+extern efi_status_t lin2win4(void *fp, u64 arg1, u64 arg2, u64 arg3, u64 arg4);
+extern efi_status_t lin2win5(void *fp, u64 arg1, u64 arg2, u64 arg3,
+u64 arg4, u64 arg5);
+extern efi_status_t lin2win6(void *fp, u64 arg1, u64 arg2, u64 arg3,
+u64 arg4, u64 arg5, u64 arg6);
+
+#endif
Index: linux-2.6.24-rc1/arch/x86/kernel/efi_64.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.24-rc1/arch/x86/kernel/efi_64.c   2007-10-25 14:51:41.0 
+0800
@@ -0,0 +1,593 @@
+/*
+ * Extensible Firmware Interface
+ *
+ * Based on Extensible Firmware Interface Specification version 1.0
+ *
+ * Copyright (C) 1999 VA Linux Systems
+ * Copyright (C) 1999 Walt Drummond [EMAIL PROTECTED]
+ * Copyright (C) 1999-2002 Hewlett-Packard Co.
+ * David Mosberger-Tang [EMAIL PROTECTED]
+ * Stephane Eranian [EMAIL PROTECTED]
+ * Copyright (C) 2005-2008 Intel Co.
+ * Fenghua Yu [EMAIL PROTECTED]
+ * Bibo Mao [EMAIL PROTECTED]
+ * Chandramouli Narayanan [EMAIL PROTECTED]
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * Code to convert EFI to E820 map has been implemented in elilo bootloader
+ * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table
+ * is setup appropriately for EFI runtime code.
+ * - mouli 06/14/2007.
+ *
+ * All EFI Runtime Services are not implemented yet as EFI only
+ * supports physical mode addressing on SoftSDV. This is to be fixed
+ * in a future version.  --drummond 1999-07-20
+ *
+ * Implemented EFI runtime services and virtual mode calls.  --davidm
+ *
+ * Goutham Rao: [EMAIL PROTECTED]
+ * Skip non-WB memory and ignore empty memory ranges.
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/time.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/efi.h>
+#include <linux/uaccess.h>
+#include <linux/io.h>
+#include <linux/reboot.h>
+
+#include <asm/setup.h>
+#include <asm/bootparam.h>
+#include <asm/page.h>
+#include <asm/e820.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
+#include <asm/proto.h>
+#include <asm/eficallwrap_64.h>
+#include <asm/efi_64.h>
+#include <asm/time_64.h>
+
+int efi_enabled;
+EXPORT_SYMBOL(efi_enabled);
+
+struct efi efi;
+EXPORT_SYMBOL(efi);
+
+struct efi_memory_map memmap;
+
+struct efi efi_phys __initdata;
+static efi_system_table_t efi_systab __initdata

[PATCH 2/3 -v4] x86_64 EFI runtime service support: EFI runtime services

2007-10-25 Thread Huang, Ying
This patch adds support for several EFI runtime services on EFI
x86_64 systems.

EFI support for emergency_restart and the RTC clock is added. The EFI
based implementation and the legacy BIOS/CMOS based implementation are
put in separate functions and can be chosen via kernel boot options.

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/x86/kernel/reboot_64.c |   19 +-
 arch/x86/kernel/time_64.c   |   48 
 include/asm-x86/emergency-restart.h |8 ++
 include/asm-x86/time_64.h   |7 +
 4 files changed, 60 insertions(+), 22 deletions(-)

Index: linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c
===
--- linux-2.6.24-rc1.orig/arch/x86/kernel/reboot_64.c   2007-10-25 
11:25:29.0 +0800
+++ linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c2007-10-25 
11:25:38.0 +0800
@@ -9,6 +9,7 @@
 #include <linux/pm.h>
 #include <linux/kdebug.h>
 #include <linux/sched.h>
+#include <linux/efi.h>
 #include <asm/io.h>
 #include <asm/delay.h>
 #include <asm/desc.h>
@@ -26,18 +27,16 @@
 EXPORT_SYMBOL(pm_power_off);
 
 static long no_idt[3];
-static enum { 
-   BOOT_TRIPLE = 't',
-   BOOT_KBD = 'k'
-} reboot_type = BOOT_KBD;
+enum reboot_type reboot_type = BOOT_KBD;
 static int reboot_mode = 0;
 int reboot_force;
 
-/* reboot=t[riple] | k[bd] [, [w]arm | [c]old]
+/* reboot=t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
warm   Don't set the cold reboot flag
cold   Set the cold reboot flag
triple Force a triple fault (init)
kbdUse the keyboard controller. cold reset (default)
+   efiUse efi reset_system runtime service
force  Avoid anything that could hang.
  */ 
 static int __init reboot_setup(char *str)
@@ -55,6 +54,7 @@
case 't':
case 'b':
case 'k':
+   case 'e':
reboot_type = *str;
break;
case 'f':
@@ -142,7 +142,14 @@
 
reboot_type = BOOT_KBD;
break;
-   }  
+
+   case BOOT_EFI:
+   if (efi_enabled)
+			efi.reset_system(reboot_mode ? EFI_RESET_WARM : EFI_RESET_COLD,
+					 EFI_SUCCESS, 0, NULL);
+   reboot_type = BOOT_KBD;
+   break;
+   }
}  
 }
 
Index: linux-2.6.24-rc1/arch/x86/kernel/time_64.c
===
--- linux-2.6.24-rc1.orig/arch/x86/kernel/time_64.c 2007-10-25 
11:25:29.0 +0800
+++ linux-2.6.24-rc1/arch/x86/kernel/time_64.c  2007-10-25 11:25:38.0 
+0800
@@ -25,6 +25,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/kallsyms.h>
+#include <linux/efi.h>
 #include <linux/acpi.h>
 #include <linux/clockchips.h>
 
@@ -45,12 +46,19 @@
 #include <asm/mpspec.h>
 #include <asm/nmi.h>
 #include <asm/vgtod.h>
+#include <asm/time_64.h>
 
 DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL(rtc_lock);
 
 volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES;
 
+static int set_rtc_mmss(unsigned long nowtime);
+static unsigned long read_cmos_clock(void);
+
+unsigned long (*get_wallclock)(void) = read_cmos_clock;
+int (*set_wallclock)(unsigned long nowtime) = set_rtc_mmss;
+
 unsigned long profile_pc(struct pt_regs *regs)
 {
unsigned long pc = instruction_pointer(regs);
@@ -84,13 +92,6 @@
unsigned char control, freq_select;
 
 /*
- * IRQs are disabled when we're called from the timer interrupt,
- * no need for spin_lock_irqsave()
- */
-
-	spin_lock(&rtc_lock);
-
-/*
  * Tell the clock it's being set and stop it.
  */
 
@@ -138,14 +139,23 @@
CMOS_WRITE(control, RTC_CONTROL);
CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-	spin_unlock(&rtc_lock);
-
return retval;
 }
 
 int update_persistent_clock(struct timespec now)
 {
-   return set_rtc_mmss(now.tv_sec);
+   int retval;
+
+/*
+ * IRQs are disabled when we're called from the timer interrupt,
+ * no need for spin_lock_irqsave()
+ */
+
+	spin_lock(&rtc_lock);
+	retval = set_wallclock(now.tv_sec);
+	spin_unlock(&rtc_lock);
+
+   return retval;
 }
 
 static irqreturn_t timer_event_interrupt(int irq, void *dev_id)
@@ -157,14 +167,11 @@
return IRQ_HANDLED;
 }
 
-unsigned long read_persistent_clock(void)
+static unsigned long read_cmos_clock(void)
 {
unsigned int year, mon, day, hour, min, sec;
-   unsigned long flags;
unsigned century = 0;
 
-	spin_lock_irqsave(&rtc_lock, flags);
-
do {
sec = CMOS_READ(RTC_SECONDS);
min = CMOS_READ(RTC_MINUTES);
@@ -179,8 +186,6 @@
 #endif
} while (sec != CMOS_READ(RTC_SECONDS));
 
-	spin_unlock_irqrestore(&rtc_lock, flags

[PATCH 3/3 -v4] x86_64 EFI runtime service support: document for EFI runtime services

2007-10-25 Thread Huang, Ying
This patch adds documentation for EFI x86_64 runtime services support.

---

 boot-options.txt |   12 +++-
 uefi.txt |   10 ++
 2 files changed, 21 insertions(+), 1 deletion(-)

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

Index: linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt
===
--- linux-2.6.24-rc1.orig/Documentation/x86_64/boot-options.txt 2007-10-25 
13:58:14.0 +0800
+++ linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt  2007-10-25 
13:58:18.0 +0800
@@ -110,12 +110,15 @@
 
 Rebooting
 
-   reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old]
+   reboot=b[ios] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
bios  Use the CPU reboot vector for warm reset
warm   Don't set the cold reboot flag
cold   Set the cold reboot flag
triple Force a triple fault (init)
kbdUse the keyboard controller. cold reset (default)
+   efiUse efi reset_system runtime service. If EFI is not configured or the
+  EFI reset does not work, the reboot path attempts the reset using
+  the keyboard controller.
 
Using warm reset will be much faster especially on big memory
systems because the BIOS will not go through the memory check.
@@ -300,4 +303,11 @@
newfallback: use new unwinder but fall back to old if it gets
stuck (default)
 
+EFI
+
+  noefiDisable EFI support
+
+  noefi_time   Disable EFI time runtime service, programming CMOS
+   hardware directly
+
 Miscellaneous
Index: linux-2.6.24-rc1/Documentation/x86_64/uefi.txt
===
--- linux-2.6.24-rc1.orig/Documentation/x86_64/uefi.txt 2007-10-25 
13:58:18.0 +0800
+++ linux-2.6.24-rc1/Documentation/x86_64/uefi.txt  2007-10-25 
13:58:18.0 +0800
@@ -19,6 +19,10 @@
 - Build the kernel with the following configuration.
CONFIG_FB_EFI=y
CONFIG_FRAMEBUFFER_CONSOLE=y
+  If EFI runtime services are expected, the following configuration should
+  be selected.
+   CONFIG_EFI=y
+   CONFIG_EFI_VARS=y or m  # optional
 - Create a VFAT partition on the disk
 - Copy the following to the VFAT partition:
elilo bootloader with x86_64 support, elilo configuration file,
@@ -27,3 +31,9 @@
can be found in the elilo sourceforge project.
 - Boot to EFI shell and invoke elilo choosing the kernel image built
   in first step.
+- If some or all EFI runtime services don't work, you can try following
+  kernel command line parameters to turn off some or all EFI runtime
+  services.
+   noefi   turn off all EFI runtime services
+   noefi_time  turn off EFI time runtime service
+   reboot_type=k   turn off EFI reboot runtime service


[PATCH 0/3 -v4] x86_64 EFI runtime service support

2007-10-25 Thread Huang, Ying
The following set of patches adds EFI/UEFI (Unified Extensible Firmware
Interface) runtime services support to the x86_64 architecture. The
patches have been tested against the 2.6.24-rc1 kernel on Intel platforms
with EFI 1.10 and UEFI 2.0 firmware.


v4:

- EFI boot parameters are extended for 64-bit EFI in a 32-bit EFI
  compatible way.

- Add EFI runtime services document.

v3:

- Remove E820_RUNTIME_CODE, the EFI memory map is used to deal with
  EFI runtime code area.

- The method used to make EFI runtime code area executable is change:

  a. Before page allocation is usable, the PMD of direct mapping is
 changed temporarily before and after each EFI call.

  b. After page allocation is usable, change_page_attr_addr is used to
 change corresponding page attribute.

- Use fixmap to map EFI memory mapped IO memory area to make kexec
  workable.

- Add a kernel command line option noefi to make it possible to turn
  off EFI runtime services support.

- Function pointers are used for EFI time runtime service.

- EFI reboot runtime service is embedded into the framework of
  reboot_type.

- A kernel command line option noefi_time is added to make it
  possible to fall back to CMOS based implementation.

v2:

- The EFI callwrapper is re-implemented in assembler.


Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
On Thu, 2007-10-25 at 18:09 +0200, Thomas Gleixner wrote:
  EFI runtime
  services initialization are implemented in efi.c. Some x86_64
  specifics are worth noting here. On x86_64, parameters passed to UEFI
  firmware services need to follow the UEFI calling convention. For this
  purpose, a set of functions named lin2winx (x is the number of
  parameters) are implemented. EFI function calls are wrapped before
  calling the firmware service.
 
 Why needs this to be called lin2win? We do not call Windows, we call
 EFI services, so please use a naming convention which is related to
 the functionality of the code.
 
  + *
  + *  Function calling ABI conversion from SYSV to Windows for x86_64
 
 Again, these are wrappers to access EFI and not Windows.

EFI uses the Windows x86_64 calling convention. The lin2win naming may
be a more general convention that can be used by some other code (the
NDISwrapper?) in the future. Do you agree?

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
On Thu, 2007-10-25 at 11:01 -0600, Eric W. Biederman wrote:
  +static efi_status_t __init phys_efi_set_virtual_address_map(
  +   unsigned long memory_map_size,
  +   unsigned long descriptor_size,
  +   u32 descriptor_version,
  +   efi_memory_desc_t *virtual_map)
  +{
  +   efi_status_t status;
  +
  +   efi_call_phys_prelog();
  +   status = lin2win4((void *)efi_phys.set_virtual_address_map,
  + (u64)memory_map_size, (u64)descriptor_size,
  + (u64)descriptor_version, (u64)virtual_map);
  +   efi_call_phys_epilog();
  +   return status;
  +}
 
 So you still have this piece of code which makes a kernel using
 efi not compatible with kexec.  But you are still supporting a physical
 call mode for efi.  If you are going to do this can we please just
 remove the hacks that make the EFI physical call mode early boot only
 and just always use that mode.  Depending on weird call once functions
 like efi_set_virtual_address_map makes me very uncomfortable.

The kexec issue is solved in the same way as on IA-64. The EFI runtime
code and data memory areas are mapped with identity mapping, so they will
have the same virtual addresses across kexec. The memory-mapped IO areas
used by EFI runtime services are mapped with fixmap, so they will have
the same virtual addresses across kexec too. And the
efi_set_virtual_address_map runtime service will be skipped in the
kexeced kernel (set to a nop by /sbin/kexec). So the kexeced kernel can
use the virtual mode of EFI from kernel bootstrap on.

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
On Thu, 2007-10-25 at 11:06 -0600, Eric W. Biederman wrote:
 Arjan van de Ven [EMAIL PROTECTED] writes:
 
  On Thu, 25 Oct 2007 10:55:44 -0600
  [EMAIL PROTECTED] (Eric W. Biederman) wrote:
 
  I don't think there is a compelling case for us to use any efi
  services at this time
 
  I would almost agree with this if it wasn't for the 1 call that OS
  installers need to tell EFI about bootloader stuff; I've cc'd Peter
  Jones since he'll know better what OS installers need; if they don't
  need it after all...
 
 Yes.  I think that is usage of the variable service.  Although
 I don't know if that is actually needed.
 
 Support for the variable service is not implemented in this
 patchset.

Support for the variable service has been implemented in this patchset.
The interfaces are:

efi.get_variable
efi.get_next_variable
efi.set_variable

And a sysfs interface (/sys/firmware/efi/vars) is provided in
drivers/firmware/efivars.c, which depends on this patchset to work.

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
On Thu, 2007-10-25 at 11:30 -0600, Eric W. Biederman wrote:
 H. Peter Anvin [EMAIL PROTECTED] writes:
 
  Andi Kleen wrote:
  Especially for accessing the real time clock that has a well
  defined hardware interface going through efi an additional
  software emulation layer looks like asking for trouble.
 
  I agree it's pointless for the hardware clock, but EFI also offers 
  services to
  write some data to the CMOS RAM
  which could be very useful to save oops data over reboot.
  I don't think this can be done safely otherwise without BIOS cooperation.
 
 
  The ability to scurry away even a small amount of data without relying on 
  the
  disk system is highly desirable.  Think next-boot type information.
 
 Yes.  If that were to be the justifying case and if that was what
 the code was implementing I could see the point.
 
 However this point was made in an earlier review.  This point
 was already been made, and still this patchset doesn't
 include that functionality and it still includes the code
 to disable direct hardware access for no seemingly sane
 reason.

The EFI variable runtime service is included in this patchset.

The EFI time runtime service is selectable via a kernel command line
parameter now. If desired, it can be disabled by default and only
enabled when specified on the kernel command line. I think the time
runtime service may be useful if the underlying hardware is changed
silently by some vendor.

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
On Thu, 2007-10-25 at 13:36 -0700, H. Peter Anvin wrote:
 Eric W. Biederman wrote:
 
  Ying claimed that GOP requires EFI runtime services.  Is that not true?
  
  None of the EFI framebuffer patches that I saw used EFI runtime services.
  
 
 Ying, could you please clarify this situation?
 
 (Eric: do note that there are two EFI framebuffer standard, UGA and GOP. 
   Apparently UGA is obsolete and we have always been at war with GOP at 
 the moment.)

The EFI framebuffer doesn't depend on EFI runtime services. It only
depends on the kernel boot parameters (screen_info).

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
On Thu, 2007-10-25 at 15:29 -0700, H. Peter Anvin wrote:
 Eric W. Biederman wrote:
  H. Peter Anvin [EMAIL PROTECTED] writes:
  
  Eric W. Biederman wrote:
  Ying claimed that GOP requires EFI runtime services.  Is that not true?
  None of the EFI framebuffer patches that I saw used EFI runtime services.
 
  Ying, could you please clarify this situation?
 
  (Eric: do note that there are two EFI framebuffer standard, UGA and
  GOP. Apparently UGA is obsolete and we have always been at war with GOP at 
  the
  moment.)
  
  Peter please look back in your email archives to yesterday and
  see Ying's patch:
  
  [PATCH 1/2 -v2 resend] x86_64 EFI boot support: EFI frame buffer driver
  
  All of the data the GOP needs is acquired through the a query made
  by the bootloader and passed through screen info.
  
 
 Then I fully agree with your assessment.

EFI framebuffer doesn't depend on EFI runtime service.

But the EFI variable service does depend on the EFI runtime services,
and most people think it is useful. It can be used to:

- Provide a standard method to communicate with the BIOS, such as
specifying the boot device or bootloader for the next boot.
- Provide a standard method to write OOPS information to flash.

To improve the reliability of writing OOPS information, the virtual mode
of EFI should be used. And by mapping all memory areas used by EFI to
the same virtual addresses across kexec, EFI can work with kexec in
virtual mode just like on IA-64.

So, I think the EFI runtime services are useful and they do not break
anything. But the code duplication between efi_32.c and efi_64.c should
be eliminated, and I will work on this.
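
As a very rough sketch of the OOPS-to-flash idea mentioned above (not part
of this patchset; the variable name and vendor GUID below are made up, and
the efi.set_variable prototype and EFI_VARIABLE_* attributes should be
checked against include/linux/efi.h of the target kernel):

#include <linux/efi.h>

/* Illustration only: stash a short message in an EFI variable.  The
 * variable name and the vendor GUID are made-up examples; real code
 * would pick its own GUID and check the returned efi_status_t. */
static efi_guid_t oops_vendor_guid = EFI_GUID(0x12345678, 0x1234, 0x1234,
		0x12, 0x34, 0x12, 0x34, 0x12, 0x34, 0x12, 0x34);

static void save_oops_note(const char *msg, unsigned long len)
{
	/* UTF-16 variable name, as the EFI spec requires */
	static efi_char16_t name[] = {
		'L', 'i', 'n', 'u', 'x', 'O', 'o', 'p', 's', 0
	};

	if (!efi_enabled)
		return;

	efi.set_variable(name, &oops_vendor_guid,
			 EFI_VARIABLE_NON_VOLATILE |
			 EFI_VARIABLE_BOOTSERVICE_ACCESS |
			 EFI_VARIABLE_RUNTIME_ACCESS,
			 len, (void *)msg);
}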

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-25 Thread Huang, Ying
On Thu, 2007-10-25 at 18:09 +0200, Thomas Gleixner wrote:
 On Thu, 25 Oct 2007, Huang, Ying wrote:
 
  This patch adds basic runtime services support for EFI x86_64
  system. The main file of the patch is the addition of efi.c for
  x86_64. This file is modeled after the EFI IA32 avatar.
 
 modeled means copied and modified, right?
 
 This is wrong. I compared efi_32.c and efi_64.c and a large amount of
 the code is simply the same. The small details can be sorted out by
 two sets of macros/inline functions easily.
 
 Please fix this up.

Yes. There is a lot of duplicated code between efi_32.c and efi_64.c,
and it should be merged. But some code is different between efi_32.c
and efi_64.c. For example, there are different implementations of
efi_call_phys_prelog in the two files, and there is an implementation of
efi_memmap_walk only in efi_32.c, not in efi_64.c.

Three possible schemes are as follows:

- One efi.c, with EFI 32/64 specific code inside the corresponding
#ifdef/#endif.

- Three files: efi.c, efi_32.c and efi_64.c; common code goes in efi.c,
EFI 32/64 specific code goes in efi_32/64.c. This will make some
variables and functions external instead of static.

- Three files: efi.c, efi_32.c and efi_64.c; common code goes in efi.c,
EFI 32/64 specific code goes in efi_32/64.c, and efi.c includes
efi_32/64.c according to the architecture (see the sketch below).

Which one is preferred? Or should I take another scheme?
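
Purely as an illustration of the third scheme (not code from any of these
patches), the include-based split could look like:

/* arch/x86/kernel/efi.c -- sketch of scheme 3 only */

/* ... code shared between 32-bit and 64-bit EFI support ... */

#ifdef CONFIG_X86_32
# include "efi_32.c"	/* 32-bit specific helpers, e.g. the prelog/epilog variant */
#else
# include "efi_64.c"	/* 64-bit specific helpers */
#endif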

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-26 Thread Huang, Ying
On Fri, 2007-10-26 at 10:48 +0200, Thomas Gleixner wrote:
  EFI uses the Windows x86_64 calling convention. The lin2win may be a
  more general naming convention that can be used for some other code (the
  NDISwrapper?) in the future. Do you agree?
 
 I agree not at all. I do not care whether the EFI creators smoked the
 Windows-crackpipe or some other hallucinogen when they decided to use
 this calling convention. We definitely do not want to think about
 NDISwrapper or any other Windows related hackery in the kernel.

OK, I will change the name to something like lin2efi.

 I still do not understand why we need all this EFI hackery at all
 aside of the possible usage for saving a crash dump on FLASH, which we
 could do directly from the kernel as well.

Asking every user to set up a crash dump environment is a bit difficult,
because some configuration, like reserving memory and loading the crash
dump kernel, must be done. But saving OOPS information in flash via the
EFI variable runtime service is quite simple, with no configuration
required. That is, there could be more bug reports with OOPS
information. I think this is useful.

Best Regards,
Huang Ying


Re: [PATCH 1/3 -v4] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-28 Thread Huang, Ying
On Fri, 2007-10-26 at 12:31 +0100, Alan Cox wrote:
 On Fri, 26 Oct 2007 09:03:11 +0800
 Huang, Ying [EMAIL PROTECTED] wrote:
 
  On Thu, 2007-10-25 at 18:09 +0200, Thomas Gleixner wrote:
EFI runtime
services initialization are implemented in efi.c. Some x86_64
specifics are worth noting here. On x86_64, parameters passed to UEFI
firmware services need to follow the UEFI calling convention. For this
purpose, a set of functions named lin2winx (x is the number of
parameters) are implemented. EFI function calls are wrapped before
calling the firmware service.
   
   Why needs this to be called lin2win? We do not call Windows, we call
   EFI services, so please use a naming convention which is related to
   the functionality of the code.
   
+ *
+ *  Function calling ABI conversion from SYSV to Windows for x86_64
   
   Again, these are wrappers to access EFI and not Windows.
  
  EFI uses the Windows x86_64 calling convention. The lin2win may be a
  more general naming convention that can be used for some other code (the
  NDISwrapper?) in the future. Do you agree?
 
 The SYSV description is wrong as well. SYSV has no calling convention. I
 think you mean iABI or iBCS2 ?

The SYSV description comes from the following document:
http://www.x86-64.org/documentation/abi-0.98.pdf


 Whats wrong with following the pattern of other calls like syscall(...)
 and just having eficall() ?

Yes. This is better.

Best Regards,
Huang Ying


[PATCH 4/4 -v5] x86_64 EFI runtime service support: remove duplicated code from efi_32.c

2007-10-29 Thread Huang, Ying
This patch removes the code from efi_32.c that is now duplicated in efi.c.

---

 arch/x86/kernel/Makefile_32 |2 
 arch/x86/kernel/e820_32.c   |5 
 arch/x86/kernel/efi_32.c|  430 
 arch/x86/kernel/setup_32.c  |   11 -
 include/asm-x86/efi.h   |   37 +++
 5 files changed, 42 insertions(+), 443 deletions(-)

Signed-off-by: Huang Ying [EMAIL PROTECTED]

Index: linux-2.6.24-rc1/arch/x86/kernel/Makefile_32
===
--- linux-2.6.24-rc1.orig/arch/x86/kernel/Makefile_32	2007-10-30 11:05:57.0 +0800
+++ linux-2.6.24-rc1/arch/x86/kernel/Makefile_32	2007-10-30 11:10:33.0 +0800
@@ -34,7 +34,7 @@
 obj-$(CONFIG_MODULES)  += module_32.o
 obj-y  += sysenter_32.o vsyscall_32.o
 obj-$(CONFIG_ACPI_SRAT)+= srat_32.o
-obj-$(CONFIG_EFI)  += efi_32.o efi_stub_32.o
+obj-$(CONFIG_EFI)  += efi.o efi_32.o efi_stub_32.o
 obj-$(CONFIG_DOUBLEFAULT)  += doublefault_32.o
 obj-$(CONFIG_VM86) += vm86_32.o
 obj-$(CONFIG_EARLY_PRINTK) += early_printk.o
Index: linux-2.6.24-rc1/arch/x86/kernel/efi_32.c
===
--- linux-2.6.24-rc1.orig/arch/x86/kernel/efi_32.c	2007-10-30 11:05:57.0 +0800
+++ linux-2.6.24-rc1/arch/x86/kernel/efi_32.c	2007-10-30 11:10:33.0 +0800
@@ -39,21 +39,8 @@
#include <asm/desc.h>
#include <asm/tlbflush.h>
 
-#define EFI_DEBUG  0
#define PFX		"EFI: "
 
-extern efi_status_t asmlinkage efi_call_phys(void *, ...);
-
-struct efi efi;
-EXPORT_SYMBOL(efi);
-static struct efi efi_phys;
-struct efi_memory_map memmap;
-
-/*
- * We require an early boot_ioremap mapping mechanism initially
- */
-extern void * boot_ioremap(unsigned long, unsigned long);
-
 /*
  * To make EFI call EFI runtime service in physical addressing mode we need
  * prelog/epilog before/after the invocation to disable interrupt, to
@@ -65,7 +52,7 @@
 static DEFINE_SPINLOCK(efi_rt_lock);
 static pgd_t efi_bak_pg_dir_pointer[2];
 
-static void efi_call_phys_prelog(void) __acquires(efi_rt_lock)
+void efi_call_phys_prelog(void) __acquires(efi_rt_lock)
 {
unsigned long cr4;
unsigned long temp;
@@ -108,7 +95,7 @@
load_gdt(&gdt_descr);
 }
 
-static void efi_call_phys_epilog(void) __releases(efi_rt_lock)
+void efi_call_phys_epilog(void) __releases(efi_rt_lock)
 {
unsigned long cr4;
struct Xgt_desc_struct gdt_descr;
@@ -138,87 +125,6 @@
spin_unlock(&efi_rt_lock);
 }
 
-static efi_status_t
-phys_efi_set_virtual_address_map(unsigned long memory_map_size,
-unsigned long descriptor_size,
-u32 descriptor_version,
-efi_memory_desc_t *virtual_map)
-{
-   efi_status_t status;
-
-   efi_call_phys_prelog();
-   status = efi_call_phys(efi_phys.set_virtual_address_map,
-memory_map_size, descriptor_size,
-descriptor_version, virtual_map);
-   efi_call_phys_epilog();
-   return status;
-}
-
-static efi_status_t
-phys_efi_get_time(efi_time_t *tm, efi_time_cap_t *tc)
-{
-   efi_status_t status;
-
-   efi_call_phys_prelog();
-   status = efi_call_phys(efi_phys.get_time, tm, tc);
-   efi_call_phys_epilog();
-   return status;
-}
-
-inline int efi_set_rtc_mmss(unsigned long nowtime)
-{
-   int real_seconds, real_minutes;
-   efi_status_t	status;
-   efi_time_t  eft;
-   efi_time_cap_t  cap;
-
-   spin_lock(&efi_rt_lock);
-   status = efi.get_time(&eft, &cap);
-   spin_unlock(&efi_rt_lock);
-   if (status != EFI_SUCCESS)
-   panic("Ooops, efitime: can't read time!\n");
-   real_seconds = nowtime % 60;
-   real_minutes = nowtime / 60;
-
-   if (((abs(real_minutes - eft.minute) + 15)/30) & 1)
-   real_minutes += 30;
-   real_minutes %= 60;
-
-   eft.minute = real_minutes;
-   eft.second = real_seconds;
-
-   if (status != EFI_SUCCESS) {
-   printk("Ooops: efitime: can't read time!\n");
-   return -1;
-   }
-   return 0;
-}
-/*
- * This is used during kernel init before runtime
- * services have been remapped and also during suspend, therefore,
- * we'll need to call both in physical and virtual modes.
- */
-inline unsigned long efi_get_time(void)
-{
-   efi_status_t status;
-   efi_time_t eft;
-   efi_time_cap_t cap;
-
-   if (efi.get_time) {
-   /* if we are in virtual mode use remapped function */
-   status = efi.get_time(&eft, &cap);
-   } else {
-   /* we are in physical mode */
-   status = phys_efi_get_time(&eft, &cap);
-   }
-
-   if (status != EFI_SUCCESS)
-   printk("Oops: efitime: can't read time status: 0x%lx\n", status

[PATCH 3/4 -v5] x86_64 EFI runtime service support: document for EFI runtime services

2007-10-29 Thread Huang, Ying
This patch adds document for EFI x86_64 runtime services support.

---

 boot-options.txt |   11 ++-
 uefi.txt |9 +
 2 files changed, 19 insertions(+), 1 deletion(-)

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

Index: linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt
===
--- linux-2.6.24-rc1.orig/Documentation/x86_64/boot-options.txt	2007-10-30 10:15:00.0 +0800
+++ linux-2.6.24-rc1/Documentation/x86_64/boot-options.txt	2007-10-30 10:23:52.0 +0800
@@ -110,12 +110,15 @@
 
 Rebooting
 
-   reboot=b[ios] | t[riple] | k[bd] [, [w]arm | [c]old]
+   reboot=b[ios] | t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
bios  Use the CPU reboot vector for warm reset
warm   Don't set the cold reboot flag
cold   Set the cold reboot flag
triple Force a triple fault (init)
kbd    Use the keyboard controller. cold reset (default)
+   efi    Use efi reset_system runtime service. If EFI is not configured or the
+  EFI reset does not work, the reboot path attempts the reset using
+  the keyboard controller.
 
Using warm reset will be much faster especially on big memory
systems because the BIOS will not go through the memory check.
@@ -300,4 +303,10 @@
newfallback: use new unwinder but fall back to old if it gets
stuck (default)
 
+EFI
+
+  noefi        Disable EFI support
+
+  efi_time=on  Enable EFI time runtime service
+
 Miscellaneous
Index: linux-2.6.24-rc1/Documentation/x86_64/uefi.txt
===
--- linux-2.6.24-rc1.orig/Documentation/x86_64/uefi.txt	2007-10-30 10:15:00.0 +0800
+++ linux-2.6.24-rc1/Documentation/x86_64/uefi.txt	2007-10-30 10:25:39.0 +0800
@@ -19,6 +19,10 @@
 - Build the kernel with the following configuration.
CONFIG_FB_EFI=y
CONFIG_FRAMEBUFFER_CONSOLE=y
+  If EFI runtime services are expected, the following configuration should
+  be selected.
+   CONFIG_EFI=y
+   CONFIG_EFI_VARS=y or m  # optional
 - Create a VFAT partition on the disk
 - Copy the following to the VFAT partition:
elilo bootloader with x86_64 support, elilo configuration file,
@@ -27,3 +31,8 @@
can be found in the elilo sourceforge project.
 - Boot to EFI shell and invoke elilo choosing the kernel image built
   in first step.
+- If some or all EFI runtime services don't work, you can try following
+  kernel command line parameters to turn off some or all EFI runtime
+  services.
+   noefi   turn off all EFI runtime services
+   reboot_type=k   turn off EFI reboot runtime service


[PATCH 2/4 -v5] x86_64 EFI runtime service support: EFI runtime services

2007-10-29 Thread Huang, Ying
This patch adds support for several EFI runtime services on EFI x86_64
systems.

EFI support for emergency_restart and the RTC clock is added. The
EFI-based implementation and the legacy BIOS/CMOS-based implementation
are put in separate functions and can be chosen with kernel boot
options.
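
The selection itself is just a function-pointer switch. A hedged sketch
of the idea (choose_wallclock_ops() and the extern declarations are
illustrative only; the two pointers and the efi_time flag follow the
patch):

#include <linux/init.h>

extern int efi_enabled, efi_time;

extern unsigned long read_cmos_clock(void);
extern int set_rtc_mmss(unsigned long nowtime);
extern unsigned long efi_get_time(void);
extern int efi_set_rtc_mmss(unsigned long nowtime);

/* Default to the legacy CMOS/RTC routines. */
unsigned long (*get_wallclock)(void) = read_cmos_clock;
int (*set_wallclock)(unsigned long nowtime) = set_rtc_mmss;

static void __init choose_wallclock_ops(void)
{
	/* Switch to the EFI time runtime service only when it is enabled. */
	if (efi_enabled && efi_time) {
		get_wallclock = efi_get_time;
		set_wallclock = efi_set_rtc_mmss;
	}
}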

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/x86/kernel/reboot_64.c |   19 +-
 arch/x86/kernel/time_64.c   |   47 +++-
 include/asm-x86/emergency-restart.h |8 ++
 include/asm-x86/time.h  |   47 +++-
 include/asm-x86/time_32.h   |   44 +
 include/asm-x86/time_64.h   |7 +
 6 files changed, 107 insertions(+), 65 deletions(-)

Index: linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c
===
--- linux-2.6.24-rc1.orig/arch/x86/kernel/reboot_64.c	2007-10-30 10:15:03.0 +0800
+++ linux-2.6.24-rc1/arch/x86/kernel/reboot_64.c	2007-10-30 10:22:00.0 +0800
@@ -9,6 +9,7 @@
#include <linux/pm.h>
#include <linux/kdebug.h>
#include <linux/sched.h>
+#include <linux/efi.h>
#include <asm/io.h>
#include <asm/delay.h>
#include <asm/desc.h>
@@ -26,18 +27,16 @@
 EXPORT_SYMBOL(pm_power_off);
 
 static long no_idt[3];
-static enum { 
-   BOOT_TRIPLE = 't',
-   BOOT_KBD = 'k'
-} reboot_type = BOOT_KBD;
+enum reboot_type reboot_type = BOOT_KBD;
 static int reboot_mode = 0;
 int reboot_force;
 
-/* reboot=t[riple] | k[bd] [, [w]arm | [c]old]
+/* reboot=t[riple] | k[bd] | e[fi] [, [w]arm | [c]old]
warm   Don't set the cold reboot flag
cold   Set the cold reboot flag
triple Force a triple fault (init)
kbd    Use the keyboard controller. cold reset (default)
+   efi    Use efi reset_system runtime service
force  Avoid anything that could hang.
  */ 
 static int __init reboot_setup(char *str)
@@ -55,6 +54,7 @@
case 't':
case 'b':
case 'k':
+   case 'e':
reboot_type = *str;
break;
case 'f':
@@ -142,7 +142,14 @@
 
reboot_type = BOOT_KBD;
break;
-   }  
+
+   case BOOT_EFI:
+   if (efi_enabled)
+   efi.reset_system(reboot_mode ? EFI_RESET_WARM : EFI_RESET_COLD,
+EFI_SUCCESS, 0, NULL);
+   reboot_type = BOOT_KBD;
+   break;
+   }
}  
 }
 
Index: linux-2.6.24-rc1/arch/x86/kernel/time_64.c
===
--- linux-2.6.24-rc1.orig/arch/x86/kernel/time_64.c	2007-10-30 10:15:03.0 +0800
+++ linux-2.6.24-rc1/arch/x86/kernel/time_64.c	2007-10-30 10:22:04.0 +0800
@@ -45,12 +45,19 @@
#include <asm/mpspec.h>
#include <asm/nmi.h>
#include <asm/vgtod.h>
+#include <asm/time.h>
 
 DEFINE_SPINLOCK(rtc_lock);
 EXPORT_SYMBOL(rtc_lock);
 
 volatile unsigned long __jiffies __section_jiffies = INITIAL_JIFFIES;
 
+static int set_rtc_mmss(unsigned long nowtime);
+static unsigned long read_cmos_clock(void);
+
+unsigned long (*get_wallclock)(void) = read_cmos_clock;
+int (*set_wallclock)(unsigned long nowtime) = set_rtc_mmss;
+
 unsigned long profile_pc(struct pt_regs *regs)
 {
unsigned long pc = instruction_pointer(regs);
@@ -84,13 +91,6 @@
unsigned char control, freq_select;
 
 /*
- * IRQs are disabled when we're called from the timer interrupt,
- * no need for spin_lock_irqsave()
- */
-
-   spin_lock(&rtc_lock);
-
-/*
  * Tell the clock it's being set and stop it.
  */
 
@@ -138,14 +138,23 @@
CMOS_WRITE(control, RTC_CONTROL);
CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-   spin_unlock(&rtc_lock);
-
return retval;
 }
 
 int update_persistent_clock(struct timespec now)
 {
-   return set_rtc_mmss(now.tv_sec);
+   int retval;
+
+/*
+ * IRQs are disabled when we're called from the timer interrupt,
+ * no need for spin_lock_irqsave()
+ */
+
+   spin_lock(&rtc_lock);
+   retval = set_wallclock(now.tv_sec);
+   spin_unlock(&rtc_lock);
+
+   return retval;
 }
 
 static irqreturn_t timer_event_interrupt(int irq, void *dev_id)
@@ -157,14 +166,11 @@
return IRQ_HANDLED;
 }
 
-unsigned long read_persistent_clock(void)
+static unsigned long read_cmos_clock(void)
 {
unsigned int year, mon, day, hour, min, sec;
-   unsigned long flags;
unsigned century = 0;
 
-   spin_lock_irqsave(&rtc_lock, flags);
-
do {
sec = CMOS_READ(RTC_SECONDS);
min = CMOS_READ(RTC_MINUTES);
@@ -179,8 +185,6 @@
 #endif
} while (sec != CMOS_READ(RTC_SECONDS));
 
-   spin_unlock_irqrestore(&rtc_lock, flags

[PATCH 0/4 -v5] x86_64 EFI runtime service support

2007-10-29 Thread Huang, Ying
The following patchset adds EFI/UEFI (Unified Extensible Firmware
Interface) runtime services support to the x86_64 architecture.

The patchset has been tested against the 2.6.24-rc1 kernel on Intel
platforms with 64-bit EFI 1.10 and UEFI 2.0 firmware. Because the
duplicated code between efi_32.c and efi_64.c has been removed, the
patchset has also been tested on an Intel platform with 32-bit EFI
firmware.


v5:

- Remove the duplicated code between efi_32.c and efi_64.c.

- Rename lin2winx to efi_callx.

- Make EFI time runtime service default to off.

- Use different bootloader signatures for EFI32 and EFI64, so that the
  kernel can know whether the underlying EFI firmware is 64-bit or
  32-bit.

v4:

- EFI boot parameters are extended for 64-bit EFI in a 32-bit EFI
  compatible way.

- Add EFI runtime services document.

v3:

- Remove E820_RUNTIME_CODE; the EFI memory map is used to deal with
  the EFI runtime code area.

- The method used to make the EFI runtime code area executable is changed:

  a. Before page allocation is usable, the PMD of direct mapping is
 changed temporarily before and after each EFI call.

  b. After page allocation is usable, change_page_attr_addr is used to
 change corresponding page attribute.

- Use fixmap to map EFI memory mapped IO memory area to make kexec
  workable.

- Add a kernel command line option noefi to make it possible to turn
  off EFI runtime services support.

- Function pointers are used for EFI time runtime service.

- EFI reboot runtime service is embedded into the framework of
  reboot_type.

- A kernel command line option noefi_time is added to make it
  possible to fall back to CMOS based implementation.

v2:

- The EFI call wrapper is re-implemented in assembly.


Best Regards,
Huang Ying


[PATCH 1/4 -v5] x86_64 EFI runtime service support: EFI basic runtime service support

2007-10-29 Thread Huang, Ying
This patch adds basic runtime services support for EFI x86_64
systems. The main file of the patch is the addition of efi_64.c for
x86_64. This file is modeled after the IA32 EFI implementation. EFI
runtime services initialization is implemented in efi_64.c. Some x86_64
specifics are worth noting here. On x86_64, parameters passed to EFI
firmware services need to follow the EFI calling convention. For this
purpose, a set of functions named efi_callx (x is the number of
parameters) is implemented. EFI function calls are wrapped before
calling the firmware service. The code duplicated between efi_32.c and
efi_64.c is placed in efi.c and removed from efi_32.c.
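
As background for the efi_callx wrappers: the kernel follows the SysV
AMD64 convention (arguments in rdi, rsi, rdx, rcx, r8, r9), while
64-bit EFI firmware expects the Microsoft x64 convention (arguments in
rcx, rdx, r8, r9 plus 32 bytes of caller-allocated shadow space). The
patch performs the conversion in assembly (efi_stub_64.S); the
following is only a hedged C-level illustration of the same
conversion, assuming a compiler that supports the ms_abi attribute,
with efi_call2/efi_func2_t as illustrative names:

typedef unsigned long efi_status_t;

/* Firmware entry point, called with the Microsoft x64 convention. */
typedef efi_status_t (__attribute__((ms_abi)) *efi_func2_t)(void *, void *);

static efi_status_t efi_call2(efi_func2_t fw_func, void *arg1, void *arg2)
{
	/*
	 * The compiler emits the argument moves (rsi -> rcx, rdx stays)
	 * and reserves the shadow space; the real patch writes the
	 * equivalent moves by hand in efi_stub_64.S.
	 */
	return fw_func(arg1, arg2);
}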

Signed-off-by: Chandramouli Narayanan [EMAIL PROTECTED]
Signed-off-by: Huang Ying [EMAIL PROTECTED]

---

 arch/x86/kernel/Makefile_64   |1 
 arch/x86/kernel/efi.c |  483 ++
 arch/x86/kernel/efi_64.c  |  181 +++
 arch/x86/kernel/efi_stub_64.S |   68 +
 arch/x86/kernel/setup_64.c|   17 +
 arch/x86_64/Kconfig   |   11 
 include/asm-x86/bootparam.h   |5 
 include/asm-x86/efi.h |   70 ++
 include/asm-x86/fixmap_64.h   |3 
 9 files changed, 836 insertions(+), 3 deletions(-)

Index: linux-2.6.24-rc1/arch/x86/kernel/efi_64.c
===
--- /dev/null   1970-01-01 00:00:00.0 +
+++ linux-2.6.24-rc1/arch/x86/kernel/efi_64.c	2007-10-30 10:07:57.0 +0800
@@ -0,0 +1,181 @@
+/*
+ * x86_64 specific EFI support functions
+ * Based on Extensible Firmware Interface Specification version 1.0
+ *
+ * Copyright (C) 2005-2008 Intel Co.
+ * Fenghua Yu [EMAIL PROTECTED]
+ * Bibo Mao [EMAIL PROTECTED]
+ * Chandramouli Narayanan [EMAIL PROTECTED]
+ * Huang Ying [EMAIL PROTECTED]
+ *
+ * Code to convert EFI to E820 map has been implemented in elilo bootloader
+ * based on a EFI patch by Edgar Hucek. Based on the E820 map, the page table
+ * is setup appropriately for EFI runtime code.
+ * - mouli 06/14/2007.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/init.h>
+#include <linux/mm.h>
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/bootmem.h>
+#include <linux/ioport.h>
+#include <linux/module.h>
+#include <linux/efi.h>
+#include <linux/uaccess.h>
+#include <linux/io.h>
+#include <linux/reboot.h>
+
+#include <asm/setup.h>
+#include <asm/page.h>
+#include <asm/e820.h>
+#include <asm/pgtable.h>
+#include <asm/tlbflush.h>
+#include <asm/cacheflush.h>
+#include <asm/proto.h>
+#include <asm/efi.h>
+
+int efi_time __initdata;
+
+static pgd_t save_pgd __initdata;
+static unsigned long efi_flags __initdata;
+/* efi_lock protects efi physical mode call */
+static __initdata DEFINE_SPINLOCK(efi_lock);
+
+static int __init setup_noefi(char *arg)
+{
+   efi_enabled = 0;
+   return 0;
+}
+early_param("noefi", setup_noefi);
+
+static int __init setup_efi_time(char *arg)
+{
+   if (arg && !strcmp("on", arg))
+   efi_time = 1;
+   return 0;
+}
+early_param("efi_time", setup_efi_time);
+
+static void __init early_mapping_set_exec(unsigned long start,
+ unsigned long end,
+ int executable)
+{
+   pte_t *kpte;
+
+   while (start < end) {
+   kpte = lookup_address((unsigned long)__va(start));
+   BUG_ON(!kpte);
+   if (executable)
+   set_pte(kpte, pte_mkexec(*kpte));
+   else
+   set_pte(kpte, __pte((pte_val(*kpte) | _PAGE_NX) & \
+   __supported_pte_mask));
+   if (pte_huge(*kpte))
+   start = (start + PMD_SIZE) & PMD_MASK;
+   else
+   start = (start + PAGE_SIZE) & PAGE_MASK;
+   }
+}
+
+static void __init early_runtime_code_mapping_set_exec(int executable)
+{
+   efi_memory_desc_t *md;
+   void *p;
+
+   /* Make EFI runtime service code area executable */
+   for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
+   md = p;
+   if (md->type == EFI_RUNTIME_SERVICES_CODE) {
+   unsigned long end;
+   end = md->phys_addr + (md->num_pages << PAGE_SHIFT);
+   early_mapping_set_exec(md->phys_addr, end, executable);
+   }
+   }
+}
+
+void __init efi_call_phys_prelog(void) __acquires(efi_lock)
+{
+   unsigned long vaddress;
+
+   /*
+* Lock sequence is different from normal case because
+* efi_flags is global
+*/
+   spin_lock(&efi_lock);
+   local_irq_save(efi_flags);
+   early_runtime_code_mapping_set_exec(1);
+   vaddress = (unsigned long)__va(0x0UL);
+   pgd_val(save_pgd) = pgd_val(*pgd_offset_k(0x0UL));
+   set_pgd(pgd_offset_k(0x0UL), *pgd_offset_k(vaddress));
+   global_flush_tlb();
+}
+
+void __init efi_call_phys_epilog(void) __releases

Re: [PATCH 2/4 -v5] x86_64 EFI runtime service support: EFI runtime services

2007-10-30 Thread Huang, Ying
On Tue, 2007-10-30 at 15:58 +, Denys Vlasenko wrote:
 On Tuesday 30 October 2007 05:55, Huang, Ying wrote:
  +static inline unsigned long native_get_wallclock(void)
  +{
  +   unsigned long retval;
  +
  +   if (efi_enabled)
  +   retval = efi_get_time();
  +   else
  +   retval = mach_get_cmos_time();
  +
  +   return retval;
  +}
 
 mach_get_cmos_time() is itself an inline, and a _large_ one
 (~20 LOC with macro and function calls).
 
 efi_get_time() is an inline too, although strange one:
 it is declared inline *only* in efi.c file:
   inline unsigned long efi_get_time(void)
 (yes, just inline, not static/extern),
 while efi.h has normal extern for it:
   extern unsigned long efi_get_time(void);
 
 Is it supposed to be like that?

efi_get_time is no longer inline in this patch. See efi.c of this patch.

Best Regards,
Huang Ying


Re: [PATCH 0/2 -v2 resend] x86_64 EFI boot support

2007-10-30 Thread Huang, Ying
Can this patchset be merged into the mainline kernel? It has been in
the -mm tree since 2.6.23-rc2-mm2. Andrew Morton suggested merging it
during the early 2.6.24 merge window. It was not merged into mainline
at that time because the 32-bit boot protocol had not been finished.

Now the 32-bit boot protocol has been merged into mainline. So can
this patchset be merged into the mainline kernel now?

Best Regards,
Huang Ying


Re: [PATCH -mm -v5 0/3] i386/x86_64 boot: 32-bit boot protocol

2007-10-18 Thread Huang, Ying
On Wed, 2007-10-17 at 11:24 +0200, Andi Kleen wrote:
  Can you tell me what that early reservation interface is? What I find in
  x86_64 that does early memory allocation is alloc_low_page, which gets
  non-conflict memory area through e820 map.
 
 It's a new interface I only recently wrote:
 
 ftp://ftp.firstfloor.org/pub/ak/x86_64/quilt/patches/early-reserve
 
 Then you can use early_reserve() and the e820 allocator will not touch
 it.
 
  Because setup data is allocated by bootloader or kernel 16-bit setup
  code, and the e820 map is created there too, the memory area used by
  setup data can be made reserved memory area in e820 map by bootloader or
  kernel 16-bit setup code. This way, they will not be overwritten by
  kernel. Do you think this works.
 
 It has a little of a chicken'n'egg problem because the e820 map will
 be actually in the area you want to reserve. But it might work too.
 Boot data is normally copied before other allocations in head64.c
 If you do variable size boot data that might not work though.  And might
 be a little fragile overall.

Although variable-size boot data (such as setup data) can be reserved
via early_reserve or the e820 map, it may conflict with hard-coded
memory areas used by the kernel. This means the boot loader must know
the hard-coded memory areas used by the kernel.

Another possible solution is as follows:
1. The bootloader allocates memory for setup data, simply avoiding the
memory area after the kernel load address.
2. In the very early stage of kernel boot (head64.c), copy all the
setup data to the memory area after _end, and reserve that memory area
with early_reserve (or bad_addr for old code).

In this solution, the only unsafe memory area for setup data from the
bootloader is the area after _end, and the kernel can use hard-coded
memory areas without the risk of conflicting with setup data from the
bootloader.
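
A rough sketch of step 2 (illustrative only; it assumes the linked
setup_data layout being discussed for the 32-bit boot protocol, and
copy_setup_data() is a made-up name, not code from the patchset):

#include <linux/types.h>
#include <linux/kernel.h>
#include <linux/string.h>
#include <linux/init.h>
#include <asm/page.h>

struct setup_data {
	u64 next;	/* physical address of the next node, 0 ends the list */
	u32 type;
	u32 len;
	u8  data[0];
};

/*
 * Copy the whole setup_data chain to "dest" (an area right after _end)
 * and return the first unused address, so the caller can early_reserve()
 * the copied range.
 */
static unsigned long __init copy_setup_data(u64 first_node, unsigned long dest)
{
	u64 pa = first_node;

	while (pa) {
		struct setup_data *sd = __va(pa);
		unsigned long size = sizeof(*sd) + sd->len;

		memcpy((void *)dest, sd, size);
		pa = sd->next;
		dest = ALIGN(dest + size, 16);
	}

	return dest;
}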

Do you think this solution is better?

Best Regards,
Huang Ying

