Re: [PATCH 21/25][V3] export cpu_gdt_descr

2007-08-15 Thread Glauber de Oliveira Costa

Andi Kleen wrote:

On Wed, Aug 15, 2007 at 11:25:43AM -0300, Glauber de Oliveira Costa wrote:

On 8/15/07, Andi Kleen [EMAIL PROTECTED] wrote:

+#ifdef CONFIG_PARAVIRT
+extern unsigned long *cpu_gdt_descr;

No externs in .c files

Normally they should be where the variable is defined
anyways.

Given that this  variable is defined in head.S, what do you propose?


Move it to C code first.



Duh. I hadn't noticed that this variable is already defined as extern
in desc.h.


If you don't object, I'll just include it in x8664_ksyms.c

Thanks for the nitpicking, Andi.

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/25][V3] irq_flags / halt routines

2007-08-15 Thread Glauber de Oliveira Costa

Andi Kleen wrote:

On Wed, Aug 15, 2007 at 12:09:42PM -0300, Glauber de Oliveira Costa wrote:

Again, this is the code of such function:

static inline int raw_irqs_disabled_flags(unsigned long flags)
{
return !(flags & X86_EFLAGS_IF);
}
so all it is doing is getting a parameter (flags) and bitmasking it. It
is not talking to any hypervisor. I can't see your point. Unless you
are arguing that it _should_ be talking to a hypervisor. Is that your
point?


vSMP is a hypervisor based architecture. For some reason that is not
100% clear to me, but Kiran or Shai can probably explain, it needs this
additional bit in EFLAGS when interrupts are disabled. That gives it
some hints and then it goes somehow faster. That is clearly
paravirtualization.

Since paravirtops is designed to handle such hooks cleanly I request
that you move vSMP over to it or work with the vSMP maintainers to 
do that. Otherwise we have two different ways to do paravirtualization 
which is wrong.
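[ A hedged illustration of the point above (the exact vSMP semantics are
what Kiran or Shai would have to confirm): a disabled-flags test that
honors such an extra EFLAGS bit could look like the sketch below, using
the AC bit that the mails in this thread mention. This is user-space
illustration code, not the actual kernel implementation. ]

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x00000200UL  /* interrupt flag, bit 9 */
#define X86_EFLAGS_AC 0x00040000UL  /* alignment check flag, bit 18 */

/* Sketch only: treat interrupts as enabled when either IF or the
 * extra AC hint bit is set, as suggested for vSMP in this thread. */
static int vsmp_irqs_disabled_flags(unsigned long flags)
{
	return !(flags & (X86_EFLAGS_IF | X86_EFLAGS_AC));
}
```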




Thanks for the explanation, Andi. I understand it much better now, and 
agree with you.


As alternatives to what we have now, we can either keep the paravirt_ops
as it is now for the native case, just hooking the vsmp functions in place
of the normal ones (there are just three ops anyway); refill paravirt_ops
entirely somewhere like vsmp.c or similar; or maybe even assign
paravirt_ops.fn = vsmp_fn on the fly, but early enough.


Maybe we could even make VSMP depend on PARAVIRT, to make sure it is
completely a paravirt client.


But as you could see, my knowledge of vsmp does not go that far, and I
would really like to have input from the vsmp guys prior to touching
anything here.




Re: [PATCH 3/25][V3] irq_flags / halt routines

2007-08-15 Thread Glauber de Oliveira Costa

Chris Wright wrote:

* Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
As alternatives what we have now, we can either keep the paravirt_ops as 
it is now for the native case, just hooking the vsmp functions in place 
of the normal one, (there are just three ops anyway), refill the 
paravirt_ops entirely in somewhere like vsmp.c, or similar (or maybe 
even assigning paravirt_ops.fn = vsmp_fn on the fly, but early enough).


It will definitely keep the code shorter, and to be honest, I'd feel
more comfortable with it (since I don't know the subtleties of the
architecture).


The only caveat is that it has to be done before smp gets in the game, and
with interrupts disabled (which makes the function in vsmp.c not eligible).


My current option is to force VSMP to use PARAVIRT, as said before, and 
then fill paravirt_arch_setup, which is currently unused, with code to 
replace the needed paravirt_ops.fn.


I don't know if there is any method to dynamically determine (at this
point) that we are in a vsmp arch, and if there is not, it will have to
get ifdefs anyway. But at least they are far more local.



This is the best (just override pvops.fn for the few needed for VSMP).
The irq_disabled_flags() is the only problem.  For i386 we dropped it
(disabled_flags) as a pvop and forced the backend to provide flags
(via save_flags) that conform to IF only.


I am okay with both, but after all the explanation, I don't think that
adding a new pvop is a bad idea. It would make things less cumbersome
in this case. Also, hacks like this save_fl one may require changes to
the hypervisor, right? I don't even know where such a hypervisor is, and
how easy it is to replace it (it may be deeply hidden in firmware)


A question arises here: would vsmp turn paravirt_enabled to 1?


Re: [PATCH 3/25][V3] irq_flags / halt routines

2007-08-15 Thread Glauber de Oliveira Costa
On 8/15/07, Chris Wright [EMAIL PROTECTED] wrote:
 * Glauber de Oliveira Costa ([EMAIL PROTECTED]) wrote:
  Only caveat, is that it has to be done before smp gets in the game, and
  with interrupts disabled. (which makes the function in vsmp.c not eligible).
 
  My current option is to force VSMP to use PARAVIRT, as said before, and
  then fill paravirt_arch_setup, which is currently unused, with code to
  replace the needed paravirt_ops.fn.
 
  I don't know if there is any method to dynamically determine (at this
  point) that we are in a vsmp arch, and if there are not, it will have to
  get ifdefs anyway. But at least, they are far more local.

 between __cacheline_aligned_in_smp and other compile time bits based on
 VSMP specific INTERNODE_CACHE, etc., I think compile time is the way to go.

  I am okay with both, but after all the explanation, I don't think that
  adding a new pvops is a bad idea. It would make things less cumbersome
  in this case. Also, hacks like this save_fl may require changes to the
  hypervisor, right? I don't even know where such hypervisor is, and how
  easy it is to replace it (it may be deeply hidden in firmware)

 No hypervisor change needed.  Just the pv backend needs to return 0 or
 X86_EFLAGS_IF for save_flags (and similar translation on restore_flags).
 Xen uses a simple shared memory flag and does something which you could
 roughly translate into this:

 xen_save_flags()
 {
         if (xen_vcpu_interrupts_enabled)
                 return X86_EFLAGS_IF;
         else
                 return 0;
 }

 This doesn't require any hypervisor changes.  Similarly, VSMP could do
 something along the lines of:

 vsmp_save_flags()
 {
         flags = native_save_flags();
         if ((flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC))
                 return X86_EFLAGS_IF;
         else
                 return 0;
 }


I'm attaching to this message my idea of how it would work.
This is just for comments/considerations. If you all ack this, I'll
spread the changes over the patch series as needed, and then resend
the patches.

-- 
Glauber de Oliveira Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.
diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index 00b2fc9..1204b08 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -150,6 +150,7 @@ config X86_PC
 config X86_VSMP
 	bool "Support for ScaleMP vSMP"
 	depends on PCI
+	select PARAVIRT
 	 help
 	  Support for ScaleMP vSMP systems.  Say 'Y' here if this kernel is
 	  supposed to run on these EM64T-based machines.  Only choose this option
diff --git a/arch/x86_64/kernel/paravirt.c b/arch/x86_64/kernel/paravirt.c
index dcd9919..23a8786 100644
--- a/arch/x86_64/kernel/paravirt.c
+++ b/arch/x86_64/kernel/paravirt.c
@@ -22,6 +22,8 @@
 #include <linux/efi.h>
 #include <linux/bcd.h>
 #include <linux/start_kernel.h>
+#include <linux/pci_regs.h>
+#include <linux/pci_ids.h>
 
 #include <asm/bug.h>
 #include <asm/paravirt.h>
@@ -40,15 +42,30 @@
 #include <asm/asm-offsets.h>
 #include <asm/smp.h>
 #include <asm/irqflags.h>
+#include <asm/pci-direct.h>
 
 /* nop stub */
 void _paravirt_nop(void)
 {
 }
 
+int arch_is_vsmp = 0;
+
 /* natively, we do normal setup, but we still need to return something */
 static int native_arch_setup(void)
 {
+	if (!early_pci_allowed())
+		goto out;
+		
+	if ((read_pci_config_16(0, 0x1f, 0, PCI_VENDOR_ID) == PCI_VENDOR_ID_SCALEMP) &&
+	    (read_pci_config_16(0, 0x1f, 0, PCI_DEVICE_ID) == PCI_DEVICE_ID_SCALEMP_VSMP_CTL)) {
+		paravirt_ops.irq_disable = vsmp_irq_disable;
+		paravirt_ops.irq_enable  = vsmp_irq_enable;
+		paravirt_ops.save_fl  = vsmp_save_flags;
+		arch_is_vsmp = 1;
+	}
+
+out:
 	return 0;
 }
 
@@ -103,8 +120,6 @@ static unsigned native_patch(u8 type, u16 clobbers, void *insns, unsigned len)
 
 	switch(type) {
 #define SITE(x)	case PARAVIRT_PATCH(x):	start = start_##x; end = end_##x; goto patch_site
-		SITE(irq_disable);
-		SITE(irq_enable);
 		SITE(restore_fl);
 		SITE(save_fl);
 		SITE(iret);
@@ -117,7 +132,23 @@ static unsigned native_patch(u8 type, u16 clobbers, void *insns, unsigned len)
 		SITE(flush_tlb_single);
 		SITE(wbinvd);
 #undef SITE
-
+	case PARAVIRT_PATCH(irq_disable): 
+	case PARAVIRT_PATCH(irq_enable): 
+		start = start_irq_disable;
+		end = end_irq_disable;
+
+		if (type == PARAVIRT_PATCH(irq_enable)) {
+			start = start_irq_enable;
+			end = end_irq_enable;
+		}
+
+		if (arch_is_vsmp) {
+			ret = paravirt_patch_default(type,
+		 clobbers,
+		 insns,
+		 len);
+			break;
+		}
 	patch_site:
 		ret = paravirt_patch_insns(insns, len, start, end);
 		break;
@@ -214,30 +245,6 @@ void init_IRQ(void)
 	paravirt_ops.init_IRQ();
 }
 
-static unsigned long native_save_fl(void)
-{
-	unsigned long f;
-	asm volatile("pushfq ; popq %0" : "=g" (f) : /* no input */);
-	return f;
-}
-
-static void native_restore_fl(unsigned long f)
-{
-	asm volatile("pushq %0 ; popfq" : /* no output */
-			 : "g" (f)
-			 : "memory", "cc"

Re: [PATCH 3/25][V3] irq_flags / halt routines

2007-08-15 Thread Glauber de Oliveira Costa

Jeremy Fitzhardinge wrote:

Glauber de Oliveira Costa wrote:

Thanks for the explanation, Andi. I understand it much better now, and
agree with you.

As alternatives what we have now, we can either keep the paravirt_ops
as it is now for the native case, just hooking the vsmp functions in
place of the normal one, (there are just three ops anyway), refill the
paravirt_ops entirely in somewhere like vsmp.c, or similar (or maybe
even assigning paravirt_ops.fn = vsmp_fn on the fly, but early enough). 


One thing to note is that current code assumes the IF flag is always in
bit 9, so if you paravirtualize this, you need to either a) make the
vsmp version copy AC into IF to satisfy the interface, or b) add a new
op meaning "tell me if this eflags has interrupts enabled or not".  I
went for option a), and it seems to work OK (using bit 9 for interrupt
enabled is pretty arbitrary from a Xen perspective, but not very hard
to implement, and more localized than making all eflags tests a pvop).
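[ Option a) can be sketched in a few lines of user-space C: the backend
keeps its own idea of "interrupts enabled" and save_fl/restore_fl
translate it to and from bit 9. The backend_* names and the int state
variable are invented for illustration; they are not real kernel or
hypervisor interfaces. ]

```c
#include <assert.h>

#define X86_EFLAGS_IF 0x00000200UL  /* bit 9: what generic code tests */

static int backend_irqs_on = 1;  /* stand-in for the hypervisor's state */

/* save_fl: report the private state through IF so callers can keep
 * testing bit 9 (option a). */
static unsigned long backend_save_fl(void)
{
	return backend_irqs_on ? X86_EFLAGS_IF : 0;
}

/* restore_fl: fold bit 9 back into the private state. */
static void backend_restore_fl(unsigned long flags)
{
	backend_irqs_on = !!(flags & X86_EFLAGS_IF);
}
```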

J
It is implemented like a) in the latest patch I sent, following Chris's 
suggestion.



[PATCH 25/25 -v2] add paravirtualization support for x86_64

2007-08-10 Thread Glauber de Oliveira Costa
This is, finally, the patch we were all looking for. This
patch adds a paravirt.h header with the definition of the paravirt_ops
struct. It also defines a bunch of inline functions that will
replace, or hook, the other calls. Every one of those functions
adds an entry in the parainstructions section (see vmlinux.lds.S).
Those entries can then be used to runtime-patch the paravirt_ops
functions.
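[ The runtime-patching mechanism can be pictured, very roughly, as
below. The struct layout and field names are assumptions modeled on the
i386 paravirt code of the same era, not the exact x86_64 definitions. ]

```c
#include <assert.h>

/* One entry per patchable call site, collected into the
 * .parainstructions section by the linker script.
 * Layout and names are assumed for illustration. */
struct paravirt_patch_site {
	unsigned char *instr;     /* address of the call site */
	unsigned char instrtype;  /* which paravirt_ops member it invokes */
	unsigned char len;        /* bytes available at the site */
	unsigned short clobbers;  /* registers the site may clobber */
};

/* Trivial stand-in for a backend's patch routine: claims the
 * whole site without rewriting anything. */
static unsigned stub_patch(unsigned char type, unsigned short clobbers,
			   void *insns, unsigned len)
{
	(void)type; (void)clobbers; (void)insns;
	return len;
}

/* Sketch of patching one site: the backend reports how many bytes it
 * emitted; the real code then pads the remainder with nops (elided). */
static unsigned patch_one(struct paravirt_patch_site *p,
			  unsigned (*patch)(unsigned char, unsigned short,
					    void *, unsigned))
{
	unsigned used = patch(p->instrtype, p->clobbers, p->instr, p->len);
	assert(used <= p->len);  /* backend must fit in the site */
	return used;
}
```

In the kernel proper, a loop would walk the entries from
__parainstructions to __parainstructions_end and call the backend's
patch hook on each one.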

paravirt.c contains implementations of paravirt functions that
are used natively, such as native_patch. It also fills the
paravirt_ops structure with the whole lot of functions that
were (re)defined throughout this patch set.
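[ That structure fill boils down to a table of function pointers
initialized with the native implementations. A reduced user-space
sketch follows; the member set is trimmed and partly assumed, and the
stand-in state variable is invented for illustration. ]

```c
#include <assert.h>

/* Reduced paravirt_ops: just enough members to show the shape. */
struct paravirt_ops {
	int paravirt_enabled;
	unsigned long (*save_fl)(void);
	void (*restore_fl)(unsigned long flags);
};

static unsigned long saved;  /* stand-in state for the sketch */

static unsigned long native_save_fl(void) { return saved; }
static void native_restore_fl(unsigned long f) { saved = f; }

/* Natively, the table simply points at the native_* functions;
 * a hypervisor backend would overwrite these pointers early in boot. */
static struct paravirt_ops paravirt_ops = {
	.paravirt_enabled = 0,
	.save_fl    = native_save_fl,
	.restore_fl = native_restore_fl,
};
```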

There are also changes in asm-offsets.c. paravirt.h needs it
to find out the offsets into the structure of functions
such as irq_enable, used in assembly files.

[  updates from v1
   * make PARAVIRT hidden in Kconfig (Andi Kleen)
   * cleanups in paravirt.h (Andi Kleen)
   * modifications needed to accommodate other parts of the
   patch that changed, such as getting rid of ebda_info
   * put the integers at struct paravirt_ops at the end
   (Jeremy)
]
Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/Kconfig  |   11 +++
 arch/x86_64/kernel/Makefile  |1 +
 arch/x86_64/kernel/asm-offsets.c |   14 ++
 arch/x86_64/kernel/vmlinux.lds.S |6 ++
 include/asm-x86_64/smp.h |2 +-
 5 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index ffa0364..00b2fc9 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -373,6 +373,17 @@ config NODES_SHIFT
 
 # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig.
 
+config PARAVIRT
+   bool
+   depends on EXPERIMENTAL
+   help
+ Paravirtualization is a way of running multiple instances of
+ Linux on the same machine, under a hypervisor.  This option
+ changes the kernel so it can modify itself when it is run
+ under a hypervisor, improving performance significantly.
+ However, when run without a hypervisor the kernel is
+ theoretically slower.  If in doubt, say N.
+
 config X86_64_ACPI_NUMA
bool "ACPI NUMA detection"
depends on NUMA
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index ff5d8c9..120467f 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_X86_VSMP)+= vsmp.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit.o
 
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_PCI)  += early-quirks.o
 
diff --git a/arch/x86_64/kernel/asm-offsets.c b/arch/x86_64/kernel/asm-offsets.c
index 778953b..f5eff70 100644
--- a/arch/x86_64/kernel/asm-offsets.c
+++ b/arch/x86_64/kernel/asm-offsets.c
@@ -15,6 +15,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_PARAVIRT
+#include 
+#endif
 
 #define DEFINE(sym, val) \
 asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -72,6 +75,17 @@ int main(void)
   offsetof (struct rt_sigframe32, uc.uc_mcontext));
BLANK();
 #endif
+#ifdef CONFIG_PARAVIRT
+#define ENTRY(entry) DEFINE(PARAVIRT_ ## entry, offsetof(struct paravirt_ops, entry))
+   ENTRY(paravirt_enabled);
+   ENTRY(irq_disable);
+   ENTRY(irq_enable);
+   ENTRY(syscall_return);
+   ENTRY(iret);
+   ENTRY(read_cr2);
+   ENTRY(swapgs);
+   BLANK();
+#endif
DEFINE(pbe_address, offsetof(struct pbe, address));
DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address));
DEFINE(pbe_next, offsetof(struct pbe, next));
diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S
index ba8ea97..c3fce85 100644
--- a/arch/x86_64/kernel/vmlinux.lds.S
+++ b/arch/x86_64/kernel/vmlinux.lds.S
@@ -185,6 +185,12 @@ SECTIONS
   .altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
*(.altinstr_replacement)
   }
+  . = ALIGN(8);
+  .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
+  __parainstructions = .;
+   *(.parainstructions)
+  __parainstructions_end = .;
+  }
   /* .exit.text is discard at runtime, not link time, to deal with references
  from .altinstructions and .eh_frame */
   .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
diff --git a/include/asm-x86_64/smp.h b/include/asm-x86_64/smp.h
index 6b4..403901b 100644
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -22,7 +22,7 @@ extern int disable_apic;
 #ifdef CONFIG_PARAVIRT
 #include 
 void native_flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
-   unsigned long va);
+   unsigned long va);
 #else
 #define star

[PATCH 15/25 -v2] introducing paravirt_activate_mm

2007-08-10 Thread Glauber de Oliveira Costa
This function/macro will allow a paravirt guest to be notified that we
changed the current task's cr3, and act upon it. What to do is up to them.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/mmu_context.h |   17 ++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86_64/mmu_context.h b/include/asm-x86_64/mmu_context.h
index 9592698..77ce047 100644
--- a/include/asm-x86_64/mmu_context.h
+++ b/include/asm-x86_64/mmu_context.h
@@ -7,7 +7,16 @@
 #include 
 #include 
 #include 
+
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
 #include 
+static inline void paravirt_activate_mm(struct mm_struct *prev,
+   struct mm_struct *next)
+{
+}
+#endif /* CONFIG_PARAVIRT */
 
 /*
  * possibly do the LDT unload here?
@@ -67,8 +76,10 @@ static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next,
asm volatile("movl %0,%%fs"::"r"(0));  \
 } while(0)
 
-#define activate_mm(prev, next) \
-   switch_mm((prev),(next),NULL)
-
+#define activate_mm(prev, next)\
+do {   \
+   paravirt_activate_mm(prev, next);   \
+   switch_mm((prev),(next),NULL);  \
+} while (0)
 
 #endif
-- 
1.4.4.2



[PATCH 9/25 -v2] report ring kernel is running without paravirt

2007-08-10 Thread Glauber de Oliveira Costa
When paravirtualization is disabled, the kernel is always
running at ring 0. So report it in the appropriate macro

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/segment.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86_64/segment.h b/include/asm-x86_64/segment.h
index 04b8ab2..240c1bf 100644
--- a/include/asm-x86_64/segment.h
+++ b/include/asm-x86_64/segment.h
@@ -50,4 +50,8 @@
 #define GDT_SIZE (GDT_ENTRIES * 8)
 #define TLS_SIZE (GDT_ENTRY_TLS_ENTRIES * 8) 
 
+#ifndef CONFIG_PARAVIRT
+#define get_kernel_rpl()  0
+#endif
+
 #endif
-- 
1.4.4.2



[PATCH 20/25 -v2] replace syscall_init

2007-08-10 Thread Glauber de Oliveira Costa
This patch replaces syscall_init by x86_64_syscall_init.
The former becomes a weak function, to be overridden by a
paravirt version in case paravirt is on

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/setup64.c |8 +++-
 include/asm-x86_64/proto.h   |3 +++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 49f7342..723822c 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -153,7 +153,7 @@ __attribute__((section(".bss.page_aligned")));
 extern asmlinkage void ignore_sysret(void);
 
 /* May not be marked __init: used by software suspend */
-void syscall_init(void)
+void x86_64_syscall_init(void)
 {
/* 
 * LSTAR and STAR live in a bit strange symbiosis.
@@ -172,6 +172,12 @@ void syscall_init(void)
wrmsrl(MSR_SYSCALL_MASK, EF_TF|EF_DF|EF_IE|0x3000); 
 }
 
+/* Overridden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) syscall_init(void)
+{
+   x86_64_syscall_init();
+}
+
 void __cpuinit check_efer(void)
 {
unsigned long efer;
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 31f20ad..77ed2de 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -18,6 +18,9 @@ extern void init_memory_mapping(unsigned long start, unsigned long end);
 
 extern void system_call(void); 
 extern int kernel_syscall(void);
+#ifdef CONFIG_PARAVIRT
+extern void x86_64_syscall_init(void);
+#endif
 extern void syscall_init(void);
 
 extern void ia32_syscall(void);
-- 
1.4.4.2



[PATCH 24/25 -v2] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds paravirtualization hooks in the arch initialization
process. paravirt_arch_setup() lets the guest issue any specific
initialization routine

Also, there is memory_setup(), so guests can handle it their way.

[  updates from v1
   * Don't use a separate ebda pv hook (Jeremy/Andi)
   * Make paravirt_setup_arch() void (Andi)
]

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/setup.c |   32 +++-
 include/asm-x86_64/e820.h  |6 ++
 include/asm-x86_64/page.h  |1 +
 3 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index af838f6..19e0d90 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -44,6 +44,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -65,6 +66,12 @@
 #include 
 #include 
 
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define paravirt_arch_setup()  do {} while (0)
+#endif
+
 /*
  * Machine setup..
  */
@@ -208,6 +215,16 @@ static void discover_ebda(void)
 * 4K EBDA area at 0x40E
 */
ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
+   /*
+* There can be some situations, like paravirtualized guests,
+* in which there is no available ebda information. In such
+* a case, just skip it
+*/
+   if (!ebda_addr) {
+   ebda_size = 0;
+   return;
+   }
+
ebda_addr <<= 4;
 
ebda_size = *(unsigned short *)__va(ebda_addr);
@@ -221,6 +238,13 @@ static void discover_ebda(void)
ebda_size = 64*1024;
 }
 
+/* Overridden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) memory_setup(void)
+{
+   return setup_memory_region();
+}
+
+
 void __init setup_arch(char **cmdline_p)
 {
printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -231,12 +255,18 @@ void __init setup_arch(char **cmdline_p)
saved_video_mode = SAVED_VIDEO_MODE;
bootloader_type = LOADER_TYPE;
 
+   /*
+* By returning non-zero here, a paravirt impl can choose to
+* skip the rest of the setup process
+*/
+   paravirt_arch_setup();
+
 #ifdef CONFIG_BLK_DEV_RAM
rd_image_start = RAMDISK_FLAGS & RAMDISK_IMAGE_START_MASK;
rd_prompt = ((RAMDISK_FLAGS & RAMDISK_PROMPT_FLAG) != 0);
rd_doload = ((RAMDISK_FLAGS & RAMDISK_LOAD_FLAG) != 0);
 #endif
-   setup_memory_region();
+   memory_setup();
copy_edd();
 
if (!MOUNT_ROOT_RDONLY)
diff --git a/include/asm-x86_64/e820.h b/include/asm-x86_64/e820.h
index 3486e70..2ced3ba 100644
--- a/include/asm-x86_64/e820.h
+++ b/include/asm-x86_64/e820.h
@@ -20,7 +20,12 @@
 #define E820_ACPI  3
 #define E820_NVS   4
 
+#define MAP_TYPE_STR   "BIOS-e820"
+
 #ifndef __ASSEMBLY__
+
+void native_ebda_info(unsigned *addr, unsigned *size);
+
 struct e820entry {
u64 addr;   /* start of memory segment */
u64 size;   /* size of memory segment */
@@ -56,6 +61,7 @@ extern struct e820map e820;
 
 extern unsigned ebda_addr, ebda_size;
 extern unsigned long nodemap_addr, nodemap_size;
+
 #endif/*!__ASSEMBLY__*/
 
 #endif/*__E820_HEADER*/
diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
index ec8b245..8c40fb2 100644
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -149,6 +149,7 @@ extern unsigned long __phys_addr(unsigned long);
 #define __boot_pa(x)   __pa(x)
 #ifdef CONFIG_FLATMEM
 #define pfn_valid(pfn) ((pfn) < end_pfn)
+
 #endif
 
 #define virt_to_page(kaddr)pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
-- 
1.4.4.2



[PATCH 17/25 -v2] introduce paravirt_release_pgd()

2007-08-10 Thread Glauber de Oliveira Costa
This patch introduces a new macro/function that informs a paravirt
guest when its page table is no longer in use and can be released.
In case we're not paravirt, just do nothing.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/pgalloc.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86_64/pgalloc.h b/include/asm-x86_64/pgalloc.h
index b467be6..dbe1267 100644
--- a/include/asm-x86_64/pgalloc.h
+++ b/include/asm-x86_64/pgalloc.h
@@ -9,6 +9,12 @@
 #define QUICK_PGD 0/* We preserve special mappings over free */
 #define QUICK_PT 1 /* Other page table pages that are zero on free */
 
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define paravirt_release_pgd(pgd) do { } while (0)
+#endif
+
 #define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
 #define pud_populate(mm, pud, pmd) \
@@ -100,6 +106,7 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 static inline void pgd_free(pgd_t *pgd)
 {
BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
+   paravirt_release_pgd(pgd);
quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 }
 
-- 
1.4.4.2



[PATCH 22/25 -v2] turn privileged operation into a macro

2007-08-10 Thread Glauber de Oliveira Costa
Under paravirt, a cr2 read cannot be issued directly anymore.
So wrap it in a macro, defined to the operation itself in case
paravirt is off, but to something else if we have paravirt
in the game

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/head.S |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/head.S b/arch/x86_64/kernel/head.S
index e89abcd..1bb6c55 100644
--- a/arch/x86_64/kernel/head.S
+++ b/arch/x86_64/kernel/head.S
@@ -18,6 +18,12 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_PARAVIRT
+#include 
+#include 
+#else
+#define GET_CR2_INTO_RCX mov %cr2, %rcx
+#endif
 
 /* we are not able to switch in one step to the final KERNEL ADRESS SPACE
  * because we need identity-mapped pages.
@@ -267,7 +273,9 @@ ENTRY(early_idt_handler)
xorl %eax,%eax
movq 8(%rsp),%rsi   # get rip
movq (%rsp),%rdx
-   movq %cr2,%rcx
+	/* When PARAVIRT is on, this operation may clobber rax. It is
+	 * something safe to do, because we've just zeroed rax. */
+   GET_CR2_INTO_RCX
leaq early_idt_msg(%rip),%rdi
call early_printk
cmpl $2,early_recursion_flag(%rip)
-- 
1.4.4.2



[PATCH 23/25 -v2] provide paravirt patching function

2007-08-10 Thread Glauber de Oliveira Costa
This patch introduces apply_paravirt(), a function that shall
be called by i386/alternative.c to apply replacements to
paravirt functions. It is defined to a do-nothing function
if paravirt is not enabled.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/alternative.h |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86_64/alternative.h b/include/asm-x86_64/alternative.h
index ab161e8..e69a141 100644
--- a/include/asm-x86_64/alternative.h
+++ b/include/asm-x86_64/alternative.h
@@ -143,12 +143,14 @@ static inline void alternatives_smp_switch(int smp) {}
  */
 #define ASM_OUTPUT2(a, b) a, b
 
-struct paravirt_patch;
+struct paravirt_patch_site;
 #ifdef CONFIG_PARAVIRT
-void apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end);
+void apply_paravirt(struct paravirt_patch_site *start,
+   struct paravirt_patch_site *end);
 #else
 static inline void
-apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
+apply_paravirt(struct paravirt_patch_site *start,
+   struct paravirt_patch_site *end)
 {}
 #define __parainstructions NULL
 #define __parainstructions_end NULL
-- 
1.4.4.2



[PATCH 16/25 -v2] turn page operations into native versions

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the page operations (set and make a page table)
into native_ versions. The operations itself will be later
overriden by paravirt.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/page.h |   36 +++-
 1 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
index 88adf1a..ec8b245 100644
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -64,16 +64,42 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 
 extern unsigned long phys_base;
 
-#define pte_val(x) ((x).pte)
-#define pmd_val(x) ((x).pmd)
-#define pud_val(x) ((x).pud)
-#define pgd_val(x) ((x).pgd)
-#define pgprot_val(x)  ((x).pgprot)
+static inline unsigned long native_pte_val(pte_t pte)
+{
+   return pte.pte;
+}
+
+static inline unsigned long native_pud_val(pud_t pud)
+{
+   return pud.pud;
+}
+
+
+static inline unsigned long native_pmd_val(pmd_t pmd)
+{
+   return pmd.pmd;
+}
+
+static inline unsigned long native_pgd_val(pgd_t pgd)
+{
+   return pgd.pgd;
+}
+
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define pte_val(x) native_pte_val(x)
+#define pmd_val(x) native_pmd_val(x)
+#define pud_val(x) native_pud_val(x)
+#define pgd_val(x) native_pgd_val(x)
 
 #define __pte(x) ((pte_t) { (x) } )
 #define __pmd(x) ((pmd_t) { (x) } )
 #define __pud(x) ((pud_t) { (x) } )
 #define __pgd(x) ((pgd_t) { (x) } )
+#endif /* CONFIG_PARAVIRT */
+
+#define pgprot_val(x)  ((x).pgprot)
 #define __pgprot(x)((pgprot_t) { (x) } )
 
 #endif /* !__ASSEMBLY__ */
-- 
1.4.4.2



[PATCH 7/25 -v2] interrupt related native paravirt functions.

2007-08-10 Thread Glauber de Oliveira Costa
The interrupt initialization routine becomes native_init_IRQ and will
be overridden later in case paravirt is on.

[  updates from v1
   * After a talk with Jeremy Fitzhardinge, it turned out that making the
   interrupt vector global was not a good idea. So it is removed in this
   patch
]
Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/i8259.c |5 -
 include/asm-x86_64/irq.h   |2 ++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index 948cae6..048e3cb 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -484,7 +484,10 @@ static int __init init_timer_sysfs(void)
 
 device_initcall(init_timer_sysfs);
 
-void __init init_IRQ(void)
+/* Overridden in paravirt.c */
+void init_IRQ(void) __attribute__((weak, alias("native_init_IRQ")));
+
+void __init native_init_IRQ(void)
 {
int i;
 
diff --git a/include/asm-x86_64/irq.h b/include/asm-x86_64/irq.h
index 5006c6e..be55299 100644
--- a/include/asm-x86_64/irq.h
+++ b/include/asm-x86_64/irq.h
@@ -46,6 +46,8 @@ static __inline__ int irq_canonicalize(int irq)
 extern void fixup_irqs(cpumask_t map);
 #endif
 
+void native_init_IRQ(void);
+
 #define __ARCH_HAS_DO_SOFTIRQ 1
 
 #endif /* _ASM_IRQ_H */
-- 
1.4.4.2



[PATCH 14/25 -v2] get rid of inline asm for load_cr3

2007-08-10 Thread Glauber de Oliveira Costa
Besides being inelegant, it is now even forbidden, since it can
break paravirtualized guests. load_cr3 should call write_cr3()
instead.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/mmu_context.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-x86_64/mmu_context.h b/include/asm-x86_64/mmu_context.h
index c8cdc1e..9592698 100644
--- a/include/asm-x86_64/mmu_context.h
+++ b/include/asm-x86_64/mmu_context.h
@@ -25,7 +25,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, 
struct task_struct *tsk)
 
 static inline void load_cr3(pgd_t *pgd)
 {
-   asm volatile("movq %0,%%cr3" :: "r" (__pa(pgd)) : "memory");
+   write_cr3(__pa(pgd));
 }
 
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, 
-- 
1.4.4.2



[PATCH 21/25 -v2] export cpu_gdt_descr

2007-08-10 Thread Glauber de Oliveira Costa
With paravirtualization, hypervisors need to handle the gdt,
which was up to this point only used in very early
initialization code. Hypervisors are commonly modules, so make
it an export

[  updates from v1
   * make it an EXPORT_SYMBOL_GPL.
   Suggested by Arjan van de Ven
]

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/x8664_ksyms.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/x8664_ksyms.c b/arch/x86_64/kernel/x8664_ksyms.c
index 77c25b3..2d3932d 100644
--- a/arch/x86_64/kernel/x8664_ksyms.c
+++ b/arch/x86_64/kernel/x8664_ksyms.c
@@ -60,3 +60,9 @@ EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
 
 EXPORT_SYMBOL(_proxy_pda);
+
+#ifdef CONFIG_PARAVIRT
+extern unsigned long *cpu_gdt_descr;
+/* Virtualized guests may want to use it */
+EXPORT_SYMBOL_GPL(cpu_gdt_descr);
+#endif
-- 
1.4.4.2



[PATCH 19/25 -v2] time-related functions paravirt provisions

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds provisions for time-related functions so they
can later be replaced by paravirt versions.

It basically encloses {g,s}et_wallclock inside the
already existing functions update_persistent_clock and
read_persistent_clock, and defines {s,g}et_wallclock
as the core of those functions.

The timer interrupt setup also has to be replaced.
The job is done by time_init_hook().

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/time.c |   37 +
 include/asm-x86_64/time.h |   18 ++
 2 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 6d48a4e..29fcd91 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include <asm/time.h>
 #include 
 #include 
 #include 
@@ -82,18 +83,12 @@ EXPORT_SYMBOL(profile_pc);
  * sheet for details.
  */
 
-static int set_rtc_mmss(unsigned long nowtime)
+int do_set_rtc_mmss(unsigned long nowtime)
 {
int retval = 0;
int real_seconds, real_minutes, cmos_minutes;
unsigned char control, freq_select;
 
-/*
- * IRQs are disabled when we're called from the timer interrupt,
- * no need for spin_lock_irqsave()
- */
-
-   spin_lock(&rtc_lock);
 
 /*
  * Tell the clock it's being set and stop it.
@@ -143,14 +138,22 @@ static int set_rtc_mmss(unsigned long nowtime)
CMOS_WRITE(control, RTC_CONTROL);
CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-   spin_unlock(&rtc_lock);
-
return retval;
 }
 
 int update_persistent_clock(struct timespec now)
 {
-   return set_rtc_mmss(now.tv_sec);
+   int retval;
+
+/*
+ * IRQs are disabled when we're called from the timer interrupt,
+ * no need for spin_lock_irqsave()
+ */
+   spin_lock(&rtc_lock);
+   retval = set_wallclock(now.tv_sec);
+   spin_unlock(&rtc_lock);
+
+   return retval;
 }
 
 void main_timer_handler(void)
@@ -195,7 +198,7 @@ static irqreturn_t timer_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-unsigned long read_persistent_clock(void)
+unsigned long do_get_cmos_time(void)
 {
unsigned int year, mon, day, hour, min, sec;
unsigned long flags;
@@ -246,6 +249,11 @@ unsigned long read_persistent_clock(void)
return mktime(year, mon, day, hour, min, sec);
 }
 
+unsigned long read_persistent_clock(void)
+{
+   return get_wallclock();
+}
+
 /* calibrate_cpu is used on systems with fixed rate TSCs to determine
  * processor frequency */
 #define TICK_COUNT 1
@@ -365,6 +373,11 @@ static struct irqaction irq0 = {
.name   = "timer"
 };
 
+inline void time_init_hook()
+{
+   setup_irq(0, &irq0);
+}
+
 void __init time_init(void)
 {
if (nohpet)
@@ -403,7 +416,7 @@ void __init time_init(void)
cpu_khz / 1000, cpu_khz % 1000);
init_tsc_clocksource();
 
-   setup_irq(0, &irq0);
+   do_time_init();
 }
 
 /*
diff --git a/include/asm-x86_64/time.h b/include/asm-x86_64/time.h
new file mode 100644
index 000..9a72355
--- /dev/null
+++ b/include/asm-x86_64/time.h
@@ -0,0 +1,18 @@
+#ifndef _ASM_X86_64_TIME_H
+#define _ASM_X86_64_TIME_H
+
+inline void time_init_hook(void);
+unsigned long do_get_cmos_time(void);
+int do_set_rtc_mmss(unsigned long nowtime);
+
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else /* !CONFIG_PARAVIRT */
+
+#define get_wallclock() do_get_cmos_time()
+#define set_wallclock(x) do_set_rtc_mmss(x)
+#define do_time_init() time_init_hook()
+
+#endif /* CONFIG_PARAVIRT */
+
+#endif
-- 
1.4.4.2



[PATCH 13/25 -v2] add native functions for descriptors handling

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the basic descriptor handling into native_
functions. It is basically write_idt, load_idt, write_gdt,
load_gdt, set_ldt, store_tr, load_tls, and the ones
for updating a single entry.

In the process of doing that, we change the definition of
load_LDT_nolock, and caller sites have to be patched. We
also patch call sites that now need a typecast.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/head64.c  |2 +-
 arch/x86_64/kernel/ldt.c |6 +-
 arch/x86_64/kernel/reboot.c  |3 +-
 arch/x86_64/kernel/setup64.c |4 +-
 arch/x86_64/kernel/suspend.c |   11 ++-
 include/asm-x86_64/desc.h|  183 +++--
 include/asm-x86_64/mmu_context.h |4 +-
 7 files changed, 148 insertions(+), 65 deletions(-)

diff --git a/arch/x86_64/kernel/head64.c b/arch/x86_64/kernel/head64.c
index 6c34bdd..a0d05d7 100644
--- a/arch/x86_64/kernel/head64.c
+++ b/arch/x86_64/kernel/head64.c
@@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
for (i = 0; i < IDT_ENTRIES; i++)
set_intr_gate(i, early_idt_handler);
-   asm volatile("lidt %0" :: "m" (idt_descr));
+   load_idt(&idt_descr);
 
early_printk("Kernel alive\n");
 
diff --git a/arch/x86_64/kernel/ldt.c b/arch/x86_64/kernel/ldt.c
index bc9ffd5..8e6fcc1 100644
--- a/arch/x86_64/kernel/ldt.c
+++ b/arch/x86_64/kernel/ldt.c
@@ -173,7 +173,7 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
 {
struct task_struct *me = current;
struct mm_struct * mm = me->mm;
-   __u32 entry_1, entry_2, *lp;
+   __u32 entry_1, entry_2;
int error;
struct user_desc ldt_info;
 
@@ -202,7 +202,6 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
goto out_unlock;
}
 
-   lp = (__u32 *) ((ldt_info.entry_number << 3) + (char *) 
mm->context.ldt);
 
/* Allow LDTs to be cleared by the user. */
if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
@@ -220,8 +219,7 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
 
/* Install the new entry ...  */
 install:
-   *lp = entry_1;
-   *(lp+1) = entry_2;
+   write_ldt_entry(mm->context.ldt, ldt_info.entry_number, entry_1, 
entry_2);
error = 0;
 
 out_unlock:
diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
index 368db2b..ebc242c 100644
--- a/arch/x86_64/kernel/reboot.c
+++ b/arch/x86_64/kernel/reboot.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include <asm/desc.h>
 #include 
 #include 
 #include 
@@ -136,7 +137,7 @@ void machine_emergency_restart(void)
}
 
case BOOT_TRIPLE: 
-   __asm__ __volatile__("lidt (%0)": :"r" (_idt));
+   load_idt((struct desc_ptr *)&no_idt);
__asm__ __volatile__("int3");
 
reboot_type = BOOT_KBD;
diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 395cf02..49f7342 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -224,8 +224,8 @@ void __cpuinit cpu_init (void)
memcpy(cpu_gdt(cpu), cpu_gdt_table, GDT_SIZE);
 
cpu_gdt_descr[cpu].size = GDT_SIZE;
-   asm volatile("lgdt %0" :: "m" (cpu_gdt_descr[cpu]));
-   asm volatile("lidt %0" :: "m" (idt_descr));
+   load_gdt(&cpu_gdt_descr[cpu]);
+   load_idt(&idt_descr);
 
memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
syscall_init();
diff --git a/arch/x86_64/kernel/suspend.c b/arch/x86_64/kernel/suspend.c
index 573c0a6..24055b6 100644
--- a/arch/x86_64/kernel/suspend.c
+++ b/arch/x86_64/kernel/suspend.c
@@ -32,9 +32,9 @@ void __save_processor_state(struct saved_context *ctxt)
/*
 * descriptor tables
 */
-   asm volatile ("sgdt %0" : "=m" (ctxt->gdt_limit));
-   asm volatile ("sidt %0" : "=m" (ctxt->idt_limit));
-   asm volatile ("str %0"  : "=m" (ctxt->tr));
+   store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
+   store_idt((struct desc_ptr *)&ctxt->idt_limit);
+   store_tr(ctxt->tr);
 
/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
/*
@@ -91,8 +91,9 @@ void __restore_processor_state(struct saved_context *ctxt)
 * now restore the descriptor tables to their proper values
 * ltr is done i fix_processor_context().
 */
-   asm volatile ("lgdt %0" :: "m" (ctxt->gdt_limit));
-   asm volatile ("lidt %0" :: "m" (ctxt->idt_limit));
+   load_gdt((struct desc_ptr *)&

[PATCH 12/25 -v2] turn msr.h functions into native versions

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the basic operations in msr.h into native
ones. Those operations are: rdmsr, wrmsr, rdtsc, rdtscp, rdpmc, and
cpuid. After they are turned into functions, some call sites need
casts, and so we provide them.

There is also a fixup needed in the functions located in the vsyscall
area, as they cannot call any of them anymore (otherwise, the call
would go through a kernel address, invalid in userspace mapping).

The solution is to call the now-provided native_ versions instead.

[  updates from v1
   * Call read_tscp rdtscp, to match instruction name
   * Avoid duplication of code in get_cycles_sync
   * Get rid of rdtsc(), since it is used nowhere else
   All three suggested by Andi Kleen
]

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/ia32/syscall32.c  |2 +-
 arch/x86_64/kernel/setup64.c  |6 +-
 arch/x86_64/kernel/tsc.c  |   24 -
 arch/x86_64/kernel/vsyscall.c |4 +-
 arch/x86_64/vdso/vgetcpu.c|4 +-
 include/asm-i386/tsc.h|   12 ++-
 include/asm-x86_64/msr.h  |  277 +
 7 files changed, 211 insertions(+), 118 deletions(-)

diff --git a/arch/x86_64/ia32/syscall32.c b/arch/x86_64/ia32/syscall32.c
index 15013ba..dd1b4a3 100644
--- a/arch/x86_64/ia32/syscall32.c
+++ b/arch/x86_64/ia32/syscall32.c
@@ -79,5 +79,5 @@ void syscall32_cpu_init(void)
checking_wrmsrl(MSR_IA32_SYSENTER_ESP, 0ULL);
checking_wrmsrl(MSR_IA32_SYSENTER_EIP, (u64)ia32_sysenter_target);
 
-   wrmsrl(MSR_CSTAR, ia32_cstar_target);
+   wrmsrl(MSR_CSTAR, (u64)ia32_cstar_target);
 }
diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 1200aaa..395cf02 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -122,7 +122,7 @@ void pda_init(int cpu)
asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0)); 
/* Memory clobbers used to order PDA accessed */
mb();
-   wrmsrl(MSR_GS_BASE, pda);
+   wrmsrl(MSR_GS_BASE, (u64)pda);
mb();
 
pda->cpunumber = cpu; 
@@ -161,8 +161,8 @@ void syscall_init(void)
 * but only a 32bit target. LSTAR sets the 64bit rip.
 */ 
wrmsrl(MSR_STAR,  ((u64)__USER32_CS)<<48  | ((u64)__KERNEL_CS)<<32); 
-   wrmsrl(MSR_LSTAR, system_call); 
-   wrmsrl(MSR_CSTAR, ignore_sysret);
+   wrmsrl(MSR_LSTAR, (u64)system_call);
+   wrmsrl(MSR_CSTAR, (u64)ignore_sysret);
 
 #ifdef CONFIG_IA32_EMULATION   
syscall32_cpu_init ();
diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c
index 2a59bde..2a5fbc9 100644
--- a/arch/x86_64/kernel/tsc.c
+++ b/arch/x86_64/kernel/tsc.c
@@ -9,6 +9,28 @@
 
 #include 
 
+#ifdef CONFIG_PARAVIRT
+/*
+ * When paravirt is on, some functionalities are executed through function
+ * pointers in the paravirt_ops structure, for both the host and guest.
+ * These function pointers exist inside the kernel and can not
+ * be accessed by user space. To avoid this, we make a copy of the
+ * get_cycles_sync (called in kernel) but force the use of native_read_tsc.
+ * For the host, it will simply do the native rdtsc. The guest
+ * should set up its own clock and vread
+ */
+static __always_inline long long vget_cycles_sync(void)
+{
+   unsigned long long ret;
+   ret = __get_cycles_sync();
+   if (!ret)
+   ret = native_read_tsc();
+   return ret;
+}
+#else
+# define vget_cycles_sync() get_cycles_sync()
+#endif
+
 static int notsc __initdata = 0;
 
 unsigned int cpu_khz;  /* TSC clocks / usec, not used here */
@@ -165,7 +187,7 @@ static cycle_t read_tsc(void)
 
 static cycle_t __vsyscall_fn vread_tsc(void)
 {
-   cycle_t ret = (cycle_t)get_cycles_sync();
+   cycle_t ret = (cycle_t)vget_cycles_sync();
return ret;
 }
 
diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index 06c3494..757874e 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -184,7 +184,7 @@ time_t __vsyscall(1) vtime(time_t *t)
 long __vsyscall(2)
 vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
 {
-   unsigned int dummy, p;
+   unsigned int p;
unsigned long j = 0;
 
/* Fast cache - only recompute value once per jiffies and avoid
@@ -199,7 +199,7 @@ vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache 
*tcache)
p = tcache->blob[1];
} else if (__vgetcpu_mode == VGETCPU_RDTSCP) {
/* Load per CPU data from RDTSCP */
-   rdtscp(dummy, dummy, p);
+   native_rdtscp();
} else {
/* Load per CPU data from GDT */
asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
diff --git a/arch/x86_64/vdso/vgetcpu.c b/arch/x86_64/vdso/vgetcpu.c
index 91f6e85..1f38f61 1

[PATCH 18/25 -v2] turn priviled operations into macros in entry.S

2007-08-10 Thread Glauber de Oliveira Costa
With paravirt on, we cannot issue operations like swapgs, sysretq,
iretq, cli, sti. So they have to be changed into macros, that will
be later properly replaced for the paravirt case.

The sysretq is a little bit more complicated, and is replaced
by a sequence of three instructions. This is basically because if
we had already issued a swapgs, we would be on a user stack
at this point. So we do it all-in-one.

The clobber list follows the idea of the i386 version closely,
and represents which caller-saved registers are safe to modify
at the point the function is called. So for example, CLBR_ANY
says we can clobber rax, rdi, rsi, rdx, rcx, r8-r11, while
CLBR_NONE says we cannot touch anything.

[  updates from v1
   * renamed SYSRETQ to SYSCALL_RETURN
   * don't use ENTRY/ENDPROC for native_{syscall_return,iret}
   * fix one use of the clobber list
   * rename SWAPGS_NOSTACK to SWAPGS_UNSAFE_STACK
   * change the unexpressive 1b label to do_iret
   All suggested by Andi Kleen
]

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/entry.S |  130 +---
 1 files changed, 87 insertions(+), 43 deletions(-)

diff --git a/arch/x86_64/kernel/entry.S b/arch/x86_64/kernel/entry.S
index 1d232e5..db8707a 100644
--- a/arch/x86_64/kernel/entry.S
+++ b/arch/x86_64/kernel/entry.S
@@ -51,8 +51,31 @@
 #include 
 #include 
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define ENABLE_INTERRUPTS(x)   sti
+#define DISABLE_INTERRUPTS(x)  cli
+#define INTERRUPT_RETURN   iretq
+#define SWAPGS swapgs
+#define SYSCALL_RETURN \
+   movq%gs:pda_oldrsp,%rsp;\
+   swapgs; \
+   sysretq;
+#endif
+
.code64
 
+/* Currently paravirt can't handle swapgs nicely when we
+ * don't have a stack we can rely on (such as a user space
+ * stack).  So we either find a way around these or just fault
+ * and emulate if a guest tries to call swapgs directly.
+ *
+ * Either way, this is a good way to document that we don't
+ * have a reliable stack.
+ */
+#define SWAPGS_UNSAFE_STACKswapgs
+
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif 
@@ -216,14 +239,23 @@ ENTRY(system_call)
CFI_DEF_CFA rsp,PDA_STACKOFFSET
CFI_REGISTERrip,rcx
/*CFI_REGISTER  rflags,r11*/
-   swapgs
+   SWAPGS_UNSAFE_STACK
+#ifdef CONFIG_PARAVIRT
+   /*
+* A hypervisor implementation might want to use a label
+* after the swapgs, so that it can do the swapgs
+* for the guest and jump here on syscall.
+*/
+   .globl system_call_after_swapgs
+system_call_after_swapgs:
+#endif
movq%rsp,%gs:pda_oldrsp 
movq%gs:pda_kernelstack,%rsp
/*
 * No need to follow this irqs off/on section - it's straight
 * and short:
 */
-   sti 
+   ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_ARGS 8,1
movq  %rax,ORIG_RAX-ARGOFFSET(%rsp) 
movq  %rcx,RIP-ARGOFFSET(%rsp)
@@ -245,7 +277,7 @@ ret_from_sys_call:
/* edi: flagmask */
 sysret_check:  
GET_THREAD_INFO(%rcx)
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
movl threadinfo_flags(%rcx),%edx
andl %edi,%edx
@@ -259,9 +291,7 @@ sysret_check:
CFI_REGISTERrip,rcx
RESTORE_ARGS 0,-ARG_SKIP,1
/*CFI_REGISTER  rflags,r11*/
-   movq%gs:pda_oldrsp,%rsp
-   swapgs
-   sysretq
+   SYSCALL_RETURN
 
CFI_RESTORE_STATE
/* Handle reschedules */
@@ -270,7 +300,7 @@ sysret_careful:
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
@@ -281,7 +311,7 @@ sysret_careful:
/* Handle a signal */ 
 sysret_signal:
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
testl $(_TIF_SIGPENDING|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz1f
 
@@ -294,7 +324,7 @@ sysret_signal:
 1: movl $_TIF_NEED_RESCHED,%edi
/* Use IRET because user could have changed frame. This
   works because ptregscall_common has called FIXUP_TOP_OF_STACK. */
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check

@@ -326,7 +356,7 @@ tracesys:
  */
.globl int_ret_from_sys_call
 int_ret_from_sys_call:
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
testl $3,CS-ARGOFFSET(%rsp)
je retint_restore_args
@@ -347,20 +377,20 @@ int_careful:
bt $TIF_NEED_RESCHED,%edx
jnc  int_very_careful
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
  

[PATCH 11/25 -v2] native versions for set pagetables

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the set_p{te,md,ud,gd} functions into their
native_ versions. There is no need to patch any caller.

Also, it adds pte_update() and pte_update_defer() calls whenever
we modify a page table entry. This last part was coded to match
i386 as closely as possible.

Pieces of the header are moved to below the #ifdef CONFIG_PARAVIRT
site, as they are users of the newly defined set_* macros.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/pgtable.h |  152 -
 1 files changed, 89 insertions(+), 63 deletions(-)

diff --git a/include/asm-x86_64/pgtable.h b/include/asm-x86_64/pgtable.h
index c9d8764..dd572a2 100644
--- a/include/asm-x86_64/pgtable.h
+++ b/include/asm-x86_64/pgtable.h
@@ -57,55 +57,77 @@ extern unsigned long 
empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];
  */
 #define PTRS_PER_PTE   512
 
-#ifndef __ASSEMBLY__
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
 
-#define pte_ERROR(e) \
-   printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pte_val(e))
-#define pmd_ERROR(e) \
-   printk("%s:%d: bad pmd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pmd_val(e))
-#define pud_ERROR(e) \
-   printk("%s:%d: bad pud %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pud_val(e))
-#define pgd_ERROR(e) \
-   printk("%s:%d: bad pgd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pgd_val(e))
+#define set_pte native_set_pte
+#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+#define set_pmd native_set_pmd
+#define set_pud native_set_pud
+#define set_pgd native_set_pgd
+#define pte_clear(mm,addr,xp)  do { set_pte_at(mm, addr, xp, __pte(0)); } 
while (0)
+#define pmd_clear(xp)  do { set_pmd(xp, __pmd(0)); } while (0)
+#define pud_clear native_pud_clear
+#define pgd_clear native_pgd_clear
+#define pte_update(mm, addr, ptep)  do { } while (0)
+#define pte_update_defer(mm, addr, ptep)do { } while (0)
 
-#define pgd_none(x)(!pgd_val(x))
-#define pud_none(x)(!pud_val(x))
+#endif
+
+#ifndef __ASSEMBLY__
 
-static inline void set_pte(pte_t *dst, pte_t val)
+static inline void native_set_pte(pte_t *dst, pte_t val)
 {
-   pte_val(*dst) = pte_val(val);
+   dst->pte = pte_val(val);
 } 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
 
-static inline void set_pmd(pmd_t *dst, pmd_t val)
+
+static inline void native_set_pmd(pmd_t *dst, pmd_t val)
 {
-pmd_val(*dst) = pmd_val(val); 
+   dst->pmd = pmd_val(val);
 } 
 
-static inline void set_pud(pud_t *dst, pud_t val)
+static inline void native_set_pud(pud_t *dst, pud_t val)
 {
-   pud_val(*dst) = pud_val(val);
+   dst->pud = pud_val(val);
 }
 
-static inline void pud_clear (pud_t *pud)
+static inline void native_set_pgd(pgd_t *dst, pgd_t val)
 {
-   set_pud(pud, __pud(0));
+   dst->pgd = pgd_val(val);
 }
-
-static inline void set_pgd(pgd_t *dst, pgd_t val)
+static inline void native_pud_clear (pud_t *pud)
 {
-   pgd_val(*dst) = pgd_val(val); 
-} 
+   set_pud(pud, __pud(0));
+}
 
-static inline void pgd_clear (pgd_t * pgd)
+static inline void native_pgd_clear (pgd_t * pgd)
 {
set_pgd(pgd, __pgd(0));
 }
 
-#define ptep_get_and_clear(mm,addr,xp) __pte(xchg(&(xp)->pte, 0))
+#define pte_ERROR(e) \
+   printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pte_val(e))
+#define pmd_ERROR(e) \
+   printk("%s:%d: bad pmd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pmd_val(e))
+#define pud_ERROR(e) \
+   printk("%s:%d: bad pud %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pud_val(e))
+#define pgd_ERROR(e) \
+   printk("%s:%d: bad pgd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pgd_val(e))
+
+#define pgd_none(x)(!pgd_val(x))
+#define pud_none(x)(!pud_val(x))
 
 struct mm_struct;
 
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep)
+{
+   pte_t pte = __pte(xchg(&ptep->pte, 0));
+   pte_update(mm, addr, ptep);
+   return pte;
+}
+
 static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned 
long addr, pte_t *ptep, int full)
 {
pte_t pte;
@@ -245,7 +267,6 @@ static inline unsigned long pmd_bad(pmd_t pmd)
 
 #define pte_none(x)(!pte_val(x))
 #define pte_present(x) (pte_val(x) & (_PAGE_PRESENT | _PAGE_PROTNONE))
-#define pte_clear(mm,addr,xp)  do { set_pte_at(mm, addr, xp, __pte(0)); } 
while (0)
 
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))/* FIXME: is this
   right? */
@@ -254,11 +275,11 @@ static inline unsigned long pmd_bad(pmd_t pmd)
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
-   pte_t pte;
-   pte_val(pte) = (page_nr << PAGE_SHIFT);
-   pte_val(pte) |= pgprot_val(pgpr

[PATCH 10/25 -v2] export math_state_restore

2007-08-10 Thread Glauber de Oliveira Costa
Export the math_state_restore symbol, so it can be used by
hypervisors, which are commonly loaded as modules.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/traps.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/traps.c b/arch/x86_64/kernel/traps.c
index 0388842..aacbe12 100644
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -1081,6 +1081,7 @@ asmlinkage void math_state_restore(void)
task_thread_info(me)->status |= TS_USEDFPU;
me->fpu_counter++;
 }
+EXPORT_SYMBOL_GPL(math_state_restore);
 
 void __init trap_init(void)
 {
-- 
1.4.4.2



[PATCH 2/25 -v2] tlb flushing routines

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the flush_tlb routines into native versions.
When paravirt is not defined, the native versions are defined as
the ones actually used. flush_tlb_others() goes in smp.c, unless
SMP is not compiled in.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/smp.c  |   10 +-
 include/asm-x86_64/smp.h  |8 
 include/asm-x86_64/tlbflush.h |   22 ++
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86_64/kernel/smp.c b/arch/x86_64/kernel/smp.c
index 673a300..39f5f6b 100644
--- a/arch/x86_64/kernel/smp.c
+++ b/arch/x86_64/kernel/smp.c
@@ -165,7 +165,7 @@ out:
cpu_clear(cpu, f->flush_cpumask);
 }
 
-static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
+void native_flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
unsigned long va)
 {
int sender;
@@ -198,6 +198,14 @@ static void flush_tlb_others(cpumask_t cpumask, struct 
mm_struct *mm,
	spin_unlock(&f->tlbstate_lock);
 }
 
+/* Overridden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) flush_tlb_others(cpumask_t cpumask,
+   struct mm_struct *mm,
+   unsigned long va)
+{
+   native_flush_tlb_others(cpumask, mm, va);
+}
+
 int __cpuinit init_smp_flush(void)
 {
int i;
diff --git a/include/asm-x86_64/smp.h b/include/asm-x86_64/smp.h
index 3f303d2..6b4 100644
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -19,6 +19,14 @@ extern int disable_apic;
 
 #include 
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+void native_flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
+   unsigned long va);
+#else
+#define startup_ipi_hook(apicid, rip, rsp) do { } while (0)
+#endif
+
 struct pt_regs;
 
 extern cpumask_t cpu_present_mask;
diff --git a/include/asm-x86_64/tlbflush.h b/include/asm-x86_64/tlbflush.h
index 888eb4a..1c68cc8 100644
--- a/include/asm-x86_64/tlbflush.h
+++ b/include/asm-x86_64/tlbflush.h
@@ -6,21 +6,30 @@
 #include 
 #include 
 
-static inline void __flush_tlb(void)
+static inline void native_flush_tlb(void)
 {
write_cr3(read_cr3());
 }
 
-static inline void __flush_tlb_all(void)
+static inline void native_flush_tlb_all(void)
 {
unsigned long cr4 = read_cr4();
write_cr4(cr4 & ~X86_CR4_PGE);  /* clear PGE */
write_cr4(cr4); /* write old PGE again and flush TLBs */
 }
 
-#define __flush_tlb_one(addr) \
-   __asm__ __volatile__("invlpg (%0)" :: "r" (addr) : "memory")
+static inline void native_flush_tlb_one(unsigned long addr)
+{
+   asm volatile ("invlpg (%0)" :: "r" (addr) : "memory");
+}
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define __flush_tlb()  native_flush_tlb()
+#define __flush_tlb_all()  native_flush_tlb_all()
+#define __flush_tlb_one(addr)  native_flush_tlb_one(addr)
+#endif /* CONFIG_PARAVIRT */
 
 /*
  * TLB flushing:
@@ -64,6 +73,11 @@ static inline void flush_tlb_range(struct vm_area_struct 
*vma,
__flush_tlb();
 }
 
+static inline void native_flush_tlb_others(cpumask_t *cpumask,
+  struct mm_struct *mm, unsigned long 
va)
+{
+}
+
 #else
 
 #include 
-- 
1.4.4.2



[PATCH 3/25 -v2] irq_flags / halt routines

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the irq_flags and halt routines into the
native versions.

[ updates from v1
Move raw_irqs_disabled_flags outside of the PARAVIRT ifdef to
avoid increasing the mess, suggested by Andi Kleen
]

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/irqflags.h |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/asm-x86_64/irqflags.h b/include/asm-x86_64/irqflags.h
index 86e70fe..fe0d346 100644
--- a/include/asm-x86_64/irqflags.h
+++ b/include/asm-x86_64/irqflags.h
@@ -16,6 +16,10 @@
  * Interrupt control:
  */
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else /* PARAVIRT */
+
 static inline unsigned long __raw_local_save_flags(void)
 {
unsigned long flags;
@@ -31,9 +35,6 @@ static inline unsigned long __raw_local_save_flags(void)
return flags;
 }
 
-#define raw_local_save_flags(flags) \
-   do { (flags) = __raw_local_save_flags(); } while (0)
-
 static inline void raw_local_irq_restore(unsigned long flags)
 {
__asm__ __volatile__(
@@ -64,11 +65,6 @@ static inline void raw_local_irq_enable(void)
raw_local_irq_restore((flags | X86_EFLAGS_IF) & (~X86_EFLAGS_AC));
 }
 
-static inline int raw_irqs_disabled_flags(unsigned long flags)
-{
-   return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
-}
-
 #else /* CONFIG_X86_VSMP */
 
 static inline void raw_local_irq_disable(void)
@@ -81,13 +77,27 @@ static inline void raw_local_irq_enable(void)
__asm__ __volatile__("sti" : : : "memory");
 }
 
+#endif /* CONFIG_X86_VSMP */
+#endif /* CONFIG_PARAVIRT */
+
+/* Those are not paravirt stubs, so they live out of the PARAVIRT ifdef */
+
+#ifdef CONFIG_X86_VSMP
+static inline int raw_irqs_disabled_flags(unsigned long flags)
+{
+   return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
+}
+
+#else
 static inline int raw_irqs_disabled_flags(unsigned long flags)
 {
return !(flags & X86_EFLAGS_IF);
 }
 
-#endif
+#endif /* CONFIG_X86_VSMP */
 
+#define raw_local_save_flags(flags) \
+   do { (flags) = __raw_local_save_flags(); } while (0)
 /*
  * For spinlocks, etc.:
  */
@@ -115,7 +125,7 @@ static inline int raw_irqs_disabled(void)
  * Used in the idle loop; sti takes one instruction cycle
  * to complete:
  */
-static inline void raw_safe_halt(void)
+static inline void native_raw_safe_halt(void)
 {
__asm__ __volatile__("sti; hlt" : : : "memory");
 }
@@ -124,11 +134,16 @@ static inline void raw_safe_halt(void)
  * Used when interrupts are already enabled or to
  * shutdown the processor:
  */
-static inline void halt(void)
+static inline void native_halt(void)
 {
__asm__ __volatile__("hlt": : :"memory");
 }
 
+#ifndef CONFIG_PARAVIRT
+#define raw_safe_halt  native_raw_safe_halt
+#define halt   native_halt
+#endif /* ! CONFIG_PARAVIRT */
+
 #else /* __ASSEMBLY__: */
 # ifdef CONFIG_TRACE_IRQFLAGS
 #  define TRACE_IRQS_ONcall trace_hardirqs_on_thunk
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 4/25 -v2] Add debugreg/load_rsp native hooks

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds native hooks for debugreg handling functions,
and for the native load_rsp0 function. The latter also has its
call sites patched.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/process.c   |2 +-
 arch/x86_64/kernel/smpboot.c   |2 +-
 include/asm-x86_64/processor.h |   71 
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c
index 2842f50..33046f1 100644
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -595,7 +595,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/*
 * Reload esp0, LDT and the page table pointer:
 */
-   tss->rsp0 = next->rsp0;
+   load_rsp0(tss, next);
 
/* 
 * Switch DS and ES.
diff --git a/arch/x86_64/kernel/smpboot.c b/arch/x86_64/kernel/smpboot.c
index 32f5078..f99ced6 100644
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -620,7 +620,7 @@ do_rest:
start_rip = setup_trampoline();
 
init_rsp = c_idle.idle->thread.rsp;
-   per_cpu(init_tss,cpu).rsp0 = init_rsp;
+   load_rsp0(&per_cpu(init_tss,cpu), &c_idle.idle->thread);
initial_code = start_secondary;
clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 
diff --git a/include/asm-x86_64/processor.h b/include/asm-x86_64/processor.h
index 1952517..65f689b 100644
--- a/include/asm-x86_64/processor.h
+++ b/include/asm-x86_64/processor.h
@@ -249,6 +249,12 @@ struct thread_struct {
	.rsp0 = (unsigned long)&init_stack + sizeof(init_stack) \
 }
 
+static inline void native_load_rsp0(struct tss_struct *tss,
+   struct thread_struct *thread)
+{
+   tss->rsp0 = thread->rsp0;
+}
+
 #define INIT_MMAP \
{ &init_mm, 0, 0, NULL, PAGE_SHARED, VM_READ | VM_WRITE | VM_EXEC, 1, NULL, NULL }
 
@@ -264,13 +270,64 @@ struct thread_struct {
set_fs(USER_DS);
 \
 } while(0) 
 
-#define get_debugreg(var, register)\
-   __asm__("movq %%db" #register ", %0"\
-   :"=r" (var))
-#define set_debugreg(value, register)  \
-   __asm__("movq %0,%%db" #register\
-   : /* no output */   \
-   :"r" (value))
+static inline unsigned long native_get_debugreg(int regno)
+{
+   unsigned long val;
+
+   switch (regno) {
+   case 0:
+   asm("movq %%db0, %0" :"=r" (val)); break;
+   case 1:
+   asm("movq %%db1, %0" :"=r" (val)); break;
+   case 2:
+   asm("movq %%db2, %0" :"=r" (val)); break;
+   case 3:
+   asm("movq %%db3, %0" :"=r" (val)); break;
+   case 6:
+   asm("movq %%db6, %0" :"=r" (val)); break;
+   case 7:
+   asm("movq %%db7, %0" :"=r" (val)); break;
+   default:
+   val = 0; /* assign it to keep gcc quiet */
+   WARN_ON(1);
+   }
+   return val;
+}
+
+static inline void native_set_debugreg(unsigned long value, int regno)
+{
+   switch (regno) {
+   case 0:
+   asm("movq %0,%%db0" : /* no output */ :"r" (value));
+   break;
+   case 1:
+   asm("movq %0,%%db1" : /* no output */ :"r" (value));
+   break;
+   case 2:
+   asm("movq %0,%%db2" : /* no output */ :"r" (value));
+   break;
+   case 3:
+   asm("movq %0,%%db3" : /* no output */ :"r" (value));
+   break;
+   case 6:
+   asm("movq %0,%%db6" : /* no output */ :"r" (value));
+   break;
+   case 7:
+   asm("movq %0,%%db7" : /* no output */ :"r" (value));
+   break;
+   default:
+   BUG();
+   }
+}
+
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define paravirt_enabled() 0
+#define load_rsp0  native_load_rsp0
+#define set_debugreg(val, reg) native_set_debugreg(val, reg)
+#define get_debugreg(var, reg) (var) = native_get_debugreg(reg)
+#endif
 
 struct task_struct;
 struct mm_struct;
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 8/25 -v2] use macro for sti/cli in spinlock definitions

2007-08-10 Thread Glauber de Oliveira Costa
This patch switches the cli and sti instructions into macros.
In this header, they're just defined to the instructions they
refer to. Later on, when paravirt is defined, they will be
defined to something with paravirt abilities.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/spinlock.h |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86_64/spinlock.h b/include/asm-x86_64/spinlock.h
index 88bf981..5bb5bf8 100644
--- a/include/asm-x86_64/spinlock.h
+++ b/include/asm-x86_64/spinlock.h
@@ -5,6 +5,14 @@
 #include <asm/rwlock.h>
 #include <asm/page.h>
 #include <asm/processor.h>
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define CLI_STI_INPUT_ARGS
+#define CLI_STI_CLOBBERS
+#define CLI_STRING "cli"
+#define STI_STRING "sti"
+#endif
 
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
@@ -48,12 +56,12 @@ static inline void __raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long fla
"jns 5f\n"
"testl $0x200, %1\n\t"  /* interrupts were disabled? */
"jz 4f\n\t"
-   "sti\n"
+   STI_STRING "\n\t"
"3:\t"
"rep;nop\n\t"
"cmpl $0, %0\n\t"
"jle 3b\n\t"
-   "cli\n\t"
+   CLI_STRING "\n\t"
"jmp 1b\n"
"4:\t"
"rep;nop\n\t"
@@ -61,7 +69,9 @@ static inline void __raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long fla
"jg 1b\n\t"
"jmp 4b\n"
"5:\n\t"
-   : "+m" (lock->slock) : "r" ((unsigned)flags) : "memory");
+   : "+m" (lock->slock)
+   : "r" ((unsigned)flags) CLI_STI_INPUT_ARGS
+   : "memory" CLI_STI_CLOBBERS);
 }
 #endif
 
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/25 -v2] paravirt_ops for x86_64, second round

2007-08-10 Thread Glauber de Oliveira Costa
Here is a slightly updated version of the paravirt_ops patch.
If your comments and criticism were welcome before, now they are even more so!

There are some issues that are _not_ addressed in this revision, and here
are the causes:

* split debugreg into multiple functions, suggested by Andi:
  - jsfg and I agree that introducing more pvops (especially 14!) is
not worthwhile. So, although we can keep one pvops function and turn
the set/get debugreg macros into multiple ones, this is a general kernel
issue, and can be addressed by a later patch.

* 2MB pages, and other functions that live in pgalloc.h in the i386 version
  - As xen is the main user of it (i.e., lguest does not), we'd prefer to
see an implementation of it from the xen folks, or anyone else who
understands it better. This way we don't delay the merge process of the
already-written chunk. On the contrary, it will be easier to get that,
as it will be smaller

If you raised a concern before that is _not_ covered in this revision,
it is my fault. Please feel free to voice it.

Have fun!
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/25 -v2] native versions for system.h functions

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds the native hook for the functions in system.h
They are the read/write_crX, clts and wbinvd. The latter also
gets its call sites patched.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/tce.c|2 +-
 arch/x86_64/mm/pageattr.c   |2 +-
 include/asm-x86_64/system.h |   54 +-
 3 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/arch/x86_64/kernel/tce.c b/arch/x86_64/kernel/tce.c
index e3f2569..587f0c2 100644
--- a/arch/x86_64/kernel/tce.c
+++ b/arch/x86_64/kernel/tce.c
@@ -42,7 +42,7 @@ static inline void flush_tce(void* tceaddr)
if (cpu_has_clflush)
asm volatile("clflush (%0)" :: "r" (tceaddr));
else
-   asm volatile("wbinvd":::"memory");
+   wbinvd();
 }
 
 void tce_build(struct iommu_table *tbl, unsigned long index,
diff --git a/arch/x86_64/mm/pageattr.c b/arch/x86_64/mm/pageattr.c
index 7e161c6..b497afd 100644
--- a/arch/x86_64/mm/pageattr.c
+++ b/arch/x86_64/mm/pageattr.c
@@ -76,7 +76,7 @@ static void flush_kernel_map(void *arg)
/* When clflush is available always use it because it is
   much cheaper than WBINVD. */
if (!cpu_has_clflush)
-   asm volatile("wbinvd" ::: "memory");
+   wbinvd();
else list_for_each_entry(pg, l, lru) {
void *adr = page_address(pg);
cache_flush_page(adr);
diff --git a/include/asm-x86_64/system.h b/include/asm-x86_64/system.h
index 02175aa..20ed9df 100644
--- a/include/asm-x86_64/system.h
+++ b/include/asm-x86_64/system.h
@@ -68,53 +68,56 @@ extern void load_gs_index(unsigned);
 /*
  * Clear and set 'TS' bit respectively
  */
-#define clts() __asm__ __volatile__ ("clts")
+static inline void native_clts(void)
+{
+   asm volatile ("clts");
+}
 
-static inline unsigned long read_cr0(void)
-{ 
+static inline unsigned long native_read_cr0(void)
+{
unsigned long cr0;
asm volatile("movq %%cr0,%0" : "=r" (cr0));
return cr0;
 }
 
-static inline void write_cr0(unsigned long val) 
-{ 
+static inline void native_write_cr0(unsigned long val)
+{
asm volatile("movq %0,%%cr0" :: "r" (val));
 }
 
-static inline unsigned long read_cr2(void)
+static inline unsigned long native_read_cr2(void)
 {
unsigned long cr2;
asm("movq %%cr2,%0" : "=r" (cr2));
return cr2;
 }
 
-static inline void write_cr2(unsigned long val)
+static inline void native_write_cr2(unsigned long val)
 {
asm volatile("movq %0,%%cr2" :: "r" (val));
 }
 
-static inline unsigned long read_cr3(void)
-{ 
+static inline unsigned long native_read_cr3(void)
+{
unsigned long cr3;
asm("movq %%cr3,%0" : "=r" (cr3));
return cr3;
 }
 
-static inline void write_cr3(unsigned long val)
+static inline void native_write_cr3(unsigned long val)
 {
asm volatile("movq %0,%%cr3" :: "r" (val) : "memory");
 }
 
-static inline unsigned long read_cr4(void)
-{ 
+static inline unsigned long native_read_cr4(void)
+{
unsigned long cr4;
asm("movq %%cr4,%0" : "=r" (cr4));
return cr4;
 }
 
-static inline void write_cr4(unsigned long val)
-{ 
+static inline void native_write_cr4(unsigned long val)
+{
asm volatile("movq %0,%%cr4" :: "r" (val) : "memory");
 }
 
@@ -130,10 +133,27 @@ static inline void write_cr8(unsigned long val)
asm volatile("movq %0,%%cr8" :: "r" (val) : "memory");
 }
 
-#define stts() write_cr0(8 | read_cr0())
+static inline void native_wbinvd(void)
+{
+   asm volatile ("wbinvd" ::: "memory");
+}
 
-#define wbinvd() \
-   __asm__ __volatile__ ("wbinvd": : :"memory")
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define clts   native_clts
+#define wbinvd native_wbinvd
+#define read_cr0   native_read_cr0
+#define read_cr2   native_read_cr2
+#define read_cr3   native_read_cr3
+#define read_cr4   native_read_cr4
+#define write_cr0  native_write_cr0
+#define write_cr2  native_write_cr2
+#define write_cr3  native_write_cr3
+#define write_cr4  native_write_cr4
+#endif
+
+#define stts() write_cr0(8 | read_cr0())
 
 #endif /* __KERNEL__ */
 
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/25 -v2] add native_apic read and write functions, as well as boot clocks ones

2007-08-10 Thread Glauber de Oliveira Costa
Time for the apic handling functions to get their native counterparts.
Also, put the native hook for the boot clock functions in the apic.h header.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/apic.c|2 +-
 arch/x86_64/kernel/smpboot.c |8 +++-
 include/asm-x86_64/apic.h|   13 +++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 900ff38..2d233ef 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -1193,7 +1193,7 @@ int __init APIC_init_uniprocessor (void)
setup_IO_APIC();
else
nr_ioapics = 0;
-   setup_boot_APIC_clock();
+   setup_boot_clock();
check_nmi_watchdog();
return 0;
 }
diff --git a/arch/x86_64/kernel/smpboot.c b/arch/x86_64/kernel/smpboot.c
index f99ced6..12d653d 100644
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -338,7 +338,7 @@ void __cpuinit start_secondary(void)
check_tsc_sync_target();
 
Dprintk("cpu %d: setting up apic clock\n", smp_processor_id()); 
-   setup_secondary_APIC_clock();
+   setup_secondary_clock();
 
Dprintk("cpu %d: enabling apic timer\n", smp_processor_id());
 
@@ -468,6 +468,12 @@ static int __cpuinit wakeup_secondary_via_INIT(int phys_apicid, unsigned int sta
num_starts = 2;
 
/*
+* Paravirt wants a startup IPI hook here to set up the
+* target processor state.
+*/
+   startup_ipi_hook(phys_apicid, (unsigned long) start_rip,
+(unsigned long) init_rsp);
+   /*
 * Run STARTUP IPI loop.
 */
Dprintk("#startup loops: %d.\n", num_starts);
diff --git a/include/asm-x86_64/apic.h b/include/asm-x86_64/apic.h
index 85125ef..de17908 100644
--- a/include/asm-x86_64/apic.h
+++ b/include/asm-x86_64/apic.h
@@ -38,16 +38,25 @@ struct pt_regs;
  * Basic functions accessing APICs.
  */
 
-static __inline void apic_write(unsigned long reg, unsigned int v)
+static __inline void native_apic_write(unsigned long reg, unsigned int v)
 {
*((volatile unsigned int *)(APIC_BASE+reg)) = v;
 }
 
-static __inline unsigned int apic_read(unsigned long reg)
+static __inline unsigned int native_apic_read(unsigned long reg)
 {
return *((volatile unsigned int *)(APIC_BASE+reg));
 }
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define apic_write(reg, v) native_apic_write(reg, v)
+#define apic_read(reg)  native_apic_read(reg)
+#define setup_boot_clock(void) setup_boot_APIC_clock(void)
+#define setup_secondary_clock(void) setup_secondary_APIC_clock(void)
+#endif
+
 extern void apic_wait_icr_idle(void);
 extern unsigned int safe_apic_wait_icr_idle(void);
 
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/25 -v2] header file move

2007-08-10 Thread Glauber de Oliveira Costa
Later on, the paravirt_ops patch will dereference the vm_area_struct
in asm/pgtable.h. It means this define must be after the struct
definition

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/linux/mm.h |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 655094d..c3f8561 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -35,11 +35,6 @@ extern int sysctl_legacy_va_layout;
 #define sysctl_legacy_va_layout 0
 #endif
 
-#include <asm/page.h>
-#include <asm/pgtable.h>
-#include <asm/processor.h>
-
-#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
 /*
  * Linux kernel virtual memory manager primitives.
@@ -113,6 +108,15 @@ struct vm_area_struct {
 #endif
 };
 
+#include <asm/page.h>
+/*
+ * pgtable.h must be included after the definition of vm_area_struct.
+ * x86_64 pgtable.h is one of the dereferencers of this struct
+ */
+#include <asm/pgtable.h>
+#include <asm/processor.h>
+
+#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 extern struct kmem_cache *vm_area_cachep;
 
 /*
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa

Jeremy Fitzhardinge escreveu:

Glauber de Oliveira Costa wrote:

I think the idea you gave me earlier of using probe_kernel_address could
work. Xen/lguest/put_yours_here that won't use an ebda would then have
to unmap the page, to make sure a read would fault.


Hm, the memory might be mapped anyway, but we could make sure it's all
zero.  discover_ebda should be able to deal with that OK.

J

Indeed, as the EBDA_ADDR_POINTER is not aligned, this may work even better.

It seems to me safe to assume that if we read zero on that line:

ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);

We could just do ebda_size = 0 and go home happy, skipping the rest of 
the process.


Andi, are you okay with it?

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa

Jeremy Fitzhardinge escreveu:

Glauber de Oliveira Costa wrote:

On 8/9/07, Alan Cox <[EMAIL PROTECTED]> wrote:
  

What's the EBDA actually used for?  The only place which seems to use
ebda_addr is in the e820 code to avoid that area as RAM.
  

It belongs to the firmware.


Wouldn't it be better, then, to just skip this step unconditionally if
we are running a paravirtualized guest? What do we gain from doing it?
  


It's better to make discover_ebda() quietly cope with a missing ebda for
whatever reason.  We could add an explicit interface to paravirt_ops to
handle this one little corner, but it isn't very important, not very
general and really its just clutter.  Its much better to have things
cope with being virtualized quietly on their own rather than hit them
all with the pv_ops hammer.   pv_ops is really for things where the
hypervisor-specific code really has to get actively involved.


I think the idea you gave me earlier of using probe_kernel_address could
work. Xen/lguest/put_yours_here that won't use an ebda would then have 
to unmap the page, to make sure a read would fault.



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa
On 8/9/07, Alan Cox <[EMAIL PROTECTED]> wrote:
> > What's the EBDA actually used for?  The only place which seems to use
> > ebda_addr is in the e820 code to avoid that area as RAM.
>
> It belongs to the firmware.

Wouldn't it be better, then, to just skip this step unconditionally if
we are running a paravirtualized guest? What do we gain from doing it?

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa

Jeremy Fitzhardinge escreveu:

Glauber de Oliveira Costa wrote:

On 8/9/07, Alan Cox [EMAIL PROTECTED] wrote:
  

What's the EBDA actually used for?  The only place which seems to use
ebda_addr is in the e820 code to avoid that area as RAM.
  

It belongs to the firmware.


Wouldn't it be better, then, to just skip this step unconditionally if
we are running a paravirtualized guest? What do we from doing it?
  


It's better to make discover_ebda() quietly cope with a missing ebda for
whatever reason.  We could add an explicit interface to paravirt_ops to
handle this one little corner, but it isn't very important, not very
general and really its just clutter.  Its much better to have things
cope with being virtualized quietly on their own rather than hit them
all with the pv_ops hammer.   pv_ops is really for things where the
hypervisor-specific code really has to get actively involved.


I think the idea you gave me earlier of using probe_kernel_address could
work. Xen/lguest/put_yours_here that won't use an ebda would then have 
to unmap the page, to make sure a read would fault.



-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa
On 8/9/07, Alan Cox [EMAIL PROTECTED] wrote:
  What's the EBDA actually used for?  The only place which seems to use
  ebda_addr is in the e820 code to avoid that area as RAM.

 It belongs to the firmware.

Wouldn't it be better, then, to just skip this step unconditionally if
we are running a paravirtualized guest? What do we from doing it?

-- 
Glauber de Oliveira Costa.
Free as in Freedom
http://glommer.net

The less confident you are, the more serious you have to act.
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa

Jeremy Fitzhardinge escreveu:

Glauber de Oliveira Costa wrote:

I think the idea you gave me earlier of using probe_kernel_address could
work. Xen/lguest/put_yours_here that won't use an ebda would then have
to unmap the page, to make sure a read would fault.


Hm, the memory might be mapped anyway, but we could make sure its all
zero.  discover_ebda should be able to deal with that OK.

J

Indeed, as the EBDA_ADDR_POINTER is not aligned, this may work even better.

It seems to me safe to assume that if we read zero on that line:

ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);

We could just do ebda_size = 0 and go home happy, skipping the rest of 
the process.


Andi, are you okay with it ?

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 5/25 -v2] native versions for system.h functions

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds the native hook for the functions in system.h
They are the read/write_crX, clts and wbinvd. The later, also
gets its call sites patched.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/tce.c|2 +-
 arch/x86_64/mm/pageattr.c   |2 +-
 include/asm-x86_64/system.h |   54 +-
 3 files changed, 39 insertions(+), 19 deletions(-)

diff --git a/arch/x86_64/kernel/tce.c b/arch/x86_64/kernel/tce.c
index e3f2569..587f0c2 100644
--- a/arch/x86_64/kernel/tce.c
+++ b/arch/x86_64/kernel/tce.c
@@ -42,7 +42,7 @@ static inline void flush_tce(void* tceaddr)
if (cpu_has_clflush)
asm volatile(clflush (%0) :: r (tceaddr));
else
-   asm volatile(wbinvd:::memory);
+   wbinvd();
 }
 
 void tce_build(struct iommu_table *tbl, unsigned long index,
diff --git a/arch/x86_64/mm/pageattr.c b/arch/x86_64/mm/pageattr.c
index 7e161c6..b497afd 100644
--- a/arch/x86_64/mm/pageattr.c
+++ b/arch/x86_64/mm/pageattr.c
@@ -76,7 +76,7 @@ static void flush_kernel_map(void *arg)
/* When clflush is available always use it because it is
   much cheaper than WBINVD. */
if (!cpu_has_clflush)
-   asm volatile(wbinvd ::: memory);
+   wbinvd();
else list_for_each_entry(pg, l, lru) {
void *adr = page_address(pg);
cache_flush_page(adr);
diff --git a/include/asm-x86_64/system.h b/include/asm-x86_64/system.h
index 02175aa..20ed9df 100644
--- a/include/asm-x86_64/system.h
+++ b/include/asm-x86_64/system.h
@@ -68,53 +68,56 @@ extern void load_gs_index(unsigned);
 /*
  * Clear and set 'TS' bit respectively
  */
-#define clts() __asm__ __volatile__ (clts)
+static inline void native_clts(void)
+{
+   asm volatile (clts);
+}
 
-static inline unsigned long read_cr0(void)
-{ 
+static inline unsigned long native_read_cr0(void)
+{
unsigned long cr0;
asm volatile(movq %%cr0,%0 : =r (cr0));
return cr0;
 }
 
-static inline void write_cr0(unsigned long val) 
-{ 
+static inline void native_write_cr0(unsigned long val)
+{
asm volatile(movq %0,%%cr0 :: r (val));
 }
 
-static inline unsigned long read_cr2(void)
+static inline unsigned long native_read_cr2(void)
 {
unsigned long cr2;
asm(movq %%cr2,%0 : =r (cr2));
return cr2;
 }
 
-static inline void write_cr2(unsigned long val)
+static inline void native_write_cr2(unsigned long val)
 {
asm volatile(movq %0,%%cr2 :: r (val));
 }
 
-static inline unsigned long read_cr3(void)
-{ 
+static inline unsigned long native_read_cr3(void)
+{
unsigned long cr3;
asm(movq %%cr3,%0 : =r (cr3));
return cr3;
 }
 
-static inline void write_cr3(unsigned long val)
+static inline void native_write_cr3(unsigned long val)
 {
asm volatile(movq %0,%%cr3 :: r (val) : memory);
 }
 
-static inline unsigned long read_cr4(void)
-{ 
+static inline unsigned long native_read_cr4(void)
+{
unsigned long cr4;
asm(movq %%cr4,%0 : =r (cr4));
return cr4;
 }
 
-static inline void write_cr4(unsigned long val)
-{ 
+static inline void native_write_cr4(unsigned long val)
+{
asm volatile(movq %0,%%cr4 :: r (val) : memory);
 }
 
@@ -130,10 +133,27 @@ static inline void write_cr8(unsigned long val)
asm volatile(movq %0,%%cr8 :: r (val) : memory);
 }
 
-#define stts() write_cr0(8 | read_cr0())
+static inline void native_wbinvd(void)
+{
+   asm volatile (wbinvd ::: memory);
+}
 
-#define wbinvd() \
-   __asm__ __volatile__ (wbinvd: : :memory)
+#ifdef CONFIG_PARAVIRT
+#include asm/paravirt.h
+#else
+#define clts   native_clts
+#define wbinvd native_wbinvd
+#define read_cr0   native_read_cr0
+#define read_cr2   native_read_cr2
+#define read_cr3   native_read_cr3
+#define read_cr4   native_read_cr4
+#define write_cr0  native_write_cr0
+#define write_cr2  native_write_cr2
+#define write_cr3  native_write_cr3
+#define write_cr4  native_write_cr4
+#endif
+
+#define stts() write_cr0(8 | read_cr0())
 
 #endif /* __KERNEL__ */
 
-- 
1.4.4.2

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 6/25 -v2] add native_apic read and write functions, as well as boot clocks ones

2007-08-10 Thread Glauber de Oliveira Costa
Time for the apic handling functions to get their native counterparts.
Also, put the native hook for the boot clocks functions in the apic.h header

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/apic.c|2 +-
 arch/x86_64/kernel/smpboot.c |8 +++-
 include/asm-x86_64/apic.h|   13 +++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/arch/x86_64/kernel/apic.c b/arch/x86_64/kernel/apic.c
index 900ff38..2d233ef 100644
--- a/arch/x86_64/kernel/apic.c
+++ b/arch/x86_64/kernel/apic.c
@@ -1193,7 +1193,7 @@ int __init APIC_init_uniprocessor (void)
setup_IO_APIC();
else
nr_ioapics = 0;
-   setup_boot_APIC_clock();
+   setup_boot_clock();
check_nmi_watchdog();
return 0;
 }
diff --git a/arch/x86_64/kernel/smpboot.c b/arch/x86_64/kernel/smpboot.c
index f99ced6..12d653d 100644
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -338,7 +338,7 @@ void __cpuinit start_secondary(void)
check_tsc_sync_target();
 
Dprintk(cpu %d: setting up apic clock\n, smp_processor_id()); 
-   setup_secondary_APIC_clock();
+   setup_secondary_clock();
 
Dprintk(cpu %d: enabling apic timer\n, smp_processor_id());
 
@@ -468,6 +468,12 @@ static int __cpuinit wakeup_secondary_via_INIT(int 
phys_apicid, unsigned int sta
num_starts = 2;
 
/*
+* Paravirt wants a startup IPI hook here to set up the
+* target processor state.
+*/
+   startup_ipi_hook(phys_apicid, (unsigned long) start_rip,
+(unsigned long) init_rsp);
+   /*
 * Run STARTUP IPI loop.
 */
Dprintk(#startup loops: %d.\n, num_starts);
diff --git a/include/asm-x86_64/apic.h b/include/asm-x86_64/apic.h
index 85125ef..de17908 100644
--- a/include/asm-x86_64/apic.h
+++ b/include/asm-x86_64/apic.h
@@ -38,16 +38,25 @@ struct pt_regs;
  * Basic functions accessing APICs.
  */
 
-static __inline void apic_write(unsigned long reg, unsigned int v)
+static __inline void native_apic_write(unsigned long reg, unsigned int v)
 {
*((volatile unsigned int *)(APIC_BASE+reg)) = v;
 }
 
-static __inline unsigned int apic_read(unsigned long reg)
+static __inline unsigned int native_apic_read(unsigned long reg)
 {
return *((volatile unsigned int *)(APIC_BASE+reg));
 }
 
+#ifdef CONFIG_PARAVIRT
+#include asm/paravirt.h
+#else
+#define apic_write(reg, v) native_apic_write(reg, v)
+#define apic_read(reg)  native_apic_read(reg)
+#define setup_boot_clock(void) setup_boot_APIC_clock(void)
+#define setup_secondary_clock(void) setup_secondary_APIC_clock(void)
+#endif
+
 extern void apic_wait_icr_idle(void);
 extern unsigned int safe_apic_wait_icr_idle(void);
 
-- 
1.4.4.2

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 1/25 -v2] header file move

2007-08-10 Thread Glauber de Oliveira Costa
Later on, the paravirt_ops patch will deference the vm_area_struct
in asm/pgtable.h. It means this define must be after the struct
definition

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/linux/mm.h |   14 +-
 1 files changed, 9 insertions(+), 5 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 655094d..c3f8561 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -35,11 +35,6 @@ extern int sysctl_legacy_va_layout;
 #define sysctl_legacy_va_layout 0
 #endif
 
-#include asm/page.h
-#include asm/pgtable.h
-#include asm/processor.h
-
-#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
 /*
  * Linux kernel virtual memory manager primitives.
@@ -113,6 +108,15 @@ struct vm_area_struct {
 #endif
 };
 
+#include asm/page.h
+/*
+ * pgtable.h must be included after the definition of vm_area_struct.
+ * x86_64 pgtable.h is one of the dereferencers of this struct
+ */
+#include asm/pgtable.h
+#include asm/processor.h
+
+#define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 extern struct kmem_cache *vm_area_cachep;
 
 /*
-- 
1.4.4.2

-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 0/25 -v2] paravirt_ops for x86_64, second round

2007-08-10 Thread Glauber de Oliveira Costa
Here is an slightly updated version of the paravirt_ops patch.
If your comments and criticism were welcome before, now it's even more!

There are some issues that are _not_ addressed in this revision, and here
are the causes:

* split debugreg into multiple functions, suggested by Andi:
  - jsfg and I agree that introducing more pvops (especially 14!) is
not worthwhile. So, although we can keep one pvops function and turn
the set/get debugreg macros into multiple ones, this is a general kernel
issue, and can be addressed by a later patch.

* 2MB pages, and other functions that lives in pgalloc.h in the i386 version
  - As xen is the main user of it (i.e., lguest does not), we'd prefer to
see an implementation of it from xen folks, or any other that understand
it better. This way we don't delay the merge process of the
already-written chunk. On the contrary, it will be easier to get that,
as it will be smaller

If you raised some concern before that is _not_ covered in this revision,
it is my fault. Please feel free to voice it again.

Have fun!


[PATCH 8/25 -v2] use macro for sti/cli in spinlock definitions

2007-08-10 Thread Glauber de Oliveira Costa
This patch switches the cli and sti instructions into macros.
In this header, they're just defined to the instructions they
refer to. Later on, when paravirt is defined, they will be
defined to something with paravirt abilities.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/spinlock.h |   16 +---
 1 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86_64/spinlock.h b/include/asm-x86_64/spinlock.h
index 88bf981..5bb5bf8 100644
--- a/include/asm-x86_64/spinlock.h
+++ b/include/asm-x86_64/spinlock.h
@@ -5,6 +5,14 @@
 #include <asm/rwlock.h>
 #include <asm/page.h>
 #include <asm/processor.h>
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define CLI_STI_INPUT_ARGS
+#define CLI_STI_CLOBBERS
+#define CLI_STRING "cli"
+#define STI_STRING "sti"
+#endif
 
 /*
  * Your basic SMP spinlocks, allowing only a single CPU anywhere
@@ -48,12 +56,12 @@ static inline void __raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long flags)
 	"jns 5f\n"
 	"testl $0x200, %1\n\t"	/* interrupts were disabled? */
 	"jz 4f\n\t"
-	"sti\n"
+	STI_STRING "\n\t"
 	"3:\t"
 	"rep;nop\n\t"
 	"cmpl $0, %0\n\t"
 	"jle 3b\n\t"
-	"cli\n\t"
+	CLI_STRING "\n\t"
 	"jmp 1b\n"
 	"4:\t"
 	"rep;nop\n\t"
@@ -61,7 +69,9 @@ static inline void __raw_spin_lock_flags(raw_spinlock_t *lock, unsigned long flags)
 	"jg 1b\n\t"
 	"jmp 4b\n"
 	"5:\n\t"
-	: "+m" (lock->slock) : "r" ((unsigned)flags) : "memory");
+	: "+m" (lock->slock)
+	: "r" ((unsigned)flags) CLI_STI_INPUT_ARGS
+	: "memory" CLI_STI_CLOBBERS);
 }
 #endif
 
-- 
1.4.4.2



[PATCH 2/25 -v2] tlb flushing routines

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the flush_tlb routines into native versions.
In case paravirt is not defined, the natives are defined into
the actually used ones. flush_tlb_others() goes in smp.c, unless
SMP is not compiled in.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/smp.c  |   10 +-
 include/asm-x86_64/smp.h  |8 
 include/asm-x86_64/tlbflush.h |   22 ++
 3 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/arch/x86_64/kernel/smp.c b/arch/x86_64/kernel/smp.c
index 673a300..39f5f6b 100644
--- a/arch/x86_64/kernel/smp.c
+++ b/arch/x86_64/kernel/smp.c
@@ -165,7 +165,7 @@ out:
 	cpu_clear(cpu, f->flush_cpumask);
 }
 
-static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
+void native_flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
unsigned long va)
 {
int sender;
@@ -198,6 +198,14 @@ static void flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
 	spin_unlock(&f->tlbstate_lock);
 }
 
+/* Overridden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) flush_tlb_others(cpumask_t cpumask,
+   struct mm_struct *mm,
+   unsigned long va)
+{
+   native_flush_tlb_others(cpumask, mm, va);
+}
+
 int __cpuinit init_smp_flush(void)
 {
int i;
diff --git a/include/asm-x86_64/smp.h b/include/asm-x86_64/smp.h
index 3f303d2..6b4 100644
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -19,6 +19,14 @@ extern int disable_apic;
 
 #include <asm/pda.h>
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+void native_flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
+   unsigned long va);
+#else
+#define startup_ipi_hook(apicid, rip, rsp) do { } while (0)
+#endif
+
 struct pt_regs;
 
 extern cpumask_t cpu_present_mask;
diff --git a/include/asm-x86_64/tlbflush.h b/include/asm-x86_64/tlbflush.h
index 888eb4a..1c68cc8 100644
--- a/include/asm-x86_64/tlbflush.h
+++ b/include/asm-x86_64/tlbflush.h
@@ -6,21 +6,30 @@
 #include <asm/processor.h>
 #include <asm/system.h>
 
-static inline void __flush_tlb(void)
+static inline void native_flush_tlb(void)
 {
write_cr3(read_cr3());
 }
 
-static inline void __flush_tlb_all(void)
+static inline void native_flush_tlb_all(void)
 {
unsigned long cr4 = read_cr4();
 	write_cr4(cr4 & ~X86_CR4_PGE);  /* clear PGE */
write_cr4(cr4); /* write old PGE again and flush TLBs */
 }
 
-#define __flush_tlb_one(addr) \
-	__asm__ __volatile__("invlpg (%0)" :: "r" (addr) : "memory")
+static inline void native_flush_tlb_one(unsigned long addr)
+{
+	asm volatile ("invlpg (%0)" :: "r" (addr) : "memory");
+}
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define __flush_tlb()  native_flush_tlb()
+#define __flush_tlb_all()  native_flush_tlb_all()
+#define __flush_tlb_one(addr)  native_flush_tlb_one(addr)
+#endif /* CONFIG_PARAVIRT */
 
 /*
  * TLB flushing:
@@ -64,6 +73,11 @@ static inline void flush_tlb_range(struct vm_area_struct *vma,
__flush_tlb();
 }
 
+static inline void native_flush_tlb_others(cpumask_t *cpumask,
+					   struct mm_struct *mm, unsigned long va)
+{
+}
+
 #else
 
 #include asm/smp.h
-- 
1.4.4.2



[PATCH 3/25 -v2] irq_flags / halt routines

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the irq_flags and halt routines into the
native versions.

[ updates from v1
Move raw_irqs_disabled_flags outside of the PARAVIRT ifdef to
avoid increasing the mess, suggested by Andi Kleen
]

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/irqflags.h |   37 ++---
 1 files changed, 26 insertions(+), 11 deletions(-)

diff --git a/include/asm-x86_64/irqflags.h b/include/asm-x86_64/irqflags.h
index 86e70fe..fe0d346 100644
--- a/include/asm-x86_64/irqflags.h
+++ b/include/asm-x86_64/irqflags.h
@@ -16,6 +16,10 @@
  * Interrupt control:
  */
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else /* PARAVIRT */
+
 static inline unsigned long __raw_local_save_flags(void)
 {
unsigned long flags;
@@ -31,9 +35,6 @@ static inline unsigned long __raw_local_save_flags(void)
return flags;
 }
 
-#define raw_local_save_flags(flags) \
-   do { (flags) = __raw_local_save_flags(); } while (0)
-
 static inline void raw_local_irq_restore(unsigned long flags)
 {
__asm__ __volatile__(
@@ -64,11 +65,6 @@ static inline void raw_local_irq_enable(void)
 	raw_local_irq_restore((flags | X86_EFLAGS_IF) & (~X86_EFLAGS_AC));
 }
 
-static inline int raw_irqs_disabled_flags(unsigned long flags)
-{
-	return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
-}
-
 #else /* CONFIG_X86_VSMP */
 
 static inline void raw_local_irq_disable(void)
@@ -81,13 +77,27 @@ static inline void raw_local_irq_enable(void)
 	__asm__ __volatile__("sti" : : : "memory");
 }
 
+#endif /* CONFIG_X86_VSMP */
+#endif /* CONFIG_PARAVIRT */
+
+/* Those are not paravirt stubs, so they live out of the PARAVIRT ifdef */
+
+#ifdef CONFIG_X86_VSMP
+static inline int raw_irqs_disabled_flags(unsigned long flags)
+{
+	return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
+}
+
+#else
 static inline int raw_irqs_disabled_flags(unsigned long flags)
 {
 	return !(flags & X86_EFLAGS_IF);
 }
 
-#endif
+#endif /* CONFIG_X86_VSMP */
 
+#define raw_local_save_flags(flags) \
+   do { (flags) = __raw_local_save_flags(); } while (0)
 /*
  * For spinlocks, etc.:
  */
@@ -115,7 +125,7 @@ static inline int raw_irqs_disabled(void)
  * Used in the idle loop; sti takes one instruction cycle
  * to complete:
  */
-static inline void raw_safe_halt(void)
+static inline void native_raw_safe_halt(void)
 {
 	__asm__ __volatile__("sti; hlt" : : : "memory");
 }
@@ -124,11 +134,16 @@ static inline void raw_safe_halt(void)
  * Used when interrupts are already enabled or to
  * shutdown the processor:
  */
-static inline void halt(void)
+static inline void native_halt(void)
 {
 	__asm__ __volatile__("hlt": : :"memory");
 }
 
+#ifndef CONFIG_PARAVIRT
+#define raw_safe_halt  native_raw_safe_halt
+#define halt   native_halt
+#endif /* ! CONFIG_PARAVIRT */
+
 #else /* __ASSEMBLY__: */
 # ifdef CONFIG_TRACE_IRQFLAGS
#  define TRACE_IRQS_ON	call trace_hardirqs_on_thunk
-- 
1.4.4.2



[PATCH 4/25 -v2] Add debugreg/load_rsp native hooks

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds native hooks for debugreg handling functions,
and for the native load_rsp0 function. The latter also has its
call sites patched.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/process.c   |2 +-
 arch/x86_64/kernel/smpboot.c   |2 +-
 include/asm-x86_64/processor.h |   71 
 3 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/arch/x86_64/kernel/process.c b/arch/x86_64/kernel/process.c
index 2842f50..33046f1 100644
--- a/arch/x86_64/kernel/process.c
+++ b/arch/x86_64/kernel/process.c
@@ -595,7 +595,7 @@ __switch_to(struct task_struct *prev_p, struct task_struct *next_p)
/*
 * Reload esp0, LDT and the page table pointer:
 */
-	tss->rsp0 = next->rsp0;
+   load_rsp0(tss, next);
 
/* 
 * Switch DS and ES.
diff --git a/arch/x86_64/kernel/smpboot.c b/arch/x86_64/kernel/smpboot.c
index 32f5078..f99ced6 100644
--- a/arch/x86_64/kernel/smpboot.c
+++ b/arch/x86_64/kernel/smpboot.c
@@ -620,7 +620,7 @@ do_rest:
start_rip = setup_trampoline();
 
 	init_rsp = c_idle.idle->thread.rsp;
-	per_cpu(init_tss,cpu).rsp0 = init_rsp;
+	load_rsp0(&per_cpu(init_tss,cpu), &c_idle.idle->thread);
initial_code = start_secondary;
clear_tsk_thread_flag(c_idle.idle, TIF_FORK);
 
diff --git a/include/asm-x86_64/processor.h b/include/asm-x86_64/processor.h
index 1952517..65f689b 100644
--- a/include/asm-x86_64/processor.h
+++ b/include/asm-x86_64/processor.h
@@ -249,6 +249,12 @@ struct thread_struct {
.rsp0 = (unsigned long)init_stack + sizeof(init_stack) \
 }
 
+static inline void native_load_rsp0(struct tss_struct *tss,
+   struct thread_struct *thread)
+{
+	tss->rsp0 = thread->rsp0;
+}
+
 #define INIT_MMAP \
 { init_mm, 0, 0, NULL, PAGE_SHARED, VM_READ | VM_WRITE | VM_EXEC, 1, NULL, NULL }
 
@@ -264,13 +270,64 @@ struct thread_struct {
set_fs(USER_DS);
 \
 } while(0) 
 
-#define get_debugreg(var, register)				\
-	__asm__("movq %%db" #register ", %0"			\
-		:"=r" (var))
-#define set_debugreg(value, register)			\
-	__asm__("movq %0,%%db" #register			\
-		: /* no output */			\
-		:"r" (value))
+static inline unsigned long native_get_debugreg(int regno)
+{
+   unsigned long val;
+
+	switch (regno) {
+	case 0:
+		asm("movq %%db0, %0" : "=r" (val)); break;
+	case 1:
+		asm("movq %%db1, %0" : "=r" (val)); break;
+	case 2:
+		asm("movq %%db2, %0" : "=r" (val)); break;
+	case 3:
+		asm("movq %%db3, %0" : "=r" (val)); break;
+	case 6:
+		asm("movq %%db6, %0" : "=r" (val)); break;
+	case 7:
+		asm("movq %%db7, %0" : "=r" (val)); break;
+	default:
+		val = 0;	/* assign it to keep gcc quiet */
+		WARN_ON(1);
+	}
+	return val;
+}
+
+static inline void native_set_debugreg(unsigned long value, int regno)
+{
+	switch (regno) {
+	case 0:
+		asm("movq %0,%%db0" : /* no output */ : "r" (value));
+		break;
+	case 1:
+		asm("movq %0,%%db1" : /* no output */ : "r" (value));
+		break;
+	case 2:
+		asm("movq %0,%%db2" : /* no output */ : "r" (value));
+		break;
+	case 3:
+		asm("movq %0,%%db3" : /* no output */ : "r" (value));
+		break;
+	case 6:
+		asm("movq %0,%%db6" : /* no output */ : "r" (value));
+		break;
+	case 7:
+		asm("movq %0,%%db7" : /* no output */ : "r" (value));
+		break;
+	default:
+		BUG();
+	}
+}
+}
+
+#ifdef CONFIG_PARAVIRT
+#include asm/paravirt.h
+#else
+#define paravirt_enabled() 0
+#define load_rsp0  native_load_rsp0
+#define set_debugreg(val, reg) native_set_debugreg(val, reg)
+#define get_debugreg(var, reg) (var) = native_get_debugreg(reg)
+#endif
 
 struct task_struct;
 struct mm_struct;
-- 
1.4.4.2



[PATCH 11/25 -v2] native versions for set pagetables

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the set_p{te,md,ud,gd} functions into their
native_ versions. There is no need to patch any caller.

Also, it adds pte_update() and pte_update_defer() calls whenever
we modify a page table entry. This last part was coded to match
i386 as close as possible.

Pieces of the header are moved to below the #ifdef CONFIG_PARAVIRT
site, as they are users of the newly defined set_* macros.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/pgtable.h |  152 -
 1 files changed, 89 insertions(+), 63 deletions(-)

diff --git a/include/asm-x86_64/pgtable.h b/include/asm-x86_64/pgtable.h
index c9d8764..dd572a2 100644
--- a/include/asm-x86_64/pgtable.h
+++ b/include/asm-x86_64/pgtable.h
@@ -57,55 +57,77 @@ extern unsigned long 
empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];
  */
 #define PTRS_PER_PTE   512
 
-#ifndef __ASSEMBLY__
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
 
-#define pte_ERROR(e) \
-	printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), pte_val(e))
-#define pmd_ERROR(e) \
-	printk("%s:%d: bad pmd %p(%016lx).\n", __FILE__, __LINE__, &(e), pmd_val(e))
-#define pud_ERROR(e) \
-	printk("%s:%d: bad pud %p(%016lx).\n", __FILE__, __LINE__, &(e), pud_val(e))
-#define pgd_ERROR(e) \
-	printk("%s:%d: bad pgd %p(%016lx).\n", __FILE__, __LINE__, &(e), pgd_val(e))
+#define set_pte native_set_pte
+#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+#define set_pmd native_set_pmd
+#define set_pud native_set_pud
+#define set_pgd native_set_pgd
+#define pte_clear(mm,addr,xp)	do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
+#define pmd_clear(xp)  do { set_pmd(xp, __pmd(0)); } while (0)
+#define pud_clear native_pud_clear
+#define pgd_clear native_pgd_clear
+#define pte_update(mm, addr, ptep)  do { } while (0)
+#define pte_update_defer(mm, addr, ptep)do { } while (0)
 
-#define pgd_none(x)	(!pgd_val(x))
-#define pud_none(x)	(!pud_val(x))
+#endif
+
+#ifndef __ASSEMBLY__
 
-static inline void set_pte(pte_t *dst, pte_t val)
+static inline void native_set_pte(pte_t *dst, pte_t val)
 {
-   pte_val(*dst) = pte_val(val);
+	dst->pte = pte_val(val);
 } 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
 
-static inline void set_pmd(pmd_t *dst, pmd_t val)
+
+static inline void native_set_pmd(pmd_t *dst, pmd_t val)
 {
-pmd_val(*dst) = pmd_val(val); 
+	dst->pmd = pmd_val(val);
 } 
 
-static inline void set_pud(pud_t *dst, pud_t val)
+static inline void native_set_pud(pud_t *dst, pud_t val)
 {
-   pud_val(*dst) = pud_val(val);
+	dst->pud = pud_val(val);
 }
 
-static inline void pud_clear (pud_t *pud)
+static inline void native_set_pgd(pgd_t *dst, pgd_t val)
 {
-   set_pud(pud, __pud(0));
+	dst->pgd = pgd_val(val);
 }
-
-static inline void set_pgd(pgd_t *dst, pgd_t val)
+static inline void native_pud_clear (pud_t *pud)
 {
-   pgd_val(*dst) = pgd_val(val); 
-} 
+   set_pud(pud, __pud(0));
+}
 
-static inline void pgd_clear (pgd_t * pgd)
+static inline void native_pgd_clear (pgd_t * pgd)
 {
set_pgd(pgd, __pgd(0));
 }
 
-#define ptep_get_and_clear(mm,addr,xp)	__pte(xchg(&(xp)->pte, 0))
+#define pte_ERROR(e) \
+	printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), pte_val(e))
+#define pmd_ERROR(e) \
+	printk("%s:%d: bad pmd %p(%016lx).\n", __FILE__, __LINE__, &(e), pmd_val(e))
+#define pud_ERROR(e) \
+	printk("%s:%d: bad pud %p(%016lx).\n", __FILE__, __LINE__, &(e), pud_val(e))
+#define pgd_ERROR(e) \
+	printk("%s:%d: bad pgd %p(%016lx).\n", __FILE__, __LINE__, &(e), pgd_val(e))
+
+#define pgd_none(x)	(!pgd_val(x))
+#define pud_none(x)	(!pud_val(x))
 
 struct mm_struct;
 
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
+{
+	pte_t pte = __pte(xchg(&ptep->pte, 0));
+   pte_update(mm, addr, ptep);
+   return pte;
+}
+
 static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned long addr, pte_t *ptep, int full)
 {
pte_t pte;
@@ -245,7 +267,6 @@ static inline unsigned long pmd_bad(pmd_t pmd)
 
 #define pte_none(x)	(!pte_val(x))
 #define pte_present(x)	(pte_val(x) & (_PAGE_PRESENT | _PAGE_PROTNONE))
-#define pte_clear(mm,addr,xp)	do { set_pte_at(mm, addr, xp, __pte(0)); } while (0)
 
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))	/* FIXME: is this right? */
@@ -254,11 +275,11 @@ static inline unsigned long pmd_bad(pmd_t pmd)
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
-	pte_t pte;
-	pte_val(pte) = (page_nr << PAGE_SHIFT);
-	pte_val(pte) |= pgprot_val(pgprot);
-	pte_val(pte) &= __supported_pte_mask;
-	return pte;
+	unsigned long pte;
+	pte = (page_nr << PAGE_SHIFT);
+	pte |= pgprot_val(pgprot);
+   pte

[PATCH 10/25 -v2] export math_state_restore

2007-08-10 Thread Glauber de Oliveira Costa
Export the math_state_restore symbol, so it can be used by hypervisors,
which are commonly loaded as modules.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/traps.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/traps.c b/arch/x86_64/kernel/traps.c
index 0388842..aacbe12 100644
--- a/arch/x86_64/kernel/traps.c
+++ b/arch/x86_64/kernel/traps.c
@@ -1081,6 +1081,7 @@ asmlinkage void math_state_restore(void)
 	task_thread_info(me)->status |= TS_USEDFPU;
 	me->fpu_counter++;
 }
+EXPORT_SYMBOL_GPL(math_state_restore);
 
 void __init trap_init(void)
 {
-- 
1.4.4.2



[PATCH 12/25 -v2] turn msr.h functions into native versions

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the basic operations in msr.h into native
ones. Those operations are: rdmsr, wrmsr, rdtsc, rdtscp, rdpmc, and
cpuid. After they are turned into functions, some call sites need
casts, and so we provide them.

There is also a fixup needed in the functions located in the vsyscall
area, as they cannot call any of them anymore (otherwise, the call
would go through a kernel address, invalid in userspace mapping).

The solution is to call the now-provided native_ versions instead.

[  updates from v1
   * Call read_tscp rdtscp, to match instruction name
   * Avoid duplication of code in get_cycles_sync
   * Get rid of rdtsc(), since it is used nowhere else
   All three suggested by Andi Kleen
]

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/ia32/syscall32.c  |2 +-
 arch/x86_64/kernel/setup64.c  |6 +-
 arch/x86_64/kernel/tsc.c  |   24 -
 arch/x86_64/kernel/vsyscall.c |4 +-
 arch/x86_64/vdso/vgetcpu.c|4 +-
 include/asm-i386/tsc.h|   12 ++-
 include/asm-x86_64/msr.h  |  277 +
 7 files changed, 211 insertions(+), 118 deletions(-)

diff --git a/arch/x86_64/ia32/syscall32.c b/arch/x86_64/ia32/syscall32.c
index 15013ba..dd1b4a3 100644
--- a/arch/x86_64/ia32/syscall32.c
+++ b/arch/x86_64/ia32/syscall32.c
@@ -79,5 +79,5 @@ void syscall32_cpu_init(void)
checking_wrmsrl(MSR_IA32_SYSENTER_ESP, 0ULL);
checking_wrmsrl(MSR_IA32_SYSENTER_EIP, (u64)ia32_sysenter_target);
 
-   wrmsrl(MSR_CSTAR, ia32_cstar_target);
+   wrmsrl(MSR_CSTAR, (u64)ia32_cstar_target);
 }
diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 1200aaa..395cf02 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -122,7 +122,7 @@ void pda_init(int cpu)
 	asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0));
/* Memory clobbers used to order PDA accessed */
mb();
-   wrmsrl(MSR_GS_BASE, pda);
+   wrmsrl(MSR_GS_BASE, (u64)pda);
mb();
 
 	pda->cpunumber = cpu;
@@ -161,8 +161,8 @@ void syscall_init(void)
 * but only a 32bit target. LSTAR sets the 64bit rip.
 */ 
wrmsrl(MSR_STAR,  ((u64)__USER32_CS)48  | ((u64)__KERNEL_CS)32); 
-   wrmsrl(MSR_LSTAR, system_call); 
-   wrmsrl(MSR_CSTAR, ignore_sysret);
+   wrmsrl(MSR_LSTAR, (u64)system_call);
+   wrmsrl(MSR_CSTAR, (u64)ignore_sysret);
 
 #ifdef CONFIG_IA32_EMULATION   
syscall32_cpu_init ();
diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c
index 2a59bde..2a5fbc9 100644
--- a/arch/x86_64/kernel/tsc.c
+++ b/arch/x86_64/kernel/tsc.c
@@ -9,6 +9,28 @@
 
 #include asm/timex.h
 
+#ifdef CONFIG_PARAVIRT
+/*
+ * When paravirt is on, some functionalities are executed through function
+ * pointers in the paravirt_ops structure, for both the host and guest.
+ * These function pointers exist inside the kernel and can not
+ * be accessed by user space. To avoid this, we make a copy of the
+ * get_cycles_sync (called in kernel) but force the use of native_read_tsc.
+ * For the host, it will simply do the native rdtsc. The guest
+ * should set up its own clock and vread
+ */
+static __always_inline long long vget_cycles_sync(void)
+{
+   unsigned long long ret;
+	ret = __get_cycles_sync();
+   if (!ret)
+   ret = native_read_tsc();
+   return ret;
+}
+#else
+# define vget_cycles_sync() get_cycles_sync()
+#endif
+
 static int notsc __initdata = 0;
 
 unsigned int cpu_khz;  /* TSC clocks / usec, not used here */
@@ -165,7 +187,7 @@ static cycle_t read_tsc(void)
 
 static cycle_t __vsyscall_fn vread_tsc(void)
 {
-   cycle_t ret = (cycle_t)get_cycles_sync();
+   cycle_t ret = (cycle_t)vget_cycles_sync();
return ret;
 }
 
diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index 06c3494..757874e 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -184,7 +184,7 @@ time_t __vsyscall(1) vtime(time_t *t)
 long __vsyscall(2)
 vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
 {
-   unsigned int dummy, p;
+   unsigned int p;
unsigned long j = 0;
 
/* Fast cache - only recompute value once per jiffies and avoid
@@ -199,7 +199,7 @@ vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
p = tcache-blob[1];
} else if (__vgetcpu_mode == VGETCPU_RDTSCP) {
/* Load per CPU data from RDTSCP */
-   rdtscp(dummy, dummy, p);
+   native_rdtscp(p);
} else {
/* Load per CPU data from GDT */
 	asm("lsl %1,%0" : "=r" (p) : "r" (__PER_CPU_SEG));
diff --git a/arch/x86_64/vdso/vgetcpu.c b/arch/x86_64/vdso/vgetcpu.c
index 91f6e85..1f38f61 100644
--- a/arch/x86_64/vdso/vgetcpu.c
+++ b/arch/x86_64/vdso/vgetcpu.c
@@ -15,7 +15,7

[PATCH 18/25 -v2] turn privileged operations into macros in entry.S

2007-08-10 Thread Glauber de Oliveira Costa
With paravirt on, we cannot issue operations like swapgs, sysretq,
iretq, cli, sti. So they have to be changed into macros, that will
be later properly replaced for the paravirt case.

The sysretq is a little bit more complicated, and is replaced
by a sequence of three instructions. That is basically because, if
we had already issued a swapgs, we would be on a user stack
at this point. So we do it all-in-one.

The clobber list follows the idea of the i386 version closely,
and represents which caller-saved registers are safe to modify
at the point the function is called. So for example, CLBR_ANY
says we can clobber rax, rdi, rsi, rdx, rcx, r8-r11, while
CLBR_NONE says we cannot touch anything.

[  updates from v1
   * renamed SYSRETQ to SYSCALL_RETURN
   * don't use ENTRY/ENDPROC for native_{syscall_return,iret}
   * fix one use of the clobber list
   * rename SWAPGS_NOSTACK to SWAPGS_UNSAFE_STACK
   * change the unexpressive 1b label to do_iret
   All suggested by Andi Kleen
]

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/entry.S |  130 +---
 1 files changed, 87 insertions(+), 43 deletions(-)

diff --git a/arch/x86_64/kernel/entry.S b/arch/x86_64/kernel/entry.S
index 1d232e5..db8707a 100644
--- a/arch/x86_64/kernel/entry.S
+++ b/arch/x86_64/kernel/entry.S
@@ -51,8 +51,31 @@
 #include asm/page.h
 #include asm/irqflags.h
 
+#ifdef CONFIG_PARAVIRT
+#include asm/paravirt.h
+#else
+#define ENABLE_INTERRUPTS(x)   sti
+#define DISABLE_INTERRUPTS(x)  cli
+#define INTERRUPT_RETURN   iretq
+#define SWAPGS swapgs
+#define SYSCALL_RETURN \
+	movq	%gs:pda_oldrsp, %rsp;	\
+   swapgs; \
+   sysretq;
+#endif
+
.code64
 
+/* Currently paravirt can't handle swapgs nicely when we
+ * don't have a stack we can rely on (such as a user space
+ * stack).  So we either find a way around these or just fault
+ * and emulate if a guest tries to call swapgs directly.
+ *
+ * Either way, this is a good way to document that we don't
+ * have a reliable stack.
+ */
+#define SWAPGS_UNSAFE_STACKswapgs
+
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif 
@@ -216,14 +239,23 @@ ENTRY(system_call)
CFI_DEF_CFA rsp,PDA_STACKOFFSET
CFI_REGISTERrip,rcx
/*CFI_REGISTER  rflags,r11*/
-   swapgs
+   SWAPGS_UNSAFE_STACK
+#ifdef CONFIG_PARAVIRT
+   /*
+* A hypervisor implementation might want to use a label
+* after the swapgs, so that it can do the swapgs
+* for the guest and jump here on syscall.
+*/
+   .globl system_call_after_swapgs
+system_call_after_swapgs:
+#endif
 	movq	%rsp,%gs:pda_oldrsp
 	movq	%gs:pda_kernelstack,%rsp
/*
 * No need to follow this irqs off/on section - it's straight
 * and short:
 */
-   sti 
+   ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_ARGS 8,1
movq  %rax,ORIG_RAX-ARGOFFSET(%rsp) 
movq  %rcx,RIP-ARGOFFSET(%rsp)
@@ -245,7 +277,7 @@ ret_from_sys_call:
/* edi: flagmask */
 sysret_check:  
GET_THREAD_INFO(%rcx)
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
movl threadinfo_flags(%rcx),%edx
andl %edi,%edx
@@ -259,9 +291,7 @@ sysret_check:
CFI_REGISTERrip,rcx
RESTORE_ARGS 0,-ARG_SKIP,1
/*CFI_REGISTER  rflags,r11*/
-	movq	%gs:pda_oldrsp,%rsp
-   swapgs
-   sysretq
+   SYSCALL_RETURN
 
CFI_RESTORE_STATE
/* Handle reschedules */
@@ -270,7 +300,7 @@ sysret_careful:
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
@@ -281,7 +311,7 @@ sysret_careful:
/* Handle a signal */ 
 sysret_signal:
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
testl $(_TIF_SIGPENDING|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz1f
 
@@ -294,7 +324,7 @@ sysret_signal:
 1: movl $_TIF_NEED_RESCHED,%edi
/* Use IRET because user could have changed frame. This
   works because ptregscall_common has called FIXUP_TOP_OF_STACK. */
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check

@@ -326,7 +356,7 @@ tracesys:
  */
.globl int_ret_from_sys_call
 int_ret_from_sys_call:
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
testl $3,CS-ARGOFFSET(%rsp)
je retint_restore_args
@@ -347,20 +377,20 @@ int_careful:
bt $TIF_NEED_RESCHED,%edx
jnc  int_very_careful
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS

[PATCH 19/25 -v2] time-related functions paravirt provisions

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds provisions for time-related functions so they
can later be replaced by paravirt versions.

It basically encloses {g,s}et_wallclock inside the
already existing functions update_persistent_clock and
read_persistent_clock, and defines {s,g}et_wallclock
to the core of such functions.

The timer interrupt setup also has to be replaced.
The job is done by time_init_hook().

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/time.c |   37 +
 include/asm-x86_64/time.h |   18 ++
 2 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 6d48a4e..29fcd91 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -42,6 +42,7 @@
 #include <asm/sections.h>
 #include <linux/hpet.h>
 #include <asm/apic.h>
+#include <asm/time.h>
 #include <asm/hpet.h>
 #include <asm/mpspec.h>
 #include <asm/nmi.h>
@@ -82,18 +83,12 @@ EXPORT_SYMBOL(profile_pc);
  * sheet for details.
  */
 
-static int set_rtc_mmss(unsigned long nowtime)
+int do_set_rtc_mmss(unsigned long nowtime)
 {
int retval = 0;
int real_seconds, real_minutes, cmos_minutes;
unsigned char control, freq_select;
 
-/*
- * IRQs are disabled when we're called from the timer interrupt,
- * no need for spin_lock_irqsave()
- */
-
-	spin_lock(&rtc_lock);
 
 /*
  * Tell the clock it's being set and stop it.
@@ -143,14 +138,22 @@ static int set_rtc_mmss(unsigned long nowtime)
CMOS_WRITE(control, RTC_CONTROL);
CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-	spin_unlock(&rtc_lock);
-
return retval;
 }
 
 int update_persistent_clock(struct timespec now)
 {
-   return set_rtc_mmss(now.tv_sec);
+   int retval;
+
+/*
+ * IRQs are disabled when we're called from the timer interrupt,
+ * no need for spin_lock_irqsave()
+ */
+	spin_lock(&rtc_lock);
+	retval = set_wallclock(now.tv_sec);
+	spin_unlock(&rtc_lock);
+
+   return retval;
 }
 
 void main_timer_handler(void)
@@ -195,7 +198,7 @@ static irqreturn_t timer_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-unsigned long read_persistent_clock(void)
+unsigned long do_get_cmos_time(void)
 {
unsigned int year, mon, day, hour, min, sec;
unsigned long flags;
@@ -246,6 +249,11 @@ unsigned long read_persistent_clock(void)
return mktime(year, mon, day, hour, min, sec);
 }
 
+unsigned long read_persistent_clock(void)
+{
+   return get_wallclock();
+}
+
 /* calibrate_cpu is used on systems with fixed rate TSCs to determine
  * processor frequency */
 #define TICK_COUNT 1
@@ -365,6 +373,11 @@ static struct irqaction irq0 = {
 	.name	= "timer"
 };
 
+inline void time_init_hook(void)
+{
+   setup_irq(0, irq0);
+}
+
 void __init time_init(void)
 {
if (nohpet)
@@ -403,7 +416,7 @@ void __init time_init(void)
cpu_khz / 1000, cpu_khz % 1000);
init_tsc_clocksource();
 
-   setup_irq(0, irq0);
+   do_time_init();
 }
 
 /*
diff --git a/include/asm-x86_64/time.h b/include/asm-x86_64/time.h
new file mode 100644
index 000..9a72355
--- /dev/null
+++ b/include/asm-x86_64/time.h
@@ -0,0 +1,18 @@
+#ifndef _ASM_X86_64_TIME_H
+#define _ASM_X86_64_TIME_H
+
+inline void time_init_hook(void);
+unsigned long do_get_cmos_time(void);
+int do_set_rtc_mmss(unsigned long nowtime);
+
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else /* !CONFIG_PARAVIRT */
+
+#define get_wallclock() do_get_cmos_time()
+#define set_wallclock(x) do_set_rtc_mmss(x)
+#define do_time_init() time_init_hook()
+
+#endif /* CONFIG_PARAVIRT */
+
+#endif
-- 
1.4.4.2



[PATCH 13/25 -v2] add native functions for descriptors handling

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the basic descriptor handling into native_
functions. It is basically write_idt, load_idt, write_gdt,
load_gdt, set_ldt, store_tr, load_tls, and the ones
for updating a single entry.

In the process of doing that, we change the definition of
load_LDT_nolock, and caller sites have to be patched. We
also patch call sites that now need a typecast.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/head64.c  |2 +-
 arch/x86_64/kernel/ldt.c |6 +-
 arch/x86_64/kernel/reboot.c  |3 +-
 arch/x86_64/kernel/setup64.c |4 +-
 arch/x86_64/kernel/suspend.c |   11 ++-
 include/asm-x86_64/desc.h|  183 +++--
 include/asm-x86_64/mmu_context.h |4 +-
 7 files changed, 148 insertions(+), 65 deletions(-)

diff --git a/arch/x86_64/kernel/head64.c b/arch/x86_64/kernel/head64.c
index 6c34bdd..a0d05d7 100644
--- a/arch/x86_64/kernel/head64.c
+++ b/arch/x86_64/kernel/head64.c
@@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
	for (i = 0; i < IDT_ENTRIES; i++)
		set_intr_gate(i, early_idt_handler);
-   asm volatile("lidt %0" :: "m" (idt_descr));
+   load_idt(&idt_descr);
 
	early_printk("Kernel alive\n");
 
diff --git a/arch/x86_64/kernel/ldt.c b/arch/x86_64/kernel/ldt.c
index bc9ffd5..8e6fcc1 100644
--- a/arch/x86_64/kernel/ldt.c
+++ b/arch/x86_64/kernel/ldt.c
@@ -173,7 +173,7 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
 {
struct task_struct *me = current;
	struct mm_struct * mm = me->mm;
-   __u32 entry_1, entry_2, *lp;
+   __u32 entry_1, entry_2;
int error;
struct user_desc ldt_info;
 
@@ -202,7 +202,6 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
goto out_unlock;
}
 
-   lp = (__u32 *) ((ldt_info.entry_number << 3) + (char *) mm->context.ldt);
 
/* Allow LDTs to be cleared by the user. */
	if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
@@ -220,8 +219,7 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
 
/* Install the new entry ...  */
 install:
-   *lp = entry_1;
-   *(lp+1) = entry_2;
+   write_ldt_entry(mm->context.ldt, ldt_info.entry_number, entry_1, entry_2);
error = 0;
 
 out_unlock:
diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
index 368db2b..ebc242c 100644
--- a/arch/x86_64/kernel/reboot.c
+++ b/arch/x86_64/kernel/reboot.c
@@ -11,6 +11,7 @@
 #include <linux/sched.h>
 #include <asm/io.h>
 #include <asm/delay.h>
+#include <asm/desc.h>
 #include <asm/hw_irq.h>
 #include <asm/system.h>
 #include <asm/pgtable.h>
@@ -136,7 +137,7 @@ void machine_emergency_restart(void)
}
 
case BOOT_TRIPLE: 
-   __asm__ __volatile__("lidt (%0)": :"r" (&no_idt));
+   load_idt((struct desc_ptr *)&no_idt);
		__asm__ __volatile__("int3");
 
reboot_type = BOOT_KBD;
diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 395cf02..49f7342 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -224,8 +224,8 @@ void __cpuinit cpu_init (void)
memcpy(cpu_gdt(cpu), cpu_gdt_table, GDT_SIZE);
 
cpu_gdt_descr[cpu].size = GDT_SIZE;
-   asm volatile("lgdt %0" :: "m" (cpu_gdt_descr[cpu]));
-   asm volatile("lidt %0" :: "m" (idt_descr));
+   load_gdt(&cpu_gdt_descr[cpu]);
+   load_idt(&idt_descr);
 
memset(me-thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
syscall_init();
diff --git a/arch/x86_64/kernel/suspend.c b/arch/x86_64/kernel/suspend.c
index 573c0a6..24055b6 100644
--- a/arch/x86_64/kernel/suspend.c
+++ b/arch/x86_64/kernel/suspend.c
@@ -32,9 +32,9 @@ void __save_processor_state(struct saved_context *ctxt)
/*
 * descriptor tables
 */
-   asm volatile ("sgdt %0" : "=m" (ctxt->gdt_limit));
-   asm volatile ("sidt %0" : "=m" (ctxt->idt_limit));
-   asm volatile ("str %0"  : "=m" (ctxt->tr));
+   store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
+   store_idt((struct desc_ptr *)&ctxt->idt_limit);
+   store_tr(ctxt->tr);
 
/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
/*
@@ -91,8 +91,9 @@ void __restore_processor_state(struct saved_context *ctxt)
 * now restore the descriptor tables to their proper values
 * ltr is done i fix_processor_context().
 */
-   asm volatile ("lgdt %0" :: "m" (ctxt->gdt_limit));
-   asm volatile ("lidt %0" :: "m" (ctxt->idt_limit));
+   load_gdt((struct desc_ptr *)&ctxt->gdt_limit);
+   load_idt((struct desc_ptr *)&ctxt->idt_limit);
+
 
/*
 * segment registers
diff --git a/include/asm-x86_64/desc.h b/include/asm-x86_64/desc.h
index ac991b5..5710e52 100644
--- a/include

[PATCH 14/25 -v2] get rid of inline asm for load_cr3

2007-08-10 Thread Glauber de Oliveira Costa
Besides being inelegant, it is now even forbidden, since it can
break paravirtualized guests. load_cr3 should call write_cr3()
instead.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/mmu_context.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-x86_64/mmu_context.h b/include/asm-x86_64/mmu_context.h
index c8cdc1e..9592698 100644
--- a/include/asm-x86_64/mmu_context.h
+++ b/include/asm-x86_64/mmu_context.h
@@ -25,7 +25,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, 
struct task_struct *tsk)
 
 static inline void load_cr3(pgd_t *pgd)
 {
-   asm volatile("movq %0,%%cr3" :: "r" (__pa(pgd)) : "memory");
+   write_cr3(__pa(pgd));
 }
 
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, 
-- 
1.4.4.2



[PATCH 21/25 -v2] export cpu_gdt_descr

2007-08-10 Thread Glauber de Oliveira Costa
With paravirtualization, hypervisors need to handle the gdt,
which up to this point was only used by very early
initialization code. Hypervisors are commonly modules, so make
it an export.

[  updates from v1
   * make it an EXPORT_SYMBOL_GPL.
   Suggested by Arjan van de Ven
]

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/x8664_ksyms.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/x8664_ksyms.c b/arch/x86_64/kernel/x8664_ksyms.c
index 77c25b3..2d3932d 100644
--- a/arch/x86_64/kernel/x8664_ksyms.c
+++ b/arch/x86_64/kernel/x8664_ksyms.c
@@ -60,3 +60,9 @@ EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
 
 EXPORT_SYMBOL(_proxy_pda);
+
+#ifdef CONFIG_PARAVIRT
+extern unsigned long *cpu_gdt_descr;
+/* Virtualized guests may want to use it */
+EXPORT_SYMBOL_GPL(cpu_gdt_descr);
+#endif
-- 
1.4.4.2



[PATCH 16/25 -v2] turn page operations into native versions

2007-08-10 Thread Glauber de Oliveira Costa
This patch turns the page operations (set and make a page table)
into native_ versions. The operations themselves will later be
overridden by paravirt.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/page.h |   36 +++-
 1 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
index 88adf1a..ec8b245 100644
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -64,16 +64,42 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 
 extern unsigned long phys_base;
 
-#define pte_val(x) ((x).pte)
-#define pmd_val(x) ((x).pmd)
-#define pud_val(x) ((x).pud)
-#define pgd_val(x) ((x).pgd)
-#define pgprot_val(x)  ((x).pgprot)
+static inline unsigned long native_pte_val(pte_t pte)
+{
+   return pte.pte;
+}
+
+static inline unsigned long native_pud_val(pud_t pud)
+{
+   return pud.pud;
+}
+
+
+static inline unsigned long native_pmd_val(pmd_t pmd)
+{
+   return pmd.pmd;
+}
+
+static inline unsigned long native_pgd_val(pgd_t pgd)
+{
+   return pgd.pgd;
+}
+
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define pte_val(x) native_pte_val(x)
+#define pmd_val(x) native_pmd_val(x)
+#define pud_val(x) native_pud_val(x)
+#define pgd_val(x) native_pgd_val(x)
 
 #define __pte(x) ((pte_t) { (x) } )
 #define __pmd(x) ((pmd_t) { (x) } )
 #define __pud(x) ((pud_t) { (x) } )
 #define __pgd(x) ((pgd_t) { (x) } )
+#endif /* CONFIG_PARAVIRT */
+
+#define pgprot_val(x)  ((x).pgprot)
 #define __pgprot(x)((pgprot_t) { (x) } )
 
 #endif /* !__ASSEMBLY__ */
-- 
1.4.4.2



[PATCH 7/25 -v2] interrupt related native paravirt functions.

2007-08-10 Thread Glauber de Oliveira Costa
The interrupt initialization routine becomes native_init_IRQ and will
be overridden later in case paravirt is on.

[  updates from v1
   * After a talk with Jeremy Fitzhardinge, it turned out that making the
   interrupt vector global was not a good idea. So it is removed in this
   patch
]
Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/i8259.c |5 -
 include/asm-x86_64/irq.h   |2 ++
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index 948cae6..048e3cb 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -484,7 +484,10 @@ static int __init init_timer_sysfs(void)
 
 device_initcall(init_timer_sysfs);
 
-void __init init_IRQ(void)
+/* Overridden in paravirt.c */
+void init_IRQ(void) __attribute__((weak, alias("native_init_IRQ")));
+
+void __init native_init_IRQ(void)
 {
int i;
 
diff --git a/include/asm-x86_64/irq.h b/include/asm-x86_64/irq.h
index 5006c6e..be55299 100644
--- a/include/asm-x86_64/irq.h
+++ b/include/asm-x86_64/irq.h
@@ -46,6 +46,8 @@ static __inline__ int irq_canonicalize(int irq)
 extern void fixup_irqs(cpumask_t map);
 #endif
 
+void native_init_IRQ(void);
+
 #define __ARCH_HAS_DO_SOFTIRQ 1
 
 #endif /* _ASM_IRQ_H */
-- 
1.4.4.2



[PATCH 23/25 -v2] provide paravirt patching function

2007-08-10 Thread Glauber de Oliveira Costa
This patch introduces apply_paravirt(), a function that shall
be called by i386/alternative.c to apply replacements to
paravirt functions. It is defined as a do-nothing function
if paravirt is not enabled.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/alternative.h |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86_64/alternative.h b/include/asm-x86_64/alternative.h
index ab161e8..e69a141 100644
--- a/include/asm-x86_64/alternative.h
+++ b/include/asm-x86_64/alternative.h
@@ -143,12 +143,14 @@ static inline void alternatives_smp_switch(int smp) {}
  */
 #define ASM_OUTPUT2(a, b) a, b
 
-struct paravirt_patch;
+struct paravirt_patch_site;
 #ifdef CONFIG_PARAVIRT
-void apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end);
+void apply_paravirt(struct paravirt_patch_site *start,
+   struct paravirt_patch_site *end);
 #else
 static inline void
-apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
+apply_paravirt(struct paravirt_patch_site *start,
+   struct paravirt_patch_site *end)
 {}
 #define __parainstructions NULL
 #define __parainstructions_end NULL
-- 
1.4.4.2



[PATCH 22/25 -v2] turn privileged operation into a macro

2007-08-10 Thread Glauber de Oliveira Costa
Under paravirt, reading cr2 cannot be done directly anymore.
So wrap it in a macro, defined to the operation itself in case
paravirt is off, but to something else if we have paravirt
in the game.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/head.S |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/head.S b/arch/x86_64/kernel/head.S
index e89abcd..1bb6c55 100644
--- a/arch/x86_64/kernel/head.S
+++ b/arch/x86_64/kernel/head.S
@@ -18,6 +18,12 @@
 #include <asm/page.h>
 #include <asm/msr.h>
 #include <asm/cache.h>
+#ifdef CONFIG_PARAVIRT
+#include <asm/asm-offsets.h>
+#include <asm/paravirt.h>
+#else
+#define GET_CR2_INTO_RCX mov %cr2, %rcx
+#endif
 
 /* we are not able to switch in one step to the final KERNEL ADRESS SPACE
  * because we need identity-mapped pages.
@@ -267,7 +273,9 @@ ENTRY(early_idt_handler)
xorl %eax,%eax
movq 8(%rsp),%rsi   # get rip
movq (%rsp),%rdx
-   movq %cr2,%rcx
	/* When PARAVIRT is on, this operation may clobber rax. That is
	   safe to do, because we have just zeroed rax. */
+   GET_CR2_INTO_RCX
leaq early_idt_msg(%rip),%rdi
call early_printk
cmpl $2,early_recursion_flag(%rip)
-- 
1.4.4.2



[PATCH 17/25 -v2] introduce paravirt_release_pgd()

2007-08-10 Thread Glauber de Oliveira Costa
This patch introduces a new macro/function that informs a paravirt
guest when its page table is no longer in use and can be released.
In case we're not paravirt, just do nothing.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/pgalloc.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86_64/pgalloc.h b/include/asm-x86_64/pgalloc.h
index b467be6..dbe1267 100644
--- a/include/asm-x86_64/pgalloc.h
+++ b/include/asm-x86_64/pgalloc.h
@@ -9,6 +9,12 @@
 #define QUICK_PGD 0/* We preserve special mappings over free */
 #define QUICK_PT 1 /* Other page table pages that are zero on free */
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define paravirt_release_pgd(pgd) do { } while (0)
+#endif
+
 #define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
 #define pud_populate(mm, pud, pmd) \
@@ -100,6 +106,7 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 static inline void pgd_free(pgd_t *pgd)
 {
	BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
+   paravirt_release_pgd(pgd);
quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 }
 
-- 
1.4.4.2



[PATCH 15/25 -v2] introducing paravirt_activate_mm

2007-08-10 Thread Glauber de Oliveira Costa
This function/macro will allow a paravirt guest to be notified that we
changed the current task's cr3, and to act upon it. What to do is up to
the guest.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/mmu_context.h |   17 ++---
 1 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86_64/mmu_context.h b/include/asm-x86_64/mmu_context.h
index 9592698..77ce047 100644
--- a/include/asm-x86_64/mmu_context.h
+++ b/include/asm-x86_64/mmu_context.h
@@ -7,7 +7,16 @@
 #include <asm/pda.h>
 #include <asm/pgtable.h>
 #include <asm/tlbflush.h>
+
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
 #include <asm-generic/mm_hooks.h>
+static inline void paravirt_activate_mm(struct mm_struct *prev,
+   struct mm_struct *next)
+{
+}
+#endif /* CONFIG_PARAVIRT */
 
 /*
  * possibly do the LDT unload here?
@@ -67,8 +76,10 @@ static inline void switch_mm(struct mm_struct *prev, struct 
mm_struct *next,
	asm volatile("movl %0,%%fs"::"r"(0));  \
 } while(0)
 
-#define activate_mm(prev, next) \
-   switch_mm((prev),(next),NULL)
-
+#define activate_mm(prev, next)\
+do {   \
+   paravirt_activate_mm(prev, next);   \
+   switch_mm((prev),(next),NULL);  \
+} while (0)
 
 #endif
-- 
1.4.4.2



[PATCH 9/25 -v2] report ring kernel is running without paravirt

2007-08-10 Thread Glauber de Oliveira Costa
When paravirtualization is disabled, the kernel is always
running at ring 0. So report it in the appropriate macro.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 include/asm-x86_64/segment.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86_64/segment.h b/include/asm-x86_64/segment.h
index 04b8ab2..240c1bf 100644
--- a/include/asm-x86_64/segment.h
+++ b/include/asm-x86_64/segment.h
@@ -50,4 +50,8 @@
 #define GDT_SIZE (GDT_ENTRIES * 8)
 #define TLS_SIZE (GDT_ENTRY_TLS_ENTRIES * 8) 
 
+#ifndef CONFIG_PARAVIRT
+#define get_kernel_rpl()  0
+#endif
+
 #endif
-- 
1.4.4.2



[PATCH 20/25 -v2] replace syscall_init

2007-08-10 Thread Glauber de Oliveira Costa
This patch replaces syscall_init with x86_64_syscall_init.
The former will later be replaced by a paravirt version
in case paravirt is on.

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/setup64.c |8 +++-
 include/asm-x86_64/proto.h   |3 +++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 49f7342..723822c 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
__attribute__((section(".bss.page_aligned")));
 extern asmlinkage void ignore_sysret(void);
 
 /* May not be marked __init: used by software suspend */
-void syscall_init(void)
+void x86_64_syscall_init(void)
 {
/* 
 * LSTAR and STAR live in a bit strange symbiosis.
@@ -172,6 +172,12 @@ void syscall_init(void)
wrmsrl(MSR_SYSCALL_MASK, EF_TF|EF_DF|EF_IE|0x3000); 
 }
 
+/* Overriden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) syscall_init(void)
+{
+   x86_64_syscall_init();
+}
+
 void __cpuinit check_efer(void)
 {
unsigned long efer;
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 31f20ad..77ed2de 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -18,6 +18,9 @@ extern void init_memory_mapping(unsigned long start, unsigned 
long end);
 
 extern void system_call(void); 
 extern int kernel_syscall(void);
+#ifdef CONFIG_PARAVIRT
+extern void x86_64_syscall_init(void);
+#endif
 extern void syscall_init(void);
 
 extern void ia32_syscall(void);
-- 
1.4.4.2



[PATCH 24/25 -v2] paravirt hooks for arch initialization

2007-08-10 Thread Glauber de Oliveira Costa
This patch adds paravirtualization hooks to the arch initialization
process. paravirt_arch_setup() lets the guest issue any specific
initialization routine.

Also, there is memory_setup(), so guests can handle it their way.

[  updates from v1
   * Don't use a separate ebda pv hook (Jeremy/Andi)
   * Make paravirt_setup_arch() void (Andi)
]

Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/kernel/setup.c |   32 +++-
 include/asm-x86_64/e820.h  |6 ++
 include/asm-x86_64/page.h  |1 +
 3 files changed, 38 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index af838f6..19e0d90 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -44,6 +44,7 @@
 #include linux/dmi.h
 #include linux/dma-mapping.h
 #include linux/ctype.h
+#include <linux/uaccess.h>
 
 #include <asm/mtrr.h>
 #include <asm/uaccess.h>
@@ -65,6 +66,12 @@
 #include <asm/sections.h>
 #include <asm/dmi.h>
 
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#else
+#define paravirt_arch_setup()  do {} while (0)
+#endif
+
 /*
  * Machine setup..
  */
@@ -208,6 +215,16 @@ static void discover_ebda(void)
 * 4K EBDA area at 0x40E
 */
ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
+   /*
+* There can be some situations, like paravirtualized guests,
+* in which there is no available ebda information. In such
+* cases, just skip it.
+*/
+   if (!ebda_addr) {
+   ebda_size = 0;
+   return;
+   }
+
	ebda_addr <<= 4;
 
ebda_size = *(unsigned short *)__va(ebda_addr);
@@ -221,6 +238,13 @@ static void discover_ebda(void)
ebda_size = 64*1024;
 }
 
+/* Overridden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) memory_setup(void)
+{
+   return setup_memory_region();
+}
+
+
 void __init setup_arch(char **cmdline_p)
 {
	printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -231,12 +255,18 @@ void __init setup_arch(char **cmdline_p)
saved_video_mode = SAVED_VIDEO_MODE;
bootloader_type = LOADER_TYPE;
 
+   /*
+* By returning non-zero here, a paravirt impl can choose to
+* skip the rest of the setup process
+*/
+   paravirt_arch_setup();
+
 #ifdef CONFIG_BLK_DEV_RAM
	rd_image_start = RAMDISK_FLAGS & RAMDISK_IMAGE_START_MASK;
	rd_prompt = ((RAMDISK_FLAGS & RAMDISK_PROMPT_FLAG) != 0);
	rd_doload = ((RAMDISK_FLAGS & RAMDISK_LOAD_FLAG) != 0);
 #endif
-   setup_memory_region();
+   memory_setup();
copy_edd();
 
if (!MOUNT_ROOT_RDONLY)
diff --git a/include/asm-x86_64/e820.h b/include/asm-x86_64/e820.h
index 3486e70..2ced3ba 100644
--- a/include/asm-x86_64/e820.h
+++ b/include/asm-x86_64/e820.h
@@ -20,7 +20,12 @@
 #define E820_ACPI  3
 #define E820_NVS   4
 
+#define MAP_TYPE_STR   "BIOS-e820"
+
 #ifndef __ASSEMBLY__
+
+void native_ebda_info(unsigned *addr, unsigned *size);
+
 struct e820entry {
u64 addr;   /* start of memory segment */
u64 size;   /* size of memory segment */
@@ -56,6 +61,7 @@ extern struct e820map e820;
 
 extern unsigned ebda_addr, ebda_size;
 extern unsigned long nodemap_addr, nodemap_size;
+
 #endif/*!__ASSEMBLY__*/
 
 #endif/*__E820_HEADER*/
diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
index ec8b245..8c40fb2 100644
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -149,6 +149,7 @@ extern unsigned long __phys_addr(unsigned long);
 #define __boot_pa(x)   __pa(x)
 #ifdef CONFIG_FLATMEM
 #define pfn_valid(pfn) ((pfn) < end_pfn)
+
 #endif
 
 #define virt_to_page(kaddr)	pfn_to_page(__pa(kaddr) >> PAGE_SHIFT)
-- 
1.4.4.2



[PATCH 25/25 -v2] add paravirtualization support for x86_64

2007-08-10 Thread Glauber de Oliveira Costa
This is, finally, the patch we were all looking for. This
patch adds a paravirt.h header with the definition of paravirt_ops
struct. Also, it defines a bunch of inline functions that will
replace, or hook, the other calls. Every one of those functions
adds an entry in the parainstructions section (see vmlinux.lds.S).
Those entries can then be used to runtime-patch the paravirt_ops
functions.

paravirt.c contains implementations of paravirt functions that
are used natively, such as native_patch. It also fills the
paravirt_ops structure with the whole lot of functions that
were (re)defined throughout this patch set.

There are also changes in asm-offsets.c. paravirt.h needs it
to find out the offsets into the structure of functions
such as irq_enable, used in assembly files.

[  updates from v1
   * make PARAVIRT hidden in Kconfig (Andi Kleen)
   * cleanups in paravirt.h (Andi Kleen)
   * modifications needed to accommodate other parts of the
   patch that changed, such as getting rid of ebda_info
   * put the integers at struct paravirt_ops at the end
   (Jeremy)
]
Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED]
Signed-off-by: Steven Rostedt [EMAIL PROTECTED]
---
 arch/x86_64/Kconfig  |   11 +++
 arch/x86_64/kernel/Makefile  |1 +
 arch/x86_64/kernel/asm-offsets.c |   14 ++
 arch/x86_64/kernel/vmlinux.lds.S |6 ++
 include/asm-x86_64/smp.h |2 +-
 5 files changed, 33 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index ffa0364..00b2fc9 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -373,6 +373,17 @@ config NODES_SHIFT
 
 # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig.
 
+config PARAVIRT
+   bool
+   depends on EXPERIMENTAL
+   help
+ Paravirtualization is a way of running multiple instances of
+ Linux on the same machine, under a hypervisor.  This option
+ changes the kernel so it can modify itself when it is run
+ under a hypervisor, improving performance significantly.
+ However, when run without a hypervisor the kernel is
+ theoretically slower.  If in doubt, say N.
+
 config X86_64_ACPI_NUMA
bool ACPI NUMA detection
depends on NUMA
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index ff5d8c9..120467f 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_X86_VSMP)+= vsmp.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit.o
 
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_PCI)  += early-quirks.o
 
diff --git a/arch/x86_64/kernel/asm-offsets.c b/arch/x86_64/kernel/asm-offsets.c
index 778953b..f5eff70 100644
--- a/arch/x86_64/kernel/asm-offsets.c
+++ b/arch/x86_64/kernel/asm-offsets.c
@@ -15,6 +15,9 @@
 #include <asm/segment.h>
 #include <asm/thread_info.h>
 #include <asm/ia32.h>
+#ifdef CONFIG_PARAVIRT
+#include <asm/paravirt.h>
+#endif
 
 #define DEFINE(sym, val) \
 asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -72,6 +75,17 @@ int main(void)
   offsetof (struct rt_sigframe32, uc.uc_mcontext));
BLANK();
 #endif
+#ifdef CONFIG_PARAVIRT
+#define ENTRY(entry) DEFINE(PARAVIRT_ ## entry, offsetof(struct paravirt_ops, entry))
+   ENTRY(paravirt_enabled);
+   ENTRY(irq_disable);
+   ENTRY(irq_enable);
+   ENTRY(syscall_return);
+   ENTRY(iret);
+   ENTRY(read_cr2);
+   ENTRY(swapgs);
+   BLANK();
+#endif
DEFINE(pbe_address, offsetof(struct pbe, address));
DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address));
DEFINE(pbe_next, offsetof(struct pbe, next));
diff --git a/arch/x86_64/kernel/vmlinux.lds.S b/arch/x86_64/kernel/vmlinux.lds.S
index ba8ea97..c3fce85 100644
--- a/arch/x86_64/kernel/vmlinux.lds.S
+++ b/arch/x86_64/kernel/vmlinux.lds.S
@@ -185,6 +185,12 @@ SECTIONS
   .altinstr_replacement : AT(ADDR(.altinstr_replacement) - LOAD_OFFSET) {
*(.altinstr_replacement)
   }
+  . = ALIGN(8);
+  .parainstructions : AT(ADDR(.parainstructions) - LOAD_OFFSET) {
+  __parainstructions = .;
+   *(.parainstructions)
+  __parainstructions_end = .;
+  }
   /* .exit.text is discard at runtime, not link time, to deal with references
  from .altinstructions and .eh_frame */
   .exit.text : AT(ADDR(.exit.text) - LOAD_OFFSET) { *(.exit.text) }
diff --git a/include/asm-x86_64/smp.h b/include/asm-x86_64/smp.h
index 6b4..403901b 100644
--- a/include/asm-x86_64/smp.h
+++ b/include/asm-x86_64/smp.h
@@ -22,7 +22,7 @@ extern int disable_apic;
 #ifdef CONFIG_PARAVIRT
 #include <asm/paravirt.h>
 void native_flush_tlb_others(cpumask_t cpumask, struct mm_struct *mm,
-   unsigned long va);
+   unsigned long va);
 #else
 #define

Re: [PATCH 3/25] [PATCH] irq_flags / halt routines

2007-08-09 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > +#ifdef CONFIG_PARAVIRT
> > +#include 
> > +#  ifdef CONFIG_X86_VSMP
> > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > +{
> > + return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
> > +}
> > +#  else
> > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > +{
> > + return !(flags & X86_EFLAGS_IF);
> > +}
> > +#  endif
>
> You should really turn the vsmp special case into a paravirt client first
> instead of complicating all this even more.
Looking at it more carefully, it turns out that those functions are
not eligible for being paravirt clients. They do no privileged
operation at all. In fact, all they do is bit manipulation.
That said, the code got a little bit cleaner by moving them down, and so I did.

But later on, you voiced concern about making CONFIG_PARAVIRT depend
on !VSMP. (and said it would be okay, because these functions would be
paravirt clients: but they won't) Given this updated picture, what's
your position about this?

Again, as they don't do anything besides bit manipulation, I don't
think they will stop VSMP from working with PARAVIRT.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."


Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64

2007-08-09 Thread Glauber de Oliveira Costa
On 8/9/07, Jeremy Fitzhardinge <[EMAIL PROTECTED]> wrote:
> >
> > Does it really matter?
> >
>
> Well, yes, if alignment is an issue.
Of course. But the question arises from the context that they are both
together at the beginning, so they are not making anybody non-aligned.
Then the question: why would putting them at the end be different from
putting them _together_, aligned, at the beginning?



Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64

2007-08-09 Thread Glauber de Oliveira Costa
> > + case PARAVIRT_PATCH(make_pgd):
> > + case PARAVIRT_PATCH(pgd_val):
> > + case PARAVIRT_PATCH(make_pte):
> > + case PARAVIRT_PATCH(pte_val):
> > + case PARAVIRT_PATCH(make_pmd):
> > + case PARAVIRT_PATCH(pmd_val):
> > + case PARAVIRT_PATCH(make_pud):
> > + case PARAVIRT_PATCH(pud_val):
> > + /* These functions end up returning what
> > +they're passed in the first argument */
> >
>
> Is this still true with 64-bit?  Either way, I don't think its worth
> having this here.  The damage to codegen around all those sites has
> already happened, and the additional cost of a noop direct call is
> pretty trivial.  I think this is a nanooptimisation which risks more
> problems than it could possibly be worth.

No it is not. But it is just the comment that is broken. (I forgot to
update it). The case here is that they put in rax what they receive
in rdi.

> > + case PARAVIRT_PATCH(set_pte):
> > + case PARAVIRT_PATCH(set_pmd):
> > + case PARAVIRT_PATCH(set_pud):
> > + case PARAVIRT_PATCH(set_pgd):
> > + /* These functions end up storing the second
> > +  * argument in the location pointed by the first */
> > + ret = paravirt_patch_store_reg(insns, len);
> > + break;
> >
>
> Ditto, really.  Do this in a later patch if it actually seems to help.

Okay, I can remove them both.

> > +/*
> > + * integers must be use with care here. They can break the 
> > PARAVIRT_PATCH(x)
> > + * macro, that divides the offset in the structure by 8, to get a number
> > + * associated with the hook. Dividing by four would be a solution, but it
> > + * would limit the future growth of the structure if needed.
> >
>
> Why not just stick them at the end of the structure?

Does it really matter?

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/25] [PATCH] irq_flags / halt routines

2007-08-09 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > +#ifdef CONFIG_PARAVIRT
> > +#include <asm/paravirt.h>
> > +#  ifdef CONFIG_X86_VSMP
> > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > +{
> > +	return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
> > +}
> > +#  else
> > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > +{
> > +	return !(flags & X86_EFLAGS_IF);
> > +}
> > +#  endif
>
> You should really turn the vsmp special case into a paravirt client first
> instead of complicating all this even more.
Looking at it more carefully, it turns out that those functions are
not eligible for being paravirt clients. They do no privileged
operation at all. In fact, all they do is bit manipulation.
That said, the code got a little bit cleaner by moving them down, and so I did.

But later on, you voiced concern about making CONFIG_PARAVIRT depend
on !VSMP (and said it would be okay, because these functions would be
paravirt clients: but they won't). Given this updated picture, what's
your position about this?

Again, as they don't do anything besides bit manipulation, I don't
think they will stop VSMP from working with PARAVIRT.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 4/25] [PATCH] Add debugreg/load_rsp native hooks

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> >
> > @@ -264,13 +270,64 @@ struct thread_struct {
> >   set_fs(USER_DS);  
> >\
> >  } while(0)
> >
> > -#define get_debugreg(var, register)  \
> > - __asm__("movq %%db" #register ", %0"\
> > - :"=r" (var))
> > -#define set_debugreg(value, register)\
> > - __asm__("movq %0,%%db" #register\
> > - : /* no output */   \
> > - :"r" (value))
> > +static inline unsigned long native_get_debugreg(int regno)
> > +{
> > + unsigned long val;
>
> It would be better to have own functions for each debug register I think
>
Andi, you mean:
a) split the debugreg paravirt_ops in various
paravirt_ops.set/get_debugreg{X,Y,Z...}, and then join them together
in a set/get_debugreg(a,b) to keep the current interface. OR
b) keep one paravirt_ops for each set/get_debugreg, then split then in
various set/get_debugregX(a, b), changing the current interface, OR
c) plit the debugreg paravirt_ops in various
paravirt_ops.set/get_debugreg{X,Y,Z...}, and give each its own
function set/get_debugregX(a, b), again, changing the current
interface, OR
d) None of the above?

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/7] Change lguest launcher to use asm generic include

2007-08-08 Thread Glauber de Oliveira Costa
> --- a/Documentation/lguest/lguest.c
> +++ b/Documentation/lguest/lguest.c
> @@ -46,7 +46,7 @@ typedef uint32_t u32;
>  typedef uint16_t u16;
>  typedef uint8_t u8;
>  #include "../../include/linux/lguest_launcher.h"
> -#include "../../include/asm-i386/e820.h"
> +#include "../../include/asm/e820.h"

Couldn't we add the ../../../../../../../etc to the Makefile and avoid it here?

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 2/7] Added generic lg.h in lguest directory.

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Steven Rostedt <[EMAIL PROTECTED]> wrote:
> Add a generic lg.h file to call the architecture specific one.
>
> diff --git a/drivers/lguest/lg.h b/drivers/lguest/lg.h
> new file mode 100644
> index 000..4c4356e
> --- /dev/null
> +++ b/drivers/lguest/lg.h
> @@ -0,0 +1,3 @@
> +#ifdef CONFIG_X86_32
> +#include "i386/lg.h"
> +#endif

Wouldn't it be cleaner to do something like the asm/ includes?
I understand that lguest now lives in drivers/ and so we don't put
headers directly in asm-i386 , but we could come up with a similar
thing here.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Introducing paravirt_ops for x86_64

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Nakajima, Jun <[EMAIL PROTECTED]> wrote:
> > So, unless I'm very wrong,  it only makes sense to talk about not
> > supporting large pages in the guest level. But it is not a
> > paravirt_ops problem.
>
> Some MMU-related PV techiniques (including Xen, and direct paging mode
> for Xen/KVM) need to write-protect page tables, avoiding to use 2MB
> pages when mapping page tables. Looks like you did not, and that
> exaplains why the patches are missing the relevant (many) paravirt_ops
> in include/asm-x86_64/pgalloc.h, for example, compared with the i386
> tree.
I see.

I'll address this in the next version of the patch.


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: Introducing paravirt_ops for x86_64

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Nakajima, Jun <[EMAIL PROTECTED]> wrote:
> Glauber de Oliveira Costa wrote:
> > Hi folks,
> >
> > After some time away from it, and a big rebase as a consequence, here
> is
> > the updated version of paravirt_ops for x86_64, heading to inclusion.
> >
> > Your criticism is of course, very welcome.
> >
> > Have fun
>
> Do you assume that the kernel ougtht to use 2MB pages for its mappings
> (e.g. initilal text/data,  direct mapping of physical memory) under your
> paravirt_ops?  As far as I look at the patches, I don't find one.

I don't think how it could be relevant here. lguest kernel does use
2MB pages, and it goes smootly. For 2MB pages, we will update the page
tables in the very same way, and in the very places we did before.
Just that the operations can now be overwritten.

So, unless I'm very wrong,  it only makes sense to talk about not
supporting large pages in the guest level. But it is not a
paravirt_ops problem.


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > > Similar.
> > I don't think so. They are live here, but restore_args follows, so we
> > can safely clobber anything here. Right?
>
> The non argument registers cannot be clobbered.
But they are not. Yeah, I omitted it in the changelog (it is in a
comment at paravirt.h); I should probably include it. The CLBR_ defines
only account for the caller-saved registers.


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> Is this really synced with the latest version of the i386 code?
Roasted already commented on this. I will check out and change it here.

>
> > +#ifdef CONFIG_PARAVIRT
> > +#include 
> > +#endif
>
>
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
> > +#include 
>
>
> Are the includes really all needed?
delay.h is not needed anymore. Most of them could maybe be moved to
paravirt.c, which is the one that really needs all the native_
things. Yeah, it will be better code this way; will change.

>
> > + if (opfunc == NULL)
> > + /* If there's no function, patch it with a ud2a (BUG) */
> > + ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a);
>
> This will actually give corrupted BUGs because you don't supply
> the full inline BUG header. Perhaps another trap would be better.

You mean this:
> > +#include 
?

>
> > +EXPORT_SYMBOL(paravirt_ops);
>
> Definitely _GPL at least.
Sure.

>
> Should be native_paravirt_ops I guess

makes sense.

> > +
> > + * This generates an indirect call based on the operation type number.
>
> The macros here don't
>

> > +static inline unsigned long read_msr(unsigned int msr)
> > +{
> > + int __err;
>
> No need for __ in inlines
Right. Thanks.


> > +/* The paravirtualized I/O functions */
> > +static inline void slow_down_io(void) {
>
> I doubt this needs to be inline and it's large
On a second look, i386 has such a function in io.h because they need
slow_down_io in a bunch of I/O instructions. It seems that we do not.
Could we just get rid of it, then?

> > + __asm__ __volatile__(paravirt_alt(PARAVIRT_CALL)
>
> No __*__ in new code please

Yup, will fix.

> > +  : "=a"(f)
> > +  : paravirt_type(save_fl),
> > +paravirt_clobber(CLBR_RAX)
> > +  : "memory", "cc");
> > + return f;
> > +}
> > +
> > +static inline void raw_local_irq_restore(unsigned long f)
> > +{
> > + __asm__ __volatile__(paravirt_alt(PARAVIRT_CALL)
> > +  :
> > +  : "D" (f),
>
> Have you investigated if a different input register generates better/smaller
> code? I would assume rdi to be usually used already for the caller's
> arguments so it will produce spilling
>
> Similar for the rax return in the other functions.
I don't think we can do it differently. These functions can be patched, and
if that happens, they will put their return value in rax. So we'd better
expect it there.
The same goes for rdi, as they will expect the value to be there as an input.

I don't think it will spill in the normal case, as rdi is already the
parameter. So the compiler will just leave it there, untouched.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 21/25] [PATCH] export cpu_gdt_descr

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Arjan van de Ven <[EMAIL PROTECTED]> wrote:
> On Wed, 2007-08-08 at 01:19 -0300, Glauber de Oliveira Costa wrote:
> > With paravirualization, hypervisors needs to handle the gdt,
> > that was right to this point only used at very early
> > inialization code. Hypervisors are commonly modules, so make
> > it an export
> >
>
> the GDT is so deeply internal that this really ought to be a _GPL
> export..

Yes, Arjan, I agree. Thanks for noticing it.


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/25] [PATCH] irq_flags / halt routines

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Wednesday 08 August 2007 16:10:28 Glauber de Oliveira Costa wrote:
> > On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> > >
> > > > +#ifdef CONFIG_PARAVIRT
> > > > +#include <asm/paravirt.h>
> > > > +#  ifdef CONFIG_X86_VSMP
> > > > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > > > +{
> > > > + return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
> > > > +}
> > > > +#  else
> > > > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > > > +{
> > > > + return !(flags & X86_EFLAGS_IF);
> > > > +}
> > > > +#  endif
> > >
> > > You should really turn the vsmp special case into a paravirt client first
> > > instead of complicating all this even more.
> >
> > By "client" you mean a user of the paravirt interface?
>
> Yes

Ohhh, I see. You're talking about just the first piece of code inside
the #ifdef CONFIG_PARAVIRT. In this case yes, I agree with you. It can
be done.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 13/25] [PATCH] turn msr.h functions into native versions

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
> On Wednesday 08 August 2007 06:19, Glauber de Oliveira Costa wrote:
>
> > +static __always_inline long long vget_cycles_sync(void)
>
> Why is there a copy of this function now? That seems wrong

Yeah, the other one is in i386 headers, so we probably want to leave
it there. One option is to move get_cycles_sync to x86_64 headers, and
then #ifdef just the offending part.

> > + native_read_tscp();
>
> The instruction is called rdtscp not read_tscp. Please follow that

Although the operation consists of reading tscp, I chose this to be
consistent with i386, but I have no special feelings about it. I'm
okay with changing it if you prefer.

> > +#define rdtsc(low, high) \
>
> This macro can be probably eliminated, no callers in kernel
>
>
Fine.

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 3/25] [PATCH] irq_flags / halt routines

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > +#ifdef CONFIG_PARAVIRT
> > +#include <asm/paravirt.h>
> > +#  ifdef CONFIG_X86_VSMP
> > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > +{
> > + return !(flags & X86_EFLAGS_IF) || (flags & X86_EFLAGS_AC);
> > +}
> > +#  else
> > +static inline int raw_irqs_disabled_flags(unsigned long flags)
> > +{
> > + return !(flags & X86_EFLAGS_IF);
> > +}
> > +#  endif
>
> You should really turn the vsmp special case into a paravirt client first
> instead of complicating all this even more.

By "client" you mean a user of the paravirt interface?

> > +#ifndef CONFIG_PARAVIRT
> > +#define raw_safe_halt	native_raw_safe_halt
> > +#define halt     native_halt
> > +#endif /* ! CONFIG_PARAVIRT */
>
> This seems inconsistent
Sorry, Andi, I can't see why. Could you elaborate?


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-08 Thread Glauber de Oliveira Costa
On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > -static void discover_ebda(void)
> > +void native_ebda_info(unsigned *addr, unsigned *size)
>
> I guess it would be better to use the resources frame work here.
> Before checking EBDA check if it is already reserved. Then lguest/Xen
> can reserve these areas and stop using it.
Let's make sure I understand: so you suggest skipping the discovery
altogether in case it is already reserved?

>
> > +/* Overridden in paravirt.c if CONFIG_PARAVIRT */
> > +void __attribute__((weak)) memory_setup(void)
> > +{
> > +   return setup_memory_region();
> > +}
> > +
> > +
> >  void __init setup_arch(char **cmdline_p)
> >  {
> >   printk(KERN_INFO "Command line: %s\n", boot_command_line);
> > @@ -231,12 +255,19 @@ void __init setup_arch(char **cmdline_p)
> >   saved_video_mode = SAVED_VIDEO_MODE;
> >   bootloader_type = LOADER_TYPE;
> >
> > + /*
> > +  * By returning non-zero here, a paravirt impl can choose to
> > +  * skip the rest of the setup process
> > +  */
> > + if (paravirt_arch_setup())
> > + return;
>
> Sorry, but that's an extremly ugly and clumpsy interface and will lead
> to extensive code duplication in hypervisors because so much code
> is disabled.

We can just wipe out the return value right now. Note that it was a
choice, it would only lead to code duplication if the hypervisor
wanted it. But yeah, I understand your concern. They may chose to
return 1 here just to change some tiny thing in the bottom.

I don't know exactly what other kinds of hooks we could put there.
lguest surely didn't need any. Are you okay with just turning it into
void by now ?

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S

2007-08-08 Thread Glauber de Oliveira Costa
> > ENTRY adds alignment. Why do you need that export anyways?
>
> The paravirt ops struct points to it.

But the paravirt_ops probably won't need it as an export. So I guess
andi is right.


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 18/25] [PATCH] turn priviled operations into macros in entry.S

2007-08-08 Thread Glauber de Oliveira Costa
Thank you for the attention, andi

let's go:

On 8/8/07, Andi Kleen <[EMAIL PROTECTED]> wrote:
>
> > +#define SYSRETQ  \
> > + movq%gs:pda_oldrsp,%rsp;\
> > + swapgs; \
> > + sysretq;
>
> When the macro does more than sysret it should have a different
> name
That's fair. Again, suggestions are welcome. Maybe SYSCALL_RETURN ?

> >   */
> >   .globl int_ret_from_sys_call
> >  int_ret_from_sys_call:
> > - cli
> > + DISABLE_INTERRUPTS(CLBR_ANY)
>
> ANY? There are certainly some registers alive at this point like rax
yes, this one is wrong. Thanks for the catch

> >  retint_restore_args:
> > - cli
> > + DISABLE_INTERRUPTS(CLBR_ANY)
>
> Similar.
I don't think so. They are live here, but restore_args follows, so we
can safely clobber anything here. Right?

>
> >   /*
> >* The iretq could re-enable interrupts:
> >*/
> > @@ -566,10 +587,14 @@ retint_restore_args:
> >  restore_args:
> >   RESTORE_ARGS 0,8,0
> >  iret_label:
> > - iretq
> > +#ifdef CONFIG_PARAVIRT
> > + INTERRUPT_RETURN
> > +ENTRY(native_iret)
>
> ENTRY adds alignment. Why do you need that export anyways?
Just went on the flow. Will change.

> > +#endif
> > +1:   iretq
> >
> >   .section __ex_table,"a"
> > - .quad iret_label,bad_iret
> > + .quad 1b, bad_iret
>
> iret_label seems more expressive to me than 1

fair.

> > + ENABLE_INTERRUPTS(CLBR_NONE)
>
> In many of the CLBR_NONEs there are actually some registers free;
> but it might be safer to keep it this way. But if some client can get
> significantly better code with one or two free registers it might
> be worthwhile to investigate.
That's exactly what I had in mind. I'd highly prefer to keep it this
way until it is merged, and we are sure all the rest is stable

> > - swapgs
> > + SWAPGS_NOSTACK
>
> There's still stack here

Yes, but it is not safe to use. I think Roasted addressed it later on.

> >  paranoid_restore\trace:
> >   RESTORE_ALL 8
> > - iretq
> > + INTERRUPT_RETURN
>
> I suspect Xen will need much more changes anyways because of its
> ring 3 guest. Are these changes sufficient for lguest?

Yes, they are sufficient for lguest.
Does any xen folks have any comment?

-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH 5/25] [PATCH] native versions for system.h functions

2007-08-08 Thread Glauber de Oliveira Costa
Okay, this one is obviously wrong, my fault (it doesn't do what it
says it does in the body of the e-mail). Resending...


-- 
Glauber de Oliveira Costa.
"Free as in Freedom"
http://glommer.net

"The less confident you are, the more serious you have to act."
 arch/x86_64/kernel/tce.c|2 -
 arch/x86_64/mm/pageattr.c   |2 -
 include/asm-x86_64/system.h |   55 +---
 3 files changed, 39 insertions(+), 20 deletions(-)


[PATCH 23/25] [PATCH] paravirt hooks for arch initialization

2007-08-08 Thread Glauber de Oliveira Costa
This patch adds paravirtualization hooks to the arch initialization
process. paravirt_arch_setup() lets the guest issue any specific
initialization routine, and skip all the rest if it sees fit, which
it signals by a proper return value.

In case the initialization continues, we hook at least memory_setup(),
so it can handle it in its own way.

The hypervisor can make its own EBDA mapping visible by providing
its custom ebda_info function.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/setup.c |   41 -
 include/asm-x86_64/e820.h  |6 ++
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index af838f6..8e58a5d 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -65,6 +65,12 @@
 #include 
 #include 
 
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define paravirt_arch_setup() 0
+#endif
+
 /*
  * Machine setup..
  */
@@ -201,17 +207,28 @@ static inline void copy_edd(void)
 unsigned __initdata ebda_addr;
 unsigned __initdata ebda_size;
 
-static void discover_ebda(void)
+void native_ebda_info(unsigned *addr, unsigned *size)
 {
/*
 * there is a real-mode segmented pointer pointing to the 
 * 4K EBDA area at 0x40E
 */
-   ebda_addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
-   ebda_addr <<= 4;
+   *addr = *(unsigned short *)__va(EBDA_ADDR_POINTER);
+   *addr <<= 4;
+
+   *size = *(unsigned short *)__va(*addr);
+}
 
-   ebda_size = *(unsigned short *)__va(ebda_addr);
+/* Overriden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) ebda_info(unsigned *addr, unsigned *size)
+{
+   native_ebda_info(addr, size);
+}
 
+static void discover_ebda(void)
+{
+
   ebda_info(&ebda_addr, &ebda_size);
/* Round EBDA up to pages */
if (ebda_size == 0)
ebda_size = 1;
@@ -221,6 +238,13 @@ static void discover_ebda(void)
ebda_size = 64*1024;
 }
 
+/* Overridden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) memory_setup(void)
+{
+   return setup_memory_region();
+}
+
+
 void __init setup_arch(char **cmdline_p)
 {
printk(KERN_INFO "Command line: %s\n", boot_command_line);
@@ -231,12 +255,19 @@ void __init setup_arch(char **cmdline_p)
saved_video_mode = SAVED_VIDEO_MODE;
bootloader_type = LOADER_TYPE;
 
+   /*
+* By returning non-zero here, a paravirt impl can choose to
+* skip the rest of the setup process
+*/
+   if (paravirt_arch_setup())
+   return;
+
 #ifdef CONFIG_BLK_DEV_RAM
rd_image_start = RAMDISK_FLAGS & RAMDISK_IMAGE_START_MASK;
rd_prompt = ((RAMDISK_FLAGS & RAMDISK_PROMPT_FLAG) != 0);
rd_doload = ((RAMDISK_FLAGS & RAMDISK_LOAD_FLAG) != 0);
 #endif
-   setup_memory_region();
+   memory_setup();
copy_edd();
 
if (!MOUNT_ROOT_RDONLY)
diff --git a/include/asm-x86_64/e820.h b/include/asm-x86_64/e820.h
index 3486e70..2ced3ba 100644
--- a/include/asm-x86_64/e820.h
+++ b/include/asm-x86_64/e820.h
@@ -20,7 +20,12 @@
 #define E820_ACPI  3
 #define E820_NVS   4
 
+#define MAP_TYPE_STR   "BIOS-e820"
+
 #ifndef __ASSEMBLY__
+
+void native_ebda_info(unsigned *addr, unsigned *size);
+
 struct e820entry {
u64 addr;   /* start of memory segment */
u64 size;   /* size of memory segment */
@@ -56,6 +61,7 @@ extern struct e820map e820;
 
 extern unsigned ebda_addr, ebda_size;
 extern unsigned long nodemap_addr, nodemap_size;
+
 #endif/*!__ASSEMBLY__*/
 
 #endif/*__E820_HEADER*/
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 17/25] [PATCH] turn page operations into native versions

2007-08-08 Thread Glauber de Oliveira Costa
This patch turns the page operations (set and make a page table)
into native_ versions. The operations themselves will later be
overridden by paravirt.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/page.h |   36 +++-
 1 files changed, 31 insertions(+), 5 deletions(-)

diff --git a/include/asm-x86_64/page.h b/include/asm-x86_64/page.h
index 88adf1a..ec8b245 100644
--- a/include/asm-x86_64/page.h
+++ b/include/asm-x86_64/page.h
@@ -64,16 +64,42 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 
 extern unsigned long phys_base;
 
-#define pte_val(x) ((x).pte)
-#define pmd_val(x) ((x).pmd)
-#define pud_val(x) ((x).pud)
-#define pgd_val(x) ((x).pgd)
-#define pgprot_val(x)  ((x).pgprot)
+static inline unsigned long native_pte_val(pte_t pte)
+{
+   return pte.pte;
+}
+
+static inline unsigned long native_pud_val(pud_t pud)
+{
+   return pud.pud;
+}
+
+
+static inline unsigned long native_pmd_val(pmd_t pmd)
+{
+   return pmd.pmd;
+}
+
+static inline unsigned long native_pgd_val(pgd_t pgd)
+{
+   return pgd.pgd;
+}
+
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define pte_val(x) native_pte_val(x)
+#define pmd_val(x) native_pmd_val(x)
+#define pud_val(x) native_pud_val(x)
+#define pgd_val(x) native_pgd_val(x)
 
 #define __pte(x) ((pte_t) { (x) } )
 #define __pmd(x) ((pmd_t) { (x) } )
 #define __pud(x) ((pud_t) { (x) } )
 #define __pgd(x) ((pgd_t) { (x) } )
+#endif /* CONFIG_PARAVIRT */
+
+#define pgprot_val(x)  ((x).pgprot)
 #define __pgprot(x)((pgprot_t) { (x) } )
 
 #endif /* !__ASSEMBLY__ */
-- 
1.4.4.2



[PATCH 13/25] [PATCH] turn msr.h functions into native versions

2007-08-08 Thread Glauber de Oliveira Costa
This patch turns the basic operations in msr.h into native_
versions. Those operations are: rdmsr, wrmsr, rdtsc, rdtscp, rdpmc, and
cpuid. After they are turned into functions, some call sites need
casts, so we provide them.

There is also a fixup needed in the functions located in the vsyscall
area, as they cannot call any of them anymore (otherwise, the call
would go through a kernel address, invalid in userspace mapping).

The solution is to call the now-provided native_ versions instead.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/ia32/syscall32.c  |2 +-
 arch/x86_64/kernel/setup64.c  |6 +-
 arch/x86_64/kernel/tsc.c  |   42 ++-
 arch/x86_64/kernel/vsyscall.c |4 +-
 arch/x86_64/vdso/vgetcpu.c|4 +-
 include/asm-x86_64/msr.h  |  284 +---
 6 files changed, 226 insertions(+), 116 deletions(-)

diff --git a/arch/x86_64/ia32/syscall32.c b/arch/x86_64/ia32/syscall32.c
index 15013ba..dd1b4a3 100644
--- a/arch/x86_64/ia32/syscall32.c
+++ b/arch/x86_64/ia32/syscall32.c
@@ -79,5 +79,5 @@ void syscall32_cpu_init(void)
checking_wrmsrl(MSR_IA32_SYSENTER_ESP, 0ULL);
checking_wrmsrl(MSR_IA32_SYSENTER_EIP, (u64)ia32_sysenter_target);
 
-   wrmsrl(MSR_CSTAR, ia32_cstar_target);
+   wrmsrl(MSR_CSTAR, (u64)ia32_cstar_target);
 }
diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 1200aaa..395cf02 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -122,7 +122,7 @@ void pda_init(int cpu)
asm volatile("movl %0,%%fs ; movl %0,%%gs" :: "r" (0)); 
/* Memory clobbers used to order PDA accessed */
mb();
-   wrmsrl(MSR_GS_BASE, pda);
+   wrmsrl(MSR_GS_BASE, (u64)pda);
mb();
 
pda->cpunumber = cpu; 
@@ -161,8 +161,8 @@ void syscall_init(void)
 * but only a 32bit target. LSTAR sets the 64bit rip.
 */ 
wrmsrl(MSR_STAR,  ((u64)__USER32_CS)<<48  | ((u64)__KERNEL_CS)<<32); 
-   wrmsrl(MSR_LSTAR, system_call); 
-   wrmsrl(MSR_CSTAR, ignore_sysret);
+   wrmsrl(MSR_LSTAR, (u64)system_call);
+   wrmsrl(MSR_CSTAR, (u64)ignore_sysret);
 
 #ifdef CONFIG_IA32_EMULATION   
syscall32_cpu_init ();
diff --git a/arch/x86_64/kernel/tsc.c b/arch/x86_64/kernel/tsc.c
index 2a59bde..0db0041 100644
--- a/arch/x86_64/kernel/tsc.c
+++ b/arch/x86_64/kernel/tsc.c
@@ -9,6 +9,46 @@
 
 #include 
 
+#ifdef CONFIG_PARAVIRT
+/*
+ * When paravirt is on, some functionalities are executed through function
+ * pointers in the paravirt_ops structure, for both the host and guest.
+ * These function pointers exist inside the kernel and can not
+ * be accessed by user space. To avoid this, we make a copy of the
+ * get_cycles_sync (called in kernel) but force the use of native_read_tsc.
+ * For the host, it will simply do the native rdtsc. The guest
+ * should set up its own clock and vread
+ */
+static __always_inline long long vget_cycles_sync(void)
+{
+   unsigned long long ret;
+   unsigned eax, edx;
+
+   /*
+* Use RDTSCP if possible; it is guaranteed to be synchronous
+* and doesn't cause a VMEXIT on Hypervisors
+*/
+   alternative_io(ASM_NOP3, ".byte 0x0f,0x01,0xf9", X86_FEATURE_RDTSCP,
+  ASM_OUTPUT2("=a" (eax), "=d" (edx)),
+  "a" (0U), "d" (0U) : "ecx", "memory");
+   ret = (((unsigned long long)edx) << 32) | ((unsigned long long)eax);
+   if (ret)
+   return ret;
+
+   /*
+* Don't do an additional sync on CPUs where we know
+* RDTSC is already synchronous:
+*/
+   alternative_io("cpuid", ASM_NOP2, X86_FEATURE_SYNC_RDTSC,
+ "=a" (eax), "0" (1) : "ebx","ecx","edx","memory");
+   ret = native_read_tsc();
+
+   return ret;
+}
+#else
+# define vget_cycles_sync() get_cycles_sync()
+#endif
+
 static int notsc __initdata = 0;
 
 unsigned int cpu_khz;  /* TSC clocks / usec, not used here */
@@ -165,7 +205,7 @@ static cycle_t read_tsc(void)
 
 static cycle_t __vsyscall_fn vread_tsc(void)
 {
-   cycle_t ret = (cycle_t)get_cycles_sync();
+   cycle_t ret = (cycle_t)vget_cycles_sync();
return ret;
 }
 
diff --git a/arch/x86_64/kernel/vsyscall.c b/arch/x86_64/kernel/vsyscall.c
index 06c3494..22fc4c9 100644
--- a/arch/x86_64/kernel/vsyscall.c
+++ b/arch/x86_64/kernel/vsyscall.c
@@ -184,7 +184,7 @@ time_t __vsyscall(1) vtime(time_t *t)
 long __vsyscall(2)
 vgetcpu(unsigned *cpu, unsigned *node, struct getcpu_cache *tcache)
 {
-   unsigned int dummy, p;
+   unsigned int p;
unsigned long j = 0;
 
/* Fast cache - only recompute

[PATCH 15/25] [PATCH] get rid of inline asm for load_cr3

2007-08-08 Thread Glauber de Oliveira Costa
Besides not elegant, it is now even forbidden, since it can
break paravirtualized guests. load_cr3 should call write_cr3()
instead.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/mmu_context.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/include/asm-x86_64/mmu_context.h b/include/asm-x86_64/mmu_context.h
index c8cdc1e..9592698 100644
--- a/include/asm-x86_64/mmu_context.h
+++ b/include/asm-x86_64/mmu_context.h
@@ -25,7 +25,7 @@ static inline void enter_lazy_tlb(struct mm_struct *mm, 
struct task_struct *tsk)
 
 static inline void load_cr3(pgd_t *pgd)
 {
-   asm volatile("movq %0,%%cr3" :: "r" (__pa(pgd)) : "memory");
+   write_cr3(__pa(pgd));
 }
 
 static inline void switch_mm(struct mm_struct *prev, struct mm_struct *next, 
-- 
1.4.4.2



[PATCH 12/25] [PATCH] native versions for set pagetables

2007-08-08 Thread Glauber de Oliveira Costa
This patch turns the set_p{te,md,ud,gd} functions into their
native_ versions. There is no need to patch any caller.

Also, it adds pte_update() and pte_update_defer() calls whenever
we modify a page table entry. This last part was coded to match
i386 as closely as possible.

Pieces of the header are moved to below the #ifdef CONFIG_PARAVIRT
site, as they are users of the newly defined set_* macros.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/pgtable.h |  152 -
 1 files changed, 89 insertions(+), 63 deletions(-)

diff --git a/include/asm-x86_64/pgtable.h b/include/asm-x86_64/pgtable.h
index c9d8764..dd572a2 100644
--- a/include/asm-x86_64/pgtable.h
+++ b/include/asm-x86_64/pgtable.h
@@ -57,55 +57,77 @@ extern unsigned long 
empty_zero_page[PAGE_SIZE/sizeof(unsigned long)];
  */
 #define PTRS_PER_PTE   512
 
-#ifndef __ASSEMBLY__
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
 
-#define pte_ERROR(e) \
-   printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pte_val(e))
-#define pmd_ERROR(e) \
-   printk("%s:%d: bad pmd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pmd_val(e))
-#define pud_ERROR(e) \
-   printk("%s:%d: bad pud %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pud_val(e))
-#define pgd_ERROR(e) \
-   printk("%s:%d: bad pgd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pgd_val(e))
+#define set_pte native_set_pte
+#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
+#define set_pmd native_set_pmd
+#define set_pud native_set_pud
+#define set_pgd native_set_pgd
+#define pte_clear(mm,addr,xp)  do { set_pte_at(mm, addr, xp, __pte(0)); } 
while (0)
+#define pmd_clear(xp)  do { set_pmd(xp, __pmd(0)); } while (0)
+#define pud_clear native_pud_clear
+#define pgd_clear native_pgd_clear
+#define pte_update(mm, addr, ptep)  do { } while (0)
+#define pte_update_defer(mm, addr, ptep)do { } while (0)
 
-#define pgd_none(x)(!pgd_val(x))
-#define pud_none(x)(!pud_val(x))
+#endif
+
+#ifndef __ASSEMBLY__
 
-static inline void set_pte(pte_t *dst, pte_t val)
+static inline void native_set_pte(pte_t *dst, pte_t val)
 {
-   pte_val(*dst) = pte_val(val);
+   dst->pte = pte_val(val);
 } 
-#define set_pte_at(mm,addr,ptep,pteval) set_pte(ptep,pteval)
 
-static inline void set_pmd(pmd_t *dst, pmd_t val)
+
+static inline void native_set_pmd(pmd_t *dst, pmd_t val)
 {
-pmd_val(*dst) = pmd_val(val); 
+   dst->pmd = pmd_val(val);
 } 
 
-static inline void set_pud(pud_t *dst, pud_t val)
+static inline void native_set_pud(pud_t *dst, pud_t val)
 {
-   pud_val(*dst) = pud_val(val);
+   dst->pud = pud_val(val);
 }
 
-static inline void pud_clear (pud_t *pud)
+static inline void native_set_pgd(pgd_t *dst, pgd_t val)
 {
-   set_pud(pud, __pud(0));
+   dst->pgd = pgd_val(val);
 }
-
-static inline void set_pgd(pgd_t *dst, pgd_t val)
+static inline void native_pud_clear (pud_t *pud)
 {
-   pgd_val(*dst) = pgd_val(val); 
-} 
+   set_pud(pud, __pud(0));
+}
 
-static inline void pgd_clear (pgd_t * pgd)
+static inline void native_pgd_clear (pgd_t * pgd)
 {
set_pgd(pgd, __pgd(0));
 }
 
-#define ptep_get_and_clear(mm,addr,xp) __pte(xchg(&(xp)->pte, 0))
+#define pte_ERROR(e) \
+   printk("%s:%d: bad pte %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pte_val(e))
+#define pmd_ERROR(e) \
+   printk("%s:%d: bad pmd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pmd_val(e))
+#define pud_ERROR(e) \
+   printk("%s:%d: bad pud %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pud_val(e))
+#define pgd_ERROR(e) \
+   printk("%s:%d: bad pgd %p(%016lx).\n", __FILE__, __LINE__, &(e), 
pgd_val(e))
+
+#define pgd_none(x)(!pgd_val(x))
+#define pud_none(x)(!pud_val(x))
 
 struct mm_struct;
 
+static inline pte_t ptep_get_and_clear(struct mm_struct *mm, unsigned long 
addr, pte_t *ptep)
+{
+   pte_t pte = __pte(xchg(&ptep->pte, 0));
+   pte_update(mm, addr, ptep);
+   return pte;
+}
+
 static inline pte_t ptep_get_and_clear_full(struct mm_struct *mm, unsigned 
long addr, pte_t *ptep, int full)
 {
pte_t pte;
@@ -245,7 +267,6 @@ static inline unsigned long pmd_bad(pmd_t pmd)
 
 #define pte_none(x)(!pte_val(x))
 #define pte_present(x) (pte_val(x) & (_PAGE_PRESENT | _PAGE_PROTNONE))
-#define pte_clear(mm,addr,xp)  do { set_pte_at(mm, addr, xp, __pte(0)); } 
while (0)
 
 #define pages_to_mb(x) ((x) >> (20-PAGE_SHIFT))/* FIXME: is this
   right? */
@@ -254,11 +275,11 @@ static inline unsigned long pmd_bad(pmd_t pmd)
 
 static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot)
 {
-   pte_t pte;
-   pte_val(pte) = (page_nr << PAGE_SHIFT);
-   pte_val(pte) |= pgprot_val(pgpr

[PATCH 14/25] [PATCH] add native functions for descriptors handling

2007-08-08 Thread Glauber de Oliveira Costa
This patch turns the basic descriptor handling into native_
functions. It is basically write_idt, load_idt, write_gdt,
load_gdt, set_ldt, store_tr, load_tls, and the ones
for updating a single entry.

In the process of doing that, we change the definition of
load_LDT_nolock, and caller sites have to be patched. We
also patch call sites that now need a typecast.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/head64.c  |2 +-
 arch/x86_64/kernel/ldt.c |6 +-
 arch/x86_64/kernel/reboot.c  |3 +-
 arch/x86_64/kernel/setup64.c |4 +-
 arch/x86_64/kernel/suspend.c |   11 ++-
 include/asm-x86_64/desc.h|  183 +++--
 include/asm-x86_64/mmu_context.h |4 +-
 7 files changed, 148 insertions(+), 65 deletions(-)

diff --git a/arch/x86_64/kernel/head64.c b/arch/x86_64/kernel/head64.c
index 6c34bdd..a0d05d7 100644
--- a/arch/x86_64/kernel/head64.c
+++ b/arch/x86_64/kernel/head64.c
@@ -70,7 +70,7 @@ void __init x86_64_start_kernel(char * real_mode_data)
 
for (i = 0; i < IDT_ENTRIES; i++)
set_intr_gate(i, early_idt_handler);
-   asm volatile("lidt %0" :: "m" (idt_descr));
+   load_idt(&idt_descr);
 
early_printk("Kernel alive\n");
 
diff --git a/arch/x86_64/kernel/ldt.c b/arch/x86_64/kernel/ldt.c
index bc9ffd5..8e6fcc1 100644
--- a/arch/x86_64/kernel/ldt.c
+++ b/arch/x86_64/kernel/ldt.c
@@ -173,7 +173,7 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
 {
struct task_struct *me = current;
struct mm_struct * mm = me->mm;
-   __u32 entry_1, entry_2, *lp;
+   __u32 entry_1, entry_2;
int error;
struct user_desc ldt_info;
 
@@ -202,7 +202,6 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
goto out_unlock;
}
 
-   lp = (__u32 *) ((ldt_info.entry_number << 3) + (char *) 
mm->context.ldt);
 
/* Allow LDTs to be cleared by the user. */
if (ldt_info.base_addr == 0 && ldt_info.limit == 0) {
@@ -220,8 +219,7 @@ static int write_ldt(void __user * ptr, unsigned long 
bytecount, int oldmode)
 
/* Install the new entry ...  */
 install:
-   *lp = entry_1;
-   *(lp+1) = entry_2;
+   write_ldt_entry(mm->context.ldt, ldt_info.entry_number, entry_1, 
entry_2);
error = 0;
 
 out_unlock:
diff --git a/arch/x86_64/kernel/reboot.c b/arch/x86_64/kernel/reboot.c
index 368db2b..ebc242c 100644
--- a/arch/x86_64/kernel/reboot.c
+++ b/arch/x86_64/kernel/reboot.c
@@ -11,6 +11,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -136,7 +137,7 @@ void machine_emergency_restart(void)
}
 
case BOOT_TRIPLE: 
-   __asm__ __volatile__("lidt (%0)": :"r" (&no_idt));
+   load_idt((struct desc_ptr *)&no_idt);
__asm__ __volatile__("int3");
 
reboot_type = BOOT_KBD;
diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 395cf02..49f7342 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -224,8 +224,8 @@ void __cpuinit cpu_init (void)
memcpy(cpu_gdt(cpu), cpu_gdt_table, GDT_SIZE);
 
cpu_gdt_descr[cpu].size = GDT_SIZE;
-   asm volatile("lgdt %0" :: "m" (cpu_gdt_descr[cpu]));
-   asm volatile("lidt %0" :: "m" (idt_descr));
+   load_gdt(&cpu_gdt_descr[cpu]);
+   load_idt(&idt_descr);
 
memset(me->thread.tls_array, 0, GDT_ENTRY_TLS_ENTRIES * 8);
syscall_init();
diff --git a/arch/x86_64/kernel/suspend.c b/arch/x86_64/kernel/suspend.c
index 573c0a6..24055b6 100644
--- a/arch/x86_64/kernel/suspend.c
+++ b/arch/x86_64/kernel/suspend.c
@@ -32,9 +32,9 @@ void __save_processor_state(struct saved_context *ctxt)
/*
 * descriptor tables
 */
-   asm volatile ("sgdt %0" : "=m" (ctxt->gdt_limit));
-   asm volatile ("sidt %0" : "=m" (ctxt->idt_limit));
-   asm volatile ("str %0"  : "=m" (ctxt->tr));
+   store_gdt((struct desc_ptr *)&ctxt->gdt_limit);
+   store_idt((struct desc_ptr *)&ctxt->idt_limit);
+   store_tr(ctxt->tr);
 
/* XMM0..XMM15 should be handled by kernel_fpu_begin(). */
/*
@@ -91,8 +91,9 @@ void __restore_processor_state(struct saved_context *ctxt)
 * now restore the descriptor tables to their proper values
 * ltr is done i fix_processor_context().
 */
-   asm volatile ("lgdt %0" :: "m" (ctxt->gdt_limit));
-   asm volatile ("lidt %0" :: "m" (ctxt->idt_limit));
+   load_gdt((struct desc_ptr *)&

[PATCH 24/25] [PATCH] provide paravirt patching function

2007-08-08 Thread Glauber de Oliveira Costa
This patch introduces apply_paravirt(), a function that shall
be called by i386/alternative.c to apply replacements to
paravirt functions. It is defined as a do-nothing function
if paravirt is not enabled.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/alternative.h |8 +---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/include/asm-x86_64/alternative.h b/include/asm-x86_64/alternative.h
index ab161e8..e69a141 100644
--- a/include/asm-x86_64/alternative.h
+++ b/include/asm-x86_64/alternative.h
@@ -143,12 +143,14 @@ static inline void alternatives_smp_switch(int smp) {}
  */
 #define ASM_OUTPUT2(a, b) a, b
 
-struct paravirt_patch;
+struct paravirt_patch_site;
 #ifdef CONFIG_PARAVIRT
-void apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end);
+void apply_paravirt(struct paravirt_patch_site *start,
+   struct paravirt_patch_site *end);
 #else
 static inline void
-apply_paravirt(struct paravirt_patch *start, struct paravirt_patch *end)
+apply_paravirt(struct paravirt_patch_site *start,
+   struct paravirt_patch_site *end)
 {}
 #define __parainstructions NULL
 #define __parainstructions_end NULL
-- 
1.4.4.2



[PATCH 25/25] [PATCH] add paravirtualization support for x86_64

2007-08-08 Thread Glauber de Oliveira Costa
This is, finally, the patch we were all looking for. This
patch adds a paravirt.h header with the definition of paravirt_ops
struct. Also, it defines a bunch of inline functions that will
replace, or hook, the other calls. Every one of those functions
adds an entry in the parainstructions section (see vmlinux.lds.S).
Those entries can then be used to runtime-patch the paravirt_ops
functions.

paravirt.c contains implementations of paravirt functions that
are used natively, such as native_patch. It also fills the
paravirt_ops structure with the whole lot of functions that
were (re)defined throughout this patch set.

There are also changes in asm-offsets.c: paravirt.h needs them
to find the offsets of structure members such as irq_enable,
which are used in assembly files.

The text in Kconfig is the same as i386 one.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/Kconfig  |   11 +
 arch/x86_64/kernel/Makefile  |1 +
 arch/x86_64/kernel/asm-offsets.c |   14 +
 arch/x86_64/kernel/paravirt.c|  455 +++
 arch/x86_64/kernel/vmlinux.lds.S |6 +
 include/asm-x86_64/paravirt.h|  901 ++
 6 files changed, 1388 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig
index ffa0364..bfea34c 100644
--- a/arch/x86_64/Kconfig
+++ b/arch/x86_64/Kconfig
@@ -373,6 +373,17 @@ config NODES_SHIFT
 
 # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig.
 
+config PARAVIRT
+   bool "Paravirtualization support (EXPERIMENTAL)"
+   depends on EXPERIMENTAL
+   help
+ Paravirtualization is a way of running multiple instances of
+ Linux on the same machine, under a hypervisor.  This option
+ changes the kernel so it can modify itself when it is run
+ under a hypervisor, improving performance significantly.
+ However, when run without a hypervisor the kernel is
+ theoretically slower.  If in doubt, say N.
+
 config X86_64_ACPI_NUMA
bool "ACPI NUMA detection"
depends on NUMA
diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile
index ff5d8c9..120467f 100644
--- a/arch/x86_64/kernel/Makefile
+++ b/arch/x86_64/kernel/Makefile
@@ -38,6 +38,7 @@ obj-$(CONFIG_X86_VSMP)+= vsmp.o
 obj-$(CONFIG_K8_NB)+= k8.o
 obj-$(CONFIG_AUDIT)+= audit.o
 
+obj-$(CONFIG_PARAVIRT) += paravirt.o
 obj-$(CONFIG_MODULES)  += module.o
 obj-$(CONFIG_PCI)  += early-quirks.o
 
diff --git a/arch/x86_64/kernel/asm-offsets.c b/arch/x86_64/kernel/asm-offsets.c
index 778953b..a8ffc95 100644
--- a/arch/x86_64/kernel/asm-offsets.c
+++ b/arch/x86_64/kernel/asm-offsets.c
@@ -15,6 +15,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_PARAVIRT
+#include 
+#endif
 
 #define DEFINE(sym, val) \
 asm volatile("\n->" #sym " %0 " #val : : "i" (val))
@@ -72,6 +75,17 @@ int main(void)
   offsetof (struct rt_sigframe32, uc.uc_mcontext));
BLANK();
 #endif
+#ifdef CONFIG_PARAVIRT
+#define ENTRY(entry) DEFINE(PARAVIRT_ ## entry, offsetof(struct paravirt_ops, 
entry))
+   ENTRY(paravirt_enabled);
+   ENTRY(irq_disable);
+   ENTRY(irq_enable);
+   ENTRY(sysret);
+   ENTRY(iret);
+   ENTRY(read_cr2);
+   ENTRY(swapgs);
+   BLANK();
+#endif
DEFINE(pbe_address, offsetof(struct pbe, address));
DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address));
DEFINE(pbe_next, offsetof(struct pbe, next));
diff --git a/arch/x86_64/kernel/paravirt.c b/arch/x86_64/kernel/paravirt.c
new file mode 100644
index 000..a41c1c0
--- /dev/null
+++ b/arch/x86_64/kernel/paravirt.c
@@ -0,0 +1,455 @@
+/*  Paravirtualization interfaces
+Copyright (C) 2007 Glauber de Oliveira Costa and Steven Rostedt,
+Red Hat Inc.
+Based on i386 work by Rusty Russell.
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA  02110-1301  USA
+*/
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#

[PATCH 9/25] [PATCH] report ring kernel is running without paravirt

2007-08-08 Thread Glauber de Oliveira Costa
When paravirtualization is disabled, the kernel is always
running at ring 0, so report that in the appropriate macro.
Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/segment.h |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86_64/segment.h b/include/asm-x86_64/segment.h
index 04b8ab2..240c1bf 100644
--- a/include/asm-x86_64/segment.h
+++ b/include/asm-x86_64/segment.h
@@ -50,4 +50,8 @@
 #define GDT_SIZE (GDT_ENTRIES * 8)
 #define TLS_SIZE (GDT_ENTRY_TLS_ENTRIES * 8) 
 
+#ifndef CONFIG_PARAVIRT
+#define get_kernel_rpl()  0
+#endif
+
 #endif
-- 
1.4.4.2



[PATCH 11/25] [PATCH] introduce paravirt_release_pgd()

2007-08-08 Thread Glauber de Oliveira Costa
This patch introduces a new macro/function that informs a paravirt
guest when its page table is no longer in use and can be released.
In case we're not paravirt, it just does nothing.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 include/asm-x86_64/pgalloc.h |7 +++
 1 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/include/asm-x86_64/pgalloc.h b/include/asm-x86_64/pgalloc.h
index b467be6..dbe1267 100644
--- a/include/asm-x86_64/pgalloc.h
+++ b/include/asm-x86_64/pgalloc.h
@@ -9,6 +9,12 @@
 #define QUICK_PGD 0/* We preserve special mappings over free */
 #define QUICK_PT 1 /* Other page table pages that are zero on free */
 
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define paravirt_release_pgd(pgd) do { } while (0)
+#endif
+
 #define pmd_populate_kernel(mm, pmd, pte) \
set_pmd(pmd, __pmd(_PAGE_TABLE | __pa(pte)))
 #define pud_populate(mm, pud, pmd) \
@@ -100,6 +106,7 @@ static inline pgd_t *pgd_alloc(struct mm_struct *mm)
 static inline void pgd_free(pgd_t *pgd)
 {
BUG_ON((unsigned long)pgd & (PAGE_SIZE-1));
+   paravirt_release_pgd(pgd);
quicklist_free(QUICK_PGD, pgd_dtor, pgd);
 }
 
-- 
1.4.4.2



[PATCH 21/25] [PATCH] export cpu_gdt_descr

2007-08-08 Thread Glauber de Oliveira Costa
With paravirtualization, hypervisors need to handle the gdt,
which up to this point was only used in very early
initialization code. Hypervisors are commonly modules, so make
it an export.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/x8664_ksyms.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/x86_64/kernel/x8664_ksyms.c b/arch/x86_64/kernel/x8664_ksyms.c
index 77c25b3..8f10698 100644
--- a/arch/x86_64/kernel/x8664_ksyms.c
+++ b/arch/x86_64/kernel/x8664_ksyms.c
@@ -60,3 +60,9 @@ EXPORT_SYMBOL(init_level4_pgt);
 EXPORT_SYMBOL(load_gs_index);
 
 EXPORT_SYMBOL(_proxy_pda);
+
+#ifdef CONFIG_PARAVIRT
+extern unsigned long *cpu_gdt_descr;
+/* Virtualized guests may want to use it */
+EXPORT_SYMBOL(cpu_gdt_descr);
+#endif
-- 
1.4.4.2



[PATCH 7/25] [PATCH] interrupt related native paravirt functions.

2007-08-08 Thread Glauber de Oliveira Costa
The interrupt initialization routine becomes native_init_IRQ and will
be overridden later in case paravirt is on.

The interrupt vector is made global, so paravirt guests can reference
it in their initializations. However, "interrupt" is such a common
name that it could lead to clashes, so it is renamed.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/i8259.c |   15 +++
 include/asm-x86_64/irq.h   |2 ++
 2 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/arch/x86_64/kernel/i8259.c b/arch/x86_64/kernel/i8259.c
index 948cae6..8dda872 100644
--- a/arch/x86_64/kernel/i8259.c
+++ b/arch/x86_64/kernel/i8259.c
@@ -75,8 +75,12 @@ BUILD_16_IRQS(0xc) BUILD_16_IRQS(0xd) BUILD_16_IRQS(0xe) 
BUILD_16_IRQS(0xf)
IRQ(x,8), IRQ(x,9), IRQ(x,a), IRQ(x,b), \
IRQ(x,c), IRQ(x,d), IRQ(x,e), IRQ(x,f)
 
-/* for the irq vectors */
-static void (*interrupt[NR_VECTORS - FIRST_EXTERNAL_VECTOR])(void) = {
+/*
+ * For the irq vectors. It is global rather than static to allow for
+ * paravirtualized guests to use it in their own interrupt initialization
+ * routines
+ */
+void (*interrupt_vector[NR_VECTORS - FIRST_EXTERNAL_VECTOR])(void) = {
  IRQLIST_16(0x2), IRQLIST_16(0x3),
IRQLIST_16(0x4), IRQLIST_16(0x5), IRQLIST_16(0x6), IRQLIST_16(0x7),
IRQLIST_16(0x8), IRQLIST_16(0x9), IRQLIST_16(0xa), IRQLIST_16(0xb),
@@ -484,7 +488,10 @@ static int __init init_timer_sysfs(void)
 
 device_initcall(init_timer_sysfs);
 
-void __init init_IRQ(void)
+/* Overridden in paravirt.c */
+void init_IRQ(void) __attribute__((weak, alias("native_init_IRQ")));
+
+void __init native_init_IRQ(void)
 {
int i;
 
@@ -497,7 +504,7 @@ void __init init_IRQ(void)
for (i = 0; i < (NR_VECTORS - FIRST_EXTERNAL_VECTOR); i++) {
int vector = FIRST_EXTERNAL_VECTOR + i;
if (vector != IA32_SYSCALL_VECTOR)
-   set_intr_gate(vector, interrupt[i]);
+   set_intr_gate(vector, interrupt_vector[i]);
}
 
 #ifdef CONFIG_SMP
diff --git a/include/asm-x86_64/irq.h b/include/asm-x86_64/irq.h
index 5006c6e..be55299 100644
--- a/include/asm-x86_64/irq.h
+++ b/include/asm-x86_64/irq.h
@@ -46,6 +46,8 @@ static __inline__ int irq_canonicalize(int irq)
 extern void fixup_irqs(cpumask_t map);
 #endif
 
+void native_init_IRQ(void);
+
 #define __ARCH_HAS_DO_SOFTIRQ 1
 
 #endif /* _ASM_IRQ_H */
-- 
1.4.4.2



[PATCH 22/25] [PATCH] turn privileged operation into a macro

2007-08-08 Thread Glauber de Oliveira Costa
Under paravirt, reading cr2 cannot be issued directly anymore.
So wrap it in a macro, defined to the operation itself in case
paravirt is off, but to something else if we have paravirt
in the game.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/head.S |   10 +-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/head.S b/arch/x86_64/kernel/head.S
index e89abcd..1bb6c55 100644
--- a/arch/x86_64/kernel/head.S
+++ b/arch/x86_64/kernel/head.S
@@ -18,6 +18,12 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_PARAVIRT
+#include 
+#include 
+#else
+#define GET_CR2_INTO_RCX mov %cr2, %rcx
+#endif
 
 /* we are not able to switch in one step to the final KERNEL ADRESS SPACE
  * because we need identity-mapped pages.
@@ -267,7 +273,9 @@ ENTRY(early_idt_handler)
xorl %eax,%eax
movq 8(%rsp),%rsi   # get rip
movq (%rsp),%rdx
-   movq %cr2,%rcx
+   /* When PARAVIRT is on, this operation may clobber rax. It is
+ something safe to do, because we've just zeroed rax. */
+   GET_CR2_INTO_RCX
leaq early_idt_msg(%rip),%rdi
call early_printk
cmpl $2,early_recursion_flag(%rip)
-- 
1.4.4.2



[PATCH 19/25] [PATCH] time-related functions paravirt provisions

2007-08-08 Thread Glauber de Oliveira Costa
This patch adds provisions for time-related functions so they
can later be replaced by paravirt versions.

It basically encloses {g,s}et_wallclock inside the
already existing functions update_persistent_clock and
read_persistent_clock, and defines {s,g}et_wallclock
to the core of such functions.

The timer interrupt setup also has to be replaced.
The job is done by time_init_hook().

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/time.c |   37 +
 1 files changed, 25 insertions(+), 12 deletions(-)

diff --git a/arch/x86_64/kernel/time.c b/arch/x86_64/kernel/time.c
index 6d48a4e..29fcd91 100644
--- a/arch/x86_64/kernel/time.c
+++ b/arch/x86_64/kernel/time.c
@@ -42,6 +42,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -82,18 +83,12 @@ EXPORT_SYMBOL(profile_pc);
  * sheet for details.
  */
 
-static int set_rtc_mmss(unsigned long nowtime)
+int do_set_rtc_mmss(unsigned long nowtime)
 {
int retval = 0;
int real_seconds, real_minutes, cmos_minutes;
unsigned char control, freq_select;
 
-/*
- * IRQs are disabled when we're called from the timer interrupt,
- * no need for spin_lock_irqsave()
- */
-
-   spin_lock(&rtc_lock);
 
 /*
  * Tell the clock it's being set and stop it.
@@ -143,14 +138,22 @@ static int set_rtc_mmss(unsigned long nowtime)
CMOS_WRITE(control, RTC_CONTROL);
CMOS_WRITE(freq_select, RTC_FREQ_SELECT);
 
-   spin_unlock(&rtc_lock);
-
return retval;
 }
 
 int update_persistent_clock(struct timespec now)
 {
-   return set_rtc_mmss(now.tv_sec);
+   int retval;
+
+/*
+ * IRQs are disabled when we're called from the timer interrupt,
+ * no need for spin_lock_irqsave()
+ */
+   spin_lock(&rtc_lock);
+   retval = set_wallclock(now.tv_sec);
+   spin_unlock(&rtc_lock);
+
+   return retval;
 }
 
 void main_timer_handler(void)
@@ -195,7 +198,7 @@ static irqreturn_t timer_interrupt(int irq, void *dev_id)
return IRQ_HANDLED;
 }
 
-unsigned long read_persistent_clock(void)
+unsigned long do_get_cmos_time(void)
 {
unsigned int year, mon, day, hour, min, sec;
unsigned long flags;
@@ -246,6 +249,11 @@ unsigned long read_persistent_clock(void)
return mktime(year, mon, day, hour, min, sec);
 }
 
+unsigned long read_persistent_clock(void)
+{
+   return get_wallclock();
+}
+
 /* calibrate_cpu is used on systems with fixed rate TSCs to determine
  * processor frequency */
 #define TICK_COUNT 1
@@ -365,6 +373,11 @@ static struct irqaction irq0 = {
.name   = "timer"
 };
 
+inline void time_init_hook()
+{
+   setup_irq(0, &irq0);
+}
+
 void __init time_init(void)
 {
if (nohpet)
@@ -403,7 +416,7 @@ void __init time_init(void)
cpu_khz / 1000, cpu_khz % 1000);
init_tsc_clocksource();
 
-   setup_irq(0, &irq0);
+   do_time_init();
 }
 
 /*
-- 
1.4.4.2

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH 20/25] [PATCH] replace syscall_init

2007-08-08 Thread Glauber de Oliveira Costa
This patch replaces syscall_init with x86_64_syscall_init.
The former will later be replaced by a paravirt version
when paravirt is enabled.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/setup64.c |8 +++-
 include/asm-x86_64/proto.h   |3 +++
 2 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/arch/x86_64/kernel/setup64.c b/arch/x86_64/kernel/setup64.c
index 49f7342..723822c 100644
--- a/arch/x86_64/kernel/setup64.c
+++ b/arch/x86_64/kernel/setup64.c
@@ -153,7 +153,7 @@ __attribute__((section(".bss.page_aligned")));
 extern asmlinkage void ignore_sysret(void);
 
 /* May not be marked __init: used by software suspend */
-void syscall_init(void)
+void x86_64_syscall_init(void)
 {
/* 
 * LSTAR and STAR live in a bit strange symbiosis.
@@ -172,6 +172,12 @@ void syscall_init(void)
wrmsrl(MSR_SYSCALL_MASK, EF_TF|EF_DF|EF_IE|0x3000); 
 }
 
+/* Overridden in paravirt.c if CONFIG_PARAVIRT */
+void __attribute__((weak)) syscall_init(void)
+{
+   x86_64_syscall_init();
+}
+
 void __cpuinit check_efer(void)
 {
unsigned long efer;
diff --git a/include/asm-x86_64/proto.h b/include/asm-x86_64/proto.h
index 31f20ad..77ed2de 100644
--- a/include/asm-x86_64/proto.h
+++ b/include/asm-x86_64/proto.h
@@ -18,6 +18,9 @@ extern void init_memory_mapping(unsigned long start, unsigned 
long end);
 
 extern void system_call(void); 
 extern int kernel_syscall(void);
+#ifdef CONFIG_PARAVIRT
+extern void x86_64_syscall_init(void);
+#endif
 extern void syscall_init(void);
 
 extern void ia32_syscall(void);
-- 
1.4.4.2



[PATCH 18/25] [PATCH] turn privileged operations into macros in entry.S

2007-08-08 Thread Glauber de Oliveira Costa
With paravirt on, we cannot issue operations like swapgs, sysretq,
iretq, cli and sti directly. They have to be turned into macros
that will later be properly replaced for the paravirt case.

The sysretq case is a little more complicated: it is replaced by
a sequence of three instructions, because once swapgs has been
issued we would be running on a user stack at that point. So we
do it all in one macro.

The clobber list follows the idea of the i386 version closely,
and indicates which registers are safe to modify at the
point the macro is invoked.

Signed-off-by: Glauber de Oliveira Costa <[EMAIL PROTECTED]>
Signed-off-by: Steven Rostedt <[EMAIL PROTECTED]>
---
 arch/x86_64/kernel/entry.S |  125 ---
 1 files changed, 81 insertions(+), 44 deletions(-)

diff --git a/arch/x86_64/kernel/entry.S b/arch/x86_64/kernel/entry.S
index 1d232e5..48e953b 100644
--- a/arch/x86_64/kernel/entry.S
+++ b/arch/x86_64/kernel/entry.S
@@ -51,8 +51,31 @@
 #include 
 #include 
 
+#ifdef CONFIG_PARAVIRT
+#include 
+#else
+#define ENABLE_INTERRUPTS(x)   sti
+#define DISABLE_INTERRUPTS(x)  cli
+#define INTERRUPT_RETURN   iretq
+#define SWAPGS swapgs
+#define SYSRETQ\
+   movq%gs:pda_oldrsp,%rsp;\
+   swapgs; \
+   sysretq;
+#endif
+
.code64
 
+/* Currently paravirt can't handle swapgs nicely when we
+ * don't have a stack.  So we either find a way around these
+ * or just fault and emulate if a guest tries to call swapgs
+ * directly.
+ *
+ * Either way, this is a good way to document that we don't
+ * have a reliable stack.
+ */
+#define SWAPGS_NOSTACK swapgs
+
 #ifndef CONFIG_PREEMPT
 #define retint_kernel retint_restore_args
 #endif 
@@ -216,14 +239,14 @@ ENTRY(system_call)
CFI_DEF_CFA rsp,PDA_STACKOFFSET
CFI_REGISTERrip,rcx
/*CFI_REGISTER  rflags,r11*/
-   swapgs
+   SWAPGS_NOSTACK
movq%rsp,%gs:pda_oldrsp 
movq%gs:pda_kernelstack,%rsp
/*
 * No need to follow this irqs off/on section - it's straight
 * and short:
 */
-   sti 
+   ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_ARGS 8,1
movq  %rax,ORIG_RAX-ARGOFFSET(%rsp) 
movq  %rcx,RIP-ARGOFFSET(%rsp)
@@ -245,7 +268,7 @@ ret_from_sys_call:
/* edi: flagmask */
 sysret_check:  
GET_THREAD_INFO(%rcx)
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
movl threadinfo_flags(%rcx),%edx
andl %edi,%edx
@@ -259,9 +282,7 @@ sysret_check:
CFI_REGISTERrip,rcx
RESTORE_ARGS 0,-ARG_SKIP,1
/*CFI_REGISTER  rflags,r11*/
-   movq%gs:pda_oldrsp,%rsp
-   swapgs
-   sysretq
+   SYSRETQ
 
CFI_RESTORE_STATE
/* Handle reschedules */
@@ -270,7 +291,7 @@ sysret_careful:
bt $TIF_NEED_RESCHED,%edx
jnc sysret_signal
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
@@ -281,7 +302,7 @@ sysret_careful:
/* Handle a signal */ 
 sysret_signal:
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
testl $(_TIF_SIGPENDING|_TIF_SINGLESTEP|_TIF_MCE_NOTIFY),%edx
jz1f
 
@@ -294,7 +315,7 @@ sysret_signal:
 1: movl $_TIF_NEED_RESCHED,%edi
/* Use IRET because user could have changed frame. This
   works because ptregscall_common has called FIXUP_TOP_OF_STACK. */
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check

@@ -326,7 +347,7 @@ tracesys:
  */
.globl int_ret_from_sys_call
 int_ret_from_sys_call:
-   cli
+   DISABLE_INTERRUPTS(CLBR_ANY)
TRACE_IRQS_OFF
testl $3,CS-ARGOFFSET(%rsp)
je retint_restore_args
@@ -347,20 +368,20 @@ int_careful:
bt $TIF_NEED_RESCHED,%edx
jnc  int_very_careful
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
pushq %rdi
CFI_ADJUST_CFA_OFFSET 8
call schedule
popq %rdi
CFI_ADJUST_CFA_OFFSET -8
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check
 
/* handle signals and tracing -- both require a full stack frame */
 int_very_careful:
TRACE_IRQS_ON
-   sti
+   ENABLE_INTERRUPTS(CLBR_NONE)
SAVE_REST
/* Check for syscall exit trace */  
testl $(_TIF_SYSCALL_TRACE|_TIF_SYSCALL_AUDIT|_TIF_SINGLESTEP),%edx
@@ -383,7 +404,7 @@ int_signal:
 1: movl $_TIF_NEED_RESCHED,%edi
 int_restore_rest:
RESTORE_REST
-   cli
+   DISABLE_INTERRUPTS(CLBR_NONE)
TRACE_IRQS_OFF
jmp int_with_check
CFI_ENDPROC
@@ -504,7 +525,7 @@ END(stub_
