Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
Andi Kleen wrote: +if (opfunc == NULL) +/* If there's no function, patch it with a ud2a (BUG) */ +ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a); This will actually give corrupted BUGs because you don't supply the full inline BUG header. Perhaps another trap would be better. The BUG handler will still report it as a normal illegal instruction. It should never happen; the main thing is that it clearly points out where the problem is (as opposed to jumping to a NULL pointer and getting the unhelpful oh, eip is zero symptom). +EXPORT_SYMBOL(paravirt_ops); Definitely _GPL at least. No, for the same reason as i386. +extern struct paravirt_ops paravirt_ops; Should be native_paravirt_ops I guess No, because its the current set of pv_ops. It starts all native, but it is either completely or partially overwritten by hypervisor-specific ops. + + * This generates an indirect call based on the operation type number. The macros here don't Yes, PARAVIRT_CALL does: call *(paravirt_ops+%c[paravirt_typenum]*8); + : =a(f) + : paravirt_type(save_fl), + paravirt_clobber(CLBR_RAX) + : memory, cc); +return f; +} + +static inline void raw_local_irq_restore(unsigned long f) +{ +__asm__ __volatile__(paravirt_alt(PARAVIRT_CALL) + : + : D (f), Have you investigated if a different input register generates better/smaller code? I would assume rdi to be usually used already for the caller's arguments so it will produce spilling Similar for the rax return in the other functions. This has to match the normal C calling convention though, doesn't it? J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
Glauber de Oliveira Costa wrote: +static unsigned native_patch(u8 type, u16 clobbers, void *insns, unsigned len) +{ + const unsigned char *start, *end; + unsigned ret; + + switch(type) { +#define SITE(x) case PARAVIRT_PATCH(x): start = start_##x; end = end_##x; goto patch_site + SITE(irq_disable); + SITE(irq_enable); + SITE(restore_fl); + SITE(save_fl); + SITE(iret); + SITE(sysret); + SITE(swapgs); + SITE(read_cr2); + SITE(read_cr3); + SITE(write_cr3); + SITE(clts); + SITE(flush_tlb_single); + SITE(wbinvd); +#undef SITE + + patch_site: + ret = paravirt_patch_insns(insns, len, start, end); + break; + + case PARAVIRT_PATCH(make_pgd): + case PARAVIRT_PATCH(pgd_val): + case PARAVIRT_PATCH(make_pte): + case PARAVIRT_PATCH(pte_val): + case PARAVIRT_PATCH(make_pmd): + case PARAVIRT_PATCH(pmd_val): + case PARAVIRT_PATCH(make_pud): + case PARAVIRT_PATCH(pud_val): + /* These functions end up returning what +they're passed in the first argument */ Is this still true with 64-bit? Either way, I don't think its worth having this here. The damage to codegen around all those sites has already happened, and the additional cost of a noop direct call is pretty trivial. I think this is a nanooptimisation which risks more problems than it could possibly be worth. + case PARAVIRT_PATCH(set_pte): + case PARAVIRT_PATCH(set_pmd): + case PARAVIRT_PATCH(set_pud): + case PARAVIRT_PATCH(set_pgd): + /* These functions end up storing the second + * argument in the location pointed by the first */ + ret = paravirt_patch_store_reg(insns, len); + break; Ditto, really. Do this in a later patch if it actually seems to help. +unsigned paravirt_patch_copy_reg(void *site, unsigned len) +{ + unsigned char *mov = site; + if (len 3) + return len; + + /* This is mov %rdi, %rax */ + *mov++ = 0x48; + *mov++ = 0x89; + *mov = 0xf8; + return 3; +} + +unsigned paravirt_patch_store_reg(void *site, unsigned len) +{ + unsigned char *mov = site; + if (len 3) + return len; + + /* This is mov %rsi, (%rdi) */ + *mov++ = 0x48; + *mov++ = 0x89; + *mov = 0x37; + return 3; +} These seem excessively special-purpose. Are their only uses the ones I commented on above. +/* + * integers must be use with care here. They can break the PARAVIRT_PATCH(x) + * macro, that divides the offset in the structure by 8, to get a number + * associated with the hook. Dividing by four would be a solution, but it + * would limit the future growth of the structure if needed. Why not just stick them at the end of the structure? J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
Glauber de Oliveira Costa wrote: +/* + * integers must be use with care here. They can break the PARAVIRT_PATCH(x) + * macro, that divides the offset in the structure by 8, to get a number + * associated with the hook. Dividing by four would be a solution, but it + * would limit the future growth of the structure if needed. Why not just stick them at the end of the structure? Does it really matter? Well, yes, if alignment is an issue. J ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
+ case PARAVIRT_PATCH(make_pgd): + case PARAVIRT_PATCH(pgd_val): + case PARAVIRT_PATCH(make_pte): + case PARAVIRT_PATCH(pte_val): + case PARAVIRT_PATCH(make_pmd): + case PARAVIRT_PATCH(pmd_val): + case PARAVIRT_PATCH(make_pud): + case PARAVIRT_PATCH(pud_val): + /* These functions end up returning what +they're passed in the first argument */ Is this still true with 64-bit? Either way, I don't think its worth having this here. The damage to codegen around all those sites has already happened, and the additional cost of a noop direct call is pretty trivial. I think this is a nanooptimisation which risks more problems than it could possibly be worth. No it is not. But it is just the comment that is broken. (I forgot to update it). The case here, is that they put in rax what they receive in rdi. + case PARAVIRT_PATCH(set_pte): + case PARAVIRT_PATCH(set_pmd): + case PARAVIRT_PATCH(set_pud): + case PARAVIRT_PATCH(set_pgd): + /* These functions end up storing the second + * argument in the location pointed by the first */ + ret = paravirt_patch_store_reg(insns, len); + break; Ditto, really. Do this in a later patch if it actually seems to help. Okay, I can remove them both. +/* + * integers must be use with care here. They can break the PARAVIRT_PATCH(x) + * macro, that divides the offset in the structure by 8, to get a number + * associated with the hook. Dividing by four would be a solution, but it + * would limit the future growth of the structure if needed. Why not just stick them at the end of the structure? Does it really matter? -- Glauber de Oliveira Costa. Free as in Freedom http://glommer.net The less confident you are, the more serious you have to act. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
Hm. So x86-64 doesn't make 64-bit pointers be 64-bit aligned? The ABI does of course, although the penalty of not doing it on current CPUs is only minor. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
-- On Thu, 9 Aug 2007, Andi Kleen wrote: This has to match the normal C calling convention though, doesn't it? Native cli/sti/save/restore_flags are all only assembly and can be easily (in fact more easily than in C) written as pure assembler functions. Then you can use whatever calling convention you want. I agree. Should we make a paravirt_ops_asm.S file that can implement these native funcions, and so we can get rid of the C functions only doing asm? While some paravirt implementations may have more complicated implementations i guess it's still a reasonable requirement to make them simple enough in pure assembler. If not they can use a trampoline, but that's hopefully not needed. It works for lguest64. I'm sure it should be no problem with other HVs. -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
[PATCH 25/25] [PATCH] add paravirtualization support for x86_64
This is finally, the patch we were all looking for. This patch adds a paravirt.h header with the definition of paravirt_ops struct. Also, it defines a bunch of inline functions that will replace, or hook, the other calls. Every one of those functions adds an entry in the parainstructions section (see vmlinux.lds.S). Those entries can then be used to runtime-patch the paravirt_ops functions. paravirt.c contains implementations of paravirt functions that are used natively, such as the native_patch. It also fill the paravirt_ops structure with the whole lot of functions that were (re)defined throughout this patch set. There are also changes in asm-offsets.c. paravirt.h needs it to find out the offsets into the structure of functions such as irq_enable, used in assembly files. The text in Kconfig is the same as i386 one. Signed-off-by: Glauber de Oliveira Costa [EMAIL PROTECTED] Signed-off-by: Steven Rostedt [EMAIL PROTECTED] --- arch/x86_64/Kconfig | 11 + arch/x86_64/kernel/Makefile |1 + arch/x86_64/kernel/asm-offsets.c | 14 + arch/x86_64/kernel/paravirt.c| 455 +++ arch/x86_64/kernel/vmlinux.lds.S |6 + include/asm-x86_64/paravirt.h| 901 ++ 6 files changed, 1388 insertions(+), 0 deletions(-) diff --git a/arch/x86_64/Kconfig b/arch/x86_64/Kconfig index ffa0364..bfea34c 100644 --- a/arch/x86_64/Kconfig +++ b/arch/x86_64/Kconfig @@ -373,6 +373,17 @@ config NODES_SHIFT # Dummy CONFIG option to select ACPI_NUMA from drivers/acpi/Kconfig. +config PARAVIRT + bool Paravirtualization support (EXPERIMENTAL) + depends on EXPERIMENTAL + help + Paravirtualization is a way of running multiple instances of + Linux on the same machine, under a hypervisor. This option + changes the kernel so it can modify itself when it is run + under a hypervisor, improving performance significantly. + However, when run without a hypervisor the kernel is + theoretically slower. If in doubt, say N. + config X86_64_ACPI_NUMA bool ACPI NUMA detection depends on NUMA diff --git a/arch/x86_64/kernel/Makefile b/arch/x86_64/kernel/Makefile index ff5d8c9..120467f 100644 --- a/arch/x86_64/kernel/Makefile +++ b/arch/x86_64/kernel/Makefile @@ -38,6 +38,7 @@ obj-$(CONFIG_X86_VSMP)+= vsmp.o obj-$(CONFIG_K8_NB)+= k8.o obj-$(CONFIG_AUDIT)+= audit.o +obj-$(CONFIG_PARAVIRT) += paravirt.o obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_PCI) += early-quirks.o diff --git a/arch/x86_64/kernel/asm-offsets.c b/arch/x86_64/kernel/asm-offsets.c index 778953b..a8ffc95 100644 --- a/arch/x86_64/kernel/asm-offsets.c +++ b/arch/x86_64/kernel/asm-offsets.c @@ -15,6 +15,9 @@ #include asm/segment.h #include asm/thread_info.h #include asm/ia32.h +#ifdef CONFIG_PARAVIRT +#include asm/paravirt.h +#endif #define DEFINE(sym, val) \ asm volatile(\n- #sym %0 #val : : i (val)) @@ -72,6 +75,17 @@ int main(void) offsetof (struct rt_sigframe32, uc.uc_mcontext)); BLANK(); #endif +#ifdef CONFIG_PARAVIRT +#define ENTRY(entry) DEFINE(PARAVIRT_ ## entry, offsetof(struct paravirt_ops, entry)) + ENTRY(paravirt_enabled); + ENTRY(irq_disable); + ENTRY(irq_enable); + ENTRY(sysret); + ENTRY(iret); + ENTRY(read_cr2); + ENTRY(swapgs); + BLANK(); +#endif DEFINE(pbe_address, offsetof(struct pbe, address)); DEFINE(pbe_orig_address, offsetof(struct pbe, orig_address)); DEFINE(pbe_next, offsetof(struct pbe, next)); diff --git a/arch/x86_64/kernel/paravirt.c b/arch/x86_64/kernel/paravirt.c new file mode 100644 index 000..a41c1c0 --- /dev/null +++ b/arch/x86_64/kernel/paravirt.c @@ -0,0 +1,455 @@ +/* Paravirtualization interfaces +Copyright (C) 2007 Glauber de Oliveira Costa and Steven Rostedt, +Red Hat Inc. +Based on i386 work by Rusty Russell. + +This program is free software; you can redistribute it and/or modify +it under the terms of the GNU General Public License as published by +the Free Software Foundation; either version 2 of the License, or +(at your option) any later version. + +This program is distributed in the hope that it will be useful, +but WITHOUT ANY WARRANTY; without even the implied warranty of +MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +GNU General Public License for more details. + +You should have received a copy of the GNU General Public License +along with this program; if not, write to the Free Software +Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +*/ +#include linux/errno.h +#include linux/module.h +#include linux/efi.h +#include linux/bcd.h +#include linux/start_kernel.h + +#include asm/bug.h +#include asm/paravirt.h +#include asm/desc.h +#include asm/setup.h +#include asm/irq.h +#include
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
+config PARAVIRT + bool Paravirtualization support (EXPERIMENTAL) This should be hidden and selected by the clients as needed (I already did this change on i386) Users know nothing about paravirt, they just know about Xen, lguest etc. Strictly you would at least need a !X86_VSMP dependency, but with the vsmp change i requested that will be unnecessary Is this really synced with the latest version of the i386 code? +#ifdef CONFIG_PARAVIRT +#include asm/paravirt.h +#endif +#include linux/errno.h +#include linux/module.h +#include linux/efi.h +#include linux/bcd.h +#include linux/start_kernel.h + +#include asm/bug.h +#include asm/paravirt.h +#include asm/desc.h +#include asm/setup.h +#include asm/irq.h +#include asm/delay.h +#include asm/fixmap.h +#include asm/apic.h +#include asm/tlbflush.h +#include asm/msr.h +#include asm/page.h +#include asm/pgtable.h +#include asm/proto.h +#include asm/e820.h +#include asm/time.h +#include asm/asm-offsets.h +#include asm/smp.h +#include asm/irqflags.h Are the includes really all needed? + if (opfunc == NULL) + /* If there's no function, patch it with a ud2a (BUG) */ + ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a); This will actually give corrupted BUGs because you don't supply the full inline BUG header. Perhaps another trap would be better. +EXPORT_SYMBOL(paravirt_ops); Definitely _GPL at least. +extern struct paravirt_ops paravirt_ops; Should be native_paravirt_ops I guess + + * This generates an indirect call based on the operation type number. The macros here don't +static inline unsigned long read_msr(unsigned int msr) +{ + int __err; No need for __ in inlines +/* The paravirtualized I/O functions */ +static inline void slow_down_io(void) { I doubt this needs to be inline and it's large + __asm__ __volatile__(paravirt_alt(PARAVIRT_CALL) No __*__ in new code please + : =a(f) + : paravirt_type(save_fl), +paravirt_clobber(CLBR_RAX) + : memory, cc); + return f; +} + +static inline void raw_local_irq_restore(unsigned long f) +{ + __asm__ __volatile__(paravirt_alt(PARAVIRT_CALL) + : + : D (f), Have you investigated if a different input register generates better/smaller code? I would assume rdi to be usually used already for the caller's arguments so it will produce spilling Similar for the rax return in the other functions. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
On Wed, 8 Aug 2007, Andi Kleen wrote: Strictly you would at least need a !X86_VSMP dependency, but with the vsmp change i requested that will be unnecessary Is this really synced with the latest version of the i386 code? Glauber started the paravirt ops 64 a second time around, from scratch using the PVOPS of i386 as a base. But since we couldn't just take a the PVOPS patch from i386 and apply it to x86_64, this was mainly done by looking at i386 code and massaging it for x86_64. Somethings may have slipped (and we may have been looking at different versions of PVOPS). It's not easy trying to keep up with a moving target ;-) -- Steve ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
On 8/8/07, Andi Kleen [EMAIL PROTECTED] wrote: Is this really synced with the latest version of the i386 code? Roasted already commented on this. I will check out and change it here. +#ifdef CONFIG_PARAVIRT +#include asm/paravirt.h +#endif +#include linux/errno.h +#include linux/module.h +#include linux/efi.h +#include linux/bcd.h +#include linux/start_kernel.h + +#include asm/bug.h +#include asm/paravirt.h +#include asm/desc.h +#include asm/setup.h +#include asm/irq.h +#include asm/delay.h +#include asm/fixmap.h +#include asm/apic.h +#include asm/tlbflush.h +#include asm/msr.h +#include asm/page.h +#include asm/pgtable.h +#include asm/proto.h +#include asm/e820.h +#include asm/time.h +#include asm/asm-offsets.h +#include asm/smp.h +#include asm/irqflags.h Are the includes really all needed? delay.h is not needed anymore. Most of them, could be maybe moved to paravirt.c , which is the one that really needs all the native_ things. Yeah, it will be better code this way, will change. + if (opfunc == NULL) + /* If there's no function, patch it with a ud2a (BUG) */ + ret = paravirt_patch_insns(site, len, start_ud2a, end_ud2a); This will actually give corrupted BUGs because you don't supply the full inline BUG header. Perhaps another trap would be better. You mean this: +#include asm/bug.h ? +EXPORT_SYMBOL(paravirt_ops); Definitely _GPL at least. Sure. Should be native_paravirt_ops I guess makes sense. + + * This generates an indirect call based on the operation type number. The macros here don't +static inline unsigned long read_msr(unsigned int msr) +{ + int __err; No need for __ in inlines Right. Thanks. +/* The paravirtualized I/O functions */ +static inline void slow_down_io(void) { I doubt this needs to be inline and it's large In a second look, i386 have such a function in io.h because they need slow_down_io in a bunch of I/O instructions. It seems that we do not. Could we just get rid of it, then? + __asm__ __volatile__(paravirt_alt(PARAVIRT_CALL) No __*__ in new code please Yup, will fix. + : =a(f) + : paravirt_type(save_fl), +paravirt_clobber(CLBR_RAX) + : memory, cc); + return f; +} + +static inline void raw_local_irq_restore(unsigned long f) +{ + __asm__ __volatile__(paravirt_alt(PARAVIRT_CALL) + : + : D (f), Have you investigated if a different input register generates better/smaller code? I would assume rdi to be usually used already for the caller's arguments so it will produce spilling Similar for the rax return in the other functions. I don't think we can do different. These functions can be patched, and if it happens, they will put their return value in rax. So we'd better expect it there. Same goes for rdi, as they will expect the value to be there as an input. I don't think it will spill in the normal case, as rdi is already the parameter. So the compiler will just leave it there, untouched. -- Glauber de Oliveira Costa. Free as in Freedom http://glommer.net The less confident you are, the more serious you have to act. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization
Re: [PATCH 25/25] [PATCH] add paravirtualization support for x86_64
On Thursday 09 August 2007 01:18:57 Rusty Russell wrote: On Wed, 2007-08-08 at 11:49 -0300, Glauber de Oliveira Costa wrote: On 8/8/07, Andi Kleen [EMAIL PROTECTED] wrote: +EXPORT_SYMBOL(paravirt_ops); Definitely _GPL at least. Sure. We ended up making it EXPORT_SYMBOL in i386 because every driver wants to save and restore interrupt state. Ah true. But questionably-licensed drivers might be less of a concern on x86-64. Nvidia/ATI and other binary modules exist too and users will probably unhappy if they cannot run them anymore. But at usually irq state changes should be patched in anyways and won't need paravirt I guess? Hmm, actually thinking about it the module loader probably has no clue that the relocation it linked will be overwritten so it'll check for the export anyways. So the alternatives would be to add ugly hacks to the module loader or split paravirt_ops in common and low level system areas or export it as a normal export. Not sure what's best. Ok using a normal export is easiest and not that big an issue. -Andi ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linux-foundation.org/mailman/listinfo/virtualization