Re: [PATCH] KVM: Document mmu
On 04/22/2010 11:16 PM, Marcelo Tosatti wrote: Looks good otherwise. Perhaps add a pointer to Joerg's NPT slides, although they're AMD specific. Fixed all the comments, added a Further reading section and applied. Note that this is still complete (example: large pages); patches (especially to Documentation/kvm/locking.txt) more than welcome. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Document mmu
On 04/23/2010 10:12 AM, Gui Jianfeng wrote: +Guest memory (gpa) is part of user address space of the process that is using +kvm. Userspace defines the translation between guest addresses and user +addresses (gpa-gva); note that two gpas may alias to the same gva, but not Do you mean (gpa-hva)? I do. Fixed. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Document mmu
Avi Kivity wrote: +Translation +=== + +The primary job of the mmu is to program the processor's mmu to translate +addresses for the guest. Different translations are required at different +times: + +- when guest paging is disabled, we translate guest physical addresses to + host physical addresses (gpa-hpa) +- when guest paging is enabled, we translate guest virtual addresses, to + guest virtual addresses, to host physical addresses (gva-gpa-hpa) I think you meant 'to guest physical addresses' here. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Document mmu
On 04/22/2010 10:13 AM, Karl Vogel wrote: Avi Kivity wrote: +Translation +=== + +The primary job of the mmu is to program the processor's mmu to translate +addresses for the guest. Different translations are required at different +times: + +- when guest paging is disabled, we translate guest physical addresses to + host physical addresses (gpa-hpa) +- when guest paging is enabled, we translate guest virtual addresses, to + guest virtual addresses, to host physical addresses (gva-gpa-hpa) I think you meant 'to guest physical addresses' here. Fixed. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: Document mmu
On Wed, Apr 21, 2010 at 04:09:21PM +0300, Avi Kivity wrote: Signed-off-by: Avi Kivity a...@redhat.com --- Documentation/kvm/mmu.txt | 296 + 1 files changed, 296 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/mmu.txt diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt new file mode 100644 index 000..176f834 --- /dev/null +++ b/Documentation/kvm/mmu.txt @@ -0,0 +1,296 @@ +The x86 kvm shadow mmu +== + +The mmu (in arch/x86/kvm, files mmu.[ch] and paging_tmpl.h) is responsible +for presenting a standard x86 mmu to the guest, while translating guest +physical addresses to host physical addresses. + +The mmu code attempts to satisfy the following requirements: + +- correctness: the guest should not be able to determine that it is running + on an emulated mmu except for timing (we attempt to comply + with the specification, not emulate the characteristics of + a particular implementation such as tlb size) +- security:the guest must not be able to touch host memory not assigned + to it +- performance: minimize the performance penalty imposed by the mmu +- scaling: need to scale to large memory and large vcpu guests +- hardware:support the full range of x86 virtualization hardware +- integration: Linux memory management code must be in control of guest memory + so that swapping, page migration, page merging, transparent + hugepages, and similar features work without change +- dirty tracking: report writes to guest memory to enable live migration + and framebuffer-based displays +- footprint: keep the amount of pinned kernel memory low (most memory + should be shrinkable) +- reliablity: avoid multipage or GFP_ATOMIC allocations + +Acronyms + + +pfn host page frame number +hpa host physical address +hva host virtual address +gfn guest frame number +gpa guest page address guest physical address +gva guest virtual address +ngpa nested guest physical address +ngva nested guest virtual address +pte page table entry (used also to refer generically to paging structure + entries) +gpte guest pte (referring to gfns) +spte shadow pte (referring to pfns) +tdp two dimensional paging (vendor neutral term for NPT and EPT) + +Virtual and real hardware supported +=== + +The mmu supports first-generation mmu hardware, which allows an atomic switch +of the current paging mode and cr3 during guest entry, as well as +two-dimensional paging (AMD's NPT and Intel's EPT). The emulated hardware +it exposes is the traditional 2/3/4 level x86 mmu, with support for global +pages, pae, pse, pse36, cr0.wp, and 1GB pages. Work is in progress to support +exposing NPT capable hardware on NPT capable hosts. + +Translation +=== + +The primary job of the mmu is to program the processor's mmu to translate +addresses for the guest. Different translations are required at different +times: + +- when guest paging is disabled, we translate guest physical addresses to + host physical addresses (gpa-hpa) +- when guest paging is enabled, we translate guest virtual addresses, to + guest virtual addresses, to host physical addresses (gva-gpa-hpa) +- when the guest launches a guest of its own, we translate nested guest + virtual addresses, to nested guest physical addresses, to guest physical + addresses, to host physical addresses (ngva-ngpa-gpa-hpa) + +The primary challenge is to encode between 1 and 3 translations into hardware +that support only 1 (traditional) and 2 (tdp) translations. When the +number of required translations matches the hardware, the mmu operates in +direct mode; otherwise it operates in shadow mode (see below). + +Memory +== + +Guest memory (gpa) is part of user address space of the process that is using +kvm. Userspace defines the translation between guest addresses and user +addresses (gpa-gva); gpa-hva note that two gpas may alias to the same gva, but not +vice versa. + +These gvas may be backed using any method available to the host: anonymous +memory, file backed memory, and device memory. Memory might be paged by the +host at any time. + +Events +== + +The mmu is driven by events, some from the guest, some from the host. + +Guest generated events: +- writes to control registers (especially cr3) +- invlpg/invlpga instruction execution +- access to missing or protected translations + +Host generated events: +- changes in the gpa-hpa translation (either through gpa-hva changes or + through hva-hpa changes) +- memory pressure (the shrinker) + +Shadow pages + + +The principal data structure is the shadow page, 'struct kvm_mmu_page'. A +shadow page contains 512 sptes, which
[PATCH] KVM: Document mmu
Signed-off-by: Avi Kivity a...@redhat.com --- Documentation/kvm/mmu.txt | 296 + 1 files changed, 296 insertions(+), 0 deletions(-) create mode 100644 Documentation/kvm/mmu.txt diff --git a/Documentation/kvm/mmu.txt b/Documentation/kvm/mmu.txt new file mode 100644 index 000..176f834 --- /dev/null +++ b/Documentation/kvm/mmu.txt @@ -0,0 +1,296 @@ +The x86 kvm shadow mmu +== + +The mmu (in arch/x86/kvm, files mmu.[ch] and paging_tmpl.h) is responsible +for presenting a standard x86 mmu to the guest, while translating guest +physical addresses to host physical addresses. + +The mmu code attempts to satisfy the following requirements: + +- correctness: the guest should not be able to determine that it is running + on an emulated mmu except for timing (we attempt to comply + with the specification, not emulate the characteristics of + a particular implementation such as tlb size) +- security:the guest must not be able to touch host memory not assigned + to it +- performance: minimize the performance penalty imposed by the mmu +- scaling: need to scale to large memory and large vcpu guests +- hardware:support the full range of x86 virtualization hardware +- integration: Linux memory management code must be in control of guest memory + so that swapping, page migration, page merging, transparent + hugepages, and similar features work without change +- dirty tracking: report writes to guest memory to enable live migration + and framebuffer-based displays +- footprint: keep the amount of pinned kernel memory low (most memory + should be shrinkable) +- reliablity: avoid multipage or GFP_ATOMIC allocations + +Acronyms + + +pfn host page frame number +hpa host physical address +hva host virtual address +gfn guest frame number +gpa guest page address +gva guest virtual address +ngpa nested guest physical address +ngva nested guest virtual address +pte page table entry (used also to refer generically to paging structure + entries) +gpte guest pte (referring to gfns) +spte shadow pte (referring to pfns) +tdp two dimensional paging (vendor neutral term for NPT and EPT) + +Virtual and real hardware supported +=== + +The mmu supports first-generation mmu hardware, which allows an atomic switch +of the current paging mode and cr3 during guest entry, as well as +two-dimensional paging (AMD's NPT and Intel's EPT). The emulated hardware +it exposes is the traditional 2/3/4 level x86 mmu, with support for global +pages, pae, pse, pse36, cr0.wp, and 1GB pages. Work is in progress to support +exposing NPT capable hardware on NPT capable hosts. + +Translation +=== + +The primary job of the mmu is to program the processor's mmu to translate +addresses for the guest. Different translations are required at different +times: + +- when guest paging is disabled, we translate guest physical addresses to + host physical addresses (gpa-hpa) +- when guest paging is enabled, we translate guest virtual addresses, to + guest virtual addresses, to host physical addresses (gva-gpa-hpa) +- when the guest launches a guest of its own, we translate nested guest + virtual addresses, to nested guest physical addresses, to guest physical + addresses, to host physical addresses (ngva-ngpa-gpa-hpa) + +The primary challenge is to encode between 1 and 3 translations into hardware +that support only 1 (traditional) and 2 (tdp) translations. When the +number of required translations matches the hardware, the mmu operates in +direct mode; otherwise it operates in shadow mode (see below). + +Memory +== + +Guest memory (gpa) is part of user address space of the process that is using +kvm. Userspace defines the translation between guest addresses and user +addresses (gpa-gva); note that two gpas may alias to the same gva, but not +vice versa. + +These gvas may be backed using any method available to the host: anonymous +memory, file backed memory, and device memory. Memory might be paged by the +host at any time. + +Events +== + +The mmu is driven by events, some from the guest, some from the host. + +Guest generated events: +- writes to control registers (especially cr3) +- invlpg/invlpga instruction execution +- access to missing or protected translations + +Host generated events: +- changes in the gpa-hpa translation (either through gpa-hva changes or + through hva-hpa changes) +- memory pressure (the shrinker) + +Shadow pages + + +The principal data structure is the shadow page, 'struct kvm_mmu_page'. A +shadow page contains 512 sptes, which can be either leaf or nonleaf sptes. A +shadow page may contain a mix of leaf and nonleaf sptes. + +A nonleaf spte allows the hardware mmu to reach the leaf pages and +is not related to a translation directly. It points