Re: Nested EPT Write Protection

2015-06-22 Thread Paolo Bonzini


On 22/06/2015 15:28, Hu Yaohui wrote:
> 
> pseudo_gfn = base_addr >> PAGE_SHIFT;
> sp = kvm_mmu_get_page(vcpu, pseudo_gfn, iterator.addr,
>                       iterator.level - 1,
>                       1, ACC_ALL, iterator.sptep);
> if (!sp) {
>         pgprintk("nonpaging_map: ENOMEM\n");
>         kvm_release_pfn_clean(pfn);
>         return -ENOMEM;
> }
> 
> 
> It will get a pseudo_gfn to allocate a kvm_mmu_page. What if the
> pseudo_gfn itself causes a tdp_page_fault?
> Will the corresponding EPT page table entry be marked as read-only as well?

If tdp_page_fault is used (meaning non-nested KVM: nested KVM uses
ept_page_fault instead), sp->unsync is always false:

/* in kvm_mmu_get_page - __direct_map passes direct == true */
if (!direct) {
if (rmap_write_protect(vcpu, gfn))
kvm_flush_remote_tlbs(vcpu->kvm);
if (level > PT_PAGE_TABLE_LEVEL && need_sync)
kvm_sync_pages(vcpu, gfn);

account_shadowed(vcpu->kvm, sp);
}

so mmu_need_write_protect always returns false.
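
For reference, mmu_need_write_protect in that era's mmu.c looks roughly
like this (quoted from memory, so treat it only as a sketch); the point is
that it walks only *indirect* shadow pages, and with TDP every shadow page
is direct:

	static int mmu_need_write_protect(struct kvm_vcpu *vcpu, gfn_t gfn,
					  bool can_unsync)
	{
		struct kvm_mmu_page *s;
		bool need_unsync = false;

		/* only indirect shadow pages are visited, so for plain TDP
		 * the loop body never runs and we fall through to return 0 */
		for_each_gfn_indirect_valid_sp(vcpu->kvm, s, gfn) {
			if (!can_unsync)
				return 1;

			if (s->role.level != PT_PAGE_TABLE_LEVEL)
				return 1;

			if (!s->unsync)
				need_unsync = true;
		}
		if (need_unsync)
			kvm_mmu_unsync_pages(vcpu, gfn);
		return 0;
	}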

Note that higher in kvm_mmu_get_page there is another conditional:

if (!need_sync && sp->unsync)
need_sync = true;

but it only applies to the !direct case.

Paolo


Re: Nested EPT Write Protection

2015-06-22 Thread Paolo Bonzini


On 19/06/2015 20:57, Hu Yaohui wrote:
> One more thing: for a standard guest VM which uses EPT, what's the
> usage of the "gfn" field in "struct kvm_mmu_page"?  Since it uses EPT,
> a single shadow page should have no relation to any guest physical
> page, right?

The gfn is the same value that you can find in bits 12 to MAXPHYADDR-1
of the EPT page table entry.
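
A minimal illustration of extracting that bit field from a 64-bit EPT
entry (MAXPHYADDR is assumed to be 40 here, purely for the example):

	typedef unsigned long long u64;

	static inline u64 epte_to_gfn(u64 epte)
	{
		const unsigned int maxphyaddr = 40;                /* assumed */
		u64 mask = ((1ull << maxphyaddr) - 1) & ~0xfffull; /* bits 12..39 */

		return (epte & mask) >> 12;                        /* frame number */
	}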

Paolo

> According to the source code, each allocated
> shadow page "struct kvm_mmu_page" gets its gfn field filled.


Re: Nested EPT Write Protection

2015-06-19 Thread Hu Yaohui
Thanks a lot! It's much more straightforward to me now.

One more thing: for a standard guest VM which uses EPT, what's the
usage of the "gfn" field in "struct kvm_mmu_page"?  Since it uses EPT,
a single shadow page should have no relation to any guest physical
page, right? According to the source code, each allocated shadow page
"struct kvm_mmu_page" gets its gfn field filled.

Thanks,
Yaohui

On Fri, Jun 19, 2015 at 11:23 AM, Paolo Bonzini  wrote:
>
>
> On 19/06/2015 14:44, Hu Yaohui wrote:
>> Hi Paolo,
>> Thanks a lot!
>>
>> On Fri, Jun 19, 2015 at 2:27 AM, Paolo Bonzini  wrote:
>>>
>>>
>>> On 19/06/2015 03:52, Hu Yaohui wrote:
>>>> Hi All,
> >>>> In kernel 3.14.2, KVM uses a shadow EPT (EPT02) to implement nested
> >>>> EPT. The shadow EPT (EPT02) is a shadow of the guest EPT (EPT12). If
> >>>> the L1 guest writes to the guest EPT (EPT12), how is the shadow
> >>>> EPT (EPT02) modified accordingly?
>>>
>>> Because the EPT02 is write protected, writes to the EPT12 will trap to
>>> the hypervisor.  The hypervisor will execute the write instruction
>>> before reentering the guest and invalidate the modified parts of the
>>> EPT02.  When the invalidated part of the EPT02 is accessed, the
>>> hypervisor will rebuild it according to the EPT12 and the KVM memslots.
>>>
>> Do you mean EPT12 is write protected instead of EPT02?
>
> Yes, sorry.
>
>> According to my understanding, EPT12 will be write protected by marking the
>> page table entry of EPT01 as readonly or marking the host page table
>> entry as readonly.
>> Could you please be more specific about the code path that makes the
>> corresponding page table entry write protected?
>
> Look at set_spte's call to mmu_need_write_protect.
>
> Paolo


Re: Nested EPT Write Protection

2015-06-19 Thread Paolo Bonzini


On 19/06/2015 14:44, Hu Yaohui wrote:
> Hi Paolo,
> Thanks a lot!
> 
> On Fri, Jun 19, 2015 at 2:27 AM, Paolo Bonzini  wrote:
>>
>>
>> On 19/06/2015 03:52, Hu Yaohui wrote:
>>> Hi All,
>>> In kernel 3.14.2, KVM uses a shadow EPT (EPT02) to implement nested
>>> EPT. The shadow EPT (EPT02) is a shadow of the guest EPT (EPT12). If
>>> the L1 guest writes to the guest EPT (EPT12), how is the shadow
>>> EPT (EPT02) modified accordingly?
>>
>> Because the EPT02 is write protected, writes to the EPT12 will trap to
>> the hypervisor.  The hypervisor will execute the write instruction
>> before reentering the guest and invalidate the modified parts of the
>> EPT02.  When the invalidated part of the EPT02 is accessed, the
>> hypervisor will rebuild it according to the EPT12 and the KVM memslots.
>>
> Do you mean EPT12 is write protected instead of EPT02?

Yes, sorry.

> According to my understanding, EPT12 will be write protected by marking the
> page table entry of EPT01 as readonly or marking the host page table
> entry as readonly.
> Could you please be more specific about the code path that makes the
> corresponding page table entry write protected?

Look at set_spte's call to mmu_need_write_protect.
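
The relevant part of set_spte is roughly the following (abridged and from
memory, so only a sketch): when mmu_need_write_protect() returns true, the
writable bit is simply left out of the spte, which is what write-protects
the guest page table page.

	if (pte_access & ACC_WRITE_MASK) {
		/* ... */
		spte |= PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE;

		if (mmu_need_write_protect(vcpu, gfn, can_unsync)) {
			pgprintk("%s: found shadow page for %llx, marking ro\n",
				 __func__, gfn);
			ret = 1;
			pte_access &= ~ACC_WRITE_MASK;
			spte &= ~(PT_WRITABLE_MASK | SPTE_MMU_WRITEABLE);
		}
	}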

Paolo


Re: Nested EPT Write Protection

2015-06-19 Thread Hu Yaohui
Hi Paolo,
Thanks a lot!

On Fri, Jun 19, 2015 at 2:27 AM, Paolo Bonzini  wrote:
>
>
> On 19/06/2015 03:52, Hu Yaohui wrote:
>> Hi All,
>> In kernel 3.14.2, KVM uses a shadow EPT (EPT02) to implement nested
>> EPT. The shadow EPT (EPT02) is a shadow of the guest EPT (EPT12). If
>> the L1 guest writes to the guest EPT (EPT12), how is the shadow
>> EPT (EPT02) modified accordingly?
>
> Because the EPT02 is write protected, writes to the EPT12 will trap to
> the hypervisor.  The hypervisor will execute the write instruction
> before reentering the guest and invalidate the modified parts of the
> EPT02.  When the invalidated part of the EPT02 is accessed, the
> hypervisor will rebuild it according to the EPT12 and the KVM memslots.
>
Do you mean EPT12 is write protected instead of EPT02?
According to my understanding, EPT12 will be write protected by marking the
page table entry of EPT01 as readonly or marking the host page table
entry as readonly.
Could you please be more specific about the code path that makes the
corresponding page table entry write protected?

Thanks,
Yaohui
> Paolo


Re: Nested EPT Write Protection

2015-06-18 Thread Paolo Bonzini


On 19/06/2015 03:52, Hu Yaohui wrote:
> Hi All,
> In kernel 3.14.2, KVM uses a shadow EPT (EPT02) to implement nested
> EPT. The shadow EPT (EPT02) is a shadow of the guest EPT (EPT12). If
> the L1 guest writes to the guest EPT (EPT12), how is the shadow
> EPT (EPT02) modified accordingly?

Because the EPT02 is write protected, writes to the EPT12 will trap to
the hypervisor.  The hypervisor will execute the write instruction
before reentering the guest and invalidate the modified parts of the
EPT02.  When the invalidated part of the EPT02 is accessed, the
hypervisor will rebuild it according to the EPT12 and the KVM memslots.
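
A rough sketch of that flow, using function names from the 3.x source
(approximate, many details elided):

	/*
	 * L1 writes an EPT12 entry, but the page holding EPT12 is mapped
	 * read-only by the shadow page tables:
	 *
	 *   handle_ept_violation()          the write faults into L0
	 *     kvm_mmu_page_fault()          the gfn is shadowed -> emulate
	 *       x86_emulate_instruction()   performs the write on L1's behalf
	 *         kvm_mmu_pte_write()       zaps/updates the EPT02 sptes that
	 *                                   shadow the touched EPT12 entry
	 *
	 * The next access through the invalidated EPT02 entry faults again
	 * and is rebuilt from EPT12 plus the memslots (ept_page_fault).
	 */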

Paolo


Nested EPT Write Protection

2015-06-18 Thread Hu Yaohui
Hi All,
In kernel 3.14.2, KVM uses a shadow EPT (EPT02) to implement nested
EPT. The shadow EPT (EPT02) is a shadow of the guest EPT (EPT12). If
the L1 guest writes to the guest EPT (EPT12), how is the shadow
EPT (EPT02) modified accordingly?

Thanks,
Yaohui


[Bug 53611] nVMX: Add nested EPT

2015-04-08 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53611

Paolo Bonzini  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||bonz...@gnu.org
 Kernel Version||3.19
 Resolution|--- |CODE_FIX

--- Comment #2 from Paolo Bonzini  ---
Fixed by commit afa61f752ba6 (Advertise the support of EPT to the L1 guest,
through the appropriate MSR., 2013-08-07)

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


[Bug 53611] nVMX: Add nested EPT

2015-03-16 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53611

Bandan Das  changed:

   What|Removed |Added

 Blocks||94971

-- 
You are receiving this mail because:
You are watching the assignee of the bug.


Nested EPT page fault

2014-05-05 Thread Hu Yaohui
Hi,
I have one question related to nested EPT page fault.
At the very start, the L0 hypervisor launches L2 with an empty EPT0->2
table and builds the table on the fly.
When an L2 physical page is first accessed, ept_page_fault()
(paging_tmpl.h) is called in L0 to handle the fault. It first calls
ept_walk_addr() to get the guest ept entry from EPT1->2. If there is no
such entry, a fault is injected into L1 to handle it.
The next time the same L2 physical page is accessed, ept_page_fault()
is triggered again in L0; it again calls ept_walk_addr(), now finds the
previously filled ept entry in EPT1->2, and then calls try_async_pf()
to translate the L1 physical page to an L0 physical page. Finally, an
entry is created in EPT0->2 to resolve the page fault.
Please correct me if I am wrong.

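(The flow described above corresponds roughly to this skeleton of
FNAME(page_fault) in paging_tmpl.h, whose EPT instantiation is
ept_page_fault -- heavily abridged and from memory, so treat it as a
sketch only:)

	static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr,
				     u32 error_code, bool prefault)
	{
		struct guest_walker walker;
		pfn_t pfn;
		/* ... */

		/* walk the guest's EPT12 using L1 memory */
		r = FNAME(walk_addr)(&walker, vcpu, addr, error_code);
		if (!r) {
			/* no usable EPT12 entry: reflect the fault to L1 */
			inject_page_fault(vcpu, &walker.fault);
			return 0;
		}

		/* translate the L1 physical page to a host pfn (memslots) */
		if (try_async_pf(vcpu, prefault, walker.gfn, addr, &pfn,
				 write_fault, &map_writable))
			return 0;

		/* ... install the translation into the shadow EPT02 */
		r = FNAME(fetch)(vcpu, addr, &walker, write_fault,
				 level, pfn, map_writable, prefault);
		/* ... */
	}
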
My question is: when is EPT0->1 accessed while the EPT0->2 entry is
being created? According to the Turtles paper, both EPT0->1 and EPT1->2
are accessed to populate an entry in EPT0->2.

Thanks for your time!

Best Wishes,
Yaohui


Re: nested EPT

2014-01-17 Thread duy hai nguyen
Thank you very much.

I can launch L2 from L1 by directly using qemu-system-x86_64
name_of_image. L2 still hangs if I launch it using the 'virsh' command;
libvirt shows this log:

warning : virAuditSend:135 : Failed to send audit message virt=kvm
resrc=net reason=start vm="L2"
uuid=e9549443-e63f-31b5-0692-1396736d06b4 old-net=?
new-net=52:54:00:75:c1:5b: Operation not permitted

I am using libvirt 1.1.1. Is it the above warning that causes the problem?

Best,
Hai

On Fri, Jan 17, 2014 at 6:40 AM, Jan Kiszka  wrote:
> On 2014-01-17 12:29, Kashyap Chamarthy wrote:
>> On Fri, Jan 17, 2014 at 2:51 AM, duy hai nguyen  wrote:
>>> Now I can run an L2 guest (nested guest)  using the kvm kernel module
>>> of kernel 3.12
>>>
>>> However, I am facing a new problem when trying to build and use kvm
>>> kernel module from git://git.kiszka.org/kvm-kmod.git: L1 (nested
>>> hypervisor) cannot boot L2  and the graphic console of virt-manager
>>> hangs displaying 'Booting from Hard Disk...'. L1 still runs fine.
>>>
>>> Loading kvm_intel with 'emulate_invalid_guest_state=0' in L0 does not
>>> solve the problem. I have also tried with different kernel versions:
>>> 3.12.0, 3.12.8 and 3.13.0 without success.
>>>
>>> Can you give me some suggestions?
>>
>> Maybe you can try without graphical managers and enable serial console
>> ('console=ttyS0') to your Kernel command-line of L2 guest, so you can
>> see where it's stuck.
>
> Tracing can also be helpful, both in L1 and L0:
>
> http://www.linux-kvm.org/page/Tracing
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT RTC ITP SES-DE
> Corporate Competence Center Embedded Linux


Re: nested EPT

2014-01-17 Thread Jan Kiszka
On 2014-01-17 12:29, Kashyap Chamarthy wrote:
> On Fri, Jan 17, 2014 at 2:51 AM, duy hai nguyen  wrote:
>> Now I can run an L2 guest (nested guest)  using the kvm kernel module
>> of kernel 3.12
>>
>> However, I am facing a new problem when trying to build and use kvm
>> kernel module from git://git.kiszka.org/kvm-kmod.git: L1 (nested
>> hypervisor) cannot boot L2  and the graphic console of virt-manager
>> hangs displaying 'Booting from Hard Disk...'. L1 still runs fine.
>>
>> Loading kvm_intel with 'emulate_invalid_guest_state=0' in L0 does not
>> solve the problem. I have also tried with different kernel versions:
>> 3.12.0, 3.12.8 and 3.13.0 without success.
>>
>> Can you give me some suggestions?
> 
> Maybe you can try without graphical managers and enable serial console
> ('console=ttyS0') to your Kernel command-line of L2 guest, so you can
> see where it's stuck.

Tracing can also be helpful, both in L1 and L0:

http://www.linux-kvm.org/page/Tracing

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


Re: nested EPT

2014-01-17 Thread Kashyap Chamarthy
On Fri, Jan 17, 2014 at 2:51 AM, duy hai nguyen  wrote:
> Now I can run an L2 guest (nested guest)  using the kvm kernel module
> of kernel 3.12
>
> However, I am facing a new problem when trying to build and use kvm
> kernel module from git://git.kiszka.org/kvm-kmod.git: L1 (nested
> hypervisor) cannot boot L2  and the graphic console of virt-manager
> hangs displaying 'Booting from Hard Disk...'. L1 still runs fine.
>
> Loading kvm_intel with 'emulate_invalid_guest_state=0' in L0 does not
> solve the problem. I have also tried with different kernel versions:
> 3.12.0, 3.12.8 and 3.13.0 without success.
>
> Can you give me some suggestions?

Maybe you can try without graphical managers and enable serial console
('console=ttyS0') to your Kernel command-line of L2 guest, so you can
see where it's stuck.

/kashyap


Re: nested EPT

2014-01-16 Thread duy hai nguyen
Now I can run an L2 guest (nested guest)  using the kvm kernel module
of kernel 3.12

However, I am facing a new problem when trying to build and use kvm
kernel module from git://git.kiszka.org/kvm-kmod.git: L1 (nested
hypervisor) cannot boot L2  and the graphic console of virt-manager
hangs displaying 'Booting from Hard Disk...'. L1 still runs fine.

Loading kvm_intel with 'emulate_invalid_guest_state=0' in L0 does not
solve the problem. I have also tried with different kernel versions:
3.12.0, 3.12.8 and 3.13.0 without success.

Can you give me some suggestions?

Thank you very much

Best,
Hai

On Thu, Jan 16, 2014 at 1:17 PM, duy hai nguyen  wrote:
> Thanks, Jan and Paolo!
>
> Great! It helps solve the problem.
>
> Sincerely,
> Hai
>
> On Thu, Jan 16, 2014 at 12:09 PM, Paolo Bonzini  wrote:
>> On 16/01/2014 17:10, duy hai nguyen wrote:
>>> Dear All,
>>>
>>> I am having a problem with using nested EPT in my system: In L0
>>> hypervisor CPUs support vmx and ept; however, L1 hypervisor's CPUs do
>>> not have ept capability. Flag 'ept' appears in /proc/cpuinfo of L0 but
>>> does not show in that of L1.
>>>
>>> - 'Nested' and 'EPT' are enabled in L0:
>>>
>>> $cat /sys/module/kvm_intel/parameters/nested
>>> Y
>>>
>>> $cat /sys/module/kvm_intel/parameters/ept
>>> Y
>>>
>>> - The libvirt xml file used in L0 has this cpu configuration:
>>>
>>> 
>>>
>>> - The kernel version I am using for both L0 and L1 is 3.9.11
>>
>> Nested EPT was added in 3.12.  You need that version in L0.
>>
>> Paolo
>>


Re: nested EPT

2014-01-16 Thread duy hai nguyen
Thanks, Jan and Paolo!

Great! It helps solve the problem.

Sincerely,
Hai

On Thu, Jan 16, 2014 at 12:09 PM, Paolo Bonzini  wrote:
> On 16/01/2014 17:10, duy hai nguyen wrote:
>> Dear All,
>>
>> I am having a problem with using nested EPT in my system: In L0
>> hypervisor CPUs support vmx and ept; however, L1 hypervisor's CPUs do
>> not have ept capability. Flag 'ept' appears in /proc/cpuinfo of L0 but
>> does not show in that of L1.
>>
>> - 'Nested' and 'EPT' are enabled in L0:
>>
>> $cat /sys/module/kvm_intel/parameters/nested
>> Y
>>
>> $cat /sys/module/kvm_intel/parameters/ept
>> Y
>>
>> - The libvirt xml file used in L0 has this cpu configuration:
>>
>> 
>>
>> - The kernel version I am using for both L0 and L1 is 3.9.11
>
> Nested EPT was added in 3.12.  You need that version in L0.
>
> Paolo
>


Re: nested EPT

2014-01-16 Thread Paolo Bonzini
On 16/01/2014 17:10, duy hai nguyen wrote:
> Dear All,
> 
> I am having a problem with using nested EPT in my system: In L0
> hypervisor CPUs support vmx and ept; however, L1 hypervisor's CPUs do
> not have ept capability. Flag 'ept' appears in /proc/cpuinfo of L0 but
> does not show in that of L1.
> 
> - 'Nested' and 'EPT' are enabled in L0:
> 
> $cat /sys/module/kvm_intel/parameters/nested
> Y
> 
> $cat /sys/module/kvm_intel/parameters/ept
> Y
> 
> - The libvirt xml file used in L0 has this cpu configuration:
> 
> 
> 
> - The kernel version I am using for both L0 and L1 is 3.9.11

Nested EPT was added in 3.12.  You need that version in L0.

Paolo



Re: nested EPT

2014-01-16 Thread Jan Kiszka
On 2014-01-16 17:10, duy hai nguyen wrote:
> Dear All,
> 
> I am having a problem with using nested EPT in my system: In L0
> hypervisor CPUs support vmx and ept; however, L1 hypervisor's CPUs do
> not have ept capability. Flag 'ept' appears in /proc/cpuinfo of L0 but
> does not show in that of L1.
> 
> - 'Nested' and 'EPT' are enabled in L0:
> 
> $cat /sys/module/kvm_intel/parameters/nested
> Y
> 
> $cat /sys/module/kvm_intel/parameters/ept
> Y
> 
> - The libvirt xml file used in L0 has this cpu configuration:
> 
> 
> 
> - The kernel version I am using for both L0 and L1 is 3.9.11
> 

Update your host kernel (L0), nEPT got merged in 3.12.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SES-DE
Corporate Competence Center Embedded Linux


nested EPT

2014-01-16 Thread duy hai nguyen
Dear All,

I am having a problem with using nested EPT in my system: In L0
hypervisor CPUs support vmx and ept; however, L1 hypervisor's CPUs do
not have ept capability. Flag 'ept' appears in /proc/cpuinfo of L0 but
does not show in that of L1.

- 'Nested' and 'EPT' are enabled in L0:

$cat /sys/module/kvm_intel/parameters/nested
Y

$cat /sys/module/kvm_intel/parameters/ept
Y

- The libvirt xml file used in L0 has this cpu configuration:



- The kernel version I am using for both L0 and L1 is 3.9.11



Thank you very much.

Best,
Hai


Re: [PATCH] kvm-unit-tests: VMX: Fix some nested EPT related bugs

2013-09-09 Thread Paolo Bonzini
On 09/09/2013 17:55, Arthur Chunqi Li wrote:
> This patch fix 3 bugs in VMX framework and EPT framework
> 1. Fix bug of setting default value of CPU_SECONDARY
> 2. Fix bug of reading MSR_IA32_VMX_PROCBASED_CTLS2 and
> MSR_IA32_VMX_EPT_VPID_CAP
> 3. For EPT violation and misconfiguration reduced vmexit, vmcs field
> "VM-exit instruction length" is not used and will return unexpected
> value when read.
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
>  x86/vmx.c   |   13 ++---
>  x86/vmx_tests.c |2 --
>  2 files changed, 10 insertions(+), 5 deletions(-)
> 
> diff --git a/x86/vmx.c b/x86/vmx.c
> index 87d1d55..9db4ef4 100644
> --- a/x86/vmx.c
> +++ b/x86/vmx.c
> @@ -304,7 +304,8 @@ static void init_vmcs_ctrl(void)
>   /* Disable VMEXIT of IO instruction */
>   vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
>   if (ctrl_cpu_rev[0].set & CPU_SECONDARY) {
> - ctrl_cpu[1] |= ctrl_cpu_rev[1].set & ctrl_cpu_rev[1].clr;
> + ctrl_cpu[1] = (ctrl_cpu[1] | ctrl_cpu_rev[1].set) &
> + ctrl_cpu_rev[1].clr;
>   vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]);
>   }
>   vmcs_write(CR3_TARGET_COUNT, 0);
> @@ -489,8 +490,14 @@ static void init_vmx(void)
>   : MSR_IA32_VMX_ENTRY_CTLS);
>   ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC
>   : MSR_IA32_VMX_PROCBASED_CTLS);
> - ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
> - ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);
> + if ((ctrl_cpu_rev[0].clr & CPU_SECONDARY) != 0)
> + ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
> + else
> + ctrl_cpu_rev[1].val = 0;
> + if ((ctrl_cpu_rev[1].clr & (CPU_EPT | CPU_VPID)) != 0)
> + ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);
> + else
> + ept_vpid.val = 0;
>  
>   write_cr0((read_cr0() & fix_cr0_clr) | fix_cr0_set);
>   write_cr4((read_cr4() & fix_cr4_clr) | fix_cr4_set | X86_CR4_VMXE);
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index 6d972c0..e891a9f 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -1075,7 +1075,6 @@ static int ept_exit_handler()
>   print_vmexit_info();
>   return VMX_TEST_VMEXIT;
>   }
> - vmcs_write(GUEST_RIP, guest_rip + insn_len);
>   return VMX_TEST_RESUME;
>   case VMX_EPT_VIOLATION:
>   switch(get_stage()) {
> @@ -1100,7 +1099,6 @@ static int ept_exit_handler()
>   print_vmexit_info();
>   return VMX_TEST_VMEXIT;
>   }
> - vmcs_write(GUEST_RIP, guest_rip + insn_len);
>   return VMX_TEST_RESUME;
>   default:
>   printf("Unknown exit reason, %d\n", reason);
> 

Looks good, thanks!

Paolo


Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT

2013-09-09 Thread Paolo Bonzini
On 09/09/2013 17:29, Arthur Chunqi Li wrote:
> Hi Paolo,
> I noticed another possible bug of this patch. Stage 4 of this patch
> tests the scenario where the page of a paging structure is not present;
> this causes an EPT violation vmexit with bit 8 of exit_qual
> unset. My question is: will the instruction length be set correctly in
> this scenario? I got a wrong insn_len in "case 4" of VMX_EPT_VIOLATION,
> which may cause a triple fault vmexit.

It's plausible that the instruction length is wrong, since the processor
might be fetching the instruction itself and doesn't know the length.

Paolo


[PATCH] kvm-unit-tests: VMX: Fix some nested EPT related bugs

2013-09-09 Thread Arthur Chunqi Li
This patch fixes 3 bugs in the VMX framework and the EPT framework:
1. Fix a bug in setting the default value of CPU_SECONDARY
2. Fix a bug in reading MSR_IA32_VMX_PROCBASED_CTLS2 and
MSR_IA32_VMX_EPT_VPID_CAP
3. For vmexits caused by EPT violations and misconfigurations, the vmcs
field "VM-exit instruction length" is not used and will return an
unexpected value when read.

Signed-off-by: Arthur Chunqi Li 
---
 x86/vmx.c   |   13 ++---
 x86/vmx_tests.c |2 --
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/x86/vmx.c b/x86/vmx.c
index 87d1d55..9db4ef4 100644
--- a/x86/vmx.c
+++ b/x86/vmx.c
@@ -304,7 +304,8 @@ static void init_vmcs_ctrl(void)
/* Disable VMEXIT of IO instruction */
vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
if (ctrl_cpu_rev[0].set & CPU_SECONDARY) {
-   ctrl_cpu[1] |= ctrl_cpu_rev[1].set & ctrl_cpu_rev[1].clr;
+   ctrl_cpu[1] = (ctrl_cpu[1] | ctrl_cpu_rev[1].set) &
+   ctrl_cpu_rev[1].clr;
vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1]);
}
vmcs_write(CR3_TARGET_COUNT, 0);
@@ -489,8 +490,14 @@ static void init_vmx(void)
: MSR_IA32_VMX_ENTRY_CTLS);
ctrl_cpu_rev[0].val = rdmsr(basic.ctrl ? MSR_IA32_VMX_TRUE_PROC
: MSR_IA32_VMX_PROCBASED_CTLS);
-   ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
-   ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);
+   if ((ctrl_cpu_rev[0].clr & CPU_SECONDARY) != 0)
+   ctrl_cpu_rev[1].val = rdmsr(MSR_IA32_VMX_PROCBASED_CTLS2);
+   else
+   ctrl_cpu_rev[1].val = 0;
+   if ((ctrl_cpu_rev[1].clr & (CPU_EPT | CPU_VPID)) != 0)
+   ept_vpid.val = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);
+   else
+   ept_vpid.val = 0;
 
write_cr0((read_cr0() & fix_cr0_clr) | fix_cr0_set);
write_cr4((read_cr4() & fix_cr4_clr) | fix_cr4_set | X86_CR4_VMXE);
diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index 6d972c0..e891a9f 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1075,7 +1075,6 @@ static int ept_exit_handler()
print_vmexit_info();
return VMX_TEST_VMEXIT;
}
-   vmcs_write(GUEST_RIP, guest_rip + insn_len);
return VMX_TEST_RESUME;
case VMX_EPT_VIOLATION:
switch(get_stage()) {
@@ -1100,7 +1099,6 @@ static int ept_exit_handler()
print_vmexit_info();
return VMX_TEST_VMEXIT;
}
-   vmcs_write(GUEST_RIP, guest_rip + insn_len);
return VMX_TEST_RESUME;
default:
printf("Unknown exit reason, %d\n", reason);
-- 
1.7.9.5



Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT

2013-09-09 Thread Arthur Chunqi Li
On Mon, Sep 9, 2013 at 12:57 PM, Arthur Chunqi Li  wrote:
> Some test cases for nested EPT features, including:
> 1. EPT basic framework tests: read, write and remap.
> 2. EPT misconfiguration test cases: page permission misconfiguration
> and memory type misconfiguration
> 3. EPT violations test cases: page permission violation and paging
> structure violation
>
> Signed-off-by: Arthur Chunqi Li 
> ---
>  x86/vmx_tests.c |  266 
> +++
>  1 file changed, 266 insertions(+)
>
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index c1b39f4..a0b9824 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -1,4 +1,36 @@
>  #include "vmx.h"
> +#include "processor.h"
> +#include "vm.h"
> +#include "msr.h"
> +#include "fwcfg.h"
> +
> +volatile u32 stage;
> +volatile bool init_fail;
> +unsigned long *pml4;
> +u64 eptp;
> +void *data_page1, *data_page2;
> +
> +static inline void set_stage(u32 s)
> +{
> +   barrier();
> +   stage = s;
> +   barrier();
> +}
> +
> +static inline u32 get_stage()
> +{
> +   u32 s;
> +
> +   barrier();
> +   s = stage;
> +   barrier();
> +   return s;
> +}
> +
> +static inline void vmcall()
> +{
> +   asm volatile ("vmcall");
> +}
>
>  void basic_init()
>  {
> @@ -76,6 +108,238 @@ int vmenter_exit_handler()
> return VMX_TEST_VMEXIT;
>  }
>
> +static int setup_ept()
> +{
> +   int support_2m;
> +   unsigned long end_of_memory;
> +
> +   if (!(ept_vpid.val & EPT_CAP_UC) &&
> +   !(ept_vpid.val & EPT_CAP_WB)) {
> +   printf("\tEPT paging-structure memory type "
> +   "UC&WB are not supported\n");
> +   return 1;
> +   }
> +   if (ept_vpid.val & EPT_CAP_UC)
> +   eptp = EPT_MEM_TYPE_UC;
> +   else
> +   eptp = EPT_MEM_TYPE_WB;
> +   if (!(ept_vpid.val & EPT_CAP_PWL4)) {
> +   printf("\tPWL4 is not supported\n");
> +   return 1;
> +   }
> +   eptp |= (3 << EPTP_PG_WALK_LEN_SHIFT);
> +   pml4 = alloc_page();
> +   memset(pml4, 0, PAGE_SIZE);
> +   eptp |= virt_to_phys(pml4);
> +   vmcs_write(EPTP, eptp);
> +   support_2m = !!(ept_vpid.val & EPT_CAP_2M_PAGE);
> +   end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE);
> +   if (end_of_memory < (1ul << 32))
> +   end_of_memory = (1ul << 32);
> +   if (setup_ept_range(pml4, 0, end_of_memory,
> +   0, support_2m, EPT_WA | EPT_RA | EPT_EA)) {
> +   printf("\tSet ept tables failed.\n");
> +   return 1;
> +   }
> +   return 0;
> +}
> +
> +static void ept_init()
> +{
> +   u32 ctrl_cpu[2];
> +
> +   init_fail = false;
> +   ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0);
> +   ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1);
> +   ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY)
> +   & ctrl_cpu_rev[0].clr;
> +   ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT)
> +   & ctrl_cpu_rev[1].clr;
> +   vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
> +   vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT);
> +   if (setup_ept())
> +   init_fail = true;
> +   data_page1 = alloc_page();
> +   data_page2 = alloc_page();
> +   memset(data_page1, 0x0, PAGE_SIZE);
> +   memset(data_page2, 0x0, PAGE_SIZE);
> +   *((u32 *)data_page1) = MAGIC_VAL_1;
> +   *((u32 *)data_page2) = MAGIC_VAL_2;
> +   install_ept(pml4, (unsigned long)data_page1, (unsigned 
> long)data_page2,
> +   EPT_RA | EPT_WA | EPT_EA);
> +}
> +
> +static void ept_main()
> +{
> +   if (init_fail)
> +   return;
> +   if (!(ctrl_cpu_rev[0].clr & CPU_SECONDARY)
> +   && !(ctrl_cpu_rev[1].clr & CPU_EPT)) {
> +   printf("\tEPT is not supported");
> +   return;
> +   }
> +   set_stage(0);
> +   if (*((u32 *)data_page2) != MAGIC_VAL_1 &&
> +   *((u32 *)data_page1) != MAGIC_VAL_1)
> +   report("EPT basic framework - read", 0);
> +   else {
> +   *((u32 *)data_page2) = MAGIC_VAL_3;
> +   vmcall();
> +   if (get_stage() == 1) {
> +   if (*((u32 *)data_page1) == MAGIC_VAL_3 &&
> +  

Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT

2013-09-09 Thread Paolo Bonzini
On 09/09/2013 16:11, Arthur Chunqi Li wrote:
>>> >> +volatile u32 stage;
>>> >> +volatile bool init_fail;
>> >
>> > Why volatile?
> Because init_fail is only set but not used later in ept_init(), and if
> I don't add volatile, the compiler may optimize away the assignment to
> init_fail.
> 
> I first ran into this when I wrote set_stage/get_stage. If a variable
> is set in a function but not used later, the compiler usually treats
> the assignment as redundant and removes it.

No, the two are different.  "stage" is written several times in the same
function, with no code in the middle:

stage++;
*p = 1;
stage++;

To the compiler, the first store is dead.  The compiler doesn't know
that "*p = 1" traps to the hypervisor.

But this is not the case for "init_fail".

Paolo


Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT

2013-09-09 Thread Arthur Chunqi Li
On Mon, Sep 9, 2013 at 9:56 PM, Paolo Bonzini  wrote:
> On 09/09/2013 06:57, Arthur Chunqi Li wrote:
>> Some test cases for nested EPT features, including:
>> 1. EPT basic framework tests: read, write and remap.
>> 2. EPT misconfiguration test cases: page permission misconfiguration
>> and memory type misconfiguration
>> 3. EPT violations test cases: page permission violation and paging
>> structure violation
>>
>> Signed-off-by: Arthur Chunqi Li 
>> ---
>>  x86/vmx_tests.c |  266 
>> +++
>>  1 file changed, 266 insertions(+)
>>
>> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
>> index c1b39f4..a0b9824 100644
>> --- a/x86/vmx_tests.c
>> +++ b/x86/vmx_tests.c
>> @@ -1,4 +1,36 @@
>>  #include "vmx.h"
>> +#include "processor.h"
>> +#include "vm.h"
>> +#include "msr.h"
>> +#include "fwcfg.h"
>> +
>> +volatile u32 stage;
>> +volatile bool init_fail;
>
> Why volatile?
Because init_fail is only set but not used later in ept_init(), and if
I don't add volatile, the compiler may optimize away the assignment to
init_fail.

I first ran into this when I wrote set_stage/get_stage. If a variable
is set in a function but not used later, the compiler usually treats
the assignment as redundant and removes it.

Arthur
>
> The patch looks good.
>
>> +unsigned long *pml4;
>> +u64 eptp;
>> +void *data_page1, *data_page2;
>> +
>> +static inline void set_stage(u32 s)
>> +{
>> + barrier();
>> + stage = s;
>> + barrier();
>> +}
>> +
>> +static inline u32 get_stage()
>> +{
>> + u32 s;
>> +
>> + barrier();
>> + s = stage;
>> + barrier();
>> + return s;
>> +}
>> +
>> +static inline void vmcall()
>> +{
>> + asm volatile ("vmcall");
>> +}
>>
>>  void basic_init()
>>  {
>> @@ -76,6 +108,238 @@ int vmenter_exit_handler()
>>   return VMX_TEST_VMEXIT;
>>  }
>>
>> +static int setup_ept()
>> +{
>> + int support_2m;
>> + unsigned long end_of_memory;
>> +
>> + if (!(ept_vpid.val & EPT_CAP_UC) &&
>> + !(ept_vpid.val & EPT_CAP_WB)) {
>> + printf("\tEPT paging-structure memory type "
>> + "UC&WB are not supported\n");
>> + return 1;
>> + }
>> + if (ept_vpid.val & EPT_CAP_UC)
>> + eptp = EPT_MEM_TYPE_UC;
>> + else
>> + eptp = EPT_MEM_TYPE_WB;
>> + if (!(ept_vpid.val & EPT_CAP_PWL4)) {
>> + printf("\tPWL4 is not supported\n");
>> + return 1;
>> + }
>> + eptp |= (3 << EPTP_PG_WALK_LEN_SHIFT);
>> + pml4 = alloc_page();
>> + memset(pml4, 0, PAGE_SIZE);
>> + eptp |= virt_to_phys(pml4);
>> + vmcs_write(EPTP, eptp);
>> + support_2m = !!(ept_vpid.val & EPT_CAP_2M_PAGE);
>> + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE);
>> + if (end_of_memory < (1ul << 32))
>> + end_of_memory = (1ul << 32);
>> + if (setup_ept_range(pml4, 0, end_of_memory,
>> + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) {
>> + printf("\tSet ept tables failed.\n");
>> + return 1;
>> + }
>> + return 0;
>> +}
>> +
>> +static void ept_init()
>> +{
>> + u32 ctrl_cpu[2];
>> +
>> + init_fail = false;
>> + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0);
>> + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1);
>> + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY)
>> + & ctrl_cpu_rev[0].clr;
>> + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT)
>> + & ctrl_cpu_rev[1].clr;
>> + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
>> + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT);
>> + if (setup_ept())
>> + init_fail = true;
>> + data_page1 = alloc_page();
>> + data_page2 = alloc_page();
>> + memset(data_page1, 0x0, PAGE_SIZE);
>> + memset(data_page2, 0x0, PAGE_SIZE);
>> + *((u32 *)data_page1) = MAGIC_VAL_1;
>> + *((u32 *)data_page2) = MAGIC_VAL_2;
>> + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2,
>> + EPT_RA | EPT_WA | EPT_EA);
>> +}
>> +
>> +static void 

Re: [PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT

2013-09-09 Thread Paolo Bonzini
On 09/09/2013 06:57, Arthur Chunqi Li wrote:
> Some test cases for nested EPT features, including:
> 1. EPT basic framework tests: read, write and remap.
> 2. EPT misconfiguration test cases: page permission misconfiguration
> and memory type misconfiguration
> 3. EPT violations test cases: page permission violation and paging
> structure violation
> 
> Signed-off-by: Arthur Chunqi Li 
> ---
>  x86/vmx_tests.c |  266 
> +++
>  1 file changed, 266 insertions(+)
> 
> diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
> index c1b39f4..a0b9824 100644
> --- a/x86/vmx_tests.c
> +++ b/x86/vmx_tests.c
> @@ -1,4 +1,36 @@
>  #include "vmx.h"
> +#include "processor.h"
> +#include "vm.h"
> +#include "msr.h"
> +#include "fwcfg.h"
> +
> +volatile u32 stage;
> +volatile bool init_fail;

Why volatile?

The patch looks good.

> +unsigned long *pml4;
> +u64 eptp;
> +void *data_page1, *data_page2;
> +
> +static inline void set_stage(u32 s)
> +{
> + barrier();
> + stage = s;
> + barrier();
> +}
> +
> +static inline u32 get_stage()
> +{
> + u32 s;
> +
> + barrier();
> + s = stage;
> + barrier();
> + return s;
> +}
> +
> +static inline void vmcall()
> +{
> + asm volatile ("vmcall");
> +}
>  
>  void basic_init()
>  {
> @@ -76,6 +108,238 @@ int vmenter_exit_handler()
>   return VMX_TEST_VMEXIT;
>  }
>  
> +static int setup_ept()
> +{
> + int support_2m;
> + unsigned long end_of_memory;
> +
> + if (!(ept_vpid.val & EPT_CAP_UC) &&
> + !(ept_vpid.val & EPT_CAP_WB)) {
> + printf("\tEPT paging-structure memory type "
> + "UC&WB are not supported\n");
> + return 1;
> + }
> + if (ept_vpid.val & EPT_CAP_UC)
> + eptp = EPT_MEM_TYPE_UC;
> + else
> + eptp = EPT_MEM_TYPE_WB;
> + if (!(ept_vpid.val & EPT_CAP_PWL4)) {
> + printf("\tPWL4 is not supported\n");
> + return 1;
> + }
> + eptp |= (3 << EPTP_PG_WALK_LEN_SHIFT);
> + pml4 = alloc_page();
> + memset(pml4, 0, PAGE_SIZE);
> + eptp |= virt_to_phys(pml4);
> + vmcs_write(EPTP, eptp);
> + support_2m = !!(ept_vpid.val & EPT_CAP_2M_PAGE);
> + end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE);
> + if (end_of_memory < (1ul << 32))
> + end_of_memory = (1ul << 32);
> + if (setup_ept_range(pml4, 0, end_of_memory,
> + 0, support_2m, EPT_WA | EPT_RA | EPT_EA)) {
> + printf("\tSet ept tables failed.\n");
> + return 1;
> + }
> + return 0;
> +}
> +
> +static void ept_init()
> +{
> + u32 ctrl_cpu[2];
> +
> + init_fail = false;
> + ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0);
> + ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1);
> + ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY)
> + & ctrl_cpu_rev[0].clr;
> + ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT)
> + & ctrl_cpu_rev[1].clr;
> + vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
> + vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT);
> + if (setup_ept())
> + init_fail = true;
> + data_page1 = alloc_page();
> + data_page2 = alloc_page();
> + memset(data_page1, 0x0, PAGE_SIZE);
> + memset(data_page2, 0x0, PAGE_SIZE);
> + *((u32 *)data_page1) = MAGIC_VAL_1;
> + *((u32 *)data_page2) = MAGIC_VAL_2;
> + install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2,
> + EPT_RA | EPT_WA | EPT_EA);
> +}
> +
> +static void ept_main()
> +{
> + if (init_fail)
> + return;
> + if (!(ctrl_cpu_rev[0].clr & CPU_SECONDARY)
> + && !(ctrl_cpu_rev[1].clr & CPU_EPT)) {
> + printf("\tEPT is not supported");
> + return;
> + }
> + set_stage(0);
> + if (*((u32 *)data_page2) != MAGIC_VAL_1 &&
> + *((u32 *)data_page1) != MAGIC_VAL_1)
> + report("EPT basic framework - read", 0);
> + else {
> + *((u32 *)data_page2) = MAGIC_VAL_3;
> + vmcall();
> + if (get_stage() == 1) {
> + if (*((u32 *)data_page1) == MAGIC_VAL_3 &&
> + *((u32 *)data_page2) == MAGIC_VAL_2)
> + report("EPT basic framework", 1);
> 

Re: [PATCH 0/2] kvm-unit-tests: VMX: Test nested EPT features

2013-09-09 Thread Arthur Chunqi Li
On Mon, Sep 9, 2013 at 3:17 PM, Jan Kiszka  wrote:
> On 2013-09-09 06:57, Arthur Chunqi Li wrote:
>> This series of patches provide the framework of nested EPT and some test
>> cases for nested EPT features.
>>
>> Arthur Chunqi Li (2):
>>   kvm-unit-tests: VMX: The framework of EPT for nested VMX testing
>>   kvm-unit-tests: VMX: Test cases for nested EPT
>>
>>  x86/vmx.c   |  159 -
>>  x86/vmx.h   |   76 
>>  x86/vmx_tests.c |  266 
>> +++
>>  3 files changed, 497 insertions(+), 4 deletions(-)
>>
>
> I suppose this is v2 of the previous patch? What is the delta? A meta
> changelog could go here.
Yes, v1 just provided the framework of EPT (similar to the first patch
of this series), and some more tests for nested EPT are added in this
series (the second patch).

Arthur
>
> Jan
>


Re: [PATCH 0/2] kvm-unit-tests: VMX: Test nested EPT features

2013-09-09 Thread Jan Kiszka
On 2013-09-09 06:57, Arthur Chunqi Li wrote:
> This series of patches provide the framework of nested EPT and some test
> cases for nested EPT features.
> 
> Arthur Chunqi Li (2):
>   kvm-unit-tests: VMX: The framework of EPT for nested VMX testing
>   kvm-unit-tests: VMX: Test cases for nested EPT
> 
>  x86/vmx.c   |  159 -
>  x86/vmx.h   |   76 
>  x86/vmx_tests.c |  266 
> +++
>  3 files changed, 497 insertions(+), 4 deletions(-)
> 

I suppose this is v2 of the previous patch? What is the delta? A meta
changelog could go here.

Jan



signature.asc
Description: OpenPGP digital signature


[PATCH 0/2] kvm-unit-tests: VMX: Test nested EPT features

2013-09-08 Thread Arthur Chunqi Li
This series of patches provide the framework of nested EPT and some test
cases for nested EPT features.

Arthur Chunqi Li (2):
  kvm-unit-tests: VMX: The framework of EPT for nested VMX testing
  kvm-unit-tests: VMX: Test cases for nested EPT

 x86/vmx.c   |  159 -
 x86/vmx.h   |   76 
 x86/vmx_tests.c |  266 +++
 3 files changed, 497 insertions(+), 4 deletions(-)

-- 
1.7.9.5



[PATCH 2/2] kvm-unit-tests: VMX: Test cases for nested EPT

2013-09-08 Thread Arthur Chunqi Li
Some test cases for nested EPT features, including:
1. EPT basic framework tests: read, write and remap.
2. EPT misconfiguration test cases: page permission misconfiguration
and memory type misconfiguration
3. EPT violations test cases: page permission violation and paging
structure violation

Signed-off-by: Arthur Chunqi Li 
---
 x86/vmx_tests.c |  266 +++
 1 file changed, 266 insertions(+)

diff --git a/x86/vmx_tests.c b/x86/vmx_tests.c
index c1b39f4..a0b9824 100644
--- a/x86/vmx_tests.c
+++ b/x86/vmx_tests.c
@@ -1,4 +1,36 @@
 #include "vmx.h"
+#include "processor.h"
+#include "vm.h"
+#include "msr.h"
+#include "fwcfg.h"
+
+volatile u32 stage;
+volatile bool init_fail;
+unsigned long *pml4;
+u64 eptp;
+void *data_page1, *data_page2;
+
+static inline void set_stage(u32 s)
+{
+   barrier();
+   stage = s;
+   barrier();
+}
+
+static inline u32 get_stage()
+{
+   u32 s;
+
+   barrier();
+   s = stage;
+   barrier();
+   return s;
+}
+
+static inline void vmcall()
+{
+   asm volatile ("vmcall");
+}
 
 void basic_init()
 {
@@ -76,6 +108,238 @@ int vmenter_exit_handler()
return VMX_TEST_VMEXIT;
 }
 
+static int setup_ept()
+{
+   int support_2m;
+   unsigned long end_of_memory;
+
+   if (!(ept_vpid.val & EPT_CAP_UC) &&
+   !(ept_vpid.val & EPT_CAP_WB)) {
+   printf("\tEPT paging-structure memory type "
+   "UC&WB are not supported\n");
+   return 1;
+   }
+   if (ept_vpid.val & EPT_CAP_UC)
+   eptp = EPT_MEM_TYPE_UC;
+   else
+   eptp = EPT_MEM_TYPE_WB;
+   if (!(ept_vpid.val & EPT_CAP_PWL4)) {
+   printf("\tPWL4 is not supported\n");
+   return 1;
+   }
+   eptp |= (3 << EPTP_PG_WALK_LEN_SHIFT);
+   pml4 = alloc_page();
+   memset(pml4, 0, PAGE_SIZE);
+   eptp |= virt_to_phys(pml4);
+   vmcs_write(EPTP, eptp);
+   support_2m = !!(ept_vpid.val & EPT_CAP_2M_PAGE);
+   end_of_memory = fwcfg_get_u64(FW_CFG_RAM_SIZE);
+   if (end_of_memory < (1ul << 32))
+   end_of_memory = (1ul << 32);
+   if (setup_ept_range(pml4, 0, end_of_memory,
+   0, support_2m, EPT_WA | EPT_RA | EPT_EA)) {
+   printf("\tSet ept tables failed.\n");
+   return 1;
+   }
+   return 0;
+}
+
+static void ept_init()
+{
+   u32 ctrl_cpu[2];
+
+   init_fail = false;
+   ctrl_cpu[0] = vmcs_read(CPU_EXEC_CTRL0);
+   ctrl_cpu[1] = vmcs_read(CPU_EXEC_CTRL1);
+   ctrl_cpu[0] = (ctrl_cpu[0] | CPU_SECONDARY)
+   & ctrl_cpu_rev[0].clr;
+   ctrl_cpu[1] = (ctrl_cpu[1] | CPU_EPT)
+   & ctrl_cpu_rev[1].clr;
+   vmcs_write(CPU_EXEC_CTRL0, ctrl_cpu[0]);
+   vmcs_write(CPU_EXEC_CTRL1, ctrl_cpu[1] | CPU_EPT);
+   if (setup_ept())
+   init_fail = true;
+   data_page1 = alloc_page();
+   data_page2 = alloc_page();
+   memset(data_page1, 0x0, PAGE_SIZE);
+   memset(data_page2, 0x0, PAGE_SIZE);
+   *((u32 *)data_page1) = MAGIC_VAL_1;
+   *((u32 *)data_page2) = MAGIC_VAL_2;
+   install_ept(pml4, (unsigned long)data_page1, (unsigned long)data_page2,
+   EPT_RA | EPT_WA | EPT_EA);
+}
+
+static void ept_main()
+{
+   if (init_fail)
+   return;
+   if (!(ctrl_cpu_rev[0].clr & CPU_SECONDARY)
+   && !(ctrl_cpu_rev[1].clr & CPU_EPT)) {
+   printf("\tEPT is not supported");
+   return;
+   }
+   set_stage(0);
+   if (*((u32 *)data_page2) != MAGIC_VAL_1 &&
+   *((u32 *)data_page1) != MAGIC_VAL_1)
+   report("EPT basic framework - read", 0);
+   else {
+   *((u32 *)data_page2) = MAGIC_VAL_3;
+   vmcall();
+   if (get_stage() == 1) {
+   if (*((u32 *)data_page1) == MAGIC_VAL_3 &&
+   *((u32 *)data_page2) == MAGIC_VAL_2)
+   report("EPT basic framework", 1);
+   else
+   report("EPT basic framework - remap", 1);
+   }
+   }
+   // Test EPT Misconfigurations
+   set_stage(1);
+   vmcall();
+   *((u32 *)data_page1) = MAGIC_VAL_1;
+   if (get_stage() != 2) {
+   report("EPT misconfigurations", 0);
+   goto t1;
+   }
+   set_stage(2);
+   vmcall();
+   *((u32 *)data_page1) = MAGIC_VAL_1;
+   if (get_stage() != 3) {
+   report("EPT misconfigurations", 0);
+   goto t1;
+   }
+   report

Some questions about nested EPT

2013-08-30 Thread Arthur Chunqi Li
Hi there,

When I test nested EPT (enabling EPT for the L2->L1 address translation),
some questions came up when querying IA32_VMX_EPT_VPID_CAP.

1. It shows that bits 16 and 17 (support for 1G and 2M pages) are
disabled in the nested IA32_VMX_EPT_VPID_CAP. Why does nested EPT fail
to support these? Are there any difficulties?

2. Can bit 6 (support for a page-walk length of 4) of
IA32_VMX_EPT_VPID_CAP be 0? That is to say, can I design a paging
structure with more or fewer than 4 levels?

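For reference, the bits in question can be probed like this (a sketch that
assumes the rdmsr() helper and the MSR_IA32_VMX_EPT_VPID_CAP constant used
by the kvm-unit-tests code quoted elsewhere in this thread; bit numbers as
given above):

	u64 cap = rdmsr(MSR_IA32_VMX_EPT_VPID_CAP);
	bool pwl4    = cap & (1ull << 6);   /* page-walk length of 4 supported */
	bool page_2m = cap & (1ull << 16);  /* 2MB EPT pages supported */
	bool page_1g = cap & (1ull << 17);  /* 1GB EPT pages supported */
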
Since I don't know who the original author of nested EPT is, I am
sending this mail to the whole list. If anyone knows, please tell me
and CC the authors for a more detailed discussion.

Thanks,
Arthur

-- 
Arthur Chunqi Li
Department of Computer Science
School of EECS
Peking University
Beijing, China


Re: [PATCH v7 00/15] Nested EPT

2013-08-07 Thread Paolo Bonzini

On 08/05/2013 10:07 AM, Gleb Natapov wrote:

Xiao's comment about checking the ept pointer before flushing individual
ept contexts is addressed here.

Gleb Natapov (3):
   nEPT: make guest's A/D bits depends on guest's paging mode
   nEPT: Support shadow paging for guest paging without A/D bits
   nEPT: correctly check if remote tlb flush is needed for shadowed EPT
 tables

Nadav Har'El (10):
   nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
   nEPT: Fix cr3 handling in nested exit and entry
   nEPT: Fix wrong test in kvm_set_cr3
   nEPT: Move common code to paging_tmpl.h
   nEPT: Add EPT tables support to paging_tmpl.h
   nEPT: MMU context for nested EPT
   nEPT: Nested INVEPT
   nEPT: Advertise EPT to L1
   nEPT: Some additional comments
   nEPT: Miscelleneous cleanups

Yang Zhang (2):
   nEPT: Redefine EPT-specific link_shadow_page()
   nEPT: Add nEPT violation/misconfigration support

  arch/x86/include/asm/kvm_host.h |4 +
  arch/x86/include/asm/vmx.h  |2 +
  arch/x86/include/uapi/asm/vmx.h |1 +
  arch/x86/kvm/mmu.c  |  170 +-
  arch/x86/kvm/mmu.h  |2 +
  arch/x86/kvm/paging_tmpl.h  |  176 +++
  arch/x86/kvm/vmx.c  |  220 ---
  arch/x86/kvm/x86.c  |   11 --
  8 files changed, 467 insertions(+), 119 deletions(-)



Applied, thanks (rebased on top of Xiao's walk_addr_generic fix).

Paolo


[PATCH v7 00/15] Nested EPT

2013-08-05 Thread Gleb Natapov
Xiao's comment about checking the ept pointer before flushing individual
ept contexts is addressed here.

Gleb Natapov (3):
  nEPT: make guest's A/D bits depends on guest's paging mode
  nEPT: Support shadow paging for guest paging without A/D bits
  nEPT: correctly check if remote tlb flush is needed for shadowed EPT
tables

Nadav Har'El (10):
  nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  nEPT: Fix cr3 handling in nested exit and entry
  nEPT: Fix wrong test in kvm_set_cr3
  nEPT: Move common code to paging_tmpl.h
  nEPT: Add EPT tables support to paging_tmpl.h
  nEPT: MMU context for nested EPT
  nEPT: Nested INVEPT
  nEPT: Advertise EPT to L1
  nEPT: Some additional comments
  nEPT: Miscelleneous cleanups

Yang Zhang (2):
  nEPT: Redefine EPT-specific link_shadow_page()
  nEPT: Add nEPT violation/misconfigration support

 arch/x86/include/asm/kvm_host.h |4 +
 arch/x86/include/asm/vmx.h  |2 +
 arch/x86/include/uapi/asm/vmx.h |1 +
 arch/x86/kvm/mmu.c  |  170 +-
 arch/x86/kvm/mmu.h  |2 +
 arch/x86/kvm/paging_tmpl.h  |  176 +++
 arch/x86/kvm/vmx.c  |  220 ---
 arch/x86/kvm/x86.c  |   11 --
 8 files changed, 467 insertions(+), 119 deletions(-)

-- 
1.7.10.4



[PATCH v7 11/15] nEPT: MMU context for nested EPT

2013-08-05 Thread Gleb Natapov
From: Nadav Har'El 

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Reviewed-by: Xiao Guangrong 
Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
Signed-off-by: Yang Zhang 
Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/mmu.c |   27 +++
 arch/x86/kvm/mmu.h |2 ++
 arch/x86/kvm/vmx.c |   41 -
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index f2d982d..e3bfdde 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3795,6 +3795,33 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = true;
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = ept_page_fault;
+   context->gva_to_gpa = ept_gva_to_gpa;
+   context->sync_page = ept_sync_page;
+   context->invlpg = ept_invlpg;
+   context->update_pte = ept_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   update_permission_bitmask(vcpu, context, true);
+   reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5b59c57..77e044a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -71,6 +71,8 @@ enum {
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 984f8d7..fbfabbe 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1046,6 +1046,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7367,6 +7372,33 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu 
*vcpu,
vmcs12->guest_physical_address = fault->address;
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_ept_mmu(vcpu, &vcpu->arch.mmu,
+   nested_vmx_ept_caps & VMX_EPT_EXECUTE_ONLY_BIT);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;
+   vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+   vcpu->arch.walk_mmu  = &vcpu->arch.nested_mmu;
+
+   return r;
+}
+
+static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
@@ -7587,6 +7619,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}
 
+   if (nested_cpu_has_ept(vmcs12)) {
+   kvm_mmu_unload(vcpu);
+   nested_ept_init_mmu_context(vcpu);
+   }
+
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_contro

Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Gleb Natapov
On Mon, Aug 05, 2013 at 01:19:26AM +0800, Xiao Guangrong wrote:
> 
> On Aug 5, 2013, at 12:58 AM, Gleb Natapov  wrote:
> 
> > On Sun, Aug 04, 2013 at 06:42:09PM +0200, Jan Kiszka wrote:
> >> On 2013-08-04 18:15, Xiao Guangrong wrote:
> >>> 
> >>> On Aug 4, 2013, at 11:14 PM, Jan Kiszka  wrote:
> >>> 
> >>>> On 2013-08-04 15:44, Gleb Natapov wrote:
> >>>>> On Sun, Aug 04, 2013 at 12:53:56PM +0300, Gleb Natapov wrote:
> >>>>>> On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
> >>>>>>> On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
> >>>>>>>> On 2013-08-01 16:08, Gleb Natapov wrote:
> >>>>>>>>> Another day -- another version of the nested EPT patches. In this 
> >>>>>>>>> version
> >>>>>>>>> included fix for need_remote_flush() with shadowed ept, set bits 6:8
> >>>>>>>>> of exit_qualification during ept_violation, 
> >>>>>>>>> update_permission_bitmask()
> >>>>>>>>> made to work with shadowed ept pages and other small adjustment 
> >>>>>>>>> according
> >>>>>>>>> to review comments.
> >>>>>>>> 
> >>>>>>>> Was just testing it here and ran into a bug: I've L2 accessing the 
> >>>>>>>> HPET
> >>>>>>>> MMIO region that my L1 passed through from L0 (where it is supposed 
> >>>>>>>> to
> >>>>>>>> be emulated in this setup). This used to work with an older posting 
> >>>>>>>> of
> >>>>>>> Not sure I understand your setup. L0 emulates HPET, L1 passes it 
> >>>>>>> through
> >>>>>>> to L2 (mmaps it and creates kvm slot that points to it) and when L2
> >>>>>>> accessed it it locks up?
> >>>>>>> 
> >>>>>>>> Jun, but now it locks up (infinite loop over L2's MMIO access, no 
> >>>>>>>> L2->L1
> >>>>>>>> transition). Any ideas where to look for debugging this?
> >>>>>>>> 
> >>>>>>> Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful 
> >>>>>>> :)
> >>>>>>> 
> >>>>>> I did an MMIO access from nested guest in the vmx unit test (which is
> >>>>>> naturally passed through to L0 since L1 is so simple) and I can see 
> >>>>>> that
> >>>>>> the access hits L0.
> >>>>>> 
> >>>>> But then unit test not yet uses nested EPT :)
> >>>> 
> >>>> Indeed, that's what I was about to notice as well. EPT test cases are on
> >>>> Arthur's list, but I suggested to start easier with some MSR switches
> >>>> (just to let him run into KVM's PAT bugs ;) ).
> >>>> 
> >>>> Anyway, here are the traces:
> >>>> 
> >>>> qemu-system-x86-11521 [000]  4724.170191: kvm_entry:vcpu 0
> >>>> qemu-system-x86-11521 [000]  4724.170192: kvm_exit: reason 
> >>>> EPT_VIOLATION rip 0x8102ab70 info 181 0
> >>>> qemu-system-x86-11521 [000]  4724.170192: kvm_page_fault:   address 
> >>>> 1901978 error_code 181
> >>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_pagetable_walk: addr 
> >>>> 1901978 pferr 0 
> >>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >>>> 3c04c007 level 4
> >>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >>>> 3c04d007 level 3
> >>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >>>> 3c05a007 level 2
> >>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >>>> 1901037 level 1
> >>>> qemu-system-x86-11521 [000]  4724.170197: kvm_entry:vcpu 0
> >>>> qemu-system-x86-11521 [000]  4724.170198: kvm_exit: reason 
> >>>> EPT_VIOLATION rip 0x8102ab77 info 81 0
> >>>> qemu-system-x86-11521 [000]  4724.170199: kvm_page_fault:   address 
> >>>> 3a029000 error_code 81
> >>>> qemu-system-x86-1

Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Xiao Guangrong

On Aug 5, 2013, at 12:58 AM, Gleb Natapov  wrote:

> On Sun, Aug 04, 2013 at 06:42:09PM +0200, Jan Kiszka wrote:
>> On 2013-08-04 18:15, Xiao Guangrong wrote:
>>> 
>>> On Aug 4, 2013, at 11:14 PM, Jan Kiszka  wrote:
>>> 
>>>> On 2013-08-04 15:44, Gleb Natapov wrote:
>>>>> On Sun, Aug 04, 2013 at 12:53:56PM +0300, Gleb Natapov wrote:
>>>>>> On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
>>>>>>> On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
>>>>>>>> On 2013-08-01 16:08, Gleb Natapov wrote:
>>>>>>>>> Another day -- another version of the nested EPT patches. In this 
>>>>>>>>> version
>>>>>>>>> included fix for need_remote_flush() with shadowed ept, set bits 6:8
>>>>>>>>> of exit_qualification during ept_violation, 
>>>>>>>>> update_permission_bitmask()
>>>>>>>>> made to work with shadowed ept pages and other small adjustment 
>>>>>>>>> according
>>>>>>>>> to review comments.
>>>>>>>> 
>>>>>>>> Was just testing it here and ran into a bug: I've L2 accessing the HPET
>>>>>>>> MMIO region that my L1 passed through from L0 (where it is supposed to
>>>>>>>> be emulated in this setup). This used to work with an older posting of
>>>>>>> Not sure I understand your setup. L0 emulates HPET, L1 passes it through
>>>>>>> to L2 (mmaps it and creates kvm slot that points to it) and when L2
>>>>>>> accessed it it locks up?
>>>>>>> 
>>>>>>>> Jun, but now it locks up (infinite loop over L2's MMIO access, no 
>>>>>>>> L2->L1
>>>>>>>> transition). Any ideas where to look for debugging this?
>>>>>>>> 
>>>>>>> Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)
>>>>>>> 
>>>>>> I did an MMIO access from nested guest in the vmx unit test (which is
>>>>>> naturally passed through to L0 since L1 is so simple) and I can see that
>>>>>> the access hits L0.
>>>>>> 
>>>>> But then unit test not yet uses nested EPT :)
>>>> 
>>>> Indeed, that's what I was about to notice as well. EPT test cases are on
>>>> Arthur's list, but I suggested to start easier with some MSR switches
>>>> (just to let him run into KVM's PAT bugs ;) ).
>>>> 
>>>> Anyway, here are the traces:
>>>> 
>>>> qemu-system-x86-11521 [000]  4724.170191: kvm_entry:vcpu 0
>>>> qemu-system-x86-11521 [000]  4724.170192: kvm_exit: reason 
>>>> EPT_VIOLATION rip 0x8102ab70 info 181 0
>>>> qemu-system-x86-11521 [000]  4724.170192: kvm_page_fault:   address 
>>>> 1901978 error_code 181
>>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_pagetable_walk: addr 
>>>> 1901978 pferr 0 
>>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>>>> 3c04c007 level 4
>>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>>>> 3c04d007 level 3
>>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>>>> 3c05a007 level 2
>>>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>>>> 1901037 level 1
>>>> qemu-system-x86-11521 [000]  4724.170197: kvm_entry:vcpu 0
>>>> qemu-system-x86-11521 [000]  4724.170198: kvm_exit: reason 
>>>> EPT_VIOLATION rip 0x8102ab77 info 81 0
>>>> qemu-system-x86-11521 [000]  4724.170199: kvm_page_fault:   address 
>>>> 3a029000 error_code 81
>>>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_pagetable_walk: addr 
>>>> 3a029000 pferr 0 
>>>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
>>>> 3c04c007 level 4
>>>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
>>>> 3c04d007 level 3
>>>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
>>>> 3c21e007 level 2
>>>> qemu-system-x86-11521 [000]  4724.170200: kvm_mmu_paging_element: pte 
>>>> 3a029037 level 1
>>>> qemu-system-x86-11521 [000]  472

Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Gleb Natapov
On Sun, Aug 04, 2013 at 06:42:09PM +0200, Jan Kiszka wrote:
> On 2013-08-04 18:15, Xiao Guangrong wrote:
> > 
> > On Aug 4, 2013, at 11:14 PM, Jan Kiszka  wrote:
> > 
> >> On 2013-08-04 15:44, Gleb Natapov wrote:
> >>> On Sun, Aug 04, 2013 at 12:53:56PM +0300, Gleb Natapov wrote:
> >>>> On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
> >>>>> On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
> >>>>>> On 2013-08-01 16:08, Gleb Natapov wrote:
> >>>>>>> Another day -- another version of the nested EPT patches. In this 
> >>>>>>> version
> >>>>>>> included fix for need_remote_flush() with shadowed ept, set bits 6:8
> >>>>>>> of exit_qualification during ept_violation, 
> >>>>>>> update_permission_bitmask()
> >>>>>>> made to work with shadowed ept pages and other small adjustment 
> >>>>>>> according
> >>>>>>> to review comments.
> >>>>>>
> >>>>>> Was just testing it here and ran into a bug: I've L2 accessing the HPET
> >>>>>> MMIO region that my L1 passed through from L0 (where it is supposed to
> >>>>>> be emulated in this setup). This used to work with an older posting of
> >>>>> Not sure I understand your setup. L0 emulates HPET, L1 passes it through
> >>>>> to L2 (mmaps it and creates kvm slot that points to it) and when L2
> >>>>> accessed it it locks up?
> >>>>>
> >>>>>> Jun, but now it locks up (infinite loop over L2's MMIO access, no 
> >>>>>> L2->L1
> >>>>>> transition). Any ideas where to look for debugging this?
> >>>>>>
> >>>>> Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)
> >>>>>
> >>>> I did an MMIO access from nested guest in the vmx unit test (which is
> >>>> naturally passed through to L0 since L1 is so simple) and I can see that
> >>>> the access hits L0.
> >>>>
> >>> But then unit test not yet uses nested EPT :)
> >>
> >> Indeed, that's what I was about to notice as well. EPT test cases are on
> >> Arthur's list, but I suggested to start easier with some MSR switches
> >> (just to let him run into KVM's PAT bugs ;) ).
> >>
> >> Anyway, here are the traces:
> >>
> >> qemu-system-x86-11521 [000]  4724.170191: kvm_entry:vcpu 0
> >> qemu-system-x86-11521 [000]  4724.170192: kvm_exit: reason 
> >> EPT_VIOLATION rip 0x8102ab70 info 181 0
> >> qemu-system-x86-11521 [000]  4724.170192: kvm_page_fault:   address 
> >> 1901978 error_code 181
> >> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_pagetable_walk: addr 
> >> 1901978 pferr 0 
> >> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >> 3c04c007 level 4
> >> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >> 3c04d007 level 3
> >> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >> 3c05a007 level 2
> >> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> >> 1901037 level 1
> >> qemu-system-x86-11521 [000]  4724.170197: kvm_entry:vcpu 0
> >> qemu-system-x86-11521 [000]  4724.170198: kvm_exit: reason 
> >> EPT_VIOLATION rip 0x8102ab77 info 81 0
> >> qemu-system-x86-11521 [000]  4724.170199: kvm_page_fault:   address 
> >> 3a029000 error_code 81
> >> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_pagetable_walk: addr 
> >> 3a029000 pferr 0 
> >> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
> >> 3c04c007 level 4
> >> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
> >> 3c04d007 level 3
> >> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
> >> 3c21e007 level 2
> >> qemu-system-x86-11521 [000]  4724.170200: kvm_mmu_paging_element: pte 
> >> 3a029037 level 1
> >> qemu-system-x86-11521 [000]  4724.170203: kvm_entry:vcpu 0
> >> qemu-system-x86-11521 [000]  4724.170204: kvm_exit: reason 
> >> EPT_VIOLATION rip 0x8102ab77 info 181 0
> >> qemu-system-x86-11521 [000]  4724.170204: kvm_page_fault:   address 
> >> 

Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Jan Kiszka
On 2013-08-04 18:15, Xiao Guangrong wrote:
> 
> On Aug 4, 2013, at 11:14 PM, Jan Kiszka  wrote:
> 
>> On 2013-08-04 15:44, Gleb Natapov wrote:
>>> On Sun, Aug 04, 2013 at 12:53:56PM +0300, Gleb Natapov wrote:
>>>> On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
>>>>> On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
>>>>>> On 2013-08-01 16:08, Gleb Natapov wrote:
>>>>>>> Another day -- another version of the nested EPT patches. In this 
>>>>>>> version
>>>>>>> included fix for need_remote_flush() with shadowed ept, set bits 6:8
>>>>>>> of exit_qualification during ept_violation, update_permission_bitmask()
>>>>>>> made to work with shadowed ept pages and other small adjustment 
>>>>>>> according
>>>>>>> to review comments.
>>>>>>
>>>>>> Was just testing it here and ran into a bug: I've L2 accessing the HPET
>>>>>> MMIO region that my L1 passed through from L0 (where it is supposed to
>>>>>> be emulated in this setup). This used to work with an older posting of
>>>>> Not sure I understand your setup. L0 emulates HPET, L1 passes it through
>>>>> to L2 (mmaps it and creates kvm slot that points to it) and when L2
>>>>> accessed it it locks up?
>>>>>
>>>>>> Jun, but now it locks up (infinite loop over L2's MMIO access, no L2->L1
>>>>>> transition). Any ideas where to look for debugging this?
>>>>>>
>>>>> Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)
>>>>>
>>>> I did an MMIO access from nested guest in the vmx unit test (which is
>>>> naturally passed through to L0 since L1 is so simple) and I can see that
>>>> the access hits L0.
>>>>
>>> But then unit test not yet uses nested EPT :)
>>
>> Indeed, that's what I was about to notice as well. EPT test cases are on
>> Arthur's list, but I suggested to start easier with some MSR switches
>> (just to let him run into KVM's PAT bugs ;) ).
>>
>> Anyway, here are the traces:
>>
>> qemu-system-x86-11521 [000]  4724.170191: kvm_entry:vcpu 0
>> qemu-system-x86-11521 [000]  4724.170192: kvm_exit: reason 
>> EPT_VIOLATION rip 0x8102ab70 info 181 0
>> qemu-system-x86-11521 [000]  4724.170192: kvm_page_fault:   address 
>> 1901978 error_code 181
>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_pagetable_walk: addr 
>> 1901978 pferr 0 
>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>> 3c04c007 level 4
>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>> 3c04d007 level 3
>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>> 3c05a007 level 2
>> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
>> 1901037 level 1
>> qemu-system-x86-11521 [000]  4724.170197: kvm_entry:vcpu 0
>> qemu-system-x86-11521 [000]  4724.170198: kvm_exit: reason 
>> EPT_VIOLATION rip 0x8102ab77 info 81 0
>> qemu-system-x86-11521 [000]  4724.170199: kvm_page_fault:   address 
>> 3a029000 error_code 81
>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_pagetable_walk: addr 
>> 3a029000 pferr 0 
>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
>> 3c04c007 level 4
>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
>> 3c04d007 level 3
>> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
>> 3c21e007 level 2
>> qemu-system-x86-11521 [000]  4724.170200: kvm_mmu_paging_element: pte 
>> 3a029037 level 1
>> qemu-system-x86-11521 [000]  4724.170203: kvm_entry:vcpu 0
>> qemu-system-x86-11521 [000]  4724.170204: kvm_exit: reason 
>> EPT_VIOLATION rip 0x8102ab77 info 181 0
>> qemu-system-x86-11521 [000]  4724.170204: kvm_page_fault:   address 
>> fed000f0 error_code 181
>> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_pagetable_walk: addr 
>> fed000f0 pferr 0 
>> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 
>> 3c04c007 level 4
>> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 
>> 3c42f003 level 3
>> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 
>> 3c626003 level 2
>> qemu-system-x86-11521 [000]  4724.170206: 

Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Xiao Guangrong

On Aug 4, 2013, at 11:14 PM, Jan Kiszka  wrote:

> On 2013-08-04 15:44, Gleb Natapov wrote:
>> On Sun, Aug 04, 2013 at 12:53:56PM +0300, Gleb Natapov wrote:
>>> On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
>>>> On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
>>>>> On 2013-08-01 16:08, Gleb Natapov wrote:
>>>>>> Another day -- another version of the nested EPT patches. In this version
>>>>>> included fix for need_remote_flush() with shadowed ept, set bits 6:8
>>>>>> of exit_qualification during ept_violation, update_permission_bitmask()
>>>>>> made to work with shadowed ept pages and other small adjustment according
>>>>>> to review comments.
>>>>> 
>>>>> Was just testing it here and ran into a bug: I've L2 accessing the HPET
>>>>> MMIO region that my L1 passed through from L0 (where it is supposed to
>>>>> be emulated in this setup). This used to work with an older posting of
>>>> Not sure I understand your setup. L0 emulates HPET, L1 passes it through
>>>> to L2 (mmaps it and creates kvm slot that points to it) and when L2
>>>> accessed it it locks up?
>>>> 
>>>>> Jun, but now it locks up (infinite loop over L2's MMIO access, no L2->L1
>>>>> transition). Any ideas where to look for debugging this?
>>>>> 
>>>> Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)
>>>> 
>>> I did an MMIO access from nested guest in the vmx unit test (which is
>>> naturally passed through to L0 since L1 is so simple) and I can see that
>>> the access hits L0.
>>> 
>> But then unit test not yet uses nested EPT :)
> 
> Indeed, that's what I was about to notice as well. EPT test cases are on
> Arthur's list, but I suggested to start easier with some MSR switches
> (just to let him run into KVM's PAT bugs ;) ).
> 
> Anyway, here are the traces:
> 
> qemu-system-x86-11521 [000]  4724.170191: kvm_entry:vcpu 0
> qemu-system-x86-11521 [000]  4724.170192: kvm_exit: reason 
> EPT_VIOLATION rip 0x8102ab70 info 181 0
> qemu-system-x86-11521 [000]  4724.170192: kvm_page_fault:   address 
> 1901978 error_code 181
> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_pagetable_walk: addr 
> 1901978 pferr 0 
> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> 3c04c007 level 4
> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> 3c04d007 level 3
> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 
> 3c05a007 level 2
> qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 1901037 
> level 1
> qemu-system-x86-11521 [000]  4724.170197: kvm_entry:vcpu 0
> qemu-system-x86-11521 [000]  4724.170198: kvm_exit: reason 
> EPT_VIOLATION rip 0x8102ab77 info 81 0
> qemu-system-x86-11521 [000]  4724.170199: kvm_page_fault:   address 
> 3a029000 error_code 81
> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_pagetable_walk: addr 
> 3a029000 pferr 0 
> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
> 3c04c007 level 4
> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
> 3c04d007 level 3
> qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 
> 3c21e007 level 2
> qemu-system-x86-11521 [000]  4724.170200: kvm_mmu_paging_element: pte 
> 3a029037 level 1
> qemu-system-x86-11521 [000]  4724.170203: kvm_entry:vcpu 0
> qemu-system-x86-11521 [000]  4724.170204: kvm_exit: reason 
> EPT_VIOLATION rip 0x8102ab77 info 181 0
> qemu-system-x86-11521 [000]  4724.170204: kvm_page_fault:   address 
> fed000f0 error_code 181
> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_pagetable_walk: addr 
> fed000f0 pferr 0 
> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 
> 3c04c007 level 4
> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 
> 3c42f003 level 3
> qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 
> 3c626003 level 2
> qemu-system-x86-11521 [000]  4724.170206: kvm_mmu_paging_element: pte 
> fed00033 level 1
> qemu-system-x86-11521 [000]  4724.170213: mark_mmio_spte:   
> sptep:0x88014e8ad800 gfn fed00 access 6 gen b7f
> qemu-system-x86-11521 [000]  4724.170214: kvm_mmu_pagetable_walk: addr 
> 8102ab77 pferr 10 F
> qemu-system-x86-11521 [000]  4724.170215: kvm_mmu_pagetable_walk: addr 
> 171 pferr 6 W|U
> qemu-system-x86-11521 [000]  4724

Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Jan Kiszka
On 2013-08-04 15:44, Gleb Natapov wrote:
> On Sun, Aug 04, 2013 at 12:53:56PM +0300, Gleb Natapov wrote:
>> On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
>>> On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
>>>> On 2013-08-01 16:08, Gleb Natapov wrote:
>>>>> Another day -- another version of the nested EPT patches. In this version
>>>>> included fix for need_remote_flush() with shadowed ept, set bits 6:8
>>>>> of exit_qualification during ept_violation, update_permission_bitmask()
>>>>> made to work with shadowed ept pages and other small adjustment according
>>>>> to review comments.
>>>>
>>>> Was just testing it here and ran into a bug: I've L2 accessing the HPET
>>>> MMIO region that my L1 passed through from L0 (where it is supposed to
>>>> be emulated in this setup). This used to work with an older posting of
>>> Not sure I understand your setup. L0 emulates HPET, L1 passes it through
>>> to L2 (mmaps it and creates kvm slot that points to it) and when L2
>>> accessed it it locks up?
>>>
>>>> Jun, but now it locks up (infinite loop over L2's MMIO access, no L2->L1
>>>> transition). Any ideas where to look for debugging this?
>>>>
>>> Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)
>>>
>> I did an MMIO access from nested guest in the vmx unit test (which is
>> naturally passed through to L0 since L1 is so simple) and I can see that
>> the access hits L0.
>>
> But then unit test not yet uses nested EPT :)

Indeed, that's what I was about to notice as well. EPT test cases are on
Arthur's list, but I suggested to start easier with some MSR switches
(just to let him run into KVM's PAT bugs ;) ).

Anyway, here are the traces:

 qemu-system-x86-11521 [000]  4724.170191: kvm_entry:vcpu 0
 qemu-system-x86-11521 [000]  4724.170192: kvm_exit: reason 
EPT_VIOLATION rip 0x8102ab70 info 181 0
 qemu-system-x86-11521 [000]  4724.170192: kvm_page_fault:   address 
1901978 error_code 181
 qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_pagetable_walk: addr 1901978 
pferr 0 
 qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 3c04c007 
level 4
 qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 3c04d007 
level 3
 qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 3c05a007 
level 2
 qemu-system-x86-11521 [000]  4724.170193: kvm_mmu_paging_element: pte 1901037 
level 1
 qemu-system-x86-11521 [000]  4724.170197: kvm_entry:vcpu 0
 qemu-system-x86-11521 [000]  4724.170198: kvm_exit: reason 
EPT_VIOLATION rip 0x8102ab77 info 81 0
 qemu-system-x86-11521 [000]  4724.170199: kvm_page_fault:   address 
3a029000 error_code 81
 qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_pagetable_walk: addr 
3a029000 pferr 0 
 qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 3c04c007 
level 4
 qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 3c04d007 
level 3
 qemu-system-x86-11521 [000]  4724.170199: kvm_mmu_paging_element: pte 3c21e007 
level 2
 qemu-system-x86-11521 [000]  4724.170200: kvm_mmu_paging_element: pte 3a029037 
level 1
 qemu-system-x86-11521 [000]  4724.170203: kvm_entry:vcpu 0
 qemu-system-x86-11521 [000]  4724.170204: kvm_exit: reason 
EPT_VIOLATION rip 0x8102ab77 info 181 0
 qemu-system-x86-11521 [000]  4724.170204: kvm_page_fault:   address 
fed000f0 error_code 181
 qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_pagetable_walk: addr 
fed000f0 pferr 0 
 qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 3c04c007 
level 4
 qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 3c42f003 
level 3
 qemu-system-x86-11521 [000]  4724.170205: kvm_mmu_paging_element: pte 3c626003 
level 2
 qemu-system-x86-11521 [000]  4724.170206: kvm_mmu_paging_element: pte fed00033 
level 1
 qemu-system-x86-11521 [000]  4724.170213: mark_mmio_spte:   
sptep:0x88014e8ad800 gfn fed00 access 6 gen b7f
 qemu-system-x86-11521 [000]  4724.170214: kvm_mmu_pagetable_walk: addr 
8102ab77 pferr 10 F
 qemu-system-x86-11521 [000]  4724.170215: kvm_mmu_pagetable_walk: addr 171 
pferr 6 W|U
 qemu-system-x86-11521 [000]  4724.170215: kvm_mmu_paging_element: pte 3c04c007 
level 4
 qemu-system-x86-11521 [000]  4724.170215: kvm_mmu_paging_element: pte 3c04d007 
level 3
 qemu-system-x86-11521 [000]  4724.170216: kvm_mmu_paging_element: pte 3c059007 
level 2
 qemu-system-x86-11521 [000]  4724.170216: kvm_mmu_paging_element: pte 1710037 
level 1
 qemu-system-x86-11521 [000]  4724.170216: kvm_mmu_paging_element: pte 1711067 
level 4
 qemu-system-x86-11521 [000]  

Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Gleb Natapov
On Sun, Aug 04, 2013 at 12:53:56PM +0300, Gleb Natapov wrote:
> On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
> > On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
> > > On 2013-08-01 16:08, Gleb Natapov wrote:
> > > > Another day -- another version of the nested EPT patches. In this 
> > > > version
> > > > included fix for need_remote_flush() with shadowed ept, set bits 6:8
> > > > of exit_qualification during ept_violation, update_permission_bitmask()
> > > > made to work with shadowed ept pages and other small adjustment 
> > > > according
> > > > to review comments.
> > > 
> > > Was just testing it here and ran into a bug: I've L2 accessing the HPET
> > > MMIO region that my L1 passed through from L0 (where it is supposed to
> > > be emulated in this setup). This used to work with an older posting of
> > Not sure I understand your setup. L0 emulates HPET, L1 passes it through
> > to L2 (mmaps it and creates kvm slot that points to it) and when L2
> > accessed it it locks up?
> > 
> > > Jun, but now it locks up (infinite loop over L2's MMIO access, no L2->L1
> > > transition). Any ideas where to look for debugging this?
> > > 
> > Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)
> > 
> I did an MMIO access from nested guest in the vmx unit test (which is
> naturally passed through to L0 since L1 is so simple) and I can see that
> the access hits L0.
> 
But then the unit test does not use nested EPT yet :)

--
Gleb.


Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Gleb Natapov
On Sun, Aug 04, 2013 at 12:32:06PM +0300, Gleb Natapov wrote:
> On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
> > On 2013-08-01 16:08, Gleb Natapov wrote:
> > > Another day -- another version of the nested EPT patches. In this version
> > > included fix for need_remote_flush() with shadowed ept, set bits 6:8
> > > of exit_qualification during ept_violation, update_permission_bitmask()
> > > made to work with shadowed ept pages and other small adjustment according
> > > to review comments.
> > 
> > Was just testing it here and ran into a bug: I've L2 accessing the HPET
> > MMIO region that my L1 passed through from L0 (where it is supposed to
> > be emulated in this setup). This used to work with an older posting of
> Not sure I understand your setup. L0 emulates HPET, L1 passes it through
> to L2 (mmaps it and creates kvm slot that points to it) and when L2
> accessed it it locks up?
> 
> > Jun, but now it locks up (infinite loop over L2's MMIO access, no L2->L1
> > transition). Any ideas where to look for debugging this?
> > 
> Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)
> 
I did an MMIO access from nested guest in the vmx unit test (which is
naturally passed through to L0 since L1 is so simple) and I can see that
the access hits L0.

--
Gleb.


Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Gleb Natapov
On Sun, Aug 04, 2013 at 11:24:41AM +0200, Jan Kiszka wrote:
> On 2013-08-01 16:08, Gleb Natapov wrote:
> > Another day -- another version of the nested EPT patches. In this version
> > included fix for need_remote_flush() with shadowed ept, set bits 6:8
> > of exit_qualification during ept_violation, update_permission_bitmask()
> > made to work with shadowed ept pages and other small adjustment according
> > to review comments.
> 
> Was just testing it here and ran into a bug: I've L2 accessing the HPET
> MMIO region that my L1 passed through from L0 (where it is supposed to
> be emulated in this setup). This used to work with an older posting of
Not sure I understand your setup. L0 emulates HPET, L1 passes it through
to L2 (mmaps it and creates a kvm memory slot that points to it), and
when L2 accesses it, it locks up?
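I.e., roughly something like this in L1 userspace? (Illustrative sketch
only -- the HPET base address, the slot number and the missing error
handling are assumptions of this sketch, not anyone's actual test setup.)

#include <fcntl.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/kvm.h>

static int map_hpet_into_l2(int vm_fd)
{
	/* typical HPET MMIO base; an assumption for this sketch */
	const __u64 hpet_base = 0xfed00000ULL;
	int fd = open("/dev/mem", O_RDWR | O_SYNC);
	void *mmio = mmap(NULL, 0x1000, PROT_READ | PROT_WRITE,
			  MAP_SHARED, fd, hpet_base);

	/* expose the mapping to L2 at the same guest physical address */
	struct kvm_userspace_memory_region region = {
		.slot            = 10,             /* any free slot */
		.guest_phys_addr = hpet_base,
		.memory_size     = 0x1000,
		.userspace_addr  = (unsigned long)mmio,
	};
	return ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);
}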

> Jun, but now it locks up (infinite loop over L2's MMIO access, no L2->L1
> transition). Any ideas where to look for debugging this?
> 
Can you do an ftrace -e kvm -e kvmmmu? Unit test will also be helpful :)

--
Gleb.


Re: [PATCH v6 00/15] Nested EPT

2013-08-04 Thread Jan Kiszka
On 2013-08-01 16:08, Gleb Natapov wrote:
> Another day -- another version of the nested EPT patches. In this version
> included fix for need_remote_flush() with shadowed ept, set bits 6:8
> of exit_qualification during ept_violation, update_permission_bitmask()
> made to work with shadowed ept pages and other small adjustment according
> to review comments.

Was just testing it here and ran into a bug: I have L2 accessing the HPET
MMIO region that my L1 passed through from L0 (where it is supposed to
be emulated in this setup). This used to work with an older posting from
Jun, but now it locks up (infinite loop over L2's MMIO access, no L2->L1
transition). Any ideas where to look for debugging this?

Jan






Re: [PATCH v6 12/15] nEPT: MMU context for nested EPT

2013-08-01 Thread Xiao Guangrong
On 08/01/2013 10:08 PM, Gleb Natapov wrote:
> From: Nadav Har'El 
> 
> KVM's existing shadow MMU code already supports nested TDP. To use it, we
> need to set up a new "MMU context" for nested EPT, and create a few callbacks
> for it (nested_ept_*()). This context should also use the EPT versions of
> the page table access functions (defined in the previous patch).
> Then, we need to switch back and forth between this nested context and the
> regular MMU context when switching between L1 and L2 (when L1 runs this L2
> with EPT).

Reviewed-by: Xiao Guangrong 



[PATCH v6 00/15] Nested EPT

2013-08-01 Thread Gleb Natapov
Another day -- another version of the nested EPT patches. This version
includes a fix for need_remote_flush() with shadowed EPT, sets bits 6:8
of exit_qualification during ept_violation, makes update_permission_bitmask()
work with shadowed EPT pages, and contains other small adjustments according
to review comments.

Gleb Natapov (3):
  nEPT: make guest's A/D bits depends on guest's paging mode
  nEPT: Support shadow paging for guest paging without A/D bits
  nEPT: correctly check if remote tlb flush is needed for shadowed EPT
tables

Nadav Har'El (10):
  nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  nEPT: Fix cr3 handling in nested exit and entry
  nEPT: Fix wrong test in kvm_set_cr3
  nEPT: Move common code to paging_tmpl.h
  nEPT: Add EPT tables support to paging_tmpl.h
  nEPT: Nested INVEPT
  nEPT: MMU context for nested EPT
  nEPT: Advertise EPT to L1
  nEPT: Some additional comments
  nEPT: Miscelleneous cleanups

Yang Zhang (2):
  nEPT: Redefine EPT-specific link_shadow_page()
  nEPT: Add nEPT violation/misconfigration support

 arch/x86/include/asm/kvm_host.h |4 +
 arch/x86/include/asm/vmx.h  |2 +
 arch/x86/include/uapi/asm/vmx.h |1 +
 arch/x86/kvm/mmu.c  |  170 ++-
 arch/x86/kvm/mmu.h  |2 +
 arch/x86/kvm/paging_tmpl.h  |  176 
 arch/x86/kvm/vmx.c  |  215 ---
 arch/x86/kvm/x86.c  |   11 --
 8 files changed, 462 insertions(+), 119 deletions(-)

-- 
1.7.10.4



[PATCH v6 12/15] nEPT: MMU context for nested EPT

2013-08-01 Thread Gleb Natapov
From: Nadav Har'El 

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
Signed-off-by: Yang Zhang 
Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/mmu.c |   27 +++
 arch/x86/kvm/mmu.h |2 ++
 arch/x86/kvm/vmx.c |   41 -
 3 files changed, 69 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 81b73bc..c0b4e0f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3797,6 +3797,33 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = true;
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = ept_page_fault;
+   context->gva_to_gpa = ept_gva_to_gpa;
+   context->sync_page = ept_sync_page;
+   context->invlpg = ept_invlpg;
+   context->update_pte = ept_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   update_permission_bitmask(vcpu, context, true);
+   reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5b59c57..77e044a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -71,6 +71,8 @@ enum {
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 2d84875..627b504 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1046,6 +1046,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7434,6 +7439,33 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu 
*vcpu,
vmcs12->guest_physical_address = fault->address;
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_ept_mmu(vcpu, &vcpu->arch.mmu,
+   nested_vmx_ept_caps & VMX_EPT_EXECUTE_ONLY_BIT);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;
+   vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+   vcpu->arch.walk_mmu  = &vcpu->arch.nested_mmu;
+
+   return r;
+}
+
+static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
@@ -7654,6 +7686,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}
 
+   if (nested_cpu_has_ept(vmcs12)) {
+   kvm_mmu_unload(vcpu);
+   nested_ept_init_mmu_context(vcpu);
+   }
+
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA
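(For completeness, since the "switch back and forth" also has to happen on
the way out: the nested VM-exit path undoes the above using the
nested_ept_uninit_mmu_context() helper added by this patch. A rough sketch
-- the exact call site is an assumption of this note, not a quote of the
patch:)

/* Sketch only: the teardown counterpart on the nested VM-exit path. */
static void nested_ept_teardown_sketch(struct kvm_vcpu *vcpu,
				       struct vmcs12 *vmcs12)
{
	if (nested_cpu_has_ept(vmcs12))
		/* walk_mmu goes back to &vcpu->arch.mmu, i.e. L1's regular MMU */
		nested_ept_uninit_mmu_context(vcpu);
}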

Re: [PATCH v5 11/14] nEPT: MMU context for nested EPT

2013-08-01 Thread Xiao Guangrong
On 08/01/2013 05:16 PM, Xiao Guangrong wrote:
> On 07/31/2013 10:48 PM, Gleb Natapov wrote:
>> From: Nadav Har'El 
>>
>> KVM's existing shadow MMU code already supports nested TDP. To use it, we
>> need to set up a new "MMU context" for nested EPT, and create a few callbacks
>> for it (nested_ept_*()). This context should also use the EPT versions of
>> the page table access functions (defined in the previous patch).
>> Then, we need to switch back and forth between this nested context and the
>> regular MMU context when switching between L1 and L2 (when L1 runs this L2
>> with EPT).
> 
> This patch looks good to me.
> 
> Reviewed-by: Xiao Guangrong 
> 
> But i am confused that update_permission_bitmask() is not adjusted in this
> series. That function depends on kvm_read_cr4_bits(X86_CR4_SMEP) and
> is_write_protection(), these two functions should read the registers from
> L2 guest, using the L2 status to check L1's page table seems strange.
> The same issue is in nested npt. Anything i missed?

After checking the code, I found that vcpu->arch.mmu is not updated when
switching to the nested MMU; that means my remark that "using the L2 status
to check L1's page table seems strange" was wrong. That is fine for nested
NPT, but nested EPT should adjust the logic anyway.




Re: [PATCH v5 11/14] nEPT: MMU context for nested EPT

2013-08-01 Thread Gleb Natapov
On Thu, Aug 01, 2013 at 05:16:07PM +0800, Xiao Guangrong wrote:
> On 07/31/2013 10:48 PM, Gleb Natapov wrote:
> > From: Nadav Har'El 
> > 
> > KVM's existing shadow MMU code already supports nested TDP. To use it, we
> > need to set up a new "MMU context" for nested EPT, and create a few 
> > callbacks
> > for it (nested_ept_*()). This context should also use the EPT versions of
> > the page table access functions (defined in the previous patch).
> > Then, we need to switch back and forth between this nested context and the
> > regular MMU context when switching between L1 and L2 (when L1 runs this L2
> > with EPT).
> 
> This patch looks good to me.
> 
> Reviewed-by: Xiao Guangrong 
> 
> But i am confused that update_permission_bitmask() is not adjusted in this
> series. That function depends on kvm_read_cr4_bits(X86_CR4_SMEP) and
> is_write_protection(), these two functions should read the registers from
> L2 guest, using the L2 status to check L1's page table seems strange.
> The same issue is in nested npt. Anything i missed?
Good catch again. Looks like we need update_permission_bitmask_ept()
that uses different logic to calculate permissions.
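(To make the "different logic" concrete: for a shadowed EPT walk the allowed
accesses come purely from the R/W/X bits accumulated over the guest's EPT
entries; CPL, CR0.WP and CR4.SMEP of the guest play no role. A minimal,
purely illustrative sketch -- not the eventual implementation:)

/* Sketch, not actual KVM code: permission check for a shadowed EPT walk. */
#define EPT_ACC_READ	0x1
#define EPT_ACC_WRITE	0x2
#define EPT_ACC_EXEC	0x4

static inline int ept_walk_permits(unsigned int pte_access, int write, int fetch)
{
	if (fetch)				/* instruction fetch */
		return pte_access & EPT_ACC_EXEC;
	if (write)
		return pte_access & EPT_ACC_WRITE;
	return pte_access & EPT_ACC_READ;	/* plain read */
}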

--
Gleb.


Re: [PATCH v5 11/14] nEPT: MMU context for nested EPT

2013-08-01 Thread Xiao Guangrong
On 07/31/2013 10:48 PM, Gleb Natapov wrote:
> From: Nadav Har'El 
> 
> KVM's existing shadow MMU code already supports nested TDP. To use it, we
> need to set up a new "MMU context" for nested EPT, and create a few callbacks
> for it (nested_ept_*()). This context should also use the EPT versions of
> the page table access functions (defined in the previous patch).
> Then, we need to switch back and forth between this nested context and the
> regular MMU context when switching between L1 and L2 (when L1 runs this L2
> with EPT).

This patch looks good to me.

Reviewed-by: Xiao Guangrong 

But I am confused that update_permission_bitmask() is not adjusted in this
series. That function depends on kvm_read_cr4_bits(X86_CR4_SMEP) and
is_write_protection(); these two functions read the registers from the
L2 guest, and using the L2 state to check L1's page tables seems strange.
The same issue exists for nested NPT. Anything I missed?



[PATCH v5 00/14] Nested EPT

2013-07-31 Thread Gleb Natapov
Here is another version of the nested EPT patch series. All comments given
on v4 are, hopefully, addressed.

Gleb Natapov (2):
  nEPT: make guest's A/D bits depends on guest's paging mode
  nEPT: Support shadow paging for guest paging without A/D bits

Nadav Har'El (10):
  nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  nEPT: Fix cr3 handling in nested exit and entry
  nEPT: Fix wrong test in kvm_set_cr3
  nEPT: Move common code to paging_tmpl.h
  nEPT: Add EPT tables support to paging_tmpl.h
  nEPT: Nested INVEPT
  nEPT: MMU context for nested EPT
  nEPT: Advertise EPT to L1
  nEPT: Some additional comments
  nEPT: Miscelleneous cleanups

Yang Zhang (2):
  nEPT: Redefine EPT-specific link_shadow_page()
  nEPT: Add nEPT violation/misconfigration support

 arch/x86/include/asm/kvm_host.h |4 +
 arch/x86/include/asm/vmx.h  |2 +
 arch/x86/include/uapi/asm/vmx.h |1 +
 arch/x86/kvm/mmu.c  |  134 +---
 arch/x86/kvm/mmu.h  |2 +
 arch/x86/kvm/paging_tmpl.h  |  177 
 arch/x86/kvm/vmx.c  |  213 ---
 arch/x86/kvm/x86.c  |   11 --
 8 files changed, 440 insertions(+), 104 deletions(-)

-- 
1.7.10.4



[PATCH v5 11/14] nEPT: MMU context for nested EPT

2013-07-31 Thread Gleb Natapov
From: Nadav Har'El 

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
Signed-off-by: Yang Zhang 
Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/mmu.c |   26 ++
 arch/x86/kvm/mmu.h |2 ++
 arch/x86/kvm/vmx.c |   41 -
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 58ae9db..37fff14 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3792,6 +3792,32 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = true;
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = ept_page_fault;
+   context->gva_to_gpa = ept_gva_to_gpa;
+   context->sync_page = ept_sync_page;
+   context->invlpg = ept_invlpg;
+   context->update_pte = ept_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5b59c57..77e044a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -71,6 +71,8 @@ enum {
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index f3514d7..f41751a 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1046,6 +1046,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7432,6 +7437,33 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu 
*vcpu,
vmcs12->guest_physical_address = fault->address;
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_ept_mmu(vcpu, &vcpu->arch.mmu,
+   nested_vmx_ept_caps & VMX_EPT_EXECUTE_ONLY_BIT);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;
+   vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+   vcpu->arch.walk_mmu  = &vcpu->arch.nested_mmu;
+
+   return r;
+}
+
+static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
@@ -7652,6 +7684,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}
 
+   if (nested_cpu_has_ept(vmcs12)) {
+   kvm_mmu_unload(vcpu);
+   nested_ept_init_mmu_context(vcpu);
+   }
+
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -8124,7 +8161,9 @@ static void l

[PATCH v4 10/13] nEPT: MMU context for nested EPT

2013-07-25 Thread Gleb Natapov
From: Nadav Har'El 

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
Signed-off-by: Yang Zhang 
Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/mmu.c |   26 ++
 arch/x86/kvm/mmu.h |2 ++
 arch/x86/kvm/vmx.c |   41 -
 3 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 58ae9db..37fff14 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3792,6 +3792,32 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = true;
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = ept_page_fault;
+   context->gva_to_gpa = ept_gva_to_gpa;
+   context->sync_page = ept_sync_page;
+   context->invlpg = ept_invlpg;
+   context->update_pte = ept_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   reset_rsvds_bits_mask_ept(vcpu, context, execonly);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_ept_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 5b59c57..77e044a 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -71,6 +71,8 @@ enum {
 
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_ept_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context,
+   bool execonly);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index bbfff8d..6b79db7 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1046,6 +1046,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7433,6 +7438,33 @@ static void nested_ept_inject_page_fault(struct kvm_vcpu 
*vcpu,
vmcs12->guest_physical_address = fault->address;
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_ept_mmu(vcpu, &vcpu->arch.mmu,
+   nested_vmx_ept_caps & VMX_EPT_EXECUTE_ONLY_BIT);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;
+   vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+   vcpu->arch.walk_mmu  = &vcpu->arch.nested_mmu;
+
+   return r;
+}
+
+static void nested_ept_uninit_mmu_context(struct kvm_vcpu *vcpu)
+{
+   vcpu->arch.walk_mmu = &vcpu->arch.mmu;
+}
+
 /*
  * prepare_vmcs02 is called when the L1 guest hypervisor runs its nested
  * L2 guest. L1 has a vmcs for L2 (vmcs12), and this function "merges" it
@@ -7653,6 +7685,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct 
vmcs12 *vmcs12)
vmx_flush_tlb(vcpu);
}
 
+   if (nested_cpu_has_ept(vmcs12)) {
+   kvm_mmu_unload(vcpu);
+   nested_ept_init_mmu_context(vcpu);
+   }
+
if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_IA32_EFER)
vcpu->arch.efer = vmcs12->guest_ia32_efer;
else if (vmcs12->vm_entry_controls & VM_ENTRY_IA32E_MODE)
@@ -8125,7 +8162,9 @@ static void l

[PATCH v4 00/13] Nested EPT

2013-07-25 Thread Gleb Natapov
After changing hands several times, I am proud to present a new version of
the Nested EPT patches. Nothing groundbreaking here compared to v3: all
review comments are addressed, some by Yang Zhang and some by yours truly.

Gleb Natapov (1):
  nEPT: make guest's A/D bits depends on guest's paging mode

Nadav Har'El (10):
  nEPT: Support LOAD_IA32_EFER entry/exit controls for L1
  nEPT: Fix cr3 handling in nested exit and entry
  nEPT: Fix wrong test in kvm_set_cr3
  nEPT: Move common code to paging_tmpl.h
  nEPT: Add EPT tables support to paging_tmpl.h
  nEPT: Nested INVEPT
  nEPT: MMU context for nested EPT
  nEPT: Advertise EPT to L1
  nEPT: Some additional comments
  nEPT: Miscelleneous cleanups

Yang Zhang (2):
  nEPT: Redefine EPT-specific link_shadow_page()
  nEPT: Add nEPT violation/misconfigration support

 arch/x86/include/asm/kvm_host.h |4 +
 arch/x86/include/asm/vmx.h  |3 +
 arch/x86/include/uapi/asm/vmx.h |1 +
 arch/x86/kvm/mmu.c  |  134 ++---
 arch/x86/kvm/mmu.h  |2 +
 arch/x86/kvm/paging_tmpl.h  |  175 
 arch/x86/kvm/vmx.c  |  210 ---
 arch/x86/kvm/x86.c  |   11 --
 8 files changed, 436 insertions(+), 104 deletions(-)

-- 
1.7.10.4



Re: [PATCH v3 05/13] nEPT: MMU context for nested EPT

2013-05-21 Thread Nakajima, Jun
On Tue, May 21, 2013 at 1:50 AM, Xiao Guangrong
 wrote:
> On 05/19/2013 12:52 PM, Jun Nakajima wrote:
>> From: Nadav Har'El 
>>
>> KVM's existing shadow MMU code already supports nested TDP. To use it, we
>> need to set up a new "MMU context" for nested EPT, and create a few callbacks
>> for it (nested_ept_*()). This context should also use the EPT versions of
>> the page table access functions (defined in the previous patch).
>> Then, we need to switch back and forth between this nested context and the
>> regular MMU context when switching between L1 and L2 (when L1 runs this L2
>> with EPT).
>>
>> Signed-off-by: Nadav Har'El 
>> Signed-off-by: Jun Nakajima 
>> Signed-off-by: Xinhao Xu 
>> ---
>>  arch/x86/kvm/mmu.c | 38 ++
>>  arch/x86/kvm/mmu.h |  1 +
>>  arch/x86/kvm/vmx.c | 54 
>> +-
>>  3 files changed, 92 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 6c1670f..37f8d7f 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3653,6 +3653,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
>> kvm_mmu *context)
>>  }
>>  EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
>>
>> +int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
>> +{
>> + ASSERT(vcpu);
>> + ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
>> +
>> + context->shadow_root_level = kvm_x86_ops->get_tdp_level();
>
> That means L1 guest always uses page-walk length == 4? But in your previous 
> patch,
> it can be 2.

We want to support "page-walk length == 4" only.
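(Concretely, "page-walk length == 4" is what the EPTP encodes in bits 5:3 as
the length minus one; a hedged sketch of the check on the EPT pointer that L1
provides -- the helper name and macros are made up for illustration:)

#include <linux/types.h>

#define VMX_EPTP_PWL_SHIFT	3
#define VMX_EPTP_PWL_MASK	(0x7ULL << VMX_EPTP_PWL_SHIFT)

static inline int eptp_page_walk_length_is_4(u64 eptp)
{
	/* bits 5:3 hold (page-walk length - 1), so a 4-level walk encodes 3 */
	return ((eptp & VMX_EPTP_PWL_MASK) >> VMX_EPTP_PWL_SHIFT) == 3;
}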

>
>> +
>> + context->nx = is_nx(vcpu); /* TODO: ? */
>
> Hmm? EPT always support NX.
>
>> + context->new_cr3 = paging_new_cr3;
>> + context->page_fault = EPT_page_fault;
>> + context->gva_to_gpa = EPT_gva_to_gpa;
>> + context->sync_page = EPT_sync_page;
>> + context->invlpg = EPT_invlpg;
>> + context->update_pte = EPT_update_pte;
>> + context->free = paging_free;
>> + context->root_level = context->shadow_root_level;
>> + context->root_hpa = INVALID_PAGE;
>> + context->direct_map = false;
>> +
>> + /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
>> +something different.
>> +  */
>
> Exactly. :)
>
>> + reset_rsvds_bits_mask(vcpu, context);
>> +
>> +
>> + /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
>> +they are done, or why they write to vcpu->arch.mmu and not context
>> +  */
>> + vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
>> + vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
>> + vcpu->arch.mmu.base_role.smep_andnot_wp =
>> + kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
>> + !is_write_protection(vcpu);
>
> I guess we need not care these since the permission of EPT page does not 
> depend
> on these.

Right. I'll clean this up.

>
>> +
>> + return 0;
>> +}
>> +EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
>> +
>>  static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
>>  {
>>   int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
>> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
>> index 2adcbc2..8fc94dd 100644
>> --- a/arch/x86/kvm/mmu.h
>> +++ b/arch/x86/kvm/mmu.h
>> @@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 
>> addr, u64 sptes[4]);
>>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
>>  int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
>> direct);
>>  int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
>> +int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
>>
>>  static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
>>  {
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index fb9cae5..a88432f 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -1045,6 +1045,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
>> vmcs12 *vmcs12,
>>   return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
>>  }
>>
>> +static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
>> +{
>> + return nested_cpu_has2(vmcs12, SEC

Re: [PATCH v3 05/13] nEPT: MMU context for nested EPT

2013-05-21 Thread Xiao Guangrong
On 05/19/2013 12:52 PM, Jun Nakajima wrote:
> From: Nadav Har'El 
> 
> KVM's existing shadow MMU code already supports nested TDP. To use it, we
> need to set up a new "MMU context" for nested EPT, and create a few callbacks
> for it (nested_ept_*()). This context should also use the EPT versions of
> the page table access functions (defined in the previous patch).
> Then, we need to switch back and forth between this nested context and the
> regular MMU context when switching between L1 and L2 (when L1 runs this L2
> with EPT).
> 
> Signed-off-by: Nadav Har'El 
> Signed-off-by: Jun Nakajima 
> Signed-off-by: Xinhao Xu 
> ---
>  arch/x86/kvm/mmu.c | 38 ++
>  arch/x86/kvm/mmu.h |  1 +
>  arch/x86/kvm/vmx.c | 54 
> +-
>  3 files changed, 92 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 6c1670f..37f8d7f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3653,6 +3653,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
> kvm_mmu *context)
>  }
>  EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
> 
> +int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
> +{
> + ASSERT(vcpu);
> + ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
> +
> + context->shadow_root_level = kvm_x86_ops->get_tdp_level();

That means the L1 guest always uses page-walk length == 4? But in your
previous patch it could be 2.

> +
> + context->nx = is_nx(vcpu); /* TODO: ? */

Hmm? EPT always supports NX.

> + context->new_cr3 = paging_new_cr3;
> + context->page_fault = EPT_page_fault;
> + context->gva_to_gpa = EPT_gva_to_gpa;
> + context->sync_page = EPT_sync_page;
> + context->invlpg = EPT_invlpg;
> + context->update_pte = EPT_update_pte;
> + context->free = paging_free;
> + context->root_level = context->shadow_root_level;
> + context->root_hpa = INVALID_PAGE;
> + context->direct_map = false;
> +
> + /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
> +something different.
> +  */

Exactly. :)
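(What is different, roughly: the low bits of an EPT entry are R/W/X
permissions rather than Present/RW/User, and the illegal combinations lead
to EPT misconfigurations instead of reserved-bit page faults. A toy sketch
of the R/W/X rule -- the name and shape are illustrative only:)

static inline int ept_rwx_combination_valid(unsigned int rwx, int execonly_supported)
{
	int r = rwx & 0x1, w = rwx & 0x2, x = rwx & 0x4;

	if (w && !r)	/* write-without-read is always a misconfiguration */
		return 0;
	if (x && !r)	/* execute-only needs hardware support for it */
		return execonly_supported;
	return 1;
}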

> + reset_rsvds_bits_mask(vcpu, context);
> +
> +
> + /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
> +they are done, or why they write to vcpu->arch.mmu and not context
> +  */
> + vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
> + vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
> + vcpu->arch.mmu.base_role.smep_andnot_wp =
> + kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
> + !is_write_protection(vcpu);

I guess we need not care about these, since the permissions of EPT pages do
not depend on them.

> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
> +
>  static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
>  {
>   int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
> diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
> index 2adcbc2..8fc94dd 100644
> --- a/arch/x86/kvm/mmu.h
> +++ b/arch/x86/kvm/mmu.h
> @@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 
> addr, u64 sptes[4]);
>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
>  int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
> direct);
>  int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
> +int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
> 
>  static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
>  {
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index fb9cae5..a88432f 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -1045,6 +1045,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
> vmcs12 *vmcs12,
>   return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
>  }
> 
> +static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
> +{
> + return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
> +}
> +
>  static inline bool is_exception(u32 intr_info)
>  {
>   return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
> @@ -7311,6 +7316,46 @@ static void vmx_set_supported_cpuid(u32 func, struct 
> kvm_cpuid_entry2 *entry)
>   entry->ecx |= bit(X86_FEATURE_VMX);
>  }
> 
> +/* Callbacks for nested_ept_init_mmu_context: */
> +
> +static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
> +{
> + /* return the page table to be shadowed - in our case, EPT12 */
> + return get_vmcs12(vcpu)->ept_pointer;
> +}

[PATCH v3 05/13] nEPT: MMU context for nested EPT

2013-05-18 Thread Jun Nakajima
From: Nadav Har'El 

KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).
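
(For orientation - a rough sketch of that switch, using the helpers this patch
adds; the call sites named here are assumptions, not part of the diff:)

	/* on the emulated VM-entry to L2, e.g. in prepare_vmcs02(), when L1
	 * runs this L2 with EPT enabled: */
	if (nested_cpu_has_ept(vmcs12))
		nested_ept_init_mmu_context(vcpu);

	/* on the emulated VM-exit back to L1, the regular context is
	 * rebuilt, for instance via: */
	kvm_mmu_reset_context(vcpu);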

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
---
 arch/x86/kvm/mmu.c | 38 ++
 arch/x86/kvm/mmu.h |  1 +
 arch/x86/kvm/vmx.c | 54 +-
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6c1670f..37f8d7f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3653,6 +3653,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = is_nx(vcpu); /* TODO: ? */
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = EPT_page_fault;
+   context->gva_to_gpa = EPT_gva_to_gpa;
+   context->sync_page = EPT_sync_page;
+   context->invlpg = EPT_invlpg;
+   context->update_pte = EPT_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
+  something different.
+*/
+   reset_rsvds_bits_mask(vcpu, context);
+
+
+   /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
+  they are done, or why they write to vcpu->arch.mmu and not context
+*/
+   vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+   vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
+   vcpu->arch.mmu.base_role.smep_andnot_wp =
+   kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
+   !is_write_protection(vcpu);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2adcbc2..8fc94dd 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 
addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index fb9cae5..a88432f 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1045,6 +1045,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7311,6 +7316,46 @@ static void vmx_set_supported_cpuid(u32 func, struct 
kvm_cpuid_entry2 *entry)
entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+   struct x86_exception *fault)
+{
+   struct vmcs12 *vmcs12;
+   nested_vmx_vmexit(vcpu);
+   vmcs12 = get_vmcs12(vcpu);
+   /*
+* Note no need to set vmcs12->vm_exit_reason as it is already copied
+* from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
+*/
+   vmcs12->exit_qualification = fault->error_code;
+   vmcs12->guest_physical_address = fault->address;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;

[PATCH v3 05/13] nEPT: MMU context for nested EPT

2013-05-08 Thread Jun Nakajima
KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
---
 arch/x86/kvm/mmu.c | 38 ++
 arch/x86/kvm/mmu.h |  1 +
 arch/x86/kvm/vmx.c | 54 +-
 3 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 6c1670f..37f8d7f 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3653,6 +3653,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = is_nx(vcpu); /* TODO: ? */
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = EPT_page_fault;
+   context->gva_to_gpa = EPT_gva_to_gpa;
+   context->sync_page = EPT_sync_page;
+   context->invlpg = EPT_invlpg;
+   context->update_pte = EPT_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
+  something different.
+*/
+   reset_rsvds_bits_mask(vcpu, context);
+
+
+   /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
+  they are done, or why they write to vcpu->arch.mmu and not context
+*/
+   vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+   vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
+   vcpu->arch.mmu.base_role.smep_andnot_wp =
+   kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
+   !is_write_protection(vcpu);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 2adcbc2..8fc94dd 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 
addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 51b8b4f0..80ab5b1 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1045,6 +1045,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -7305,6 +7310,46 @@ static void vmx_set_supported_cpuid(u32 func, struct 
kvm_cpuid_entry2 *entry)
entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+   struct x86_exception *fault)
+{
+   struct vmcs12 *vmcs12;
+   nested_vmx_vmexit(vcpu);
+   vmcs12 = get_vmcs12(vcpu);
+   /*
+* Note no need to set vmcs12->vm_exit_reason as it is already copied
+* from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
+*/
+   vmcs12->exit_qualification = fault->error_code;
+   vmcs12->guest_physical_address = fault->address;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;

[PATCH v2 05/13] nEPT: MMU context for nested EPT

2013-05-06 Thread Jun Nakajima
KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
---
 arch/x86/kvm/mmu.c | 38 ++
 arch/x86/kvm/mmu.h |  1 +
 arch/x86/kvm/vmx.c | 53 -
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cb9c6fd..99bfc5e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3644,6 +3644,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = is_nx(vcpu); /* TODO: ? */
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = EPT_page_fault;
+   context->gva_to_gpa = EPT_gva_to_gpa;
+   context->sync_page = EPT_sync_page;
+   context->invlpg = EPT_invlpg;
+   context->update_pte = EPT_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
+  something different.
+*/
+   reset_rsvds_bits_mask(vcpu, context);
+
+
+   /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
+  they are done, or why they write to vcpu->arch.mmu and not context
+*/
+   vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+   vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
+   vcpu->arch.mmu.base_role.smep_andnot_wp =
+   kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
+   !is_write_protection(vcpu);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6987108..19dd5ab 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 
addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 485ded6..8fdcacf 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -918,6 +918,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -6873,6 +6878,46 @@ static void vmx_set_supported_cpuid(u32 func, struct 
kvm_cpuid_entry2 *entry)
entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+   struct x86_exception *fault)
+{
+   struct vmcs12 *vmcs12;
+   nested_vmx_vmexit(vcpu);
+   vmcs12 = get_vmcs12(vcpu);
+   /*
+* Note no need to set vmcs12->vm_exit_reason as it is already copied
+* from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
+*/
+   vmcs12->exit_qualification = fault->error_code;
+   vmcs12->guest_physical_address = fault->address;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;

Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-28 Thread Jan Kiszka
On 2013-04-26 18:07, Nakajima, Jun wrote:
> On Thu, Apr 25, 2013 at 11:26 PM, Jan Kiszka  wrote:
> 
>> That's great but - as Gleb already said - unfortunately not yet usable.
>> I'd like to rebase my fixes and enhancements (unrestricted guest mode
>> specifically) on top these days, and also run some tests with a non-KVM
>> guest. So, if git send-email is not yet working there, I would also be
>> happy about a public git repository.
>>
> 
> I re-submitted the patches last night using git send-email this time.
> We had some email problems at that time, and I needed to use a
> workaround (imap-send), which didn't work well.

I've picked them up (except for Xinhao's follow-up patch) and rebased
them over next + my pending patches:

git://git.kiszka.org/linux-kvm.git queues/nept

Some patches required a bit of massaging to apply, and the last one had a
trivial style issue. Feel free to integrate the changes. I didn't look
into functional details yet.

Instead, I've rebased my unrestricted guest mode patch plus a bunch of
fixes around nEPT and that feature. See the branch above. I'm currently
testing them, and it looks very good so far. Unrestricted guest mode
speeds up L2's BIOS and boot loader phase noticeably.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-26 Thread Nakajima, Jun
On Thu, Apr 25, 2013 at 11:26 PM, Jan Kiszka  wrote:

> That's great but - as Gleb already said - unfortunately not yet usable.
> I'd like to rebase my fixes and enhancements (unrestricted guest mode
> specifically) on top these days, and also run some tests with a non-KVM
> guest. So, if git send-email is not yet working there, I would also be
> happy about a public git repository.
>

I re-submitted the patches last night using git send-email this time.
We had some email problems at that time, and I needed to use a
workaround (imap-send), which didn't work well.

-- 
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/11] nEPT: MMU context for nested EPT

2013-04-25 Thread Jun Nakajima
KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 
Signed-off-by: Xinhao Xu 
---
 arch/x86/kvm/mmu.c | 38 ++
 arch/x86/kvm/mmu.h |  1 +
 arch/x86/kvm/vmx.c | 53 -
 3 files changed, 91 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index cb9c6fd..99bfc5e 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3644,6 +3644,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct 
kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = is_nx(vcpu); /* TODO: ? */
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = EPT_page_fault;
+   context->gva_to_gpa = EPT_gva_to_gpa;
+   context->sync_page = EPT_sync_page;
+   context->invlpg = EPT_invlpg;
+   context->update_pte = EPT_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
+  something different.
+*/
+   reset_rsvds_bits_mask(vcpu, context);
+
+
+   /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
+  they are done, or why they write to vcpu->arch.mmu and not context
+*/
+   vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+   vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
+   vcpu->arch.mmu.base_role.smep_andnot_wp =
+   kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
+   !is_write_protection(vcpu);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6987108..19dd5ab 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 
addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9e0ec9d..6ab53ca 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -918,6 +918,11 @@ static inline bool nested_cpu_has_virtual_nmis(struct 
vmcs12 *vmcs12,
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -6873,6 +6878,46 @@ static void vmx_set_supported_cpuid(u32 func, struct 
kvm_cpuid_entry2 *entry)
entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+   struct x86_exception *fault)
+{
+   struct vmcs12 *vmcs12;
+   nested_vmx_vmexit(vcpu);
+   vmcs12 = get_vmcs12(vcpu);
+   /*
+* Note no need to set vmcs12->vm_exit_reason as it is already copied
+* from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
+*/
+   vmcs12->exit_qualification = fault->error_code;
+   vmcs12->guest_physical_address = fault->address;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;

Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-25 Thread Jan Kiszka
On 2013-04-25 10:00, Nakajima, Jun wrote:
> On Wed, Apr 24, 2013 at 8:55 AM, Nakajima, Jun  wrote:
>> Sorry about the slow progress. We've been distracted by some priority
>> things. The patches are ready (i.e. working), but we are cleaning them
>> up. I'll send what we have today.
> 
> So, I have sent them, and frankly we are still cleaning up.  Please
> bear with us.
> We are also sending one more patchset to deal with EPT
> misconfiguration, but Linux should run in L2 on top of L1 KVM.

That's great but - as Gleb already said - unfortunately not yet usable.
I'd like to rebase my fixes and enhancements (unrestricted guest mode
specifically) on top these days, and also run some tests with a non-KVM
guest. So, if git send-email is not yet working there, I would also be
happy about a public git repository.

Thanks,
Jan




signature.asc
Description: OpenPGP digital signature


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-25 Thread Gleb Natapov
On Thu, Apr 25, 2013 at 01:00:42AM -0700, Nakajima, Jun wrote:
> On Wed, Apr 24, 2013 at 8:55 AM, Nakajima, Jun  wrote:
> > Sorry about the slow progress. We've been distracted by some priority
> > things. The patches are ready (i.e. working), but we are cleaning them
> > up. I'll send what we have today.
> 
> So, I have sent them, and frankly we are still cleaning up.  Please
> bear with us.
> We are also sending one more patchset to deal with EPT
> misconfiguration, but Linux should run in L2 on top of L1 KVM.
> 
The patches are mangled and unreadable. Please resend using "git
send-email".

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-25 Thread Nakajima, Jun
On Wed, Apr 24, 2013 at 8:55 AM, Nakajima, Jun  wrote:
> Sorry about the slow progress. We've been distracted by some priority
> things. The patches are ready (i.e. working), but we are cleaning them
> up. I'll send what we have today.

So, I have sent them, and frankly we are still cleaning up.  Please
bear with us.
We are also sending one more patchset to deal with EPT
misconfiguration, but Linux should run in L2 on top of L1 KVM.

--
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 03/12] Subject: [PATCH 03/10] nEPT: MMU context for nested EPT

2013-04-25 Thread Nakajima, Jun
KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
Signed-off-by: Jun Nakajima 

modified:   arch/x86/kvm/mmu.c
modified:   arch/x86/kvm/mmu.h
modified:   arch/x86/kvm/vmx.c
---
 arch/x86/kvm/mmu.c | 38 
 arch/x86/kvm/mmu.h |  1 +
 arch/x86/kvm/vmx.c | 56 +++---
 3 files changed, 92 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 91cac19..34e406e2 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -3674,6 +3674,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu,
struct kvm_mmu *context)
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);

+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+ ASSERT(vcpu);
+ ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+ context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+ context->nx = is_nx(vcpu); /* TODO: ? */
+ context->new_cr3 = paging_new_cr3;
+ context->page_fault = EPT_page_fault;
+ context->gva_to_gpa = EPT_gva_to_gpa;
+ context->sync_page = EPT_sync_page;
+ context->invlpg = EPT_invlpg;
+ context->update_pte = EPT_update_pte;
+ context->free = paging_free;
+ context->root_level = context->shadow_root_level;
+ context->root_hpa = INVALID_PAGE;
+ context->direct_map = false;
+
+ /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
+   something different.
+ */
+ reset_rsvds_bits_mask(vcpu, context);
+
+
+ /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
+   they are done, or why they write to vcpu->arch.mmu and not context
+ */
+ vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+ vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
+ vcpu->arch.mmu.base_role.smep_andnot_wp =
+ kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
+ !is_write_protection(vcpu);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
  int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 6987108..19dd5ab 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -54,6 +54,7 @@ int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu
*vcpu, u64 addr, u64 sptes[4]);
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr,
bool direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);

 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index 9e0ec9d..f2fd79d 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -912,12 +912,16 @@ static inline bool nested_cpu_has2(struct vmcs12
*vmcs12, u32 bit)
  (vmcs12->secondary_vm_exec_control & bit);
 }

-static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12,
- struct kvm_vcpu *vcpu)
+static inline bool nested_cpu_has_virtual_nmis(struct vmcs12 *vmcs12)
 {
  return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }

+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+ return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
  return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -6873,6 +6877,46 @@ static void vmx_set_supported_cpuid(u32 func,
struct kvm_cpuid_entry2 *entry)
  entry->ecx |= bit(X86_FEATURE_VMX);
 }

+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+ /* return the page table to be shadowed - in our case, EPT12 */
+ return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+ struct x86_exception *fault)
+{
+ struct vmcs12 *vmcs12;
+ nested_vmx_vmexit(vcpu);
+ vmcs12 = get_vmcs12(vcpu);
+ /*
+ * Note no need to set vmcs12->vm_exit_reason as it is already copied
+ * from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
+ */
+ vmcs12->exit_qualification = fault->error_code;
+ vmcs12->guest_physical_address = fault->address;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+ int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
+
+ vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+ vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;
+ vcpu->arch.

Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-24 Thread Jan Kiszka
On 2013-04-24 17:55, Nakajima, Jun wrote:
> On Wed, Apr 24, 2013 at 12:25 AM, Jan Kiszka  wrote:
>>>
>>> I don't have a full picture (already asked you to post / git-push your
>>> intermediate state), but nested related states typically go to
>>> nested_vmx, thus vcpu_vmx.
>>
>> Ping regarding publication. I'm about to redo your porting work as we
>> are making no progress.
>>
> 
> Sorry about the slow progress. We've been distracted by some priority
> things. The patches are ready (i.e. working), but we are cleaning them
> up. I'll send what we have today.

Great news, thanks a lot!

Jan




signature.asc
Description: OpenPGP digital signature


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-24 Thread Nakajima, Jun
On Wed, Apr 24, 2013 at 12:25 AM, Jan Kiszka  wrote:
>>
>> I don't have a full picture (already asked you to post / git-push your
>> intermediate state), but nested related states typically go to
>> nested_vmx, thus vcpu_vmx.
>
> Ping regarding publication. I'm about to redo your porting work as we
> are making no progress.
>

Sorry about the slow progress. We've been distracted by some priority
things. The patches are ready (i.e. working), but we are cleaning them
up. I'll send what we have today.

--
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-04-24 Thread Jan Kiszka
On 2013-03-22 17:45, Jan Kiszka wrote:
> On 2013-03-22 07:23, Nakajima, Jun wrote:
>> On Mon, Mar 4, 2013 at 8:45 PM, Nakajima, Jun  wrote:
>>> I have some updates on this. We rebased the patches to the latest KVM
>>> (L0). It turned out that the version of L1 KVM/Linux matters. At that
>>> time, actually I used v3.7 kernel for L1, and the L2 didn't work as I
>>> described above. If I use v3.5 or older for L1, L2 works with the EPT
>>> patches. So, I guess some changes made to v3.6 might have exposed a
>>> bug with the nested EPT patches or somewhere. We are looking at the
>>> changes to root-cause it.
>>>
>>
>> Finally I've had more time to work on this, and I think I've fixed
>> this. The problem was that the exit qualification for EPT violation
>> (to L1) was not accurate (enough). And I needed to save the exit
>> qualification upon EPT violation somewhere. Today, that information is
>> converted to error_code (see below), and we lose the information.  We
>> need to use  at least the lower 3 bits when injecting EPT violation to
>> the L1 VMM. I tried to use the upper bytes of error_code to pass  part
>> of the exit qualification, but it didn't work well. Any suggestion for
>> the place to store the value? kvm_vcpu?
>>
>>...
>> /* It is a write fault? */
>> error_code = exit_qualification & (1U << 1);
>> /* ept page table is present? */
>> error_code |= (exit_qualification >> 3) & 0x1;
>>
>> return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
> 
> I don't have a full picture (already asked you to post / git-push your
> intermediate state), but nested related states typically go to
> nested_vmx, thus vcpu_vmx.

Ping regarding publication. I'm about to redo your porting work as we
are making no progress.

Jan




signature.asc
Description: OpenPGP digital signature


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-03-22 Thread Jan Kiszka
On 2013-03-22 07:23, Nakajima, Jun wrote:
> On Mon, Mar 4, 2013 at 8:45 PM, Nakajima, Jun  wrote:
>> I have some updates on this. We rebased the patched to the latest KVM
>> (L0). It turned out that the version of L1 KVM/Linux matters. At that
>> time, actually I used v3.7 kernel for L1, and the L2 didn't work as I
>> described above. If I use v3.5 or older for L1, L2 works with the EPT
>> patches. So, I guess some changes made to v3.6 might have exposed a
>> bug with the nested EPT patches or somewhere. We are looking at the
>> changes to root-cause it.
>>
> 
> Finally I've had more time to work on this, and I think I've fixed
> this. The problem was that the exit qualification for EPT violation
> (to L1) was not accurate (enough). And I needed to save the exit
> qualification upon EPT violation somewhere. Today, that information is
> converted to error_code (see below), and we lose the information.  We
> need to use  at least the lower 3 bits when injecting EPT violation to
> the L1 VMM. I tried to use the upper bytes of error_code to pass  part
> of the exit qualification, but it didn't work well. Any suggestion for
> the place to store the value? kvm_vcpu?
> 
>...
> /* It is a write fault? */
> error_code = exit_qualification & (1U << 1);
> /* ept page table is present? */
> error_code |= (exit_qualification >> 3) & 0x1;
> 
> return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);

I don't have a full picture (already asked you to post / git-push your
intermediate state), but nested related states typically go to
nested_vmx, thus vcpu_vmx.
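
E.g., a minimal sketch of that idea (field name and placement are assumptions,
not an actual patch):

/* in struct nested_vmx, embedded in struct vcpu_vmx: */
	/* raw exit qualification of the last EPT violation taken while in L2 */
	unsigned long ept_exit_qualification;

/* in handle_ept_violation(), before folding it into error_code: */
	exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
	to_vmx(vcpu)->nested.ept_exit_qualification = exit_qualification;

/* in nested_ept_inject_page_fault(), hand the saved value to L1 instead of
 * the reconstructed error_code: */
	vmcs12->exit_qualification =
		to_vmx(vcpu)->nested.ept_exit_qualification;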

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-03-21 Thread Nakajima, Jun
On Mon, Mar 4, 2013 at 8:45 PM, Nakajima, Jun  wrote:
> I have some updates on this. We rebased the patches to the latest KVM
> (L0). It turned out that the version of L1 KVM/Linux matters. At that
> time, actually I used v3.7 kernel for L1, and the L2 didn't work as I
> described above. If I use v3.5 or older for L1, L2 works with the EPT
> patches. So, I guess some changes made to v3.6 might have exposed a
> bug with the nested EPT patches or somewhere. We are looking at the
> changes to root-cause it.
>

Finally I've had more time to work on this, and I think I've fixed
this. The problem was that the exit qualification for EPT violation
(to L1) was not accurate (enough). And I needed to save the exit
qualification upon EPT violation somewhere. Today, that information is
converted to error_code (see below), and we lose the information.  We
need to use  at least the lower 3 bits when injecting EPT violation to
the L1 VMM. I tried to use the upper bytes of error_code to pass  part
of the exit qualification, but it didn't work well. Any suggestion for
the place to store the value? kvm_vcpu?

   ...
/* It is a write fault? */
error_code = exit_qualification & (1U << 1);
/* ept page table is present? */
error_code |= (exit_qualification >> 3) & 0x1;

return kvm_mmu_page_fault(vcpu, gpa, error_code, NULL, 0);
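
For reference, the low exit-qualification bits being folded away here are (per
the SDM; the macro names below are just illustrative):

#define EPT_VIOL_READ		(1UL << 0)	/* caused by a data read */
#define EPT_VIOL_WRITE		(1UL << 1)	/* caused by a data write */
#define EPT_VIOL_FETCH		(1UL << 2)	/* caused by an instruction fetch */
#define EPT_VIOL_READABLE	(1UL << 3)	/* the GPA was readable */
#define EPT_VIOL_WRITABLE	(1UL << 4)	/* the GPA was writable */
#define EPT_VIOL_EXECUTABLE	(1UL << 5)	/* the GPA was executable */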

--
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-03-05 Thread Jan Kiszka
On 2013-03-05 05:45, Nakajima, Jun wrote:
> On Tue, Feb 26, 2013 at 11:43 AM, Jan Kiszka  wrote:
>> On 2013-02-26 15:11, Nadav Har'El wrote:
>>> On Thu, Feb 14, 2013, Nakajima, Jun wrote about "Re: [Bug 53611] New: nVMX: 
>>> Add nested EPT":
>>>> We have started looking at the pataches first. But I couldn't
>>>> reproduce the results by simply applying the original patches to v3.6:
>>>> - L2 Ubuntu 12.04 (64-bit)  (smp 2)
>>>> - L1 Ubuntu 12.04 (64-bit) KVM (smp 2)
>>>> - L0 Ubuntu 12.04 (64-bit)-based. kernel/KVM is v3.6 + patches (the
>>>> ones in nept-v2.tgz).
>>>> https://bugzilla.kernel.org/attachment.cgi?id=93101
>>>>
>>>> Without the patches, the L2 guest works. With it, it hangs at boot
>>>> time (just black screen):
>>>> - EPT was detected by L1 KVM.
>>>> - UP L2 didn't help.
>>>> - Looks like it's looping at EPT_walk_add_generic at the same address in 
>>>> L0.
>>>>
>>>> Will take a closer look. It would be helpful if the test configuration
>>>> (e.g kernel/commit id used, L1/L2 guests) was documented as well.
>>>
>>> I sent the patches in August 1st, and they applied to commit
>>> ade38c311a0ad8c32e902fe1d0ae74d0d44bc71e from a week earlier.
>>>
>>> In most of my tests, L1 and L2 were old images - L1 had Linux 2.6.33,
>>> while L2 had Linux 2.6.28. In most of my tests both L1 and L2 were UP.
>>>
>>> I've heard another report of my patch not working with newer L1/L2 -
>>> the report said that L2 failed to boot (like you reported), and also
>>> that L1 became unstable (running anything in it gave a memory fault).
>>> So it is very likely that this code still has bugs - but since I already
>>> know of errors and holes that need to be plugged (see the announcement file
>>> together with the patches), it's not very surprising :( These patches
>>> definitely need some lovin', but it's easier than starting from scratch.
>>
>> FWIW, I'm playing with them on top of kvm-3.6-2 (second pull request for
>> 3.6) for a while. They work OK for my use case (static mapping) but
>> apparently lock up L2 when starting KVM on KVM, just as reported. I
>> didn't look into any details there, still busy with fixing other issues
>> like CR0/CR4 handling (which I came across while adding unrestricted
>> guest support on top of EPT).
> 
> I have some updates on this. We rebased the patches to the latest KVM
> (L0). It turned out that the version of L1 KVM/Linux matters. At that
> time, actually I used v3.7 kernel for L1, and the L2 didn't work as I
> described above. If I use v3.5 or older for L1, L2 works with the EPT
> patches. So, I guess some changes made to v3.6 might have exposed a
> bug with the nested EPT patches or somewhere. We are looking at the
> changes to root-cause it.

Great to hear! Would you mind to share your work early, even when it's
not yet stable?

At least regarding lockups or misbehaviors of L1 and L2, some of the
patches I posted recently may help. Did you try to merge them as well?

Thanks,
Jan




signature.asc
Description: OpenPGP digital signature


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-03-04 Thread Nakajima, Jun
On Tue, Feb 26, 2013 at 11:43 AM, Jan Kiszka  wrote:
> On 2013-02-26 15:11, Nadav Har'El wrote:
>> On Thu, Feb 14, 2013, Nakajima, Jun wrote about "Re: [Bug 53611] New: nVMX: 
>> Add nested EPT":
>>> We have started looking at the pataches first. But I couldn't
>>> reproduce the results by simply applying the original patches to v3.6:
>>> - L2 Ubuntu 12.04 (64-bit)  (smp 2)
>>> - L1 Ubuntu 12.04 (64-bit) KVM (smp 2)
>>> - L0 Ubuntu 12.04 (64-bit)-based. kernel/KVM is v3.6 + patches (the
>>> ones in nept-v2.tgz).
>>> https://bugzilla.kernel.org/attachment.cgi?id=93101
>>>
>>> Without the patches, the L2 guest works. With it, it hangs at boot
>>> time (just black screen):
>>> - EPT was detected by L1 KVM.
>>> - UP L2 didn't help.
>>> - Looks like it's looping at EPT_walk_add_generic at the same address in L0.
>>>
>>> Will take a closer look. It would be helpful if the test configuration
>>> (e.g kernel/commit id used, L1/L2 guests) was documented as well.
>>
>> I sent the patches in August 1st, and they applied to commit
>> ade38c311a0ad8c32e902fe1d0ae74d0d44bc71e from a week earlier.
>>
>> In most of my tests, L1 and L2 were old images - L1 had Linux 2.6.33,
>> while L2 had Linux 2.6.28. In most of my tests both L1 and L2 were UP.
>>
>> I've heard another report of my patch not working with newer L1/L2 -
>> the report said that L2 failed to boot (like you reported), and also
>> that L1 became unstable (running anything in it gave a memory fault).
>> So it is very likely that this code still has bugs - but since I already
>> know of errors and holes that need to be plugged (see the announcement file
>> together with the patches), it's not very surprising :( These patches
>> definitely need some lovin', but it's easier than starting from scratch.
>
> FWIW, I'm playing with them on top of kvm-3.6-2 (second pull request for
> 3.6) for a while. They work OK for my use case (static mapping) but
> apparently lock up L2 when starting KVM on KVM, just as reported. I
> didn't look into any details there, still busy with fixing other issues
> like CR0/CR4 handling (which I came across while adding unrestricted
> guest support on top of EPT).

I have some updates on this. We rebased the patches to the latest KVM
(L0). It turned out that the version of L1 KVM/Linux matters. At that
time, actually I used v3.7 kernel for L1, and the L2 didn't work as I
described above. If I use v3.5 or older for L1, L2 works with the EPT
patches. So, I guess some changes made to v3.6 might have exposed a
bug with the nested EPT patches or somewhere. We are looking at the
changes to root-cause it.

>
> Given that I'm porting now patches between that branch and "next" back
> and forth (I depend on EPT), it would be really great if someone
> familiar with the KVM MMU (or enough time) could port the series to the
> current git head. That would not solve remaining bugs but could trigger
> more development, maybe also help me jumping into this.
>
> Thanks,
> Jan
>


-- 
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 53611] nVMX: Add nested EPT

2013-02-27 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53611

--- Comment #1 from Nadav Har'El  2013-02-27 08:14:13 ---
In addition to the known issues list in the "announce" file attached above, I
thought of several more issues that should be considered:

1. When switching back and forth between L1 and L2 it will be a waste to throw
away the EPT table already built. So I hope (need to check...) that the EPT
table is cached. But what is the cache key - the cr3? But cr3 has a different
meaning in L2 and L1, so it might not be correct to use that as the key.

2. When L0 swaps out pages, it needs to remove these entries in all EPT tables,
including the cached EPT02 even if not currently used. Does this happen
correctly?

3. If L1 uses EPT ("nested EPT") and gives us a malformed EPT12 table, we may
need to inject an EPT_MISCONFIGURATION exit when building the merged EPT02
entry. Typically, we do this building (see "fetch" in paging_tmpl.h) when
handling an EPT violation exit from L2, so if we encounter this problem
instead of reentering L2 immediately, we should exit to L1 with an EPT
misconfigration. I'm not sure exactly how to notice this problem. Perhaps the
pagetable walking code, which in our case walks EPT12 already notices a problem
and does something (#GP perhaps?) and we need to have it do the EPT misconfig
instead. But it is possible we need to add additional tests that are not done
for normal page tables - in particularly regarding reserved bits, and
especially bit 5 (in EPT it is reserved, in normal page tables it is the
accessed bit). This issue is low priority, as it only deals with the error
path; a well-written L1 will not cause EPT misconfigurations anyway.
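
(Illustrative only - one shape such a check could take while walking EPT12; the
helper name and the reserved-bit handling are assumptions:)

/* A write-only entry (bit 1 set, bit 0 clear) is an EPT misconfiguration by
 * definition; reserved bits would need a per-level mask on top of this. */
static bool nested_ept_entry_misconfig(u64 ept12_entry)
{
	if ((ept12_entry & 0x3) == 0x2)		/* writable but not readable */
		return true;
	/* TODO: level-dependent reserved bits, e.g. the bit 5 case above */
	return false;
}

On hitting this while building the EPT02 entry, the fetch path would bail out
and emulate an EXIT_REASON_EPT_MISCONFIG exit to L1 instead of reentering L2.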

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-26 Thread Gleb Natapov
On Tue, Feb 26, 2013 at 08:43:13PM +0100, Jan Kiszka wrote:
> On 2013-02-26 15:11, Nadav Har'El wrote:
> > On Thu, Feb 14, 2013, Nakajima, Jun wrote about "Re: [Bug 53611] New: nVMX: 
> > Add nested EPT":
> >> We have started looking at the patches first. But I couldn't
> >> reproduce the results by simply applying the original patches to v3.6:
> >> - L2 Ubuntu 12.04 (64-bit)  (smp 2)
> >> - L1 Ubuntu 12.04 (64-bit) KVM (smp 2)
> >> - L0 Ubuntu 12.04 (64-bit)-based. kernel/KVM is v3.6 + patches (the
> >> ones in nept-v2.tgz).
> >> https://bugzilla.kernel.org/attachment.cgi?id=93101
> >>
> >> Without the patches, the L2 guest works. With it, it hangs at boot
> >> time (just black screen):
> >> - EPT was detected by L1 KVM.
> >> - UP L2 didn't help.
> >> - Looks like it's looping at EPT_walk_add_generic at the same address in 
> >> L0.
> >>
> >> Will take a closer look. It would be helpful if the test configuration
> >> (e.g kernel/commit id used, L1/L2 guests) was documented as well.
> > 
> > I sent the patches in August 1st, and they applied to commit
> > ade38c311a0ad8c32e902fe1d0ae74d0d44bc71e from a week earlier.
> > 
> > In most of my tests, L1 and L2 were old images - L1 had Linux 2.6.33,
> > while L2 had Linux 2.6.28. In most of my tests both L1 and L2 were UP.
> > 
> > I've heard another report of my patch not working with newer L1/L2 -
> > the report said that L2 failed to boot (like you reported), and also
> > that L1 became unstable (running anything in it gave a memory fault).
> > So it is very likely that this code still has bugs - but since I already
> > know of errors and holes that need to be plugged (see the announcement file
> > together with the patches), it's not very surprising :( These patches
> > definitely need some lovin', but it's easier than starting from scratch.
> 
> FWIW, I'm playing with them on top of kvm-3.6-2 (second pull request for
> 3.6) for a while. They work OK for my use case (static mapping) but
> apparently lock up L2 when starting KVM on KVM, just as reported. I
> didn't look into any details there, still busy with fixing other issues
> like CR0/CR4 handling (which I came across while adding unrestricted
> guest support on top of EPT).
> 
> Given that I'm porting now patches between that branch and "next" back
> and forth (I depend on EPT), it would be really great if someone
> familiar with the KVM MMU (or enough time) could port the series to the
> current git head. That would not solve remaining bugs but could trigger
> more development, maybe also help me jumping into this.
> 
I'd like to do that. See if I'll have time...

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-26 Thread Jan Kiszka
On 2013-02-26 15:11, Nadav Har'El wrote:
> On Thu, Feb 14, 2013, Nakajima, Jun wrote about "Re: [Bug 53611] New: nVMX: 
> Add nested EPT":
>> We have started looking at the patches first. But I couldn't
>> reproduce the results by simply applying the original patches to v3.6:
>> - L2 Ubuntu 12.04 (64-bit)  (smp 2)
>> - L1 Ubuntu 12.04 (64-bit) KVM (smp 2)
>> - L0 Ubuntu 12.04 (64-bit)-based. kernel/KVM is v3.6 + patches (the
>> ones in nept-v2.tgz).
>> https://bugzilla.kernel.org/attachment.cgi?id=93101
>>
>> Without the patches, the L2 guest works. With it, it hangs at boot
>> time (just black screen):
>> - EPT was detected by L1 KVM.
>> - UP L2 didn't help.
>> - Looks like it's looping at EPT_walk_add_generic at the same address in L0.
>>
>> Will take a closer look. It would be helpful if the test configuration
>> (e.g kernel/commit id used, L1/L2 guests) was documented as well.
> 
> I sent the patches in August 1st, and they applied to commit
> ade38c311a0ad8c32e902fe1d0ae74d0d44bc71e from a week earlier.
> 
> In most of my tests, L1 and L2 were old images - L1 had Linux 2.6.33,
> while L2 had Linux 2.6.28. In most of my tests both L1 and L2 were UP.
> 
> I've heard another report of my patch not working with newer L1/L2 -
> the report said that L2 failed to boot (like you reported), and also
> that L1 became unstable (running anything in it gave a memory fault).
> So it is very likely that this code still has bugs - but since I already
> know of errors and holes that need to be plugged (see the announcement file
> together with the patches), it's not very surprising :( These patches
> definitely need some lovin', but it's easier than starting from scratch.

FWIW, I'm playing with them on top of kvm-3.6-2 (second pull request for
3.6) for a while. They work OK for my use case (static mapping) but
apparently lock up L2 when starting KVM on KVM, just as reported. I
didn't look into any details there, still busy with fixing other issues
like CR0/CR4 handling (which I came across while adding unrestricted
guest support on top of EPT).

Given that I'm porting now patches between that branch and "next" back
and forth (I depend on EPT), it would be really great if someone
familiar with the KVM MMU (or enough time) could port the series to the
current git head. That would not solve remaining bugs but could trigger
more development, maybe also help me jumping into this.

Thanks,
Jan



signature.asc
Description: OpenPGP digital signature


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-26 Thread Nadav Har'El
On Thu, Feb 14, 2013, Nakajima, Jun wrote about "Re: [Bug 53611] New: nVMX: Add 
nested EPT":
> We have started looking at the patches first. But I couldn't
> reproduce the results by simply applying the original patches to v3.6:
> - L2 Ubuntu 12.04 (64-bit)  (smp 2)
> - L1 Ubuntu 12.04 (64-bit) KVM (smp 2)
> - L0 Ubuntu 12.04 (64-bit)-based. kernel/KVM is v3.6 + patches (the
> ones in nept-v2.tgz).
> https://bugzilla.kernel.org/attachment.cgi?id=93101
> 
> Without the patches, the L2 guest works. With it, it hangs at boot
> time (just black screen):
> - EPT was detected by L1 KVM.
> - UP L2 didn't help.
> - Looks like it's looping at EPT_walk_add_generic at the same address in L0.
> 
> Will take a closer look. It would be helpful if the test configuration
> (e.g kernel/commit id used, L1/L2 guests) was documented as well.

I sent the patches in August 1st, and they applied to commit
ade38c311a0ad8c32e902fe1d0ae74d0d44bc71e from a week earlier.

In most of my tests, L1 and L2 were old images - L1 had Linux 2.6.33,
while L2 had Linux 2.6.28. In most of my tests both L1 and L2 were UP.

I've heard another report of my patch not working with newer L1/L2 -
the report said that L2 failed to boot (like you reported), and also
that L1 became unstable (running anything in it gave a memory fault).
So it is very likely that this code still has bugs - but since I already
know of errors and holes that need to be plugged (see the announcement file
together with the patches), it's not very surprising :( These patches
definitely need some lovin', but it's easier than starting from scratch.

Nadav.

-- 
Nadav Har'El|   Tuesday, Feb 26 2013, 16 Adar 5773
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |I think therefore I am. My computer
http://nadav.harel.org.il   |thinks for me, therefore I am not.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-14 Thread Nakajima, Jun
On Tue, Feb 12, 2013 at 11:43 PM, Jan Kiszka  wrote:
>
> On 2013-02-12 20:13, Nakajima, Jun wrote:
> > I looked at your (old) patches, and they seem to be very useful
> > although some of them require rebasing or rewriting. We are interested
> > in completing the nested-VMX features.
>
> That's great news. Can you estimate when you will be able to work on it?
>

We have started looking at the patches first. But I couldn't
reproduce the results by simply applying the original patches to v3.6:
- L2 Ubuntu 12.04 (64-bit)  (smp 2)
- L1 Ubuntu 12.04 (64-bit) KVM (smp 2)
- L0 Ubuntu 12.04 (64-bit)-based. kernel/KVM is v3.6 + patches (the
ones in nept-v2.tgz).
https://bugzilla.kernel.org/attachment.cgi?id=93101

Without the patches, the L2 guest works. With it, it hangs at boot
time (just black screen):
- EPT was detected by L1 KVM.
- UP L2 didn't help.
- Looks like it's looping at EPT_walk_add_generic at the same address in L0.

Will take a closer look. It would be helpful if the test configuration
(e.g kernel/commit id used, L1/L2 guests) was documented as well.

> I will have a use case for nEPT soon - testing purposes. But working
> into the KVM MMU and doing the port myself may unfortunately consume too
> much time here.
>
> Jan
>

--
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-12 Thread Jan Kiszka
On 2013-02-12 20:13, Nakajima, Jun wrote:
> On Mon, Feb 11, 2013 at 5:27 AM, Nadav Har'El  
> wrote:
>> Hi,
>>
>> On Mon, Feb 11, 2013, Jan Kiszka wrote about "Re: [Bug 53611] New: nVMX: Add 
>> nested EPT":
>>> On 2013-02-11 13:49, bugzilla-dae...@bugzilla.kernel.org wrote:
>>>> https://bugzilla.kernel.org/show_bug.cgi?id=53611
>>>>Summary: nVMX: Add nested EPT
>>
>> Yikes, I didn't realize that these bugzilla edits all get spammed to the
>> entire mailing list :( Sorry about those...
>>
>>> I suppose they do not apply anymore as well. Do you have a recent tree
>>> around somewhere or plan to resume work on it?
>>
>> Unfortunately, no - I did not have time to work on these patches since
>> August.
>>
>> The reason I'm now stuffing these things into the bug tracker is that
>> at the end of this month I am leaving IBM to a new job, so I'm pretty
>> sure I won't have time myself to continue any work on nested VMX, and
>> would like for the missing nested-VMX features to be documented in case
>> someone else comes along and wants to improve it. So unfortunately, you
>> should expect more of this bugzilla spam on the mailing list...
>>
> 
> I looked at your (old) patches, and they seem to be very useful
> although some of them require rebasing or rewriting. We are interested
> in completing the nested-VMX features.

That's great news. Can you estimate when you will be able to work on it?

I will have a use case for nEPT soon - testing purposes. But working
into the KVM MMU and doing the port myself may unfortunately consume too
much time here.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-12 Thread Nakajima, Jun
On Mon, Feb 11, 2013 at 5:27 AM, Nadav Har'El  wrote:
> Hi,
>
> On Mon, Feb 11, 2013, Jan Kiszka wrote about "Re: [Bug 53611] New: nVMX: Add 
> nested EPT":
>> On 2013-02-11 13:49, bugzilla-dae...@bugzilla.kernel.org wrote:
>> > https://bugzilla.kernel.org/show_bug.cgi?id=53611
>> >Summary: nVMX: Add nested EPT
>
> Yikes, I didn't realize that these bugzilla edits all get spammed to the
> entire mailing list :( Sorry about those...
>
>> I suppose they do not apply anymore as well. Do you have a recent tree
>> around somewhere or plan to resume work on it?
>
> Unfortunately, no - I did not have time to work on these patches since
> August.
>
> The reason I'm now stuffing these things into the bug tracker is that
> at the end of this month I am leaving IBM to a new job, so I'm pretty
> sure I won't have time myself to continue any work on nested VMX, and
> would like for the missing nested-VMX features to be documented in case
> someone else comes along and wants to improve it. So unfortunately, you
> should expect more of this bugzilla spam on the mailing list...
>

I looked at your (old) patches, and they seem to be very useful
although some of them require rebasing or rewriting. We are interested
in completing the nested-VMX features.

-- 
Jun
Intel Open Source Technology Center
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-11 Thread Jan Kiszka
On 2013-02-11 14:27, Nadav Har'El wrote:
> Hi,
> 
> On Mon, Feb 11, 2013, Jan Kiszka wrote about "Re: [Bug 53611] New: nVMX: Add 
> nested EPT":
>> On 2013-02-11 13:49, bugzilla-dae...@bugzilla.kernel.org wrote:
>>> https://bugzilla.kernel.org/show_bug.cgi?id=53611
>>>Summary: nVMX: Add nested EPT
> 
> Yikes, I didn't realize that these bugzilla edits all get spammed to the
> entire mailing list :( Sorry about those...
> 
>> I suppose they do not apply anymore as well. Do you have a recent tree
>> around somewhere or plan to resume work on it?
> 
> Unfortunately, no - I did not have time to work on these patches since
> August.
> 
> The reason I'm now stuffing these things into the bug tracker is that
> at the end of this month I am leaving IBM to a new job, so I'm pretty
> sure I won't have time myself to continue any work on nested VMX, and
> would like for the missing nested-VMX features to be documented in case
> someone else comes along and wants to improve it. So unfortunately, you
> should expect more of this bugzilla spam on the mailing list...

A pity that you cannot finish this great work. But documenting the open
issues is definitely helpful and welcome.

Best wishes,
Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-11 Thread Nadav Har'El
Hi,

On Mon, Feb 11, 2013, Jan Kiszka wrote about "Re: [Bug 53611] New: nVMX: Add 
nested EPT":
> On 2013-02-11 13:49, bugzilla-dae...@bugzilla.kernel.org wrote:
> > https://bugzilla.kernel.org/show_bug.cgi?id=53611
> >    Summary: nVMX: Add nested EPT

Yikes, I didn't realize that these bugzilla edits all get spammed to the
entire mailing list :( Sorry about those...

> I suppose they do not apply anymore as well. Do you have a recent tree
> around somewhere or plan to resume work on it?

Unfortunately, no - I did not have time to work on these patches since
August.

The reason I'm now stuffing these things into the bug tracker is that
at the end of this month I am leaving IBM to a new job, so I'm pretty
sure I won't have time myself to continue any work on nested VMX, and
would like for the missing nested-VMX features to be documented in case
someone else comes along and wants to improve it. So unfortunately, you
should expect more of this bugzilla spam on the mailing list...

Nadav.

-- 
Nadav Har'El| Monday, Feb 11 2013, 1 Adar 5773
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |The message above is just this
http://nadav.harel.org.il   |signature's way of propagating itself.


Re: [Bug 53611] New: nVMX: Add nested EPT

2013-02-11 Thread Jan Kiszka
On 2013-02-11 13:49, bugzilla-dae...@bugzilla.kernel.org wrote:
> https://bugzilla.kernel.org/show_bug.cgi?id=53611
> 
>Summary: nVMX: Add nested EPT
>Product: Virtualization
>Version: unspecified
>   Platform: All
> OS/Version: Linux
>   Tree: Mainline
> Status: NEW
>   Severity: normal
>   Priority: P1
>  Component: kvm
> AssignedTo: virtualization_...@kernel-bugs.osdl.org
> ReportedBy: n...@math.technion.ac.il
> Regression: No
> 
> 
> Created an attachment (id=93101)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=93101)
> Nested EPT patches, v2
> 
> Nested EPT means emulating EPT for an L1 guest, allowing it to use EPT when
> running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set
> its own cr3 and take its own page faults without either of L0 or L1 getting
> involved. In many workloads this significantly improves L2's performance over
> the previous two alternatives (shadow page tables over ept, and shadow page
> tables over shadow page tables). As an example, I measured a single-threaded
> "make", which has a lot of context switches and page faults, on the three
> options:
> 
>  shadow over shadow: 105 seconds
>  shadow over EPT: 87 seconds  (this is the default currently)
>  EPT over EPT: 29 seconds
> 
>  single-level virtualization (with EPT): 25 seconds
> 
> So clearly nested EPT would be a big win for such workloads.
> 
> I attach a patch set which I worked on and allowed me to measure the above
> results. This is the same patch set I sent to KVM mailing list on August 1st,
> 2012, titled "nEPT v2: Nested EPT support for Nested VMX".
> 
> This patch set still needs some work: it is known to only work in some setups
> but not others, and the file "announce" in the attached tar lists 5 things
> which definitely need to be done. There were a few additional comments in the
> mailing list - see
> http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95395
> 

I suppose they do not apply anymore as well. Do you have a recent tree
around somewhere or plan to resume work on it?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


[Bug 53611] nVMX: Add nested EPT

2013-02-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53611


Nadav Har'El  changed:

            What           |Removed        |Added
 -------------------------------------------------
            Blocks         |               |53601






[Bug 53611] New: nVMX: Add nested EPT

2013-02-11 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=53611

   Summary: nVMX: Add nested EPT
   Product: Virtualization
   Version: unspecified
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: NEW
  Severity: normal
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: n...@math.technion.ac.il
Regression: No


Created an attachment (id=93101)
 --> (https://bugzilla.kernel.org/attachment.cgi?id=93101)
Nested EPT patches, v2

Nested EPT means emulating EPT for an L1 guest, allowing it to use EPT when
running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set
its own cr3 and take its own page faults without either of L0 or L1 getting
involved. In many workloads this significantly improves L2's performance over
the previous two alternatives (shadow page tables over ept, and shadow page
tables over shadow page tables). As an example, I measured a single-threaded
"make", which has a lot of context switches and page faults, on the three
options:

 shadow over shadow: 105 seconds
 shadow over EPT: 87 seconds  (this is the default currently)
 EPT over EPT: 29 seconds

 single-level virtualization (with EPT): 25 seconds

So clearly nested EPT would be a big win for such workloads.

I attach a patch set which I worked on and allowed me to measure the above
results. This is the same patch set I sent to KVM mailing list on August 1st,
2012, titled "nEPT v2: Nested EPT support for Nested VMX".

This patch set still needs some work: it is known to only work in some setups
but not others, and the file "announce" in the attached tar lists 5 things
which definitely need to be done. There were a few additional comments in the
mailing list - see
http://comments.gmane.org/gmane.comp.emulators.kvm.devel/95395



Re: [PATCH 0/10] nEPT v2: Nested EPT support for Nested VMX

2012-08-01 Thread Avi Kivity
On 08/01/2012 05:36 PM, Nadav Har'El wrote:
> The following patches add nested EPT support to Nested VMX.
> 
> This is the second version of this patch set. Most of the issues from the
> previous reviews were handled, and in particular there is now a new variant
> of paging_tmpl for EPT page tables.

Thanks for this repost.

> However, while this version does work in my tests, there are still some known
> problems/bugs with this version and unhandled issues from the previous review:
> 
>  1. 32-bit *PAE* L2s currently don't work. non-PAE 32-bit L2s do work
> (and so do, of course, 64-bit L2s).
> 

I'm guessing that this has to do with loading the PDPTEs; probably we're
loading them from L1 instead of L2 during mode transitions.


-- 
error compiling committee.c: too many arguments to function


[PATCH 03/10] nEPT: MMU context for nested EPT

2012-08-01 Thread Nadav Har'El
KVM's existing shadow MMU code already supports nested TDP. To use it, we
need to set up a new "MMU context" for nested EPT, and create a few callbacks
for it (nested_ept_*()). This context should also use the EPT versions of
the page table access functions (defined in the previous patch).
Then, we need to switch back and forth between this nested context and the
regular MMU context when switching between L1 and L2 (when L1 runs this L2
with EPT).

Signed-off-by: Nadav Har'El 
---
 arch/x86/kvm/mmu.c |   38 +++
 arch/x86/kvm/mmu.h |1 
 arch/x86/kvm/vmx.c |   52 +++
 3 files changed, 91 insertions(+)

--- .before/arch/x86/kvm/mmu.h  2012-08-01 17:22:46.0 +0300
+++ .after/arch/x86/kvm/mmu.h   2012-08-01 17:22:46.0 +0300
@@ -52,6 +52,7 @@ int kvm_mmu_get_spte_hierarchy(struct kv
 void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
 int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
direct);
 int kvm_init_shadow_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context);
 
 static inline unsigned int kvm_mmu_available_pages(struct kvm *kvm)
 {
--- .before/arch/x86/kvm/mmu.c  2012-08-01 17:22:46.0 +0300
+++ .after/arch/x86/kvm/mmu.c   2012-08-01 17:22:46.0 +0300
@@ -3616,6 +3616,44 @@ int kvm_init_shadow_mmu(struct kvm_vcpu 
 }
 EXPORT_SYMBOL_GPL(kvm_init_shadow_mmu);
 
+int kvm_init_shadow_EPT_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *context)
+{
+   ASSERT(vcpu);
+   ASSERT(!VALID_PAGE(vcpu->arch.mmu.root_hpa));
+
+   context->shadow_root_level = kvm_x86_ops->get_tdp_level();
+
+   context->nx = is_nx(vcpu); /* TODO: ? */
+   context->new_cr3 = paging_new_cr3;
+   context->page_fault = EPT_page_fault;
+   context->gva_to_gpa = EPT_gva_to_gpa;
+   context->sync_page = EPT_sync_page;
+   context->invlpg = EPT_invlpg;
+   context->update_pte = EPT_update_pte;
+   context->free = paging_free;
+   context->root_level = context->shadow_root_level;
+   context->root_hpa = INVALID_PAGE;
+   context->direct_map = false;
+
+   /* TODO: reset_rsvds_bits_mask() is not built for EPT, we need
+  something different.
+*/
+   reset_rsvds_bits_mask(vcpu, context);
+
+
+   /* TODO: I copied these from kvm_init_shadow_mmu, I don't know why
+  they are done, or why they write to vcpu->arch.mmu and not context
+*/
+   vcpu->arch.mmu.base_role.cr4_pae = !!is_pae(vcpu);
+   vcpu->arch.mmu.base_role.cr0_wp  = is_write_protection(vcpu);
+   vcpu->arch.mmu.base_role.smep_andnot_wp =
+   kvm_read_cr4_bits(vcpu, X86_CR4_SMEP) &&
+   !is_write_protection(vcpu);
+
+   return 0;
+}
+EXPORT_SYMBOL_GPL(kvm_init_shadow_EPT_mmu);
+
 static int init_kvm_softmmu(struct kvm_vcpu *vcpu)
 {
int r = kvm_init_shadow_mmu(vcpu, vcpu->arch.walk_mmu);
--- .before/arch/x86/kvm/vmx.c  2012-08-01 17:22:46.0 +0300
+++ .after/arch/x86/kvm/vmx.c   2012-08-01 17:22:46.0 +0300
@@ -901,6 +901,11 @@ static inline bool nested_cpu_has_virtua
return vmcs12->pin_based_vm_exec_control & PIN_BASED_VIRTUAL_NMIS;
 }
 
+static inline int nested_cpu_has_ept(struct vmcs12 *vmcs12)
+{
+   return nested_cpu_has2(vmcs12, SECONDARY_EXEC_ENABLE_EPT);
+}
+
 static inline bool is_exception(u32 intr_info)
 {
return (intr_info & (INTR_INFO_INTR_TYPE_MASK | INTR_INFO_VALID_MASK))
@@ -6591,6 +6596,46 @@ static void vmx_set_supported_cpuid(u32 
entry->ecx |= bit(X86_FEATURE_VMX);
 }
 
+/* Callbacks for nested_ept_init_mmu_context: */
+
+static unsigned long nested_ept_get_cr3(struct kvm_vcpu *vcpu)
+{
+   /* return the page table to be shadowed - in our case, EPT12 */
+   return get_vmcs12(vcpu)->ept_pointer;
+}
+
+static void nested_ept_inject_page_fault(struct kvm_vcpu *vcpu,
+   struct x86_exception *fault)
+{
+   struct vmcs12 *vmcs12;
+   nested_vmx_vmexit(vcpu);
+   vmcs12 = get_vmcs12(vcpu);
+   /*
+* Note no need to set vmcs12->vm_exit_reason as it is already copied
+* from vmcs02 in nested_vmx_vmexit() above, i.e., EPT_VIOLATION.
+*/
+   vmcs12->exit_qualification = fault->error_code;
+   vmcs12->guest_physical_address = fault->address;
+}
+
+static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
+{
+   int r = kvm_init_shadow_EPT_mmu(vcpu, &vcpu->arch.mmu);
+
+   vcpu->arch.mmu.set_cr3   = vmx_set_cr3;
+   vcpu->arch.mmu.get_cr3   = nested_ept_get_cr3;
+   vcpu->arch.mmu.inject_page_fault = nested_ept_inject_page_fault;
+
+   vcpu->arch.walk_mmu  = &vcpu->arch.nested_mmu;
+
+   return r;
+}
+
+static void nested_ept_uninit_

[PATCH 0/10] nEPT v2: Nested EPT support for Nested VMX

2012-08-01 Thread Nadav Har'El
The following patches add nested EPT support to Nested VMX.

This is the second version of this patch set. Most of the issues from the
previous reviews were handled, and in particular there is now a new variant
of paging_tmpl for EPT page tables.

However, while this version does work in my tests, there are still some known
problems/bugs with this version and unhandled issues from the previous review:

 1. 32-bit *PAE* L2s currently don't work. non-PAE 32-bit L2s do work
(and so do, of course, 64-bit L2s).

 2. nested_ept_inject_page_fault() assumes vm_exit_reason is already set
to EPT_VIOLATION. However, it is conceivable that L0 emulates some
L2 instruction, and during this emulation we read some L2 memory
causing a need to exit (from L2 to L1) with an EPT violation.

 3. Moreover, now nested_ept_inject_page_fault() always causes an
EPT_VIOLATION, with vmcs12->exit_qualification = fault->error_code.
This is wrong: first, fault->error_code is not in exit qualification
format but in PFERR_* format. Moreover, PFERR_RSVD_MASK would mean
we need to cause an EPT_MISCONFIG, NOT EPT_VIOLATION.
Instead of trying to fix this by translating PFERR to exit_qualification,
we should calculate and remember in walk_addr() the exit qualification
(and an additional bit: whether it's an EPT VIOLATION or
MISCONFIGURATION). This will be remembered in new fields in x86_exception
(a rough sketch of these fields follows after this list).

Avi suggested: "[add to x86_exception] another bool, to distinguish
between EPT VIOLATION and EPT MISCONFIGURATION. The error_code field should
be extended to 64 bits for EXIT_QUALIFICATION (though only bits 0-12 are
defined). You need another field for the guest linear address. 
EXIT_QUALIFICATION has to be calculated, it cannot be derived from the
original exit. Look at kvm_propagate_fault()."
He also added: "If we're injecting an EPT VIOLATION to L1 (because we
weren't able to resolve it; say L1 write-protected the page), then we
need to compute EXIT_QUALIFICATION.  Bits 3-5 of EXIT_QUALIFICATION are
computed from EPT12 paging structure entries (easy to derive them from
pt_access/pte_access)."

 4. Also, nested_ept_inject_page_fault() doesn't set guest linear address.
 
 5. There are several "TODO"s left in the code.
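
As a rough illustration of the x86_exception changes described in item 3
above (a sketch only; existing field names are approximate and the new
ones are my own guesses, not code from these patches):

	struct x86_exception {
		u8 vector;
		bool error_code_valid;
		u64 error_code;            /* widened: carries EXIT_QUALIFICATION */
		bool nested_page_fault;
		bool ept_misconfig;        /* new: EPT_MISCONFIG vs. EPT_VIOLATION */
		u64 address;               /* guest physical address */
		u64 guest_linear_address;  /* new: for the EPT_VIOLATION case */
	};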

If there's any volunteer willing to help me with some of these issues,
it would be great :-)


About nested EPT:
-

Nested EPT means emulating EPT for an L1 guest, allowing it to use EPT when
running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set
its own cr3 and take its own page faults without either of L0 or L1 getting
involved. In many workloads this significantly improves L2's performance over
the previous two alternatives (shadow page tables over ept, and shadow page
tables over shadow page tables). Our paper [1] described these three options,
and the advantages of nested EPT ("multidimensional paging" in the paper).

Nested EPT is enabled by default (if the hardware supports EPT), so users do
not have to do anything special to enjoy the performance improvement that
this patch gives to L2 guests. L1 may of course choose not to use nested
EPT, by simply not using EPT (e.g., a KVM in L1 may use the "ept=0" option).

Just as a non-scientific, non-representative indication of the kind of
dramatic performance improvement you may see in workloads that have a lot of
context switches and page faults, here is a measurement of the time
an example single-threaded "make" took in L2 (kvm over kvm):

 shadow over shadow: 105 seconds
 ("ept=0" in L0 forces this)

 shadow over EPT: 87 seconds
 (the previous default; Can be forced with "ept=0" in L1)

 EPT over EPT: 29 seconds
 (the default after this patch)

Note that the same test on L1 (with EPT) took 25 seconds, so for this example
workload, performance of nested virtualization is now very close to that of
single-level virtualization.


[1] "The Turtles Project: Design and Implementation of Nested Virtualization",
http://www.usenix.org/events/osdi10/tech/full_papers/Ben-Yehuda.pdf


Patch statistics:
-

 Documentation/virtual/kvm/nested-vmx.txt |4 
 arch/x86/include/asm/vmx.h   |2 
 arch/x86/kvm/mmu.c   |   52 +++-
 arch/x86/kvm/mmu.h   |1 
 arch/x86/kvm/paging_tmpl.h   |   98 -
 arch/x86/kvm/vmx.c   |  227 +++--
 arch/x86/kvm/x86.c   |   11 -
 7 files changed, 354 insertions(+), 41 deletions(-)

--
Nadav Har'El
IBM Haifa Research Lab



Re: [PATCH 0/10] nEPT: Nested EPT support for Nested VMX

2011-12-12 Thread Avi Kivity
On 12/12/2011 01:37 PM, Nadav Har'El wrote:
> On Sun, Nov 13, 2011, Avi Kivity wrote about "Re: [PATCH 0/10] nEPT: Nested 
> EPT support for Nested VMX":
> > > I also believed that the fault injection part was also correct: I
> > > thought that the code already knows when to handle the fault in L2 (when
> > > the address is missing in cr3), in L1 (when the translation is missing
> > > in EPT12) or else, in L0.
> > 
> > It does, but it needs to propagate the fault code correctly.  The exit
> > reason (ept violation vs ept misconfiguration) is meaningless, since we
> > don't encode anything about it from ept12 into ept02.  In particular an
> > ept violation could lead to
> > 
> > - no fault, ept02 updated, instruction retried
> > - no fault, instruction emulated
> > - L2 fault
> > - ept violation, need to compute ept12 permissions for exit qualification
> > - ept misconfiguration
> > 
> > (the second and third cases occur when it is impossible to create an
> > ept02 mapping - when L0 emulates a gpa that L1 assigns to L2 via ept12).
>
> I'm now trying to figure out this part, and I think I am beginning to
> understand the mess you are referring to:
>
> In nested_ept_inject_page_fault I now assume the exit reason is always EPT
> VIOLATION and have
>
>   vmcs12->exit_qualification = fault->error_code;
>
> But fault->error_code is not in the exit qualification format but in
> the PFERR_* format, which has different meanings for the bits...
> Moreover, PFERR_RSVD_MASK should cause an EPT MISCONFIG, not EPT
> VIOLATION. Is this what you meant above?

In spirit yes.  In practice rather than translating from PFERR format to
EPT VIOLATION EXIT_QUALIFICATION format, walk_addr() should directly
compute the exit qualification (and an additional bit: whether it's an
EPT VIOLATION or EPT MISCONFIGURATION.

> I didn't quite understand what you meant in the 4th case about needing
> to compute ept12 permissions. I'm assuming that if the EPT violation
> was caused because L0 decreased permissions from what L1 thought, then L0
> will solve the problem itself and not inject it to L1. So if we are injecting
> the fault to L1, don't we already know the correct fault reason and don't
> need to compute it?

If we're injecting an EPT VIOLATION to L1 (because we weren't able to
resolve it; say L1 write-protected the page), then we need to compute
EXIT_QUALIFICATION.  Bits 3-5 of EXIT_QUALIFICATION are computed from
EPT12 paging structure entries (easy to derive them from
pt_access/pte_access).

>
> There's another complication: when the fault comes from an EPT violation
> in L2, handle_ept_violation() calls mmu.page_fault() with an error_code of
> exit_qualification & 0x3. This means that the error_code in this case is
> *not* in the expected PFERR_* format, and we need to know that in
> nested_ept_inject_page_fault. Moreover, in the original EPT violation's
> exit qualification, there were various other bits which we lose (and don't
> have a direct parallel in PFERR_* anyway), so when we reinject the fault,
> L1 doesn't get them.

struct x86_exception already has 'bool nested', which indicates whether
it's an L1 or L2 fault.  You need to extend that, perhaps by adding
another bool, to distinguish between EPT VIOLATION and
EPT MISCONFIGURATION.  The error_code field should be extended to 64 bits
for EXIT_QUALIFICATION (though only bits 0-12 are defined).  You need
another field for the guest linear address.

EXIT_QUALIFICATION has to be calculated, it cannot be derived from the
original exit.

Look at kvm_propagate_fault().
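
For the VIOLATION case, the calculation could look roughly like this (an
illustrative sketch only, not code from any patch; the helper name and its
parameters are invented here):

	/* EPT-violation exit qualification: bits 0-2 describe the access
	 * that faulted, bits 3-5 describe what the EPT12 entries allowed. */
	static u64 ept12_exit_qualification(unsigned pte_access,
					    bool write, bool fetch)
	{
		u64 qual;

		if (write)
			qual = 1ull << 1;          /* write access caused it */
		else if (fetch)
			qual = 1ull << 2;          /* instruction fetch caused it */
		else
			qual = 1ull << 0;          /* read access caused it */

		qual |= 1ull << 3;                 /* EPT12 entry was readable
						      (ignoring execute-only) */
		if (pte_access & ACC_WRITE_MASK)
			qual |= 1ull << 4;         /* EPT12 allowed writes */
		if (pte_access & ACC_EXEC_MASK)
			qual |= 1ull << 5;         /* EPT12 allowed execution */

		return qual;
	}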

> What a mess :(

If you have a splitting headache, you're on the right track.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 0/10] nEPT: Nested EPT support for Nested VMX

2011-12-12 Thread Nadav Har'El
On Sun, Nov 13, 2011, Avi Kivity wrote about "Re: [PATCH 0/10] nEPT: Nested EPT 
support for Nested VMX":
> > I also believed that the fault injection part was also correct: I
> > thought that the code already knows when to handle the fault in L2 (when
> > the address is missing in cr3), in L1 (when the translation is missing
> > in EPT12) or else, in L0.
> 
> It does, but it needs to propagate the fault code correctly.  The exit
> reason (ept violation vs ept misconfiguration) is meaningless, since we
> don't encode anything about it from ept12 into ept02.  In particular an
> ept violation could lead to
> 
> - no fault, ept02 updated, instruction retried
> - no fault, instruction emulated
> - L2 fault
> - ept violation, need to compute ept12 permissions for exit qualification
> - ept misconfiguration
> 
> (the second and third cases occur when it is impossible to create an
> ept02 mapping - when L0 emulates a gpa that L1 assigns to L2 via ept12).

I'm now trying to figure out this part, and I think I am beginning to
understand the mess you are referring to:

In nested_ept_inject_page_fault I now assume the exit reason is always EPT
VIOLATION and have

vmcs12->exit_qualification = fault->error_code;

But fault->error_code is not in the exit qualification format but in
the PFERR_* format, which has different meanings for the bits...
Moreover, PFERR_RSVD_MASK should cause an EPT MISCONFIG, not EPT
VIOLATION. Is this what you meant above?

I didn't quite understand what you meant in the 4th case about needing
to compute ept12 permissions. I'm assuming that if the EPT violation
was caused because L0 decreased permissions from what L1 thought, then L0
will solve the problem itself and not inject it to L1. So if we are injecting
the fault to L1, don't we already know the correct fault reason and don't
need to compute it?

There's another complication: when the fault comes from an EPT violation
in L2, handle_ept_violation() calls mmu.page_fault() with an error_code of
exit_qualification & 0x3. This means that the error_code in this case is
*not* in the expected PFERR_* format, and we need to know that in
nested_ept_inject_page_fault. Moreover, in the original EPT violation's
exit qualification, there were various other bits which we lose (and don't
have a direct parallel in PFERR_* anyway), so when we reinject the fault,
L1 doesn't get them.

What a mess :(

-- 
Nadav Har'El|Monday, Dec 12 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Hardware, n.: The parts of a computer
http://nadav.harel.org.il   |system that can be kicked.


Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-12-08 Thread Nadav Har'El
On Mon, Nov 14, 2011, Avi Kivity wrote about "Re: [PATCH 02/10] nEPT: MMU 
context for nested EPT":
> > >> +#if PTTYPE == EPT 
> > >> +real_gfn = mmu->translate_gpa(vcpu, 
> > >> gfn_to_gpa(table_gfn),
> > >> +  EPT_WRITABLE_MASK);
> > >> +#else
> > >>  real_gfn = mmu->translate_gpa(vcpu, 
> > >> gfn_to_gpa(table_gfn),
> > >>
> > >> PFERR_USER_MASK|PFERR_WRITE_MASK);
> > >> +#endif
> > >> +
> > > 
> > > Unneeded, I think.
> >
> > Is it because translate_nested_gpa always sets USER_MASK?
> 
> Yes... maybe that function needs to do something like
> 
>    access |= mmu->default_access;

Unless I'm misunderstanding something, translate_nested_gpa and
gva_to_gpa take as their "access" parameter a bitmask of PFERR_*,
so it's fine for PFERR_USER_MASK to be enabled in translate_nested_gpa;
it just shouldn't cause PT_USER_MASK to be used. The only additional
problem I can find is in walk_addr_generic, which does

	if (!check_write_user_access(vcpu, write_fault, user_fault,
				     pte))
		eperm = true;

and that checks pte & PT_USER_MASK, which it shouldn't if
PTTYPE==PTTYPE_EPT.
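
For illustration, one possible shape for that check (a sketch only; it
assumes the EPT_WRITABLE_MASK definition from Orit's patch and the
PTTYPE_EPT name used above):

#if PTTYPE != PTTYPE_EPT
		/* x86 page tables: honour the user/supervisor and write bits */
		if (!check_write_user_access(vcpu, write_fault, user_fault,
					     pte))
			eperm = true;
#else
		/* EPT entries have no user/supervisor bit; only the write
		   (and execute) permissions are meaningful here */
		if (write_fault && !(pte & EPT_WRITABLE_MASK))
			eperm = true;
#endif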

It's really confusing that we now have in mmu.c no less than 4 (!)
access bit schemes, similar in many ways but different in many others:

1. page fault error codes (PFERR_*_MASK)
2. x86 page tables acess bits (PT_*_MASK)
3. KVM private access bits (ACC_*_MASK)
4. EPT access bits (VMX_EPT_*_MASK).

I just have to try hard not to confuse them.
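
For reference, this is roughly how they line up (my own cheat sheet; the
values are from the headers of this era and worth double-checking):

	/*
	 * PFERR_*    - page fault error code:  WRITE=1<<1, USER=1<<2,
	 *              RSVD=1<<3, FETCH=1<<4
	 * PT_*       - x86 PTE bits:           PRESENT=1, WRITABLE=2, USER=4
	 * ACC_*      - KVM's internal summary: EXEC=1, WRITE=2, USER=4, ALL=7
	 * VMX_EPT_*  - EPT entry bits:         READABLE=1, WRITABLE=2,
	 *              EXECUTABLE=4
	 */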

-- 
Nadav Har'El|   Thursday, Dec 8 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Sorry, but my karma just ran over your
http://nadav.harel.org.il   |dogma.


Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-12-07 Thread Avi Kivity
On 12/07/2011 11:06 AM, Nadav Har'El wrote:
> On Sun, Nov 13, 2011, Orit Wasserman wrote about "Re: [PATCH 02/10] nEPT: MMU 
> context for nested EPT":
> > +++ b/arch/x86/kvm/mmu.h
> > @@ -48,6 +48,11 @@
> >  #define PFERR_RSVD_MASK (1U << 3)
> >  #define PFERR_FETCH_MASK (1U << 4)
> >  
> > +#define EPT_WRITABLE_MASK 2
> > +#define EPT_EXEC_MASK 4
>
> This is another example of the "unclean" movement of VMX-specific things into
> x86 :( We already have VMX_EPT_WRITABLE_MASK and friends in vmx.h. I'll
> need to think what is less ugly: to move them to mmu.h, or to include vmx.h
> in mmu.c, or perhaps even create a new include file, ept.h. Avi, do you have
> a preference?

Include vmx.h in mmu.c.  vmx.h is neutral wrt guestiness/hostiness, so
it can be included from mmu.c and vmx.c without issues.

> The last thing I want to do is to repeat the same definitions in two places.

Right.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-12-07 Thread Nadav Har'El
On Sun, Nov 13, 2011, Orit Wasserman wrote about "Re: [PATCH 02/10] nEPT: MMU 
context for nested EPT":
> +++ b/arch/x86/kvm/mmu.h
> @@ -48,6 +48,11 @@
>  #define PFERR_RSVD_MASK (1U << 3)
>  #define PFERR_FETCH_MASK (1U << 4)
>  
> +#define EPT_WRITABLE_MASK 2
> +#define EPT_EXEC_MASK 4

This is another example of the "unclean" movement of VMX-specific things into
x86 :( We already have VMX_EPT_WRITABLE_MASK and friends in vmx.h. I'll
need to think what is less ugly: to move them to mmu.h, or to include vmx.h
in mmu.c, or perhaps even create a new include file, ept.h. Avi, do you have
a preference?
The last thing I want to do is to repeat the same definitions in two places.

-- 
Nadav Har'El|  Wednesday, Dec 7 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |"A witty saying proves nothing." --
http://nadav.harel.org.il   |Voltaire


Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-12-06 Thread Avi Kivity
On 12/06/2011 02:40 PM, Nadav Har'El wrote:
> On Sun, Nov 13, 2011, Avi Kivity wrote about "Re: [PATCH 02/10] nEPT: MMU 
> context for nested EPT":
> > On 11/13/2011 01:30 PM, Orit Wasserman wrote:
> > Maybe this patch can help; this is roughly what Avi wants (I hope), done
> > very quickly.
> > I'm sorry I don't have a setup to run nested VMX at the moment, so I can't
> > test it.
> >...
> > > +#define PTTYPE EPT
> > > +#include "paging_tmpl.h"
> > > +#undef PTTYPE
> > 
> > Yes, that's the key.
>
> I'm now preparing a patch based on such ideas.
>
> One downside of this approach is that mmu.c (and therefore the x86
> module) will now include EPT-specific functions that are of no use or
> relevance to the SVM code. It's not a terrible disaster, but it's
> "unclean". I'll try to think if there's a cleaner way.

I'm perfectly willing to live with this.

In general vmx.c and svm.c only deal with host-side differences between
Intel and AMD.  EPT support in paging.h is guest-side, so it doesn't
belong there.

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-12-06 Thread Nadav Har'El
On Sun, Nov 13, 2011, Avi Kivity wrote about "Re: [PATCH 02/10] nEPT: MMU 
context for nested EPT":
> On 11/13/2011 01:30 PM, Orit Wasserman wrote:
> > Maybe this patch can help; this is roughly what Avi wants (I hope), done
> > very quickly.
> > I'm sorry I don't have a setup to run nested VMX at the moment, so I can't
> > test it.
>...
> > +#define PTTYPE EPT
> > +#include "paging_tmpl.h"
> > +#undef PTTYPE
> 
> Yes, that's the key.

I'm now preparing a patch based on such ideas.

One downside of this approach is that mmu.c (and therefore the x86
module) will now include EPT-specific functions that are of no use or
relevance to the SVM code. It's not a terrible disaster, but it's
"unclean". I'll try to think if there's a cleaner way.

Nadav.

-- 
Nadav Har'El|Tuesday, Dec 6 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |Writing software is like sex: One mistake
http://nadav.harel.org.il   |and you have to support it forever.


Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-11-24 Thread Avi Kivity
On 11/23/2011 05:44 PM, Nadav Har'El wrote:
> On Wed, Nov 23, 2011, Nadav Har'El wrote about "Re: [PATCH 02/10] nEPT: MMU 
> context for nested EPT":
> > > +static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
> > > +{
> > > + int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
> > > +
> > > + vcpu->arch.nested_mmu.gva_to_gpa = EPT_gva_to_gpa_nested;
> > > +
> > > + return r;
> > > +}
> >..
> > I didn't see you actually call this function anywhere - how is it
> > supposed to work?
> >..
> > It seems we need a fifth case in that function.
> >..
>
> On second thought, why is this modifying nested_mmu.gva_to_gpa, and not
> mmu.gva_to_gpa? Isn't the nested_mmu the L2 CR3, which is *not* in EPT
> format, and what we really want to change is the outer mmu, which is
> EPT12 and is indeed in EPT format?
> Or am I missing something?

I think you're right.  The key is to look at what ->walk_mmu points at.
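
To spell the picture out (a rough annotation of my own, not taken from the
patches):

	/*
	 * When L1 runs L2 with nested EPT:
	 *   vcpu->arch.mmu        - shadows EPT12 and builds EPT02
	 *                           (L2-physical -> host-physical)
	 *   vcpu->arch.nested_mmu - walks L2's CR3, which is in x86 format
	 *                           (L2-virtual -> L2-physical)
	 *   vcpu->arch.walk_mmu   - selects which context handles gva_to_gpa;
	 *                           points at nested_mmu while L2 runs, at
	 *                           mmu otherwise
	 */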

-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-11-23 Thread Nadav Har'El
On Wed, Nov 23, 2011, Nadav Har'El wrote about "Re: [PATCH 02/10] nEPT: MMU 
context for nested EPT":
> > +static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
> > +{
> > +   int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
> > +
> > +   vcpu->arch.nested_mmu.gva_to_gpa = EPT_gva_to_gpa_nested;
> > +
> > +   return r;
> > +}
>..
> I didn't see you actually call this function anywhere - how is it
> supposed to work?
>..
> It seems we need a fifth case in that function.
>..

On second thought, why is this modifying nested_mmu.gva_to_gpa, and not
mmu.gva_to_gpa? Isn't the nested_mmu the L2 CR3, which is *not* in EPT
format, and what we really want to change is the outer mmu, which is
EPT12 and is indeed in EPT format?
Or am I missing something?

Thanks,
Nadav.

-- 
Nadav Har'El| Wednesday, Nov 23 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |My password is my dog's name. His name is
http://nadav.harel.org.il   |a#j!4@h, but I change it every month.


Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-11-23 Thread Nadav Har'El
On Sun, Nov 13, 2011, Orit Wasserman wrote about "Re: [PATCH 02/10] nEPT: MMU 
context for nested EPT":
> Maybe this patch can help; this is roughly what Avi wants (I hope), done very
> quickly.
> I'm sorry I don't have a setup to run nested VMX at the moment, so I can't test
> it.

Hi Orit, thanks for the code - I'm now working on incorporating
something based on this into my patch. However, I do have a question:

> +static int nested_ept_init_mmu_context(struct kvm_vcpu *vcpu)
> +{
> + int r = kvm_init_shadow_mmu(vcpu, &vcpu->arch.mmu);
> +
> + vcpu->arch.nested_mmu.gva_to_gpa = EPT_gva_to_gpa_nested;
> +
> + return r;
> +}

I didn't see you actually call this function anywhere - how is it
supposed to work?

The way I understand the current code, kvm_mmu_reset_context() calls
init_kvm_mmu() which (in our case) calls init_kvm_nested_mmu().
I think the above gva_to_gpa setting should be there - right?
It seems we need a fifth case in that function.
But at that point in mmu.c, how will I be able to check if this is the
nested EPT case? Do you have any suggestion?

Thanks,
Nadav.

-- 
Nadav Har'El| Wednesday, Nov 23 2011, 
n...@math.technion.ac.il |-
Phone +972-523-790466, ICQ 13349191 |This message contains 100% recycled
http://nadav.harel.org.il   |characters.


Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-11-14 Thread Avi Kivity
On 11/13/2011 08:26 PM, Orit Wasserman wrote:
> > 
> >>  int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 
> >> sptes[4]);
> >>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
> >>  int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
> >> direct);
> >> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> >> index 507e2b8..70d4cfd 100644
> >> --- a/arch/x86/kvm/paging_tmpl.h
> >> +++ b/arch/x86/kvm/paging_tmpl.h
> >> @@ -39,6 +39,21 @@
> >>#define CMPXCHG cmpxchg64
> >>#define PT_MAX_FULL_LEVELS 2
> >>#endif
> >> +#elif PTTYPE == EPT
> >> +  #define pt_element_t u64
> >> +  #define FNAME(name) EPT_##name
> >> +  #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
> >> +  #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
> >> +  #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
> >> +  #define PT_INDEX(addr, level) PT64_INDEX(addr, level)
> >> +  #define PT_LEVEL_BITS PT64_LEVEL_BITS
> >> +  #ifdef CONFIG_X86_64
> >> +  #define PT_MAX_FULL_LEVELS 4
> >> +  #define CMPXCHG cmpxchg
> >> +  #else
> >> +  #define CMPXCHG cmpxchg64
> >> +  #define PT_MAX_FULL_LEVELS 2
> >> +  #endif
> > 
> > The various masks should be defined here, to avoid lots of #ifdefs later.
> > 
>
> That's what I did first, but then I was afraid that the MASK would be changed
> for mmu.c too, so I decided on ifdefs.
> The more I think about it, I think we need a wrapper function for mask checking
> (at least for this file).
> What do you think?

Either should work, as long as the main logic is clean.
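
E.g. a wrapper along these lines (a sketch only; it reuses the
EPT_WRITABLE_MASK definition from your patch):

   static bool FNAME(is_writable_gpte)(pt_element_t gpte)
   {
   #if PTTYPE == EPT
           return gpte & EPT_WRITABLE_MASK;
   #else
           return gpte & PT_WRITABLE_MASK;
   #endif
   }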

> >>for (;;) {
> >>gfn_t real_gfn;
> >> @@ -186,9 +215,14 @@ retry_walk:
> >>pte_gpa   = gfn_to_gpa(table_gfn) + offset;
> >>walker->table_gfn[walker->level - 1] = table_gfn;
> >>walker->pte_gpa[walker->level - 1] = pte_gpa;
> >> -
> >> +#if PTTYPE == EPT 
> >> +  real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> >> +EPT_WRITABLE_MASK);
> >> +#else
> >>real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> >>  PFERR_USER_MASK|PFERR_WRITE_MASK);
> >> +#endif
> >> +
> > 
> > Unneeded, I think.
>
> Is it because translate_nested_gpa always sets USER_MASK?

Yes... maybe that function needs to do something like

   access |= mmu->default_access;



-- 
error compiling committee.c: too many arguments to function



Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-11-13 Thread Orit Wasserman
On 11/13/2011 04:32 PM, Avi Kivity wrote:
> On 11/13/2011 01:30 PM, Orit Wasserman wrote:
>> Maybe this patch can help; this is roughly what Avi wants (I hope), done very
>> quickly.
>> I'm sorry I don't have a setup to run nested VMX at the moment, so I can't test
>> it.
>>
>> Orit
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 9335e1b..bbe212f 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3180,6 +3180,10 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, 
>> unsigned access,
>>  #include "paging_tmpl.h"
>>  #undef PTTYPE
>>  
>> +#define PTTYPE EPT
>> +#include "paging_tmpl.h"
>> +#undef PTTYPE
>> +
> 
> Yes, that's the key.
> 
> 
>>  int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 
>> sptes[4]);
>>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
>>  int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
>> direct);
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index 507e2b8..70d4cfd 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -39,6 +39,21 @@
>>  #define CMPXCHG cmpxchg64
>>  #define PT_MAX_FULL_LEVELS 2
>>  #endif
>> +#elif PTTYPE == EPT
>> +#define pt_element_t u64
>> +#define FNAME(name) EPT_##name
>> +#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
>> +#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
>> +#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
>> +#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
>> +#define PT_LEVEL_BITS PT64_LEVEL_BITS
>> +#ifdef CONFIG_X86_64
>> +#define PT_MAX_FULL_LEVELS 4
>> +#define CMPXCHG cmpxchg
>> +#else
>> +#define CMPXCHG cmpxchg64
>> +#define PT_MAX_FULL_LEVELS 2
>> +#endif
> 
> The various masks should be defined here, to avoid lots of #ifdefs later.
> 

That's what I did first, but then I was afraid that the MASK would be changed for
mmu.c too, so I decided on ifdefs.
The more I think about it, I think we need a wrapper function for mask checking (at
least for this file).
What do you think?


>>  #elif PTTYPE == 32
>>  #define pt_element_t u32
>>  #define guest_walker guest_walker32
>> @@ -106,14 +121,19 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu 
>> *vcpu, pt_element_t gpte,
>>  {
>>  unsigned access;
>>  
>> +#if PTTYPE == EPT   
>>  access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
>> +#else
>> +access = (gpte & EPT_WRITABLE_MASK) | EPT_EXEC_MASK;
>>  if (last && !is_dirty_gpte(gpte))
>>  access &= ~ACC_WRITE_MASK;
>> +#endif
> 
> Like here, you could make is_dirty_gpte() local to paging_tmpl()
> returning true for EPT and the dirty bit otherwise.
> 
> 
>>  
>>  #if PTTYPE == 64
>>  if (vcpu->arch.mmu.nx)
>>  access &= ~(gpte >> PT64_NX_SHIFT);
> 
> The ept X bit is lost.
> 
> Could do something like
> 
>access &= (gpte >> PT_X_NX_SHIFT) ^ PT_X_NX_SENSE;
> 
> 
>> +#if PTTYPE == EPT
>> +const int write_fault = access & EPT_WRITABLE_MASK;
>> +const int user_fault  = 0;
>> +const int fetch_fault = 0;
>> +#else
> 
> EPT has fetch permissions (but not user permissions); anyway
> translate_nested_gpa() already does this.
> 
>>  const int write_fault = access & PFERR_WRITE_MASK;
>>  const int user_fault  = access & PFERR_USER_MASK;
>>  const int fetch_fault = access & PFERR_FETCH_MASK;
>> +#endif
>>  u16 errcode = 0;
>>  
>>  trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
>> @@ -174,6 +200,9 @@ retry_walk:
>> (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
>>  
>>  pt_access = ACC_ALL;
>> +#if PTTYPE == EPT 
>> +pt_access =  PT_PRESENT_MASK | EPT_WRITABLE_MASK | EPT_EXEC_MASK;
>> +#endif
> 
> pt_access is not in EPT or ia32 format - it's our own format (xwu).  So
> this doesn't need changing.  Updating gpte_access() is sufficient.
> 
>>  
>>  for (;;) {
>>  gfn_t real_gfn;
>> @@ -186,9 +215,14 @@ retry_walk:
>>  pte_gpa   = gfn_to_gpa(table_gfn) + offset;
>>  walker->table_gfn[walker->level - 1] = table_gfn;
>>  walker->pte_gpa[walker->level - 1] = pte_gpa;
>> -
>> +#if PTTYPE == EPT 
>> +real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
>> +  EPT_WRITABLE_MASK);
>> +#else
>>  real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
>>PFERR_USER_MASK|PFERR_WRITE_MASK);
>> +#endif
>> +
> 
> Unneeded, I think.

Is it because translate_nested_gpa always sets USER_MASK?

> 
>>  if (unlikely(real_gfn == UNMAPPED_GVA))
>>  goto error;
>>  real_gfn = gpa_to_gfn(real_gfn);
>> @@ -221,6 +255,7 @@ retry_walk:
>>  eperm = true;
>>  #endif
>>  
>> +#if PTTYPE != EPT
>>  if (!eperm && unlikely(!(pte & PT_ACCESSED_MASK))) {
>>

Re: [PATCH 02/10] nEPT: MMU context for nested EPT

2011-11-13 Thread Avi Kivity
On 11/13/2011 01:30 PM, Orit Wasserman wrote:
> Maybe this patch can help; this is roughly what Avi wants (I hope), done very
> quickly.
> I'm sorry I don't have a setup to run nested VMX at the moment, so I can't test
> it.
>
> Orit
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 9335e1b..bbe212f 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -3180,6 +3180,10 @@ static bool sync_mmio_spte(u64 *sptep, gfn_t gfn, 
> unsigned access,
>  #include "paging_tmpl.h"
>  #undef PTTYPE
>  
> +#define PTTYPE EPT
> +#include "paging_tmpl.h"
> +#undef PTTYPE
> +

Yes, that's the key.


>  int kvm_mmu_get_spte_hierarchy(struct kvm_vcpu *vcpu, u64 addr, u64 
> sptes[4]);
>  void kvm_mmu_set_mmio_spte_mask(u64 mmio_mask);
>  int handle_mmio_page_fault_common(struct kvm_vcpu *vcpu, u64 addr, bool 
> direct);
> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
> index 507e2b8..70d4cfd 100644
> --- a/arch/x86/kvm/paging_tmpl.h
> +++ b/arch/x86/kvm/paging_tmpl.h
> @@ -39,6 +39,21 @@
>   #define CMPXCHG cmpxchg64
>   #define PT_MAX_FULL_LEVELS 2
>   #endif
> +#elif PTTYPE == EPT
> + #define pt_element_t u64
> + #define FNAME(name) EPT_##name
> + #define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
> + #define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
> + #define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
> + #define PT_INDEX(addr, level) PT64_INDEX(addr, level)
> + #define PT_LEVEL_BITS PT64_LEVEL_BITS
> + #ifdef CONFIG_X86_64
> + #define PT_MAX_FULL_LEVELS 4
> + #define CMPXCHG cmpxchg
> + #else
> + #define CMPXCHG cmpxchg64
> + #define PT_MAX_FULL_LEVELS 2
> + #endif

The various masks should be defined here, to avoid lots of #ifdefs later.

>  #elif PTTYPE == 32
>   #define pt_element_t u32
>   #define guest_walker guest_walker32
> @@ -106,14 +121,19 @@ static unsigned FNAME(gpte_access)(struct kvm_vcpu 
> *vcpu, pt_element_t gpte,
>  {
>   unsigned access;
>  
> +#if PTTYPE == EPT   
>   access = (gpte & (PT_WRITABLE_MASK | PT_USER_MASK)) | ACC_EXEC_MASK;
> +#else
> + access = (gpte & EPT_WRITABLE_MASK) | EPT_EXEC_MASK;
>   if (last && !is_dirty_gpte(gpte))
>   access &= ~ACC_WRITE_MASK;
> +#endif

Like here, you could make is_dirty_gpte() local to paging_tmpl()
returning true for EPT and the dirty bit otherwise.
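
Something like this, perhaps (a sketch only):

   static bool FNAME(is_dirty_gpte)(pt_element_t gpte)
   {
   #if PTTYPE == EPT
           return true;                 /* EPT gptes carry no dirty bit here */
   #else
           return gpte & PT_DIRTY_MASK;
   #endif
   }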


>  
>  #if PTTYPE == 64
>   if (vcpu->arch.mmu.nx)
>   access &= ~(gpte >> PT64_NX_SHIFT);

The ept X bit is lost.

Could do something like

   access &= (gpte >> PT_X_NX_SHIFT) ^ PT_X_NX_SENSE;


> +#if PTTYPE == EPT
> + const int write_fault = access & EPT_WRITABLE_MASK;
> + const int user_fault  = 0;
> + const int fetch_fault = 0;
> +#else

EPT has fetch permissions (but not user permissions); anyway
translate_nested_gpa() already does this.

>   const int write_fault = access & PFERR_WRITE_MASK;
>   const int user_fault  = access & PFERR_USER_MASK;
>   const int fetch_fault = access & PFERR_FETCH_MASK;
> +#endif
>   u16 errcode = 0;
>  
>   trace_kvm_mmu_pagetable_walk(addr, write_fault, user_fault,
> @@ -174,6 +200,9 @@ retry_walk:
>  (mmu->get_cr3(vcpu) & CR3_NONPAE_RESERVED_BITS) == 0);
>  
>   pt_access = ACC_ALL;
> +#if PTTYPE == EPT 
> + pt_access =  PT_PRESENT_MASK | EPT_WRITABLE_MASK | EPT_EXEC_MASK;
> +#endif

pt_access is not in EPT or ia32 format - it's our own format (xwu).  So
this doesn't need changing.  Updating gpte_access() is sufficient.

>  
>   for (;;) {
>   gfn_t real_gfn;
> @@ -186,9 +215,14 @@ retry_walk:
>   pte_gpa   = gfn_to_gpa(table_gfn) + offset;
>   walker->table_gfn[walker->level - 1] = table_gfn;
>   walker->pte_gpa[walker->level - 1] = pte_gpa;
> -
> +#if PTTYPE == EPT 
> + real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> +   EPT_WRITABLE_MASK);
> +#else
>   real_gfn = mmu->translate_gpa(vcpu, gfn_to_gpa(table_gfn),
> PFERR_USER_MASK|PFERR_WRITE_MASK);
> +#endif
> +

Unneeded, I think.

>   if (unlikely(real_gfn == UNMAPPED_GVA))
>   goto error;
>   real_gfn = gpa_to_gfn(real_gfn);
> @@ -221,6 +255,7 @@ retry_walk:
>   eperm = true;
>  #endif
>  
> +#if PTTYPE != EPT
>   if (!eperm && unlikely(!(pte & PT_ACCESSED_MASK))) {
>   int ret;
>   trace_kvm_mmu_set_accessed_bit(table_gfn, index,
> @@ -235,7 +270,7 @@ retry_walk:
>   mark_page_dirty(vcpu->kvm, table_gfn);
>   pte |= PT_ACCESSED_MASK;
>   }
> -
> +#endif

If PT_ACCESSED_MASK is 0 for EPT, this goes away without #ifdef.

> +#if PTTYPE != EPT
>   /* check if the kernel is fetching from user page */
>   if (unlikely(pte
