Re: [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support

2014-06-10 Thread Christoffer Dall
On Tue, Jun 10, 2014 at 11:23:17AM -0700, Mario Smarduch wrote:
> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> > On Tue, Jun 03, 2014 at 04:19:27PM -0700, Mario Smarduch wrote:
> >> This patch adds support for handling 2nd stage page faults during 
> >> migration,
> >> it disables faulting in huge pages, and disolves huge pages to page tables.
> > 
> > s/disolves/dissolves/g
> Will do.
> > 
> >> In case migration is canceled huge pages will be used again.
> >>
> >> Signed-off-by: Mario Smarduch 
> >> ---
> >>  arch/arm/kvm/mmu.c |   36 ++--
> >>  1 file changed, 34 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> >> index 1c546c9..aca4fbf 100644
> >> --- a/arch/arm/kvm/mmu.c
> >> +++ b/arch/arm/kvm/mmu.c
> >> @@ -966,6 +966,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
> >>  	struct vm_area_struct *vma;
> >>  	pfn_t pfn;
> >> +	/* Get logging status, if dirty_bitmap is not NULL then logging is on */
> >> +	bool logging_active = !!memslot->dirty_bitmap;
> > 
> >>  
> >>  	write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
> >>  	if (fault_status == FSC_PERM && !write_fault) {
> >> @@ -1019,10 +1021,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	spin_lock(&kvm->mmu_lock);
> >>  	if (mmu_notifier_retry(kvm, mmu_seq))
> >>  		goto out_unlock;
> >> -	if (!hugetlb && !force_pte)
> >> +
> >> +	/* When logging don't spend cycles to check for huge pages */
> > 
> > drop the comment: either explain the entire clause (which would be too
> > long) or don't explain anything.
> > 
> Ok.
> >> +	if (!hugetlb && !force_pte && !logging_active)
> > 
> > instead of having all this, can't you just change 
> > 
> > if (is_vm_hugetlb_page(vma)) to
> > if (is_vm_hugetlb_page(vma) && !logging_active)
> > 
> > then you're also not mucking around with the gfn etc.
> 
> I didn't want to modify this function too much, but if that's ok that 
> simplifies things a lot.
> 

Don't worry about the changes as much as the resulting code.  If
something requires a lot of refactoring, usually that can be handled by
splitting up renames, factoring out functions, etc. into multiple
smaller patches.

> > 
> >>  		hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
> >>  
> >> -	if (hugetlb) {
> >> +	/*
> >> +	 * Force all not present/perm faults to PTE handling, address both
> >> +	 * PMD and PTE faults
> >> +	 */
> > 
> > I don't understand this comment?  In which case does this apply?
> > 
> The cases I see here -
> - huge page permission fault is forced into page table code while logging
> - pte permission/not present handled by page table code as before.

Hmm, the wording doesn't really work for me.  I don't think this comment
adds anything or is required; when getting this deep into the fault
handler etc., one had better understand what's going on.

The most suitable place for a comment in this work is probably in
stage2_set_pte(), where you can now detect a kvm_pmd_huge(); when you add
that, you may want to add a small comment saying that this only happens
when logging dirty pages.
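A minimal toy model of the approach suggested above might look like the following. It is plain user-space C, not kernel code: struct toy_pmd, toy_stage2_set_pte() and the other names are hypothetical stand-ins for the stage-2 structures and for stage2_set_pte()/kvm_pmd_huge(), and it assumes 4 KB pages with 2 MB PMD blocks. It shows the two pieces discussed in this thread: the fault path only uses a block mapping when logging is off, and setting a PTE first dissolves any huge PMD it finds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy geometry (assumption): 4 KB pages, 2 MB blocks, 512 PTEs per table. */
#define TOY_PAGE_SHIFT   12
#define TOY_PMD_SHIFT    21
#define TOY_PTRS_PER_PTE (1u << (TOY_PMD_SHIFT - TOY_PAGE_SHIFT))

struct toy_pte {
	bool valid;
	uint64_t pfn;
};

/* Hypothetical stand-in for one stage-2 PMD entry. */
struct toy_pmd {
	bool huge;            /* true: entry maps a whole 2 MB block */
	uint64_t block_pfn;   /* head pfn of the block when huge */
	struct toy_pte *ptes; /* PTE table when !huge (NULL = none yet) */
};

/* The suggested fault-path test: only use a block mapping when not logging. */
static bool toy_use_block_mapping(bool vma_is_hugetlb, bool logging_active)
{
	return vma_is_hugetlb && !logging_active;
}

static unsigned int toy_pte_index(uint64_t ipa)
{
	return (unsigned int)((ipa >> TOY_PAGE_SHIFT) & (TOY_PTRS_PER_PTE - 1));
}

/*
 * Toy stage2_set_pte(): if the PMD still maps a huge block (which should only
 * happen while dirty logging is active), clear it first, then install the
 * 4 KB entry in a freshly allocated PTE table. A real implementation would
 * also have to invalidate the TLB for the old block; that is omitted here.
 */
static int toy_stage2_set_pte(struct toy_pmd *pmd, uint64_t ipa, uint64_t pfn)
{
	if (pmd->huge) {
		pmd->huge = false; /* dissolve the block mapping */
		pmd->block_pfn = 0;
	}
	if (!pmd->ptes) {
		pmd->ptes = calloc(TOY_PTRS_PER_PTE, sizeof(*pmd->ptes));
		if (!pmd->ptes)
			return -1;
	}
	pmd->ptes[toy_pte_index(ipa)] = (struct toy_pte){ .valid = true, .pfn = pfn };
	return 0;
}
```

In this toy, once the block is dissolved only the faulting 4 KB page is mapped; the other pages of the 2 MB region are left unmapped and would fault back in one at a time, which is what lets each write be logged.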

> >> +	if (hugetlb && !logging_active) {
> >>  		pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
> >>  		new_pmd = pmd_mkhuge(new_pmd);
> >>  		if (writable) {
> >> @@ -1034,6 +1042,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
> >>  	} else {
> >>  		pte_t new_pte = pfn_pte(pfn, PAGE_S2);
> >>  		if (writable) {
> >> +			/*
> >> +			 * If pmd is  mapping a huge page then clear it and let
> >> +			 * stage2_set_pte() create a pte table. At the sametime
> >> +			 * you write protect the pte (PAGE_S2 pgprot_t).
> >> +			 */
> >> +			if (logging_active) {
> >> +				pmd_t *pmd;
> >> +				if (hugetlb) {
> >> +					pfn += pte_index(fault_ipa);
> >> +					gfn = fault_ipa >> PAGE_SHIFT;
> >> +					new_pte = pfn_pte(pfn, PAGE_S2);
> >> +				}
> >> +				pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
> >> +				if (pmd && kvm_pmd_huge(*pmd))
> >> +					clear_pmd_entry(kvm, pmd, fault_ipa);
> >> +			}
> > 
> > now instead of all this, you just need to check for kvm_pmd_huge() in
> > stage2_set_pte() and if that's true, you clear it, and then install
> > your new pte.
> 
> Yes this really simplifies things!
> 
> > 
> >>  			kvm_set_s2pte_writable(&new_pte);
> >>  			kvm_set_pfn_dirty(pfn);
> >>  		}
> >> @@ -1041,6 +1065,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 

Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Christoffer Dall
On Tue, Jun 10, 2014 at 09:18:34PM +0200, Paolo Bonzini wrote:
> Il 10/06/2014 20:56, Paolo Bonzini ha scritto:
> >Il 10/06/2014 20:08, Peter Maydell ha scritto:
> >>On 10 June 2014 18:04, Christopher Covington  wrote:
> >>>On 06/10/2014 10:42 AM, Peter Maydell wrote:
> >>>> I just noticed that this doesn't mandate that the platform
> >>>> provides an RTC. As I understand it, the UEFI spec mandates
> >>>> that there's an RTC (could somebody more familiar with UEFI
> >>>> than me confirm/deny that?) so we should probably put one here.
> >>>
> >>>Pardon my ignorance, but what exactly disqualifies Generic Timer
> >>>implementations from being used as Real Time Clocks?
> >>
> >>So my naive view was that an RTC actually had to have
> >>support for dealing with real (wall) clock time, ie
> >>knowing it's 2014 and not 1970. The generic timers are
> >>just timers. Or am I wrong and UEFI doesn't really
> >>require that?
> >
> >The real-time clock provides four UEFI runtime services (GetTime,
> >SetTime, GetWakeupTime, SetWakeupTime).  The spec says that you can
> >return EFI_DEVICE_ERROR from GetTime/SetTime if "the time could not be
> >retrieved/set due to a hardware error", but I don't think this is enough
> >to make these two optional.  By comparison, GetWakeupTime/SetWakeupTime
> >can also return EFI_UNSUPPORTED.
> >
> >So I agree that the RTC is required in UEFI.
> 
> ... that said, just like I thought was the case for the serial
> console, do we need to specify the exact hardware models?
> 
> We can just say that the VM can expect UEFI boot and runtime
> services to work.  This includes variable services, time services,
> the serial console protocols and more.  It's up to the
> implementation to provide enough devices to support the firmware,
> and it's out of this spec's scope to specify the firmware's
> implementation.
> 
> I think even the serial devices should be removed.
> 
The problem is that the most common user problem with ARM VMs is that
they boot the thing and then get no output.  So we wanted some way to
make sure that the kernel will be able to print to a console.

UEFI does provide DBG2 output, but that's only available while boot
services are running (so I'm told), and we need to mandate something that
will work when the kernel boots.

If kernels actually do use the UEFI runtime services and have no need
for direct access to an RTC when running on a UEFI-compliant system, then
I agree with not specifying the hardware details.

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: How does kvm support x2apic?

2014-06-10 Thread Jidong Xiao
On Wed, Jun 11, 2014 at 2:27 AM, Zhang Haoyu  wrote:
>>> Hi,
>>>
>>> According to this:
>>>
>>> https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd
>>>
>>> It looks like kvm have been supporting x2apic since kernel 2.6.32, or
>>> even earlier.
>>>
>> This patch is to emulate x2apic for guest, then guest can benefit from 
>> some advantages of x2apic, like using RDMSR/WRMSR instead of mmio.
>>
>>> However, this following patch:
>>>
>>> https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5
>>>
>>> Also claims that adding support for x2apic. But this later patch was
>>> for kernel 3.9.
>> This patch is to support virtual x2apic mode, which can virtualizing 
>> MSR-based APIC accesses by configuring MSR bitmaps,
>> you can specify some MSR-based accesses without VM exit, the other 
>> MSR-based accesses with VM exit,
>> which belongs to APIC virtualization from certain angle.
> Thanks Haoyu, I am reading the Intel x2apic manual.
>
> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>
> but I don't see the so called "virtual x2apic mode", it seems that the
> manual does not mention anything about that. Do you mean that when the
> Guest OS sets bit 10 (x2apic mode enable bit) of the it virtualized
> IA32_APIC_BASE MSR, it is entering a virtual x2apic mode? But isn't
> this the same as the aforementioned "emulate x2apic for guest"? Or,
> "emulate x2apic for guest" is the foundation of "support virtual
> x2apic mode"?
>
> -Jidong
>
Oh, I found the answer, the virtualized x2apic mode is not defined in
the x2apic manual, but is defined in the intel SDM manual, chapter 29
- APIC virtualization and virtual interrupts. Ignore my questions
please, but thank you anyway.

>>> Yes, virtual x2apic mode is part of APICv.
>>> Guest has no idea about it is running in a virtual machine,
>>> VMM also prevent guest from enabling/disabling virtual x2apic mode or some 
>>> other virtualization configurations.
>>>
>>
>>Regarding this "VMM also prevent guest from enabling/disabling virtual
>>x2apic mode", do you mean that guest is not allowed to do something
>>like:
>>
>>Assuming we are in the guest, and we see the original value of the
>>IA32_APICBASE MSR is 0xfee00900, which means EN bit and BSP bit was
>>enabled. And now, we try to set bit 10 (EXTD bit):
>>
>>wrmsr 0x1b 0xfee00d00.
>>
>>Do you mean this wrmsr operation is not allowed? If fact, I tried
>>this, and this simply crashes the guest kernel. (I am using Linux
>>2.6.34 as the guest kernel, and Linux 3.14 as the host kernel.) And
>>this is the exact reason that I initialized this whole discussion. My
>>feeling is this might be a kvm bug, so I wish to figure out and try to
>>fix it.
>>
> The MSR address range 800H through BFFH is architecturally reserved and 
> dedicated for accessing APIC registers in x2APIC mode,
> you are writing the wrong MSR address.
> Please see SDN 3A 10.12.1.2 x2APIC Register Address Space, you will get the 
> answer.
>
Oh, I was writing to 0x1b, which is the address of the IA32_APIC_BASE MSR.
In particular, I was trying to set bit 10, which is the x2APIC
enable/disable bit. As I said, I just want to enable x2APIC. The
original value 0xfee00900 shows that x2APIC is not enabled, i.e., bit 10
is 0. According to the SDM, this bit has to be 1 to enable x2APIC.
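For reference, the values in this thread can be checked against the IA32_APIC_BASE layout in the Intel SDM: bit 8 is BSP, bit 10 is EXTD (x2APIC enable), bit 11 is EN (APIC global enable), and bits 12 and up hold the APIC base address. The sketch below (all helper names hypothetical) decodes the mode and encodes the transition rules as we read SDM Vol. 3A section 10.12.5: EN=0 with EXTD=1 is invalid, and direct x2APIC-to-xAPIC or disabled-to-x2APIC writes are not allowed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define APIC_BASE_BSP       (1ull << 8)
#define APIC_BASE_EXTD      (1ull << 10) /* x2APIC enable */
#define APIC_BASE_EN        (1ull << 11) /* APIC global enable */
#define APIC_BASE_ADDR_MASK (~0xfffull)  /* ignores MAXPHYADDR limits */

enum apic_mode { APIC_DISABLED, APIC_INVALID, APIC_XAPIC, APIC_X2APIC };

/* Derive the APIC mode from the EN and EXTD bits. */
static enum apic_mode apic_mode_of(uint64_t v)
{
	bool en = v & APIC_BASE_EN, extd = v & APIC_BASE_EXTD;

	if (!en)
		return extd ? APIC_INVALID : APIC_DISABLED;
	return extd ? APIC_X2APIC : APIC_XAPIC;
}

/* True if writing new_v over old_v is an allowed mode transition. */
static bool apic_transition_ok(uint64_t old_v, uint64_t new_v)
{
	enum apic_mode from = apic_mode_of(old_v), to = apic_mode_of(new_v);

	if (to == APIC_INVALID)
		return false;
	if (from == APIC_X2APIC && to == APIC_XAPIC)
		return false; /* must pass through the disabled state first */
	if (from == APIC_DISABLED && to == APIC_X2APIC)
		return false; /* must enable xAPIC mode first */
	return true;
}
```

Under these rules, the 0xfee00900 to 0xfee00d00 write is an ordinary xAPIC-to-x2APIC transition; the sketch says nothing about why the guest crashes afterwards.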

>>>
>>> So I am very confused:
>>>
>>> First, what's the difference between these two patches? Or say, does
>>> kvm support x2apic since kernel 2.6.32, or since kernel 3.9?
>>>
>> kvm support x2apic since kernel 2.6.32, but not support virtual x2apic 
>> mode until kernel 3.9.
>>
>>> Second, can guest use x2apic even if the host does not? (Assuming qemu
>>>has exposed this feature to guest.) The word "use" means something
>>> like, accessing x2apic registers, setting the x2apic enable bit in
>>> IA32_APICBASE MSR (i.e. bit 10).
>>>
>> Yes, you can use x2apic even if the PCPU does not support, and benefit 
>> from it, like performance bonus from MSR accesses instead of MMIO.
>> But, if I remember correctly, if your guest does not support 
>> interrupt-remmping, you only can use physical destination mode when 
>> using x2apic,
>> please see enable_IR_x2apic().
>>
>>> -Jidong
>


Re: How does kvm support x2apic?

2014-06-10 Thread Zhang Haoyu
>> Hi,
>>
>> According to this:
>>
>> https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd
>>
>> It looks like kvm have been supporting x2apic since kernel 2.6.32, or
>> even earlier.
>>
> This patch is to emulate x2apic for guest, then guest can benefit from 
> some advantages of x2apic, like using RDMSR/WRMSR instead of mmio.
>
>> However, this following patch:
>>
>> https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5
>>
>> Also claims that adding support for x2apic. But this later patch was
>> for kernel 3.9.
> This patch is to support virtual x2apic mode, which can virtualizing 
> MSR-based APIC accesses by configuring MSR bitmaps,
> you can specify some MSR-based accesses without VM exit, the other 
> MSR-based accesses with VM exit,
> which belongs to APIC virtualization from certain angle.
 Thanks Haoyu, I am reading the Intel x2apic manual.

 http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf

 but I don't see the so called "virtual x2apic mode", it seems that the
 manual does not mention anything about that. Do you mean that when the
 Guest OS sets bit 10 (x2apic mode enable bit) of the it virtualized
 IA32_APIC_BASE MSR, it is entering a virtual x2apic mode? But isn't
 this the same as the aforementioned "emulate x2apic for guest"? Or,
 "emulate x2apic for guest" is the foundation of "support virtual
 x2apic mode"?

 -Jidong

>>>Oh, I found the answer, the virtualized x2apic mode is not defined in
>>>the x2apic manual, but is defined in the intel SDM manual, chapter 29
>>>- APIC virtualization and virtual interrupts. Ignore my questions
>>>please, but thank you anyway.
>>>
>> Yes, virtual x2apic mode is part of APICv.
>> Guest has no idea about it is running in a virtual machine,
>> VMM also prevent guest from enabling/disabling virtual x2apic mode or some 
>> other virtualization configurations.
>>
>
>Regarding this "VMM also prevent guest from enabling/disabling virtual
>x2apic mode", do you mean that guest is not allowed to do something
>like:
>
>Assuming we are in the guest, and we see the original value of the
>IA32_APICBASE MSR is 0xfee00900, which means EN bit and BSP bit was
>enabled. And now, we try to set bit 10 (EXTD bit):
>
>wrmsr 0x1b 0xfee00d00.
>
>Do you mean this wrmsr operation is not allowed? If fact, I tried
>this, and this simply crashes the guest kernel. (I am using Linux
>2.6.34 as the guest kernel, and Linux 3.14 as the host kernel.) And
>this is the exact reason that I initialized this whole discussion. My
>feeling is this might be a kvm bug, so I wish to figure out and try to
>fix it.
>
The MSR address range 800H through BFFH is architecturally reserved and
dedicated to accessing APIC registers in x2APIC mode, so you are writing
the wrong MSR address.
Please see SDM Vol. 3A, 10.12.1.2 "x2APIC Register Address Space", which
answers this.
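The mapping that SDM section describes is mechanical: in x2APIC mode, the register at xAPIC MMIO offset X is accessed through MSR 0x800 + (X >> 4). A small sketch follows; the function names are hypothetical, and the offsets used in the checks (APIC ID at 0x20, TPR at 0x80, EOI at 0xB0, ICR at 0x300) come from the xAPIC register map. IA32_APIC_BASE (0x1B) lies outside this range because it is the MSR that selects the mode, not an APIC register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define X2APIC_MSR_FIRST 0x800u
#define X2APIC_MSR_LAST  0xBFFu

/* xAPIC MMIO offset (bytes from the APIC base) -> x2APIC MSR index. */
static uint32_t x2apic_msr_for_offset(uint32_t mmio_offset)
{
	return X2APIC_MSR_FIRST + (mmio_offset >> 4);
}

/* True if the MSR index falls in the range reserved for x2APIC registers. */
static bool is_x2apic_register_msr(uint32_t msr)
{
	return msr >= X2APIC_MSR_FIRST && msr <= X2APIC_MSR_LAST;
}
```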

>>
>> So I am very confused:
>>
>> First, what's the difference between these two patches? Or say, does
>> kvm support x2apic since kernel 2.6.32, or since kernel 3.9?
>>
> kvm support x2apic since kernel 2.6.32, but not support virtual x2apic 
> mode until kernel 3.9.
>
>> Second, can guest use x2apic even if the host does not? (Assuming qemu
>>has exposed this feature to guest.) The word "use" means something
>> like, accessing x2apic registers, setting the x2apic enable bit in
>> IA32_APICBASE MSR (i.e. bit 10).
>>
> Yes, you can use x2apic even if the PCPU does not support, and benefit 
> from it, like performance bonus from MSR accesses instead of MMIO.
> But, if I remember correctly, if your guest does not support 
> interrupt-remmping, you only can use physical destination mode when using 
> x2apic,
> please see enable_IR_x2apic().
>
>> -Jidong



Re: How does kvm support x2apic?

2014-06-10 Thread Jidong Xiao
On Wed, Jun 11, 2014 at 1:08 AM, Jidong Xiao  wrote:
> On Wed, Jun 11, 2014 at 12:53 AM, Zhang Haoyu  wrote:
>> Hi,
>>
>> According to this:
>>
>> https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd
>>
>> It looks like kvm have been supporting x2apic since kernel 2.6.32, or
>> even earlier.
>>
> This patch is to emulate x2apic for guest, then guest can benefit from 
> some advantages of x2apic, like using RDMSR/WRMSR instead of mmio.
>
>> However, this following patch:
>>
>> https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5
>>
>> Also claims that adding support for x2apic. But this later patch was
>> for kernel 3.9.
> This patch is to support virtual x2apic mode, which can virtualizing 
> MSR-based APIC accesses by configuring MSR bitmaps,
> you can specify some MSR-based accesses without VM exit, the other 
> MSR-based accesses with VM exit,
> which belongs to APIC virtualization from certain angle.
 Thanks Haoyu, I am reading the Intel x2apic manual.

 http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf

 but I don't see the so called "virtual x2apic mode", it seems that the
 manual does not mention anything about that. Do you mean that when the
 Guest OS sets bit 10 (x2apic mode enable bit) of the it virtualized
 IA32_APIC_BASE MSR, it is entering a virtual x2apic mode? But isn't
 this the same as the aforementioned "emulate x2apic for guest"? Or,
 "emulate x2apic for guest" is the foundation of "support virtual
 x2apic mode"?

 -Jidong

>>>Oh, I found the answer, the virtualized x2apic mode is not defined in
>>>the x2apic manual, but is defined in the intel SDM manual, chapter 29
>>>- APIC virtualization and virtual interrupts. Ignore my questions
>>>please, but thank you anyway.
>>>
>> Yes, virtual x2apic mode is part of APICv.
>> Guest has no idea about it is running in a virtual machine,
>> VMM also prevent guest from enabling/disabling virtual x2apic mode or some 
>> other virtualization configurations.
>>
>
> Regarding this "VMM also prevent guest from enabling/disabling virtual
> x2apic mode", do you mean that guest is not allowed to do something
> like:
>
> Assuming we are in the guest, and we see the original value of the
> IA32_APICBASE MSR is 0xfee00900, which means EN bit and BSP bit was
> enabled. And now, we try to set bit 10 (EXTD bit):
>
> wrmsr 0x1b 0xfee00d00.
>
> Do you mean this wrmsr operation is not allowed? If fact, I tried
> this, and this simply crashes the guest kernel. (I am using Linux
> 2.6.34 as the guest kernel, and Linux 3.14 as the host kernel.) And
> this is the exact reason that I initialized this whole discussion. My
> feeling is this might be a kvm bug, so I wish to figure out and try to
> fix it.
>

Can someone else test this too? Just set bit 10 of the IA32_APICBASE MSR
in your guest and see whether the guest crashes. According to the Intel
SDM, this is how x2apic mode is enabled, and I think it normally should
not crash the guest kernel.

One can write that MSR with the following tool:

https://01.org/msr-tools

Compile the tool and run the aforementioned command:

wrmsr 0x1b 0xfee00d00   (run rdmsr 0x1b first to confirm the original value is 0xfee00900)

I tried this on several machines, and this command just crashes the
kernel. I am not sure why.
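msr-tools works through the Linux msr driver: /dev/cpu/N/msr is read or written 8 bytes at a time at a file offset equal to the MSR index (see msr(4)); the driver has to be loaded and the caller needs root. The sketch below does the same read-modify-write of bit 10, assuming those interfaces; the helper names are hypothetical. Writing this MSR is exactly what crashed the guests described above, so only try it in a disposable guest.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MSR_IA32_APICBASE 0x1b
#define APICBASE_EXTD     (1ull << 10)

/* Read one 64-bit MSR on a CPU; returns 0 on success, -1 on error. */
static int msr_read(int cpu, uint32_t msr, uint64_t *val)
{
	char path[64];
	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	/* The msr driver treats the file offset as the MSR index. */
	ssize_t n = pread(fd, val, sizeof(*val), (off_t)msr);
	close(fd);
	return n == (ssize_t)sizeof(*val) ? 0 : -1;
}

/* Write one 64-bit MSR on a CPU; returns 0 on success, -1 on error. */
static int msr_write(int cpu, uint32_t msr, uint64_t val)
{
	char path[64];
	snprintf(path, sizeof(path), "/dev/cpu/%d/msr", cpu);
	int fd = open(path, O_WRONLY);
	if (fd < 0)
		return -1;
	ssize_t n = pwrite(fd, &val, sizeof(val), (off_t)msr);
	close(fd);
	return n == (ssize_t)sizeof(val) ? 0 : -1;
}

/* The value "wrmsr 0x1b ..." writes when only EXTD is added. */
static uint64_t apicbase_with_extd(uint64_t old)
{
	return old | APICBASE_EXTD;
}

/*
 * Usage (root, throwaway guest only):
 *   uint64_t v;
 *   if (msr_read(0, MSR_IA32_APICBASE, &v) == 0 && v == 0xfee00900)
 *       msr_write(0, MSR_IA32_APICBASE, apicbase_with_extd(v));
 */
```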

>>
>> So I am very confused:
>>
>> First, what's the difference between these two patches? Or say, does
>> kvm support x2apic since kernel 2.6.32, or since kernel 3.9?
>>
> kvm support x2apic since kernel 2.6.32, but not support virtual x2apic 
> mode until kernel 3.9.
>
>> Second, can guest use x2apic even if the host does not? (Assuming qemu
>>has exposed this feature to guest.) The word "use" means something
>> like, accessing x2apic registers, setting the x2apic enable bit in
>> IA32_APICBASE MSR (i.e. bit 10).
>>
> Yes, you can use x2apic even if the PCPU does not support, and benefit 
> from it, like performance bonus from MSR accesses instead of MMIO.
> But, if I remember correctly, if your guest does not support 
> interrupt-remmping, you only can use physical destination mode when using 
> x2apic,
> please see enable_IR_x2apic().
>
>> -Jidong
>>


Re: How does kvm support x2apic?

2014-06-10 Thread Jidong Xiao
On Wed, Jun 11, 2014 at 12:53 AM, Zhang Haoyu  wrote:
> Hi,
>
> According to this:
>
> https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd
>
> It looks like kvm have been supporting x2apic since kernel 2.6.32, or
> even earlier.
>
 This patch is to emulate x2apic for guest, then guest can benefit from 
 some advantages of x2apic, like using RDMSR/WRMSR instead of mmio.

> However, this following patch:
>
> https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5
>
> Also claims that adding support for x2apic. But this later patch was
> for kernel 3.9.
 This patch is to support virtual x2apic mode, which can virtualizing 
 MSR-based APIC accesses by configuring MSR bitmaps,
 you can specify some MSR-based accesses without VM exit, the other 
 MSR-based accesses with VM exit,
 which belongs to APIC virtualization from certain angle.
>>> Thanks Haoyu, I am reading the Intel x2apic manual.
>>>
>>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>>>
>>> but I don't see the so called "virtual x2apic mode", it seems that the
>>> manual does not mention anything about that. Do you mean that when the
>>> Guest OS sets bit 10 (x2apic mode enable bit) of the it virtualized
>>> IA32_APIC_BASE MSR, it is entering a virtual x2apic mode? But isn't
>>> this the same as the aforementioned "emulate x2apic for guest"? Or,
>>> "emulate x2apic for guest" is the foundation of "support virtual
>>> x2apic mode"?
>>>
>>> -Jidong
>>>
>>Oh, I found the answer, the virtualized x2apic mode is not defined in
>>the x2apic manual, but is defined in the intel SDM manual, chapter 29
>>- APIC virtualization and virtual interrupts. Ignore my questions
>>please, but thank you anyway.
>>
> Yes, virtual x2apic mode is part of APICv.
> Guest has no idea about it is running in a virtual machine,
> VMM also prevent guest from enabling/disabling virtual x2apic mode or some 
> other virtualization configurations.
>

Regarding this "VMM also prevent guest from enabling/disabling virtual
x2apic mode", do you mean that the guest is not allowed to do something
like the following?

Assume we are in the guest and see that the original value of the
IA32_APICBASE MSR is 0xfee00900, which means the EN bit and the BSP bit
are set. Now we try to set bit 10 (the EXTD bit):

wrmsr 0x1b 0xfee00d00

Do you mean this wrmsr operation is not allowed? In fact, I tried
this, and it simply crashes the guest kernel. (I am using Linux
2.6.34 as the guest kernel and Linux 3.14 as the host kernel.) This
is exactly the reason I started this whole discussion. My feeling is
that this might be a KVM bug, so I would like to figure it out and try
to fix it.

>
> So I am very confused:
>
> First, what's the difference between these two patches? Or say, does
> kvm support x2apic since kernel 2.6.32, or since kernel 3.9?
>
 kvm support x2apic since kernel 2.6.32, but not support virtual x2apic 
 mode until kernel 3.9.

> Second, can guest use x2apic even if the host does not? (Assuming qemu
>has exposed this feature to guest.) The word "use" means something
> like, accessing x2apic registers, setting the x2apic enable bit in
> IA32_APICBASE MSR (i.e. bit 10).
>
 Yes, you can use x2apic even if the PCPU does not support, and benefit 
 from it, like performance bonus from MSR accesses instead of MMIO.
 But, if I remember correctly, if your guest does not support 
 interrupt-remmping, you only can use physical destination mode when using 
 x2apic,
 please see enable_IR_x2apic().

> -Jidong
>


Re: How does kvm support x2apic?

2014-06-10 Thread Zhang Haoyu
 Hi,

 According to this:

 https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd

 It looks like kvm have been supporting x2apic since kernel 2.6.32, or
 even earlier.

>>> This patch is to emulate x2apic for guest, then guest can benefit from some 
>>> advantages of x2apic, like using RDMSR/WRMSR instead of mmio.
>>>
 However, this following patch:

 https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5

 Also claims that adding support for x2apic. But this later patch was
 for kernel 3.9.
>>> This patch is to support virtual x2apic mode, which can virtualizing 
>>> MSR-based APIC accesses by configuring MSR bitmaps,
>>> you can specify some MSR-based accesses without VM exit, the other 
>>> MSR-based accesses with VM exit,
>>> which belongs to APIC virtualization from certain angle.
>> Thanks Haoyu, I am reading the Intel x2apic manual.
>>
>> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>>
>> but I don't see the so called "virtual x2apic mode", it seems that the
>> manual does not mention anything about that. Do you mean that when the
>> Guest OS sets bit 10 (x2apic mode enable bit) of the it virtualized
>> IA32_APIC_BASE MSR, it is entering a virtual x2apic mode? But isn't
>> this the same as the aforementioned "emulate x2apic for guest"? Or,
>> "emulate x2apic for guest" is the foundation of "support virtual
>> x2apic mode"?
>>
>> -Jidong
>>
>Oh, I found the answer, the virtualized x2apic mode is not defined in
>the x2apic manual, but is defined in the intel SDM manual, chapter 29
>- APIC virtualization and virtual interrupts. Ignore my questions
>please, but thank you anyway.
>
Yes, virtual x2apic mode is part of APICv.
The guest has no idea that it is running in a virtual machine, and the
VMM also prevents the guest from enabling/disabling virtual x2apic mode
or certain other virtualization configurations.

>-Jidong
>

 So I am very confused:

 First, what's the difference between these two patches? Or say, does
 kvm support x2apic since kernel 2.6.32, or since kernel 3.9?

>>> kvm support x2apic since kernel 2.6.32, but not support virtual x2apic mode 
>>> until kernel 3.9.
>>>
 Second, can guest use x2apic even if the host does not? (Assuming qemu
has exposed this feature to guest.) The word "use" means something
 like, accessing x2apic registers, setting the x2apic enable bit in
 IA32_APICBASE MSR (i.e. bit 10).

>>> Yes, you can use x2apic even if the PCPU does not support, and benefit from 
>>> it, like performance bonus from MSR accesses instead of MMIO.
>>> But, if I remember correctly, if your guest does not support 
>>> interrupt-remmping, you only can use physical destination mode when using 
>>> x2apic,
>>> please see enable_IR_x2apic().
>>>
 -Jidong



Re: How does kvm support x2apic?

2014-06-10 Thread Jidong Xiao
On Tue, Jun 10, 2014 at 11:54 PM, Jidong Xiao  wrote:
> On Tue, Jun 10, 2014 at 9:33 PM, Zhang Haoyu  wrote:
>>> Hi,
>>>
>>> According to this:
>>>
>>> https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd
>>>
>>> It looks like kvm have been supporting x2apic since kernel 2.6.32, or
>>> even earlier.
>>>
>> This patch is to emulate x2apic for guest, then guest can benefit from some 
>> advantages of x2apic, like using RDMSR/WRMSR instead of mmio.
>>
>>> However, this following patch:
>>>
>>> https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5
>>>
>>> Also claims that adding support for x2apic. But this later patch was
>>> for kernel 3.9.
>> This patch is to support virtual x2apic mode, which can virtualizing 
>> MSR-based APIC accesses by configuring MSR bitmaps,
>> you can specify some MSR-based accesses without VM exit, the other MSR-based 
>> accesses with VM exit,
>> which belongs to APIC virtualization from certain angle.
> Thanks Haoyu, I am reading the Intel x2apic manual.
>
> http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf
>
> but I don't see the so called "virtual x2apic mode", it seems that the
> manual does not mention anything about that. Do you mean that when the
> Guest OS sets bit 10 (x2apic mode enable bit) of the it virtualized
> IA32_APIC_BASE MSR, it is entering a virtual x2apic mode? But isn't
> this the same as the aforementioned "emulate x2apic for guest"? Or,
> "emulate x2apic for guest" is the foundation of "support virtual
> x2apic mode"?
>
> -Jidong
>
Oh, I found the answer: the virtualized x2apic mode is not defined in
the x2apic manual but in the Intel SDM, chapter 29, "APIC Virtualization
and Virtual Interrupts". Please ignore my questions, but thank you
anyway.

-Jidong

>>>
>>> So I am very confused:
>>>
>>> First, what's the difference between these two patches? Or say, does
>>> kvm support x2apic since kernel 2.6.32, or since kernel 3.9?
>>>
>> kvm support x2apic since kernel 2.6.32, but not support virtual x2apic mode 
>> until kernel 3.9.
>>
>>> Second, can guest use x2apic even if the host does not? (Assuming qemu
>>>has exposed this feature to guest.) The word "use" means something
>>> like, accessing x2apic registers, setting the x2apic enable bit in
>>> IA32_APICBASE MSR (i.e. bit 10).
>>>
>> Yes, you can use x2apic even if the PCPU does not support, and benefit from 
>> it, like performance bonus from MSR accesses instead of MMIO.
>> But, if I remember correctly, if your guest does not support 
>> interrupt-remmping, you only can use physical destination mode when using 
>> x2apic,
>> please see enable_IR_x2apic().
>>
>>> -Jidong
>>


Re: How does kvm support x2apic?

2014-06-10 Thread Jidong Xiao
On Tue, Jun 10, 2014 at 9:33 PM, Zhang Haoyu  wrote:
>> Hi,
>>
>> According to this:
>>
>> https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd
>>
>> It looks like kvm have been supporting x2apic since kernel 2.6.32, or
>> even earlier.
>>
> This patch is to emulate x2apic for guest, then guest can benefit from some 
> advantages of x2apic, like using RDMSR/WRMSR instead of mmio.
>
>> However, this following patch:
>>
>> https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5
>>
>> Also claims that adding support for x2apic. But this later patch was
>> for kernel 3.9.
> This patch is to support virtual x2apic mode, which can virtualizing 
> MSR-based APIC accesses by configuring MSR bitmaps,
> you can specify some MSR-based accesses without VM exit, the other MSR-based 
> accesses with VM exit,
> which belongs to APIC virtualization from certain angle.
Thanks Haoyu, I am reading the Intel x2apic manual.

http://www.intel.com/content/dam/doc/specification-update/64-architecture-x2apic-specification.pdf

but I don't see the so-called "virtual x2apic mode"; the manual does
not seem to mention it at all. Do you mean that when the guest OS sets
bit 10 (the x2apic mode enable bit) of its virtualized IA32_APIC_BASE
MSR, it enters a virtual x2apic mode? But isn't that the same as the
aforementioned "emulate x2apic for guest"? Or is "emulate x2apic for
guest" the foundation of "support virtual x2apic mode"?

-Jidong

>>
>> So I am very confused:
>>
>> First, what's the difference between these two patches? Or say, does
>> kvm support x2apic since kernel 2.6.32, or since kernel 3.9?
>>
> KVM has supported x2apic since kernel 2.6.32, but has only supported virtual
> x2apic mode since kernel 3.9.
>
>> Second, can the guest use x2apic even if the host does not? (Assuming QEMU
>> has exposed this feature to the guest.) By "use" I mean things like
>> accessing x2apic registers or setting the x2apic enable bit in the
>> IA32_APICBASE MSR (i.e. bit 10).
>>
> Yes, you can use x2apic even if the physical CPU does not support it, and
> benefit from it, e.g. the performance gain from MSR accesses instead of MMIO.
> But, if I remember correctly, if your guest does not support interrupt
> remapping, you can only use physical destination mode with x2apic;
> please see enable_IR_x2apic().
>
>> -Jidong
>


Re: How does kvm support x2apic?

2014-06-10 Thread Zhang Haoyu
> Hi,
>
> According to this:
> 
> https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd
> 
> It looks like KVM has supported x2apic since kernel 2.6.32, or
> even earlier.
>
This patch emulates x2apic for the guest, so the guest can benefit from some
advantages of x2apic, like using RDMSR/WRMSR instead of MMIO.
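
To make "RDMSR/WRMSR instead of MMIO" concrete: in x2APIC mode, a register that an xAPIC guest reaches through the MMIO page at the APIC base is instead an MSR whose number is derived from its old offset, following the rule in Intel's x2APIC specification. The helper name below is illustrative, not a KVM function.

```c
#include <stdint.h>

/* The x2APIC registers occupy MSRs 0x800-0x8FF. */
#define X2APIC_MSR_BASE 0x800u

/*
 * An xAPIC register at MMIO offset 'off' from the APIC base (offsets are
 * 16-byte aligned) is accessed in x2APIC mode with RDMSR/WRMSR on
 * MSR 0x800 + (off >> 4). The 64-bit ICR becomes a single MSR (0x830);
 * the old ICR-high offset 0x310 has no MSR of its own.
 */
static inline uint32_t x2apic_msr_for_offset(uint32_t off)
{
	return X2APIC_MSR_BASE + (off >> 4);
}
```

For example, the EOI register at MMIO offset 0xB0 becomes MSR 0x80B, so a guest EOI is a single WRMSR rather than a store to the APIC page.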

> However, this following patch:
> 
> https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5
> 
> also claims to add support for x2apic. But this later patch was
> for kernel 3.9.
This patch supports virtual x2apic mode, which virtualizes MSR-based APIC
accesses by configuring MSR bitmaps: you can let some MSR-based accesses
proceed without a VM exit while others still cause one. From a certain
angle, this is part of APIC virtualization.
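
The MSR bitmap mentioned above is a 4 KiB page split into four 1 KiB quarters (reads of MSRs 0x0-0x1FFF, reads of 0xC0000000-0xC0001FFF, then writes of the same two ranges), one bit per MSR; a set bit makes that access cause a VM exit. The helpers below are a hypothetical sketch of that bookkeeping, not KVM's own functions. What the processor does with an access that does not exit depends on other VM-execution controls, such as virtual x2APIC mode.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

struct msr_bitmap { uint8_t bytes[4096]; };

/* Byte offset of the quarter covering this MSR and access type, or -1
 * if the MSR is outside both ranges (such accesses always exit). */
static int quarter_base(uint32_t msr, bool write)
{
	if (msr <= 0x1fff)
		return write ? 0x800 : 0x000;
	if (msr >= 0xc0000000u && msr <= 0xc0001fffu)
		return write ? 0xc00 : 0x400;
	return -1;
}

/* Start from "every access exits". */
static void intercept_all(struct msr_bitmap *bm)
{
	memset(bm->bytes, 0xff, sizeof(bm->bytes));
}

/* Clear the bit so this access no longer causes a VM exit. */
static void allow_msr(struct msr_bitmap *bm, uint32_t msr, bool write)
{
	int base = quarter_base(msr, write);
	uint32_t idx = msr & 0x1fff;

	if (base >= 0)
		bm->bytes[base + idx / 8] &= (uint8_t)~(1u << (idx % 8));
}

static bool msr_exits(const struct msr_bitmap *bm, uint32_t msr, bool write)
{
	int base = quarter_base(msr, write);
	uint32_t idx = msr & 0x1fff;

	if (base < 0)
		return true;
	return (bm->bytes[base + idx / 8] >> (idx % 8)) & 1;
}
```

Read and write intercepts are independent, which is how some x2APIC accesses can proceed without an exit while others still trap.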
>
> So I am very confused:
> 
> First, what's the difference between these two patches? In other words, has
> KVM supported x2apic since kernel 2.6.32, or since kernel 3.9?
> 
KVM has supported x2apic since kernel 2.6.32, but has only supported virtual
x2apic mode since kernel 3.9.

> Second, can the guest use x2apic even if the host does not? (Assuming QEMU
> has exposed this feature to the guest.) By "use" I mean things like
> accessing x2apic registers or setting the x2apic enable bit in the
> IA32_APICBASE MSR (i.e. bit 10).
> 
Yes, you can use x2apic even if the physical CPU does not support it, and
benefit from it, e.g. the performance gain from MSR accesses instead of MMIO.
But, if I remember correctly, if your guest does not support interrupt
remapping, you can only use physical destination mode with x2apic;
please see enable_IR_x2apic().
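
On the enable bit itself: a guest switches to x2APIC by writing IA32_APIC_BASE (MSR 0x1B) with both EN (bit 11) and EXTD (bit 10) set. With an emulated APIC, that write lands in KVM's virtual copy of the MSR, not the physical one. A small decoder of the architectural mode bits, as a sketch with illustrative names:

```c
#include <stdbool.h>
#include <stdint.h>

#define APICBASE_BSP           (1ull << 8)   /* bootstrap processor */
#define APICBASE_X2APIC_ENABLE (1ull << 10)  /* EXTD */
#define APICBASE_ENABLE        (1ull << 11)  /* EN: global APIC enable */

enum apic_mode { APIC_DISABLED, APIC_XAPIC, APIC_X2APIC, APIC_INVALID };

/* EXTD=1 with EN=0 is an invalid combination. */
static enum apic_mode apic_mode_from_base(uint64_t apicbase)
{
	bool en = apicbase & APICBASE_ENABLE;
	bool extd = apicbase & APICBASE_X2APIC_ENABLE;

	if (!en)
		return extd ? APIC_INVALID : APIC_DISABLED;
	return extd ? APIC_X2APIC : APIC_XAPIC;
}
```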

> -Jidong



Re: Using virtio for inter-VM communication

2014-06-10 Thread Vincent JARDIN

On 10/06/2014 18:48, Henning Schild wrote:
> Hi,
> In a first prototype I implemented an ivshmem[2] device for the
> hypervisor. That way we can share memory between virtual machines.
> Ivshmem is nice and simple but does not seem to be used anymore. And it
> does not define higher level devices, like a console.

FYI, ivshmem is used here:
  http://dpdk.org/browse/memnic/tree/

http://dpdk.org/browse/memnic/tree/pmd/pmd_memnic.c#n449

There are a few other references too, if needed.

Best regards,
  Vincent



Re: [PATCH 4/4] kvm: Implement PEBS virtualization

2014-06-10 Thread Marcelo Tosatti
On Tue, Jun 10, 2014 at 12:22:07PM -0700, Andi Kleen wrote:
> On Tue, Jun 10, 2014 at 03:04:48PM -0300, Marcelo Tosatti wrote:
> > On Thu, May 29, 2014 at 06:12:07PM -0700, Andi Kleen wrote:
> > >  {
> > >   struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > > @@ -407,6 +551,20 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct 
> > > msr_data *msr_info)
> > >   return 0;
> > >   }
> > >   break;
> > > + case MSR_IA32_DS_AREA:
> > > + pmu->ds_area = data;
> > > + return 0;
> > > + case MSR_IA32_PEBS_ENABLE:
> > > + if (data & ~0xf000fULL)
> > > + break;
> > 
> > Bit 63 == PS_ENABLE ?
> 
> PEBS_EN is [3:0] for each counter, but only one bit on Silvermont.
> LL_EN is [36:32], but currently unused.
> 
> > 
> > >  void kvm_handle_pmu_event(struct kvm_vcpu *vcpu)
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index 33e8c02..4f39917 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -7288,6 +7288,12 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
> > > *vcpu)
> > >   atomic_switch_perf_msrs(vmx);
> > >   debugctlmsr = get_debugctlmsr();
> > >  
> > > + /* Move this somewhere else? */
> > 
> > Unless you hook into vcpu->arch.pmu.ds_area and perf_get_ds_area()
> > writers, it has to be at every vcpu entry.
> > 
> > Could compare values in MSR save area to avoid switch.
> 
> Ok.
> 
> > 
> > > + if (vcpu->arch.pmu.ds_area)
> > > + add_atomic_switch_msr(vmx, MSR_IA32_DS_AREA,
> > > +   vcpu->arch.pmu.ds_area,
> > > +   perf_get_ds_area());
> > 
> > Should clear_atomic_switch_msr before 
> > add_atomic_switch_msr.
> 
> Ok.
> 
> BTW how about general PMU migration? As far as I can tell there 
> is no code to save/restore the state for that currently, right?
> 
> -Andi

Paolo wrote support for it recently. Paolo?



Re: [PATCH 4/4] kvm: Implement PEBS virtualization

2014-06-10 Thread Andi Kleen
On Tue, Jun 10, 2014 at 03:04:48PM -0300, Marcelo Tosatti wrote:
> On Thu, May 29, 2014 at 06:12:07PM -0700, Andi Kleen wrote:
> >  {
> > struct kvm_pmu *pmu = &vcpu->arch.pmu;
> > @@ -407,6 +551,20 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct 
> > msr_data *msr_info)
> > return 0;
> > }
> > break;
> > +   case MSR_IA32_DS_AREA:
> > +   pmu->ds_area = data;
> > +   return 0;
> > +   case MSR_IA32_PEBS_ENABLE:
> > +   if (data & ~0xf000fULL)
> > +   break;
> 
> Bit 63 == PS_ENABLE ?

PEBS_EN is [3:0] for each counter, but only one bit on Silvermont.
LL_EN is [36:32], but currently unused.
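
A sketch of validating IA32_PEBS_ENABLE against the bit layout described here: PEBS_EN in bits 3:0, one bit per counter, and LL_EN in bits 36:32, currently unused. The macro and function names are hypothetical. The supported mask is passed in so that a CPU exposing fewer PEBS counters (e.g. a single bit on Silvermont) can be modelled.

```c
#include <stdbool.h>
#include <stdint.h>

#define PEBS_EN_MASK    0xfULL           /* bits 3:0: PEBS enable per GP counter */
#define PEBS_LL_EN_MASK (0x1fULL << 32)  /* bits 36:32: load latency, unused here */

/* Reject any write that sets a bit the virtual PMU does not implement. */
static bool pebs_enable_valid(uint64_t data, uint64_t supported)
{
	return (data & ~supported) == 0;
}

/* Counters with PEBS requested (bit n set = counter n). */
static unsigned pebs_counters(uint64_t data)
{
	return (unsigned)(data & PEBS_EN_MASK);
}
```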

> 
> >  void kvm_handle_pmu_event(struct kvm_vcpu *vcpu)
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 33e8c02..4f39917 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -7288,6 +7288,12 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
> > *vcpu)
> > atomic_switch_perf_msrs(vmx);
> > debugctlmsr = get_debugctlmsr();
> >  
> > +   /* Move this somewhere else? */
> 
> Unless you hook into vcpu->arch.pmu.ds_area and perf_get_ds_area()
> writers, it has to be at every vcpu entry.
> 
> Could compare values in MSR save area to avoid switch.

Ok.

> 
> > +   if (vcpu->arch.pmu.ds_area)
> > +   add_atomic_switch_msr(vmx, MSR_IA32_DS_AREA,
> > + vcpu->arch.pmu.ds_area,
> > + perf_get_ds_area());
> 
> Should clear_atomic_switch_msr before 
> add_atomic_switch_msr.

Ok.

BTW how about general PMU migration? As far as I can tell there 
is no code to save/restore the state for that currently, right?

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Arnd Bergmann
On Tuesday 10 June 2014 18:44:59 Claudio Fontana wrote:
> I just wanted to share with you guys how we are using virtualization
> on ARM64 over here for the OSv project.
> By skipping steps like UEFI, grub, firmware load, etc we strive to
> keep our application launch time low.
> Is this going to create problems for us in the future if you start
> requiring every VM to boot using those instead?

My feeling is that it's out of scope for this specification.
What you are doing is great, and you should keep doing it that way.

It does mean that your applications are aware of which hypervisor
they are running on, and you have to manage them using OSv specific
tools on the host.

The VM System Specification however is meant to provide a way to
distribute a file system image for a virtual machine that can run
on any compliant hypervisor using any compliant management tools.
The reason it goes through all the boot loader steps is to make it
much closer to a real system, so distros don't have to special-case
this.

If you don't care about that, you can just install the tools on
the host to manage your OSv application without a boot loader.
In particular, you can support multiple targets with OSv, e.g.
native Xen, native KVM, and a portable standard hypervisor following
the ARM VM System Spec.

Arnd


Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Paolo Bonzini

On 10/06/2014 20:56, Paolo Bonzini wrote:
> On 10/06/2014 20:08, Peter Maydell wrote:
>> On 10 June 2014 18:04, Christopher Covington  wrote:
>>> On 06/10/2014 10:42 AM, Peter Maydell wrote:
>>>> I just noticed that this doesn't mandate that the platform
>>>> provides an RTC. As I understand it, the UEFI spec mandates
>>>> that there's an RTC (could somebody more familiar with UEFI
>>>> than me confirm/deny that?) so we should probably put one here.
>>>
>>> Pardon my ignorance, but what exactly disqualifies Generic Timer
>>> implementations from being used as Real Time Clocks?
>>
>> So my naive view was that an RTC actually had to have
>> support for dealing with real (wall) clock time, ie
>> knowing it's 2014 and not 1970. The generic timers are
>> just timers. Or am I wrong and UEFI doesn't really
>> require that?
>
> The real-time clock provides four UEFI runtime services (GetTime,
> SetTime, GetWakeupTime, SetWakeupTime).  The spec says that you can
> return EFI_DEVICE_ERROR from GetTime/SetTime if "the time could not be
> retrieved/set due to a hardware error", but I don't think this is enough
> to make these two optional.  By comparison, GetWakeupTime/SetWakeupTime
> can also return EFI_UNSUPPORTED.
>
> So I agree that the RTC is required in UEFI.

... that said, just like I thought was the case for the serial console,
do we need to specify the exact hardware models?

We can just say that the VM can expect UEFI boot and runtime services to
work.  This includes variable services, time services, the serial
console protocols and more.  It's up to the implementation to provide
enough devices to support the firmware, and it's out of this spec's
scope to specify the firmware's implementation.


Paolo


Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Paolo Bonzini

On 10/06/2014 20:56, Paolo Bonzini wrote:
> On 10/06/2014 20:08, Peter Maydell wrote:
>> On 10 June 2014 18:04, Christopher Covington  wrote:
>>> On 06/10/2014 10:42 AM, Peter Maydell wrote:
>>>> I just noticed that this doesn't mandate that the platform
>>>> provides an RTC. As I understand it, the UEFI spec mandates
>>>> that there's an RTC (could somebody more familiar with UEFI
>>>> than me confirm/deny that?) so we should probably put one here.
>>>
>>> Pardon my ignorance, but what exactly disqualifies Generic Timer
>>> implementations from being used as Real Time Clocks?
>>
>> So my naive view was that an RTC actually had to have
>> support for dealing with real (wall) clock time, ie
>> knowing it's 2014 and not 1970. The generic timers are
>> just timers. Or am I wrong and UEFI doesn't really
>> require that?
>
> The real-time clock provides four UEFI runtime services (GetTime,
> SetTime, GetWakeupTime, SetWakeupTime).  The spec says that you can
> return EFI_DEVICE_ERROR from GetTime/SetTime if "the time could not be
> retrieved/set due to a hardware error", but I don't think this is enough
> to make these two optional.  By comparison, GetWakeupTime/SetWakeupTime
> can also return EFI_UNSUPPORTED.
>
> So I agree that the RTC is required in UEFI.

... that said, just like I thought was the case for the serial console,
do we need to specify the exact hardware models?

We can just say that the VM can expect UEFI boot and runtime services to
work.  This includes variable services, time services, the serial
console protocols and more.  It's up to the implementation to provide
enough devices to support the firmware, and it's out of this spec's
scope to specify the firmware's implementation.

I think even the serial devices should be removed.

Paolo


Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Paolo Bonzini

On 10/06/2014 20:08, Peter Maydell wrote:
> On 10 June 2014 18:04, Christopher Covington  wrote:
>> On 06/10/2014 10:42 AM, Peter Maydell wrote:
>>> I just noticed that this doesn't mandate that the platform
>>> provides an RTC. As I understand it, the UEFI spec mandates
>>> that there's an RTC (could somebody more familiar with UEFI
>>> than me confirm/deny that?) so we should probably put one here.
>>
>> Pardon my ignorance, but what exactly disqualifies Generic Timer
>> implementations from being used as Real Time Clocks?
>
> So my naive view was that an RTC actually had to have
> support for dealing with real (wall) clock time, ie
> knowing it's 2014 and not 1970. The generic timers are
> just timers. Or am I wrong and UEFI doesn't really
> require that?

The real-time clock provides four UEFI runtime services (GetTime, 
SetTime, GetWakeupTime, SetWakeupTime).  The spec says that you can 
return EFI_DEVICE_ERROR from GetTime/SetTime if "the time could not be 
retrieved/set due to a hardware error", but I don't think this is enough 
to make these two optional.  By comparison, GetWakeupTime/SetWakeupTime 
can also return EFI_UNSUPPORTED.


So I agree that the RTC is required in UEFI.
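
The argument above can be summed up as a table of which results each RTC-backed runtime service may report. The enum below uses simplified stand-ins named after the UEFI status codes (not their real EFI_STATUS encodings) and covers only the statuses discussed here; it is an illustration of the reading, not a conformance check.

```c
#include <stdbool.h>

/* Simplified stand-ins; real EFI_STATUS values are encoded differently. */
enum efi_status { EFI_SUCCESS, EFI_DEVICE_ERROR, EFI_UNSUPPORTED };

enum rtc_service { GET_TIME, SET_TIME, GET_WAKEUP_TIME, SET_WAKEUP_TIME };

/*
 * GetTime/SetTime may fail with EFI_DEVICE_ERROR on a hardware error but
 * cannot declare themselves unsupported, hence an RTC is required; only
 * the wakeup services may return EFI_UNSUPPORTED.
 */
static bool status_permitted(enum rtc_service svc, enum efi_status st)
{
	switch (st) {
	case EFI_SUCCESS:
	case EFI_DEVICE_ERROR:
		return true;
	case EFI_UNSUPPORTED:
		return svc == GET_WAKEUP_TIME || svc == SET_WAKEUP_TIME;
	}
	return false;
}
```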

Paolo


Re: [PATCH v7 4/4] arm: dirty page logging 2nd stage page fault handling support

2014-06-10 Thread Mario Smarduch
On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> On Tue, Jun 03, 2014 at 04:19:27PM -0700, Mario Smarduch wrote:
>> This patch adds support for handling 2nd stage page faults during migration,
>> it disables faulting in huge pages, and disolves huge pages to page tables.
> 
> s/disolves/dissolves/g
Will do.
> 
>> In case migration is canceled huge pages will be used again.
>>
>> Signed-off-by: Mario Smarduch 
>> ---
>>  arch/arm/kvm/mmu.c |   36 ++--
>>  1 file changed, 34 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
>> index 1c546c9..aca4fbf 100644
>> --- a/arch/arm/kvm/mmu.c
>> +++ b/arch/arm/kvm/mmu.c
>> @@ -966,6 +966,8 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  struct kvm_mmu_memory_cache *memcache = &vcpu->arch.mmu_page_cache;
>>  struct vm_area_struct *vma;
>>  pfn_t pfn;
>> +/* Get logging status, if dirty_bitmap is not NULL then logging is on */
>> +bool logging_active = !!memslot->dirty_bitmap;
> 
>>  
>>  write_fault = kvm_is_write_fault(kvm_vcpu_get_hsr(vcpu));
>>  if (fault_status == FSC_PERM && !write_fault) {
>> @@ -1019,10 +1021,16 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  spin_lock(&kvm->mmu_lock);
>>  if (mmu_notifier_retry(kvm, mmu_seq))
>>  goto out_unlock;
>> -if (!hugetlb && !force_pte)
>> +
>> +/* When logging don't spend cycles to check for huge pages */
> 
> drop the comment: either explain the entire clause (which would be too
> long) or don't explain anything.
> 
Ok.
>> +if (!hugetlb && !force_pte && !logging_active)
> 
> instead of having all this, can't you just change 
> 
> if (is_vm_hugetlb_page(vma)) to
> if (is_vm_hugetlb_page(vma) && !logging_active)
> 
> then you're also not mucking around with the gfn etc.

I didn't want to modify this function too much, but if that's OK, it
simplifies things a lot.

> 
>>  hugetlb = transparent_hugepage_adjust(&pfn, &fault_ipa);
>>  
>> -if (hugetlb) {
>> +/*
>> + * Force all not present/perm faults to PTE handling, address both
>> + * PMD and PTE faults
>> + */
> 
> I don't understand this comment?  In which case does this apply?
> 
The cases I see here -
- huge page permission fault is forced into page table code while logging
- pte permission/not present handled by page table code as before.
>> +if (hugetlb && !logging_active) {
>>  pmd_t new_pmd = pfn_pmd(pfn, PAGE_S2);
>>  new_pmd = pmd_mkhuge(new_pmd);
>>  if (writable) {
>> @@ -1034,6 +1042,22 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  } else {
>>  pte_t new_pte = pfn_pte(pfn, PAGE_S2);
>>  if (writable) {
>> +/*
>> + * If pmd is  mapping a huge page then clear it and let
>> + * stage2_set_pte() create a pte table. At the sametime
>> + * you write protect the pte (PAGE_S2 pgprot_t).
>> + */
>> +if (logging_active) {
>> +pmd_t *pmd;
>> +if (hugetlb) {
>> +pfn += pte_index(fault_ipa);
>> +gfn = fault_ipa >> PAGE_SHIFT;
>> +new_pte = pfn_pte(pfn, PAGE_S2);
>> +}
>> +pmd = stage2_get_pmd(kvm, NULL, fault_ipa);
>> +if (pmd && kvm_pmd_huge(*pmd))
>> +clear_pmd_entry(kvm, pmd, fault_ipa);
>> +}
> 
> now instead of all this, you just need to check for kvm_pmd_huge() in
> stage2_set_pte() and if that's true, you clear it, and then then install
> your new pte.

Yes this really simplifies things!
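
A toy model of the suggested approach, assuming a simplified two-level layout: stage2_set_pte() itself notices that the PMD holds a huge (block) mapping, clears it, and installs the 4 KiB PTE in a fresh table. The other 4 KiB pages of the former block are then simply unmapped and fault in individually, which is what lets dirty logging track them. The struct and function names are hypothetical. The real code would also have to invalidate the TLB for the old block and drop its page reference.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define PAGE_SHIFT   12
#define PTRS_PER_PTE 512  /* 4 KiB entries covering one 2 MiB block */

struct toy_pmd {
	bool huge;           /* block mapping, what kvm_pmd_huge() tests */
	uint64_t block_pfn;  /* valid while huge */
	uint64_t *ptes;      /* PTE table, valid while !huge (NULL if none) */
};

static int toy_stage2_set_pte(struct toy_pmd *pmd, uint64_t ipa, uint64_t pte)
{
	if (pmd->huge) {
		/* Dissolve the block; the real code must also flush the TLB. */
		pmd->huge = false;
		pmd->block_pfn = 0;
	}
	if (!pmd->ptes) {
		pmd->ptes = calloc(PTRS_PER_PTE, sizeof(*pmd->ptes));
		if (!pmd->ptes)
			return -1;
	}
	pmd->ptes[(ipa >> PAGE_SHIFT) & (PTRS_PER_PTE - 1)] = pte;
	return 0;
}
```

Keeping the check inside the set-PTE path means user_mem_abort() no longer needs its own gfn/pfn adjustment for the huge case.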

> 
>>  kvm_set_s2pte_writable(&new_pte);
>>  kvm_set_pfn_dirty(pfn);
>>  }
>> @@ -1041,6 +1065,14 @@ static int user_mem_abort(struct kvm_vcpu *vcpu, 
>> phys_addr_t fault_ipa,
>>  ret = stage2_set_pte(kvm, memcache, fault_ipa, &new_pte, false);
>>  }
>>  
>> +/*
>> + * Log the dirty page in dirty_bitmap[], call regardless if logging is
>> + * disabled or enabled both cases handled safely.
>> + * TODO: for larger page size mark mulitple dirty page bits for each
>> + *   4k page.
>> + */
>> +if (writable)
>> +mark_page_dirty(kvm, gfn);
> 
> what if you just faulted in a page on a read which wasn't present
> before but it happens to belong to a writeable memslot, is that page
> then dirty? hmmm.
> 
A bug: I must also check that it was a write fault, not just that we're dealing with
a writable region. This one could be pretty bad for performance, not to mention
inaccurate. It will be interesting to see new test results, glad y
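
A minimal sketch of the corrected condition, assuming a memslot-style bitmap indexed by gfn relative to the slot base, NULL when logging is off. A read fault on a writable memslot must not mark the page dirty, so both the writable flag and the write-fault flag are checked. The helpers are hypothetical stand-ins for mark_page_dirty() and the tail of user_mem_abort().

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Stand-in for mark_page_dirty(): set the bit for a slot-relative gfn. */
static void toy_mark_page_dirty(unsigned long *dirty_bitmap, uint64_t rel_gfn)
{
	dirty_bitmap[rel_gfn / BITS_PER_LONG] |= 1UL << (rel_gfn % BITS_PER_LONG);
}

/*
 * Tail of a toy fault handler. A writable slot alone is not enough:
 * only an actual write fault dirties the page.
 */
static void toy_fault_done(unsigned long *dirty_bitmap, uint64_t rel_gfn,
			   bool writable, bool write_fault)
{
	if (dirty_bitmap && writable && write_fault)
		toy_mark_page_dirty(dirty_bitmap, rel_gfn);
}
```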

Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Peter Maydell
On 10 June 2014 18:04, Christopher Covington  wrote:
> On 06/10/2014 10:42 AM, Peter Maydell wrote:
>> I just noticed that this doesn't mandate that the platform
>> provides an RTC. As I understand it, the UEFI spec mandates
>> that there's an RTC (could somebody more familiar with UEFI
>> than me confirm/deny that?) so we should probably put one here.
>
> Pardon my ignorance, but what exactly disqualifies Generic Timer
> implementations from being used as Real Time Clocks?

So my naive view was that an RTC actually had to have
support for dealing with real (wall) clock time, ie
knowing it's 2014 and not 1970. The generic timers are
just timers. Or am I wrong and UEFI doesn't really
require that?

It's also handy if you're booting Linux directly without
UEFI, since it means you can actually have a /dev/rtc0
(and QEMU's implementation at least will correctly give
you the time based on the host's RTC).
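
To make the distinction concrete: the Generic Timer counter (CNTVCT, read against the CNTFRQ frequency) only tells you how much time has passed since it started counting, whereas a PL031 hands the guest a seconds count that Linux's rtc-pl031 driver interprets as seconds since 1970-01-01 UTC. Getting from that count to "it's 2014" is plain calendar arithmetic. A sketch using Howard Hinnant's days-to-civil-date algorithm; the function and struct names are hypothetical.

```c
#include <stdint.h>

struct civil_date { int64_t year; unsigned month, day; };

/* Convert seconds since 1970-01-01 00:00:00 UTC into a calendar date. */
static struct civil_date date_from_epoch_seconds(int64_t secs)
{
	int64_t z = (secs >= 0 ? secs : secs - 86399) / 86400;  /* floor to days */
	z += 719468;                                            /* days from 0000-03-01 */
	int64_t era = (z >= 0 ? z : z - 146096) / 146097;       /* 400-year eras */
	int64_t doe = z - era * 146097;                         /* day of era */
	int64_t yoe = (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365;
	int64_t doy = doe - (365 * yoe + yoe / 4 - yoe / 100);  /* day of March-based year */
	int64_t mp = (5 * doy + 2) / 153;                       /* March = 0 */
	struct civil_date d;

	d.day = (unsigned)(doy - (153 * mp + 2) / 5 + 1);
	d.month = (unsigned)(mp < 10 ? mp + 3 : mp - 9);
	d.year = yoe + era * 400 + (d.month <= 2);
	return d;
}
```

With this, a count of 1402358400 decodes to 2014-06-10, while a bare counter value carries no such anchor.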

thanks
-- PMM


Re: [RESEND PATCH v7 3/4] arm: dirty log write protect management support

2014-06-10 Thread Mario Smarduch
On 06/10/2014 02:22 AM, Christoffer Dall wrote:
> On Mon, Jun 09, 2014 at 06:47:12PM -0700, Mario Smarduch wrote:
>> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
>>> On Fri, Jun 06, 2014 at 10:33:41AM -0700, Mario Smarduch wrote:
 kvm_vm_ioctl_get_dirty_log() is generic used by x86, ARM. x86 recent patch 
 changed this function, this patch picks up those changes, re-tested 
 everything
 works. Applies cleanly with other patches.

 This patch adds support for keeping track of VM dirty pages. As dirty page 
 log
 is retrieved, the pages that have been written are write protected again 
 for
 next write and log read.

 Signed-off-by: Mario Smarduch 
 ---
  arch/arm/include/asm/kvm_host.h |3 ++
  arch/arm/kvm/arm.c  |5 ---
  arch/arm/kvm/mmu.c  |   79 +++
  arch/x86/kvm/x86.c  |   86 
 ---
  virt/kvm/kvm_main.c |   86 
 +++
  5 files changed, 168 insertions(+), 91 deletions(-)

 diff --git a/arch/arm/include/asm/kvm_host.h 
 b/arch/arm/include/asm/kvm_host.h
 index 59565f5..b760f9c 100644
 --- a/arch/arm/include/asm/kvm_host.h
 +++ b/arch/arm/include/asm/kvm_host.h
 @@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 
 regid);
  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
  
  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
 +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
 +  struct kvm_memory_slot *slot,
 +  gfn_t gfn_offset, unsigned long mask);
>>>
>>> Do all other architectures implement this function?  arm64?
>>
>> Besides ARM, x86 implements it, but there the function is not generic.
>>>
> 
> you're now calling this from generic code, so all architectures must
> implement it, and the prototype should probably be in
> include/linux/kvm_host.h, not in the arch-specific headers.
Ah ok.
> 
  
  #endif /* __ARM_KVM_HOST_H__ */
 diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
 index dfd63ac..f06fb21 100644
 --- a/arch/arm/kvm/arm.c
 +++ b/arch/arm/kvm/arm.c
 @@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
}
  }
  
 -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
 -{
 -  return -EINVAL;
 -}
 -
>>>
>>> What about the other architectures implementing this function?
>>
>> Six architectures define this function. With this patch the
>> function becomes generic in kvm_main.c and is used by x86.
> 
> But you're not defining it as a weak symbol (and I don't suspect that
> you should unless other archs do this in a *very* different way), so you
> need to either remove it from the other archs, make it a weak symbol (I
> hope this is not the case) or do something else.
Mistake on my part: I just cut and pasted Xiao's recent upstream x86 patch and
didn't add a weak definition.
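
For reference, the mechanics of the weak-symbol option: a default definition marked weak (the kernel spells it `__weak`, i.e. GCC's `__attribute__((weak))`) is used unless some other object file provides a normal definition, which then wins at link time. The body here is only a placeholder mirroring ARM's old -EINVAL stub, not the generic implementation from the patch.

```c
#include <errno.h>
#include <stddef.h>

struct kvm;           /* opaque in this sketch */
struct kvm_dirty_log;

/*
 * Weak default: if another object file defines a non-weak
 * kvm_vm_ioctl_get_dirty_log(), the linker uses that one instead.
 */
__attribute__((weak)) int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm,
						     struct kvm_dirty_log *log)
{
	(void)kvm;
	(void)log;
	return -EINVAL;  /* placeholder body */
}
```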

I looked at IA64, MIPS (two of them) and S390: somewhat similar but quite
different implementations. They use a sync version, where the dirty bitmaps
are maintained at the arch level and then copied to memslot->dirty_bitmap.
Right now there is commonality only between x86 and ARM; x86 uses
memslot->dirty_bitmap directly.

Maybe this function should go back to architecture layer, it's
unlikely it can become generic across all architectures.

There is also the issue of kvm_flush_remote_tlbs(), which is also weak;
the generic one uses IPIs. Since it's only used in mmu.c, maybe make
this one static.
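
The x86-style get_dirty_log logic being ported boils down to the loop below: atomically take and clear each word of the dirty bitmap, copy the snapshot out for userspace, and write-protect exactly those pages again so the next guest write faults and is logged. A sketch with hypothetical names; the callback stands in for kvm_mmu_write_protect_pt_masked(), and `__atomic_exchange_n` (a GCC/Clang builtin) plays the role of the kernel's xchg().

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical stand-in for kvm_mmu_write_protect_pt_masked(). */
typedef void (*wp_masked_fn)(void *ctx, uint64_t gfn_offset, unsigned long mask);

/* Returns true if any page was dirty (the caller would then flush TLBs). */
static bool harvest_dirty_log(unsigned long *bitmap, unsigned long *out,
			      size_t nwords, wp_masked_fn wp, void *ctx)
{
	bool any = false;

	for (size_t i = 0; i < nwords; i++) {
		/* Take and clear atomically so concurrent setters are not lost. */
		unsigned long mask = __atomic_exchange_n(&bitmap[i], 0UL, __ATOMIC_SEQ_CST);

		out[i] = mask;
		if (!mask)
			continue;
		any = true;
		wp(ctx, (uint64_t)i * BITS_PER_LONG, mask);
	}
	return any;
}

/* Example callback: counts one write-protected page per set bit. */
static void count_protected(void *ctx, uint64_t gfn_offset, unsigned long mask)
{
	unsigned *count = ctx;

	(void)gfn_offset;
	for (; mask; mask &= mask - 1)  /* clear the lowest set bit each round */
		(*count)++;
}
```

Nothing in this loop is x86- or ARM-specific except the write-protect callback, which is why the two architectures can share it.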


> 
> -Christoffer
> 



Re: [PATCH 4/4] kvm: Implement PEBS virtualization

2014-06-10 Thread Marcelo Tosatti
On Thu, May 29, 2014 at 06:12:07PM -0700, Andi Kleen wrote:
>  {
>   struct kvm_pmu *pmu = &vcpu->arch.pmu;
> @@ -407,6 +551,20 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct 
> msr_data *msr_info)
>   return 0;
>   }
>   break;
> + case MSR_IA32_DS_AREA:
> + pmu->ds_area = data;
> + return 0;
> + case MSR_IA32_PEBS_ENABLE:
> + if (data & ~0xf000fULL)
> + break;

Bit 63 == PS_ENABLE ?

>  void kvm_handle_pmu_event(struct kvm_vcpu *vcpu)
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 33e8c02..4f39917 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -7288,6 +7288,12 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu 
> *vcpu)
>   atomic_switch_perf_msrs(vmx);
>   debugctlmsr = get_debugctlmsr();
>  
> + /* Move this somewhere else? */

Unless you hook into vcpu->arch.pmu.ds_area and perf_get_ds_area()
writers, it has to be at every vcpu entry.

Could compare values in MSR save area to avoid switch.

> + if (vcpu->arch.pmu.ds_area)
> + add_atomic_switch_msr(vmx, MSR_IA32_DS_AREA,
> +   vcpu->arch.pmu.ds_area,
> +   perf_get_ds_area());

Should clear_atomic_switch_msr before 
add_atomic_switch_msr.
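
A toy model of both review suggestions, with hypothetical names and a fixed-size list standing in for the VMX MSR autoload area. It compares guest and host values and switches MSR_IA32_DS_AREA only when they differ. It also removes a stale entry when the guest value drops to zero, rather than only ever adding one. This sketch does not claim anything about whether KVM's real helpers already de-duplicate entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define MSR_IA32_DS_AREA 0x600
#define MAX_SWITCH_MSRS  8

/* One MSR loaded with 'guest' on VM entry and 'host' on VM exit. */
struct switch_entry { uint32_t index; uint64_t guest, host; };
struct switch_list { unsigned nr; struct switch_entry e[MAX_SWITCH_MSRS]; };

static int find_switch_msr(const struct switch_list *l, uint32_t index)
{
	for (unsigned i = 0; i < l->nr; i++)
		if (l->e[i].index == index)
			return (int)i;
	return -1;
}

static void clear_switch_msr(struct switch_list *l, uint32_t index)
{
	int i = find_switch_msr(l, index);

	if (i >= 0)
		l->e[i] = l->e[--l->nr];  /* move the last entry into the hole */
}

static bool add_switch_msr(struct switch_list *l, uint32_t index,
			   uint64_t guest, uint64_t host)
{
	int i = find_switch_msr(l, index);

	if (i < 0) {
		if (l->nr == MAX_SWITCH_MSRS)
			return false;
		i = (int)l->nr++;
	}
	l->e[i] = (struct switch_entry){ index, guest, host };
	return true;
}

/* Run on every entry: switch DS_AREA only when the values differ. */
static void update_ds_area_switch(struct switch_list *l, uint64_t guest_ds,
				  uint64_t host_ds)
{
	if (guest_ds && guest_ds != host_ds)
		add_switch_msr(l, MSR_IA32_DS_AREA, guest_ds, host_ds);
	else
		clear_switch_msr(l, MSR_IA32_DS_AREA);
}
```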



How does kvm support x2apic?

2014-06-10 Thread Jidong Xiao
Hi,

According to this:

https://github.com/torvalds/linux/commit/0d1de2d901f4ba0972a3886496a44fb1d3300dbd

It looks like KVM has supported x2apic since kernel 2.6.32, or
even earlier.

However, this following patch:

https://github.com/torvalds/linux/commit/8d14695f9542e9e0195d6e41ddaa52c32322adf5

also claims to add support for x2apic. But this later patch was
for kernel 3.9.

So I am very confused:

First, what's the difference between these two patches? In other words, has
KVM supported x2apic since kernel 2.6.32, or since kernel 3.9?

Second, can the guest use x2apic even if the host does not? (Assuming QEMU
has exposed this feature to the guest.) By "use" I mean things like
accessing x2apic registers or setting the x2apic enable bit in the
IA32_APICBASE MSR (i.e. bit 10).

-Jidong


[PATCH] KVM: PPC: Book3s PR: Disable AIL mode with OPAL

2014-06-10 Thread Alexander Graf
When we're using PR KVM we must not allow the CPU to take interrupts
in virtual mode, as the SLB does not contain host kernel mappings
when running inside the guest context.

To make sure we get good performance for non-KVM tasks but still
properly functioning PR KVM, let's just disable AIL whenever a vcpu
is scheduled in.

This patch fixes running PR KVM on POWER8 bare metal for me.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index 03fc884..cdc0eef 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -71,6 +71,12 @@ static void kvmppc_core_vcpu_load_pr(struct kvm_vcpu *vcpu, 
int cpu)
svcpu->in_use = 0;
svcpu_put(svcpu);
 #endif
+
+   /* Disable AIL if supported */
+   if (cpu_has_feature(CPU_FTR_HVMODE) &&
+   cpu_has_feature(CPU_FTR_ARCH_207S))
+   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) & ~LPCR_AIL);
+
vcpu->cpu = smp_processor_id();
 #ifdef CONFIG_PPC_BOOK3S_32
current->thread.kvm_shadow_vcpu = vcpu->arch.shadow_vcpu;
@@ -91,6 +97,12 @@ static void kvmppc_core_vcpu_put_pr(struct kvm_vcpu *vcpu)
 
kvmppc_giveup_ext(vcpu, MSR_FP | MSR_VEC | MSR_VSX);
kvmppc_giveup_fac(vcpu, FSCR_TAR_LG);
+
+   /* Enable AIL if supported */
+   if (cpu_has_feature(CPU_FTR_HVMODE) &&
+   cpu_has_feature(CPU_FTR_ARCH_207S))
+   mtspr(SPRN_LPCR, mfspr(SPRN_LPCR) | LPCR_AIL_3);
+
vcpu->cpu = -1;
 }
 
-- 
1.8.1.4
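
The LPCR manipulation in this patch is a plain clear/set of the two-bit AIL field. The sketch below works on an ordinary variable rather than mfspr()/mtspr(), and it assumes the field uses mask 0x01800000, as in the kernel's reg.h. The second pair of helpers is a hypothetical variant that restores whatever AIL value was set before the vcpu was loaded, instead of always restoring AIL=3 as the patch does.

```c
#include <stdint.h>

/* Assumed to mirror the kernel's reg.h: the two-bit AIL field of LPCR. */
#define LPCR_AIL   0x01800000ULL
#define LPCR_AIL_3 0x01800000ULL  /* AIL = 3 */

/* What the patch does on vcpu load and put, applied to a plain value. */
static uint64_t lpcr_on_load(uint64_t lpcr) { return lpcr & ~LPCR_AIL; }
static uint64_t lpcr_on_put(uint64_t lpcr) { return lpcr | LPCR_AIL_3; }

/* Hypothetical variant: remember the previous AIL setting and restore it. */
struct ail_save { uint64_t ail; };

static uint64_t lpcr_on_load_saved(uint64_t lpcr, struct ail_save *s)
{
	s->ail = lpcr & LPCR_AIL;
	return lpcr & ~LPCR_AIL;
}

static uint64_t lpcr_on_put_saved(uint64_t lpcr, const struct ail_save *s)
{
	return (lpcr & ~LPCR_AIL) | s->ail;
}
```

The difference only matters on a host that runs with AIL disabled to begin with: the patch's put path would turn AIL=3 on there, while the saved variant leaves it off.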



Using virtio for inter-VM communication

2014-06-10 Thread Henning Schild
Hi,

I am working on the jailhouse[1] project and am currently looking at
inter-VM communication. We want to connect guests directly with virtual
consoles based on shared memory. The code complexity in the hypervisor
should be minimal: it should just make the shared memory discoverable
and provide a signaling mechanism.

We would like to reuse virtio so that Linux guests will eventually just
work without having to patch them. Having looked at virtio, it seems to
be focused on host<->guest communication and does not consider direct
guest<->guest communication. In particular, the queues use guest-physical
addressing, which is only meaningful for the guest and the host.

In a first prototype I implemented an ivshmem[2] device for the
hypervisor. That way we can share memory between virtual machines.
Ivshmem is nice and simple but does not seem to be used anymore. And it
does not define higher level devices, like a console.
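
The signaling half of ivshmem is a doorbell register. A sketch based on the register layout in QEMU's ivshmem device specification (BAR0: interrupt mask, interrupt status, IVPosition, doorbell); the helper names are ours, and `bar0` is assumed to be an already-mapped pointer to the device's BAR0.

```c
#include <stdint.h>

/* BAR0 register offsets of the ivshmem PCI device. */
enum ivshmem_reg {
	IVSHMEM_INTR_MASK   = 0x00,
	IVSHMEM_INTR_STATUS = 0x04,
	IVSHMEM_IV_POSITION = 0x08,  /* this VM's peer ID, read-only */
	IVSHMEM_DOORBELL    = 0x0c,  /* write-only */
};

/* Doorbell value: peer ID in bits 31:16, interrupt vector in bits 15:0. */
static inline uint32_t ivshmem_doorbell_value(uint16_t peer_id, uint16_t vector)
{
	return ((uint32_t)peer_id << 16) | vector;
}

/* Raise interrupt 'vector' on peer 'peer_id'. */
static inline void ivshmem_ring(volatile uint32_t *bar0, uint16_t peer_id,
				uint16_t vector)
{
	bar0[IVSHMEM_DOORBELL / sizeof(uint32_t)] = ivshmem_doorbell_value(peer_id, vector);
}
```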

At this point I could:
- define a console on top of ivshmem
- see how i can get a virtio console to work between guests on shared
memory

Is anyone already using something like that? I guess zero-copy virtio
devices in Xen would be a similar case. I read a suggestion from May
2010 to introduce a virtio feature bit for shared memory
(VIRTIO_F_RING_SHMEM_ADDR). But that did not make it into the
virtio spec.
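
The addressing problem can be shown with a split-ring descriptor. Its `addr` field normally holds a guest-physical address, which means nothing to a peer guest. The sketch assumes a hypothetical convention in the spirit of the shared-memory feature bit: `addr` is an offset into a shared window that each guest maps at its own base address. Endianness handling of the real `__virtio*` field types is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Split-ring descriptor layout from the virtio spec. */
struct vring_desc {
	uint64_t addr;   /* normally a guest-physical address */
	uint32_t len;
	uint16_t flags;
	uint16_t next;
};

/* A shared window, mapped by each guest at its own address. */
struct shm_window { uint8_t *base; size_t size; };

/*
 * Hypothetical convention: addr is an offset into the window, so both
 * peers can resolve it. Returns NULL if the buffer falls outside it.
 */
static void *desc_buffer(const struct shm_window *w, const struct vring_desc *d)
{
	if (d->addr > w->size || d->len > w->size - d->addr)
		return NULL;
	return w->base + d->addr;
}
```

The bounds check matters because a descriptor written by the other guest is untrusted input.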

regards,
Henning

[1] jailhouse
https://github.com/siemens/jailhouse

[2] ivshmem
https://gitorious.org/nahanni


Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Christopher Covington
Hi Peter,

On 06/10/2014 10:42 AM, Peter Maydell wrote:
> On 28 March 2014 18:45, Christoffer Dall  wrote:
>> ARM VM System Specification
>> ===
>>
> 
>> The virtual hardware platform must provide a number of mandatory
>> peripherals:
>>
>>   Serial console:  The platform should provide a console,
>>   based on an emulated pl011, a virtio-console, or a Xen PV console.
>>
>>   An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer.  GICv2
>>   limits the number of virtual CPUs to 8 cores; newer GIC versions
>>   remove this limitation.
>>
>>   The ARM virtual timer and counter should be available to the VM as
>>   per the ARM Generic Timers specification in the ARM ARM [1].
> 
> I just noticed that this doesn't mandate that the platform
> provides an RTC. As I understand it, the UEFI spec mandates
> that there's an RTC (could somebody more familiar with UEFI
> than me confirm/deny that?) so we should probably put one here.

Pardon my ignorance, but what exactly disqualifies Generic Timer
implementations from being used as Real Time Clocks?

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.


Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Paolo Bonzini

On 10/06/2014 16:42, Peter Maydell wrote:
>  The guest OS must include support for pl031 and mc146818 RTC.
>
> (QEMU is going to provide a PL031, because that's the standard
> ARM primecell device for this and it's what's in the vexpress.
> kvmtool looks like it's going to provide mc146818, because
> that's the standard x86 RTC and kvmtool happens to already
> emulate that.)


Why can't kvmtool add pl031 support?  mc146818 is quite complicated to 
emulate efficiently (without waking up once a second or more).
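
A sketch of why a PL031 is cheap to emulate: its data register is a single seconds counter, so the device model can keep only an offset from the host clock and compute the value when the guest reads it, with no periodic host wakeups. The register offsets are the PL031's RTCDR (0x00) and RTCLR (0x08); the struct and function names are hypothetical, and the match/interrupt registers are omitted.

```c
#include <stdint.h>

#define RTC_DR 0x00  /* data: current count, read-only */
#define RTC_LR 0x08  /* load: sets the count */

/* No timer runs in the host; the count is derived on demand. */
struct toy_pl031 { int64_t offset; };  /* guest count minus host seconds */

static uint32_t toy_pl031_read(const struct toy_pl031 *s, uint32_t reg,
			       int64_t host_now)
{
	if (reg == RTC_DR)
		return (uint32_t)(host_now + s->offset);
	return 0;  /* other registers omitted in this sketch */
}

static void toy_pl031_write(struct toy_pl031 *s, uint32_t reg, uint32_t val,
			    int64_t host_now)
{
	if (reg == RTC_LR)
		s->offset = (int64_t)val - host_now;
}
```

An mc146818, by contrast, exposes broken-down BCD time fields, an update-in-progress flag and per-second update interrupts, which is what pushes a naive emulation toward ticking every second.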


Paolo



Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Claudio Fontana
Hello all,

I just wanted to share with you guys how we are using virtualization on
ARM64 for the OSv project. I don't know whether it is useful for your
specification effort.

In OSv, creating and starting a VM effectively means starting an
application. OSv is meant to be a very thin, bare-bones guest server OS
that also acts as a kind of run-time library for the application it runs.
All devices are assumed to be virtualized, relying heavily on virtio.

Quick VM launch therefore matters more to us than it might in other use
cases.

One aspect of this is that we currently start executing the image directly
(no UEFI involved in the guest), and in some cases we might not need a
full-fledged file system at all, since communication can happen over
virtio channels.

We do need ACPI to discover information such as GIC addresses, timers,
interrupts, and PCI-E (we have no real interest in device trees).

By skipping steps like UEFI, GRUB, and firmware loading, we try to keep our
application launch time low. Will this create problems for us in the
future if you start requiring every VM to boot through those steps?

Thank you for your comments,

Claudio


On 28.03.2014 19:45, Christoffer Dall wrote:
> ARM VM System Specification
> ===
> 
> Goal
> 
> The goal of this spec is to allow suitably-built OS images to run on
> all ARM virtualization solutions, such as KVM or Xen.
> 
> Recommendations in this spec are valid for aarch32 and aarch64 alike, and
> they aim to be hypervisor agnostic.
> 
> Note that simply adhering to the SBSA [2] is not a valid approach, for
> example because the SBSA mandates EL2, which will not be available for
> VMs.  Further, this spec also covers the aarch32 execution mode, not
> covered in the SBSA.
> 
> 
> Image format
> 
> The image format, as presented to the VM, needs to be well-defined in
> order for prepared disk images to be bootable across various
> virtualization implementations.
> 
> The raw disk format as presented to the VM must be partitioned with a
> GUID Partition Table (GPT).  The bootable software must be placed in the
> EFI System Partition (ESP), using the UEFI removable media path, and
> must be an EFI application complying with the UEFI Specification 2.4
> Revision A [6].
> 
> The ESP partition's GPT entry's partition type GUID must be
> C12A7328-F81F-11D2-BA4B-00A0C93EC93B and the file system must be
> formatted as FAT32/vfat as per Section 12.3.1.1 in [6].
> 
> The removable media path is \EFI\BOOT\BOOTARM.EFI for the aarch32
> execution state and is \EFI\BOOT\BOOTAA64.EFI for the aarch64 execution
> state as specified in Sections 3.3 (Boot Option Variables Default Boot
> Behavior) and 3.4.1.1 (Removable Media Boot Behavior) in [6].
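The ESP type GUID and the per-architecture removable media path above can be illustrated with a small check that firmware or an image tool might perform. A minimal sketch: GPT stores the first three GUID fields little-endian on disk, so the raw bytes must be reordered before comparing against the textual GUID. The function and enum names are hypothetical and not taken from any UEFI implementation.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ESP_TYPE_GUID "C12A7328-F81F-11D2-BA4B-00A0C93EC93B"

/* Format a 16-byte on-disk GPT GUID as text. The first three fields are
 * stored little-endian; the last eight bytes are stored in order. */
static void gpt_guid_to_string(const uint8_t b[16], char out[37])
{
    snprintf(out, 37,
             "%02X%02X%02X%02X-%02X%02X-%02X%02X-%02X%02X-"
             "%02X%02X%02X%02X%02X%02X",
             b[3], b[2], b[1], b[0], b[5], b[4], b[7], b[6],
             b[8], b[9], b[10], b[11], b[12], b[13], b[14], b[15]);
}

/* Nonzero if a partition entry's type GUID marks it as the ESP. */
static int is_esp_type(const uint8_t type_guid[16])
{
    char s[37];

    gpt_guid_to_string(type_guid, s);
    return strcmp(s, ESP_TYPE_GUID) == 0;
}

enum exec_state { EXEC_AARCH32, EXEC_AARCH64 };

/* Default removable-media boot file inside the ESP (backslashes escaped). */
static const char *removable_media_path(enum exec_state s)
{
    return s == EXEC_AARCH64 ? "\\EFI\\BOOT\\BOOTAA64.EFI"
                             : "\\EFI\\BOOT\\BOOTARM.EFI";
}
```

On disk, a conforming ESP entry's type GUID is the byte sequence 28 73 2A C1 1F F8 D2 11 BA 4B 00 A0 C9 3E C9 3B, which formats back to the GUID string required by the spec.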
> 
> This ensures that tools for both Xen and KVM can load a binary UEFI
> firmware which can read and boot the EFI application in the disk image.
> 
> A typical scenario will be GRUB2 packaged as an EFI application, which
> mounts the system boot partition and boots Linux.
> 
> 
> Virtual Firmware
> 
> The VM system must be UEFI compliant in order to be able to boot the EFI
> application in the ESP.  It is recommended that this is achieved by
> loading a UEFI binary as the first software executed by the VM, which
> then executes the EFI application.  The UEFI implementation should be
> compliant with UEFI Specification 2.4 Revision A [6] or later.
> 
> This document strongly recommends that the VM implementation support
> persistent environment storage for its virtual firmware, in order to
> enable likely use cases such as adding disk images to a VM or running
> installers to perform upgrades.
> 
> This document strongly recommends that VM implementations implement
> persistent variable storage for their UEFI implementation.  Persistent
> variable storage shall be a property of a VM instance, but shall not be
> stored as part of a portable disk image.  Portable disk images shall
> conform to the UEFI removable disk requirements from the UEFI spec and
> cannot rely on a pre-configured UEFI environment.
> 
> The binary UEFI firmware implementation should not be distributed as
> part of the VM image, but is specific to the VM implementation.
> 
> 
> Hardware Description
> 
> The VM system must be UEFI compliant and therefore the UEFI system table
> will provide a means to access hardware description data.
> 
> The VM implementation must provide through its UEFI implementation:
> 
>   a complete FDT which describes the entire VM system and will boot
>   mainline kernels driven by device tree alone
> 
> For more information about the arm and arm64 boot conventions, see
> Documentation/arm/Booting and Documentation/arm64/booting.txt in the
> Linux kernel source tree.
> 
> For more information about UEFI booting, see [4] and [5].
> 
> 
> VM Platform
> ---

reduce networking latency

2014-06-10 Thread David Xu
Hi All,

I found this interesting project on the KVM TODO page:

allow handling short packets from softirq or VCPU context
 Plan:
   We are going through the scheduler 3 times
   (could be up to 5 if softirqd is involved)
   Consider RX: host irq -> io thread -> VCPU thread ->
   guest irq -> guest thread.
   This adds a lot of latency.
   We can cut it by some 1.5x if we do a bit of work
   either in the VCPU or softirq context.
 Testing: netperf TCP RR - should be improved drastically
  netperf TCP STREAM guest to host - no regression
 Developer: MST
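A back-of-envelope sketch of the count in the TODO entry, assuming latency is dominated by scheduler passes of roughly equal cost and that doing the work in softirq or VCPU context removes the io thread hop. Under those assumptions 3 passes become 2, which is one plausible reading of the "some 1.5x" figure, not a statement of how it was actually derived. The hop tables and names are illustrative, and extra softirqd passes are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* One hop on the RX path; 'wakeup' marks hops assumed to need a pass
 * through a scheduler (host or guest) to wake the next thread. */
struct hop {
    const char *from, *to;
    int wakeup;
};

/* RX path as listed in the TODO entry. */
static const struct hop rx_today[] = {
    { "host irq",    "io thread",    1 },
    { "io thread",   "VCPU thread",  1 },
    { "VCPU thread", "guest irq",    0 }, /* interrupt injection */
    { "guest irq",   "guest thread", 1 },
};

/* Assumed shortcut: packet work done in softirq or VCPU context. */
static const struct hop rx_shortcut[] = {
    { "host irq",    "VCPU thread",  1 },
    { "VCPU thread", "guest irq",    0 },
    { "guest irq",   "guest thread", 1 },
};

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Count the hops that require a scheduler pass. */
static int scheduler_passes(const struct hop *path, size_t n)
{
    int passes = 0;

    for (size_t i = 0; i < n; i++)
        passes += path[i].wakeup;
    return passes;
}
```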

I am also working on tuning KVM's vCPU scheduling. I would appreciate it
if someone could share details about doing this work in the vCPU or
softirq context. Also, how was it determined that this shortcut can
improve performance by up to 1.5x?
Thanks a lot!

Regards,
Cong


Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Ian Campbell
On Tue, 2014-06-10 at 15:42 +0100, Peter Maydell wrote:
> On 28 March 2014 18:45, Christoffer Dall  wrote:
> > ARM VM System Specification
> > ===
> >
> 
> > The virtual hardware platform must provide a number of mandatory
> > peripherals:
> >
> >   Serial console:  The platform should provide a console,
> >   based on an emulated pl011, a virtio-console, or a Xen PV console.
> >
> >   An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer.  GICv2
> >   limits the number of virtual CPUs to 8 cores; newer GIC versions
> >   remove this limitation.
> >
> >   The ARM virtual timer and counter should be available to the VM as
> >   per the ARM Generic Timers specification in the ARM ARM [1].
> 
> I just noticed that this doesn't mandate that the platform
> provides an RTC. As I understand it, the UEFI spec mandates
> that there's an RTC (could somebody more familiar with UEFI
> than me confirm/deny that?) so we should probably put one here.

Isn't that already done transitively via the requirement to provide a
UEFI environment?

I thought the RTC was exposed via UEFI Runtime Services, in which case
how time is provided to the (hypervisor-provided) UEFI implementation is
mostly a hypervisor-internal issue; the guest OS just uses the runtime
services interfaces.
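Through those interfaces the guest reads time with the GetTime() runtime service, which fills in an EFI_TIME structure. A minimal sketch of what a guest might do with that result: the struct below is a local mirror of EFI_TIME's fields rather than the UEFI or kernel header, the fields are assumed to be UTC, and TimeZone/Daylight are deliberately ignored. The date arithmetic is the well-known days-from-civil conversion.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the EFI_TIME field layout (not the real header). */
struct efi_time_sketch {
    uint16_t year;     /* 1900 - 9999 */
    uint8_t  month;    /* 1 - 12 */
    uint8_t  day;      /* 1 - 31 */
    uint8_t  hour, minute, second;
    uint8_t  pad1;
    uint32_t nanosecond;
    int16_t  timezone; /* ignored in this sketch */
    uint8_t  daylight; /* ignored in this sketch */
    uint8_t  pad2;
};

/* Days since 1970-01-01 for a proleptic Gregorian date. */
static int64_t days_from_civil(int64_t y, unsigned m, unsigned d)
{
    y -= m <= 2;                                     /* year starts in March */
    const int64_t era = (y >= 0 ? y : y - 399) / 400;
    const unsigned yoe = (unsigned)(y - era * 400);  /* [0, 399] */
    const unsigned mp = m > 2 ? m - 3 : m + 9;       /* March = 0 */
    const unsigned doy = (153 * mp + 2) / 5 + d - 1; /* [0, 365] */
    const unsigned doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + (int64_t)doe - 719468;
}

/* Seconds since the Unix epoch, treating the fields as UTC. */
static int64_t efi_time_to_unix(const struct efi_time_sketch *t)
{
    return days_from_civil(t->year, t->month, t->day) * 86400 +
           t->hour * 3600 + t->minute * 60 + t->second;
}
```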

Given that, do we also need to standardise on a guest-OS-visible clock
device? I'm not sure whether we do, but if so, I have a couple of
comments on the suggested wording (you can probably guess what they are
going to be...):

> Suggested wording:
> 
>  RTC: The platform should provide a real time clock, based
>  on an emulated pl031 or mc146818.

We would need to include the Xen PV wallclock here too.

> and in the guest-support section later:
> 
>  The guest OS must include support for pl031 and mc146818 RTC.

and here.

Ian.



Re: [RFC v2] ARM VM System Specification

2014-06-10 Thread Peter Maydell
On 28 March 2014 18:45, Christoffer Dall  wrote:
> ARM VM System Specification
> ===
>

> The virtual hardware platform must provide a number of mandatory
> peripherals:
>
>   Serial console:  The platform should provide a console,
>   based on an emulated pl011, a virtio-console, or a Xen PV console.
>
>   An ARM Generic Interrupt Controller v2 (GICv2) [3] or newer.  GICv2
>   limits the number of virtual CPUs to 8 cores; newer GIC versions
>   remove this limitation.
>
>   The ARM virtual timer and counter should be available to the VM as
>   per the ARM Generic Timers specification in the ARM ARM [1].

I just noticed that this doesn't mandate that the platform
provides an RTC. As I understand it, the UEFI spec mandates
that there's an RTC (could somebody more familiar with UEFI
than me confirm/deny that?) so we should probably put one here.

Suggested wording:

 RTC: The platform should provide a real time clock, based
 on an emulated pl031 or mc146818.

and in the guest-support section later:

 The guest OS must include support for pl031 and mc146818 RTC.

(QEMU is going to provide a PL031, because that's the standard
ARM primecell device for this and it's what's in the vexpress.
kvmtool looks like it's going to provide mc146818, because
that's the standard x86 RTC and kvmtool happens to already
emulate that.)

thanks
-- PMM


Re: [PATCH v2] kvm: x86: emulate monitor and mwait instructions as nop

2014-06-10 Thread Michael S. Tsirkin
On Tue, Jun 03, 2014 at 10:21:58AM -0400, Gabriel L. Somlo wrote:
> On Tue, Jun 03, 2014 at 11:17:48AM +0200, Paolo Bonzini wrote:
> > 
> > I think it's fine as it is now. :)
> 
> On Mon, Jun 02, 2014 at 09:55:18PM -0400, Gabriel L. Somlo wrote:
> > 
> > W.r.t. monitor/mwait, a guest can do one of the following:
> > 
> > 1. Never check CPUID, and never use monitor/mwait
> > - This is great, we don't have to do anything about these
> > 
> > 2. Check CPUID for mwait, use it to idle in preference over hlt
> > - Linux, Windows, and Mavericks (10.9) do this
> > - we never want to have CPUID say "yes" to these, since
> >   monitor/mwait support will be clunky in the best case,
> >   and hlt is overwhelmingly preferable! [*]
> > 
> > 3. Never check CPUID, use monitor/mwait with abandon
> > - OS X 10.6 .. 10.8 does this
> > - emulating monitor/mwait here allows us to boot the guest
> >   and use it, and perform sysadmin surgery to force a hlt
> >   based idle
> > 
> > 4. Check CPUID, panic if unavailable
> > - OS X 10.5 did this, IIRC.
> > - whether I can do kext surgery and get it to stop checking
> >   CPUID *in addition to* falling back to hlt-based idle is
> >   TBD.
> > - emulating monitor/mwait allows us to boot this type of
> >   guest, BUT WE ALSO HAVE TO ADVERTISE IT VIA CPUID !!!
> 
> As it is right now, #4 is not being addressed (and we can't just
> advertise mwait via cpuid, or we'd be screwing up #2).

Right, I didn't realize that 10.5 did #4.

> I also feel a bit weird about the "undocumented feature" aspect
> of NOT generating an invalid opcode for something that *should*
> be an invalid opcode according to the feature set advertised via
> cpuid...
> 
> So if there's a way to make it so we can tell QEMU/KVM to
> "--enable-mwait" on a per-guest basis, I think that'd be better
> than an always-on "undocumented" behavior...
> 
> But then again, I'm most likely missing something about the big
> picture... :)
> 
> Thanks much,
> --Gabriel
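The trade-off in the four cases above can be written down as a per-guest policy. A minimal sketch, assuming a hypothetical opt-in switch along the lines of the "--enable-mwait" suggestion; none of these names are KVM or QEMU APIs. MWAIT_NOP_HIDDEN corresponds to the behavior under discussion (NOP emulation with the CPUID bit hidden), and MWAIT_NOP_ADVERTISED is what case #4 would need.

```c
#include <assert.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 3 advertises MONITOR/MWAIT. */
#define CPUID1_ECX_MONITOR (1u << 3)

/* Hypothetical per-guest policies, mapped to the cases listed above. */
enum mwait_mode {
    MWAIT_UD,             /* hide bit, #UD on use: cases 3 and 4 fail */
    MWAIT_NOP_HIDDEN,     /* hide bit, treat as NOP: case 4 still fails */
    MWAIT_NOP_ADVERTISED, /* advertise bit, treat as NOP: case 4 boots,
                           * but case 2 guests will idle with mwait */
};

enum exit_action { EXIT_SKIP_AS_NOP, EXIT_INJECT_UD };

/* CPUID leaf 1 ECX as the guest sees it under a given policy. */
static uint32_t guest_cpuid1_ecx(uint32_t ecx, enum mwait_mode mode)
{
    if (mode == MWAIT_NOP_ADVERTISED)
        return ecx | CPUID1_ECX_MONITOR;
    return ecx & ~CPUID1_ECX_MONITOR;
}

/* What to do when the guest executes MONITOR or MWAIT. */
static enum exit_action on_monitor_mwait(enum mwait_mode mode)
{
    return mode == MWAIT_UD ? EXIT_INJECT_UD : EXIT_SKIP_AS_NOP;
}
```

Making the mode a per-guest setting keeps the default documented (#UD, bit hidden) while letting users opt in for the OS X guests that need it.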


Re: [RESEND PATCH v7 3/4] arm: dirty log write protect management support

2014-06-10 Thread Christoffer Dall
On Mon, Jun 09, 2014 at 06:47:12PM -0700, Mario Smarduch wrote:
> On 06/08/2014 05:05 AM, Christoffer Dall wrote:
> > On Fri, Jun 06, 2014 at 10:33:41AM -0700, Mario Smarduch wrote:
> >> kvm_vm_ioctl_get_dirty_log() is generic, used by x86 and ARM. A recent
> >> x86 patch changed this function; this patch picks up those changes.
> >> Re-tested, everything works. Applies cleanly with the other patches.
> >>
> >> This patch adds support for keeping track of VM dirty pages. As the
> >> dirty page log is retrieved, the pages that have been written are
> >> write-protected again for the next write and log read.
> >>
> >> Signed-off-by: Mario Smarduch 
> >> ---
> >>  arch/arm/include/asm/kvm_host.h |    3 ++
> >>  arch/arm/kvm/arm.c              |    5 ---
> >>  arch/arm/kvm/mmu.c              |   79 +++
> >>  arch/x86/kvm/x86.c              |   86 ---
> >>  virt/kvm/kvm_main.c             |   86 +++
> >>  5 files changed, 168 insertions(+), 91 deletions(-)
> >>
> >> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
> >> index 59565f5..b760f9c 100644
> >> --- a/arch/arm/include/asm/kvm_host.h
> >> +++ b/arch/arm/include/asm/kvm_host.h
> >> @@ -232,5 +232,8 @@ u64 kvm_arm_timer_get_reg(struct kvm_vcpu *, u64 regid);
> >>  int kvm_arm_timer_set_reg(struct kvm_vcpu *, u64 regid, u64 value);
> >>  
> >>  void kvm_mmu_wp_memory_region(struct kvm *kvm, int slot);
> >> +void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
> >> +  struct kvm_memory_slot *slot,
> >> +  gfn_t gfn_offset, unsigned long mask);
> > 
> > Do all other architectures implement this function?  arm64?
> 
> Besides arm, only x86 does, but the function is not generic.
> > 

you're now calling this from generic code, so all architectures must
implement it, and the prototype should probably be in
include/linux/kvm_host.h, not in the arch-specific headers.
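For context, the generic harvest loop that calls kvm_mmu_write_protect_pt_masked() walks the dirty bitmap one unsigned long at a time, clears each word, and re-protects the pages whose bits were set, so the next guest write faults and is logged again. A userspace sketch of that loop under simplifying assumptions: a plain array stands in for the stage-2 page tables, there is no locking or TLB flush, and the helper names are hypothetical stand-ins rather than the kernel functions.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)
#define NPAGES 256 /* pages in the hypothetical memslot (multiple of 64) */
#define NLONGS ((NPAGES + BITS_PER_LONG - 1) / BITS_PER_LONG)

/* Stand-in for stage-2 write permission: true = write-protected. */
static bool write_protected[NPAGES];

/* Stand-in for kvm_mmu_write_protect_pt_masked(): protect page
 * gfn_offset + i for every bit i set in mask. */
static void wp_masked(size_t gfn_offset, unsigned long mask)
{
    for (size_t i = 0; i < BITS_PER_LONG; i++)
        if (mask & (1UL << i))
            write_protected[gfn_offset + i] = true;
}

/* Copy out and clear the dirty bitmap, re-protecting dirty pages.
 * Returns true if any page was dirty. The kernel performs the
 * read-and-clear atomically with xchg(). */
static bool get_dirty_log(unsigned long *dirty, unsigned long *out)
{
    bool any = false;

    for (size_t i = 0; i < NLONGS; i++) {
        unsigned long mask = dirty[i];

        dirty[i] = 0;
        out[i] = mask;
        if (!mask)
            continue;
        any = true;
        wp_masked(i * BITS_PER_LONG, mask);
    }
    return any;
}
```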

> >>  
> >>  #endif /* __ARM_KVM_HOST_H__ */
> >> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> >> index dfd63ac..f06fb21 100644
> >> --- a/arch/arm/kvm/arm.c
> >> +++ b/arch/arm/kvm/arm.c
> >> @@ -780,11 +780,6 @@ long kvm_arch_vcpu_ioctl(struct file *filp,
> >>}
> >>  }
> >>  
> >> -int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct kvm_dirty_log *log)
> >> -{
> >> -  return -EINVAL;
> >> -}
> >> -
> > 
> > What about the other architectures implementing this function?
> 
> Six architectures define this function. With this patch this
> function is generic in kvm_main.c used by x86.

But you're not defining it as a weak symbol (and I don't think you
should, unless other archs do this in a *very* different way), so you
need to either remove it from the other archs, make it a weak symbol (I
hope this is not the case), or do something else.
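For reference, the weak-symbol option (the one being discouraged here) works roughly like this: generic code provides a default marked weak, and an architecture that links in a strong definition of the same name replaces it. A minimal sketch with a hypothetical function name, using GCC/Clang attribute syntax (the kernel spells it __weak); with only this file linked, the weak default is what gets called.

```c
#include <assert.h>
#include <errno.h>

/* Generic default. Any object file that provides a non-weak definition
 * of the same symbol overrides this one at link time. */
__attribute__((weak)) int arch_get_dirty_log_sketch(void)
{
    return -EINVAL;
}
```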

-Christoffer