Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 06.05.14 02:41, Paul Mackerras wrote: On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; How about PA6T and G5s? G5 sets DAR on an alignment interrupt. As for PA6T, I don't know for sure, but if it doesn't, ordinary alignment interrupts wouldn't be handled properly, since the code in arch/powerpc/kernel/align.c assumes DAR contains the address being accessed on all PowerPC CPUs. Now that's a good point. If we simply behave like Linux, I'm fine. This definitely deserves a comment on the #ifdef in the code. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On 06.05.14 06:26, Gavin Shan wrote: On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote: On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote: On 05/05/2014 03:27 AM, Gavin Shan wrote: The series of patches intends to support EEH for PCI devices which have been passed through to a PowerKVM based guest via VFIO. The implementation is straightforward, based on the issues or problems we have to resolve to support EEH for PowerKVM based guests. - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure to emulate XICS. Without introducing a new mechanism, we just extend that existing infrastructure to support EEH RTAS emulation. EEH RTAS requests initiated from the guest are posted to the host, where the requests get handled or delivered to the underlying firmware for further handling. For that, the host kernel has to maintain the PCI address (host domain/bus/slot/function to guest's PHB BUID/bus/slot/function) mapping via the KVM VFIO device. The address mapping will be built when initializing the VFIO device in QEMU and destroyed when the VFIO device in QEMU goes offline, or the VM is destroyed. Do you also expose all those interfaces to user space? VFIO is as much about user space device drivers as it is about device assignment. Yep, all the interfaces are exported to user space. I would like to first see an implementation that doesn't touch KVM emulation code at all but instead routes everything through QEMU. As a second step we can then accelerate performance critical paths inside of KVM. Ok. I'll change the implementation. However, QEMU still has to poll/push information from/to the host kernel. So the best place for that would be tce_iommu_driver_ops::ioctl, as EEH is a Power specific feature. For the error injection, I guess I have to put the logic token management into QEMU, and the error injection request will be handled by QEMU and then routed to the host kernel via an additional syscall, as we did for pSeries.
Yes, start off without in-kernel XICS so everything simply lives in QEMU. Then add callbacks into the in-kernel XICS to inject these interrupts if we don't have wide enough interfaces already. Alex
[PATCH v2 1/4] KVM: nVMX: rearrange get_vmx_mem_address
Our common function for vmptr checks (in 2/4) needs to fetch the memory address Signed-off-by: Bandan Das --- arch/x86/kvm/vmx.c | 106 ++--- 1 file changed, 53 insertions(+), 53 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 1f68c58..c18fe9a4 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5775,6 +5775,59 @@ static enum hrtimer_restart vmx_preemption_timer_fn(struct hrtimer *timer) } /* + * Decode the memory-address operand of a vmx instruction, as recorded on an + * exit caused by such an instruction (run by a guest hypervisor). + * On success, returns 0. When the operand is invalid, returns 1 and throws + * #UD or #GP. + */ +static int get_vmx_mem_address(struct kvm_vcpu *vcpu, +unsigned long exit_qualification, +u32 vmx_instruction_info, gva_t *ret) +{ + /* +* According to Vol. 3B, "Information for VM Exits Due to Instruction +* Execution", on an exit, vmx_instruction_info holds most of the +* addressing components of the operand. Only the displacement part +* is put in exit_qualification (see 3B, "Basic VM-Exit Information"). +* For how an actual address is calculated from all these components, +* refer to Vol. 1, "Operand Addressing". 
+*/ + int scaling = vmx_instruction_info & 3; + int addr_size = (vmx_instruction_info >> 7) & 7; + bool is_reg = vmx_instruction_info & (1u << 10); + int seg_reg = (vmx_instruction_info >> 15) & 7; + int index_reg = (vmx_instruction_info >> 18) & 0xf; + bool index_is_valid = !(vmx_instruction_info & (1u << 22)); + int base_reg = (vmx_instruction_info >> 23) & 0xf; + bool base_is_valid = !(vmx_instruction_info & (1u << 27)); + + if (is_reg) { + kvm_queue_exception(vcpu, UD_VECTOR); + return 1; + } + + /* Addr = segment_base + offset */ + /* offset = base + [index * scale] + displacement */ + *ret = vmx_get_segment_base(vcpu, seg_reg); + if (base_is_valid) + *ret += kvm_register_read(vcpu, base_reg); + if (index_is_valid) + *ret += kvm_register_read(vcpu, index_reg) << scaling; [...] - int scaling = vmx_instruction_info & 3; - int addr_size = (vmx_instruction_info >> 7) & 7; - bool is_reg = vmx_instruction_info & (1u << 10); - int seg_reg = (vmx_instruction_info >> 15) & 7; - int index_reg = (vmx_instruction_info >> 18) & 0xf; - bool index_is_valid = !(vmx_instruction_info & (1u << 22)); - int base_reg = (vmx_instruction_info >> 23) & 0xf; - bool base_is_valid = !(vmx_instruction_info & (1u << 27)); - - if (is_reg) { - kvm_queue_exception(vcpu, UD_VECTOR); - return 1; - } - - /* Addr = segment_base + offset */ - /* offset = base + [index * scale] + displacement */ - *ret = vmx_get_segment_base(vcpu, seg_reg); - if (base_is_valid) - *ret += kvm_register_read(vcpu, base_reg); - if (index_is_valid) - *ret += kvm_register_read(vcpu, index_reg)
[PATCH v2 3/4] KVM: nVMX: fail on invalid vmclear/vmptrld pointer
The spec mandates that if the vmptrld or vmclear address is equal to the vmxon region pointer, the instruction should fail with error "VMPTRLD with VMXON pointer" or "VMCLEAR with VMXON pointer". Signed-off-by: Bandan Das --- arch/x86/kvm/vmx.c | 12 1 file changed, 12 insertions(+) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 059906a..6c125ff 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -6083,6 +6083,12 @@ static int handle_vmclear(struct kvm_vcpu *vcpu) return 1; } + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, VMXERR_VMCLEAR_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + if (vmptr == vmx->nested.current_vmptr) { nested_release_vmcs12(vmx); vmx->nested.current_vmptr = -1ull; @@ -6426,6 +6432,12 @@ static int handle_vmptrld(struct kvm_vcpu *vcpu) return 1; } + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, VMXERR_VMPTRLD_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + if (vmx->nested.current_vmptr != vmptr) { struct vmcs12 *new_vmcs12; struct page *page; -- 1.8.3.1
[PATCH v2 0/4] Emulate VMXON region correctly
Reference: https://bugzilla.kernel.org/show_bug.cgi?id=54521 The vmxon region is unused by nvmx, but adding these checks is probably harmless and may detect buggy L1 hypervisors in the future! v2: 1/4 - Commit message change to reflect addition of new function 2/4 - Use cpuid_maxphyaddr() - Fix a leak with kunmap() - Remove unnecessary braces around comparisons - Move all checks into a common function, this will be later used by handle_vmptrld and handle_vmclear in 4/4 4/4 - New patch - use common function to perform checks on vmptr Bandan Das (4): KVM: nVMX: rearrange get_vmx_mem_address KVM: nVMX: additional checks on vmxon region KVM: nVMX: fail on invalid vmclear/vmptrld pointer KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr arch/x86/kvm/cpuid.c | 1 + arch/x86/kvm/vmx.c | 240 +-- 2 files changed, 156 insertions(+), 85 deletions(-) -- 1.8.3.1
[PATCH v2 4/4] KVM: nVMX: move vmclear and vmptrld pre-checks to nested_vmx_check_vmptr
Some checks are common to all, and moreover, according to the spec, the check for whether any bits beyond the physical address width are set is also applicable to all of them. Signed-off-by: Bandan Das --- arch/x86/kvm/vmx.c | 83 -- 1 file changed, 37 insertions(+), 46 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 6c125ff..9b36057 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -5833,8 +5833,10 @@ static int get_vmx_mem_address(struct kvm_vcpu *vcpu, * - if it's 4KB aligned * - No bits beyond the physical address width are set * - Returns 0 on success or else 1 + * (Intel SDM Section 30.3) */ -static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason) +static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason, + gpa_t *vmpointer) { gva_t gva; gpa_t vmptr; @@ -5882,11 +5884,42 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason) kunmap(page); vmx->nested.vmxon_ptr = vmptr; break; + case EXIT_REASON_VMCLEAR: + if (!IS_ALIGNED(vmptr, PAGE_SIZE) || (vmptr >> maxphyaddr)) { + nested_vmx_failValid(vcpu, +VMXERR_VMCLEAR_INVALID_ADDRESS); + skip_emulated_instruction(vcpu); + return 1; + } + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, +VMXERR_VMCLEAR_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + break; + case EXIT_REASON_VMPTRLD: + if (!IS_ALIGNED(vmptr, PAGE_SIZE) || (vmptr >> maxphyaddr)) { + nested_vmx_failValid(vcpu, +VMXERR_VMPTRLD_INVALID_ADDRESS); + skip_emulated_instruction(vcpu); + return 1; + } + + if (vmptr == vmx->nested.vmxon_ptr) { + nested_vmx_failValid(vcpu, +VMXERR_VMPTRLD_VMXON_POINTER); + skip_emulated_instruction(vcpu); + return 1; + } + break; default: return 1; /* shouldn't happen */ } + if (vmpointer) + *vmpointer = vmptr; return 0; } @@ -5929,7 +5962,7 @@ static int handle_vmon(struct kvm_vcpu *vcpu) return 1; } - if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMON)) + if (nested_vmx_check_vmptr(vcpu, 
EXIT_REASON_VMON, NULL)) return 1; if (vmx->nested.vmxon) { @@ -6058,37 +6091,16 @@ static int handle_vmoff(struct kvm_vcpu *vcpu) static int handle_vmclear(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - gva_t gva; gpa_t vmptr; struct vmcs12 *vmcs12; struct page *page; - struct x86_exception e; if (!nested_vmx_check_permission(vcpu)) return 1; - if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION), - vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) + if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMCLEAR, &vmptr)) return 1; - if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmptr, - sizeof(vmptr), &e)) { - kvm_inject_page_fault(vcpu, &e); - return 1; - } - - if (!IS_ALIGNED(vmptr, PAGE_SIZE)) { - nested_vmx_failValid(vcpu, VMXERR_VMCLEAR_INVALID_ADDRESS); - skip_emulated_instruction(vcpu); - return 1; - } - - if (vmptr == vmx->nested.vmxon_ptr) { - nested_vmx_failValid(vcpu, VMXERR_VMCLEAR_VMXON_POINTER); - skip_emulated_instruction(vcpu); - return 1; - } - if (vmptr == vmx->nested.current_vmptr) { nested_release_vmcs12(vmx); vmx->nested.current_vmptr = -1ull; @@ -6408,35 +6420,14 @@ static int handle_vmwrite(struct kvm_vcpu *vcpu) static int handle_vmptrld(struct kvm_vcpu *vcpu) { struct vcpu_vmx *vmx = to_vmx(vcpu); - gva_t gva; gpa_t vmptr; - struct x86_exception e; u32 exec_control; if (!nested_vmx_check_permission(vcpu)) return 1; - if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION), - vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) - return 1; - - if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmptr, - sizeof(vmptr), &e)) { - kvm_inject_page_fault(vcpu, &e);
[PATCH v2 2/4] KVM: nVMX: additional checks on vmxon region
Currently, the vmxon region isn't used in the nested case. However, according to the spec, the vmxon instruction performs additional sanity checks on this region and the associated pointer. Modify emulated vmxon to better adhere to the spec requirements Signed-off-by: Bandan Das --- arch/x86/kvm/cpuid.c | 1 + arch/x86/kvm/vmx.c | 67 2 files changed, 68 insertions(+) diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c index f47a104..da9894b 100644 --- a/arch/x86/kvm/cpuid.c +++ b/arch/x86/kvm/cpuid.c @@ -726,6 +726,7 @@ int cpuid_maxphyaddr(struct kvm_vcpu *vcpu) not_found: return 36; } +EXPORT_SYMBOL_GPL(cpuid_maxphyaddr); /* * If no match is found, check whether we exceed the vCPU's limit diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index c18fe9a4..059906a 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -354,6 +354,7 @@ struct vmcs02_list { struct nested_vmx { /* Has the level1 guest done vmxon? */ bool vmxon; + gpa_t vmxon_ptr; /* The guest-physical address of the current VMCS L1 keeps for L2 */ gpa_t current_vmptr; @@ -5828,6 +5829,68 @@ static int get_vmx_mem_address(struct kvm_vcpu *vcpu, } /* + * This function performs the various checks including + * - if it's 4KB aligned + * - No bits beyond the physical address width are set + * - Returns 0 on success or else 1 + */ +static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason) +{ + gva_t gva; + gpa_t vmptr; + struct x86_exception e; + struct page *page; + struct vcpu_vmx *vmx = to_vmx(vcpu); + int maxphyaddr = cpuid_maxphyaddr(vcpu); + + if (get_vmx_mem_address(vcpu, vmcs_readl(EXIT_QUALIFICATION), + vmcs_read32(VMX_INSTRUCTION_INFO), &gva)) + return 1; + + if (kvm_read_guest_virt(&vcpu->arch.emulate_ctxt, gva, &vmptr, + sizeof(vmptr), &e)) { + kvm_inject_page_fault(vcpu, &e); + return 1; + } + + switch (exit_reason) { + case EXIT_REASON_VMON: + /* +* SDM 3: 24.11.5 +* The first 4 bytes of VMXON region contain the supported +* VMCS revision identifier +* +* Note - 
IA32_VMX_BASIC[48] will never be 1 +* for the nested case; +* which replaces physical address width with 32 +* +*/ + if (!IS_ALIGNED(vmptr, PAGE_SIZE) || (vmptr >> maxphyaddr)) { + nested_vmx_failInvalid(vcpu); + skip_emulated_instruction(vcpu); + return 1; + } + + page = nested_get_page(vcpu, vmptr); + if (page == NULL) { + nested_vmx_failInvalid(vcpu); + skip_emulated_instruction(vcpu); + return 1; + } + if (*(u32 *)kmap(page) != VMCS12_REVISION) { + nested_vmx_failInvalid(vcpu); + kunmap(page); + skip_emulated_instruction(vcpu); + return 1; + } + kunmap(page); + vmx->nested.vmxon_ptr = vmptr; + break; + + default: + return 1; /* shouldn't happen */ + } + + return 0; +} + +/* * Emulate the VMXON instruction. * Currently, we just remember that VMX is active, and do not save or even * inspect the argument to VMXON (the so-called "VMXON pointer") because we @@ -5865,6 +5928,10 @@ static int handle_vmon(struct kvm_vcpu *vcpu) kvm_inject_gp(vcpu, 0); return 1; } + + if (nested_vmx_check_vmptr(vcpu, EXIT_REASON_VMON)) + return 1; + if (vmx->nested.vmxon) { nested_vmx_failValid(vcpu, VMXERR_VMXON_IN_VMX_ROOT_OPERATION); skip_emulated_instruction(vcpu); -- 1.8.3.1
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On Mon, May 05, 2014 at 08:00:12AM -0600, Alex Williamson wrote: >On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote: >> On 05/05/2014 03:27 AM, Gavin Shan wrote: >> > The series of patches intends to support EEH for PCI devices, which have >> > been >> > passed through to PowerKVM based guest via VFIO. The implementation is >> > straightforward based on the issues or problems we have to resolve to >> > support >> > EEH for PowerKVM based guest. >> > >> > - Emulation for EEH RTAS requests. Thankfully, we already have >> > infrastructure >> >to emulate XICS. Without introducing new mechanism, we just extend that >> >existing infrastructure to support EEH RTAS emulation. EEH RTAS requests >> >initiated from guest are posted to host where the requests get handled >> > or >> >delivered to underlying firmware for further handling. For that, the host >> > kernel >> >has to maintain the PCI address (host domain/bus/slot/function to >> > guest's >> >PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address >> > mapping >> >will be built when initializing VFIO device in QEMU and destroyed when >> > the >> >VFIO device in QEMU is going to offline, or VM is destroyed. >> >> Do you also expose all those interfaces to user space? VFIO is as much >> about user space device drivers as it is about device assignment. >> Yep, all the interfaces are exported to user space. >> I would like to first see an implementation that doesn't touch KVM >> emulation code at all but instead routes everything through QEMU. As a >> second step we can then accelerate performance critical paths inside of KVM. >> Ok. I'll change the implementation. However, QEMU still has to poll/push information from/to the host kernel. So the best place for that would be tce_iommu_driver_ops::ioctl as EEH is Power specific feature. 
For the error injection, I guess I have to put the logic token management into QEMU and error injection request will be handled by QEMU and then routed to host kernel via additional syscall as we did for pSeries. >> That way we ensure that user space device drivers have all the power >> over a device they need to drive it. > >+1 > Thanks, Gavin
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
On Mon, May 05, 2014 at 08:17:00PM +0530, Aneesh Kumar K.V wrote: > Alexander Graf writes: > > > On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote: > >> Signed-off-by: Aneesh Kumar K.V > > > > No patch description, no proper explanations anywhere why you're doing > > what. All of that in a pretty sensitive piece of code. There's no way > > this patch can go upstream in its current form. > > > > Sorry about being vague. Will add a better commit message. The goal is > to export MPSS support to guest if the host support the same. MPSS > support is exported via penc encoding in "ibm,segment-page-sizes". The > actual format can be found at htab_dt_scan_page_sizes. When the guest > memory is backed by hugetlbfs we expose the penc encoding the host > support to guest via kvmppc_add_seg_page_size. In a case like this it's good to assume the reader doesn't know very much about Power CPUs, and probably isn't familiar with acronyms such as MPSS. The patch needs an introductory paragraph explaining that on recent IBM Power CPUs, while the hashed page table is looked up using the page size from the segmentation hardware (i.e. the SLB), it is possible to have the HPT entry indicate a larger page size. Thus for example it is possible to put a 16MB page in a 64kB segment, but since the hash lookup is done using a 64kB page size, it may be necessary to put multiple entries in the HPT for a single 16MB page. This capability is called mixed page-size segment (MPSS). With MPSS, there are two relevant page sizes: the base page size, which is the size used in searching the HPT, and the actual page size, which is the size indicated in the HPT entry. Note that the actual page size is always >= base page size. > Now the challenge to THP support is to make sure that our henter, > hremove etc decode base page size and actual page size correctly > from the hash table entry values. Most of the changes is to do that. > Rest of the stuff is already handled by kvm. 
> > NOTE: It is much easier to read the code after applying the patch rather than reading the diff. I have added comments around each step in the code. Paul.
Re: [PATCH v4 5/5] change update_range to handle > 4GB 2nd stage range for ARMv7
Hi Gavin, thanks, didn't catch that, I'll remove these calls. - Mario On 05/05/2014 04:34 PM, Gavin Guo wrote: > Hi Mario, > > On Tue, Apr 29, 2014 at 9:06 AM, Mario Smarduch > wrote: >> >> This patch adds support for unmapping 2nd stage page tables for addresses >> >4GB >> on ARMv7. >> >> Signed-off-by: Mario Smarduch >> --- >> arch/arm/kvm/mmu.c | 20 >> 1 file changed, 12 insertions(+), 8 deletions(-) >> >> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c >> index 88f5503..afbf8ba 100644 >> --- a/arch/arm/kvm/mmu.c >> +++ b/arch/arm/kvm/mmu.c >> @@ -176,21 +176,25 @@ static void clear_pte_entry(struct kvm *kvm, pte_t >> *pte, phys_addr_t addr) >> } >> } >> >> +/* Function shared between identity and 2nd stage mappings. For 2nd stage >> + * the IPA may be > 4GB on ARMv7, and page table range functions >> + * will fail. kvm_xxx_addr_end() is used to handle both cases. >> + */ >> static void unmap_range(struct kvm *kvm, pgd_t *pgdp, >> - unsigned long long start, u64 size) >> + phys_addr_t start, u64 size) >> { >> pgd_t *pgd; >> pud_t *pud; >> pmd_t *pmd; >> pte_t *pte; >> - unsigned long long addr = start, end = start + size; >> - u64 next; >> + phys_addr_t addr = start, end = start + size; >> + phys_addr_t next; >> >> while (addr < end) { >> pgd = pgdp + pgd_index(addr); >> pud = pud_offset(pgd, addr); >> if (pud_none(*pud)) { >> - addr = pud_addr_end(addr, end); >> + addr = kvm_pud_addr_end(addr, end); >> continue; >> } >> >> @@ -200,13 +204,13 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp, >> * move on. 
>> */ >> clear_pud_entry(kvm, pud, addr); >> - addr = pud_addr_end(addr, end); >> + addr = kvm_pud_addr_end(addr, end); >> continue; >> } >> >> pmd = pmd_offset(pud, addr); >> if (pmd_none(*pmd)) { >> - addr = pmd_addr_end(addr, end); >> + addr = kvm_pmd_addr_end(addr, end); >> continue; >> } >> >> @@ -221,10 +225,10 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp, >> */ >> if (kvm_pmd_huge(*pmd) || page_empty(pte)) { >> clear_pmd_entry(kvm, pmd, addr); >> - next = pmd_addr_end(addr, end); >> + next = kvm_pmd_addr_end(addr, end); >> if (page_empty(pmd) && !page_empty(pud)) { >> clear_pud_entry(kvm, pud, addr); >> - next = pud_addr_end(addr, end); >> + next = kvm_pud_addr_end(addr, end); >> } >> } >> >> -- >> 1.7.9.5 >> >> >> > > It seems that the kvm_pmd_addr_end(addr, end) you are adding already exists > in the following patch, and you may need to remove these parts from your > patch. > > commit a3c8bd31af260a17d626514f636849ee1cd1f63e > Author: Marc Zyngier > Date: Tue Feb 18 14:29:03 2014 + > > ARM: KVM: introduce kvm_p*d_addr_end > > The use of p*d_addr_end with stage-2 translation is slightly dodgy, > as the IPA is 40bits, while all the p*d_addr_end helpers are > taking an unsigned long (arm64 is fine with that as unsigned long > is 64bit). > > The fix is to introduce 64bit clean versions of the same helpers, > and use them in the stage-2 page table code. > > Signed-off-by: Marc Zyngier > Acked-by: Catalin Marinas > Reviewed-by: Christoffer Dall > > Gavin >
Re: KVM exit on UD interception
Thank you Jun! Now I understand that there is a strong need to support this scenario, where the host might run into trouble executing binaries with instructions unknown to it. I am still wondering if there is a way to actually exit KVM on UD from a syscall instruction without modifying the KVM kernel module? Best regards, Alex On Mon, May 5, 2014 at 7:07 PM, Nakajima, Jun wrote: > On Mon, May 5, 2014 at 11:48 AM, Alexandru Duţu wrote: >> Thank you Jun! I see that, in the case of VMX, KVM does not emulate the >> instruction that produced a UD exception; it just queues the exception >> and returns 1. After that KVM will still try to enter virtualized >> execution and so forth, the execution probably finishing with a #DF and >> a shutdown. It does not seem that KVM, in the case of VMX, will exit >> immediately on UD. >> >> I am not sure what you meant with MOVBE emulation. > > I meant: > > commit 84cffe499b9418d6c3b4de2ad9599cc2ec50c607 > Author: Borislav Petkov > Date: Tue Oct 29 12:54:56 2013 +0100 > > kvm: Emulate MOVBE > > This basically came from the need to be able to boot 32-bit Atom SMP > guests on an AMD host, i.e. a host which doesn't support MOVBE. As a > matter of fact, qemu has since recently received MOVBE support but we > cannot share that with kvm emulation and thus we have to do this in the > host. We're waay faster in kvm anyway. :-) > > So, we piggyback on the #UD path and emulate the MOVBE functionality. > With it, an 8-core SMP guest boots in under 6 seconds. > > Also, requesting MOVBE emulation needs to happen explicitly to work, > i.e. qemu -cpu n270,+movbe... > > Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+ > kernel in kvm executes MOVBE ~60K times. 
> > Signed-off-by: Andre Przywara > Signed-off-by: Borislav Petkov > Signed-off-by: Paolo Bonzini > > > -- > Jun > Intel Open Source Technology Center -- Alex
[RFC PATCH 0/3] Emulator Speedups - Optimize Instruction fetches
My initial attempt at caching gva->gpa->hva translations. Pretty straightforward, with details in the individual patches. I haven't yet looked into whether there are other possibilities to speed things up, just thought of sending these out since the numbers are better:

567 cycles/emulated jump instruction
718 cycles/emulated move instruction
730 cycles/emulated arithmetic instruction
946 cycles/emulated memory load instruction
956 cycles/emulated memory store instruction
921 cycles/emulated memory RMW instruction

Old realmode.flat numbers from init ctxt changes - https://lkml.org/lkml/2014/4/16/848

639 cycles/emulated jump instruction (4.3%)
776 cycles/emulated move instruction (7.5%)
791 cycles/emulated arithmetic instruction (11%)
943 cycles/emulated memory load instruction (5.2%)
948 cycles/emulated memory store instruction (7.6%)
929 cycles/emulated memory RMW instruction (9.0%)

Bandan Das (3): KVM: x86: pass ctxt to fetch helper function KVM: x86: use memory_prepare in fetch helper function KVM: x86: cache userspace address for faster fetches arch/x86/include/asm/kvm_emulate.h | 7 +- arch/x86/kvm/x86.c | 46 -- 2 files changed, 40 insertions(+), 13 deletions(-) -- 1.8.3.1
[RFC PATCH 3/3] KVM: x86: cache userspace address for faster fetches
On every instruction fetch, kvm_read_guest_virt_helper does the gva to gpa translation followed by searching for the memslot. Store the gva hva mapping so that if there's a match we can directly call __copy_from_user() Suggested-by: Paolo Bonzini Signed-off-by: Bandan Das --- arch/x86/include/asm/kvm_emulate.h | 7 ++- arch/x86/kvm/x86.c | 33 +++-- 2 files changed, 29 insertions(+), 11 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 085d688..20ccde4 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -323,10 +323,11 @@ struct x86_emulate_ctxt { int (*execute)(struct x86_emulate_ctxt *ctxt); int (*check_perm)(struct x86_emulate_ctxt *ctxt); /* -* The following five fields are cleared together, +* The following six fields are cleared together, * the rest are initialized unconditionally in x86_decode_insn * or elsewhere */ + bool addr_cache_valid; u8 rex_prefix; u8 lock_prefix; u8 rep_prefix; @@ -348,6 +349,10 @@ struct x86_emulate_ctxt { struct fetch_cache fetch; struct read_cache io_read; struct read_cache mem_read; + struct { + gfn_t gfn; + unsigned long uaddr; + } addr_cache; }; /* Repeat String Operation Prefix */ diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index cf69e3b..7afcfc7 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4072,26 +4072,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); int ret; unsigned long uaddr; + gfn_t gfn = addr >> PAGE_SHIFT; - ret = ctxt->ops->memory_prepare(ctxt, addr, toread, - exception, false, - NULL, &uaddr); - if (ret != X86EMUL_CONTINUE) - return ret; + if (ctxt->addr_cache_valid && + (ctxt->addr_cache.gfn == gfn)) + uaddr = (ctxt->addr_cache.uaddr << PAGE_SHIFT) + + offset_in_page(addr); + else { + ret = ctxt->ops->memory_prepare(ctxt, addr, toread, + exception, false, + NULL, &uaddr); + if (ret != X86EMUL_CONTINUE) + 
return ret; + + if (unlikely(kvm_is_error_hva(uaddr))) { + r = X86EMUL_PROPAGATE_FAULT; + return r; + } - if (unlikely(kvm_is_error_hva(uaddr))) { - r = X86EMUL_PROPAGATE_FAULT; - return r + /* Cache gfn and hva */ + ctxt->addr_cache.gfn = addr >> PAGE_SHIFT; + ctxt->addr_cache.uaddr = uaddr >> PAGE_SHIFT; + ctxt->addr_cache_valid = true; } ret = __copy_from_user(data, (void __user *)uaddr, toread); if (ret < 0) { r = X86EMUL_IO_NEEDED; + /* Where else should we invalidate cache ? */ + ctxt->ops->memory_finish(ctxt, NULL, uaddr); return r; } - ctxt->ops->memory_finish(ctxt, NULL, uaddr); - bytes -= toread; data += toread; addr += toread; @@ -4339,6 +4351,7 @@ static void emulator_memory_finish(struct x86_emulate_ctxt *ctxt, struct kvm_memory_slot *memslot; gfn_t gfn; + ctxt->addr_cache_valid = false; if (!opaque) return; -- 1.8.3.1
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, May 05, 2014 at 01:19:30PM +0200, Alexander Graf wrote: > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > >+#ifdef CONFIG_PPC_BOOK3S_64 > >+return vcpu->arch.fault_dar; > > How about PA6T and G5s? G5 sets DAR on an alignment interrupt. As for PA6T, I don't know for sure, but if it doesn't, ordinary alignment interrupts wouldn't be handled properly, since the code in arch/powerpc/kernel/align.c assumes DAR contains the address being accessed on all PowerPC CPUs. Did PA Semi ever publish a user manual for the PA6T, I wonder? Paul.
[RFC PATCH 2/3] KVM: x86: use memory_prepare in fetch helper function
Insn fetch fastpath function. It's not that arch.walk_mmu->gva_to_gpa can't be used, but let's piggyback on top of the interface meant for our purpose. Signed-off-by: Bandan Das --- arch/x86/kvm/x86.c | 25 + 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 17e3d661..cf69e3b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4065,29 +4065,38 @@ static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes, struct x86_exception *exception) { void *data = val; - struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); int r = X86EMUL_CONTINUE; while (bytes) { - gpa_t gpa = vcpu->arch.walk_mmu->gva_to_gpa(vcpu, addr, access, - exception); unsigned offset = addr & (PAGE_SIZE-1); unsigned toread = min(bytes, (unsigned)PAGE_SIZE - offset); int ret; + unsigned long uaddr; - if (gpa == UNMAPPED_GVA) - return X86EMUL_PROPAGATE_FAULT; - ret = kvm_read_guest(vcpu->kvm, gpa, data, toread); + ret = ctxt->ops->memory_prepare(ctxt, addr, toread, + exception, false, + NULL, &uaddr); + if (ret != X86EMUL_CONTINUE) + return ret; + + if (unlikely(kvm_is_error_hva(uaddr))) { + r = X86EMUL_PROPAGATE_FAULT; + return r; + } + + ret = __copy_from_user(data, (void __user *)uaddr, toread); if (ret < 0) { r = X86EMUL_IO_NEEDED; - goto out; + return r; } + ctxt->ops->memory_finish(ctxt, NULL, uaddr); + bytes -= toread; data += toread; addr += toread; } -out: + return r; } -- 1.8.3.1
[RFC PATCH 1/3] KVM: x86: pass ctxt to fetch helper function
In the following patches, our address caching struct that's embedded within struct x86_emulate_ctxt will need to be accessed.

Signed-off-by: Bandan Das
---
 arch/x86/kvm/x86.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 122410d..17e3d661 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -4061,10 +4061,11 @@ gpa_t kvm_mmu_gva_to_gpa_system(struct kvm_vcpu *vcpu, gva_t gva,
 }
 
 static int kvm_read_guest_virt_helper(gva_t addr, void *val, unsigned int bytes,
-				      struct kvm_vcpu *vcpu, u32 access,
+				      struct x86_emulate_ctxt *ctxt, u32 access,
 				      struct x86_exception *exception)
 {
 	void *data = val;
+	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	int r = X86EMUL_CONTINUE;
 
 	while (bytes) {
@@ -4098,7 +4099,7 @@ static int kvm_fetch_guest_virt(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu,
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt,
 					  access | PFERR_FETCH_MASK,
 					  exception);
 }
@@ -4110,7 +4111,7 @@ int kvm_read_guest_virt(struct x86_emulate_ctxt *ctxt,
 	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 	u32 access = (kvm_x86_ops->get_cpl(vcpu) == 3) ? PFERR_USER_MASK : 0;
 
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, access,
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt, access,
 					  exception);
 }
 EXPORT_SYMBOL_GPL(kvm_read_guest_virt);
 
@@ -4119,8 +4120,7 @@ static int kvm_read_guest_virt_system(struct x86_emulate_ctxt *ctxt,
 				      gva_t addr, void *val, unsigned int bytes,
 				      struct x86_exception *exception)
 {
-	struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
-	return kvm_read_guest_virt_helper(addr, val, bytes, vcpu, 0, exception);
+	return kvm_read_guest_virt_helper(addr, val, bytes, ctxt, 0, exception);
 }
 
 int kvm_write_guest_virt_system(struct x86_emulate_ctxt *ctxt,
--
1.8.3.1
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, 2014-05-05 at 16:43 +0200, Alexander Graf wrote: > > Paul mentioned that BOOK3S always had DAR value set on alignment > > interrupt. And the patch is to enable/collect correct DAR value when > > running with Little Endian PR guest. Now to limit the impact and to > > enable Little Endian PR guest, I ended up doing the conditional code > > only for book3s 64 for which we know for sure that we set DAR value. > > Yes, and I'm asking whether we know that this statement holds true for > PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is > at least developed by IBM, I'd assume its semantics here are similar to > POWER4, but for PA6T I wouldn't be so sure. I am not aware of any PowerPC processor that does not set DAR on alignment interrupts. Paul, are you ? Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM exit on UD interception
On Mon, May 5, 2014 at 11:48 AM, Alexandru Duţu wrote: > Thank you Jun! I see that in case of VMX does not emulated the > instruction that produced a UD exception, it just queues the exception > and returns 1. After that KVM will still try to enter virtualized > execution and so forth, the execution probably finishing with a DF and > shut down. It does not seem that KVM, in case of VMX, will exit > immediately on UD. > > I am not sure what you meant with MOVBE emulation. I meant: commit 84cffe499b9418d6c3b4de2ad9599cc2ec50c607 Author: Borislav Petkov Date: Tue Oct 29 12:54:56 2013 +0100 kvm: Emulate MOVBE This basically came from the need to be able to boot 32-bit Atom SMP guests on an AMD host, i.e. a host which doesn't support MOVBE. As a matter of fact, qemu has since recently received MOVBE support but we cannot share that with kvm emulation and thus we have to do this in the host. We're waay faster in kvm anyway. :-) So, we piggyback on the #UD path and emulate the MOVBE functionality. With it, an 8-core SMP guest boots in under 6 seconds. Also, requesting MOVBE emulation needs to happen explicitly to work, i.e. qemu -cpu n270,+movbe... Just FYI, a fairly straight-forward boot of a MOVBE-enabled 3.9-rc6+ kernel in kvm executes MOVBE ~60K times. Signed-off-by: Andre Przywara Signed-off-by: Borislav Petkov Signed-off-by: Paolo Bonzini -- Jun Intel Open Source Technology Center -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
On Mon, 2014-05-05 at 17:16 +0200, Alexander Graf wrote: > Isn't this a greater problem? We should start swapping before we hit > the point where non movable kernel allocation fails, no? Possibly but the fact remains, this can be avoided by making sure that if we create a CMA reserve for KVM, then it uses it rather than using the rest of main memory for hash tables. > The fact that KVM uses a good number of normal kernel pages is maybe > suboptimal, but shouldn't be a critical problem. The point is that we explicitly reserve those pages in CMA for use by KVM for that specific purpose, but the current code tries first to get them out of the normal pool. This is not an optimal behaviour and is what Aneesh patches are trying to fix. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On Mon, 2014-05-05 at 19:56 +0530, Aneesh Kumar K.V wrote: > > Paul mentioned that BOOK3S always had DAR value set on alignment > interrupt. And the patch is to enable/collect correct DAR value when > running with Little Endian PR guest. Now to limit the impact and to > enable Little Endian PR guest, I ended up doing the conditional code > only for book3s 64 for which we know for sure that we set DAR value. Only BookS ? Afaik, the kernel align.c unconditionally uses DAR on every processor type. It's DSISR that may or may not be populated but afaik DAR always is. Cheers, Ben. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] KVM call minutes for 2014-04-29
On Wed, Apr 30, 2014 at 1:20 AM, Juan Quintela wrote:
>
> 2014-04-29
> ----------
>
> - security (CVE)
>   New group to handle these issues responsibly.
>   Mail is still not encrypted, but would be.
>   mst is writing a wiki page about it.
>   What are the criteria for requesting (or not) a CVE number?
>   Look at http://wiki.qemu.org/SecurityProcess
>
> - hot [un]plug for passthrough devices for platform devices
>
>   Lots of discussion about how to do it internally/externally from
>   qemu, both with its [dis]advantages. Basically how to do things
>   there.

I've had a play with QOMifying both memory regions and GPIOs and attaching them via QOM links. Looks viable as a unified solution. Can we discuss it at the next call?

Regards,
Peter

> Later, Juan.
Re: [PATCH v4 5/5] change update_range to handle > 4GB 2nd stage range for ARMv7
Hi Mario,

On Tue, Apr 29, 2014 at 9:06 AM, Mario Smarduch wrote:
>
> This patch adds support for unmapping 2nd stage page tables for addresses >4GB
> on ARMv7.
>
> Signed-off-by: Mario Smarduch
> ---
>  arch/arm/kvm/mmu.c | 20 ++++++++++++--------
>  1 file changed, 12 insertions(+), 8 deletions(-)
>
> diff --git a/arch/arm/kvm/mmu.c b/arch/arm/kvm/mmu.c
> index 88f5503..afbf8ba 100644
> --- a/arch/arm/kvm/mmu.c
> +++ b/arch/arm/kvm/mmu.c
> @@ -176,21 +176,25 @@ static void clear_pte_entry(struct kvm *kvm, pte_t *pte, phys_addr_t addr)
>  	}
>  }
>
> +/* Function shared between identity and 2nd stage mappings. For 2nd stage
> + * the IPA may be > 4GB on ARMv7, and page table range functions
> + * will fail. kvm_xxx_addr_end() is used to handle both cases.
> + */
>  static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
> -			unsigned long long start, u64 size)
> +			phys_addr_t start, u64 size)
>  {
>  	pgd_t *pgd;
>  	pud_t *pud;
>  	pmd_t *pmd;
>  	pte_t *pte;
> -	unsigned long long addr = start, end = start + size;
> -	u64 next;
> +	phys_addr_t addr = start, end = start + size;
> +	phys_addr_t next;
>
>  	while (addr < end) {
>  		pgd = pgdp + pgd_index(addr);
>  		pud = pud_offset(pgd, addr);
>  		if (pud_none(*pud)) {
> -			addr = pud_addr_end(addr, end);
> +			addr = kvm_pud_addr_end(addr, end);
>  			continue;
>  		}
>
> @@ -200,13 +204,13 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
>  		 * move on.
>  		 */
>  		clear_pud_entry(kvm, pud, addr);
> -		addr = pud_addr_end(addr, end);
> +		addr = kvm_pud_addr_end(addr, end);
>  		continue;
>  	}
>
>  	pmd = pmd_offset(pud, addr);
>  	if (pmd_none(*pmd)) {
> -		addr = pmd_addr_end(addr, end);
> +		addr = kvm_pmd_addr_end(addr, end);
>  		continue;
>  	}
>
> @@ -221,10 +225,10 @@ static void unmap_range(struct kvm *kvm, pgd_t *pgdp,
>  	 */
>  	if (kvm_pmd_huge(*pmd) || page_empty(pte)) {
>  		clear_pmd_entry(kvm, pmd, addr);
> -		next = pmd_addr_end(addr, end);
> +		next = kvm_pmd_addr_end(addr, end);
>  		if (page_empty(pmd) && !page_empty(pud)) {
>  			clear_pud_entry(kvm, pud, addr);
> -			next = pud_addr_end(addr, end);
> +			next = kvm_pud_addr_end(addr, end);
>  		}
>  	}
>
> --
> 1.7.9.5

It seems that the kvm_pmd_addr_end(addr, end) you are adding already exists in the following patch, so you may need to remove these parts from your patch:

commit a3c8bd31af260a17d626514f636849ee1cd1f63e
Author: Marc Zyngier
Date:   Tue Feb 18 14:29:03 2014 +

    ARM: KVM: introduce kvm_p*d_addr_end

    The use of p*d_addr_end with stage-2 translation is slightly dodgy,
    as the IPA is 40 bits, while all the p*d_addr_end helpers are taking
    an unsigned long (arm64 is fine with that as unsigned long is 64-bit).

    The fix is to introduce 64-bit clean versions of the same helpers,
    and use them in the stage-2 page table code.

    Signed-off-by: Marc Zyngier
    Acked-by: Catalin Marinas
    Reviewed-by: Christoffer Dall

Gavin
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
Marcin, Can you provide detailed instructions on how to reproduce the problem? Thanks On Mon, May 05, 2014 at 08:27:10PM -0300, Marcelo Tosatti wrote: > On Mon, May 05, 2014 at 08:26:04PM +0200, Marcin Gibuła wrote: > > >>is it possible to have kvmclock jumping forward? > > >> > > >>Because I've reproducible case when at about 1 per 20 vm restores, VM > > >>freezes for couple of hours and then resumes with date few hundreds years > > >>ahead. Happens only with kvmclock. > > >> > > >>And this patch seems to fix very similar issue so maybe it's all the same > > >>bug. > > > > > >I'm fairly sure it is the exact same bug. Jumping backward is like jumping > > >forward by a big amount :) > > > > Hi, > > > > I've tested your path on my test VM... don't know if it's pure luck > > or not, but it didn't hang with over 70 restores. > > > > The message "KVM Clock migrated backwards, using later time" fires > > every time, but VM is healthy after resume. > > What is the host clocksource? (cat > /sys/devices/system/clocksource/clocksource0/current_clocksource). > > And kernel version? > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmclock: Ensure time in migration never goes backward
On Mon, May 05, 2014 at 08:23:43PM -0300, Marcelo Tosatti wrote: > Hi Alexander, > > On Mon, May 05, 2014 at 03:51:22PM +0200, Alexander Graf wrote: > > When we migrate we ask the kernel about its current belief on what the guest > > time would be. > > KVM_GET_CLOCK which returns the time in "struct kvm_clock_data". > > > However, I've seen cases where the kvmclock guest structure > > indicates a time more recent than the kvm returned time. This should not happen because the value returned by KVM_GET_CLOCK (get_kernel_ns() + kvmclock_offset) should be relatively in sync with what is seen in the guest via kvmclock read. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
On Mon, May 05, 2014 at 08:26:04PM +0200, Marcin Gibuła wrote: > >>is it possible to have kvmclock jumping forward? > >> > >>Because I've reproducible case when at about 1 per 20 vm restores, VM > >>freezes for couple of hours and then resumes with date few hundreds years > >>ahead. Happens only with kvmclock. > >> > >>And this patch seems to fix very similar issue so maybe it's all the same > >>bug. > > > >I'm fairly sure it is the exact same bug. Jumping backward is like jumping > >forward by a big amount :) > > Hi, > > I've tested your path on my test VM... don't know if it's pure luck > or not, but it didn't hang with over 70 restores. > > The message "KVM Clock migrated backwards, using later time" fires > every time, but VM is healthy after resume. What is the host clocksource? (cat /sys/devices/system/clocksource/clocksource0/current_clocksource). And kernel version? -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] kvmclock: Ensure time in migration never goes backward
Hi Alexander, On Mon, May 05, 2014 at 03:51:22PM +0200, Alexander Graf wrote: > When we migrate we ask the kernel about its current belief on what the guest > time would be. KVM_GET_CLOCK which returns the time in "struct kvm_clock_data". > However, I've seen cases where the kvmclock guest structure > indicates a time more recent than the kvm returned time. More details please: 1) By what algorithm you retrieve and compare time in kvmclock guest structure and KVM_GET_CLOCK. What are the results of the comparison. And whether and backwards time was visible in the guest. 2) What is the host clocksource. The test below is not a good one because: T1) KVM_GET_CLOCK (save s->clock). T2) save env->tsc. The difference in scaled time between T1 and T2 is larger than 1 nanosecond, so the (time_at_migration > s->clock) check is almost always positive (what matters though is whether time backwards event can be seen reading kvmclock in the guest). > To make sure we never go backwards, calculate what the guest would have seen > as time at the point of migration and use that value instead of the kernel > returned one when it's more recent. > > While this doesn't fix the underlying issue that the kernel's view of time > is skewed, it allows us to safely migrate guests even from sources that are > known broken. 
>
> Signed-off-by: Alexander Graf
> ---
>  hw/i386/kvm/clock.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 48 insertions(+)
>
> diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c
> index 892aa02..c6521cf 100644
> --- a/hw/i386/kvm/clock.c
> +++ b/hw/i386/kvm/clock.c
> @@ -14,6 +14,7 @@
>   */
>
>  #include "qemu-common.h"
> +#include "qemu/host-utils.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "hw/sysbus.h"
> @@ -34,6 +35,47 @@ typedef struct KVMClockState {
>      bool clock_valid;
>  } KVMClockState;
>
> +struct pvclock_vcpu_time_info {
> +    uint32_t version;
> +    uint32_t pad0;
> +    uint64_t tsc_timestamp;
> +    uint64_t system_time;
> +    uint32_t tsc_to_system_mul;
> +    int8_t   tsc_shift;
> +    uint8_t  flags;
> +    uint8_t  pad[2];
> +} __attribute__((__packed__)); /* 32 bytes */
> +
> +static uint64_t kvmclock_current_nsec(KVMClockState *s)
> +{
> +    CPUState *cpu = first_cpu;
> +    CPUX86State *env = cpu->env_ptr;
> +    hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL;
> +    uint64_t migration_tsc = env->tsc;
> +    struct pvclock_vcpu_time_info time;
> +    uint64_t delta;
> +    uint64_t nsec_lo;
> +    uint64_t nsec_hi;
> +    uint64_t nsec;
> +
> +    if (!(env->system_time_msr & 1ULL)) {
> +        /* KVM clock not active */
> +        return 0;
> +    }
> +
> +    cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time));
> +
> +    delta = migration_tsc - time.tsc_timestamp;
> +    if (time.tsc_shift < 0) {
> +        delta >>= -time.tsc_shift;
> +    } else {
> +        delta <<= time.tsc_shift;
> +    }
> +
> +    mulu64(&nsec_lo, &nsec_hi, delta, time.tsc_to_system_mul);
> +    nsec = (nsec_lo >> 32) | (nsec_hi << 32);
> +    return nsec + time.system_time;
> +}
>
>  static void kvmclock_vm_state_change(void *opaque, int running,
>                                       RunState state)
> @@ -45,9 +87,15 @@ static void kvmclock_vm_state_change(void *opaque, int running,
>
>      if (running) {
>          struct kvm_clock_data data;
> +        uint64_t time_at_migration = kvmclock_current_nsec(s);
>
>          s->clock_valid = false;
>
> +        if (time_at_migration > s->clock) {
> +            fprintf(stderr, "KVM Clock migrated backwards, using later time\n");
> +            s->clock = time_at_migration;
> +        }
> +
>          data.clock = s->clock;
>          data.flags = 0;
>          ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data);
> --
> 1.7.12.4
Nested EPT page fault
Hi,

I have one question related to nested EPT page faults. At the very start, the L0 hypervisor launches L2 with an empty EPT0->2 table, building the table on the fly. When an L2 physical page is accessed, ept_page_fault() (paging_tmpl.h) is called in L0 to handle the fault. It first calls ept_walk_addr to get the guest EPT entry from EPT1->2. If there is no such entry, a guest page fault is injected into L1 to handle the fault.

The next time the same L2 physical page is accessed, ept_page_fault is triggered again in L0, which again calls ept_walk_addr and gets the previously filled EPT entry in EPT1->2; then try_async_pf is called to translate the L1 physical page to an L0 physical page. Finally, an entry is created in EPT0->2 to resolve the page fault. Please correct me if I am wrong.

My question is: when is EPT0->1 accessed while the EPT0->2 entry is being created? According to the Turtles paper, both EPT0->1 and EPT1->2 are accessed to populate an entry in EPT0->2.

Thanks for your time!

Best Wishes,
Yaohui
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Am 05.05.14 16:57, schrieb Olof Johansson: [Now without HTML email -- it's what you get for cc:ing me at work instead of my upstream email :)] 2014-05-05 7:43 GMT-07:00 Alexander Graf : On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; How about PA6T and G5s? Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure. Thanks for looking out for us, obviously IBM doesn't (based on the reply a minute ago). In the end, since there's been no work to enable KVM on PA6T, I'm not too worried. I guess it's one more thing to sort out (and check for) whenever someone does that. I definitely don't have cycles to deal with that myself at this time. I can help find hardware for someone who wants to, but even then I'm guessing the interest is pretty limited. 
-Olof -- To unsubscribe from this list: send the line "unsubscribe kvm-ppc" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Just for info: "PR" KVM works great on my PA6T machine. I booted the Lubuntu 14.04 PowerPC live DVD on a QEMU virtual machine with "PR" KVM successfully. But Mac OS X Jaguar, Panther, and Tiger don't boot with KVM on Mac-on-Linux and QEMU. See http://forum.hyperion-entertainment.biz/viewtopic.php?f=35&t=1747. -- Christian -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: x86: improve the usability of the 'kvm_pio' tracepoint
On 02/05/2014 17:57, Ulrich Obergfell wrote:
> This patch moves the 'kvm_pio' tracepoint to emulator_pio_in_emulated() and emulator_pio_out_emulated(), and it adds an argument (a pointer to the 'pio_data'). A single 8-bit, 16-bit, or 32-bit data item is fetched from 'pio_data' (depending on 'size'), and the value is included in the trace record ('val'). If 'count' is greater than one, this is indicated by the string "(...)" in the trace output.

A difference is that the tracepoint will now be reported after an exit to userspace in the case of "in", rather than before. The improvement however is noticeable; especially for "out" it allows much more information about the state of a device to be obtained from a long trace.

Applying to kvm/queue, thanks.

Paolo
Re: x86_64 allyesconfig has screwed up voffset and blows up KVM
On 05/05/2014 11:41 AM, Andy Lutomirski wrote: > I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9. I'm not sure > what's going on here. > > voffset.h contains: > > #define VO__end 0x8111c7a0 > #define VO__end 0x8db9a000 > #define VO__text 0x8100 > > because > > $ nm vmlinux|grep ' _end' > 8111c7a0 t _end > 8db9a000 B _end > The "t _end" implies there is a local symbol _end which I guess the scripts are incorrectly picking up. Taking a look now. -hpa -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PULL 05/20] machine: Replace QEMUMachine by MachineClass in accelerator configuration
From: Marcel Apfelbaum This minimizes QEMUMachine usage, as part of machine QOM-ification. Signed-off-by: Marcel Apfelbaum Signed-off-by: Andreas Färber --- include/hw/boards.h | 3 +-- include/hw/xen/xen.h| 2 +- include/qemu/typedefs.h | 1 + include/sysemu/kvm.h| 2 +- include/sysemu/qtest.h | 2 +- kvm-all.c | 6 +++--- kvm-stub.c | 2 +- qtest.c | 2 +- vl.c| 10 +- xen-all.c | 2 +- xen-stub.c | 2 +- 11 files changed, 17 insertions(+), 17 deletions(-) diff --git a/include/hw/boards.h b/include/hw/boards.h index be2e432..8f53334 100644 --- a/include/hw/boards.h +++ b/include/hw/boards.h @@ -3,12 +3,11 @@ #ifndef HW_BOARDS_H #define HW_BOARDS_H +#include "qemu/typedefs.h" #include "sysemu/blockdev.h" #include "hw/qdev.h" #include "qom/object.h" -typedef struct MachineClass MachineClass; - typedef struct QEMUMachineInitArgs { const MachineClass *machine; ram_addr_t ram_size; diff --git a/include/hw/xen/xen.h b/include/hw/xen/xen.h index 9d549fc..85fda3d 100644 --- a/include/hw/xen/xen.h +++ b/include/hw/xen/xen.h @@ -36,7 +36,7 @@ void xen_cmos_set_s3_resume(void *opaque, int irq, int level); qemu_irq *xen_interrupt_controller_init(void); -int xen_init(QEMUMachine *machine); +int xen_init(MachineClass *mc); int xen_hvm_init(MemoryRegion **ram_memory); void xenstore_store_pv_console_info(int i, struct CharDriverState *chr); diff --git a/include/qemu/typedefs.h b/include/qemu/typedefs.h index bf8daac..86bab12 100644 --- a/include/qemu/typedefs.h +++ b/include/qemu/typedefs.h @@ -31,6 +31,7 @@ typedef struct MemoryListener MemoryListener; typedef struct MemoryMappingList MemoryMappingList; typedef struct QEMUMachine QEMUMachine; +typedef struct MachineClass MachineClass; typedef struct NICInfo NICInfo; typedef struct HCIInfo HCIInfo; typedef struct AudioState AudioState; diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h index 192fe89..5ad4e0e 100644 --- a/include/sysemu/kvm.h +++ b/include/sysemu/kvm.h @@ -152,7 +152,7 @@ extern KVMState *kvm_state; /* external API */ 
-int kvm_init(QEMUMachine *machine); +int kvm_init(MachineClass *mc); int kvm_has_sync_mmu(void); int kvm_has_vcpu_events(void); diff --git a/include/sysemu/qtest.h b/include/sysemu/qtest.h index 224131f..95c9ade 100644 --- a/include/sysemu/qtest.h +++ b/include/sysemu/qtest.h @@ -26,7 +26,7 @@ static inline bool qtest_enabled(void) bool qtest_driver(void); -int qtest_init_accel(QEMUMachine *machine); +int qtest_init_accel(MachineClass *mc); void qtest_init(const char *qtest_chrdev, const char *qtest_log, Error **errp); static inline int qtest_available(void) diff --git a/kvm-all.c b/kvm-all.c index 82a9119..5cb7f26 100644 --- a/kvm-all.c +++ b/kvm-all.c @@ -1341,7 +1341,7 @@ static int kvm_max_vcpus(KVMState *s) return (ret) ? ret : kvm_recommended_vcpus(s); } -int kvm_init(QEMUMachine *machine) +int kvm_init(MachineClass *mc) { static const char upgrade_note[] = "Please upgrade to at least kernel 2.6.29 or recent kvm-kmod\n" @@ -1433,8 +1433,8 @@ int kvm_init(QEMUMachine *machine) } kvm_type = qemu_opt_get(qemu_get_machine_opts(), "kvm-type"); -if (machine->kvm_type) { -type = machine->kvm_type(kvm_type); +if (mc->kvm_type) { +type = mc->kvm_type(kvm_type); } else if (kvm_type) { fprintf(stderr, "Invalid argument kvm-type=%s\n", kvm_type); goto err; diff --git a/kvm-stub.c b/kvm-stub.c index ccdba62..8acda86 100644 --- a/kvm-stub.c +++ b/kvm-stub.c @@ -34,7 +34,7 @@ int kvm_init_vcpu(CPUState *cpu) return -ENOSYS; } -int kvm_init(QEMUMachine *machine) +int kvm_init(MachineClass *mc) { return -ENOSYS; } diff --git a/qtest.c b/qtest.c index 0ac9f42..2aba20d 100644 --- a/qtest.c +++ b/qtest.c @@ -500,7 +500,7 @@ static void qtest_event(void *opaque, int event) } } -int qtest_init_accel(QEMUMachine *machine) +int qtest_init_accel(MachineClass *mc) { configure_icount("0"); diff --git a/vl.c b/vl.c index 2c2b625..f423b2e 100644 --- a/vl.c +++ b/vl.c @@ -2725,7 +2725,7 @@ static MachineClass *machine_parse(const char *name) exit(!name || !is_help_option(name)); } 
-static int tcg_init(QEMUMachine *machine) +static int tcg_init(MachineClass *mc) { tcg_exec_init(tcg_tb_size * 1024 * 1024); return 0; @@ -2735,7 +2735,7 @@ static struct { const char *opt_name; const char *name; int (*available)(void); -int (*init)(QEMUMachine *); +int (*init)(MachineClass *mc); bool *allowed; } accel_list[] = { { "tcg", "tcg", tcg_available, tcg_init, &tcg_allowed }, @@ -2744,7 +2744,7 @@ static struct { { "qtest", "QTest", qtest_available, qtest_init_accel, &qtest_allowed }, }; -static int configure_accelerator(QEMUMachine *machine) +static int configure_accelerator(MachineClass *mc) { const char *p;
Re: KVM exit on UD interception
Thank you Jun! I see that in the case of VMX, KVM does not emulate the instruction that produced the UD exception; it just queues the exception and returns 1. After that, KVM will still try to enter virtualized execution and so forth, with execution probably finishing in a double fault and shutdown. It does not seem that KVM, in the case of VMX, exits immediately on UD.

I am not sure what you meant by MOVBE emulation.

Thanks,
Alex

On Mon, May 5, 2014 at 12:34 PM, Nakajima, Jun wrote:
> On Mon, May 5, 2014 at 8:56 AM, Alexandru Duţu wrote:
>> Dear all,
>>
>> It seems that currently, on UD interception KVM does not exit completely. Virtualized execution finishes, KVM executes ud_intercept(), after which it enters virtualized execution again.
>
> Maybe you might want to take a look at the VMX side (to port it to SVM). The MOVBE emulation, for example, should be helpful.
>
>> I am working on accelerating, with virtualized execution, a simulator that emulates system calls. Essentially this is doing virtualized execution without an OS kernel. In order to make this work, I had to modify the KVM kernel module such that ud_intercept() returns 0 and not 1, which breaks KVM's __vcpu_run loop. This is necessary as I need to trap syscall instructions, exit virtualized execution with a UD exception, emulate the system call in the simulator, and after the system call is done enter back into virtualized mode and start execution with the help of KVM.
>>
>> So by modifying ud_intercept() to return 0, I got all this to work. Is it possible to achieve the same effect (exit on undefined opcode) without modifying ud_intercept()?
>>
>> It seems that re-entering virtualized execution on UD interception gives the user the flexibility of running binaries with newer instructions on older hardware, if kvm is able to emulate the newer instructions.
I do not fully understand the details of this scenario, >> is there such a scenario or is it likely that ud_interception() will >> change? >> >> Thank you in advance! >> >> Best regards, >> Alex >> -- > > -- > Jun > Intel Open Source Technology Center -- Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
x86_64 allyesconfig has screwed up voffset and blows up KVM
I'm testing 39bfe90706ab0f588db7cb4d1c0e6d1181e1d2f9. I'm not sure what's going on here. voffset.h contains: #define VO__end 0x8111c7a0 #define VO__end 0x8db9a000 #define VO__text 0x8100 because $ nm vmlinux|grep ' _end' 8111c7a0 t _end 8db9a000 B _end Booting the resulting image says: KVM internal error. Suberror: 1 emulation failure EAX=8001 EBX= ECX=c080 EDX= ESI=00014630 EDI=0b08f000 EBP=0010 ESP=038f14b8 EIP=00100119 EFL=00010046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0 ES =0018 00c09300 DPL=0 DS [-WA] CS =0010 00c09b00 DPL=0 CS32 [-RA] SS =0018 00c09300 DPL=0 DS [-WA] DS =0018 00c09300 DPL=0 DS [-WA] FS =0018 00c09300 DPL=0 DS [-WA] GS =0018 00c09300 DPL=0 DS [-WA] LDT= 00c0 TR =0020 0fff 00808b00 DPL=0 TSS64-busy GDT= 038e5320 0030 IDT= CR0=8011 CR2= CR3=0b089000 CR4=0020 DR0= DR1= DR2= DR3= DR6=0ff0 DR7=0400 EFER=0500 Code=?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? ?? Linus's tree from today doesn't seem any better. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
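The duplicate define comes from `nm` reporting both a local (`t`) and a global (`B`) `_end`; filtering on an uppercase (global) symbol type disambiguates. A sketch against faked `nm` output — this is not the actual kernel build script, just the filtering idea:

```shell
# Fake the ambiguous "nm vmlinux" output seen above, then keep only the
# global definition (uppercase symbol type), which is the one the
# voffset.h generation should pick.
printf '%s\n' \
    '8111c7a0 t _end' \
    '8db9a000 B _end' \
    '8100 T _text' |
awk '$2 ~ /^[A-Z]$/ && $3 == "_end" { print $1 }'
```

Run as-is, the pipeline prints only the global address (8db9a000), dropping the stray local symbol that confused the script.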
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
is it possible to have kvmclock jumping forward? Because I have a reproducible case where, in about 1 in 20 VM restores, the VM freezes for a couple of hours and then resumes with the date a few hundred years ahead. Happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug. I'm fairly sure it is the exact same bug. Jumping backward is like jumping forward by a big amount :) Hi, I've tested your patch on my test VM... don't know if it's pure luck or not, but it didn't hang in over 70 restores. The message "KVM Clock migrated backwards, using later time" fires every time, but the VM is healthy after resume. -- mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
> On 05.05.2014 at 19:46, Marcin Gibuła wrote: > > On 2014-05-05 15:51, Alexander Graf wrote: >> When we migrate we ask the kernel about its current belief on what the guest >> time would be. However, I've seen cases where the kvmclock guest structure >> indicates a time more recent than the kvm returned time. > > Hi, > > is it possible to have kvmclock jumping forward? > > Because I have a reproducible case where, in about 1 in 20 VM restores, the VM freezes > for a couple of hours and then resumes with the date a few hundred years ahead. > Happens only with kvmclock. > > And this patch seems to fix a very similar issue, so maybe it's all the same bug. I'm fairly sure it is the exact same bug. Jumping backward is like jumping forward by a big amount :) Alex > > -- > mg
Re: [Qemu-devel] [PATCH] kvmclock: Ensure time in migration never goes backward
On 2014-05-05 15:51, Alexander Graf wrote: When we migrate we ask the kernel about its current belief on what the guest time would be. However, I've seen cases where the kvmclock guest structure indicates a time more recent than the kvm returned time. Hi, is it possible to have kvmclock jumping forward? Because I have a reproducible case where, in about 1 in 20 VM restores, the VM freezes for a couple of hours and then resumes with the date a few hundred years ahead. Happens only with kvmclock. And this patch seems to fix a very similar issue, so maybe it's all the same bug. -- mg
Re: KVM exit on UD interception
On Mon, May 5, 2014 at 8:56 AM, Alexandru Duţu wrote: > Dear all, > > It seems that currently, on UD interception, KVM does not exit > completely. Virtualized execution finishes, KVM executes > ud_interception(), after which it enters virtualized execution again. Maybe you might want to take a look at the VMX side (to port it to SVM). The MOVBE emulation, for example, should be helpful. > > I am working on accelerating a simulator with virtualized execution; > the simulator emulates system calls. Essentially, this is virtualized execution > without an OS kernel. In order to make this work, I had to modify > the KVM kernel module such that ud_interception() returns 0 and not 1, > which breaks out of the KVM __vcpu_run loop. This is necessary as I need to trap > syscall instructions, exit virtualized execution with a UD exception, > emulate the system call in the simulator and, after the system call is > done, re-enter virtualized mode and resume execution with the help > of KVM. > > So by modifying ud_interception() to return 0, I got all this to work. Is > it possible to achieve the same effect (exit on undefined opcode) > without modifying ud_interception()? > > It seems that re-entering virtualized execution on UD interception > gives the user the flexibility of running binaries with newer > instructions on older hardware, if KVM is able to emulate the newer > instructions. I do not fully understand the details of this scenario, > is there such a scenario or is it likely that ud_interception() will > change? > > Thank you in advance! > > Best regards, > Alex > -- -- Jun Intel Open Source Technology Center
Re: [PATCH 0/1] KVM: x86: improve the usability of the 'kvm_pio' tracepoint
> From: "Xiao Guangrong" > To: "Ulrich Obergfell" , kvm@vger.kernel.org > Cc: pbonz...@redhat.com > Sent: Monday, May 5, 2014 9:10:19 AM > Subject: Re: [PATCH 0/1] KVM: x86: improve the usability of the 'kvm_pio' > tracepoint > > On 05/02/2014 11:57 PM, Ulrich Obergfell wrote: >> The current implementation of the 'kvm_pio' tracepoint in >> emulator_pio_in_out() >> only tells us that 'something' has been read from or written to an I/O port. >> To >> improve the usability of the tracepoint, I propose to include the >> value/content >> that has been read or written in the trace output. The proposed patch aims at >> the more common case where a single 8-bit or 16-bit or 32-bit value has been >> read or written -- it does not fully cover the case where 'count' is greater >> than one. >> >> This is an example of what the patch can do (trace of PCI config space >> access). >> >> - on the host >> >># trace-cmd record -e kvm:kvm_pio -f "(port >= 0xcf8) && (port <= 0xcff)" >>/sys/kernel/debug/tracing/events/kvm/kvm_pio/filter >>Hit Ctrl^C to stop recording >> >> - in a Linux guest >> >># dd if=/sys/bus/pci/devices/:00:06.0/config bs=2 count=4 | hexdump >>4+0 records in >>4+0 records out >>8 bytes (8 B) copied, 0.000114056 s, 70.1 kB/s >>000 1af4 1001 0507 0010 >>008 >> >> - on the host >> >># trace-cmd report >>... 
>> qemu-kvm-23216 [001] 15211.994089: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>> qemu-kvm-23216 [001] 15211.994108: kvm_pio: pio_read at 0xcfc size 2 count 1 val 0x1af4
>> qemu-kvm-23216 [001] 15211.994129: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>> qemu-kvm-23216 [001] 15211.994136: kvm_pio: pio_read at 0xcfe size 2 count 1 val 0x1001
>> qemu-kvm-23216 [001] 15211.994143: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>> qemu-kvm-23216 [001] 15211.994150: kvm_pio: pio_read at 0xcfc size 2 count 1 val 0x507
>> qemu-kvm-23216 [001] 15211.994155: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>> qemu-kvm-23216 [001] 15211.994161: kvm_pio: pio_read at 0xcfe size 2 count 1 val 0x10

> Nice.
>
> Could you please check "perf kvm stat" to see if "--event=ioport"
> can work after your patch?
>
> Reviewed-by: Xiao Guangrong

I've run a quick test with a local kernel - built from 3.15.0-rc1 source, including the proposed patch - in combination with the 'perf' package that is installed on my test machine. I didn't build a new 'perf' binary from 3.15.0-rc1 source. The following output of the 'perf kvm stat live --event=ioport -d 10' command looks reasonable.
17:10:29.036811

Analyze events for all VMs, all VCPUs:

IO Port Access    Samples  Samples%   Time%  Min Time  Max Time  Avg time
0x177:PIN              35    20.00%  15.40%       1us       3us  1.68us ( +-  8.63% )
0x376:PIN              30    17.14%  16.37%       1us       6us  2.08us ( +- 17.15% )
0x170:POUT             15     8.57%  18.99%       2us       9us  4.83us ( +- 14.34% )
0xc0ea:POUT            10     5.71%   6.57%       2us       2us  2.51us ( +-  5.06% )
0xc0ea:PIN             10     5.71%   6.21%       1us       6us  2.37us ( +- 23.18% )
0x176:POUT             10     5.71%   6.69%       1us       3us  2.55us ( +-  7.59% )
0x170:PIN               5     2.86%   3.36%       2us       2us  2.56us ( +-  1.17% )
0x171:PIN               5     2.86%   1.47%       1us       1us  1.12us ( +-  0.37% )
0x171:POUT              5     2.86%   3.26%       2us       2us  2.49us ( +-  2.25% )
0x172:PIN               5     2.86%   1.45%       1us       1us  1.11us ( +-  0.24% )
0x172:POUT              5     2.86%   2.67%       1us       2us  2.04us ( +-  3.00% )
0x173:PIN               5     2.86%   1.46%       1us       1us  1.11us ( +-  0.29% )
0x173:POUT              5     2.86%   2.60%       1us       2us  1.99us ( +-  2.96% )
0x174:PIN               5     2.86%   1.45%       1us       1us  1.11us ( +-  0.16% )
0x174:POUT              5     2.86%   2.60%       1us       2us  1.99us ( +-  3.13% )
0x175:PIN               5     2.86%   1.46%       1us       1us  1.12us ( +-  0.15% )
0x175:POUT              5     2.86%   2.60%       1us       2us  1.98us ( +-  3.04% )
0x176:PIN               5     2.86%   1.45%       1us       1us  1.11us ( +-  0.23% )
0x177:POUT              5     2.86%   3.94%       2us       3us  3.01us ( +-  2.06% )

Total Samples:
KVM exit on UD interception
Dear all, It seems that currently, on UD interception, KVM does not exit completely. Virtualized execution finishes, KVM executes ud_interception(), after which it enters virtualized execution again. I am working on accelerating a simulator with virtualized execution; the simulator emulates system calls. Essentially, this is virtualized execution without an OS kernel. In order to make this work, I had to modify the KVM kernel module such that ud_interception() returns 0 and not 1, which breaks out of the KVM __vcpu_run loop. This is necessary as I need to trap syscall instructions, exit virtualized execution with a UD exception, emulate the system call in the simulator and, after the system call is done, re-enter virtualized mode and resume execution with the help of KVM. So by modifying ud_interception() to return 0, I got all this to work. Is it possible to achieve the same effect (exit on undefined opcode) without modifying ud_interception()? It seems that re-entering virtualized execution on UD interception gives the user the flexibility of running binaries with newer instructions on older hardware, if KVM is able to emulate the newer instructions. I do not fully understand the details of this scenario; is there such a scenario, or is it likely that ud_interception() will change? Thank you in advance! Best regards, Alex
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Alexander Graf writes: >> On 05.05.2014 at 16:35, "Aneesh Kumar K.V" wrote: >> >> Alexander Graf writes: >> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table. >>> >>> I don't understand the problem. Can you please elaborate? >> >> Let's take a system with 100GB RAM. We reserve around 5GB for htab >> allocation. Now if we use the rest of available memory for hugetlbfs >> (because we want all the guests to be backed by huge pages), we would >> end up in a situation where we have a few GB of free RAM and 5GB of CMA >> reserve area. Now if we allow hash page table allocation to consume the >> free space, we would end up hitting page allocation failure for other >> non movable kernel allocation even though we still have 5GB CMA reserve >> space free. > > Isn't this a greater problem? We should start swapping before we hit > the point where non movable kernel allocation fails, no? But there is nothing much to swap, because most of the memory is reserved for guest RAM via hugetlbfs. > > The fact that KVM uses a good number of normal kernel pages is maybe > suboptimal, but shouldn't be a critical problem. Yes. But then, in this case, we could do better, couldn't we? We already have a large part of guest RAM kept aside for htab allocation which cannot be used for non movable allocation. And we ignore that reserve space and use other areas for hash page table allocation with the current code. We actually hit this case on one of the test boxes.
KVM guest htab at c01e5000 (order 30), LPID 1
libvirtd invoked oom-killer: gfp_mask=0x2000d0, order=0, oom_score_adj=0
libvirtd cpuset=/ mems_allowed=0,16
CPU: 72 PID: 20044 Comm: libvirtd Not tainted 3.10.23-1401.pkvm2_1.4.ppc64 #1
Call Trace:
[c01e3b63f150] [c0017330] .show_stack+0x130/0x200 (unreliable)
[c01e3b63f220] [c087a888] .dump_stack+0x28/0x3c
[c01e3b63f290] [c0876a4c] .dump_header+0xbc/0x228
[c01e3b63f360] [c01dd838] .oom_kill_process+0x318/0x4c0
[c01e3b63f440] [c01de258] .out_of_memory+0x518/0x550
[c01e3b63f520] [c01e5aac] .__alloc_pages_nodemask+0xb3c/0xbf0
[c01e3b63f700] [c0243580] .new_slab+0x440/0x490
[c01e3b63f7a0] [c08781fc] .__slab_alloc+0x17c/0x618
[c01e3b63f8d0] [c02467fc] .kmem_cache_alloc_node_trace+0xcc/0x300
[c01e3b63f990] [c010f62c] .alloc_fair_sched_group+0xfc/0x200
[c01e3b63fa60] [c0104f00] .sched_create_group+0x50/0xe0
[c01e3b63fae0] [c0104fc0] .cpu_cgroup_css_alloc+0x30/0x80
[c01e3b63fb60] [c01513ec] .cgroup_mkdir+0x2bc/0x6e0
[c01e3b63fc50] [c0275aec] .vfs_mkdir+0x14c/0x220
[c01e3b63fcf0] [c027a734] .SyS_mkdirat+0x94/0x110
[c01e3b63fdb0] [c027a7e4] .SyS_mkdir+0x34/0x50
[c01e3b63fe30] [c0009f54] syscall_exit+0x0/0x98
Node 0 DMA free:23424kB min:23424kB low:29248kB high:35136kB active_anon:0kB inactive_anon:128kB active_file:256kB inactive_file:384kB unevictable:9536kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:65931776kB mlocked:9536kB dirty:64kB writeback:0kB mapped:5376kB shmem:0kB slab_reclaimable:23616kB slab_unreclaimable:1237056kB kernel_stack:18256kB pagetables:1088kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:78 all_unreclaimable? yes
lowmem_reserve[]: 0 0 0
Node 16 DMA free:5787008kB min:21376kB low:26688kB high:32064kB active_anon:1984kB inactive_anon:2112kB active_file:896kB inactive_file:64kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:67108864kB managed:60060032kB mlocked:0kB dirty:128kB writeback:3712kB mapped:0kB shmem:0kB slab_reclaimable:23424kB slab_unreclaimable:826048kB kernel_stack:576kB pagetables:1408kB unstable:0kB bounce:0kB free_cma:5767040kB writeback_tmp:0kB pages_scanned:756 all_unreclaimable? yes
Re: [PATCH 08/11] perf kvm: allow for variable string sizes
On 5/5/14, 4:27 AM, Christian Borntraeger wrote:

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 922706c..806c0e4 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -75,7 +75,7 @@ struct kvm_events_ops {
 	bool (*is_end_event)(struct perf_evsel *evsel,
 			     struct perf_sample *sample, struct event_key *key);
 	void (*decode_key)(struct perf_kvm_stat *kvm, struct event_key *key,
-			   char decode[20]);
+			   char *decode);
 	const char *name;
 };

@@ -84,6 +84,8 @@ struct exit_reasons_table {
 	const char *reason;
 };

+#define DECODE_STR_LEN_MAX 80
+
 #define EVENTS_BITS 12
 #define EVENTS_CACHE_SIZE (1UL << EVENTS_BITS)

@@ -101,6 +103,8 @@ struct perf_kvm_stat {
 	struct exit_reasons_table *exit_reasons;
 	const char *exit_reasons_isa;

+	int decode_str_len;
+

This should not be a part of the perf_kvm_stat struct. Just leave it as a macro and use DECODE_STR_LEN_MAX in place of 20. Which means DECODE_STR_LEN_MAX needs to be 20 in this patch, and arch-specific in the follow-up patch.

 	struct kvm_events_ops *events_ops;
 	key_cmp_fun compare;
 	struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];

@@ -182,12 +186,12 @@ static const char *get_exit_reason(struct perf_kvm_stat *kvm,
 static void exit_event_decode_key(struct perf_kvm_stat *kvm,
 				  struct event_key *key,
-				  char decode[20])
+				  char *decode)
 {
 	const char *exit_reason = get_exit_reason(kvm, kvm->exit_reasons,
 						  key->key);

-	scnprintf(decode, 20, "%s", exit_reason);
+	scnprintf(decode, kvm->decode_str_len, "%s", exit_reason);
 }

 static struct kvm_events_ops exit_events = {

@@ -249,10 +253,11 @@ static bool mmio_event_end(struct perf_evsel *evsel, struct perf_sample *sample,
 static void mmio_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
 				  struct event_key *key,
-				  char decode[20])
+				  char *decode)
 {
-	scnprintf(decode, 20, "%#lx:%s", (unsigned long)key->key,
-		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
+	scnprintf(decode, kvm->decode_str_len, "%#lx:%s",
+		  (unsigned long)key->key,
+		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
 }

 static struct kvm_events_ops mmio_events = {

@@ -292,10 +297,11 @@ static bool ioport_event_end(struct perf_evsel *evsel,
 static void ioport_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
 				    struct event_key *key,
-				    char decode[20])
+				    char *decode)
 {
-	scnprintf(decode, 20, "%#llx:%s", (unsigned long long)key->key,
-		  key->info ? "POUT" : "PIN");
+	scnprintf(decode, kvm->decode_str_len, "%#llx:%s",
+		  (unsigned long long)key->key,
+		  key->info ? "POUT" : "PIN");
 }

 static struct kvm_events_ops ioport_events = {

@@ -523,13 +529,13 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
 	time_diff = sample->time - time_begin;

 	if (kvm->duration && time_diff > kvm->duration) {
-		char decode[32];
+		char decode[DECODE_STR_LEN_MAX];

 		kvm->events_ops->decode_key(kvm, &event->key, decode);
 		if (strcmp(decode, "HLT")) {
-			pr_info("%" PRIu64 " VM %d, vcpu %d: %s event took %" PRIu64 "usec\n",
+			pr_info("%" PRIu64 " VM %d, vcpu %d: %*s event took %" PRIu64 "usec\n",
 				sample->time, sample->pid, vcpu_record->vcpu_id,
-				decode, time_diff/1000);
+				32, decode, time_diff/1000);

This pr_info does not need the length.

David
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
> On 05.05.2014 at 16:35, "Aneesh Kumar K.V" wrote: > > Alexander Graf writes: > >>> On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: >>> We reserve 5% of total ram for CMA allocation and not using that can >>> result in us running out of numa node memory with specific >>> configuration. One caveat is we may not have node local hpt with pinned >>> vcpu configuration. But currently libvirt also pins the vcpu to cpuset >>> after creating hash page table. >> >> I don't understand the problem. Can you please elaborate? > > Let's take a system with 100GB RAM. We reserve around 5GB for htab > allocation. Now if we use rest of available memory for hugetlbfs > (because we want all the guest to be backed by huge pages), we would > end up in a situation where we have a few GB of free RAM and 5GB of CMA > reserve area. Now if we allow hash page table allocation to consume the > free space, we would end up hitting page allocation failure for other > non movable kernel allocation even though we still have 5GB CMA reserve > space free. Isn't this a greater problem? We should start swapping before we hit the point where non movable kernel allocation fails, no? The fact that KVM uses a good number of normal kernel pages is maybe suboptimal, but shouldn't be a critical problem. Alex > > -aneesh >
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
> On 05.05.2014 at 16:50, "Aneesh Kumar K.V" wrote: > > Alexander Graf writes: > >>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >>> Alexander Graf writes: >>> > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > Although it's optional IBM POWER cpus always had DAR value set on > alignment interrupt. So don't try to compute these values. > > Signed-off-by: Aneesh Kumar K.V > --- > Changes from V3: > * Use make_dsisr instead of checking feature flag to decide whether to use >saved dsisr or not >>> >>> > ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) > { > +#ifdef CONFIG_PPC_BOOK3S_64 > +return vcpu->arch.fault_dar; How about PA6T and G5s? >>> Paul mentioned that BOOK3S always had DAR value set on alignment >>> interrupt. And the patch is to enable/collect correct DAR value when >>> running with Little Endian PR guest. Now to limit the impact and to >>> enable Little Endian PR guest, I ended up doing the conditional code >>> only for book3s 64 for which we know for sure that we set DAR value. >> >> Yes, and I'm asking whether we know that this statement holds true for >> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is >> at least developed by IBM, I'd assume its semantics here are similar to >> POWER4, but for PA6T I wouldn't be so sure. > > I will have to defer to Paul on that question. But that should not > prevent this patch from going upstream, right? Regressions are big no-gos. Alex > > -aneesh >
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
> On 05.05.2014 at 16:57, Olof Johansson wrote: > > [Now without HTML email -- it's what you get for cc:ing me at work > instead of my upstream email :)] > > 2014-05-05 7:43 GMT-07:00 Alexander Graf : >> >>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >>> >>> Alexander Graf writes: >>> > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > > Although it's optional IBM POWER cpus always had DAR value set on > alignment interrupt. So don't try to compute these values. > > Signed-off-by: Aneesh Kumar K.V > --- > Changes from V3: > * Use make_dsisr instead of checking feature flag to decide whether to use >saved dsisr or not >>> >>> > ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) > { > +#ifdef CONFIG_PPC_BOOK3S_64 > + return vcpu->arch.fault_dar; How about PA6T and G5s? >>> Paul mentioned that BOOK3S always had DAR value set on alignment >>> interrupt. And the patch is to enable/collect correct DAR value when >>> running with Little Endian PR guest. Now to limit the impact and to >>> enable Little Endian PR guest, I ended up doing the conditional code >>> only for book3s 64 for which we know for sure that we set DAR value. >> >> Yes, and I'm asking whether we know that this statement holds true for PA6T >> and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least >> developed by IBM, I'd assume its semantics here are similar to POWER4, but >> for PA6T I wouldn't be so sure. > > Thanks for looking out for us, obviously IBM doesn't (based on the > reply a minute ago). > > In the end, since there's been no work to enable KVM on PA6T, I'm not > too worried. I guess it's one more thing to sort out (and check for) > whenever someone does that. > > I definitely don't have cycles to deal with that myself at this time. > I can help find hardware for someone who wants to, but even then I'm > guessing the interest is pretty limited. 
I know of at least one person who successfully runs PR KVM on a PA6T, so it's neither neglected nor non-working. If you can get me access to a PA6T system I can easily check whether alignment interrupts generate DAR and DSISR properly :). Alex
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
2014-05-05 8:03 GMT-07:00 Aneesh Kumar K.V : > Olof Johansson writes: > >> 2014-05-05 7:43 GMT-07:00 Alexander Graf : >> >>> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >>> Alexander Graf writes: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > >> Although it's optional IBM POWER cpus always had DAR value set on >> alignment interrupt. So don't try to compute these values. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> Changes from V3: >> * Use make_dsisr instead of checking feature flag to decide whether to >> use >> saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) >>{ >> +#ifdef CONFIG_PPC_BOOK3S_64 >> + return vcpu->arch.fault_dar; >> > How about PA6T and G5s? > > > Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. >>> >>> Yes, and I'm asking whether we know that this statement holds true for >>> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at >>> least developed by IBM, I'd assume its semantics here are similar to >>> POWER4, but for PA6T I wouldn't be so sure. >>> >>> >> Thanks for looking out for us, obviously IBM doesn't (based on the reply a >> minute ago). > > The reason I deferred the question to Paul is really because I don't > know enough about PA6T and G5 to comment. I intentionally restricted the > changes to BOOK3S_64 because I wanted to make sure I don't break > anything else. It is in no way to hint that others don't care. Ah, I see -- the disconnect is that you don't think PA6T and 970 are 64-bit book3s CPUs. They are. 
-Olof
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Olof Johansson writes: > 2014-05-05 7:43 GMT-07:00 Alexander Graf : > >> On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >> >>> Alexander Graf writes: >>> >>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: > Although it's optional IBM POWER cpus always had DAR value set on > alignment interrupt. So don't try to compute these values. > > Signed-off-by: Aneesh Kumar K.V > --- > Changes from V3: > * Use make_dsisr instead of checking feature flag to decide whether to > use > saved dsisr or not > > >>> >>> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) >{ > +#ifdef CONFIG_PPC_BOOK3S_64 > + return vcpu->arch.fault_dar; > How about PA6T and G5s? Paul mentioned that BOOK3S always had DAR value set on alignment >>> interrupt. And the patch is to enable/collect correct DAR value when >>> running with Little Endian PR guest. Now to limit the impact and to >>> enable Little Endian PR guest, I ended up doing the conditional code >>> only for book3s 64 for which we know for sure that we set DAR value. >>> >> >> Yes, and I'm asking whether we know that this statement holds true for >> PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at >> least developed by IBM, I'd assume its semantics here are similar to >> POWER4, but for PA6T I wouldn't be so sure. >> >> > Thanks for looking out for us, obviously IBM doesn't (based on the reply a > minute ago). The reason I deferred the question to Paul is really because I don't know enough about PA6T and G5 to comment. I intentionally restricted the changes to BOOK3S_64 because I wanted to make sure I don't break anything else. It is in no way to hint that others don't care. -aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
[Now without HTML email -- it's what you get for cc:ing me at work instead of my upstream email :)] 2014-05-05 7:43 GMT-07:00 Alexander Graf : > > On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >> >> Alexander Graf writes: >> >>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; >>> >>> How about PA6T and G5s? >>> >>> >> Paul mentioned that BOOK3S always had DAR value set on alignment >> interrupt. And the patch is to enable/collect correct DAR value when >> running with Little Endian PR guest. Now to limit the impact and to >> enable Little Endian PR guest, I ended up doing the conditional code >> only for book3s 64 for which we know for sure that we set DAR value. > > > Yes, and I'm asking whether we know that this statement holds true for PA6T > and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least > developed by IBM, I'd assume its semantics here are similar to POWER4, but > for PA6T I wouldn't be so sure. > Thanks for looking out for us, obviously IBM doesn't (based on the reply a minute ago). In the end, since there's been no work to enable KVM on PA6T, I'm not too worried. I guess it's one more thing to sort out (and check for) whenever someone does that. I definitely don't have cycles to deal with that myself at this time. I can help find hardware for someone who wants to, but even then I'm guessing the interest is pretty limited. -Olof
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Alexander Graf writes: > On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: >> Alexander Graf writes: >> >>> On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; >>> How about PA6T and G5s? >>> >>> >> Paul mentioned that BOOK3S always had DAR value set on alignment >> interrupt. And the patch is to enable/collect correct DAR value when >> running with Little Endian PR guest. Now to limit the impact and to >> enable Little Endian PR guest, I ended up doing the conditional code >> only for book3s 64 for which we know for sure that we set DAR value. > > Yes, and I'm asking whether we know that this statement holds true for > PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is > at least developed by IBM, I'd assume its semantics here are similar to > POWER4, but for PA6T I wouldn't be so sure. I will have to defer to Paul on that question. But that should not prevent this patch from going upstream, right? -aneesh
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
Alexander Graf writes: > On 05/05/2014 04:38 PM, Aneesh Kumar K.V wrote: >> Alexander Graf writes: >> >>> On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: > When running on a POWER8 host, we get away with running the guest as > POWER7 > and nothing falls apart. > > However, when we start exposing POWER8 as guest CPU, guests will start > using > new abilities on POWER8 which we need to handle. > > This patch set does a minimalistic approach to implementing those bits to > make guests happy enough to run. > > > Alex > > Alexander Graf (6): > KVM: PPC: Book3S PR: Ignore PMU SPRs > KVM: PPC: Book3S PR: Emulate TIR register > KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR > KVM: PPC: Book3S PR: Expose TAR facility to guest > KVM: PPC: Book3S PR: Expose EBB registers > KVM: PPC: Book3S PR: Expose TM registers > >arch/powerpc/include/asm/kvm_asm.h| 18 --- >arch/powerpc/include/asm/kvm_book3s_asm.h | 2 + >arch/powerpc/include/asm/kvm_host.h | 3 ++ >arch/powerpc/kernel/asm-offsets.c | 3 ++ >arch/powerpc/kvm/book3s.c | 34 + >arch/powerpc/kvm/book3s_emulate.c | 53 >arch/powerpc/kvm/book3s_hv.c | 30 --- >arch/powerpc/kvm/book3s_pr.c | 82 > +++ >arch/powerpc/kvm/book3s_segment.S | 25 ++ >9 files changed, 212 insertions(+), 38 deletions(-) > I did most of this as part of [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com Any reason why that is not picked up ? TM was the reason I didn't push the patchset again. I was not sure how to get all the TM details to work. >>> Ugh, I guess I mostly discarded it as brainstorm patches because they >>> were marked RFC :( >>> >> Do you want me to rework them ?. I guess facility unavailable part and >> TM part in this series are better than what I had. Rest all are more or >> less similar. Or you could cherry pick the SPR handling you haven't >> added yet from this series ? 
> > I personally refuse to apply patches that are marked RFC, since IMHO on > those the author himself isn't sure he wants them applied yet :). > > I'd say I'll just apply mine after another autotest run and then you > rebase your things on top and fill the gaps with a real, non-RFC patch set. Will do -aneesh -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
On 05/05/2014 04:38 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: When running on a POWER8 host, we get away with running the guest as POWER7 and nothing falls apart. However, when we start exposing POWER8 as guest CPU, guests will start using new abilities on POWER8 which we need to handle. This patch set does a minimalistic approach to implementing those bits to make guests happy enough to run. Alex Alexander Graf (6): KVM: PPC: Book3S PR: Ignore PMU SPRs KVM: PPC: Book3S PR: Emulate TIR register KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR KVM: PPC: Book3S PR: Expose TAR facility to guest KVM: PPC: Book3S PR: Expose EBB registers KVM: PPC: Book3S PR: Expose TM registers arch/powerpc/include/asm/kvm_asm.h| 18 --- arch/powerpc/include/asm/kvm_book3s_asm.h | 2 + arch/powerpc/include/asm/kvm_host.h | 3 ++ arch/powerpc/kernel/asm-offsets.c | 3 ++ arch/powerpc/kvm/book3s.c | 34 + arch/powerpc/kvm/book3s_emulate.c | 53 arch/powerpc/kvm/book3s_hv.c | 30 --- arch/powerpc/kvm/book3s_pr.c | 82 +++ arch/powerpc/kvm/book3s_segment.S | 25 ++ 9 files changed, 212 insertions(+), 38 deletions(-) I did most of this as part of [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com Any reason why that is not picked up ? TM was the reason I didn't push the patchset again. I was not sure how to get all the TM details to work. Ugh, I guess I mostly discarded it as brainstorm patches because they were marked RFC :( Do you want me to rework them ?. I guess facility unavailable part and TM part in this series are better than what I had. Rest all are more or less similar. Or you could cherry pick the SPR handling you haven't added yet from this series ? I personally refuse to apply patches that are marked RFC, since IMHO on those the author himself isn't sure he wants them applied yet :). 
I'd say I'll just apply mine after another autotest run and then you rebase your things on top and fill the gaps with a real, non-RFC patch set. Alex
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
Alexander Graf writes: > On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote: >> Signed-off-by: Aneesh Kumar K.V > > No patch description, no proper explanations anywhere why you're doing > what. All of that in a pretty sensitive piece of code. There's no way > this patch can go upstream in its current form. > Sorry about being vague. Will add a better commit message. The goal is to export MPSS support to the guest if the host supports the same. MPSS support is exported via the penc encodings in "ibm,segment-page-sizes". The actual format can be found at htab_dt_scan_page_sizes. When the guest memory is backed by hugetlbfs we expose the penc encodings the host supports to the guest via kvmppc_add_seg_page_size. Now the challenge for THP support is to make sure that our H_ENTER, H_REMOVE etc. handlers decode the base page size and actual page size correctly from the hash table entry values. Most of the changes are to do that. The rest is already handled by KVM. NOTE: It is much easier to read the code after applying the patch rather than reading the diff. I have added comments around each step in the code. -aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 05/05/2014 04:26 PM, Aneesh Kumar K.V wrote: Alexander Graf writes: On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: Although it's optional IBM POWER cpus always had DAR value set on alignment interrupt. So don't try to compute these values. Signed-off-by: Aneesh Kumar K.V --- Changes from V3: * Use make_dsisr instead of checking feature flag to decide whether to use saved dsisr or not ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) { +#ifdef CONFIG_PPC_BOOK3S_64 + return vcpu->arch.fault_dar; How about PA6T and G5s? Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. Yes, and I'm asking whether we know that this statement holds true for PA6T and G5 chips which I wouldn't consider IBM POWER. Since the G5 is at least developed by IBM, I'd assume its semantics here are similar to POWER4, but for PA6T I wouldn't be so sure. Alex -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
Alexander Graf writes: > On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: >> Alexander Graf writes: >> >>> When running on a POWER8 host, we get away with running the guest as POWER7 >>> and nothing falls apart. >>> >>> However, when we start exposing POWER8 as guest CPU, guests will start using >>> new abilities on POWER8 which we need to handle. >>> >>> This patch set does a minimalistic approach to implementing those bits to >>> make guests happy enough to run. >>> >>> >>> Alex >>> >>> Alexander Graf (6): >>>KVM: PPC: Book3S PR: Ignore PMU SPRs >>>KVM: PPC: Book3S PR: Emulate TIR register >>>KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR >>>KVM: PPC: Book3S PR: Expose TAR facility to guest >>>KVM: PPC: Book3S PR: Expose EBB registers >>>KVM: PPC: Book3S PR: Expose TM registers >>> >>> arch/powerpc/include/asm/kvm_asm.h| 18 --- >>> arch/powerpc/include/asm/kvm_book3s_asm.h | 2 + >>> arch/powerpc/include/asm/kvm_host.h | 3 ++ >>> arch/powerpc/kernel/asm-offsets.c | 3 ++ >>> arch/powerpc/kvm/book3s.c | 34 + >>> arch/powerpc/kvm/book3s_emulate.c | 53 >>> arch/powerpc/kvm/book3s_hv.c | 30 --- >>> arch/powerpc/kvm/book3s_pr.c | 82 >>> +++ >>> arch/powerpc/kvm/book3s_segment.S | 25 ++ >>> 9 files changed, 212 insertions(+), 38 deletions(-) >>> >> I did most of this as part of >> >> [RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support >> http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com >> >> Any reason why that is not picked up ? TM was the reason I didn't push the >> patchset again. I was not sure how to get all the TM details to >> work. > > Ugh, I guess I mostly discarded it as brainstorm patches because they > were marked RFC :( > Do you want me to rework them ?. I guess facility unavailable part and TM part in this series are better than what I had. Rest all are more or less similar. Or you could cherry pick the SPR handling you haven't added yet from this series ? 
-aneesh
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
Alexander Graf writes: > On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: >> We reserve 5% of total ram for CMA allocation and not using that can >> result in us running out of numa node memory with specific >> configuration. One caveat is we may not have node local hpt with pinned >> vcpu configuration. But currently libvirt also pins the vcpu to cpuset >> after creating hash page table. > > I don't understand the problem. Can you please elaborate? > > Let's take a system with 100GB RAM. We reserve around 5GB for htab allocation. Now if we use the rest of the available memory for hugetlbfs (because we want all the guests to be backed by huge pages), we would end up in a situation where we have a few GB of free RAM and 5GB of CMA reserve area. Now if we allow hash page table allocation to consume the free space, we would end up hitting page allocation failures for other non-movable kernel allocations even though we still have 5GB of CMA reserve space free. -aneesh
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
Alexander Graf writes: > On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote: >> Although it's optional IBM POWER cpus always had DAR value set on >> alignment interrupt. So don't try to compute these values. >> >> Signed-off-by: Aneesh Kumar K.V >> --- >> Changes from V3: >> * Use make_dsisr instead of checking feature flag to decide whether to use >>saved dsisr or not >> >> ulong kvmppc_alignment_dar(struct kvm_vcpu *vcpu, unsigned int inst) >> { >> +#ifdef CONFIG_PPC_BOOK3S_64 >> +return vcpu->arch.fault_dar; > > How about PA6T and G5s? > > Paul mentioned that BOOK3S always had DAR value set on alignment interrupt. And the patch is to enable/collect correct DAR value when running with Little Endian PR guest. Now to limit the impact and to enable Little Endian PR guest, I ended up doing the conditional code only for book3s 64 for which we know for sure that we set DAR value. -aneesh
Re: [PATCH v4] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING
Il 05/05/2014 16:21, Christian Borntraeger ha scritto: On 28/04/14 18:39, Paolo Bonzini wrote: From: Christian Borntraeger Given all your work, What about From: Paolo Bonzini plus "Based on an inital patch from Christian Borntraeger" No big deal, I don't care about authorship that much. @@ -221,17 +225,18 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) unsigned long flags = (unsigned long)key; struct kvm_kernel_irq_routing_entry *irq; struct kvm *kvm = irqfd->kvm; + int idx; if (flags & POLLIN) { - rcu_read_lock(); - irq = rcu_dereference(irqfd->irq_entry); + idx = srcu_read_lock(&kvm->irq_srcu); + irq = srcu_dereference(irqfd->irq_entry, &kvm->irq_srcu); /* An event has been signaled, inject an interrupt */ if (irq) kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, false); else schedule_work(&irqfd->inject); - rcu_read_unlock(); + srcu_read_unlock(&kvm->irq_srcu, idx); } if (flags & POLLHUP) { @@ -363,7 +368,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) } list_add_rcu(&irqfd->resampler_link, &irqfd->resampler->list); - synchronize_rcu(); + synchronize_srcu(&kvm->irq_srcu); No idea what resampler is, can this become time critical as well - iow do we need expedited here? It's for level-triggered interrupts. I decided that if synchronize_rcu was good enough before, synchronize_srcu will do after the patch. @@ -85,7 +86,7 @@ void kvm_unregister_irq_ack_notifier(struct kvm *kvm, mutex_lock(&kvm->irq_lock); hlist_del_init_rcu(&kian->link); mutex_unlock(&kvm->irq_lock); - synchronize_rcu(); + synchronize_srcu_expedited(&kvm->irq_srcu); Hmm, looks like all callers are slow path (shutdown, deregister assigned dev). Couldnt we use the non expedited variant? ... but I have screwed up this one. Thanks, I'll change it. 
r = kvm_arch_init_vm(kvm, type); if (r) - goto out_err_nodisable; + goto out_err_no_disable; r = hardware_enable_all(); if (r) - goto out_err_nodisable; + goto out_err_no_disable; #ifdef CONFIG_HAVE_KVM_IRQCHIP INIT_HLIST_HEAD(&kvm->mask_notifier_list); @@ -473,10 +473,12 @@ static struct kvm *kvm_create_vm(unsigned long type) r = -ENOMEM; kvm->memslots = kzalloc(sizeof(struct kvm_memslots), GFP_KERNEL); if (!kvm->memslots) - goto out_err_nosrcu; + goto out_err_no_srcu; kvm_init_memslots_id(kvm); if (init_srcu_struct(&kvm->srcu)) - goto out_err_nosrcu; + goto out_err_no_srcu; + if (init_srcu_struct(&kvm->irq_srcu)) + goto out_err_no_irq_srcu; for (i = 0; i < KVM_NR_BUSES; i++) { kvm->buses[i] = kzalloc(sizeof(struct kvm_io_bus), GFP_KERNEL); @@ -505,10 +507,12 @@ static struct kvm *kvm_create_vm(unsigned long type) return kvm; out_err: + cleanup_srcu_struct(&kvm->irq_srcu); +out_err_no_irq_srcu: cleanup_srcu_struct(&kvm->srcu); -out_err_nosrcu: +out_err_no_srcu: hardware_disable_all(); -out_err_nodisable: +out_err_no_disable: the patch would be smaller without this change, but it makes the naming more consistent, so ok. Yeah, out_err_noirq_srcu or out_err_noirqsrcu are both very ugly. Thanks for the review, I'm making the small change to remove expedited and applying to kvm/queue. Paolo for (i = 0; i < KVM_NR_BUSES; i++) kfree(kvm->buses[i]); kfree(kvm->memslots); -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v4] kvm/irqchip: Speed up KVM_SET_GSI_ROUTING
On 28/04/14 18:39, Paolo Bonzini wrote: > From: Christian Borntraeger Given all your work, What about From: Paolo Bonzini plus "Based on an inital patch from Christian Borntraeger" > > When starting lots of dataplane devices the bootup takes very long on > Christian's s390 with irqfd patches. With larger setups he is even > able to trigger some timeouts in some components. Turns out that the > KVM_SET_GSI_ROUTING ioctl takes very long (strace claims up to 0.1 sec) > when having multiple CPUs. This is caused by the synchronize_rcu and > the HZ=100 of s390. By changing the code to use a private srcu we can > speed things up. This patch reduces the boot time till mounting root > from 8 to 2 seconds on my s390 guest with 100 disks. > > Uses of hlist_for_each_entry_rcu, hlist_add_head_rcu, hlist_del_init_rcu > are fine because they do not have lockdep checks (hlist_for_each_entry_rcu > uses rcu_dereference_raw rather than rcu_dereference, and write-sides > do not do rcu lockdep at all). > > Note that we're hardly relying on the "sleepable" part of srcu. We just > want SRCU's faster detection of grace periods. > > Testing was done by Andrew Theurer using NETPERF. The difference between > results "before" and "after" the patch has mean -0.2% and standard deviation > 0.6%. Using a paired t-test on the data points says that there is a 2.5% > probability that the patch is the cause of the performance difference > (rather than a random fluctuation). > > Cc: Marcelo Tosatti > Cc: Michael S. Tsirkin > Signed-off-by: Christian Borntraeger > Signed-off-by: Paolo Bonzini Some questions regarding expedided vs. non expedited and a comment without a necessary action. 
Otherwise Reviewed-by: Christian Borntraeger Tested-by: Christian Borntraeger # on s390 > --- > include/linux/kvm_host.h | 1 + > virt/kvm/eventfd.c | 25 +++-- > virt/kvm/irq_comm.c | 17 + > virt/kvm/irqchip.c | 31 --- > virt/kvm/kvm_main.c | 16 ++-- > 5 files changed, 51 insertions(+), 39 deletions(-) > > diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h > index 820fc2e1d9df..cd0df9a9352d 100644 > --- a/include/linux/kvm_host.h > +++ b/include/linux/kvm_host.h > @@ -368,6 +368,7 @@ struct kvm { > struct mm_struct *mm; /* userspace tied to this vm */ > struct kvm_memslots *memslots; > struct srcu_struct srcu; > + struct srcu_struct irq_srcu; > #ifdef CONFIG_KVM_APIC_ARCHITECTURE > u32 bsp_vcpu_id; > #endif > diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c > index 912ec5a95e2c..20c3af7692c5 100644 > --- a/virt/kvm/eventfd.c > +++ b/virt/kvm/eventfd.c > @@ -31,6 +31,7 @@ > #include > #include > #include > +#include > #include > > #include "iodev.h" > @@ -118,19 +119,22 @@ static void > irqfd_resampler_ack(struct kvm_irq_ack_notifier *kian) > { > struct _irqfd_resampler *resampler; > + struct kvm *kvm; > struct _irqfd *irqfd; > + int idx; > > resampler = container_of(kian, struct _irqfd_resampler, notifier); > + kvm = resampler->kvm; > > - kvm_set_irq(resampler->kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID, > + kvm_set_irq(kvm, KVM_IRQFD_RESAMPLE_IRQ_SOURCE_ID, > resampler->notifier.gsi, 0, false); > > - rcu_read_lock(); > + idx = srcu_read_lock(&kvm->irq_srcu); > > list_for_each_entry_rcu(irqfd, &resampler->list, resampler_link) > eventfd_signal(irqfd->resamplefd, 1); > > - rcu_read_unlock(); > + srcu_read_unlock(&kvm->irq_srcu, idx); > } > > static void > @@ -142,7 +146,7 @@ irqfd_resampler_shutdown(struct _irqfd *irqfd) > mutex_lock(&kvm->irqfds.resampler_lock); > > list_del_rcu(&irqfd->resampler_link); > - synchronize_rcu(); > + synchronize_srcu(&kvm->irq_srcu); > > if (list_empty(&resampler->list)) { > list_del(&resampler->link); > @@ -221,17 +225,18 
@@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int > sync, void *key) > unsigned long flags = (unsigned long)key; > struct kvm_kernel_irq_routing_entry *irq; > struct kvm *kvm = irqfd->kvm; > + int idx; > > if (flags & POLLIN) { > - rcu_read_lock(); > - irq = rcu_dereference(irqfd->irq_entry); > + idx = srcu_read_lock(&kvm->irq_srcu); > + irq = srcu_dereference(irqfd->irq_entry, &kvm->irq_srcu); > /* An event has been signaled, inject an interrupt */ > if (irq) > kvm_set_msi(irq, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1, > false); > else > schedule_work(&irqfd->inject); > - rcu_read_unlock(); > + srcu_read_unlock(&kvm->irq_srcu, idx); > } > > if (flags & POLLHUP) { > @@ -363,7 +368,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args) >
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On Mon, 2014-05-05 at 13:56 +0200, Alexander Graf wrote: > On 05/05/2014 03:27 AM, Gavin Shan wrote: > > The series of patches intends to support EEH for PCI devices, which have > > been > > passed through to PowerKVM based guest via VFIO. The implementation is > > straightforward based on the issues or problems we have to resolve to > > support > > EEH for PowerKVM based guest. > > > > - Emulation for EEH RTAS requests. Thanksfully, we already have > > infrastructure > >to emulate XICS. Without introducing new mechanism, we just extend that > >existing infrastructure to support EEH RTAS emulation. EEH RTAS requests > >initiated from guest are posted to host where the requests get handled or > >delivered to underly firmware for further handling. For that, the host > > kerenl > >has to maintain the PCI address (host domain/bus/slot/function to guest's > >PHB BUID/bus/slot/function) mapping via KVM VFIO device. The address > > mapping > >will be built when initializing VFIO device in QEMU and destroied when > > the > >VFIO device in QEMU is going to offline, or VM is destroy. > > Do you also expose all those interfaces to user space? VFIO is as much > about user space device drivers as it is about device assignment. > > I would like to first see an implementation that doesn't touch KVM > emulation code at all but instead routes everything through QEMU. As a > second step we can then accelerate performance critical paths inside of KVM. > > That way we ensure that user space device drivers have all the power > over a device they need to drive it. +1 -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] kvmclock: Ensure time in migration never goes backward
When we migrate we ask the kernel about its current belief on what the guest time would be. However, I've seen cases where the kvmclock guest structure indicates a time more recent than the kvm returned time. To make sure we never go backwards, calculate what the guest would have seen as time at the point of migration and use that value instead of the kernel returned one when it's more recent. While this doesn't fix the underlying issue that the kernel's view of time is skewed, it allows us to safely migrate guests even from sources that are known broken. Signed-off-by: Alexander Graf --- hw/i386/kvm/clock.c | 48 1 file changed, 48 insertions(+) diff --git a/hw/i386/kvm/clock.c b/hw/i386/kvm/clock.c index 892aa02..c6521cf 100644 --- a/hw/i386/kvm/clock.c +++ b/hw/i386/kvm/clock.c @@ -14,6 +14,7 @@ */ #include "qemu-common.h" +#include "qemu/host-utils.h" #include "sysemu/sysemu.h" #include "sysemu/kvm.h" #include "hw/sysbus.h" @@ -34,6 +35,47 @@ typedef struct KVMClockState { bool clock_valid; } KVMClockState; +struct pvclock_vcpu_time_info { +uint32_t version; +uint32_t pad0; +uint64_t tsc_timestamp; +uint64_t system_time; +uint32_t tsc_to_system_mul; +int8_t tsc_shift; +uint8_t flags; +uint8_t pad[2]; +} __attribute__((__packed__)); /* 32 bytes */ + +static uint64_t kvmclock_current_nsec(KVMClockState *s) +{ +CPUState *cpu = first_cpu; +CPUX86State *env = cpu->env_ptr; +hwaddr kvmclock_struct_pa = env->system_time_msr & ~1ULL; +uint64_t migration_tsc = env->tsc; +struct pvclock_vcpu_time_info time; +uint64_t delta; +uint64_t nsec_lo; +uint64_t nsec_hi; +uint64_t nsec; + +if (!(env->system_time_msr & 1ULL)) { +/* KVM clock not active */ +return 0; +} + +cpu_physical_memory_read(kvmclock_struct_pa, &time, sizeof(time)); + +delta = migration_tsc - time.tsc_timestamp; +if (time.tsc_shift < 0) { +delta >>= -time.tsc_shift; +} else { +delta <<= time.tsc_shift; +} + +mulu64(&nsec_lo, &nsec_hi, delta, time.tsc_to_system_mul); +nsec = (nsec_lo >> 32) | (nsec_hi << 32); 
+return nsec + time.system_time; +} static void kvmclock_vm_state_change(void *opaque, int running, RunState state) @@ -45,9 +87,15 @@ static void kvmclock_vm_state_change(void *opaque, int running, if (running) { struct kvm_clock_data data; +uint64_t time_at_migration = kvmclock_current_nsec(s); s->clock_valid = false; +if (time_at_migration > s->clock) { +fprintf(stderr, "KVM Clock migrated backwards, using later time\n"); +s->clock = time_at_migration; +} + data.clock = s->clock; data.flags = 0; ret = kvm_vm_ioctl(kvm_state, KVM_SET_CLOCK, &data); -- 1.7.12.4
Re: [PATCH 09/11] perf kvm: use defines of kvm events
Il 25/04/2014 11:12, Christian Borntraeger ha scritto: From: Alexander Yarygin Currently perf-kvm uses string literals for kvm event names, but it works only for x86, because other architectures may have other names for those events. This patch introduces defines for kvm_entry and kvm_exit events and lets perf-kvm replace literals. Signed-off-by: Alexander Yarygin Reviewed-by: Cornelia Huck Signed-off-by: Christian Borntraeger --- arch/x86/include/uapi/asm/kvm.h | 8 tools/perf/builtin-kvm.c| 10 -- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h index d3a8778..88c0099 100644 --- a/arch/x86/include/uapi/asm/kvm.h +++ b/arch/x86/include/uapi/asm/kvm.h @@ -8,6 +8,8 @@ #include #include +#include +#include #define DE_VECTOR 0 #define DB_VECTOR 1 @@ -342,4 +344,10 @@ struct kvm_xcrs { struct kvm_sync_regs { }; +#define VCPU_ID "vcpu_id" + +#define KVM_ENTRY "kvm:kvm_entry" +#define KVM_EXIT "kvm:kvm_exit" +#define KVM_EXIT_REASON "exit_reason" + #endif /* _ASM_X86_KVM_H */ What about adding a new asm/kvm-perf.h header instead? 1) I don't like very much the namespace pollution that the first hunk causes (and the second one isn't really pretty either). 2) perf doesn't need most of uapi/asm/kvm.h, in fact it only needs a couple of #defines because it is a dependency of uapi/asm/svm.h. So it is uapi/asm/svm.h that should include uapi/asm/kvm.h, not perf. 
Paolo diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c index 806c0e4..9a162ae 100644 --- a/tools/perf/builtin-kvm.c +++ b/tools/perf/builtin-kvm.c @@ -30,8 +30,6 @@ #include #ifdef HAVE_KVM_STAT_SUPPORT -#include -#include #include struct event_key { @@ -130,12 +128,12 @@ static void exit_event_get_key(struct perf_evsel *evsel, struct event_key *key) { key->info = 0; - key->key = perf_evsel__intval(evsel, sample, "exit_reason"); + key->key = perf_evsel__intval(evsel, sample, KVM_EXIT_REASON); } static bool kvm_exit_event(struct perf_evsel *evsel) { - return !strcmp(evsel->name, "kvm:kvm_exit"); + return !strcmp(evsel->name, KVM_EXIT); } static bool exit_event_begin(struct perf_evsel *evsel, @@ -151,7 +149,7 @@ static bool exit_event_begin(struct perf_evsel *evsel, static bool kvm_entry_event(struct perf_evsel *evsel) { - return !strcmp(evsel->name, "kvm:kvm_entry"); + return !strcmp(evsel->name, KVM_ENTRY); } static bool exit_event_end(struct perf_evsel *evsel, @@ -557,7 +555,7 @@ struct vcpu_event_record *per_vcpu_record(struct thread *thread, return NULL; } - vcpu_record->vcpu_id = perf_evsel__intval(evsel, sample, "vcpu_id"); + vcpu_record->vcpu_id = perf_evsel__intval(evsel, sample, VCPU_ID); thread->priv = vcpu_record; } -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH RFC 00/22] EEH Support for VFIO PCI devices on PowerKVM guest
On 05/05/2014 03:27 AM, Gavin Shan wrote: The series of patches intends to support EEH for PCI devices, which have been passed through to PowerKVM based guest via VFIO. The implementation is straightforward based on the issues or problems we have to resolve to support EEH for PowerKVM based guest. - Emulation for EEH RTAS requests. Thankfully, we already have infrastructure to emulate XICS. Without introducing a new mechanism, we just extend that existing infrastructure to support EEH RTAS emulation. EEH RTAS requests initiated from the guest are posted to the host where the requests get handled or delivered to the underlying firmware for further handling. For that, the host kernel has to maintain the PCI address (host domain/bus/slot/function to guest's PHB BUID/bus/slot/function) mapping via the KVM VFIO device. The address mapping will be built when initializing the VFIO device in QEMU and destroyed when the VFIO device in QEMU is going offline, or the VM is destroyed. Do you also expose all those interfaces to user space? VFIO is as much about user space device drivers as it is about device assignment. I would like to first see an implementation that doesn't touch KVM emulation code at all but instead routes everything through QEMU. As a second step we can then accelerate performance critical paths inside of KVM. That way we ensure that user space device drivers have all the power over a device they need to drive it. Alex
Re: [RFC PATCH] KVM: PPC: BOOK3S: HV: THP support for guest
On 05/04/2014 07:30 PM, Aneesh Kumar K.V wrote: Signed-off-by: Aneesh Kumar K.V No patch description, no proper explanations anywhere why you're doing what. All of that in a pretty sensitive piece of code. There's no way this patch can go upstream in its current form. Alex
Re: [PATCH] KVM: PPC: BOOK3S: PR: Fix WARN_ON with debug options on
On 05/04/2014 07:26 PM, Aneesh Kumar K.V wrote: With debug option "sleep inside atomic section checking" enabled we get the below WARN_ON during a PR KVM boot. This is because upstream now have PREEMPT_COUNT enabled even if we have preempt disabled. Fix the warning by adding preempt_disable/enable around floating point and altivec enable. WARNING: at arch/powerpc/kernel/process.c:156 Modules linked in: kvm_pr kvm CPU: 1 PID: 3990 Comm: qemu-system-ppc Tainted: GW 3.15.0-rc1+ #4 task: c000eb85b3a0 ti: c000ec59c000 task.ti: c000ec59c000 NIP: c0015c84 LR: d3334644 CTR: c0015c00 REGS: c000ec59f140 TRAP: 0700 Tainted: GW (3.15.0-rc1+) MSR: 80029032 CR: 4224 XER: 2000 CFAR: c0015c24 SOFTE: 1 GPR00: d3334644 c000ec59f3c0 c0e2fa40 c000e2f8 GPR04: 0800 2000 0001 8000 GPR08: 0001 0001 2000 c0015c00 GPR12: d333da18 cfb80900 GPR16: 3fffce4e0fa1 GPR20: 0010 0001 0002 100b9a38 GPR24: 0002 0013 GPR28: c000eb85b3a0 2000 c000e2f8 NIP [c0015c84] .enable_kernel_fp+0x84/0x90 LR [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] Call Trace: [c000ec59f3c0] [0010] 0x10 (unreliable) [c000ec59f430] [d3334644] .kvmppc_handle_ext+0x134/0x190 [kvm_pr] [c000ec59f4c0] [d324b380] .kvmppc_set_msr+0x30/0x50 [kvm] [c000ec59f530] [d3337cac] .kvmppc_core_emulate_op_pr+0x16c/0x5e0 [kvm_pr] [c000ec59f5f0] [d324a944] .kvmppc_emulate_instruction+0x284/0xa80 [kvm] [c000ec59f6c0] [d3336888] .kvmppc_handle_exit_pr+0x488/0xb70 [kvm_pr] [c000ec59f790] [d3338d34] kvm_start_lightweight+0xcc/0xdc [kvm_pr] [c000ec59f960] [d3336288] .kvmppc_vcpu_run_pr+0xc8/0x190 [kvm_pr] [c000ec59f9f0] [d324c880] .kvmppc_vcpu_run+0x30/0x50 [kvm] [c000ec59fa60] [d3249e74] .kvm_arch_vcpu_ioctl_run+0x54/0x1b0 [kvm] [c000ec59faf0] [d3244948] .kvm_vcpu_ioctl+0x478/0x760 [kvm] [c000ec59fcb0] [c0224e34] .do_vfs_ioctl+0x4d4/0x790 [c000ec59fd90] [c0225148] .SyS_ioctl+0x58/0xb0 [c000ec59fe30] [c000a1e4] syscall_exit+0x0/0x98 Signed-off-by: Aneesh Kumar K.V Thanks, applied to kvm-ppc-queue. 
Alex
Re: [PATCH] KVM: PPC: BOOK3S: HV: Don't try to allocate from kernel page allocator for hash page table.
On 05/04/2014 07:25 PM, Aneesh Kumar K.V wrote: We reserve 5% of total ram for CMA allocation and not using that can result in us running out of numa node memory with specific configuration. One caveat is we may not have node local hpt with pinned vcpu configuration. But currently libvirt also pins the vcpu to cpuset after creating hash page table. I don't understand the problem. Can you please elaborate? Alex Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/kvm/book3s_64_mmu_hv.c | 23 ++- 1 file changed, 6 insertions(+), 17 deletions(-) diff --git a/arch/powerpc/kvm/book3s_64_mmu_hv.c b/arch/powerpc/kvm/book3s_64_mmu_hv.c index fb25ebc0af0c..f32896ffd784 100644 --- a/arch/powerpc/kvm/book3s_64_mmu_hv.c +++ b/arch/powerpc/kvm/book3s_64_mmu_hv.c @@ -52,7 +52,7 @@ static void kvmppc_rmap_reset(struct kvm *kvm); long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) { - unsigned long hpt; + unsigned long hpt = 0; struct revmap_entry *rev; struct page *page = NULL; long order = KVM_DEFAULT_HPT_ORDER; @@ -64,22 +64,11 @@ long kvmppc_alloc_hpt(struct kvm *kvm, u32 *htab_orderp) } kvm->arch.hpt_cma_alloc = 0; - /* -* try first to allocate it from the kernel page allocator. -* We keep the CMA reserved for failed allocation. 
-*/ - hpt = __get_free_pages(GFP_KERNEL | __GFP_ZERO | __GFP_REPEAT | - __GFP_NOWARN, order - PAGE_SHIFT); - - /* Next try to allocate from the preallocated pool */ - if (!hpt) { - VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER); - page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT)); - if (page) { - hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); - kvm->arch.hpt_cma_alloc = 1; - } else - --order; + VM_BUG_ON(order < KVM_CMA_CHUNK_ORDER); + page = kvm_alloc_hpt(1 << (order - PAGE_SHIFT)); + if (page) { + hpt = (unsigned long)pfn_to_kaddr(page_to_pfn(page)); + kvm->arch.hpt_cma_alloc = 1; } /* Lastly try successively smaller sizes from the page allocator */
Re: [PATCH V4] POWERPC: BOOK3S: KVM: Use the saved dar value and generic make_dsisr
On 05/04/2014 07:21 PM, Aneesh Kumar K.V wrote:

Although it's optional, IBM POWER CPUs have always had the DAR value set on alignment interrupts, so don't try to compute these values.

Signed-off-by: Aneesh Kumar K.V
---
Changes from V3:
* Use make_dsisr instead of checking a feature flag to decide whether to use the saved dsisr or not

 arch/powerpc/include/asm/disassemble.h | 34 +++
 arch/powerpc/kernel/align.c            | 34 +--
 arch/powerpc/kvm/book3s_emulate.c      | 43 --
 3 files changed, 40 insertions(+), 71 deletions(-)

diff --git a/arch/powerpc/include/asm/disassemble.h b/arch/powerpc/include/asm/disassemble.h
index 856f8deb557a..6330a61b875a 100644
--- a/arch/powerpc/include/asm/disassemble.h
+++ b/arch/powerpc/include/asm/disassemble.h
@@ -81,4 +81,38 @@ static inline unsigned int get_oc(u32 inst)
 {
 	return (inst >> 11) & 0x7fff;
 }
+
+#define IS_XFORM(inst)	(get_op(inst) == 31)
+#define IS_DSFORM(inst)	(get_op(inst) >= 56)
+
+/*
+ * Create a DSISR value from the instruction
+ */
+static inline unsigned make_dsisr(unsigned instr)
+{
+	unsigned dsisr;
+
+	/* bits 6:15 --> 22:31 */
+	dsisr = (instr & 0x03ff0000) >> 16;
+
+	if (IS_XFORM(instr)) {
+		/* bits 29:30 --> 15:16 */
+		dsisr |= (instr & 0x00000006) << 14;
+		/* bit 25 --> 17 */
+		dsisr |= (instr & 0x00000040) << 8;
+		/* bits 21:24 --> 18:21 */
+		dsisr |= (instr & 0x00000780) << 3;
+	} else {
+		/* bit 5 --> 17 */
+		dsisr |= (instr & 0x04000000) >> 12;
+		/* bits 1: 4 --> 18:21 */
+		dsisr |= (instr & 0x78000000) >> 17;
+		/* bits 30:31 --> 12:13 */
+		if (IS_DSFORM(instr))
+			dsisr |= (instr & 0x00000003) << 18;
+	}
+
+	return dsisr;
+}
 #endif /* __ASM_PPC_DISASSEMBLE_H__ */
diff --git a/arch/powerpc/kernel/align.c b/arch/powerpc/kernel/align.c
index 94908af308d8..34f55524d456 100644
--- a/arch/powerpc/kernel/align.c
+++ b/arch/powerpc/kernel/align.c
@@ -25,14 +25,13 @@
 #include
 #include
 #include
+#include

 struct aligninfo {
 	unsigned char len;
 	unsigned char flags;
 };

-#define IS_XFORM(inst)	(((inst) >> 26) == 31)
-#define IS_DSFORM(inst)	(((inst) >> 26) >= 56)

 #define INVALID	{ 0, 0 }

@@ -192,37 +191,6 @@ static struct aligninfo aligninfo[128] = {
 };

 /*
- * Create a DSISR value from the instruction
- */
-static inline unsigned make_dsisr(unsigned instr)
-{
-	unsigned dsisr;
-
-	/* bits 6:15 --> 22:31 */
-	dsisr = (instr & 0x03ff0000) >> 16;
-
-	if (IS_XFORM(instr)) {
-		/* bits 29:30 --> 15:16 */
-		dsisr |= (instr & 0x00000006) << 14;
-		/* bit 25 --> 17 */
-		dsisr |= (instr & 0x00000040) << 8;
-		/* bits 21:24 --> 18:21 */
-		dsisr |= (instr & 0x00000780) << 3;
-	} else {
-		/* bit 5 --> 17 */
-		dsisr |= (instr & 0x04000000) >> 12;
-		/* bits 1: 4 --> 18:21 */
-		dsisr |= (instr & 0x78000000) >> 17;
-		/* bits 30:31 --> 12:13 */
-		if (IS_DSFORM(instr))
-			dsisr |= (instr & 0x00000003) << 18;
-	}
-
-	return dsisr;
-}
-
-/*
  * The dcbz (data cache block zero) instruction
  * gives an alignment fault if used on non-cacheable
  * memory. We handle the fault mainly for the
diff --git a/arch/powerpc/kvm/book3s_emulate.c b/arch/powerpc/kvm/book3s_emulate.c
index 99d40f8977e8..04c38f049dfd 100644
--- a/arch/powerpc/kvm/book3s_emulate.c
+++ b/arch/powerpc/kvm/book3s_emulate.c
@@ -569,48 +569,14 @@ unprivileged:

 u32 kvmppc_alignment_dsisr(struct kvm_vcpu *vcpu, unsigned int inst)
 {
-	u32 dsisr = 0;
-
-	/*
-	 * This is what the spec says about DSISR bits (not mentioned = 0):
-	 *
-	 * 12:13 [DS]	Set to bits 30:31
-	 * 15:16 [X]	Set to bits 29:30
-	 * 17    [X]	Set to bit 25
-	 *       [D/DS]	Set to bit 5
-	 * 18:21 [X]	Set to bits 21:24
-	 *       [D/DS]	Set to bits 1:4
-	 * 22:26	Set to bits 6:10 (RT/RS/FRT/FRS)
-	 * 27:31	Set to bits 11:15 (RA)
-	 */
-
-	switch (get_op(inst)) {
-	/* D-form */
-	case OP_LFS:
-	case OP_LFD:
-	case OP_STFD:
-	case OP_STFS:
-		dsisr |= (inst >> 12) & 0x4000;	/* bit 17 */
-		dsisr |= (inst >> 17) & 0x3c00;	/* bits 18:21 */
-		break;
-	/* X-form */
-	case 31:
-		dsisr |= (inst << 14) & 0x18000;	/* bits 15:16 */
-		dsisr |= (inst <
Re: [PATCH 0/6] KVM: PPC: Book3S PR: Add POWER8 support
On 05/04/2014 06:36 PM, Aneesh Kumar K.V wrote: Alexander Graf writes:

When running on a POWER8 host, we get away with running the guest as POWER7 and nothing falls apart. However, when we start exposing POWER8 as the guest CPU, guests will start using new POWER8 abilities which we need to handle. This patch set takes a minimalistic approach to implementing those bits, to make guests happy enough to run.

Alex

Alexander Graf (6):
  KVM: PPC: Book3S PR: Ignore PMU SPRs
  KVM: PPC: Book3S PR: Emulate TIR register
  KVM: PPC: Book3S PR: Handle Facility interrupt and FSCR
  KVM: PPC: Book3S PR: Expose TAR facility to guest
  KVM: PPC: Book3S PR: Expose EBB registers
  KVM: PPC: Book3S PR: Expose TM registers

 arch/powerpc/include/asm/kvm_asm.h        | 18 ---
 arch/powerpc/include/asm/kvm_book3s_asm.h |  2 +
 arch/powerpc/include/asm/kvm_host.h       |  3 ++
 arch/powerpc/kernel/asm-offsets.c         |  3 ++
 arch/powerpc/kvm/book3s.c                 | 34 +
 arch/powerpc/kvm/book3s_emulate.c         | 53
 arch/powerpc/kvm/book3s_hv.c              | 30 ---
 arch/powerpc/kvm/book3s_pr.c              | 82 +++
 arch/powerpc/kvm/book3s_segment.S         | 25 ++
 9 files changed, 212 insertions(+), 38 deletions(-)

I did most of this as part of
[RFC PATCH 01/10] KVM: PPC: BOOK3S: PR: Add POWER8 support
http://mid.gmane.org/1390927455-3312-1-git-send-email-aneesh.ku...@linux.vnet.ibm.com

Any reason why that is not picked up? TM was the reason I didn't push the patch set again; I was not sure how to get all the TM details to work.

Ugh, I guess I mostly discarded it as brainstorm patches because they were marked RFC :(

Alex
Re: [PATCH V5] KVM: PPC: BOOK3S: PR: Enable Little Endian PR guest
On 05/05/2014 05:09 AM, Aneesh Kumar K.V wrote:

This patch makes sure we inherit the LE bit correctly in the different cases, so that we can run a little-endian distro in PR mode.

Signed-off-by: Aneesh Kumar K.V

Thanks, applied to kvm-ppc-queue.

Alex
Re: [PATCH 11/11] perf kvm: add stat support on s390
On 25/04/14 11:12, Christian Borntraeger wrote:
> +#if defined(__i386__) || defined(__x86_64__)
>  	else if (!strcmp(kvm->report_event, "mmio"))
>  		kvm->events_ops = &mmio_events;
>  	else if (!strcmp(kvm->report_event, "ioport"))
>  		kvm->events_ops = &ioport_events;
> +#endif

To address David's review, the next version will have this hunk as well:

diff --git a/tools/perf/Documentation/perf-kvm.txt b/tools/perf/Documentation/perf-kvm.txt
index 52276a6..e974749 100644
--- a/tools/perf/Documentation/perf-kvm.txt
+++ b/tools/perf/Documentation/perf-kvm.txt
@@ -103,8 +103,8 @@ STAT REPORT OPTIONS
 	analyze events which occures on this vcpu. (default: all vcpus)

 --event=<value>::
-	event to be analyzed. Possible values: vmexit, mmio, ioport.
-	(default: vmexit)
+	event to be analyzed. Possible values: vmexit, mmio (x86 only),
+	ioport (x86 only). (default: vmexit)

 -k::
 --key=<value>::
 	Sorting key. Possible values: sample (default, sort by samples
Re: [PATCH/RFC 00/11] perf/s390/kvm: trace events, perf kvm stat
On 02/05/14 20:14, David Ahern wrote:
> On 5/2/14, 3:16 AM, Jiri Olsa wrote:
[...]
>> CC-ing David Ahern
>
> I don't have the original emails, but looking at
> https://lkml.org/lkml/2014/4/25/331
>
> [PATCH 01/11] s390: add sie exit reasons tables
> [PATCH 02/11] KVM: s390: Use trace tables from sie.h
> [PATCH 03/11] KVM: s390: decoder of SIE intercepted instructions
> [PATCH 04/11] KVM: s390: Use intercept_insn decoder in trace event
> - not perf related
>
> [PATCH 05/11] perf kvm: Intoduce HAVE_KVM_STAT_SUPPORT flag
> [PATCH 06/11] perf kvm: simplify of exit reasons tables definitions
> [PATCH 07/11] perf kvm: Refactoring of cpu_isa_config()
> [PATCH 10/11] perf: allow to use cpuinfo on s390
> Reviewed-by: David Ahern
>
> [PATCH 09/11] perf kvm: use defines of kvm events
> - KVM team should ack kvm.h change

Paolo, any chance to ack these changes?

> - perf side looks fine to me
>
> [PATCH 11/11] perf kvm: add stat support on s390
> - like to see the arch bits moved to arch/x86 and arch/s390 rather than
>   adding #ifdefs
> - disabling ioport and mmio options is ok, but if you are going to compile
>   it out, update the documentation accordingly.
>
> David

Thanks. The question now is how to proceed:

Patches 1-4 are s390/kvm specific. I am the s390/kvm maintainer, so I can hereby ack them.
Patches 5-10 are perf specific.
Patch 11 is s390/kvm/perf specific and needs both patch series as a base.

I see several variants for the next submission:
a: all patches via Paolo's KVM tree
b: all patches via the perf tree (e.g. via Jiri)
c: via both trees (e.g. I prepare a git branch based on 3.15-rc1 so that during the next merge window the common history should make most things work out fine)
d: patches 1-4 via KVM, patches 5-10 via perf, patch 11 after both trees are merged

Christian
Re: [PATCH 08/11] perf kvm: allow for variable string sizes
David, thanks for the review. Are you ok with this change as well? The alternative is to shorten our descriptions (in 1/11 "s390: add sie exit reasons tables"), which would make the trace output less comprehensible, though.

Christian

On 25/04/14 11:12, Christian Borntraeger wrote:
> From: Alexander Yarygin
>
> This makes it possible for other architectures to decode to different
> string lengths.
>
> Needed by follow-up patch "perf kvm: add stat support on s390".
>
> Signed-off-by: Alexander Yarygin
> Signed-off-by: Christian Borntraeger
> ---
>  tools/perf/builtin-kvm.c | 38 +++++++++++++++++++++++---------------
>  1 file changed, 23 insertions(+), 15 deletions(-)
>
> diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
> index 922706c..806c0e4 100644
> --- a/tools/perf/builtin-kvm.c
> +++ b/tools/perf/builtin-kvm.c
> @@ -75,7 +75,7 @@ struct kvm_events_ops {
>  	bool (*is_end_event)(struct perf_evsel *evsel,
>  			     struct perf_sample *sample, struct event_key *key);
>  	void (*decode_key)(struct perf_kvm_stat *kvm, struct event_key *key,
> -			   char decode[20]);
> +			   char *decode);
>  	const char *name;
>  };
>
> @@ -84,6 +84,8 @@ struct exit_reasons_table {
>  	const char *reason;
>  };
>
> +#define DECODE_STR_LEN_MAX 80
> +
>  #define EVENTS_BITS		12
>  #define EVENTS_CACHE_SIZE	(1UL << EVENTS_BITS)
>
> @@ -101,6 +103,8 @@ struct perf_kvm_stat {
>  	struct exit_reasons_table *exit_reasons;
>  	const char *exit_reasons_isa;
>
> +	int decode_str_len;
> +
>  	struct kvm_events_ops *events_ops;
>  	key_cmp_fun compare;
>  	struct list_head kvm_events_cache[EVENTS_CACHE_SIZE];
> @@ -182,12 +186,12 @@ static const char *get_exit_reason(struct perf_kvm_stat *kvm,
>
>  static void exit_event_decode_key(struct perf_kvm_stat *kvm,
>  				  struct event_key *key,
> -				  char decode[20])
> +				  char *decode)
>  {
>  	const char *exit_reason = get_exit_reason(kvm, kvm->exit_reasons,
>  						  key->key);
>
> -	scnprintf(decode, 20, "%s", exit_reason);
> +	scnprintf(decode, kvm->decode_str_len, "%s", exit_reason);
>  }
>
>  static struct kvm_events_ops exit_events = {
> @@ -249,10 +253,11 @@ static bool mmio_event_end(struct perf_evsel *evsel, struct perf_sample *sample,
>
>  static void mmio_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
>  				  struct event_key *key,
> -				  char decode[20])
> +				  char *decode)
>  {
> -	scnprintf(decode, 20, "%#lx:%s", (unsigned long)key->key,
> -		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
> +	scnprintf(decode, kvm->decode_str_len, "%#lx:%s",
> +		  (unsigned long)key->key,
> +		  key->info == KVM_TRACE_MMIO_WRITE ? "W" : "R");
>  }
>
>  static struct kvm_events_ops mmio_events = {
> @@ -292,10 +297,11 @@ static bool ioport_event_end(struct perf_evsel *evsel,
>
>  static void ioport_event_decode_key(struct perf_kvm_stat *kvm __maybe_unused,
>  				    struct event_key *key,
> -				    char decode[20])
> +				    char *decode)
>  {
> -	scnprintf(decode, 20, "%#llx:%s", (unsigned long long)key->key,
> -		  key->info ? "POUT" : "PIN");
> +	scnprintf(decode, kvm->decode_str_len, "%#llx:%s",
> +		  (unsigned long long)key->key,
> +		  key->info ? "POUT" : "PIN");
>  }
>
>  static struct kvm_events_ops ioport_events = {
> @@ -523,13 +529,13 @@ static bool handle_end_event(struct perf_kvm_stat *kvm,
>  	time_diff = sample->time - time_begin;
>
>  	if (kvm->duration && time_diff > kvm->duration) {
> -		char decode[32];
> +		char decode[DECODE_STR_LEN_MAX];
>
>  		kvm->events_ops->decode_key(kvm, &event->key, decode);
>  		if (strcmp(decode, "HLT")) {
> -			pr_info("%" PRIu64 " VM %d, vcpu %d: %s event took %" PRIu64 "usec\n",
> +			pr_info("%" PRIu64 " VM %d, vcpu %d: %*s event took %" PRIu64 "usec\n",
>  				sample->time, sample->pid, vcpu_record->vcpu_id,
> -				decode, time_diff/1000);
> +				32, decode, time_diff/1000);
>  		}
>  	}
>
> @@ -738,7 +744,7 @@ static void show_timeofday(void)
>
>  static void print_result(struct perf_kvm_stat *kvm)
>  {
> -	char decode[20];
> +	char decode[DECODE_STR_LEN_MAX];
>  	struct kvm_event *event;
>  	int vcpu = kvm->trace_vcpu;
>
> @@ -749,7 +755,7 @@ static void print_result(struct perf_kvm_stat *kvm)
>
>  	pr_info("\n\n");
>  	print_vcpu_info(kvm);
> -	pr_info("%2
Re: [patch] KVM: s390: return -EFAULT if copy_from_user() fails
On 03/05/14 22:18, Dan Carpenter wrote:
> When copy_from_user() fails, this code returns the number of bytes
> remaining instead of a negative error code. The positive number is
> returned to the user but otherwise it is harmless.
>
> Signed-off-by: Dan Carpenter

Thanks. Applied to the KVM/s390 fix queue.

> ---
> I am not able to compile this.
>
> diff --git a/arch/s390/kvm/guestdbg.c b/arch/s390/kvm/guestdbg.c
> index 757ccef..3e8d409 100644
> --- a/arch/s390/kvm/guestdbg.c
> +++ b/arch/s390/kvm/guestdbg.c
> @@ -223,9 +223,10 @@ int kvm_s390_import_bp_data(struct kvm_vcpu *vcpu,
>  		goto error;
>  	}
>
> -	ret = copy_from_user(bp_data, dbg->arch.hw_bp, size);
> -	if (ret)
> +	if (copy_from_user(bp_data, dbg->arch.hw_bp, size)) {
> +		ret = -EFAULT;
>  		goto error;
> +	}
>
>  	for (i = 0; i < dbg->arch.nr_hw_bp; i++) {
>  		switch (bp_data[i].type) {
Re: [PATCH 0/1] KVM: x86: improve the usability of the 'kvm_pio' tracepoint
On 05/02/2014 11:57 PM, Ulrich Obergfell wrote:
> The current implementation of the 'kvm_pio' tracepoint in emulator_pio_in_out()
> only tells us that 'something' has been read from or written to an I/O port. To
> improve the usability of the tracepoint, I propose to include the value/content
> that has been read or written in the trace output. The proposed patch aims at
> the more common case where a single 8-bit or 16-bit or 32-bit value has been
> read or written -- it does not fully cover the case where 'count' is greater
> than one.
>
> This is an example of what the patch can do (trace of PCI config space access).
>
> - on the host
>
>    # trace-cmd record -e kvm:kvm_pio -f "(port >= 0xcf8) && (port <= 0xcff)"
>    /sys/kernel/debug/tracing/events/kvm/kvm_pio/filter
>    Hit Ctrl^C to stop recording
>
> - in a Linux guest
>
>    # dd if=/sys/bus/pci/devices/:00:06.0/config bs=2 count=4 | hexdump
>    4+0 records in
>    4+0 records out
>    8 bytes (8 B) copied, 0.000114056 s, 70.1 kB/s
>    0000000 1af4 1001 0507 0010
>    0000008
>
> - on the host
>
>    # trace-cmd report
>    ...
>    qemu-kvm-23216 [001] 15211.994089: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>    qemu-kvm-23216 [001] 15211.994108: kvm_pio: pio_read  at 0xcfc size 2 count 1 val 0x1af4
>    qemu-kvm-23216 [001] 15211.994129: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003000
>    qemu-kvm-23216 [001] 15211.994136: kvm_pio: pio_read  at 0xcfe size 2 count 1 val 0x1001
>    qemu-kvm-23216 [001] 15211.994143: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>    qemu-kvm-23216 [001] 15211.994150: kvm_pio: pio_read  at 0xcfc size 2 count 1 val 0x507
>    qemu-kvm-23216 [001] 15211.994155: kvm_pio: pio_write at 0xcf8 size 4 count 1 val 0x80003004
>    qemu-kvm-23216 [001] 15211.994161: kvm_pio: pio_read  at 0xcfe size 2 count 1 val 0x10

Nice. Could you please check "perf kvm stat" to see if "--event=ioport" can work after your patch?

Reviewed-by: Xiao Guangrong