date:20150604

Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Xiao Guangrong



[ CCed Zhang Yang ]

On 06/04/2015 04:36 PM, Paolo Bonzini wrote:



On 04/06/2015 10:23, Xiao Guangrong wrote:


So, why do you need to always use IPAT=0?  Can patch 15 keep the current
logic for RAM, like this:

 if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm))
 ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
   VMX_EPT_MT_EPTE_SHIFT;
 else
 ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
 | VMX_EPT_IPAT_BIT;


Yeah, it's okay, actually we considered this way, however
- it's light enough, it did not hurt guest performance based on our
   benchmark.
- the logic has always used for noncherent_dma case, extend it to
   normal case should have low risk and also help us to check the logic.


But noncoherent_dma is not the common case, so it's not necessarily true
that the risk is low.


I thought noncoherent_dma exists on 1st generation(s) IOMMU, it should
be fully tested at that time.




- completely follow MTRRS spec would be better than host hides it.


We are a virtualization platform, we know well when MTRRs are necessary.

Tis a risk from blindly obeying the guest MTRRs: userspace can see stale
data if the guest's accesses bypass the cache.  AMD bypasses this by
enabling snooping even in cases that ordinarily wouldn't snoop; for
Intel the solution is that RAM-backed areas should always use IPAT.


Not sure if UC and other cacheable type combinations on guest and host
will cause problem. The SMD mentioned that snoop is not required only when
"The UC attribute comes from the MTRRs and the processors are not required
 to snoop their caches since the data could never have been cached."
(Vol 3. 11.5.2.2)
VMX do not touch hardware MTRR MSRs and i guess snoop works under this case.

I also noticed if SS (self-snooping) is supported we need not to invalidate
cache when programming memory type (Vol 3. 11.11.8), so that means CPU works
well on the page which has different cache types i guess.

After think it carefully, we (Zhang Yang) doubt if always set WB for DMA
memory is really a good idea because we can not assume WB DMA works well for
all devices. One example is that audio DMA (not a MMIO region) is required WC
to improve its performance.

However, we think the SDM is not clear enough so let's do full vMTRR on MMIO
and noncoherent_dma first.　:)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH V1 0/5] Enable ACPI support for KVM ARM

2015-06-04 Thread Wei Huang


Hi Christoffer and Marc,

Ping. Any comments on this patchset for V2?

Thanks,
-Wei


On 5/28/15 00:23, Wei Huang wrote:

Initial ACPI support for ARM64 has been accepted into Linux kernel recently.
Now it is a good time to re-visit ACPI support for KVM. This patchset
enables ACPI for both arch_timer and vGIC by probing related ACPI tables
and does necessary initialization.

Note that Alexander Spyridaki submitted similar patches before. Some of
his ideas were borrowed in this patchset, but with substancial changes.
In addition we extend support for both GICv2 and GICv3.

This patchset would work better on top of recent GIC/IRQCHIP patches by
Hanjun Guo, who added support for gic_version in ACPI struct of GIC
distributor (search "ACPICA: Introduce GIC version for arm based system").

This patchset can be applied cleanly on top of Linx 4.1-rc1.

Wei Huang (5):
   kvm: arm64: Enable ACPI support for virt arch timer
   kvm: arm64: Dispatch virt GIC probing to device tree and ACPI
   kvm: arm64: Detect GIC version for proper ACPI vGIC probing
   kvm: arm64: Implement ACPI probing code for GICv2
   kvm: arm64: Implement ACPI probing code for GICv3

  include/kvm/arm_vgic.h|  36 +---
  virt/kvm/arm/arch_timer.c |  64 -
  virt/kvm/arm/vgic-v2.c|  65 +++--
  virt/kvm/arm/vgic-v3.c|  56 +--
  virt/kvm/arm/vgic.c   | 140 ++
  5 files changed, 320 insertions(+), 41 deletions(-)


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V4 0/4] Consolidated KVM vPMU support for x86

2015-06-04 Thread Wei Huang

Currently KVM only supports vPMU for Intel CPUs. This patchset enables
KVM vPMU support for AMD platform by creating a common PMU interface for
x86. By refractoring, PMU related MSR accesses from guest VMs are dispatched
to corresponding functions defined in arch specific files.

V4:
  * Change vPMU API function names to further clarify their functionality
  * Small fix for switch statement index in EVNTSELn and PERFCTRn (patch 4)
  * Add Tested-by and Reviewed-by

V3:
  * Rebase the code to the latest of KVM tree (queue branch);
  * Branch out the Intel specific code from pmu.c to pmu_intel.c, in order
to reflect the change history more accurately;
  * Name the parameters/variables more consistently (use msr, idx, 
pmc_idx) across files;
  * Fix issues (whitespaces, macro names, ...) based on Radim's V2 comments;
  * Fix the MSR_K7_PERFCTRn and MSR_K7_EVNTSELn access code (in patch 4);

V2:
  * Create a generic pmu.c file which is shared by Intel and AMD CPUs;
  * pmu.c code becomes part of kvm.ko module. Similarly pmu_intel.c and
pmu_amd.c are linked to kvm-intel.ko and kvm-amd.ko respectively;
  * Re-define kvm_pmu_ops function pointers. Per Radim Krcmar's comments,
a large portion of Intel vPMU code are now consolidated and moved to
pmu.c;
  * Polish pmu_amd.c code to comply with new definition of kvm_pmu_ops;

V1:
  * Adopt the file layout suggested by Radim Krcmar
  * Link arch module with its specific PMU file

RFC:
  * Initial version for RFC

Wei Huang (4):
  KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch
  KVM: x86/vPMU: Create vPMU interface for VMX and SVM
  KVM: x86/vPMU: Implement AMD vPMU code for KVM
  KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs

 arch/x86/include/asm/kvm_host.h |  50 ++--
 arch/x86/kvm/Makefile   |   4 +-
 arch/x86/kvm/cpuid.c|   3 +-
 arch/x86/kvm/pmu.c  | 557 +++-
 arch/x86/kvm/pmu.h  |  99 +++
 arch/x86/kvm/pmu_amd.c  | 207 +++
 arch/x86/kvm/pmu_intel.c| 360 ++
 arch/x86/kvm/svm.c  |   8 +
 arch/x86/kvm/vmx.c  |   8 +
 arch/x86/kvm/x86.c  |  76 ++
 10 files changed, 896 insertions(+), 476 deletions(-)
 create mode 100644 arch/x86/kvm/pmu.h
 create mode 100644 arch/x86/kvm/pmu_amd.c
 create mode 100644 arch/x86/kvm/pmu_intel.c

-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V4 4/4] KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs

2015-06-04 Thread Wei Huang

This patch enables AMD guest VM to access (R/W) PMU related MSRs, which
include PERFCTR[0..3] and EVNTSEL[0..3].

Reviewed-by: Radim Krčmář 
Signed-off-by: Wei Huang 
---
 arch/x86/kvm/x86.c | 51 +--
 1 file changed, 9 insertions(+), 42 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 1bd1165..afd88e4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2261,36 +2261,11 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct 
msr_data *msr_info)
case MSR_IA32_MC0_CTL ... MSR_IA32_MCx_CTL(KVM_MAX_MCE_BANKS) - 1:
return set_msr_mce(vcpu, msr, data);
 
-   /* Performance counters are not protected by a CPUID bit,
-* so we should check all of them in the generic path for the sake of
-* cross vendor migration.
-* Writing a zero into the event select MSRs disables them,
-* which we perfectly emulate ;-). Any other value should be at least
-* reported, some guests depend on them.
-*/
-   case MSR_K7_EVNTSEL0:
-   case MSR_K7_EVNTSEL1:
-   case MSR_K7_EVNTSEL2:
-   case MSR_K7_EVNTSEL3:
-   if (data != 0)
-   vcpu_unimpl(vcpu, "unimplemented perfctr wrmsr: "
-   "0x%x data 0x%llx\n", msr, data);
-   break;
-   /* at least RHEL 4 unconditionally writes to the perfctr registers,
-* so we ignore writes to make it happy.
-*/
-   case MSR_K7_PERFCTR0:
-   case MSR_K7_PERFCTR1:
-   case MSR_K7_PERFCTR2:
-   case MSR_K7_PERFCTR3:
-   vcpu_unimpl(vcpu, "unimplemented perfctr wrmsr: "
-   "0x%x data 0x%llx\n", msr, data);
-   break;
-   case MSR_P6_PERFCTR0:
-   case MSR_P6_PERFCTR1:
-   pr = true;
-   case MSR_P6_EVNTSEL0:
-   case MSR_P6_EVNTSEL1:
+   case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
+   case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
+   pr = true; /* fall through */
+   case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
+   case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_set_msr(vcpu, msr_info);
 
@@ -2513,24 +2488,16 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, 
u64 *pdata)
case MSR_K8_SYSCFG:
case MSR_K7_HWCR:
case MSR_VM_HSAVE_PA:
-   case MSR_K7_EVNTSEL0:
-   case MSR_K7_EVNTSEL1:
-   case MSR_K7_EVNTSEL2:
-   case MSR_K7_EVNTSEL3:
-   case MSR_K7_PERFCTR0:
-   case MSR_K7_PERFCTR1:
-   case MSR_K7_PERFCTR2:
-   case MSR_K7_PERFCTR3:
case MSR_K8_INT_PENDING_MSG:
case MSR_AMD64_NB_CFG:
case MSR_FAM10H_MMIO_CONF_BASE:
case MSR_AMD64_BU_CFG2:
data = 0;
break;
-   case MSR_P6_PERFCTR0:
-   case MSR_P6_PERFCTR1:
-   case MSR_P6_EVNTSEL0:
-   case MSR_P6_EVNTSEL1:
+   case MSR_K7_EVNTSEL0 ... MSR_K7_EVNTSEL3:
+   case MSR_K7_PERFCTR0 ... MSR_K7_PERFCTR3:
+   case MSR_P6_PERFCTR0 ... MSR_P6_PERFCTR1:
+   case MSR_P6_EVNTSEL0 ... MSR_P6_EVNTSEL1:
if (kvm_pmu_is_valid_msr(vcpu, msr))
return kvm_pmu_get_msr(vcpu, msr, pdata);
data = 0;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH V4 1/4] KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch

2015-06-04 Thread Wei Huang

This patch defines a new function pointer struct (kvm_pmu_ops) to
support vPMU for both Intel and AMD. The functions pointers defined in
this new struct will be linked with Intel and AMD functions later. In the
meanwhile the struct that maps from event_sel bits to PERF_TYPE_HARDWARE
events is renamed and moved from Intel specific code to kvm_host.h as a
common struct.

Signed-off-by: Wei Huang 
---
 arch/x86/include/asm/kvm_host.h | 34 --
 arch/x86/kvm/pmu.c  | 21 -
 2 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index dea2e7e..4f0ca20 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -305,6 +305,12 @@ struct kvm_mmu {
u64 pdptrs[4]; /* pae */
 };
 
+struct msr_data {
+   bool host_initiated;
+   u32 index;
+   u64 data;
+};
+
 enum pmc_type {
KVM_PMC_GP = 0,
KVM_PMC_FIXED,
@@ -319,6 +325,12 @@ struct kvm_pmc {
struct kvm_vcpu *vcpu;
 };
 
+struct kvm_event_hw_type_mapping {
+   u8 eventsel;
+   u8 unit_mask;
+   unsigned event_type;
+};
+
 struct kvm_pmu {
unsigned nr_arch_gp_counters;
unsigned nr_arch_fixed_counters;
@@ -337,6 +349,22 @@ struct kvm_pmu {
u64 reprogram_pmi;
 };
 
+struct kvm_pmu_ops {
+   unsigned (*find_arch_event)(struct kvm_pmu *pmu, u8 event_select,
+   u8 unit_mask);
+   unsigned (*find_fixed_event)(int idx);
+   bool (*pmc_enabled)(struct kvm_pmc *pmc);
+   struct kvm_pmc *(*pmc_idx_to_pmc)(struct kvm_pmu *pmu, int pmc_idx);
+   struct kvm_pmc *(*msr_idx_to_pmc)(struct kvm_vcpu *vcpu, unsigned idx);
+   int (*is_valid_msr_idx)(struct kvm_vcpu *vcpu, unsigned idx);
+   bool (*is_valid_msr)(struct kvm_vcpu *vcpu, u32 msr);
+   int (*get_msr)(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
+   int (*set_msr)(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
+   void (*refresh)(struct kvm_vcpu *vcpu);
+   void (*init)(struct kvm_vcpu *vcpu);
+   void (*reset)(struct kvm_vcpu *vcpu);
+};
+
 enum {
KVM_DEBUGREG_BP_ENABLED = 1,
KVM_DEBUGREG_WONT_EXIT = 2,
@@ -679,12 +707,6 @@ struct kvm_vcpu_stat {
 
 struct x86_instruction_info;
 
-struct msr_data {
-   bool host_initiated;
-   u32 index;
-   u64 data;
-};
-
 struct kvm_lapic_irq {
u32 vector;
u32 delivery_mode;
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 29fbf9d..3af3404 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -20,12 +20,7 @@
 #include "cpuid.h"
 #include "lapic.h"
 
-static struct kvm_arch_event_perf_mapping {
-   u8 eventsel;
-   u8 unit_mask;
-   unsigned event_type;
-   bool inexact;
-} arch_events[] = {
+static struct kvm_event_hw_type_mapping intel_arch_events[] = {
/* Index must match CPUID 0x0A.EBX bit vector */
[0] = { 0x3c, 0x00, PERF_COUNT_HW_CPU_CYCLES },
[1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
@@ -37,7 +32,7 @@ static struct kvm_arch_event_perf_mapping {
[7] = { 0x00, 0x30, PERF_COUNT_HW_REF_CPU_CYCLES },
 };
 
-/* mapping between fixed pmc index and arch_events array */
+/* mapping between fixed pmc index and intel_arch_events array */
 static int fixed_pmc_events[] = {1, 0, 7};
 
 static bool pmc_is_gp(struct kvm_pmc *pmc)
@@ -202,16 +197,16 @@ static unsigned find_arch_event(struct kvm_pmu *pmu, u8 
event_select,
 {
int i;
 
-   for (i = 0; i < ARRAY_SIZE(arch_events); i++)
-   if (arch_events[i].eventsel == event_select
-   && arch_events[i].unit_mask == unit_mask
+   for (i = 0; i < ARRAY_SIZE(intel_arch_events); i++)
+   if (intel_arch_events[i].eventsel == event_select
+   && intel_arch_events[i].unit_mask == unit_mask
&& (pmu->available_event_types & (1 << i)))
break;
 
-   if (i == ARRAY_SIZE(arch_events))
+   if (i == ARRAY_SIZE(intel_arch_events))
return PERF_COUNT_HW_MAX;
 
-   return arch_events[i].event_type;
+   return intel_arch_events[i].event_type;
 }
 
 static void reprogram_gp_counter(struct kvm_pmc *pmc, u64 eventsel)
@@ -265,7 +260,7 @@ static void reprogram_fixed_counter(struct kvm_pmc *pmc, u8 
en_pmi, int idx)
return;
 
reprogram_counter(pmc, PERF_TYPE_HARDWARE,
-   arch_events[fixed_pmc_events[idx]].event_type,
+   intel_arch_events[fixed_pmc_events[idx]].event_type,
!(en & 0x2), /* exclude user */
!(en & 0x1), /* exclude kernel */
pmi, false, false);
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger

[PATCH V4 3/4] KVM: x86/vPMU: Implement AMD vPMU code for KVM

2015-06-04 Thread Wei Huang

This patch replaces the empty AMD vPMU functions (in pmu_amd.c) with real
implementation.

Signed-off-by: Wei Huang 
---
 arch/x86/kvm/pmu_amd.c | 122 ++---
 1 file changed, 116 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/pmu_amd.c b/arch/x86/kvm/pmu_amd.c
index 388f637..ff5cea7 100644
--- a/arch/x86/kvm/pmu_amd.c
+++ b/arch/x86/kvm/pmu_amd.c
@@ -19,11 +19,33 @@
 #include "lapic.h"
 #include "pmu.h"
 
+/* duplicated from amd_perfmon_event_map, K7 and above should work. */
+static struct kvm_event_hw_type_mapping amd_event_mapping[] = {
+   [0] = { 0x76, 0x00, PERF_COUNT_HW_CPU_CYCLES },
+   [1] = { 0xc0, 0x00, PERF_COUNT_HW_INSTRUCTIONS },
+   [2] = { 0x80, 0x00, PERF_COUNT_HW_CACHE_REFERENCES },
+   [3] = { 0x81, 0x00, PERF_COUNT_HW_CACHE_MISSES },
+   [4] = { 0xc2, 0x00, PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
+   [5] = { 0xc3, 0x00, PERF_COUNT_HW_BRANCH_MISSES },
+   [6] = { 0xd0, 0x00, PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
+   [7] = { 0xd1, 0x00, PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
+};
+
 static unsigned amd_find_arch_event(struct kvm_pmu *pmu,
u8 event_select,
u8 unit_mask)
 {
-   return PERF_COUNT_HW_MAX;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(amd_event_mapping); i++)
+   if (amd_event_mapping[i].eventsel == event_select
+   && amd_event_mapping[i].unit_mask == unit_mask)
+   break;
+
+   if (i == ARRAY_SIZE(amd_event_mapping))
+   return PERF_COUNT_HW_MAX;
+
+   return amd_event_mapping[i].event_type;
 }
 
 /* return PERF_COUNT_HW_MAX as AMD doesn't have fixed events */
@@ -32,53 +54,141 @@ static unsigned amd_find_fixed_event(int idx)
return PERF_COUNT_HW_MAX;
 }
 
+/* check if a PMC is enabled by comparing it against global_ctrl bits. Because
+ * AMD CPU doesn't have global_ctrl MSR, all PMCs are enabled (return TRUE).
+ */
 static bool amd_pmc_enabled(struct kvm_pmc *pmc)
 {
-   return false;
+   return true;
 }
 
 static struct kvm_pmc *amd_pmc_idx_to_pmc(struct kvm_pmu *pmu, int pmc_idx)
 {
-   return NULL;
+   return get_gp_pmc(pmu, MSR_K7_EVNTSEL0 + pmc_idx, MSR_K7_EVNTSEL0);
 }
 
 /* returns 0 if idx's corresponding MSR exists; otherwise returns 1. */
 static int amd_is_valid_msr_idx(struct kvm_vcpu *vcpu, unsigned idx)
 {
-   return 1;
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+   idx &= ~(3u << 30);
+
+   return (idx >= pmu->nr_arch_gp_counters);
 }
 
 /* idx is the ECX register of RDPMC instruction */
 static struct kvm_pmc *amd_msr_idx_to_pmc(struct kvm_vcpu *vcpu, unsigned idx)
 {
-   return NULL;
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   struct kvm_pmc *counters;
+
+   idx &= ~(3u << 30);
+   if (idx >= pmu->nr_arch_gp_counters)
+   return NULL;
+   counters = pmu->gp_counters;
+
+   return &counters[idx];
 }
 
 static bool amd_is_valid_msr(struct kvm_vcpu *vcpu, u32 msr)
 {
-   return false;
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   int ret = false;
+
+   ret = get_gp_pmc(pmu, msr, MSR_K7_PERFCTR0) ||
+   get_gp_pmc(pmu, msr, MSR_K7_EVNTSEL0);
+
+   return ret;
 }
 
 static int amd_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   struct kvm_pmc *pmc;
+
+   /* MSR_K7_PERFCTRn */
+   pmc = get_gp_pmc(pmu, msr, MSR_K7_PERFCTR0);
+   if (pmc) {
+   *data = pmc_read_counter(pmc);
+   return 0;
+   }
+   /* MSR_K7_EVNTSELn */
+   pmc = get_gp_pmc(pmu, msr, MSR_K7_EVNTSEL0);
+   if (pmc) {
+   *data = pmc->eventsel;
+   return 0;
+   }
+
return 1;
 }
 
 static int amd_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 {
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+   struct kvm_pmc *pmc;
+   u32 msr = msr_info->index;
+   u64 data = msr_info->data;
+
+   /* MSR_K7_PERFCTRn */
+   pmc = get_gp_pmc(pmu, msr, MSR_K7_PERFCTR0);
+   if (pmc) {
+   if (!msr_info->host_initiated)
+   data = (s64)data;
+   pmc->counter += data - pmc_read_counter(pmc);
+   return 0;
+   }
+   /* MSR_K7_EVNTSELn */
+   pmc = get_gp_pmc(pmu, msr, MSR_K7_EVNTSEL0);
+   if (pmc) {
+   if (data == pmc->eventsel)
+   return 0;
+   if (!(data & pmu->reserved_bits)) {
+   reprogram_gp_counter(pmc, data);
+   return 0;
+   }
+   }
+
return 1;
 }
 
 static void amd_pmu_refresh(struct kvm_vcpu *vcpu)
 {
+   struct kvm_pmu *pmu = vcpu_to_pmu(vcpu);
+
+   pmu->nr_arch_gp_counters = AMD64_NUM_COUNTERS;
+   pmu->counter_bitmask[KVM_PMC_GP] = ((u64)1 << 48) - 1;
+   pmu->rese

[PATCH V4 2/4] KVM: x86/vPMU: Create vPMU interface for VMX and SVM

2015-06-04 Thread Wei Huang

This patch splits existing vPMU code into a common vPMU interface (pmc.c)
and Intel specific vPMU code (pmu_intel.c) using the following steps:

- Part of arechitectural vPMU code is extracted and moved to pmu_intel.c
  file. They are hooked up with the newly-defined intel_pmu_ops, which will
  be called from pmu interface;
- Create a dummy pmu_amd.c file for AMD SVM with empty functions;

All architectural vPMU functions are now called via PMU function dispatcher
(kvm_pmu_ops). This function dispatcher is initialized by calling
kvm_x86_ops->get_pmu_ops() at the beginning. Also note that Intel and AMD
modules are now generated by combinig their corresponding arch files
(vmx.c/svm.c) and pmu files (pmu_intel.c/pmu_amd.c).

Tested-by: Radim Krčmář 
Signed-off-by: Wei Huang 
---
 arch/x86/include/asm/kvm_host.h |  16 +-
 arch/x86/kvm/Makefile   |   4 +-
 arch/x86/kvm/cpuid.c|   3 +-
 arch/x86/kvm/pmu.c  | 552 +++-
 arch/x86/kvm/pmu.h  |  99 +++
 arch/x86/kvm/pmu_amd.c  |  97 +++
 arch/x86/kvm/pmu_intel.c| 360 ++
 arch/x86/kvm/svm.c  |   8 +
 arch/x86/kvm/vmx.c  |   8 +
 arch/x86/kvm/x86.c  |  25 +-
 10 files changed, 749 insertions(+), 423 deletions(-)
 create mode 100644 arch/x86/kvm/pmu.h
 create mode 100644 arch/x86/kvm/pmu_amd.c
 create mode 100644 arch/x86/kvm/pmu_intel.c

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 4f0ca20..26060b2 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -855,6 +855,8 @@ struct kvm_x86_ops {
void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
   struct kvm_memory_slot *slot,
   gfn_t offset, unsigned long mask);
+
+   struct kvm_pmu_ops *(*get_pmu_ops)(void);
 };
 
 struct kvm_arch_async_pf {
@@ -865,6 +867,7 @@ struct kvm_arch_async_pf {
 };
 
 extern struct kvm_x86_ops *kvm_x86_ops;
+extern struct kvm_pmu_ops *kvm_pmu_ops;
 
 static inline void adjust_tsc_offset_guest(struct kvm_vcpu *vcpu,
   s64 adjustment)
@@ -1188,17 +1191,4 @@ extern bool kvm_find_async_pf_gfn(struct kvm_vcpu *vcpu, 
gfn_t gfn);
 void kvm_complete_insn_gp(struct kvm_vcpu *vcpu, int err);
 
 int kvm_is_in_guest(void);
-
-void kvm_pmu_init(struct kvm_vcpu *vcpu);
-void kvm_pmu_destroy(struct kvm_vcpu *vcpu);
-void kvm_pmu_reset(struct kvm_vcpu *vcpu);
-void kvm_pmu_cpuid_update(struct kvm_vcpu *vcpu);
-bool kvm_pmu_msr(struct kvm_vcpu *vcpu, u32 msr);
-int kvm_pmu_get_msr(struct kvm_vcpu *vcpu, u32 msr, u64 *data);
-int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info);
-int kvm_pmu_check_pmc(struct kvm_vcpu *vcpu, unsigned pmc);
-int kvm_pmu_read_pmc(struct kvm_vcpu *vcpu, unsigned pmc, u64 *data);
-void kvm_handle_pmu_event(struct kvm_vcpu *vcpu);
-void kvm_deliver_pmi(struct kvm_vcpu *vcpu);
-
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 16e8f96..c1c99d7 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -14,8 +14,8 @@ kvm-$(CONFIG_KVM_ASYNC_PF)+= $(KVM)/async_pf.o
 kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
   i8254.o ioapic.o irq_comm.o cpuid.o pmu.o
 kvm-$(CONFIG_KVM_DEVICE_ASSIGNMENT)+= assigned-dev.o iommu.o
-kvm-intel-y+= vmx.o
-kvm-amd-y  += svm.o
+kvm-intel-y+= vmx.o pmu_intel.o
+kvm-amd-y  += svm.o pmu_amd.o
 
 obj-$(CONFIG_KVM)  += kvm.o
 obj-$(CONFIG_KVM_INTEL)+= kvm-intel.o
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 59b69f6..d75ee44 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -22,6 +22,7 @@
 #include "lapic.h"
 #include "mmu.h"
 #include "trace.h"
+#include "pmu.h"
 
 static u32 xstate_required_size(u64 xstate_bv, bool compacted)
 {
@@ -107,7 +108,7 @@ int kvm_update_cpuid(struct kvm_vcpu *vcpu)
/* Update physical-address width */
vcpu->arch.maxphyaddr = cpuid_query_maxphyaddr(vcpu);
 
-   kvm_pmu_cpuid_update(vcpu);
+   kvm_pmu_refresh(vcpu);
return 0;
 }
 
diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index 3af3404..1d5e8d2 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -1,17 +1,16 @@
 /*
  * Kernel-based Virtual Machine -- Performance Monitoring Unit support
  *
- * Copyright 2011 Red Hat, Inc. and/or its affiliates.
+ * Copyright 2015 Red Hat, Inc. and/or its affiliates.
  *
  * Authors:
  *   Avi Kivity   
  *   Gleb Natapov 
+ *   Wei Huang
  *
  * This work is licensed under the terms of the GNU GPL, version 2.  See
  * the COPYING file in the top-level directory.
- *
  */
-
 #include 
 #include 
 #include 
@@ -19,83 +18,41 @@
 #include "x86.h"
 #include "cpuid.h"
 #include "lapic.h"
+#include "pmu.h"
+
+/* NOTE:
+ * - Ea

Re: [PATCH kernel v11 21/34] powerpc/powernv/ioda2: Add TCE invalidation for all attached groups

2015-06-04 Thread Alexey Kardashevskiy


On 06/05/2015 10:27 AM, Gavin Shan wrote:

On Fri, May 29, 2015 at 06:44:45PM +1000, Alexey Kardashevskiy wrote:

The iommu_table struct keeps a list of IOMMU groups it is used for.
At the moment there is just a single group attached but further
patches will add TCE table sharing. When sharing is enabled, TCE cache
in each PE needs to be invalidated so does the patch.

This does not change pnv_pci_ioda1_tce_invalidate() as there is no plan
to enable TCE table sharing on PHBs older than IODA2.

Signed-off-by: Alexey Kardashevskiy 


Reviewed-by: Gavin Shan 


---
Changes:
v10:
* new to the series
---
arch/powerpc/platforms/powernv/pci-ioda.c | 35 ---
1 file changed, 23 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
b/arch/powerpc/platforms/powernv/pci-ioda.c
index 3fd8b18..94fccc8 100644
--- a/arch/powerpc/platforms/powernv/pci-ioda.c
+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
@@ -24,6 +24,7 @@
#include 
#include 
#include 
+#include 

#include 
#include 
@@ -1764,23 +1765,15 @@ static inline void 
pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
__raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
}

-static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
-   unsigned long index, unsigned long npages, bool rm)
+static void pnv_pci_ioda2_tce_do_invalidate(unsigned pe_number, bool rm,
+   __be64 __iomem *invalidate, unsigned shift,
+   unsigned long index, unsigned long npages)


The better function name would be: pnv_pci_ioda2_do_tce_invalidate(), and


Ok.



it seems we needn't "bool rm" any more since "invalidate" has been assigned
with virtual/real address by caller.



We still need @rm here as different helpers are used for real and virt 
modes - __raw_rm_writeq and __raw_writeq.






Thanks,
Gavin


{
-   struct iommu_table_group_link *tgl = list_first_entry_or_null(
-   &tbl->it_group_list, struct iommu_table_group_link,
-   next);
-   struct pnv_ioda_pe *pe = container_of(tgl->table_group,
-   struct pnv_ioda_pe, table_group);
unsigned long start, end, inc;
-   __be64 __iomem *invalidate = rm ?
-   (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
-   pe->phb->ioda.tce_inval_reg;
-   const unsigned shift = tbl->it_page_shift;

/* We'll invalidate DMA address in PE scope */
start = 0x2ull << 60;
-   start |= (pe->pe_number & 0xFF);
+   start |= (pe_number & 0xFF);
end = start;

/* Figure out the start, end and step */
@@ -1798,6 +1791,24 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
iommu_table *tbl,
}
}

+static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
+   unsigned long index, unsigned long npages, bool rm)
+{
+   struct iommu_table_group_link *tgl;
+
+   list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
+   struct pnv_ioda_pe *pe = container_of(tgl->table_group,
+   struct pnv_ioda_pe, table_group);
+   __be64 __iomem *invalidate = rm ?
+   (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
+   pe->phb->ioda.tce_inval_reg;
+
+   pnv_pci_ioda2_tce_do_invalidate(pe->pe_number, rm,
+   invalidate, tbl->it_page_shift,
+   index, npages);
+   }
+}
+
static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
long npages, unsigned long uaddr,
enum dma_data_direction direction,
--
2.4.0.rc3.8.gfb3e7d5






--
Alexey
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH kernel v11 21/34] powerpc/powernv/ioda2: Add TCE invalidation for all attached groups

2015-06-04 Thread Gavin Shan

On Fri, May 29, 2015 at 06:44:45PM +1000, Alexey Kardashevskiy wrote:
>The iommu_table struct keeps a list of IOMMU groups it is used for.
>At the moment there is just a single group attached but further
>patches will add TCE table sharing. When sharing is enabled, TCE cache
>in each PE needs to be invalidated so does the patch.
>
>This does not change pnv_pci_ioda1_tce_invalidate() as there is no plan
>to enable TCE table sharing on PHBs older than IODA2.
>
>Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Gavin Shan 

>---
>Changes:
>v10:
>* new to the series
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 35 ---
> 1 file changed, 23 insertions(+), 12 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 3fd8b18..94fccc8 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -24,6 +24,7 @@
> #include 
> #include 
> #include 
>+#include 
>
> #include 
> #include 
>@@ -1764,23 +1765,15 @@ static inline void 
>pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
>   __raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
> }
>
>-static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>-  unsigned long index, unsigned long npages, bool rm)
>+static void pnv_pci_ioda2_tce_do_invalidate(unsigned pe_number, bool rm,
>+  __be64 __iomem *invalidate, unsigned shift,
>+  unsigned long index, unsigned long npages)

The better function name would be: pnv_pci_ioda2_do_tce_invalidate(), and
it seems we needn't "bool rm" any more since "invalidate" has been assigned
with virtual/real address by caller.

Thanks,
Gavin

> {
>-  struct iommu_table_group_link *tgl = list_first_entry_or_null(
>-  &tbl->it_group_list, struct iommu_table_group_link,
>-  next);
>-  struct pnv_ioda_pe *pe = container_of(tgl->table_group,
>-  struct pnv_ioda_pe, table_group);
>   unsigned long start, end, inc;
>-  __be64 __iomem *invalidate = rm ?
>-  (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
>-  pe->phb->ioda.tce_inval_reg;
>-  const unsigned shift = tbl->it_page_shift;
>
>   /* We'll invalidate DMA address in PE scope */
>   start = 0x2ull << 60;
>-  start |= (pe->pe_number & 0xFF);
>+  start |= (pe_number & 0xFF);
>   end = start;
>
>   /* Figure out the start, end and step */
>@@ -1798,6 +1791,24 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
>iommu_table *tbl,
>   }
> }
>
>+static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>+  unsigned long index, unsigned long npages, bool rm)
>+{
>+  struct iommu_table_group_link *tgl;
>+
>+  list_for_each_entry_rcu(tgl, &tbl->it_group_list, next) {
>+  struct pnv_ioda_pe *pe = container_of(tgl->table_group,
>+  struct pnv_ioda_pe, table_group);
>+  __be64 __iomem *invalidate = rm ?
>+  (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
>+  pe->phb->ioda.tce_inval_reg;
>+
>+  pnv_pci_ioda2_tce_do_invalidate(pe->pe_number, rm,
>+  invalidate, tbl->it_page_shift,
>+  index, npages);
>+  }
>+}
>+
> static int pnv_ioda2_tce_build(struct iommu_table *tbl, long index,
>   long npages, unsigned long uaddr,
>   enum dma_data_direction direction,
>-- 
>2.4.0.rc3.8.gfb3e7d5
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH kernel v11 20/34] powerpc/powernv/ioda2: Move TCE kill register address to PE

2015-06-04 Thread Gavin Shan

On Fri, May 29, 2015 at 06:44:44PM +1000, Alexey Kardashevskiy wrote:
>At the moment the DMA setup code looks for the "ibm,opal-tce-kill"
>property which contains the TCE kill register address. Writing to
>this register invalidates TCE cache on IODA/IODA2 hub.
>
>This moves the register address from iommu_table to pnv_pnb as this
>register belongs to PHB and invalidates TCE cache for all tables of
>all attached PEs.
>
>This moves the property reading/remapping code to a helper which is
>called when DMA is being configured for PE and which does DMA setup
>for both IODA1 and IODA2.
>
>This adds a new pnv_pci_ioda2_tce_invalidate_entire() helper which
>invalidates cache for the entire table. It should be called after
>every call to opal_pci_map_pe_dma_window(). It was not required before
>because there was just a single TCE table and 64bit DMA was handled via
>bypass window (which has no table so no cache was used) but this is going
>to change with Dynamic DMA windows (DDW).
>
>Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Gavin Shan 

Thanks,
Gavin

>---
>Changes:
>v11:
>* s/pnv_pci_ioda2_tvt_invalidate/pnv_pci_ioda2_tce_invalidate_entire/g
>(cannot think of better-and-shorter name)
>* moved tce_inval_reg_phys/tce_inval_reg to pnv_phb
>
>v10:
>* fixed error from checkpatch.pl
>* removed comment at "ibm,opal-tce-kill" parsing as irrelevant
>* s/addr/val/ in pnv_pci_ioda2_tvt_invalidate() as it was not a kernel address
>
>v9:
>* new in the series
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 66 ++-
> arch/powerpc/platforms/powernv/pci.h  |  7 +++-
> 2 files changed, 44 insertions(+), 29 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c 
>b/arch/powerpc/platforms/powernv/pci-ioda.c
>index 1d0bb5b..3fd8b18 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1679,8 +1679,8 @@ static void pnv_pci_ioda1_tce_invalidate(struct 
>iommu_table *tbl,
>   struct pnv_ioda_pe *pe = container_of(tgl->table_group,
>   struct pnv_ioda_pe, table_group);
>   __be64 __iomem *invalidate = rm ?
>-  (__be64 __iomem *)pe->tce_inval_reg_phys :
>-  (__be64 __iomem *)tbl->it_index;
>+  (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
>+  pe->phb->ioda.tce_inval_reg;
>   unsigned long start, end, inc;
>   const unsigned shift = tbl->it_page_shift;
>
>@@ -1751,6 +1751,19 @@ static struct iommu_table_ops pnv_ioda1_iommu_ops = {
>   .get = pnv_tce_get,
> };
>
>+static inline void pnv_pci_ioda2_tce_invalidate_entire(struct pnv_ioda_pe *pe)
>+{
>+  /* 01xb - invalidate TCEs that match the specified PE# */
>+  unsigned long val = (0x4ull << 60) | (pe->pe_number & 0xFF);
>+  struct pnv_phb *phb = pe->phb;
>+
>+  if (!phb->ioda.tce_inval_reg)
>+  return;
>+
>+  mb(); /* Ensure above stores are visible */
>+  __raw_writeq(cpu_to_be64(val), phb->ioda.tce_inval_reg);
>+}
>+
> static void pnv_pci_ioda2_tce_invalidate(struct iommu_table *tbl,
>   unsigned long index, unsigned long npages, bool rm)
> {
>@@ -1761,8 +1774,8 @@ static void pnv_pci_ioda2_tce_invalidate(struct 
>iommu_table *tbl,
>   struct pnv_ioda_pe, table_group);
>   unsigned long start, end, inc;
>   __be64 __iomem *invalidate = rm ?
>-  (__be64 __iomem *)pe->tce_inval_reg_phys :
>-  (__be64 __iomem *)tbl->it_index;
>+  (__be64 __iomem *)pe->phb->ioda.tce_inval_reg_phys :
>+  pe->phb->ioda.tce_inval_reg;
>   const unsigned shift = tbl->it_page_shift;
>
>   /* We'll invalidate DMA address in PE scope */
>@@ -1820,7 +1833,6 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
>*phb,
> {
>
>   struct page *tce_mem = NULL;
>-  const __be64 *swinvp;
>   struct iommu_table *tbl;
>   unsigned int i;
>   int64_t rc;
>@@ -1877,20 +1889,11 @@ static void pnv_pci_ioda_setup_dma_pe(struct pnv_phb 
>*phb,
> base << 28, IOMMU_PAGE_SHIFT_4K);
>
>   /* OPAL variant of P7IOC SW invalidated TCEs */
>-  swinvp = of_get_property(phb->hose->dn, "ibm,opal-tce-kill", NULL);
>-  if (swinvp) {
>-  /* We need a couple more fields -- an address and a data
>-   * to or.  Since the bus is only printed out on table free
>-   * errors, and on the first pass the data will be a relative
>-   * bus number, print that out instead.
>-   */
>-  pe->tce_inval_reg_phys = be64_to_cpup(swinvp);
>-  tbl->it_index = (unsigned long)ioremap(pe->tce_inval_reg_phys,
>-  8);
>+  if (phb->ioda.tce_inval_reg)
>   tbl->it_type |= (TCE_PCI_SWINV_CREATE |
>TCE_PCI_SWINV_FREE   |
>TCE_PCI_SWINV_PAIR);
>-  }
>+
>   tbl->it_ops = &pnv_iod

Re: [PATCH kernel v11 16/34] powerpc/spapr: vfio: Replace iommu_table with iommu_table_group

2015-06-04 Thread Gavin Shan

On Fri, May 29, 2015 at 06:44:40PM +1000, Alexey Kardashevskiy wrote:
>Modern IBM POWERPC systems support multiple (currently two) TCE tables
>per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
>for TCE tables. Right now just one table is supported.
>
>This defines iommu_table_group struct which stores pointers to
>iommu_group and iommu_table(s). This replaces iommu_table with
>iommu_table_group where iommu_table was used to identify a group:
>- iommu_register_group();
>- iommudata of generic iommu_group;
>
>This removes @data from iommu_table as it_table_group provides
>same access to pnv_ioda_pe.
>
>For IODA, instead of embedding iommu_table, the new iommu_table_group
>keeps pointers to those. The iommu_table structs are allocated
>dynamically.
>
>For P5IOC2, both iommu_table_group and iommu_table are embedded into
>PE struct. As there is no EEH and SRIOV support for P5IOC2,
>iommu_free_table() should not be called on iommu_table struct pointers
>so we can keep it embedded in pnv_phb::p5ioc2.
>
>For pSeries, this replaces multiple calls of kzalloc_node() with a new
>iommu_pseries_alloc_group() helper and stores the table group struct
>pointer into the pci_dn struct. For release, a iommu_table_free_group()
>helper is added.
>
>This moves iommu_table struct allocation from SR-IOV code to
>the generic DMA initialization code in pnv_pci_ioda_setup_dma_pe and
>pnv_pci_ioda2_setup_dma_pe as this is where DMA is actually initialized.
>This change is here because those lines had to be changed anyway.
>
>This should cause no behavioural change.
>
>Signed-off-by: Alexey Kardashevskiy 
>[aw: for the vfio related changes]
>Acked-by: Alex Williamson 

Reviewed-by: Gavin Shan 

>---
>Changes:
>v11:
>* iommu_table_group moved outside #ifdef CONFIG_IOMMU_API as iommu_table
>is dynamically allocated and it needs a pointer to PE and
>iommu_table_group is this pointer
>
>v10:
>* new to the series, separated from
>"powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group"
>* iommu_table is not embedded into iommu_table_group but allocated
>dynamically in most cases
>* iommu_table allocation is moved to a single place for IODA2's
>pnv_pci_ioda_setup_dma_pe where it belongs to
>* added list of groups into iommu_table; most of the code just looks at
>the first item to keep the patch simpler
>---
> arch/powerpc/include/asm/iommu.h|  19 ++---
> arch/powerpc/include/asm/pci-bridge.h   |   2 +-
> arch/powerpc/kernel/iommu.c |  17 ++---
> arch/powerpc/platforms/powernv/pci-ioda.c   |  55 +++---
> arch/powerpc/platforms/powernv/pci-p5ioc2.c |  18 +++--
> arch/powerpc/platforms/powernv/pci.h|   3 +-
> arch/powerpc/platforms/pseries/iommu.c  | 107 +++-
> drivers/vfio/vfio_iommu_spapr_tce.c |  23 +++---
> 8 files changed, 152 insertions(+), 92 deletions(-)
>
>diff --git a/arch/powerpc/include/asm/iommu.h 
>b/arch/powerpc/include/asm/iommu.h
>index e2a45c3..5a7267f 100644
>--- a/arch/powerpc/include/asm/iommu.h
>+++ b/arch/powerpc/include/asm/iommu.h
>@@ -91,14 +91,9 @@ struct iommu_table {
>   struct iommu_pool pools[IOMMU_NR_POOLS];
>   unsigned long *it_map;   /* A simple allocation bitmap for now */
>   unsigned long  it_page_shift;/* table iommu page size */
>-#ifdef CONFIG_IOMMU_API
>-  struct iommu_group *it_group;
>-#endif
>+  struct iommu_table_group *it_table_group;
>   struct iommu_table_ops *it_ops;
>   void (*set_bypass)(struct iommu_table *tbl, bool enable);
>-#ifdef CONFIG_PPC_POWERNV
>-  void   *data;
>-#endif
> };
>
> /* Pure 2^n version of get_order */
>@@ -129,14 +124,22 @@ extern void iommu_free_table(struct iommu_table *tbl, 
>const char *node_name);
>  */
> extern struct iommu_table *iommu_init_table(struct iommu_table * tbl,
>   int nid);
>+#define IOMMU_TABLE_GROUP_MAX_TABLES  1
>+
>+struct iommu_table_group {
>+  struct iommu_group *group;
>+  struct iommu_table *tables[IOMMU_TABLE_GROUP_MAX_TABLES];
>+};
>+

Number of TCE tables supported in group would be worthy to be
changed dynamically in long run, but not for now. P7IOC has one
table per group while PHB3 has two tables per group.

Thanks,
Gavin

> #ifdef CONFIG_IOMMU_API
>-extern void iommu_register_group(struct iommu_table *tbl,
>+
>+extern void iommu_register_group(struct iommu_table_group *table_group,
>int pci_domain_number, unsigned long pe_num);
> extern int iommu_add_device(struct device *dev);
> extern void iommu_del_device(struct device *dev);
> extern int __init tce_iommu_bus_notifier_init(void);
> #else
>-static inline void iommu_register_group(struct iommu_table *tbl,
>+static inline void iommu_register_group(struct iommu_table_group *table_group,
>   int pci_domain_number,
>   unsigned long pe_num)
> {
>diff --git a/arch/powerpc/include/as

Re: [PATCH kernel v11 05/34] powerpc/iommu: Always release iommu_table in iommu_free_table()

2015-06-04 Thread Gavin Shan

On Fri, May 29, 2015 at 06:44:29PM +1000, Alexey Kardashevskiy wrote:
>At the moment iommu_free_table() only releases memory if
>the table was initialized for the platform code use, i.e. it had
>it_map initialized (which purpose is to track DMA memory space use).
>
>With dynamic DMA windows, we will need to be able to release
>iommu_table even if it was used for VFIO in which case it_map is NULL
>so does the patch.
>
>Signed-off-by: Alexey Kardashevskiy 

Reviewed-by: Gavin Shan 

Thanks,
Gavin

>---
>Changes:
>v11:
>* fixed parameter checks
>---
> arch/powerpc/kernel/iommu.c | 8 +---
> 1 file changed, 5 insertions(+), 3 deletions(-)
>
>diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
>index 3d47eb3..73eb39a 100644
>--- a/arch/powerpc/kernel/iommu.c
>+++ b/arch/powerpc/kernel/iommu.c
>@@ -713,9 +713,11 @@ void iommu_free_table(struct iommu_table *tbl, const char 
>*node_name)
>   unsigned long bitmap_sz;
>   unsigned int order;
>
>-  if (!tbl || !tbl->it_map) {
>-  printk(KERN_ERR "%s: expected TCE map for %s\n", __func__,
>-  node_name);
>+  if (!tbl)
>+  return;
>+
>+  if (!tbl->it_map) {
>+  kfree(tbl);
>   return;
>   }
>
>-- 
>2.4.0.rc3.8.gfb3e7d5
>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH kernel v11 33/34] vfio: powerpc/spapr: Register memory and define IOMMU v2

2015-06-04 Thread Gavin Shan

On Fri, May 29, 2015 at 06:44:57PM +1000, Alexey Kardashevskiy wrote:
>The existing implementation accounts the whole DMA window in
>the locked_vm counter. This is going to be worse with multiple
>containers and huge DMA windows. Also, real-time accounting would requite
>additional tracking of accounted pages due to the page size difference -
>IOMMU uses 4K pages and system uses 4K or 64K pages.
>
>Another issue is that actual pages pinning/unpinning happens on every
>DMA map/unmap request. This does not affect the performance much now as
>we spend way too much time now on switching context between
>guest/userspace/host but this will start to matter when we add in-kernel
>DMA map/unmap acceleration.
>
>This introduces a new IOMMU type for SPAPR - VFIO_SPAPR_TCE_v2_IOMMU.
>New IOMMU deprecates VFIO_IOMMU_ENABLE/VFIO_IOMMU_DISABLE and introduces
>2 new ioctls to register/unregister DMA memory -
>VFIO_IOMMU_SPAPR_REGISTER_MEMORY and VFIO_IOMMU_SPAPR_UNREGISTER_MEMORY -
>which receive user space address and size of a memory region which
>needs to be pinned/unpinned and counted in locked_vm.
>New IOMMU splits physical pages pinning and TCE table update
>into 2 different operations. It requires:
>1) guest pages to be registered first
>2) consequent map/unmap requests to work only with pre-registered memory.
>For the default single window case this means that the entire guest
>(instead of 2GB) needs to be pinned before using VFIO.
>When a huge DMA window is added, no additional pinning will be
>required, otherwise it would be guest RAM + 2GB.
>
>The new memory registration ioctls are not supported by
>VFIO_SPAPR_TCE_IOMMU. Dynamic DMA window and in-kernel acceleration
>will require memory to be preregistered in order to work.
>
>The accounting is done per the user process.
>
>This advertises v2 SPAPR TCE IOMMU and restricts what the userspace
>can do with v1 or v2 IOMMUs.
>
>In order to support memory pre-registration, we need a way to track
>the use of every registered memory region and only allow unregistration
>if a region is not in use anymore. So we need a way to tell from what
>region the just cleared TCE was from.
>
>This adds a userspace view of the TCE table into iommu_table struct.
>It contains userspace address, one per TCE entry. The table is only
>allocated when the ownership over an IOMMU group is taken which means
>it is only used from outside of the powernv code (such as VFIO).
>
>Signed-off-by: Alexey Kardashevskiy 
>[aw: for the vfio related changes]
>Acked-by: Alex Williamson 
>---
>Changes:
>v11:
>* mm_iommu_put() does not return a code so this does not check it
>* moved "v2" in tce_container to pack the struct
>
>v10:
>* moved it_userspace allocation to vfio_iommu_spapr_tce as it VFIO
>specific thing
>* squashed "powerpc/iommu: Add userspace view of TCE table" into this as
>it is
>a part of IOMMU v2
>* s/tce_iommu_use_page_v2/tce_iommu_prereg_ua_to_hpa/
>* fixed some function names to have "tce_iommu_" in the beginning rather
>just "tce_"
>* as mm_iommu_mapped_inc() can now fail, check for the return code
>
>v9:
>* s/tce_get_hva_cached/tce_iommu_use_page_v2/
>
>v7:
>* now memory is registered per mm (i.e. process)
>* moved memory registration code to powerpc/mmu
>* merged "vfio: powerpc/spapr: Define v2 IOMMU" into this
>* limited new ioctls to v2 IOMMU
>* updated doc
>* unsupported ioclts return -ENOTTY instead of -EPERM
>
>v6:
>* tce_get_hva_cached() returns hva via a pointer
>
>v4:
>* updated docs
>* s/kzmalloc/vzalloc/
>* in tce_pin_pages()/tce_unpin_pages() removed @vaddr, @size and
>replaced offset with index
>* renamed vfio_iommu_type_register_memory to vfio_iommu_spapr_register_memory
>and removed duplicating vfio_iommu_spapr_register_memory
>---
> Documentation/vfio.txt  |  31 ++-
> arch/powerpc/include/asm/iommu.h|   6 +
> drivers/vfio/vfio_iommu_spapr_tce.c | 512 ++--
> include/uapi/linux/vfio.h   |  27 ++
> 4 files changed, 487 insertions(+), 89 deletions(-)
>
>diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
>index 96978ec..7dcf2b5 100644
>--- a/Documentation/vfio.txt
>+++ b/Documentation/vfio.txt
>@@ -289,10 +289,12 @@ PPC64 sPAPR implementation note
>
> This implementation has some specifics:
>
>-1) Only one IOMMU group per container is supported as an IOMMU group
>-represents the minimal entity which isolation can be guaranteed for and
>-groups are allocated statically, one per a Partitionable Endpoint (PE)
>+1) On older systems (POWER7 with P5IOC2/IODA1) only one IOMMU group per
>+container is supported as an IOMMU table is allocated at the boot time,
>+one table per a IOMMU group which is a Partitionable Endpoint (PE)
> (PE is often a PCI domain but not always).
>+Newer systems (POWER8 with IODA2) have improved hardware design which allows
>+to remove this limitation and have multiple IOMMU groups per a VFIO container.
>
> 2) The hardware supports so called DMA windows - the PCI address range
> within which

Re: [PATCH v3 3/4] KVM: x86: Add EOI exit bitmap inference

2015-06-04 Thread Steve Rutherford

On Wed, Jun 03, 2015 at 11:16:16AM +0200, Paolo Bonzini wrote:
> 
> 
> On 03/06/2015 01:51, Steve Rutherford wrote:
> > +static inline void kvm_arch_irq_routing_update(struct kvm *kvm)
> > +{
> > +}
> 
> Please add the static inline to all arches instead of putting it in
> #ifndef __KVM_HAVE_IOAPIC.  It's not related to the existence of an ioapic.
> 
> > 
> > +void kvm_arch_irq_routing_update(struct kvm *kvm)
> > +{
> > +   struct kvm_ioapic *ioapic = kvm->arch.vioapic;
> > +
> > +   if (ioapic)
> > +   return;
> > +   if (!lapic_in_kernel(kvm))
> > +   return;
> > +   kvm_make_scan_ioapic_request(kvm);
> > +}
> > +
> 
> It's weird to have a function in ioapic.c that only does something if
> you _do not_ have an ioapic. :)
> 
> > +
> > +void kvm_scan_ioapic_routes(struct kvm_vcpu *vcpu, u64 *eoi_exit_bitmap)
> 
> This must stay in arch/x86/kvm/.  I'd put both of these in irq_comm.c.
> 
> Then you do not need kvm_arch_nr_userspace_ioapic_pins anymore.
Yes! This is way cleaner. I'll make these changes.

Steve
> 
> Paolo
> 
> > +{
> > +   struct kvm *kvm = vcpu->kvm;
> > +   struct kvm_kernel_irq_routing_entry *entry;
> > +   struct kvm_irq_routing_table *table;
> > +   u32 i, nr_ioapic_pins;
> > +   int idx;
> > +
> > +   /* kvm->irq_routing must be read after clearing
> > +* KVM_SCAN_IOAPIC. */
> > +   smp_mb();
> > +   idx = srcu_read_lock(&kvm->irq_srcu);
> > +   table = kvm->irq_routing;
> > +   nr_ioapic_pins = min_t(u32, table->nr_rt_entries,
> > + kvm_arch_nr_userspace_ioapic_pins(kvm));
> > +   for (i = 0; i < nr_ioapic_pins; ++i) {
> > +   hlist_for_each_entry(entry, &table->map[i], link) {
> > +   u32 dest_id, dest_mode;
> > +
> > +   if (entry->type != KVM_IRQ_ROUTING_MSI)
> > +   continue;
> > +   dest_id = (entry->msi.address_lo >> 12) & 0xff;
> > +   dest_mode = (entry->msi.address_lo >> 2) & 0x1;
> > +   if (kvm_apic_match_dest(vcpu, NULL, 0, dest_id,
> > +   dest_mode)) {
> > +   u32 vector = entry->msi.data & 0xff;
> > +
> > +   __set_bit(vector,
> > + (unsigned long *) eoi_exit_bitmap);
> > +   }
> > +   }
> > +   }
> > +   srcu_read_unlock(&kvm->irq_srcu, idx);
> > +}
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v3 1/4] KVM: x86: Split the APIC from the rest of IRQCHIP.

2015-06-04 Thread Steve Rutherford

On Wed, Jun 03, 2015 at 10:54:41AM +0200, Paolo Bonzini wrote:
> 
> 
> On 03/06/2015 01:51, Steve Rutherford wrote:
> > First patch in a series which enables the relocation of the
> > PIC/IOAPIC to userspace.
> > 
> > Adds capability KVM_CAP_SPLIT_IRQCHIP;
> > 
> > KVM_CAP_SPLIT_IRQCHIP enables the construction of LAPICs without the
> > rest of the irqchip.
> 
> The documentation is not updated.
Ack.
> 
> Changing other arches is definitely a no-no, unfortunately.  But there
> are so many s/irqchip_in_kernel/lapic_in_kernel/ changes here, that I
> wonder if you should just keep irqchip_in_kernel true in the "split
> irqchip" case.  You are already testing irqchip_split in a few cases,
> and you can add ioapic_in_kernel whenever you need to test
> "lapic_in_kernel && !irqchip_split" at the same time.

>From the perspective of avoiding impacting other architectures, this is a
good idea, but the naming seems strange in the x86 case. Having
irqchip_in_kernel be "true" when the ioapic/pic are in userspace seems
strange. Admittedly, the irqchip isn't a "real" concept on x86, so
inventing a new meaning is fine.

Despite my hesitation, I'll change the naming around.

Steve

> 
> Paolo
> 
> > Compile tested for x86.
> > 
> > Signed-off-by: Steve Rutherford 
> > Suggested-by: Andrew Honig 
> > ---
> >  Documentation/virtual/kvm/api.txt | 15 
> >  arch/powerpc/kvm/irq.h|  5 
> >  arch/s390/kvm/irq.h   |  4 
> >  arch/x86/include/asm/kvm_host.h   |  2 ++
> >  arch/x86/kvm/assigned-dev.c   |  4 ++--
> >  arch/x86/kvm/irq.c|  6 ++---
> >  arch/x86/kvm/irq.h| 11 +
> >  arch/x86/kvm/irq_comm.c   |  7 ++
> >  arch/x86/kvm/lapic.c  | 13 +++
> >  arch/x86/kvm/mmu.c|  2 +-
> >  arch/x86/kvm/svm.c|  4 ++--
> >  arch/x86/kvm/vmx.c| 12 +-
> >  arch/x86/kvm/x86.c| 49 
> > +++
> >  include/kvm/arm_vgic.h|  1 +
> >  include/linux/kvm_host.h  |  1 +
> >  include/uapi/linux/kvm.h  |  1 +
> >  virt/kvm/irqchip.c|  2 +-
> >  17 files changed, 104 insertions(+), 35 deletions(-)
> > 
> > diff --git a/Documentation/virtual/kvm/api.txt 
> > b/Documentation/virtual/kvm/api.txt
> > index 6955444..9a43d42 100644
> > --- a/Documentation/virtual/kvm/api.txt
> > +++ b/Documentation/virtual/kvm/api.txt
> > @@ -2979,6 +2979,7 @@ len must be a multiple of sizeof(struct 
> > kvm_s390_irq). It must be > 0
> >  and it must not exceed (max_vcpus + 32) * sizeof(struct kvm_s390_irq),
> >  which is the maximum number of possibly pending cpu-local interrupts.
> >  
> > +
> >  5. The kvm_run structure
> >  
> >  
> > @@ -3575,6 +3576,20 @@ struct {
> >  
> >  KVM handlers should exit to userspace with rc = -EREMOTE.
> >  
> > +7.5 KVM_SPLIT_IRQCHIP
> > +
> > +Capability: KVM_CAP_SPLIT_IRQCHIP
> > +Architectures: x86
> > +Type:  VM ioctl
> > +Parameters: None
> > +Returns: 0 on success, -1 on error
> > +
> > +Create a local apic for each processor in the kernel.  This differs from
> > +KVM_CREATE_IRQCHIP in that it only creates the local apic; it creates 
> > neither
> > +the ioapic nor the pic in the kernel. Also, enables in kernel routing of
> > +interrupt requests. Fails if VCPU has already been created, or if the 
> > irqchip is
> > +already in the kernel.
> > +
> >  
> >  8. Other capabilities.
> >  --
> > diff --git a/arch/powerpc/kvm/irq.h b/arch/powerpc/kvm/irq.h
> > index 5a9a10b..5e6fa06 100644
> > --- a/arch/powerpc/kvm/irq.h
> > +++ b/arch/powerpc/kvm/irq.h
> > @@ -17,4 +17,9 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
> > return ret;
> >  }
> >  
> > +static inline int lapic_in_kernel(struct kvm *kvm)
> > +{
> > +   return irqchip_in_kernel(kvm);
> > +}
> > +
> >  #endif
> > diff --git a/arch/s390/kvm/irq.h b/arch/s390/kvm/irq.h
> > index d98e415..db876c3 100644
> > --- a/arch/s390/kvm/irq.h
> > +++ b/arch/s390/kvm/irq.h
> > @@ -19,4 +19,8 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
> > return 1;
> >  }
> >  
> > +static inline int lapic_in_kernel(struct kvm *kvm)
> > +{
> > +   return irqchip_in_kernel(kvm);
> > +}
> >  #endif
> > diff --git a/arch/x86/include/asm/kvm_host.h 
> > b/arch/x86/include/asm/kvm_host.h
> > index 7276107..af3225a 100644
> > --- a/arch/x86/include/asm/kvm_host.h
> > +++ b/arch/x86/include/asm/kvm_host.h
> > @@ -639,6 +639,8 @@ struct kvm_arch {
> > bool boot_vcpu_runs_old_kvmclock;
> >  
> > u64 disabled_quirks;
> > +
> > +   bool irqchip_split;
> >  };
> >  
> >  struct kvm_vm_stat {
> > diff --git a/arch/x86/kvm/assigned-dev.c b/arch/x86/kvm/assigned-dev.c
> > index d090ecf..1237e92 100644
> > --- a/arch/x86/kvm/assigned-dev.c
> > +++ b/arch/x86/kvm/assigned-dev.c
> > @@ -291,7 +291,7 @@ static int kvm_deassign_irq(struct kvm *kvm,
> >  {
> > unsigned long guest_i

Re: [PATCH v3 4/4] KVM: x86: Add support for local interrupt requests from userspace

2015-06-04 Thread Steve Rutherford

On Wed, Jun 03, 2015 at 11:38:21AM +0200, Paolo Bonzini wrote:

> However, why is the roundtrip to userspace necessary?  Could you pass
> the extint index directly as an argument to KVM_INTERRUPT?  It's
> backwards-compatible, because KVM_INTERRUPT so far could not be used
> together with an in-kernel LAPIC.  If you could do that, you could also
> avoid the new userspace_extint_available field.
This is possible, and definitely simpler, but not accurate to the spec.
In general, the PIC fires an INT, which leads to the CPU responding with
and INTA, and fetching the interrupt vector. It might not be strictly
necessary for this handshake to occur, but it is how the hardware did it
originally. 

In certain cases, having the interface modelled after the hardware is 
convenient. For example, devices can send external interrupt MSIs,
which require an Interrupt Ack to fetch the vector. They're a bit weird,
and I have absolutely no idea why someone would want these, but they are
a definitely a thing.

Looking back at KVM though, it doesn't look like KVM even supports these,
so this may not be a real issue. Eliding the roundtrip might be acceptable.
It's certainly simpler.

> Userspace can figure out who's the BSP.  The rendez-vous between the
> irqchip and the BSP's VCPU thread is still needed, but it can be done
> entirely in userspace.
Good point. I'll push this up into userspace. 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvmtool: don't use PCI config space IRQ line field

2015-06-04 Thread Andre Przywara

In PCI config space there is an interrupt line field (offset 0x3f),
which is used to initially communicate the IRQ line number from
firmware to the OS. _Hardware_ should never use this information,
as the OS is free to write any information in there.
But kvmtool uses this number when it triggers IRQs in the guest,
which fails starting with Linux 3.19-rc1, where the PCI layer starts
writing the virtual IRQ number in there.

Fix that by storing the IRQ number in a separate field in
struct virtio_pci, which is independent from the PCI config space
and cannot be influenced by the guest.
This fixes ARM/ARM64 guests using PCI with newer kernels.

Signed-off-by: Andre Przywara 
---
 include/kvm/virtio-pci.h | 8 
 virtio/pci.c | 9 ++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/include/kvm/virtio-pci.h b/include/kvm/virtio-pci.h
index c795ce7..b70cadd 100644
--- a/include/kvm/virtio-pci.h
+++ b/include/kvm/virtio-pci.h
@@ -30,6 +30,14 @@ struct virtio_pci {
u8  isr;
u32 features;
 
+   /*
+* We cannot rely on the INTERRUPT_LINE byte in the config space once
+* we have run guest code, as the OS is allowed to use that field
+* as a scratch pad to communicate between driver and PCI layer.
+* So store our legacy interrupt line number in here for internal use.
+*/
+   u8  legacy_irq_line;
+
/* MSI-X */
u16 config_vector;
u32 config_gsi;
diff --git a/virtio/pci.c b/virtio/pci.c
index 7556239..e17e5a9 100644
--- a/virtio/pci.c
+++ b/virtio/pci.c
@@ -141,7 +141,7 @@ static bool virtio_pci__io_in(struct ioport *ioport, struct 
kvm_cpu *vcpu, u16 p
break;
case VIRTIO_PCI_ISR:
ioport__write8(data, vpci->isr);
-   kvm__irq_line(kvm, vpci->pci_hdr.irq_line, VIRTIO_IRQ_LOW);
+   kvm__irq_line(kvm, vpci->legacy_irq_line, VIRTIO_IRQ_LOW);
vpci->isr = VIRTIO_IRQ_LOW;
break;
default:
@@ -299,7 +299,7 @@ int virtio_pci__signal_vq(struct kvm *kvm, struct 
virtio_device *vdev, u32 vq)
kvm__irq_trigger(kvm, vpci->gsis[vq]);
} else {
vpci->isr = VIRTIO_IRQ_HIGH;
-   kvm__irq_trigger(kvm, vpci->pci_hdr.irq_line);
+   kvm__irq_trigger(kvm, vpci->legacy_irq_line);
}
return 0;
 }
@@ -323,7 +323,7 @@ int virtio_pci__signal_config(struct kvm *kvm, struct 
virtio_device *vdev)
kvm__irq_trigger(kvm, vpci->config_gsi);
} else {
vpci->isr = VIRTIO_PCI_ISR_CONFIG;
-   kvm__irq_trigger(kvm, vpci->pci_hdr.irq_line);
+   kvm__irq_trigger(kvm, vpci->legacy_irq_line);
}
 
return 0;
@@ -422,6 +422,9 @@ int virtio_pci__init(struct kvm *kvm, void *dev, struct 
virtio_device *vdev,
if (r < 0)
goto free_msix_mmio;
 
+   /* save the IRQ that device__register() has allocated */
+   vpci->legacy_irq_line = vpci->pci_hdr.irq_line;
+
return 0;
 
 free_msix_mmio:
-- 
2.3.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: x86: default legacy PCI device assignment support to "n"

2015-06-04 Thread Bandan Das

Paolo Bonzini  writes:

> On 04/06/2015 16:31, Bandan Das wrote:
>> > VFIO has proved itself a much better option than KVM's built-in
>> > device assignment.  It is mature, provides better isolation because
>> > it enforces ACS, and even the userspace code is being tested on
>> > a wider variety of hardware these days than the legacy support.
>> >
>> > Disable legacy device assignment by default.
>>
>> Shouldn't we mark it as Deprecated then ?
>
> Yes, good idea!  You mean just " (DEPRECATED)" after the string, right?

Yeah, that is what I meant. The "Help" message does mention VFIO as the
newer option. Sometimes, seeing the word "DEPRECATED" makes one rethink if they
really need to enable it.

> Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: x86: default legacy PCI device assignment support to "n"

2015-06-04 Thread Alex Williamson

On Thu, 2015-06-04 at 16:33 +0200, Paolo Bonzini wrote:
> 
> On 04/06/2015 16:31, Bandan Das wrote:
> > > VFIO has proved itself a much better option than KVM's built-in
> > > device assignment.  It is mature, provides better isolation because
> > > it enforces ACS, and even the userspace code is being tested on
> > > a wider variety of hardware these days than the legacy support.
> > >
> > > Disable legacy device assignment by default.
> >
> > Shouldn't we mark it as Deprecated then ?
> 
> Yes, good idea!  You mean just " (DEPRECATED)" after the string, right?

The ioctls should probably also be listed as deprecated in
Documentation/virtual/kvm/api.txt.  There's also
Documentation/ABI/obsolete if we intend to remove the code eventually.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 03/12] KVM: arm64: guest debug, define API headers

2015-06-04 Thread Andrew Jones

On Thu, Jun 04, 2015 at 02:49:17PM +0100, Alex Bennée wrote:
> 
> Christoffer Dall  writes:
> 
> > On Fri, May 29, 2015 at 10:30:19AM +0100, Alex Bennée wrote:
> >> This commit defines the API headers for guest debugging. There are two
> >> architecture specific debug structures:
> >> 
> >>   - kvm_guest_debug_arch, allows us to pass in HW debug registers
> >>   - kvm_debug_exit_arch, signals exception and possible faulting address
> >> 
> >> The type of debugging being used is controlled by the architecture
> >> specific control bits of the kvm_guest_debug->control flags in the ioctl
> >> structure.
> >> 
> >> Signed-off-by: Alex Bennée 
> >> Reviewed-by: David Hildenbrand 
> >> Reviewed-by: Andrew Jones 
> >> Acked-by: Christoffer Dall 
> >
> > I can re-confirm my ack despite the changes in v4, but this really is
> > borderline to keep exiting r-b and a-b tags around from previous patches
> > I would think, but ok.
> 
> I was wondering how much a patch has to change for the r-b tags to
> become invalid. The meat of the API hasn't changed much though.
> 
> Drew/David,
> 
> Are you still happy with this patch?

I'm fine with it. Sorry I haven't had time to look over the rest of the
series yet this round.

drew

> 
> >
> > -Christoffer
> >
> >> 
> >> ---
> >> v2
> >>- expose hsr and pc directly to user-space
> >> v3
> >>- s/control/controlled/ in commit message
> >>- add v8 to ARM ARM comment (ARM Architecture Reference Manual)
> >>- add rb tag
> >>- rm pc, add far
> >>- re-word comments on alignment
> >>- rename KVM_ARM_NDBG_REGS -> KVM_ARM_MAX_DBG_REGS
> >> v4
> >>- now uses common HW/SW BP define
> >>- add a-b-tag
> >>- use u32 for control regs
> >> v5
> >>- revert to have arch specific KVM_GUESTDBG_USE_SW/HW_BP
> >>- rm stale comments dbgctrl was stored as u64
> >> ---
> >>  arch/arm64/include/uapi/asm/kvm.h | 20 
> >>  1 file changed, 20 insertions(+)
> >> 
> >> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> >> b/arch/arm64/include/uapi/asm/kvm.h
> >> index d268320..43758e7 100644
> >> --- a/arch/arm64/include/uapi/asm/kvm.h
> >> +++ b/arch/arm64/include/uapi/asm/kvm.h
> >> @@ -100,12 +100,32 @@ struct kvm_sregs {
> >>  struct kvm_fpu {
> >>  };
> >>  
> >> +/*
> >> + * See v8 ARM ARM D7.3: Debug Registers
> >> + *
> >> + * The architectural limit is 16 debug registers of each type although
> >> + * in practice there are usually less (see ID_AA64DFR0_EL1).
> >> + */
> >> +#define KVM_ARM_MAX_DBG_REGS 16
> >>  struct kvm_guest_debug_arch {
> >> +  __u32 dbg_bcr[KVM_ARM_MAX_DBG_REGS];
> >> +  __u64 dbg_bvr[KVM_ARM_MAX_DBG_REGS];
> >> +  __u32 dbg_wcr[KVM_ARM_MAX_DBG_REGS];
> >> +  __u64 dbg_wvr[KVM_ARM_MAX_DBG_REGS];
> >>  };
> >>  
> >>  struct kvm_debug_exit_arch {
> >> +  __u32 hsr;
> >> +  __u64 far;
> >>  };
> >>  
> >> +/*
> >> + * Architecture specific defines for kvm_guest_debug->control
> >> + */
> >> +
> >> +#define KVM_GUESTDBG_USE_SW_BP(1 << 16)
> >> +#define KVM_GUESTDBG_USE_HW_BP(1 << 17)
> >> +
> >>  struct kvm_sync_regs {
> >>  };
> >>  
> >> -- 
> >> 2.4.1
> >> 
> 
> -- 
> Alex Bennée
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: x86: default legacy PCI device assignment support to "n"

2015-06-04 Thread Bandan Das

Paolo Bonzini  writes:

> VFIO has proved itself a much better option than KVM's built-in
> device assignment.  It is mature, provides better isolation because
> it enforces ACS, and even the userspace code is being tested on
> a wider variety of hardware these days than the legacy support.
>
> Disable legacy device assignment by default.

Shouldn't we mark it as Deprecated then ?

Bandan
> Signed-off-by: Paolo Bonzini 
> ---
>  arch/x86/kvm/Kconfig | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
> index 413a7bf9efbb..a0f06a5947c5 100644
> --- a/arch/x86/kvm/Kconfig
> +++ b/arch/x86/kvm/Kconfig
> @@ -88,13 +88,14 @@ config KVM_MMU_AUDIT
>  config KVM_DEVICE_ASSIGNMENT
>   bool "KVM legacy PCI device assignment support"
>   depends on KVM && PCI && IOMMU_API
> - default y
> + default n
>   ---help---
> Provide support for legacy PCI device assignment through KVM.  The
> kernel now also supports a full featured userspace device driver
> -   framework through VFIO, which supersedes much of this support.
> +   framework through VFIO, which supersedes this support and provides
> +   better security.
>  
> -   If unsure, say Y.
> +   If unsure, say N.
>  
>  # OK, it's a little counter-intuitive to do this, but it puts it neatly under
>  # the virtualization menu.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] kvm: x86: default legacy PCI device assignment support to "n"

2015-06-04 Thread Paolo Bonzini



On 04/06/2015 16:31, Bandan Das wrote:
> > VFIO has proved itself a much better option than KVM's built-in
> > device assignment.  It is mature, provides better isolation because
> > it enforces ACS, and even the userspace code is being tested on
> > a wider variety of hardware these days than the legacy support.
> >
> > Disable legacy device assignment by default.
>
> Shouldn't we mark it as Deprecated then ?

Yes, good idea!  You mean just " (DEPRECATED)" after the string, right?

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 06/13] KVM: x86: save/load state on SMM switch

2015-06-04 Thread Paolo Bonzini



On 03/06/2015 21:02, Radim Krčmář wrote:
> > +   if (r < 0)
> > +   return;
> 
> And if we fail to write it, is there other option than throwing an error
> to userspace?  (Unset HF_SMM_MASK and pretend that nothing happened
> doesn't find much support in docs.)

Just do nothing, I guess.  If this is ROM, things will work (for some
definition of work) the same as in real hardware.  If it's MMIO, they
will break as soon as the next instruction is executed.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/13] SMM implementation for KVM

2015-06-04 Thread Paolo Bonzini



On 04/06/2015 14:26, Radim Krčmář wrote:
> A list of minor stuff I noticed (you'll probably agree with [12/13]):

Good eyes. :)  I'll fix at least these:

> [06/13] get_smstate can't be a valid C function, CAPS would be clearer
> [06/13] mangled whitespace in trace_kvm_enter_smm
> [08/13] '(u16)' or '0x &' seem better than '65535 &'
> [12/13] add role.smm to Documentation/virtual/kvm/mmu.txt

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 03/12] KVM: arm64: guest debug, define API headers

2015-06-04 Thread Alex Bennée


Christoffer Dall  writes:

> On Fri, May 29, 2015 at 10:30:19AM +0100, Alex Bennée wrote:
>> This commit defines the API headers for guest debugging. There are two
>> architecture specific debug structures:
>> 
>>   - kvm_guest_debug_arch, allows us to pass in HW debug registers
>>   - kvm_debug_exit_arch, signals exception and possible faulting address
>> 
>> The type of debugging being used is controlled by the architecture
>> specific control bits of the kvm_guest_debug->control flags in the ioctl
>> structure.
>> 
>> Signed-off-by: Alex Bennée 
>> Reviewed-by: David Hildenbrand 
>> Reviewed-by: Andrew Jones 
>> Acked-by: Christoffer Dall 
>
> I can re-confirm my ack despite the changes in v4, but this really is
> borderline to keep exiting r-b and a-b tags around from previous patches
> I would think, but ok.

I was wondering how much a patch has to change for the r-b tags to
become invalid. The meat of the API hasn't changed much though.

Drew/David,

Are you still happy with this patch?

>
> -Christoffer
>
>> 
>> ---
>> v2
>>- expose hsr and pc directly to user-space
>> v3
>>- s/control/controlled/ in commit message
>>- add v8 to ARM ARM comment (ARM Architecture Reference Manual)
>>- add rb tag
>>- rm pc, add far
>>- re-word comments on alignment
>>- rename KVM_ARM_NDBG_REGS -> KVM_ARM_MAX_DBG_REGS
>> v4
>>- now uses common HW/SW BP define
>>- add a-b-tag
>>- use u32 for control regs
>> v5
>>- revert to have arch specific KVM_GUESTDBG_USE_SW/HW_BP
>>- rm stale comments dbgctrl was stored as u64
>> ---
>>  arch/arm64/include/uapi/asm/kvm.h | 20 
>>  1 file changed, 20 insertions(+)
>> 
>> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
>> b/arch/arm64/include/uapi/asm/kvm.h
>> index d268320..43758e7 100644
>> --- a/arch/arm64/include/uapi/asm/kvm.h
>> +++ b/arch/arm64/include/uapi/asm/kvm.h
>> @@ -100,12 +100,32 @@ struct kvm_sregs {
>>  struct kvm_fpu {
>>  };
>>  
>> +/*
>> + * See v8 ARM ARM D7.3: Debug Registers
>> + *
>> + * The architectural limit is 16 debug registers of each type although
>> + * in practice there are usually less (see ID_AA64DFR0_EL1).
>> + */
>> +#define KVM_ARM_MAX_DBG_REGS 16
>>  struct kvm_guest_debug_arch {
>> +__u32 dbg_bcr[KVM_ARM_MAX_DBG_REGS];
>> +__u64 dbg_bvr[KVM_ARM_MAX_DBG_REGS];
>> +__u32 dbg_wcr[KVM_ARM_MAX_DBG_REGS];
>> +__u64 dbg_wvr[KVM_ARM_MAX_DBG_REGS];
>>  };
>>  
>>  struct kvm_debug_exit_arch {
>> +__u32 hsr;
>> +__u64 far;
>>  };
>>  
>> +/*
>> + * Architecture specific defines for kvm_guest_debug->control
>> + */
>> +
>> +#define KVM_GUESTDBG_USE_SW_BP  (1 << 16)
>> +#define KVM_GUESTDBG_USE_HW_BP  (1 << 17)
>> +
>>  struct kvm_sync_regs {
>>  };
>>  
>> -- 
>> 2.4.1
>> 

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 07/12] KVM: arm64: guest debug, add support for single-step

2015-06-04 Thread Alex Bennée


Christoffer Dall  writes:

> On Fri, May 29, 2015 at 10:30:23AM +0100, Alex Bennée wrote:
>> This adds support for single-stepping the guest. To do this we need to
>> manipulate the guests PSTATE.SS and MDSCR_EL1.SS bits which we do in the
>> kvm_arm_setup/clear_debug() so we don't affect the apparent state of the
>> guest. Additionally while the host is debugging the guest we suppress
>> the ability of the guest to single-step itself.
>
> I feel like there should be a slightly more elaborate explanation of
> exactly what works and what doesn't work when the guest is single
> stepping something and which choices we've made for supporting or not
> supporting this.

OK, I shall put bit more explanation. I was trying to avoid too much
exposition in the commit comments vs the code.

>
>> 
>> Signed-off-by: Alex Bennée 
>> 
>> ---
>> v2
>>   - Move pstate/mdscr manipulation into C
>>   - don't export guest_debug to assembly
>>   - add accessor for saved_debug regs
>>   - tweak save/restore of mdscr_el1
>> v3
>>   - don't save PC in debug information struct
>>   - rename debug_saved_regs->guest_debug_state
>>   - save whole value, only use bits in restore
>>   - add save/restore_guest-debug_regs helper functions
>>   - simplify commit message for clarity
>>   - rm vcpu_debug_saved_reg access fn
>> v4
>>   - added more comments based on suggestions
>>   - guest_debug_state->guest_debug_preserved
>>   - no point masking restore, we will trap out
>> v5
>>   - more comments
>>   - don't bother preserving pstate.ss
>
> it would have been good if there was some comment explaining the reason
> for this change.
>
>> ---
>>  arch/arm/kvm/arm.c|  4 ++-
>>  arch/arm64/include/asm/kvm_host.h | 11 
>>  arch/arm64/kvm/debug.c| 58 
>> ---
>>  arch/arm64/kvm/handle_exit.c  |  2 ++
>>  4 files changed, 70 insertions(+), 5 deletions(-)
>> 
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 064c105..9b3ed6d 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -302,7 +302,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>>  kvm_arm_set_running_vcpu(NULL);
>>  }
>>  
>> -#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | 
>> KVM_GUESTDBG_USE_SW_BP)
>> +#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE |\
>> +KVM_GUESTDBG_USE_SW_BP | \
>> +KVM_GUESTDBG_SINGLESTEP)
>>  
>>  /**
>>   * kvm_arch_vcpu_ioctl_set_guest_debug - set up guest debugging
>> diff --git a/arch/arm64/include/asm/kvm_host.h 
>> b/arch/arm64/include/asm/kvm_host.h
>> index 7cb99b5..e2db6a6 100644
>> --- a/arch/arm64/include/asm/kvm_host.h
>> +++ b/arch/arm64/include/asm/kvm_host.h
>> @@ -123,6 +123,17 @@ struct kvm_vcpu_arch {
>>   * here.
>>   */
>>  
>> +/*
>> + * Guest registers we preserve during guest debugging.
>> + *
>> + * These shadow registers are updated by the kvm_handle_sys_reg
>> + * trap handler if the guest accesses or updates them while we
>> + * are using guest debug.
>> + */
>> +struct {
>> +u32 mdscr_el1;
>> +} guest_debug_preserved;
>> +
>>  /* Don't run the guest */
>>  bool pause;
>>  
>> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
>> index 8d1bfa4..10a6baa 100644
>> --- a/arch/arm64/kvm/debug.c
>> +++ b/arch/arm64/kvm/debug.c
>> @@ -19,11 +19,41 @@
>>  
>>  #include 
>>  
>> +#include 
>> +#include 
>>  #include 
>> +#include 
>> +
>> +/* These are the bits of MDSCR_EL1 we may manipulate */
>> +#define MDSCR_EL1_DEBUG_MASK(DBG_MDSCR_SS | \
>> +DBG_MDSCR_KDE | \
>> +DBG_MDSCR_MDE)
>>  
>>  static DEFINE_PER_CPU(u32, mdcr_el2);
>>  
>>  /**
>> + * save/restore_guest_debug_regs
>> + *
>> + * For some debug operations we need to tweak some guest registers. As
>> + * a result we need to save the state of those registers before we
>> + * make those modifications. This does get confused if the guest
>> + * attempts to control single step while being debugged. It will start
>> + * working again once it is no longer being debugged by the host.
>
> What gets confused and what starts working?

Maybe I should cut from "This does get..." and put more explanation in
the single step comment later.

>
>> + *
>> + * Guest access to MDSCR_EL1 is trapped by the hypervisor and handled
>> + * after we have restored the preserved value to the main context.
>> + */
>> +static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
>> +{
>> +vcpu->arch.guest_debug_preserved.mdscr_el1 = vcpu_sys_reg(vcpu, 
>> MDSCR_EL1);
>> +}
>> +
>> +static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
>> +{
>> +vcpu_sys_reg(vcpu, MDSCR_EL1) = 
>> vcpu->arch.guest_debug_preserved.mdscr_el1;
>> +}
>> +
>> +/**
>>   * kvm_arm_init_debug - grab what we need for debug
>>   *
>>   * Currently the sole task of this function is to retrieve the initial
>> @@ -38,7 +

[PATCH v2] KVM: arm64: fix misleading comments in save/restore

2015-06-04 Thread Alex Bennée

The elr_el2 and spsr_el2 registers in fact contain the processor state
before entry into EL2. In the case of guest state it could be in either
el0 or el1.

Signed-off-by: Alex Bennée 

---
v2
  - s/hypervisor code/EL2/
  - comment: pc/pstate before entering/on return from el2
---
 arch/arm64/kvm/hyp.S | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
index 5befd01..519805f 100644
--- a/arch/arm64/kvm/hyp.S
+++ b/arch/arm64/kvm/hyp.S
@@ -50,8 +50,8 @@
stp x29, lr, [x3, #80]
 
mrs x19, sp_el0
-   mrs x20, elr_el2// EL1 PC
-   mrs x21, spsr_el2   // EL1 pstate
+   mrs x20, elr_el2// pc before entering el2
+   mrs x21, spsr_el2   // pstate before entering el2
 
stp x19, x20, [x3, #96]
str x21, [x3, #112]
@@ -82,8 +82,8 @@
ldr x21, [x3, #16]
 
msr sp_el0, x19
-   msr elr_el2, x20// EL1 PC
-   msr spsr_el2, x21   // EL1 pstate
+   msr elr_el2, x20// pc on return from el2
+   msr spsr_el2, x21   // pstate on return from el2
 
add x3, x2, #CPU_XREG_OFFSET(19)
ldp x19, x20, [x3]
-- 
2.4.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 00/13] SMM implementation for KVM

2015-06-04 Thread Radim Krčmář

2015-05-27 19:05+0200, Paolo Bonzini:
> This brings together the remaining parts of SMM.  For now I've left the
> "weird" interaction between SMM and NMI blocking, and I'm using the same
> format for the state save area (which is also the one used by QEMU) as
> the RFC.
> 
> It builds on the previous cleanup patches, which (with the exception
> of "KVM: x86: pass kvm_mmu_page to gfn_to_rmap") are now in kvm/queue.
> The first six patches are more or less the same as the previous version,
> while the address spaces part hopefully touches all affected functions
> now.
> 
> Patches 1-6 implement the SMM API and world switch; patches 7-12
> implements the multiple address spaces; patch 13 ties the loose
> ends and advertises the capability.
> 
> Tested with SeaBIOS and OVMF, where SMM provides the trusted base
> for secure boot.

Good work!  ABI looks ok and unknown bugs shouldn't be major.
With ~HF_SMM_INSIDE_NMI_MASK [04/13], early return [06/13] and
INIT [14/13] fixed,
Reviewed-by: Radim Krčmář 


---
A list of minor stuff I noticed (you'll probably agree with [12/13]):
[06/13] get_smstate can't be a valid C function, CAPS would be clearer
[06/13] get_smstate uses kvm_read_guest_virt_system for every field,
buffering the whole block should be much faster
[06/13] mangled whitespace in trace_kvm_enter_smm
[08/13] '(u16)' or '0x &' seem better than '65535 &'
[10/13] try_async_pf() could pass memslot to kvm_arch_setup_async_pf()
[11/13] '#define kvm_for_each_address_space' would fit a lot of places
[12/13] add role.smm to Documentation/virtual/kvm/mmu.txt
[14/13] kvm_lapic_latched_init could use a linebreak

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 0/6] QEMU support for KVM Guest Debug on arm64

2015-06-04 Thread Peter Maydell

On 29 May 2015 at 16:19, Alex Bennée  wrote:
> You may be wondering what happened to v3 and v4. They do exist but
> they didn't change much from the the original patches as I've been
> mostly looking the kernel side of the equation. So in summary the
> changes are:
>
>   - updates to the kernel ABI
>   - don't fall over on kernels without debug support
>   - better logging, syncing and use of internals.h
>   - debug exception re-injection for guest events*

Some generic remarks (which we've talked about in irc):

 * does this correctly handle single step over emulated MMIO insns?
   how about single step over insns emulated in the kernel
   without trapping out to userspace? (eg some of the sysregs)
   kvm_skip_instr() doesn't seem to update PSTATE.SS...
 * the kernel currently does kvm_skip_instr() before the
   emulated MMIO exit, not afterwards. That feels conceptually
   the wrong way round -- are there any interesting corner cases
   we would get wrong currently but that naturally fall out in
   the wash if it's done afterwards?
 * what about debugging a 32-bit guest which uses the 32-bit
   ARM/Thumb bkpt insns?

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v2 06/13] KVM: x86: save/load state on SMM switch

2015-06-04 Thread Radim Krčmář

2015-06-04 08:14+0200, Paolo Bonzini:
> On 03/06/2015 21:02, Radim Krčmář wrote:
>>> +   r = kvm_write_guest(vcpu->kvm, vcpu->arch.smbase + 0xfe00, buf, 
>>> sizeof(buf));
>> 
>> The state is saved in SMRAM, but we are accessing it using the non-SMM
>> address space ... how did it pass testing?
>> (Restore is using SMM address space, so I'm guessing that the mapping
>>  from QEMU wasn't really utilizing two separate address spaces.)
> 
> At this point of the series there are no separate address spaces yet.
> Patch 10 then changes it everywhere:
> 
> @@ -6558,7 +6558,7 @@ static void process_smi(struct kvm_vcpu *vcpu)

My bad, people using jackhammers at 7am are getting the better of me.

> Why did I order it this way?  Because it is already possible to test
> this code with the default SMBASE of 0x3, and it is already
> possible to run the full firmware if you hack it not to close SMRAM
> (for this I used q35's high SMRAM).  It is not possible to test the
> code partially if you first add the two address spaces, and only
> implement the world switch second.

The ordering makes sense;  I wanted to point out the early return,
noticed this as well and missed that it was fixed later, sorry.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 6/6] target-arm: kvm - re-inject guest debug exceptions

2015-06-04 Thread Peter Maydell

On 29 May 2015 at 16:19, Alex Bennée  wrote:
> From: Alex Bennée 
>
> If we can't find details for the debug exception in our debug state
> then we can assume the exception is due to debugging inside the guest.
> To inject the exception into the guest state we re-use the TCG exception
> code (do_interupt).
>
> However while guest debugging is in effect we currently can't handle the
> guest using single step which is heavily used by GDB.

Is this just the kernel not supporting this, or would QEMU need
to change as well? Worth mentioning either way.

> Signed-off-by: Alex Bennée 
>
> ---
> v5:
>   - new for v5
> ---
>  target-arm/cpu.h|  1 +
>  target-arm/helper-a64.c | 17 ++---
>  target-arm/internals.h  |  1 +
>  target-arm/kvm.c| 30 ++
>  4 files changed, 38 insertions(+), 11 deletions(-)
>
> diff --git a/target-arm/cpu.h b/target-arm/cpu.h
> index 083211c..95ae3a8 100644
> --- a/target-arm/cpu.h
> +++ b/target-arm/cpu.h
> @@ -56,6 +56,7 @@
>  #define EXCP_SMC13   /* Secure Monitor Call */
>  #define EXCP_VIRQ   14
>  #define EXCP_VFIQ   15
> +#define EXCP_WAPT   16

Why do we need a new EXCP value here? In hardware watchpoints
are reported via a particular syndrome value and (for AArch32)
as Data Aborts, so I would expect that we would just do the same.
The code in this patch doesn't seem to treat EXCP_WAPT any
differently from EXCP_DABT...

>  #define ARMV7M_EXCP_RESET   1
>  #define ARMV7M_EXCP_NMI 2
> diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
> index 861f6fa..32bd27d 100644
> --- a/target-arm/helper-a64.c
> +++ b/target-arm/helper-a64.c
> @@ -25,6 +25,7 @@
>  #include "qemu/bitops.h"
>  #include "internals.h"
>  #include "qemu/crc32c.h"
> +#include "sysemu/kvm.h"
>  #include  /* For crc32 */
>
>  /* C2.4.7 Multiply and divide */
> @@ -478,10 +479,13 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>  }
>
>  arm_log_exception(cs->exception_index);
> -qemu_log_mask(CPU_LOG_INT, "...from EL%d\n", arm_current_el(env));
> +qemu_log_mask(CPU_LOG_INT, "...from EL%d PC 0x%" PRIx64 "\n",
> +  arm_current_el(env), env->pc);
> +
>  if (qemu_loglevel_mask(CPU_LOG_INT)
>  && !excp_is_internal(cs->exception_index)) {
> -qemu_log_mask(CPU_LOG_INT, "...with ESR 0x%" PRIx32 "\n",
> +qemu_log_mask(CPU_LOG_INT, "...with ESR %x/0x%" PRIx32 "\n",
> +  env->exception.syndrome >> ARM_EL_EC_SHIFT,
>env->exception.syndrome);
>  }

If you just want to improve our debug logging that should be
a separate patch.

>
> @@ -494,6 +498,7 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>  switch (cs->exception_index) {
>  case EXCP_PREFETCH_ABORT:
>  case EXCP_DATA_ABORT:
> +case EXCP_WAPT:
>  env->cp15.far_el[new_el] = env->exception.vaddress;
>  qemu_log_mask(CPU_LOG_INT, "...with FAR 0x%" PRIx64 "\n",
>env->cp15.far_el[new_el]);
> @@ -539,6 +544,12 @@ void aarch64_cpu_do_interrupt(CPUState *cs)
>  aarch64_restore_sp(env, new_el);
>
>  env->pc = addr;
> -cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +
> +qemu_log_mask(CPU_LOG_INT, "...to EL%d PC 0x%" PRIx64 " PSTATE 0x%x\n",
> +  new_el, env->pc, pstate_read(env));
> +
> +if (!kvm_enabled()) {
> +cs->interrupt_request |= CPU_INTERRUPT_EXITTB;
> +}

Do we need a similar change in the AArch32 do_interrupt
code, to handle the case of debugging a 32-bit guest?

>  }
>  #endif
> diff --git a/target-arm/internals.h b/target-arm/internals.h
> index 2cc3017..10e8999 100644
> --- a/target-arm/internals.h
> +++ b/target-arm/internals.h
> @@ -58,6 +58,7 @@ static const char * const excnames[] = {
>  [EXCP_SMC] = "Secure Monitor Call",
>  [EXCP_VIRQ] = "Virtual IRQ",
>  [EXCP_VFIQ] = "Virtual FIQ",
> +[EXCP_WAPT] = "Watchpoint",
>  };
>
>  static inline void arm_log_exception(int idx)
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index e1fccdd..6f608d8 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -523,9 +523,11 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run 
> *run)
>  struct kvm_debug_exit_arch *arch_info = &run->debug.arch;
>  int hsr_ec = arch_info->hsr >> ARM_EL_EC_SHIFT;
>  ARMCPU *cpu = ARM_CPU(cs);
> +CPUClass *cc = CPU_GET_CLASS(cs);
>  CPUARMState *env = &cpu->env;
> +int forward_excp = EXCP_BKPT;
>
> -/* Ensure PC is synchronised */
> +/* Ensure all state is synchronised */
>  kvm_cpu_synchronize_state(cs);

Not sure the comment is providing any useful information now;
it probably should either be more explanatory or just deleted.

>  switch (hsr_ec) {
> @@ -533,7 +535,14 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run 
> *run)
>  if (cs->singlestep_enabled) {
>  return true;
>  } else {
> -error_report("Came out of SIN

Re: [PATCH v5 08/12] KVM: arm64: re-factor hyp.S debug register code

2015-06-04 Thread Christoffer Dall

On Thu, Jun 04, 2015 at 11:34:46AM +0100, Alex Bennée wrote:
> 
> Christoffer Dall  writes:
> 
> > On Fri, May 29, 2015 at 10:30:24AM +0100, Alex Bennée wrote:
> >> This is a pre-cursor to sharing the code with the guest debug support.
> >> This replaces the big macro that fishes data out of a fixed location
> >> with a more general helper macro to restore a set of debug registers. It
> >> uses macro substitution so it can be re-used for debug control and value
> >> registers. It does however rely on the debug registers being 64 bit
> >> aligned (as they happen to be in the hyp ABI).
> 
> Oops I'd better fix that commit comment.
> 
> >> 
> >> Signed-off-by: Alex Bennée 
> >> 
> >> ---
> >> v3:
> >>   - return to the patch series
> >>   - add save and restore targets
> >>   - change register use and document
> >> v4:
> >>   - keep original setup/restore names
> >>   - don't use split u32/u64 structure yet
> >> ---
> >>  arch/arm64/kvm/hyp.S | 519 
> >> ++-
> >>  1 file changed, 140 insertions(+), 379 deletions(-)
> >> 
> >> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> >> index 74e63d8..9c4897d 100644
> >> --- a/arch/arm64/kvm/hyp.S
> >> +++ b/arch/arm64/kvm/hyp.S
> >
> >
> > [...]
> >
> >> @@ -465,195 +318,52 @@
> >>msr mdscr_el1,  x25
> >>  .endm
> >>  
> >> -.macro restore_debug
> >> -  // x2: base address for cpu context
> >> -  // x3: tmp register
> >> -
> >> -  mrs x26, id_aa64dfr0_el1
> >> -  ubfxx24, x26, #12, #4   // Extract BRPs
> >> -  ubfxx25, x26, #20, #4   // Extract WRPs
> >> -  mov w26, #15
> >> -  sub w24, w26, w24   // How many BPs to skip
> >> -  sub w25, w26, w25   // How many WPs to skip
> >> -
> >> -  add x3, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
> >> +.macro restore_debug type
> >> +// x4: pointer to register set
> >> +// x5: number of registers to skip
> >
> > nit: use tabs instead of spaces here...
> >
> >> +  // x6..x22 trashed
> >>  
> >
> > [...]
> >
> >> @@ -887,12 +597,63 @@ __restore_sysregs:
> >>restore_sysregs
> >>ret
> >>  
> >> +/* Save debug state */
> >>  __save_debug:
> >> -  save_debug
> >> +  // x0: base address for vcpu context
> >> +  // x2: ptr to current CPU context
> >> +  // x4/x5: trashed
> >
> > I had a bunch of questions here which I think you missed last time
> > around:
> >  1. why do we need the vcpu context?
> 
> We don't, I'll drop that
> 
> >  2. what does 'current' mean here?
> 
> Either the host or vcpu context depending which way we are currently going.
> 

drop 'current' please.

> >  3. you're also trashing everything that save_debug trashes, so x4/5,
> > x6-x22 trashed would be more correct
> 
> OK
> 
> >
> >> +
> >> +  mrs x26, id_aa64dfr0_el1
> >> +  ubfxx24, x26, #12, #4   // Extract BRPs
> >> +  ubfxx25, x26, #20, #4   // Extract WRPs
> >> +  mov w26, #15
> >> +  sub w24, w26, w24   // How many BPs to skip
> >> +  sub w25, w26, w25   // How many WPs to skip
> >> +
> >> +  mov x5, x24
> >> +  add x4, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
> >> +  save_debug dbgbcr
> >> +  add x4, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1)
> >> +  save_debug dbgbvr
> >> +
> >> +  mov x5, x25
> >> +  add x4, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1)
> >> +  save_debug dbgwcr
> >> +  add x4, x2, #CPU_SYSREG_OFFSET(DBGWVR0_EL1)
> >> +  save_debug dbgwvr
> >> +
> >> +  mrs x21, mdccint_el1
> >> +  str x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
> >>ret
> >>  
> >> +/* Restore debug state */
> >>  __restore_debug:
> >> -  restore_debug
> >> +  // x0: base address for cpu context
> >> +  // x2: ptr to current CPU context
> >> +  // x4/x5: trashed
> >
> > and you missed these comments too, basically same stuff, but why was it
> > 'cpu' here and not 'vcpu' like above?
> 
> Again we use the functions both for restoring host and vcpu debug context.
> 
Well, the two functions operate on the same structures, so I would like
them to be consistent...

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 5/6] target-arm: kvm - add support for HW assisted debug

2015-06-04 Thread Peter Maydell

On 29 May 2015 at 16:19, Alex Bennée  wrote:
> This adds basic support for HW assisted debug. The ioctl interface to
> KVM allows us to pass an implementation defined number of break and
> watch point registers. When KVM_GUESTDBG_USE_HW_BP is specified these
> debug registers will be installed in place on the world switch into the
> guest.
>
> The hardware is actually capable of more advanced matching but it is
> unclear if this expressiveness is available via the gdbstub protocol.
>
> Signed-off-by: Alex Bennée 
>
> ---
> v2
>   - correct setting of PMC/BAS/MASK
>   - improved commentary
>   - added helper function to check watchpoint in range
>   - fix find/deletion of watchpoints
> v3
>   - use internals.h definitions
> ---
>  target-arm/kvm.c |  35 +++---
>  target-arm/kvm64.c   | 304 
> ++-
>  target-arm/kvm_arm.h |  21 
>  3 files changed, 338 insertions(+), 22 deletions(-)
>
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index de2865a..e1fccdd 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -17,6 +17,7 @@
>
>  #include "qemu-common.h"
>  #include "qemu/timer.h"
> +#include "qemu/error-report.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_arm.h"
> @@ -540,6 +541,16 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run 
> *run)
>  return true;
>  }
>  break;
> +case EC_BREAKPOINT:
> +if (kvm_arm_find_hw_breakpoint(cs, env->pc)) {
> +return true;
> +}
> +break;
> +case EC_WATCHPOINT:
> +if (kvm_arm_find_hw_watchpoint(cs, arch_info->far)) {
> +return true;
> +}

I assume the kernel is giving us the hardware behaviour of
"watchpoint traps happen before load/store execution, not after"?
(This is the correct thing from the POV of the rest of QEMU.)

> +break;
>  default:
>  error_report("%s: unhandled debug exit (%"PRIx32", %"PRIx64")\n",
>   __func__, arch_info->hsr, env->pc);
> @@ -601,6 +612,10 @@ void kvm_arch_update_guest_debug(CPUState *cs, struct 
> kvm_guest_debug *dbg)
>  if (kvm_sw_breakpoints_active(cs)) {
>  dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
>  }
> +if (kvm_hw_breakpoints_active(cs)) {
> +dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_HW_BP;
> +kvm_copy_hw_breakpoint_data(&dbg->arch);
> +}
>  }
>
>  /* C6.6.29 BRK instruction */
> @@ -627,26 +642,6 @@ int kvm_arch_remove_sw_breakpoint(CPUState *cs, struct 
> kvm_sw_breakpoint *bp)
>  return 0;
>  }
>
> -int kvm_arch_insert_hw_breakpoint(target_ulong addr,
> -  target_ulong len, int type)
> -{
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -return -EINVAL;
> -}
> -
> -int kvm_arch_remove_hw_breakpoint(target_ulong addr,
> -  target_ulong len, int type)
> -{
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -return -EINVAL;
> -}
> -
> -
> -void kvm_arch_remove_all_hw_breakpoints(void)
> -{
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -}
> -
>  void kvm_arch_init_irq_routing(KVMState *s)
>  {
>  }
> diff --git a/target-arm/kvm64.c b/target-arm/kvm64.c
> index 61592d2..06d4e1e 100644
> --- a/target-arm/kvm64.c
> +++ b/target-arm/kvm64.c
> @@ -2,6 +2,7 @@
>   * ARM implementation of KVM hooks, 64 bit specific code
>   *
>   * Copyright Mian-M. Hamayun 2013, Virtual Open Systems
> + * Copyright Alex Bennée 2014, Linaro

Really 2014? :-)

>   *
>   * This work is licensed under the terms of the GNU GPL, version 2 or later.
>   * See the COPYING file in the top-level directory.
> @@ -12,12 +13,18 @@
>  #include 
>  #include 
>  #include 
> +#include 
> +#include 
>
> +#include 
>  #include 
>
>  #include "config-host.h"
>  #include "qemu-common.h"
>  #include "qemu/timer.h"
> +#include "qemu/host-utils.h"
> +#include "qemu/error-report.h"
> +#include "exec/gdbstub.h"
>  #include "sysemu/sysemu.h"
>  #include "sysemu/kvm.h"
>  #include "kvm_arm.h"
> @@ -26,21 +33,314 @@
>  #include "hw/arm/arm.h"
>
>  static bool have_guest_debug;
> +/* Max and current break/watch point counts */
> +int max_hw_bp, max_hw_wp;
> +int cur_hw_bp, cur_hw_wp;
> +struct kvm_guest_debug_arch guest_debug_registers;

...did we figure out how this works for SMP?

>
>  /**
> - * kvm_arm_init_debug()
> + * kvm_arm_init_debug() - check for guest debug capabilities
>   * @cs: CPUState
>   *
> - * Check for guest debug capabilities.
> + * kvm_check_extension returns 0 if we have no debug registers or the
> + * number we have.
>   *
>   */
>  static void kvm_arm_init_debug(CPUState *cs)
>  {
>  have_guest_debug = kvm_check_extension(cs->kvm_state,
> KVM_CAP_SET_GUEST_DEBUG);
> +max_hw_wp = kvm_check_extension(cs->kvm_state, 
> KVM_CAP_GUEST_DEBUG_HW_WPS);
> +max_hw_bp

Re: [PATCH v5 03/12] KVM: arm64: guest debug, define API headers

2015-06-04 Thread Christoffer Dall

On Fri, May 29, 2015 at 10:30:19AM +0100, Alex Bennée wrote:
> This commit defines the API headers for guest debugging. There are two
> architecture specific debug structures:
> 
>   - kvm_guest_debug_arch, allows us to pass in HW debug registers
>   - kvm_debug_exit_arch, signals exception and possible faulting address
> 
> The type of debugging being used is controlled by the architecture
> specific control bits of the kvm_guest_debug->control flags in the ioctl
> structure.
> 
> Signed-off-by: Alex Bennée 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Andrew Jones 
> Acked-by: Christoffer Dall 

I can re-confirm my ack despite the changes in v4, but this really is
borderline to keep exiting r-b and a-b tags around from previous patches
I would think, but ok.

-Christoffer

> 
> ---
> v2
>- expose hsr and pc directly to user-space
> v3
>- s/control/controlled/ in commit message
>- add v8 to ARM ARM comment (ARM Architecture Reference Manual)
>- add rb tag
>- rm pc, add far
>- re-word comments on alignment
>- rename KVM_ARM_NDBG_REGS -> KVM_ARM_MAX_DBG_REGS
> v4
>- now uses common HW/SW BP define
>- add a-b-tag
>- use u32 for control regs
> v5
>- revert to have arch specific KVM_GUESTDBG_USE_SW/HW_BP
>- rm stale comments dbgctrl was stored as u64
> ---
>  arch/arm64/include/uapi/asm/kvm.h | 20 
>  1 file changed, 20 insertions(+)
> 
> diff --git a/arch/arm64/include/uapi/asm/kvm.h 
> b/arch/arm64/include/uapi/asm/kvm.h
> index d268320..43758e7 100644
> --- a/arch/arm64/include/uapi/asm/kvm.h
> +++ b/arch/arm64/include/uapi/asm/kvm.h
> @@ -100,12 +100,32 @@ struct kvm_sregs {
>  struct kvm_fpu {
>  };
>  
> +/*
> + * See v8 ARM ARM D7.3: Debug Registers
> + *
> + * The architectural limit is 16 debug registers of each type although
> + * in practice there are usually less (see ID_AA64DFR0_EL1).
> + */
> +#define KVM_ARM_MAX_DBG_REGS 16
>  struct kvm_guest_debug_arch {
> + __u32 dbg_bcr[KVM_ARM_MAX_DBG_REGS];
> + __u64 dbg_bvr[KVM_ARM_MAX_DBG_REGS];
> + __u32 dbg_wcr[KVM_ARM_MAX_DBG_REGS];
> + __u64 dbg_wvr[KVM_ARM_MAX_DBG_REGS];
>  };
>  
>  struct kvm_debug_exit_arch {
> + __u32 hsr;
> + __u64 far;
>  };
>  
> +/*
> + * Architecture specific defines for kvm_guest_debug->control
> + */
> +
> +#define KVM_GUESTDBG_USE_SW_BP   (1 << 16)
> +#define KVM_GUESTDBG_USE_HW_BP   (1 << 17)
> +
>  struct kvm_sync_regs {
>  };
>  
> -- 
> 2.4.1
> 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 05/12] KVM: arm: introduce kvm_arm_init/setup/clear_debug

2015-06-04 Thread Christoffer Dall

On Fri, May 29, 2015 at 10:30:21AM +0100, Alex Bennée wrote:
> This is a precursor for later patches which will need to do more to
> setup debug state before entering the hyp.S switch code. The existing
> functionality for setting mdcr_el2 has been moved out of hyp.S and now
> uses the value kept in vcpu->arch.mdcr_el2.
> 
> As the assembler used to previously mask and preserve MDCR_EL2.HPMN I've
> had to add a mechanism to save the value of mdcr_el2 as a per-cpu
> variable during the initialisation code. The kernel never sets this
> number so we are assuming the bootcode has set up the correct value
> here.
> 
> This also moves the conditional setting of the TDA bit from the hyp code
> into the C code which is currently used for the lazy debug register
> context switch code.
> 
> Signed-off-by: Alex Bennée 
> 
Reviewed-by: Christoffer Dall 
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 07/12] KVM: arm64: guest debug, add support for single-step

2015-06-04 Thread Christoffer Dall

On Fri, May 29, 2015 at 10:30:23AM +0100, Alex Bennée wrote:
> This adds support for single-stepping the guest. To do this we need to
> manipulate the guests PSTATE.SS and MDSCR_EL1.SS bits which we do in the
> kvm_arm_setup/clear_debug() so we don't affect the apparent state of the
> guest. Additionally while the host is debugging the guest we suppress
> the ability of the guest to single-step itself.

I feel like there should be a slightly more elaborate explanation of
exactly what works and what doesn't work when the guest is single
stepping something and which choices we've made for supporting or not
supporting this.

> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - Move pstate/mdscr manipulation into C
>   - don't export guest_debug to assembly
>   - add accessor for saved_debug regs
>   - tweak save/restore of mdscr_el1
> v3
>   - don't save PC in debug information struct
>   - rename debug_saved_regs->guest_debug_state
>   - save whole value, only use bits in restore
>   - add save/restore_guest-debug_regs helper functions
>   - simplify commit message for clarity
>   - rm vcpu_debug_saved_reg access fn
> v4
>   - added more comments based on suggestions
>   - guest_debug_state->guest_debug_preserved
>   - no point masking restore, we will trap out
> v5
>   - more comments
>   - don't bother preserving pstate.ss

it would have been good if there was some comment explaining the reason
for this change.

> ---
>  arch/arm/kvm/arm.c|  4 ++-
>  arch/arm64/include/asm/kvm_host.h | 11 
>  arch/arm64/kvm/debug.c| 58 
> ---
>  arch/arm64/kvm/handle_exit.c  |  2 ++
>  4 files changed, 70 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 064c105..9b3ed6d 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -302,7 +302,9 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>   kvm_arm_set_running_vcpu(NULL);
>  }
>  
> -#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE | 
> KVM_GUESTDBG_USE_SW_BP)
> +#define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE |\
> + KVM_GUESTDBG_USE_SW_BP | \
> + KVM_GUESTDBG_SINGLESTEP)
>  
>  /**
>   * kvm_arch_vcpu_ioctl_set_guest_debug - set up guest debugging
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index 7cb99b5..e2db6a6 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -123,6 +123,17 @@ struct kvm_vcpu_arch {
>* here.
>*/
>  
> + /*
> +  * Guest registers we preserve during guest debugging.
> +  *
> +  * These shadow registers are updated by the kvm_handle_sys_reg
> +  * trap handler if the guest accesses or updates them while we
> +  * are using guest debug.
> +  */
> + struct {
> + u32 mdscr_el1;
> + } guest_debug_preserved;
> +
>   /* Don't run the guest */
>   bool pause;
>  
> diff --git a/arch/arm64/kvm/debug.c b/arch/arm64/kvm/debug.c
> index 8d1bfa4..10a6baa 100644
> --- a/arch/arm64/kvm/debug.c
> +++ b/arch/arm64/kvm/debug.c
> @@ -19,11 +19,41 @@
>  
>  #include 
>  
> +#include 
> +#include 
>  #include 
> +#include 
> +
> +/* These are the bits of MDSCR_EL1 we may manipulate */
> +#define MDSCR_EL1_DEBUG_MASK (DBG_MDSCR_SS | \
> + DBG_MDSCR_KDE | \
> + DBG_MDSCR_MDE)
>  
>  static DEFINE_PER_CPU(u32, mdcr_el2);
>  
>  /**
> + * save/restore_guest_debug_regs
> + *
> + * For some debug operations we need to tweak some guest registers. As
> + * a result we need to save the state of those registers before we
> + * make those modifications. This does get confused if the guest
> + * attempts to control single step while being debugged. It will start
> + * working again once it is no longer being debugged by the host.

What gets confused and what starts working?

> + *
> + * Guest access to MDSCR_EL1 is trapped by the hypervisor and handled
> + * after we have restored the preserved value to the main context.
> + */
> +static void save_guest_debug_regs(struct kvm_vcpu *vcpu)
> +{
> + vcpu->arch.guest_debug_preserved.mdscr_el1 = vcpu_sys_reg(vcpu, 
> MDSCR_EL1);
> +}
> +
> +static void restore_guest_debug_regs(struct kvm_vcpu *vcpu)
> +{
> + vcpu_sys_reg(vcpu, MDSCR_EL1) = 
> vcpu->arch.guest_debug_preserved.mdscr_el1;
> +}
> +
> +/**
>   * kvm_arm_init_debug - grab what we need for debug
>   *
>   * Currently the sole task of this function is to retrieve the initial
> @@ -38,7 +68,6 @@ void kvm_arm_init_debug(void)
>   __this_cpu_write(mdcr_el2, kvm_call_hyp(__kvm_get_mdcr_el2));
>  }
>  
> -
>  /**
>   * kvm_arm_setup_debug - set up debug related stuff
>   *
> @@ -73,12 +102,33 @@ void kvm_arm_setup_debug(struct kvm_vcpu *vcpu)
>   if (trap_debug)
>   vcpu->arch.mdcr_el2 |= MDCR_EL2_TDA;
>  
> - /* Trap breakpoints? */
> - i

Re: [PATCH v5 10/12] KVM: arm64: guest debug, HW assisted debug support

2015-06-04 Thread Christoffer Dall

On Fri, May 29, 2015 at 10:30:26AM +0100, Alex Bennée wrote:
> This adds support for userspace to control the HW debug registers for
> guest debug. In the debug ioctl we copy the IMPDEF defined number of
> registers into a new register set called host_debug_state. There is now
> a new vcpu parameter called debug_ptr which selects which register set
> is to copied into the real registers when world switch occurs.
> 
> I've moved some helper functions into the hw_breakpoint.h header for
> re-use.
> 
> As with single step we need to tweak the guest registers to enable the
> exceptions so we need to save and restore those bits.
> 
> Two new capabilities have been added to the KVM_EXTENSION ioctl to allow
> userspace to query the number of hardware break and watch points
> available on the host hardware.
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>- switched to C setup
>- replace host debug registers directly into context
>- minor tweak to api docs
>- setup right register for debug
>- add FAR_EL2 to debug exit structure
>- add support for trapping debug register access
> v3
>- remove stray trace statement
>- fix spacing around operators (various)
>- clean-up usage of trap_debug
>- introduce debug_ptr, replace excessive memcpy stuff
>- don't use memcpy in ioctl, just assign
>- update cap ioctl documentation
>- reword a number comments
>- rename host_debug_state->external_debug_state
> v4
>- use the new u32/u64 split debug_ptr approach
>- fix some wording/comments
> v5
>- don't set MDSCR_EL1.KDE (not needed)
> ---
>  Documentation/virtual/kvm/api.txt  |  7 ++-
>  arch/arm/kvm/arm.c |  7 +++
>  arch/arm64/include/asm/hw_breakpoint.h | 12 +++
>  arch/arm64/include/asm/kvm_host.h  |  3 ++-
>  arch/arm64/include/uapi/asm/kvm.h  |  2 +-
>  arch/arm64/kernel/hw_breakpoint.c  | 12 ---
>  arch/arm64/kvm/debug.c | 37 
> +-
>  arch/arm64/kvm/handle_exit.c   |  6 ++
>  arch/arm64/kvm/reset.c | 12 +++
>  include/uapi/linux/kvm.h   |  2 ++
>  10 files changed, 80 insertions(+), 20 deletions(-)
> 
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 33c8143..ada57df 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -2668,7 +2668,7 @@ The top 16 bits of the control field are architecture 
> specific control
>  flags which can include the following:
>  
>- KVM_GUESTDBG_USE_SW_BP: using software breakpoints [x86, arm64]
> -  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390]
> +  - KVM_GUESTDBG_USE_HW_BP: using hardware breakpoints [x86, s390, arm64]
>- KVM_GUESTDBG_INJECT_DB: inject DB type exception [x86]
>- KVM_GUESTDBG_INJECT_BP: inject BP type exception [x86]
>- KVM_GUESTDBG_EXIT_PENDING:  trigger an immediate guest exit [s390]
> @@ -2683,6 +2683,11 @@ updated to the correct (supplied) values.
>  The second part of the structure is architecture specific and
>  typically contains a set of debug registers.
>  
> +For arm64 the number of debug registers is implementation defined and
> +can be determined by querying the KVM_CAP_GUEST_DEBUG_HW_BPS and
> +KVM_CAP_GUEST_DEBUG_HW_WPS capabilities which return a positive number
> +indicating the number of supported registers.
> +
>  When debug events exit the main run loop with the reason
>  KVM_EXIT_DEBUG with the kvm_debug_exit_arch part of the kvm_run
>  structure containing architecture specific debug information.
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 0d17c7b..6df47c1 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -307,6 +307,7 @@ void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
>  
>  #define KVM_GUESTDBG_VALID_MASK (KVM_GUESTDBG_ENABLE |\
>   KVM_GUESTDBG_USE_SW_BP | \
> + KVM_GUESTDBG_USE_HW_BP | \
>   KVM_GUESTDBG_SINGLESTEP)
>  
>  /**
> @@ -327,6 +328,12 @@ int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu 
> *vcpu,
>  
>   if (dbg->control & KVM_GUESTDBG_ENABLE) {
>   vcpu->guest_debug = dbg->control;
> +
> + /* Hardware assisted Break and Watch points */
> + if (vcpu->guest_debug & KVM_GUESTDBG_USE_HW_BP) {

does KVM_GUESTDBG_USE_HW_BP cover watch points as well or why is this
mentionend in the comment?

I asked this once before already...

> + vcpu->arch.external_debug_state = dbg->arch;
> + }
> +
>   } else {
>   /* If not enabled clear all flags */
>   vcpu->guest_debug = 0;
> diff --git a/arch/arm64/include/asm/hw_breakpoint.h 
> b/arch/arm64/include/asm/hw_breakpoint.h
> index 52b484b..c450552 100644
> --- a/arch/arm64/include/asm/hw_breakpoint.h
> +++ b/arch/arm64/include/asm/hw_breakpoi

Re: [PATCH v5 4/6] target-arm: kvm - support for single step

2015-06-04 Thread Peter Maydell

On 29 May 2015 at 16:19, Alex Bennée  wrote:
> This adds support for single-step. There isn't much to do on the QEMU
> side as after we set-up the request for single step via the debug ioctl
> it is all handled within the kernel.
>
> Signed-off-by: Alex Bennée 
>
> ---
> v2
>   - convert to using HSR_EC
> v3
>   - use internals.h definitions
> ---
>  target-arm/kvm.c | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index c3bad6f..de2865a 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -528,6 +528,13 @@ static int kvm_handle_debug(CPUState *cs, struct kvm_run 
> *run)
>  kvm_cpu_synchronize_state(cs);
>
>  switch (hsr_ec) {
> +case EC_SOFTWARESTEP:
> +if (cs->singlestep_enabled) {
> +return true;
> +} else {
> +error_report("Came out of SINGLE STEP when not enabled");

All-caps there is a bit odd.

> +}
> +break;
>  case EC_AA64_BKPT:
>  if (kvm_find_sw_breakpoint(cs, env->pc)) {
>  return true;
> @@ -588,6 +595,9 @@ int kvm_arch_on_sigbus(int code, void *addr)
>
>  void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
>  {
> +if (cs->singlestep_enabled) {
> +dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_SINGLESTEP;
> +}
>  if (kvm_sw_breakpoints_active(cs)) {
>  dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
>  }

Again, is there a guard somewhere to prevent us trying to enable
singlestep if the kernel doesn't support it?

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 3/6] target-arm: kvm - implement software breakpoints

2015-06-04 Thread Peter Maydell

On 29 May 2015 at 16:19, Alex Bennée  wrote:
> These don't involve messing around with debug registers, just setting
> the breakpoint instruction in memory. GDB will not use this mechanism if
> it can't access the memory to write the breakpoint.
>
> All the kernel has to do is ensure the hypervisor traps the breakpoint
> exceptions and returns to userspace.
>
> Signed-off-by: Alex Bennée 
>
> --
> v2
>   - handle debug exit with new hsr exception info
>   - add verbosity to UNIMP message
> v3
>   - sync with kvm_cpu_synchronize_state() before checking PC.
>   - use internals.h defines
>   - use env->pc
>   - use proper format types
> ---
>  target-arm/kvm.c | 88 
> 
>  1 file changed, 76 insertions(+), 12 deletions(-)
>
> diff --git a/target-arm/kvm.c b/target-arm/kvm.c
> index fdd9ba3..c3bad6f 100644
> --- a/target-arm/kvm.c
> +++ b/target-arm/kvm.c
> @@ -510,9 +510,60 @@ void kvm_arch_post_run(CPUState *cs, struct kvm_run *run)
>  {
>  }
>
> +/* See v8 ARM ARM D7.2.27 ESR_ELx, Exception Syndrome Register
> + *
> + * To minimise translating between kernel and user-space the kernel
> + * ABI just provides user-space with the full exception syndrome
> + * register value to be decoded in QEMU.
> + */
> +
> +static int kvm_handle_debug(CPUState *cs, struct kvm_run *run)
> +{
> +struct kvm_debug_exit_arch *arch_info = &run->debug.arch;
> +int hsr_ec = arch_info->hsr >> ARM_EL_EC_SHIFT;
> +ARMCPU *cpu = ARM_CPU(cs);
> +CPUARMState *env = &cpu->env;
> +
> +/* Ensure PC is synchronised */
> +kvm_cpu_synchronize_state(cs);
> +
> +switch (hsr_ec) {
> +case EC_AA64_BKPT:
> +if (kvm_find_sw_breakpoint(cs, env->pc)) {
> +return true;
> +}
> +break;
> +default:
> +error_report("%s: unhandled debug exit (%"PRIx32", %"PRIx64")\n",
> + __func__, arch_info->hsr, env->pc);
> +}
> +
> +/* If we don't handle this it could be it really is for the
> +   guest to handle */
> +qemu_log_mask(LOG_UNIMP,
> +  "%s: re-injecting exception not yet implemented"
> +  " (0x%"PRIx32", %"PRIx64")\n",
> +  __func__, hsr_ec, env->pc);
> +
> +return false;
> +}
> +
>  int kvm_arch_handle_exit(CPUState *cs, struct kvm_run *run)
>  {
> -return 0;
> +int ret = 0;
> +
> +switch (run->exit_reason) {
> +case KVM_EXIT_DEBUG:
> +if (kvm_handle_debug(cs, run)) {
> +ret = EXCP_DEBUG;
> +} /* otherwise return to guest */
> +break;
> +default:
> +qemu_log_mask(LOG_UNIMP, "%s: un-handled exit reason %d\n",
> +  __func__, run->exit_reason);
> +break;
> +}
> +return ret;
>  }
>
>  bool kvm_arch_stop_on_emulation_error(CPUState *cs)
> @@ -537,14 +588,33 @@ int kvm_arch_on_sigbus(int code, void *addr)
>
>  void kvm_arch_update_guest_debug(CPUState *cs, struct kvm_guest_debug *dbg)
>  {
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> +if (kvm_sw_breakpoints_active(cs)) {
> +dbg->control |= KVM_GUESTDBG_ENABLE | KVM_GUESTDBG_USE_SW_BP;
> +}
>  }
>
> -int kvm_arch_insert_sw_breakpoint(CPUState *cs,
> -  struct kvm_sw_breakpoint *bp)
> +/* C6.6.29 BRK instruction */
> +int kvm_arch_insert_sw_breakpoint(CPUState *cs, struct kvm_sw_breakpoint *bp)
>  {
> -qemu_log_mask(LOG_UNIMP, "%s: not implemented\n", __func__);
> -return -EINVAL;
> +static const uint32_t brk = 0xd420;

#define, please, since you're using it here and in the remove fn.

> +
> +if (cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&bp->saved_insn, 4, 0) ||
> +cpu_memory_rw_debug(cs, bp->pc, (uint8_t *)&brk, 4, 1)) {
> +return -EINVAL;
> +}
> +return 0;
> +}

Shouldn't we be testing the "does the kernel implement debug"
flag before we allow gdb to write in bp insns or mess with
dbg->control ?

-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 09/12] KVM: arm64: introduce vcpu->arch.debug_ptr

2015-06-04 Thread Christoffer Dall

On Fri, May 29, 2015 at 10:30:25AM +0100, Alex Bennée wrote:
> This introduces a level of indirection for the debug registers. Instead
> of using the sys_regs[] directly we store registers in a structure in
> the vcpu. As we are no longer tied to the layout of the sys_regs[] we
> can make the copies size appropriate for control and value registers.
> 
> This also entails updating the sys_regs code to access this new
> structure. Instead of passing a register index we now pass an offset
> into the kvm_guest_debug_arch structure.
> 
> We also need to ensure the GET/SET_ONE_REG ioctl operations store the
> registers in their correct location.
> 
> Signed-off-by: Alex Bennée 
> ---
>  arch/arm/kvm/arm.c|   3 +
>  arch/arm64/include/asm/kvm_asm.h  |  24 +++-
>  arch/arm64/include/asm/kvm_host.h |  12 +++-
>  arch/arm64/kernel/asm-offsets.c   |   6 ++
>  arch/arm64/kvm/hyp.S  | 107 +---
>  arch/arm64/kvm/sys_regs.c | 126 
> +++---
>  6 files changed, 188 insertions(+), 90 deletions(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 9b3ed6d..0d17c7b 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -279,6 +279,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>   /* Set up the timer */
>   kvm_timer_vcpu_init(vcpu);
>  
> + /* Set the debug registers to be the guests */
> + vcpu->arch.debug_ptr = &vcpu->arch.vcpu_debug_state;
> +
>   return 0;
>  }
>  
> diff --git a/arch/arm64/include/asm/kvm_asm.h 
> b/arch/arm64/include/asm/kvm_asm.h
> index d6b507e..e997404 100644
> --- a/arch/arm64/include/asm/kvm_asm.h
> +++ b/arch/arm64/include/asm/kvm_asm.h
> @@ -46,24 +46,16 @@
>  #define  CNTKCTL_EL1 20  /* Timer Control Register (EL1) */
>  #define  PAR_EL1 21  /* Physical Address Register */
>  #define MDSCR_EL122  /* Monitor Debug System Control Register */
> -#define DBGBCR0_EL1  23  /* Debug Breakpoint Control Registers (0-15) */
> -#define DBGBCR15_EL1 38
> -#define DBGBVR0_EL1  39  /* Debug Breakpoint Value Registers (0-15) */
> -#define DBGBVR15_EL1 54
> -#define DBGWCR0_EL1  55  /* Debug Watchpoint Control Registers (0-15) */
> -#define DBGWCR15_EL1 70
> -#define DBGWVR0_EL1  71  /* Debug Watchpoint Value Registers (0-15) */
> -#define DBGWVR15_EL1 86
> -#define MDCCINT_EL1  87  /* Monitor Debug Comms Channel Interrupt Enable 
> Reg */
> +#define MDCCINT_EL1  23  /* Monitor Debug Comms Channel Interrupt Enable 
> Reg */
>  
>  /* 32bit specific registers. Keep them at the end of the range */
> -#define  DACR32_EL2  88  /* Domain Access Control Register */
> -#define  IFSR32_EL2  89  /* Instruction Fault Status Register */
> -#define  FPEXC32_EL2 90  /* Floating-Point Exception Control 
> Register */
> -#define  DBGVCR32_EL291  /* Debug Vector Catch Register */
> -#define  TEECR32_EL1 92  /* ThumbEE Configuration Register */
> -#define  TEEHBR32_EL193  /* ThumbEE Handler Base Register */
> -#define  NR_SYS_REGS 94
> +#define  DACR32_EL2  24  /* Domain Access Control Register */
> +#define  IFSR32_EL2  25  /* Instruction Fault Status Register */
> +#define  FPEXC32_EL2 26  /* Floating-Point Exception Control 
> Register */
> +#define  DBGVCR32_EL227  /* Debug Vector Catch Register */
> +#define  TEECR32_EL1 28  /* ThumbEE Configuration Register */
> +#define  TEEHBR32_EL129  /* ThumbEE Handler Base Register */
> +#define  NR_SYS_REGS 30
>  
>  /* 32bit mapping */
>  #define c0_MPIDR (MPIDR_EL1 * 2) /* MultiProcessor ID Register */
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e2db6a6..e5040b6 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -108,11 +108,21 @@ struct kvm_vcpu_arch {
>   /* Exception Information */
>   struct kvm_vcpu_fault_info fault;
>  
> - /* Debug state */
> + /* Guest debug state */
>   u64 debug_flags;
>  
> + /*
> +  * For debugging the guest we need to keep a set of debug
> +  * registers which can override the guests own debug state

the guest's

> +  * while being used. These are set via the KVM_SET_GUEST_DEBUG
> +  * ioctl.

Which ones are set via the KVM_SET_GUEST_DEBUG ioctl?

This comment doesn't feel like it's properly explaining the fields
below, because it feels like there's missing a set of registers for this
to add up.  I would suggest rewording this comment to something like:

  We maintain more than a single set of debug registers to support
  debugging the guest from the host and to maintain separate host and
  guest state during world switches.  vcpu_debug_state are the debug
  registers of the vcpu as the guest sees them.  host_debug_state are
  the host registers whi

Re: [PATCH v5 2/6] target-arm: kvm64: introduce kvm_arm_init_debug()

2015-06-04 Thread Peter Maydell

On 29 May 2015 at 16:19, Alex Bennée  wrote:
> As we haven't always had guest debug support we need to probe for it.
> Additionally we don't do this in the start-up capability code so we
> don't fall over on old kernels.
>
> Signed-off-by: Alex Bennée 
> ---

Reviewed-by: Peter Maydell 

thanks
-- PMM
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: arm64: fix misleading comments in save/restore

2015-06-04 Thread Marc Zyngier

On 04/06/15 11:46, Alex Bennée wrote:
> 
> Marc Zyngier  writes:
> 
>> On 04/06/15 11:20, Alex Bennée wrote:
>>>
>>> Marc Zyngier  writes:
>>>
 On 04/06/15 10:34, Christoffer Dall wrote:
> On Thu, May 28, 2015 at 10:43:06AM +0100, Alex Bennée wrote:
>> The elr_el2 and spsr_el2 registers in fact contain the processor state
>> before entry into the hypervisor code.
>
> be careful with your use of the hypervisor, in the KVM design the
> hypervisor is split across EL1 and EL2.
>>>
>>> "before entry into EL2."
>>>
>
>> In the case of guest state it
>> could be in either el0 or el1.
>
> true
>
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  arch/arm64/kvm/hyp.S | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index d755922..1940a4c 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -50,8 +50,8 @@
>>  stp x29, lr, [x3, #80]
>>  
>>  mrs x19, sp_el0
>> -mrs x20, elr_el2// EL1 PC
>> -mrs x21, spsr_el2   // EL1 pstate
>> +mrs x20, elr_el2// PC before hyp entry
>> +mrs x21, spsr_el2   // pstate before hyp entry
>>  
>>  stp x19, x20, [x3, #96]
>>  str x21, [x3, #112]
>> @@ -82,8 +82,8 @@
>>  ldr x21, [x3, #16]
>>  
>>  msr sp_el0, x19
>> -msr elr_el2, x20// EL1 PC
>> -msr spsr_el2, x21   // EL1 pstate
>> +msr elr_el2, x20// PC to restore
>> +msr spsr_el2, x21   // pstate to restore
>
> I don't feel like 'to restore' is much more meaningful here.
>
> I would actually vote for removin the comments all together, since one
> should really understand the code as opposed to the comments when
> reading this kind of stuff.
>
> Meh, I'm not sure.  Your patch is definitely better than doing nothing.
>
> Marc?

 While I definitely agree that people should pay more attention to the
 code rather than blindly trusting comments, I still think there is some
 value in disambiguating the exception entry/return, because this bit of
 code assumes some intimate knowledge of the ARMv8 exception model.

 As for the comments themselves, I'd rather have some wording that
 clearly indicate that we're dealing with guest information, i.e:

mrs x20, elr_el2// Guest PC
mrs x21, spsr_el2   // Guest pstate

 (and the same for the exception return). The "before hyp entry" and "to
 restore" are not really useful (all the registers we are
 saving/restoring fall into these categories). What I wanted to convey
 here was that despite using an EL2 register, we are dealing with guest
 registers.
>>>
>>> Which would be great it we were. However the code is used to
>>> save/restore the host context as well as the guest context hence my
>>> weasely words. 
>>
>> Gahhh. You're right. I'm spending too much time on the VHE code these
>> days. Guess I'll stick to the weasel words then. Can you respin it with
>> Christoffer's comment addressed?
> 
> Sure. Do you want it separated from the guest debug series or will you
> be happy to take it with it when ready?

I'll take it now, no need to wait on the whole debug series to fix this.

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: arm64: fix misleading comments in save/restore

2015-06-04 Thread Alex Bennée


Marc Zyngier  writes:

> On 04/06/15 11:20, Alex Bennée wrote:
>> 
>> Marc Zyngier  writes:
>> 
>>> On 04/06/15 10:34, Christoffer Dall wrote:
 On Thu, May 28, 2015 at 10:43:06AM +0100, Alex Bennée wrote:
> The elr_el2 and spsr_el2 registers in fact contain the processor state
> before entry into the hypervisor code.

 be careful with your use of the hypervisor, in the KVM design the
 hypervisor is split across EL1 and EL2.
>> 
>> "before entry into EL2."
>> 

> In the case of guest state it
> could be in either el0 or el1.

 true

>
> Signed-off-by: Alex Bennée 
> ---
>  arch/arm64/kvm/hyp.S | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index d755922..1940a4c 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -50,8 +50,8 @@
>   stp x29, lr, [x3, #80]
>  
>   mrs x19, sp_el0
> - mrs x20, elr_el2// EL1 PC
> - mrs x21, spsr_el2   // EL1 pstate
> + mrs x20, elr_el2// PC before hyp entry
> + mrs x21, spsr_el2   // pstate before hyp entry
>  
>   stp x19, x20, [x3, #96]
>   str x21, [x3, #112]
> @@ -82,8 +82,8 @@
>   ldr x21, [x3, #16]
>  
>   msr sp_el0, x19
> - msr elr_el2, x20// EL1 PC
> - msr spsr_el2, x21   // EL1 pstate
> + msr elr_el2, x20// PC to restore
> + msr spsr_el2, x21   // pstate to restore

 I don't feel like 'to restore' is much more meaningful here.

 I would actually vote for removin the comments all together, since one
 should really understand the code as opposed to the comments when
 reading this kind of stuff.

 Meh, I'm not sure.  Your patch is definitely better than doing nothing.

 Marc?
>>>
>>> While I definitely agree that people should pay more attention to the
>>> code rather than blindly trusting comments, I still think there is some
>>> value in disambiguating the exception entry/return, because this bit of
>>> code assumes some intimate knowledge of the ARMv8 exception model.
>>>
>>> As for the comments themselves, I'd rather have some wording that
>>> clearly indicate that we're dealing with guest information, i.e:
>>>
>>> mrs x20, elr_el2// Guest PC
>>> mrs x21, spsr_el2   // Guest pstate
>>>
>>> (and the same for the exception return). The "before hyp entry" and "to
>>> restore" are not really useful (all the registers we are
>>> saving/restoring fall into these categories). What I wanted to convey
>>> here was that despite using an EL2 register, we are dealing with guest
>>> registers.
>> 
>> Which would be great it we were. However the code is used to
>> save/restore the host context as well as the guest context hence my
>> weasely words. 
>
> Gahhh. You're right. I'm spending too much time on the VHE code these
> days. Guess I'll stick to the weasel words then. Can you respin it with
> Christoffer's comment addressed?

Sure. Do you want it separated from the guest debug series or will you
be happy to take it with it when ready?

>
> Thanks,
>
>   M.

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 08/12] KVM: arm64: re-factor hyp.S debug register code

2015-06-04 Thread Alex Bennée


Christoffer Dall  writes:

> On Fri, May 29, 2015 at 10:30:24AM +0100, Alex Bennée wrote:
>> This is a pre-cursor to sharing the code with the guest debug support.
>> This replaces the big macro that fishes data out of a fixed location
>> with a more general helper macro to restore a set of debug registers. It
>> uses macro substitution so it can be re-used for debug control and value
>> registers. It does however rely on the debug registers being 64 bit
>> aligned (as they happen to be in the hyp ABI).

Oops I'd better fix that commit comment.

>> 
>> Signed-off-by: Alex Bennée 
>> 
>> ---
>> v3:
>>   - return to the patch series
>>   - add save and restore targets
>>   - change register use and document
>> v4:
>>   - keep original setup/restore names
>>   - don't use split u32/u64 structure yet
>> ---
>>  arch/arm64/kvm/hyp.S | 519 
>> ++-
>>  1 file changed, 140 insertions(+), 379 deletions(-)
>> 
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index 74e63d8..9c4897d 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>
>
> [...]
>
>> @@ -465,195 +318,52 @@
>>  msr mdscr_el1,  x25
>>  .endm
>>  
>> -.macro restore_debug
>> -// x2: base address for cpu context
>> -// x3: tmp register
>> -
>> -mrs x26, id_aa64dfr0_el1
>> -ubfxx24, x26, #12, #4   // Extract BRPs
>> -ubfxx25, x26, #20, #4   // Extract WRPs
>> -mov w26, #15
>> -sub w24, w26, w24   // How many BPs to skip
>> -sub w25, w26, w25   // How many WPs to skip
>> -
>> -add x3, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
>> +.macro restore_debug type
>> +// x4: pointer to register set
>> +// x5: number of registers to skip
>
> nit: use tabs instead of spaces here...
>
>> +// x6..x22 trashed
>>  
>
> [...]
>
>> @@ -887,12 +597,63 @@ __restore_sysregs:
>>  restore_sysregs
>>  ret
>>  
>> +/* Save debug state */
>>  __save_debug:
>> -save_debug
>> +// x0: base address for vcpu context
>> +// x2: ptr to current CPU context
>> +// x4/x5: trashed
>
> I had a bunch of questions here which I think you missed last time
> around:
>  1. why do we need the vcpu context?

We don't, I'll drop that

>  2. what does 'current' mean here?

Either the host or vcpu context depending which way we are currently going.

>  3. you're also trashing everything that save_debug trashes, so x4/5,
> x6-x22 trashed would be more correct

OK

>
>> +
>> +mrs x26, id_aa64dfr0_el1
>> +ubfxx24, x26, #12, #4   // Extract BRPs
>> +ubfxx25, x26, #20, #4   // Extract WRPs
>> +mov w26, #15
>> +sub w24, w26, w24   // How many BPs to skip
>> +sub w25, w26, w25   // How many WPs to skip
>> +
>> +mov x5, x24
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
>> +save_debug dbgbcr
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1)
>> +save_debug dbgbvr
>> +
>> +mov x5, x25
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1)
>> +save_debug dbgwcr
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGWVR0_EL1)
>> +save_debug dbgwvr
>> +
>> +mrs x21, mdccint_el1
>> +str x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
>>  ret
>>  
>> +/* Restore debug state */
>>  __restore_debug:
>> -restore_debug
>> +// x0: base address for cpu context
>> +// x2: ptr to current CPU context
>> +// x4/x5: trashed
>
> and you missed these comments too, basically same stuff, but why was it
> 'cpu' here and not 'vcpu' like above?

Again we use the functions both for restoring host and vcpu debug context.

>
> note again, that you're explicitly touching x24, xx25, and x26 here as
> well, so shouldn't they be listed as trashed?
>
>> +
>> +mrs x26, id_aa64dfr0_el1
>> +ubfxx24, x26, #12, #4   // Extract BRPs
>> +ubfxx25, x26, #20, #4   // Extract WRPs
>> +mov w26, #15
>> +sub w24, w26, w24   // How many BPs to skip
>> +sub w25, w26, w25   // How many WPs to skip
>> +
>> +mov x5, x24
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
>> +restore_debug dbgbcr
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1)
>> +restore_debug dbgbvr
>> +
>> +mov x5, x25
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1)
>> +restore_debug dbgwcr
>> +add x4, x2, #CPU_SYSREG_OFFSET(DBGWVR0_EL1)
>> +restore_debug dbgwvr
>> +
>> +ldr x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
>> +msr mdccint_el1, x21
>> +
>>  ret
>>  
>>  __save_fpsimd:
>
> Thanks,
> -Christoffer

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: arm64: fix misleading comments in save/restore

2015-06-04 Thread Marc Zyngier

On 04/06/15 11:20, Alex Bennée wrote:
> 
> Marc Zyngier  writes:
> 
>> On 04/06/15 10:34, Christoffer Dall wrote:
>>> On Thu, May 28, 2015 at 10:43:06AM +0100, Alex Bennée wrote:
 The elr_el2 and spsr_el2 registers in fact contain the processor state
 before entry into the hypervisor code.
>>>
>>> be careful with your use of the hypervisor, in the KVM design the
>>> hypervisor is split across EL1 and EL2.
> 
> "before entry into EL2."
> 
>>>
 In the case of guest state it
 could be in either el0 or el1.
>>>
>>> true
>>>

 Signed-off-by: Alex Bennée 
 ---
  arch/arm64/kvm/hyp.S | 8 
  1 file changed, 4 insertions(+), 4 deletions(-)

 diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
 index d755922..1940a4c 100644
 --- a/arch/arm64/kvm/hyp.S
 +++ b/arch/arm64/kvm/hyp.S
 @@ -50,8 +50,8 @@
stp x29, lr, [x3, #80]
  
mrs x19, sp_el0
 -  mrs x20, elr_el2// EL1 PC
 -  mrs x21, spsr_el2   // EL1 pstate
 +  mrs x20, elr_el2// PC before hyp entry
 +  mrs x21, spsr_el2   // pstate before hyp entry
  
stp x19, x20, [x3, #96]
str x21, [x3, #112]
 @@ -82,8 +82,8 @@
ldr x21, [x3, #16]
  
msr sp_el0, x19
 -  msr elr_el2, x20// EL1 PC
 -  msr spsr_el2, x21   // EL1 pstate
 +  msr elr_el2, x20// PC to restore
 +  msr spsr_el2, x21   // pstate to restore
>>>
>>> I don't feel like 'to restore' is much more meaningful here.
>>>
>>> I would actually vote for removin the comments all together, since one
>>> should really understand the code as opposed to the comments when
>>> reading this kind of stuff.
>>>
>>> Meh, I'm not sure.  Your patch is definitely better than doing nothing.
>>>
>>> Marc?
>>
>> While I definitely agree that people should pay more attention to the
>> code rather than blindly trusting comments, I still think there is some
>> value in disambiguating the exception entry/return, because this bit of
>> code assumes some intimate knowledge of the ARMv8 exception model.
>>
>> As for the comments themselves, I'd rather have some wording that
>> clearly indicate that we're dealing with guest information, i.e:
>>
>>  mrs x20, elr_el2// Guest PC
>>  mrs x21, spsr_el2   // Guest pstate
>>
>> (and the same for the exception return). The "before hyp entry" and "to
>> restore" are not really useful (all the registers we are
>> saving/restoring fall into these categories). What I wanted to convey
>> here was that despite using an EL2 register, we are dealing with guest
>> registers.
> 
> Which would be great it we were. However the code is used to
> save/restore the host context as well as the guest context hence my
> weasely words. 

Gahhh. You're right. I'm spending too much time on the VHE code these
days. Guess I'll stick to the weasel words then. Can you respin it with
Christoffer's comment addressed?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH v5 08/12] KVM: arm64: re-factor hyp.S debug register code

2015-06-04 Thread Christoffer Dall

On Fri, May 29, 2015 at 10:30:24AM +0100, Alex Bennée wrote:
> This is a pre-cursor to sharing the code with the guest debug support.
> This replaces the big macro that fishes data out of a fixed location
> with a more general helper macro to restore a set of debug registers. It
> uses macro substitution so it can be re-used for debug control and value
> registers. It does however rely on the debug registers being 64 bit
> aligned (as they happen to be in the hyp ABI).
> 
> Signed-off-by: Alex Bennée 
> 
> ---
> v3:
>   - return to the patch series
>   - add save and restore targets
>   - change register use and document
> v4:
>   - keep original setup/restore names
>   - don't use split u32/u64 structure yet
> ---
>  arch/arm64/kvm/hyp.S | 519 
> ++-
>  1 file changed, 140 insertions(+), 379 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index 74e63d8..9c4897d 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S


[...]

> @@ -465,195 +318,52 @@
>   msr mdscr_el1,  x25
>  .endm
>  
> -.macro restore_debug
> - // x2: base address for cpu context
> - // x3: tmp register
> -
> - mrs x26, id_aa64dfr0_el1
> - ubfxx24, x26, #12, #4   // Extract BRPs
> - ubfxx25, x26, #20, #4   // Extract WRPs
> - mov w26, #15
> - sub w24, w26, w24   // How many BPs to skip
> - sub w25, w26, w25   // How many WPs to skip
> -
> - add x3, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
> +.macro restore_debug type
> +// x4: pointer to register set
> +// x5: number of registers to skip

nit: use tabs instead of spaces here...

> + // x6..x22 trashed
>  

[...]

> @@ -887,12 +597,63 @@ __restore_sysregs:
>   restore_sysregs
>   ret
>  
> +/* Save debug state */
>  __save_debug:
> - save_debug
> + // x0: base address for vcpu context
> + // x2: ptr to current CPU context
> + // x4/x5: trashed

I had a bunch of questions here which I think you missed last time
around:
 1. why do we need the vcpu context?
 2. what does 'current' mean here?
 3. you're also trashing everything that save_debug trashes, so x4/5,
x6-x22 trashed would be more correct

> +
> + mrs x26, id_aa64dfr0_el1
> + ubfxx24, x26, #12, #4   // Extract BRPs
> + ubfxx25, x26, #20, #4   // Extract WRPs
> + mov w26, #15
> + sub w24, w26, w24   // How many BPs to skip
> + sub w25, w26, w25   // How many WPs to skip
> +
> + mov x5, x24
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
> + save_debug dbgbcr
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1)
> + save_debug dbgbvr
> +
> + mov x5, x25
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1)
> + save_debug dbgwcr
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGWVR0_EL1)
> + save_debug dbgwvr
> +
> + mrs x21, mdccint_el1
> + str x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
>   ret
>  
> +/* Restore debug state */
>  __restore_debug:
> - restore_debug
> + // x0: base address for cpu context
> + // x2: ptr to current CPU context
> + // x4/x5: trashed

and you missed these comments too, basically same stuff, but why was it
'cpu' here and not 'vcpu' like above?

note again, that you're explicitly touching x24, xx25, and x26 here as
well, so shouldn't they be listed as trashed?

> +
> + mrs x26, id_aa64dfr0_el1
> + ubfxx24, x26, #12, #4   // Extract BRPs
> + ubfxx25, x26, #20, #4   // Extract WRPs
> + mov w26, #15
> + sub w24, w26, w24   // How many BPs to skip
> + sub w25, w26, w25   // How many WPs to skip
> +
> + mov x5, x24
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGBCR0_EL1)
> + restore_debug dbgbcr
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGBVR0_EL1)
> + restore_debug dbgbvr
> +
> + mov x5, x25
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGWCR0_EL1)
> + restore_debug dbgwcr
> + add x4, x2, #CPU_SYSREG_OFFSET(DBGWVR0_EL1)
> + restore_debug dbgwvr
> +
> + ldr x21, [x2, #CPU_SYSREG_OFFSET(MDCCINT_EL1)]
> + msr mdccint_el1, x21
> +
>   ret
>  
>  __save_fpsimd:

Thanks,
-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: arm64: fix misleading comments in save/restore

2015-06-04 Thread Alex Bennée


Marc Zyngier  writes:

> On 04/06/15 10:34, Christoffer Dall wrote:
>> On Thu, May 28, 2015 at 10:43:06AM +0100, Alex Bennée wrote:
>>> The elr_el2 and spsr_el2 registers in fact contain the processor state
>>> before entry into the hypervisor code.
>> 
>> be careful with your use of the hypervisor, in the KVM design the
>> hypervisor is split across EL1 and EL2.

"before entry into EL2."

>> 
>>> In the case of guest state it
>>> could be in either el0 or el1.
>> 
>> true
>> 
>>>
>>> Signed-off-by: Alex Bennée 
>>> ---
>>>  arch/arm64/kvm/hyp.S | 8 
>>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>>> index d755922..1940a4c 100644
>>> --- a/arch/arm64/kvm/hyp.S
>>> +++ b/arch/arm64/kvm/hyp.S
>>> @@ -50,8 +50,8 @@
>>> stp x29, lr, [x3, #80]
>>>  
>>> mrs x19, sp_el0
>>> -   mrs x20, elr_el2// EL1 PC
>>> -   mrs x21, spsr_el2   // EL1 pstate
>>> +   mrs x20, elr_el2// PC before hyp entry
>>> +   mrs x21, spsr_el2   // pstate before hyp entry
>>>  
>>> stp x19, x20, [x3, #96]
>>> str x21, [x3, #112]
>>> @@ -82,8 +82,8 @@
>>> ldr x21, [x3, #16]
>>>  
>>> msr sp_el0, x19
>>> -   msr elr_el2, x20// EL1 PC
>>> -   msr spsr_el2, x21   // EL1 pstate
>>> +   msr elr_el2, x20// PC to restore
>>> +   msr spsr_el2, x21   // pstate to restore
>> 
>> I don't feel like 'to restore' is much more meaningful here.
>> 
>> I would actually vote for removin the comments all together, since one
>> should really understand the code as opposed to the comments when
>> reading this kind of stuff.
>> 
>> Meh, I'm not sure.  Your patch is definitely better than doing nothing.
>> 
>> Marc?
>
> While I definitely agree that people should pay more attention to the
> code rather than blindly trusting comments, I still think there is some
> value in disambiguating the exception entry/return, because this bit of
> code assumes some intimate knowledge of the ARMv8 exception model.
>
> As for the comments themselves, I'd rather have some wording that
> clearly indicate that we're dealing with guest information, i.e:
>
>   mrs x20, elr_el2// Guest PC
>   mrs x21, spsr_el2   // Guest pstate
>
> (and the same for the exception return). The "before hyp entry" and "to
> restore" are not really useful (all the registers we are
> saving/restoring fall into these categories). What I wanted to convey
> here was that despite using an EL2 register, we are dealing with guest
> registers.

Which would be great it we were. However the code is used to
save/restore the host context as well as the guest context hence my
weasely words. 

>
> Would this address your concerns?
>
> Thanks,
>
>   M.

-- 
Alex Bennée
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: arm64: fix misleading comments in save/restore

2015-06-04 Thread Marc Zyngier

On 04/06/15 10:34, Christoffer Dall wrote:
> On Thu, May 28, 2015 at 10:43:06AM +0100, Alex Bennée wrote:
>> The elr_el2 and spsr_el2 registers in fact contain the processor state
>> before entry into the hypervisor code.
> 
> be careful with your use of the hypervisor, in the KVM design the
> hypervisor is split across EL1 and EL2.
> 
>> In the case of guest state it
>> could be in either el0 or el1.
> 
> true
> 
>>
>> Signed-off-by: Alex Bennée 
>> ---
>>  arch/arm64/kvm/hyp.S | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
>> index d755922..1940a4c 100644
>> --- a/arch/arm64/kvm/hyp.S
>> +++ b/arch/arm64/kvm/hyp.S
>> @@ -50,8 +50,8 @@
>>  stp x29, lr, [x3, #80]
>>  
>>  mrs x19, sp_el0
>> -mrs x20, elr_el2// EL1 PC
>> -mrs x21, spsr_el2   // EL1 pstate
>> +mrs x20, elr_el2// PC before hyp entry
>> +mrs x21, spsr_el2   // pstate before hyp entry
>>  
>>  stp x19, x20, [x3, #96]
>>  str x21, [x3, #112]
>> @@ -82,8 +82,8 @@
>>  ldr x21, [x3, #16]
>>  
>>  msr sp_el0, x19
>> -msr elr_el2, x20// EL1 PC
>> -msr spsr_el2, x21   // EL1 pstate
>> +msr elr_el2, x20// PC to restore
>> +msr spsr_el2, x21   // pstate to restore
> 
> I don't feel like 'to restore' is much more meaningful here.
> 
> I would actually vote for removin the comments all together, since one
> should really understand the code as opposed to the comments when
> reading this kind of stuff.
> 
> Meh, I'm not sure.  Your patch is definitely better than doing nothing.
> 
> Marc?

While I definitely agree that people should pay more attention to the
code rather than blindly trusting comments, I still think there is some
value in disambiguating the exception entry/return, because this bit of
code assumes some intimate knowledge of the ARMv8 exception model.

As for the comments themselves, I'd rather have some wording that
clearly indicate that we're dealing with guest information, i.e:

mrs x20, elr_el2// Guest PC
mrs x21, spsr_el2   // Guest pstate

(and the same for the exception return). The "before hyp entry" and "to
restore" are not really useful (all the registers we are
saving/restoring fall into these categories). What I wanted to convey
here was that despite using an EL2 register, we are dealing with guest
registers.

Would this address your concerns?

Thanks,

M.
-- 
Jazz is not dead. It just smells funny...
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH] KVM: arm64: fix misleading comments in save/restore

2015-06-04 Thread Christoffer Dall

On Thu, May 28, 2015 at 10:43:06AM +0100, Alex Bennée wrote:
> The elr_el2 and spsr_el2 registers in fact contain the processor state
> before entry into the hypervisor code.

be careful with your use of the hypervisor, in the KVM design the
hypervisor is split across EL1 and EL2.

> In the case of guest state it
> could be in either el0 or el1.

true

> 
> Signed-off-by: Alex Bennée 
> ---
>  arch/arm64/kvm/hyp.S | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/kvm/hyp.S b/arch/arm64/kvm/hyp.S
> index d755922..1940a4c 100644
> --- a/arch/arm64/kvm/hyp.S
> +++ b/arch/arm64/kvm/hyp.S
> @@ -50,8 +50,8 @@
>   stp x29, lr, [x3, #80]
>  
>   mrs x19, sp_el0
> - mrs x20, elr_el2// EL1 PC
> - mrs x21, spsr_el2   // EL1 pstate
> + mrs x20, elr_el2// PC before hyp entry
> + mrs x21, spsr_el2   // pstate before hyp entry
>  
>   stp x19, x20, [x3, #96]
>   str x21, [x3, #112]
> @@ -82,8 +82,8 @@
>   ldr x21, [x3, #16]
>  
>   msr sp_el0, x19
> - msr elr_el2, x20// EL1 PC
> - msr spsr_el2, x21   // EL1 pstate
> + msr elr_el2, x20// PC to restore
> + msr spsr_el2, x21   // pstate to restore

I don't feel like 'to restore' is much more meaningful here.

I would actually vote for removin the comments all together, since one
should really understand the code as opposed to the comments when
reading this kind of stuff.

Meh, I'm not sure.  Your patch is definitely better than doing nothing.

Marc?

-Christoffer
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH kernel v11 17/34] powerpc/spapr: vfio: Switch from iommu_table to new iommu_table_group

2015-06-04 Thread Alexey Kardashevskiy


On 06/01/2015 04:24 PM, David Gibson wrote:

On Fri, May 29, 2015 at 06:44:41PM +1000, Alexey Kardashevskiy wrote:

Modern IBM POWERPC systems support multiple (currently two) TCE tables
per IOMMU group (a.k.a. PE). This adds a iommu_table_group container
for TCE tables. Right now just one table is supported.

For IODA, instead of embedding iommu_table, the new iommu_table_group
keeps pointers to those. The iommu_table structs are allocated
dynamically now by a pnv_pci_table_alloc() helper as PCI hotplug
code (for EEH recovery) and SRIOV are supported there.

For P5IOC2, both iommu_table_group and iommu_table are embedded into
PE struct. As there is no EEH and SRIOV support for P5IOC2,
iommu_free_table() should not be called on iommu_table struct pointers
so we can keep it embedded in pnv_phb::p5ioc2.

For pSeries, this replaces multiple calls of kzalloc_node() with a new
iommu_pseries_group_alloc() helper and stores the table group struct
pointer into the pci_dn struct. For release, a iommu_table_group_free()
helper is added.

This moves iommu_table struct allocation from SR-IOV code to
the generic DMA initialization code in pnv_pci_ioda2_setup_dma_pe.

This replaces a single pointer to iommu_group with a list of
iommu_table_group structs. For now it is just a single iommu_table_group
in this list but later with TCE table sharing enabled, the list will
keep all the IOMMU groups which use the particular table. The list
uses iommu_table_group_link structs rather than iommu_table_group::next
as a VFIO container may have 2 IOMMU tables, each will have its own list
head pointer as it is mainly for TCE invalidation code which should
walk through all attached groups and invalidate TCE cache so
the table has to keep the list head pointer. The other option would
be storing list head in a VFIO container but it would not work as
the platform code (which does TCE table update and invalidation) has
no idea about VFIO.

This should cause no behavioural change.

Signed-off-by: Alexey Kardashevskiy 
[aw: for the vfio related changes]
Acked-by: Alex Williamson 
Reviewed-by: David Gibson 
Reviewed-by: Gavin Shan 


It looks like this commit message doesn't match the code - it seems
like an older or newer version of the message from the previous patch.

>

This patch seems instead to be about changing the table_group <-> table
relationship from 1:1 to many:many.



I'll put this:

===
So far one TCE table could only be used by one IOMMU group. However
IODA2 hardware allows programming the same TCE table address to
multiple PE allowing sharing tables.

This replaces a single pointer to a group in a iommu_table struct
with a linked list of groups which provides the way of invalidating
TCE cache for every PE when an actual TCE table is updated. This adds 
pnv_pci_link_table_and_group() and pnv_pci_unlink_table_and_group() helpers 
to manage the list. However without VFIO, it is still going

to be a single IOMMU group per iommu_table.

This changes iommu_add_device() to add a device to a first group
from the group list of a table as it is only called from the platform
init code or PCI bus notifier and at these moments there is only
one group per table.

This does not change TCE invalidation code to loop through all
attached groups in order to simplify this patch and because
it is not really needed in most cases. IODA2 is fixed in a later
patch.

===



---
Changes:
v10:
* iommu_table is not embedded into iommu_table_group but allocated
dynamically
* iommu_table allocation is moved to a single place for IODA2's
pnv_pci_ioda_setup_dma_pe where it belongs to
* added list of groups into iommu_table; most of the code just looks at
the first item to keep the patch simpler

v9:
* s/it_group/it_table_group/
* added and used iommu_table_group_free(), from now iommu_free_table()
is only used for VIO
* added iommu_pseries_group_alloc()
* squashed "powerpc/iommu: Introduce iommu_table_alloc() helper" into this
---
  arch/powerpc/include/asm/iommu.h|   8 +-
  arch/powerpc/kernel/iommu.c |   9 +-
  arch/powerpc/platforms/powernv/pci-ioda.c   |  45 ++
  arch/powerpc/platforms/powernv/pci-p5ioc2.c |   3 +
  arch/powerpc/platforms/powernv/pci.c|  76 +
  arch/powerpc/platforms/powernv/pci.h|   7 ++
  arch/powerpc/platforms/pseries/iommu.c  |  33 +++-
  drivers/vfio/vfio_iommu_spapr_tce.c | 122 
  8 files changed, 242 insertions(+), 61 deletions(-)

diff --git a/arch/powerpc/include/asm/iommu.h b/arch/powerpc/include/asm/iommu.h
index 5a7267f..44a20cc 100644
--- a/arch/powerpc/include/asm/iommu.h
+++ b/arch/powerpc/include/asm/iommu.h
@@ -91,7 +91,7 @@ struct iommu_table {
struct iommu_pool pools[IOMMU_NR_POOLS];
unsigned long *it_map;   /* A simple allocation bitmap for now */
unsigned long  it_page_shift;/* table iommu page size */
-   struct iommu_table_group *it_table_group;
+   struct list_head

[PATCH v2 14/13] KVM: x86: latch INITs while in system management mode

2015-06-04 Thread Paolo Bonzini

Do not process INITs immediately while in system management mode, keep
it instead in apic->pending_events.  Tell userspace if an INIT is
pending when they issue GET_VCPU_EVENTS, and similarly handle the
new field in SET_VCPU_EVENTS.

Note that the same treatment should be done while in VMX non-root mode.

Signed-off-by: Paolo Bonzini 
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/x86/include/uapi/asm/kvm.h   |  2 +-
 arch/x86/kvm/lapic.c  | 13 -
 arch/x86/kvm/lapic.h  |  5 +
 arch/x86/kvm/x86.c| 11 ++-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 91ab5f2354aa..461956a0ee8e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -829,7 +829,7 @@ struct kvm_vcpu_events {
__u8 smm;
__u8 pending;
__u8 smm_inside_nmi;
-   __u8 pad;
+   __u8 latched_init;
} smi;
 };
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 30100a3c1bed..a4ae82eb82aa 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -316,7 +316,7 @@ struct kvm_vcpu_events {
__u8 smm;
__u8 pending;
__u8 smm_inside_nmi;
-   __u8 pad;
+   __u8 latched_init;
} smi;
__u32 reserved[9];
 };
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b8e47e2b1d94..beeef05bb4d9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2057,8 +2057,19 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
if (!kvm_vcpu_has_lapic(vcpu) || !apic->pending_events)
return;
 
-   pe = xchg(&apic->pending_events, 0);
+   /*
+* INITs are latched while in SMM.  Because an SMM CPU cannot
+* be in KVM_MP_STATE_INIT_RECEIVED state, just eat SIPIs
+* and delay processing of INIT until the next RSM.
+*/
+   if (is_smm(vcpu)) {
+   WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
+   if (test_bit(KVM_APIC_SIPI, &apic->pending_events))
+   clear_bit(KVM_APIC_SIPI, &apic->pending_events);
+   return;
+   }
 
+   pe = xchg(&apic->pending_events, 0);
if (test_bit(KVM_APIC_INIT, &pe)) {
kvm_lapic_reset(vcpu, true);
kvm_vcpu_reset(vcpu, true);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 9d8fcde52027..f2f4e10ab772 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -159,6 +159,11 @@ static inline bool kvm_lowest_prio_delivery(struct 
kvm_lapic_irq *irq)
irq->msi_redir_hint);
 }
 
+static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
+{
+   return kvm_vcpu_has_lapic(vcpu) && test_bit(KVM_APIC_INIT, 
&vcpu->arch.apic->pending_events);
+}
+
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 36962b7627df..5e6d19730ef9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3240,7 +3240,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct 
kvm_vcpu *vcpu,
events->smi.pending = vcpu->arch.smi_pending;
events->smi.smm_inside_nmi =
!!(vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK);
-   events->smi.pad = 0;
+   events->smi.latched_init = kvm_lapic_latched_init(vcpu);
 
events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
 | KVM_VCPUEVENT_VALID_SHADOW
@@ -3289,6 +3289,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct 
kvm_vcpu *vcpu,
vcpu->arch.hflags |= HF_SMM_INSIDE_NMI_MASK;
else
vcpu->arch.hflags &= ~HF_SMM_INSIDE_NMI_MASK;
+   if (kvm_vcpu_has_lapic(vcpu)) {
+   if (events->smi.latched_init)
+   set_bit(KVM_APIC_INIT, 
&vcpu->arch.apic->pending_events);
+   else
+   clear_bit(KVM_APIC_INIT, 
&vcpu->arch.apic->pending_events);
+   }
}
 
kvm_make_request(KVM_REQ_EVENT, vcpu);
@@ -5495,6 +5501,9 @@ static void kvm_smm_changed(struct kvm_vcpu *vcpu)
if (unlikely(vcpu->arch.smi_pending)) {
kvm_make_request(KVM_REQ_SMI, vcpu);
vcpu->arch.smi_pending = 0;
+   } else {
+   /* Process a latched INIT, if any.  */
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
}
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH v2 14/13] KVM: x86: latch INITs while in system management mode

2015-06-04 Thread Paolo Bonzini

Do not process INITs immediately while in system management mode, keep
it instead in apic->pending_events.  Tell userspace if an INIT is
pending when they issue GET_VCPU_EVENTS, and similarly handle the
new field in SET_VCPU_EVENTS.

Note that the same treatment should be done while in VMX non-root mode.

Signed-off-by: Paolo Bonzini 
---
 Documentation/virtual/kvm/api.txt |  2 +-
 arch/x86/include/uapi/asm/kvm.h   |  2 +-
 arch/x86/kvm/lapic.c  | 13 -
 arch/x86/kvm/lapic.h  |  5 +
 arch/x86/kvm/x86.c| 11 ++-
 5 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 91ab5f2354aa..461956a0ee8e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -829,7 +829,7 @@ struct kvm_vcpu_events {
__u8 smm;
__u8 pending;
__u8 smm_inside_nmi;
-   __u8 pad;
+   __u8 latched_init;
} smi;
 };
 
diff --git a/arch/x86/include/uapi/asm/kvm.h b/arch/x86/include/uapi/asm/kvm.h
index 30100a3c1bed..a4ae82eb82aa 100644
--- a/arch/x86/include/uapi/asm/kvm.h
+++ b/arch/x86/include/uapi/asm/kvm.h
@@ -316,7 +316,7 @@ struct kvm_vcpu_events {
__u8 smm;
__u8 pending;
__u8 smm_inside_nmi;
-   __u8 pad;
+   __u8 latched_init;
} smi;
__u32 reserved[9];
 };
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index b8e47e2b1d94..beeef05bb4d9 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2057,8 +2057,19 @@ void kvm_apic_accept_events(struct kvm_vcpu *vcpu)
if (!kvm_vcpu_has_lapic(vcpu) || !apic->pending_events)
return;
 
-   pe = xchg(&apic->pending_events, 0);
+   /*
+* INITs are latched while in SMM.  Because an SMM CPU cannot
+* be in KVM_MP_STATE_INIT_RECEIVED state, just eat SIPIs
+* and delay processing of INIT until the next RSM.
+*/
+   if (is_smm(vcpu)) {
+   WARN_ON_ONCE(vcpu->arch.mp_state == KVM_MP_STATE_INIT_RECEIVED);
+   if (test_bit(KVM_APIC_SIPI, &apic->pending_events))
+   clear_bit(KVM_APIC_SIPI, &apic->pending_events);
+   return;
+   }
 
+   pe = xchg(&apic->pending_events, 0);
if (test_bit(KVM_APIC_INIT, &pe)) {
kvm_lapic_reset(vcpu, true);
kvm_vcpu_reset(vcpu, true);
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 9d8fcde52027..f2f4e10ab772 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -159,6 +159,11 @@ static inline bool kvm_lowest_prio_delivery(struct 
kvm_lapic_irq *irq)
irq->msi_redir_hint);
 }
 
+static inline int kvm_lapic_latched_init(struct kvm_vcpu *vcpu)
+{
+   return kvm_vcpu_has_lapic(vcpu) && test_bit(KVM_APIC_INIT, 
&vcpu->arch.apic->pending_events);
+}
+
 bool kvm_apic_pending_eoi(struct kvm_vcpu *vcpu, int vector);
 
 void wait_lapic_expire(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 36962b7627df..5e6d19730ef9 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3240,7 +3240,7 @@ static void kvm_vcpu_ioctl_x86_get_vcpu_events(struct 
kvm_vcpu *vcpu,
events->smi.pending = vcpu->arch.smi_pending;
events->smi.smm_inside_nmi =
!!(vcpu->arch.hflags & HF_SMM_INSIDE_NMI_MASK);
-   events->smi.pad = 0;
+   events->smi.latched_init = kvm_lapic_latched_init(vcpu);
 
events->flags = (KVM_VCPUEVENT_VALID_NMI_PENDING
 | KVM_VCPUEVENT_VALID_SHADOW
@@ -3289,6 +3289,12 @@ static int kvm_vcpu_ioctl_x86_set_vcpu_events(struct 
kvm_vcpu *vcpu,
vcpu->arch.hflags |= HF_SMM_INSIDE_NMI_MASK;
else
vcpu->arch.hflags &= ~HF_SMM_INSIDE_NMI_MASK;
+   if (kvm_vcpu_has_lapic(vcpu)) {
+   if (events->smi.latched_init)
+   set_bit(KVM_APIC_INIT, 
&vcpu->arch.apic->pending_events);
+   else
+   clear_bit(KVM_APIC_INIT, 
&vcpu->arch.apic->pending_events);
+   }
}
 
kvm_make_request(KVM_REQ_EVENT, vcpu);
@@ -5495,6 +5501,9 @@ static void kvm_smm_changed(struct kvm_vcpu *vcpu)
if (unlikely(vcpu->arch.smi_pending)) {
kvm_make_request(KVM_REQ_SMI, vcpu);
vcpu->arch.smi_pending = 0;
+   } else {
+   /* Process a latched INIT, if any.  */
+   kvm_make_request(KVM_REQ_EVENT, vcpu);
}
}
 
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] KVM: x86: clear hidden CPU state at reset time

2015-06-04 Thread Paolo Bonzini

This was noticed by Radim while reviewing the implementation of
system management mode.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/x86.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 79dde1656db6..bd6bcd54cd44 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7254,6 +7254,8 @@ void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 
 void kvm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
+   vcpu->arch.hflags = 0;
+
atomic_set(&vcpu->arch.nmi_queued, 0);
vcpu->arch.nmi_pending = 0;
vcpu->arch.nmi_injected = false;
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Paolo Bonzini

On 04/06/2015 10:23, Xiao Guangrong wrote:
>>
>> So, why do you need to always use IPAT=0?  Can patch 15 keep the current
>> logic for RAM, like this:
>>
>> if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm))
>> ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
>>   VMX_EPT_MT_EPTE_SHIFT;
>> else
>> ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
>> | VMX_EPT_IPAT_BIT;
> 
> Yeah, it's okay, actually we considered this way, however
> - it's light enough, it did not hurt guest performance based on our
>   benchmark.
> - the logic has always used for noncherent_dma case, extend it to
>   normal case should have low risk and also help us to check the logic.

But noncoherent_dma is not the common case, so it's not necessarily true
that the risk is low.

> - completely follow MTRRS spec would be better than host hides it.

We are a virtualization platform, we know well when MTRRs are necessary.

Tis a risk from blindly obeying the guest MTRRs: userspace can see stale
data if the guest's accesses bypass the cache.  AMD bypasses this by
enabling snooping even in cases that ordinarily wouldn't snoop; for
Intel the solution is that RAM-backed areas should always use IPAT.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Paolo Bonzini



On 04/06/2015 10:26, Xiao Guangrong wrote:
>>
>> CR0.CD is always 0 in both host and guest, i guess it's why we cleared
>> CR0.CD and CR0.NW in vmx_set_cr0().
> 
> It reminds me that we should check guest CR0.CD before check guest MTRR
> and disable guest PAT if guest CR0.CD = 1.

I think it can be done separately, on top.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Xiao Guangrong




On 06/04/2015 04:23 PM, Xiao Guangrong wrote:



On 06/03/2015 03:55 PM, Paolo Bonzini wrote:



On 03/06/2015 04:56, Xiao Guangrong wrote:



On 06/01/2015 05:36 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

Currently guest MTRR is completely prohibited if cache snoop is
supported on
IOMMU (!noncoherent_dma) and host does the emulation based on the
knowledge
from host side, however, host side is not the good point to know
what the purpose of guest is. A good example is that pass-throughed VGA
frame buffer is not always UC as host expected


Can you explain how?  The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.


Yes, frame-buffer is always UC in current code, however, UC for
frame-buffer causes bad performance.


Understood now, thanks.


So that guest will configure the range to MTRR, this patchset follows
guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
emulate guest cache type as guest expects.


Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
coherency.  AMD, has special logic to do this, for example:

- if guest PAT says "UC" and host MTRR says "WB", the processor will not
cache the memory but will snoop the cache as if CR0.CD=1

- if guest PAT says "WC" and host (nested page table) PAT says "WB" and
host MTRR says "WB", the processor will still do write combining but
also snoop the cache as if CR0.CD=1

I am worried that the lack of this feature could cause problems if
guests map QEMU's VGA framebuffer as uncached.  We have this problem on
ARM, so it's not 100% theoretical.


CR0.CD is always 0 in both host and guest, i guess it's why we cleared
CR0.CD and CR0.NW in vmx_set_cr0().


It reminds me that we should check guest CR0.CD before check guest MTRR
and disable guest PAT if guest CR0.CD = 1.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 14/15] KVM: MTRR: do not map huage page for non-consistent range

2015-06-04 Thread Xiao Guangrong




On 06/03/2015 03:55 PM, Paolo Bonzini wrote:



On 03/06/2015 04:56, Xiao Guangrong wrote:



On 06/01/2015 05:36 PM, Paolo Bonzini wrote:



On 30/05/2015 12:59, Xiao Guangrong wrote:

Currently guest MTRR is completely prohibited if cache snoop is
supported on
IOMMU (!noncoherent_dma) and host does the emulation based on the
knowledge
from host side, however, host side is not the good point to know
what the purpose of guest is. A good example is that pass-throughed VGA
frame buffer is not always UC as host expected


Can you explain how?  The original idea was that such a framebuffer
would be kvm_is_reserved_pfn and thus be unconditionally UC.


Yes, frame-buffer is always UC in current code, however, UC for
frame-buffer causes bad performance.


Understood now, thanks.


So that guest will configure the range to MTRR, this patchset follows
guest MTRR and cooperates with guest PAT (ept.VMX_EPT_IPAT_BIT = 0) to
emulate guest cache type as guest expects.


Unlike e.g. CR0.CD=1, UC memory does not snoop the cache to preserve
coherency.  AMD, has special logic to do this, for example:

- if guest PAT says "UC" and host MTRR says "WB", the processor will not
cache the memory but will snoop the cache as if CR0.CD=1

- if guest PAT says "WC" and host (nested page table) PAT says "WB" and
host MTRR says "WB", the processor will still do write combining but
also snoop the cache as if CR0.CD=1

I am worried that the lack of this feature could cause problems if
guests map QEMU's VGA framebuffer as uncached.  We have this problem on
ARM, so it's not 100% theoretical.


CR0.CD is always 0 in both host and guest, i guess it's why we cleared
CR0.CD and CR0.NW in vmx_set_cr0().



So, why do you need to always use IPAT=0?  Can patch 15 keep the current
logic for RAM, like this:

if (is_mmio || kvm_arch_has_noncoherent_dma(vcpu->kvm))
ret = kvm_mtrr_get_guest_memory_type(vcpu, gfn) <<
  VMX_EPT_MT_EPTE_SHIFT;
else
ret = (MTRR_TYPE_WRBACK << VMX_EPT_MT_EPTE_SHIFT)
| VMX_EPT_IPAT_BIT;


Yeah, it's okay, actually we considered this way, however
- it's light enough, it did not hurt guest performance based on our
  benchmark.
- the logic has always used for noncherent_dma case, extend it to
  normal case should have low risk and also help us to check the logic.
- completely follow MTRRS spec would be better than host hides it.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm: x86: fix kvm_apic_has_events to check for NULL pointer

2015-06-04 Thread Paolo Bonzini

Malicious (or egregiously buggy) userspace can trigger it, but it
should never happen in normal operation.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/lapic.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 71b150cae5f9..9d8fcde52027 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -150,7 +150,7 @@ static inline bool kvm_apic_vid_enabled(struct kvm *kvm)
 
 static inline bool kvm_apic_has_events(struct kvm_vcpu *vcpu)
 {
-   return vcpu->arch.apic->pending_events;
+   return kvm_vcpu_has_lapic(vcpu) && vcpu->arch.apic->pending_events;
 }
 
 static inline bool kvm_lowest_prio_delivery(struct kvm_lapic_irq *irq)
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH] kvm: x86: default legacy PCI device assignment support to "n"

2015-06-04 Thread Paolo Bonzini

VFIO has proved itself a much better option than KVM's built-in
device assignment.  It is mature, provides better isolation because
it enforces ACS, and even the userspace code is being tested on
a wider variety of hardware these days than the legacy support.

Disable legacy device assignment by default.

Signed-off-by: Paolo Bonzini 
---
 arch/x86/kvm/Kconfig | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 413a7bf9efbb..a0f06a5947c5 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -88,13 +88,14 @@ config KVM_MMU_AUDIT
 config KVM_DEVICE_ASSIGNMENT
bool "KVM legacy PCI device assignment support"
depends on KVM && PCI && IOMMU_API
-   default y
+   default n
---help---
  Provide support for legacy PCI device assignment through KVM.  The
  kernel now also supports a full featured userspace device driver
- framework through VFIO, which supersedes much of this support.
+ framework through VFIO, which supersedes this support and provides
+ better security.
 
- If unsure, say Y.
+ If unsure, say N.
 
 # OK, it's a little counter-intuitive to do this, but it puts it neatly under
 # the virtualization menu.
-- 
1.8.3.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

59 matches

Mail list logo