[PATCH v3 00/14] KVM/ARM Implementation

2012-10-22 Thread Christoffer Dall
The following series implements KVM support for ARM processors,
specifically on the Cortex A-15 platform.

Work is done in collaboration between Columbia University, Virtual Open
Systems and ARM/Linaro.

The patch series applies to Linux 3.7-rc2 with kvm/next merged:
 git://git.kernel.org/pub/scm/virt/kvm/kvm.git
branch: next (03604b3114)

This is version 13 of the patch series; the first 10 versions were
reviewed on the KVM/ARM and KVM mailing lists.  Changes can also be
pulled from:
git://github.com/virtualopensystems/linux-kvm-arm.git
branch: kvm-arm-v13
branch: kvm-arm-v13-vgic
branch: kvm-arm-v13-vgic-timers

A non-flattened edition of the patch series, which can always be merged,
can be found at:
 git://github.com/virtualopensystems/linux-kvm-arm.git kvm-arm-master

This patch series requires QEMU compatibility.  Use the branch
 git://github.com/virtualopensystems/qemu.git kvm-arm

There are also WIP QEMU patches to support virtio on ARM:
 git://github.com/virtualopensystems/qemu.git kvm-arm-virtio

Following this patch series, which implements core KVM support, are two
other patch series implementing Virtual Generic Interrupt Controller
(VGIC) support and Architected Generic Timers.  All three patch series
should be applied for full QEMU compatibility.

The implementation is broken up into a logical set of patches, the first
are preparatory patches:
  1. ARM: Add page table defines for KVM
  2. ARM: Section based HYP idmaps
  3. ARM: Factor out cpuid implementor and part_number fields

The main implementation is broken up into separate patches, the first
containing a skeleton of files, makefile changes, the basic user space
interface and KVM architecture specific stubs.  Subsequent patches
implement parts of the system as listed:
  4. Skeleton and reset hooks
  5. Hypervisor initialization
  6. Memory virtualization setup (hyp mode mappings and 2nd stage)
  7. Inject IRQs and FIQs from userspace
  8. World-switch implementation and Hyp exception vectors
  9. Emulation framework and coproc emulation
 10. Coproc user space API
 11. Demux multiplexed coproc registers
 12. User space API to get/set VFP registers
 13. Handle guest user memory aborts
 14. Handle guest MMIO aborts

Testing:
 Tested on FAST Models and Versatile Express test-chip2.  Tested by
 running three simultaneous VMs, all running SMP, on an SMP host, each
 VM running hackbench and cyclictest and with extreme memory pressure
 applied to the host with swapping enabled to provoke page eviction.
 Also tested KSM merging and GCC inside VMs.  Fully boots both Ubuntu
 (user space Thumb-2) and Debian (user space ARM) guests.

For a guide on how to set up a testing environment and try out these
patches, see:
 http://www.virtualopensystems.com/media/pdf/kvm-arm-guide.pdf

Changes since v12:
 - Documentation updates
 - Change Hyp-ABI to function call based paradigm
 - Cleanup world-switch code
 - Unify HIFAR/HDFAR on the vcpu struct
 - Simplify vcpu register access in software
 - Enforce use of vcpu field accessors
 - Factor out mmio handling into separate file
 - Check for overlaps in mmio address mappings
 - Bugfix in mmio decoding
 - Complete rework of ARM mmio load/store instruction

Changes since v11:
 - Memory setup and page table defines reworked
 - We do not export unused perf bitfields anymore
 - No module support anymore and following cleanup
 - Hide vcpu register accessors
 - Fix unmap range mmu notifier race condition
 - Factored out A15 coprocs in separate file
 - Factored out world-switch assembly macros to separate file
 - Add demux of multiplexed coprocs to user space
 - Add VFP get/set interface to user space
 - Addressed various cleanup comments from reviewers

Changes since v10:
 - Boot in Hyp mode and use HVC to initialize HVBAR
 - Support VGIC
 - Support Arch timers
 - Support Thumb-2 mmio instruction decoding
 - Transition to GET_ONE/SET_ONE register API
 - Added KVM_VCPU_GET_REG_LIST
 - New interrupt injection API
 - Don't pin guest pages anymore
 - Fix race condition in page fault handler
 - Cleanup guest instruction copying.
 - Fix race when copying SMP guest instructions
 - Inject data/prefetch aborts when guest does something strange

Changes since v9:
 - Addressed reviewer comments (see mailing list archive)
 - Limit the use of .arch_extension sec/virt to compilers that need them
 - VFP/Neon Support (Antonios Motakis)
 - Run exit handling under preemption and still handle guest cache ops
 - Add support for IO mapping at Hyp level (VGIC prep)
 - Add support for IO mapping at Guest level (VGIC prep)
 - Remove backdoor call to irq_svc
 - Complete rework of CP15 handling and register reset (Rusty Russell)
 - Don't use HSTR for anything else than CR 15
 - New ioctl to set emulation target core (only A15 supported for now)
 - Support KVM_GET_MSRS / KVM_SET_MSRS
 - Add page accounting and page table eviction
 - Change pgd lock to spinlock and fix sleeping in atomic bugs
 - Check 

[PATCH v3 01/14] ARM: Add page table and page defines needed by KVM

2012-10-22 Thread Christoffer Dall
KVM uses the stage-2 page tables and the Hyp page table format,
so we define the fields and page protection flags needed by KVM.

The nomenclature is this:
 - page_hyp:PL2 code/data mappings
 - page_hyp_device: PL2 device mappings (vgic access)
 - page_s2: Stage-2 code/data page mappings
 - page_s2_device:  Stage-2 device mappings (vgic access)

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/pgtable-3level.h |   18 ++
 arch/arm/include/asm/pgtable.h|7 +++
 arch/arm/mm/mmu.c |   25 +
 3 files changed, 50 insertions(+)

diff --git a/arch/arm/include/asm/pgtable-3level.h 
b/arch/arm/include/asm/pgtable-3level.h
index b249035..eaba5a4 100644
--- a/arch/arm/include/asm/pgtable-3level.h
+++ b/arch/arm/include/asm/pgtable-3level.h
@@ -102,11 +102,29 @@
  */
 #define L_PGD_SWAPPER		(_AT(pgdval_t, 1) << 55)	/* swapper_pg_dir entry */
 
+/*
+ * 2nd stage PTE definitions for LPAE.
+ */
+#define L_PTE_S2_MT_UNCACHED	(_AT(pteval_t, 0x5) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRITETHROUGH (_AT(pteval_t, 0xa) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_MT_WRITEBACK	(_AT(pteval_t, 0xf) << 2) /* MemAttr[3:0] */
+#define L_PTE_S2_RDONLY		(_AT(pteval_t, 1) << 6)   /* HAP[1] */
+#define L_PTE_S2_RDWR		(_AT(pteval_t, 2) << 6)   /* HAP[2:1] */
+
+/*
+ * Hyp-mode PL2 PTE definitions for LPAE.
+ */
+#define L_PTE_HYP  L_PTE_USER
+
 #ifndef __ASSEMBLY__
 
 #define pud_none(pud)		(!pud_val(pud))
 #define pud_bad(pud)		(!(pud_val(pud) & 2))
 #define pud_present(pud)	(pud_val(pud))
+#define pmd_table(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+				 PMD_TYPE_TABLE)
+#define pmd_sect(pmd)		((pmd_val(pmd) & PMD_TYPE_MASK) == \
+				 PMD_TYPE_SECT)
 
 #define pud_clear(pudp)\
do {\
diff --git a/arch/arm/include/asm/pgtable.h b/arch/arm/include/asm/pgtable.h
index 08c1231..dfb3918 100644
--- a/arch/arm/include/asm/pgtable.h
+++ b/arch/arm/include/asm/pgtable.h
@@ -70,6 +70,9 @@ extern void __pgd_error(const char *file, int line, pgd_t);
 
 extern pgprot_t		pgprot_user;
 extern pgprot_t		pgprot_kernel;
+extern pgprot_t		pgprot_hyp_device;
+extern pgprot_t		pgprot_s2;
+extern pgprot_t		pgprot_s2_device;
 
 #define _MOD_PROT(p, b)__pgprot(pgprot_val(p) | (b))
 
@@ -82,6 +85,10 @@ extern pgprot_t	pgprot_kernel;
 #define PAGE_READONLY_EXEC	_MOD_PROT(pgprot_user, L_PTE_USER | L_PTE_RDONLY)
 #define PAGE_KERNEL		_MOD_PROT(pgprot_kernel, L_PTE_XN)
 #define PAGE_KERNEL_EXEC	pgprot_kernel
+#define PAGE_HYP		_MOD_PROT(pgprot_kernel, L_PTE_HYP)
+#define PAGE_HYP_DEVICE	_MOD_PROT(pgprot_hyp_device, L_PTE_HYP)
+#define PAGE_S2		_MOD_PROT(pgprot_s2, L_PTE_S2_RDONLY)
+#define PAGE_S2_DEVICE	_MOD_PROT(pgprot_s2_device, L_PTE_USER | L_PTE_S2_RDONLY)
 
 #define __PAGE_NONE		__pgprot(_L_PTE_DEFAULT | L_PTE_RDONLY | L_PTE_XN)
 #define __PAGE_SHARED	__pgprot(_L_PTE_DEFAULT | L_PTE_USER | L_PTE_XN)
diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index 941dfb9..087d949 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -57,43 +57,61 @@ static unsigned int cachepolicy __initdata = 
CPOLICY_WRITEBACK;
 static unsigned int ecc_mask __initdata = 0;
 pgprot_t pgprot_user;
 pgprot_t pgprot_kernel;
+pgprot_t pgprot_hyp_device;
+pgprot_t pgprot_s2;
+pgprot_t pgprot_s2_device;
 
 EXPORT_SYMBOL(pgprot_user);
 EXPORT_SYMBOL(pgprot_kernel);
+EXPORT_SYMBOL(pgprot_hyp_device);
+EXPORT_SYMBOL(pgprot_s2);
+EXPORT_SYMBOL(pgprot_s2_device);
 
 struct cachepolicy {
const char  policy[16];
unsigned intcr_mask;
pmdval_tpmd;
pteval_tpte;
+   pteval_tpte_s2;
 };
 
+#ifdef CONFIG_ARM_LPAE
+#define s2_policy(policy)  policy
+#else
+#define s2_policy(policy)  0
+#endif
+
 static struct cachepolicy cache_policies[] __initdata = {
 	{
 		.policy		= "uncached",
 		.cr_mask	= CR_W|CR_C,
 		.pmd		= PMD_SECT_UNCACHED,
 		.pte		= L_PTE_MT_UNCACHED,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_UNCACHED),
 	}, {
 		.policy		= "buffered",
 		.cr_mask	= CR_C,
 		.pmd		= PMD_SECT_BUFFERED,
 		.pte		= L_PTE_MT_BUFFERABLE,
+		.pte_s2		= s2_policy(L_PTE_S2_MT_UNCACHED),
 	}, {
 		.policy		= "writethrough",
 		.cr_mask	= 0,
 		.pmd		= PMD_SECT_WT,

[PATCH v3 02/14] ARM: Section based HYP idmap

2012-10-22 Thread Christoffer Dall
Add a method (hyp_idmap_setup) to populate a hyp pgd with an
identity mapping of the code contained in the .hyp.idmap.text
section.

Offer a method to drop this identity mapping through
hyp_idmap_teardown.

Make all the above depend on CONFIG_ARM_VIRT_EXT and CONFIG_ARM_LPAE.

Cc: Will Deacon will.dea...@arm.com
Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/idmap.h|5 ++
 arch/arm/include/asm/pgtable-3level-hwdef.h |1 
 arch/arm/kernel/vmlinux.lds.S   |6 ++
 arch/arm/mm/idmap.c |   74 +++
 4 files changed, 73 insertions(+), 13 deletions(-)

diff --git a/arch/arm/include/asm/idmap.h b/arch/arm/include/asm/idmap.h
index bf863ed..36708ba 100644
--- a/arch/arm/include/asm/idmap.h
+++ b/arch/arm/include/asm/idmap.h
@@ -11,4 +11,9 @@ extern pgd_t *idmap_pgd;
 
 void setup_mm_for_reboot(void);
 
+#ifdef CONFIG_ARM_VIRT_EXT
+void hyp_idmap_teardown(pgd_t *hyp_pgd);
+void hyp_idmap_setup(pgd_t *hyp_pgd);
+#endif
+
 #endif /* __ASM_IDMAP_H */
diff --git a/arch/arm/include/asm/pgtable-3level-hwdef.h 
b/arch/arm/include/asm/pgtable-3level-hwdef.h
index d795282..a2d404e 100644
--- a/arch/arm/include/asm/pgtable-3level-hwdef.h
+++ b/arch/arm/include/asm/pgtable-3level-hwdef.h
@@ -44,6 +44,7 @@
 #define PMD_SECT_XN		(_AT(pmdval_t, 1) << 54)
 #define PMD_SECT_AP_WRITE	(_AT(pmdval_t, 0))
 #define PMD_SECT_AP_READ	(_AT(pmdval_t, 0))
+#define PMD_SECT_AP1		(_AT(pmdval_t, 1) << 6)
 #define PMD_SECT_TEX(x)	(_AT(pmdval_t, 0))
 
 /*
diff --git a/arch/arm/kernel/vmlinux.lds.S b/arch/arm/kernel/vmlinux.lds.S
index 36ff15b..12fd2eb 100644
--- a/arch/arm/kernel/vmlinux.lds.S
+++ b/arch/arm/kernel/vmlinux.lds.S
@@ -19,7 +19,11 @@
ALIGN_FUNCTION();   \
VMLINUX_SYMBOL(__idmap_text_start) = .; \
*(.idmap.text)  \
-   VMLINUX_SYMBOL(__idmap_text_end) = .;
+   VMLINUX_SYMBOL(__idmap_text_end) = .;   \
+   ALIGN_FUNCTION();   \
+   VMLINUX_SYMBOL(__hyp_idmap_text_start) = .; \
+   *(.hyp.idmap.text)  \
+   VMLINUX_SYMBOL(__hyp_idmap_text_end) = .;
 
 #ifdef CONFIG_HOTPLUG_CPU
 #define ARM_CPU_DISCARD(x)
diff --git a/arch/arm/mm/idmap.c b/arch/arm/mm/idmap.c
index ab88ed4..ea7430e 100644
--- a/arch/arm/mm/idmap.c
+++ b/arch/arm/mm/idmap.c
@@ -1,4 +1,6 @@
+#include <linux/module.h>
 #include <linux/kernel.h>
+#include <linux/slab.h>
 
 #include <asm/cputype.h>
 #include <asm/idmap.h>
@@ -6,6 +8,7 @@
 #include <asm/pgtable.h>
 #include <asm/sections.h>
 #include <asm/system_info.h>
+#include <asm/virt.h>
 
 pgd_t *idmap_pgd;
 
@@ -59,11 +62,20 @@ static void idmap_add_pud(pgd_t *pgd, unsigned long addr, 
unsigned long end,
} while (pud++, addr = next, addr != end);
 }
 
-static void identity_mapping_add(pgd_t *pgd, unsigned long addr, unsigned long end)
+static void identity_mapping_add(pgd_t *pgd, const char *text_start,
+				 const char *text_end, unsigned long prot)
 {
-   unsigned long prot, next;
+   unsigned long addr, end;
+   unsigned long next;
+
+   addr = virt_to_phys(text_start);
+   end = virt_to_phys(text_end);
+
+	pr_info("Setting up static %sidentity map for 0x%llx - 0x%llx\n",
+		prot ? "HYP " : "",
+		(long long)addr, (long long)end);
+	prot |= PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 
-	prot = PMD_TYPE_SECT | PMD_SECT_AP_WRITE | PMD_SECT_AF;
 	if (cpu_architecture() >= CPU_ARCH_ARMv5TEJ && !cpu_is_xscale())
 		prot |= PMD_BIT4;
 
@@ -78,24 +90,62 @@ extern char  __idmap_text_start[], __idmap_text_end[];
 
 static int __init init_static_idmap(void)
 {
-   phys_addr_t idmap_start, idmap_end;
-
idmap_pgd = pgd_alloc(init_mm);
if (!idmap_pgd)
return -ENOMEM;
 
-   /* Add an identity mapping for the physical address of the section. */
-   idmap_start = virt_to_phys((void *)__idmap_text_start);
-   idmap_end = virt_to_phys((void *)__idmap_text_end);
-
-	pr_info("Setting up static identity map for 0x%llx - 0x%llx\n",
-		(long long)idmap_start, (long long)idmap_end);
-	identity_mapping_add(idmap_pgd, idmap_start, idmap_end);
+   identity_mapping_add(idmap_pgd, __idmap_text_start,
+__idmap_text_end, 0);
 
return 0;
 }
 early_initcall(init_static_idmap);
 
+#if defined(CONFIG_ARM_VIRT_EXT)  defined(CONFIG_ARM_LPAE)
+static void hyp_idmap_del_pmd(pgd_t *pgd, unsigned long addr)
+{
+   pud_t *pud;
+   pmd_t *pmd;
+
+   pud = pud_offset(pgd, addr);
+   

[PATCH v3 03/14] ARM: Factor out cpuid implementor and part number

2012-10-22 Thread Christoffer Dall
Decoding the implementor and part number of the CPU id in the CPU ID
register is needed by KVM, so we factor it out to share the code.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/cputype.h   |   26 ++
 arch/arm/kernel/perf_event_cpu.c |   30 +++---
 2 files changed, 41 insertions(+), 15 deletions(-)

diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
index cb47d28..306fb2c 100644
--- a/arch/arm/include/asm/cputype.h
+++ b/arch/arm/include/asm/cputype.h
@@ -51,6 +51,22 @@ extern unsigned int processor_id;
 #define read_cpuid_ext(reg) 0
 #endif
 
+#define IMPLEMENTOR_ARM0x41
+#define IMPLEMENTOR_INTEL  0x69
+
+#define PART_NUMBER_ARM11360xB360
+#define PART_NUMBER_ARM11560xB560
+#define PART_NUMBER_ARM11760xB760
+#define PART_NUMBER_ARM11MPCORE0xB020
+#define PART_NUMBER_CORTEX_A8  0xC080
+#define PART_NUMBER_CORTEX_A9  0xC090
+#define PART_NUMBER_CORTEX_A5  0xC050
+#define PART_NUMBER_CORTEX_A15 0xC0F0
+#define PART_NUMBER_CORTEX_A7  0xC070
+
+#define PART_NUMBER_XSCALE10x1
+#define PART_NUMBER_XSCALE20x2
+
 /*
  * The CPU ID never changes at run time, so we might as well tell the
  * compiler that it's constant.  Use this function to read the CPU ID
@@ -61,6 +77,16 @@ static inline unsigned int __attribute_const__ 
read_cpuid_id(void)
return read_cpuid(CPUID_ID);
 }
 
+static inline unsigned int __attribute_const__ read_cpuid_implementor(void)
+{
+	return (read_cpuid_id() & 0xFF000000) >> 24;
+}
+
+static inline unsigned int __attribute_const__ read_cpuid_part_number(void)
+{
+	return (read_cpuid_id() & 0xFFF0);
+}
+
 static inline unsigned int __attribute_const__ read_cpuid_cachetype(void)
 {
return read_cpuid(CPUID_CACHETYPE);
diff --git a/arch/arm/kernel/perf_event_cpu.c b/arch/arm/kernel/perf_event_cpu.c
index 8d7d8d4..ff18566 100644
--- a/arch/arm/kernel/perf_event_cpu.c
+++ b/arch/arm/kernel/perf_event_cpu.c
@@ -200,46 +200,46 @@ static struct arm_pmu *__devinit probe_current_pmu(void)
struct arm_pmu *pmu = NULL;
int cpu = get_cpu();
unsigned long cpuid = read_cpuid_id();
-	unsigned long implementor = (cpuid & 0xFF000000) >> 24;
-	unsigned long part_number = (cpuid & 0xFFF0);
+   unsigned long implementor = read_cpuid_implementor();
+   unsigned long part_number = read_cpuid_part_number();
 
 	pr_info("probing PMU on CPU %d\n", cpu);
 
/* ARM Ltd CPUs. */
-   if (0x41 == implementor) {
+   if (implementor == IMPLEMENTOR_ARM) {
switch (part_number) {
-   case 0xB360:/* ARM1136 */
-   case 0xB560:/* ARM1156 */
-   case 0xB760:/* ARM1176 */
+   case PART_NUMBER_ARM1136:
+   case PART_NUMBER_ARM1156:
+   case PART_NUMBER_ARM1176:
pmu = armv6pmu_init();
break;
-   case 0xB020:/* ARM11mpcore */
+   case PART_NUMBER_ARM11MPCORE:
pmu = armv6mpcore_pmu_init();
break;
-   case 0xC080:/* Cortex-A8 */
+   case PART_NUMBER_CORTEX_A8:
pmu = armv7_a8_pmu_init();
break;
-   case 0xC090:/* Cortex-A9 */
+   case PART_NUMBER_CORTEX_A9:
pmu = armv7_a9_pmu_init();
break;
-   case 0xC050:/* Cortex-A5 */
+   case PART_NUMBER_CORTEX_A5:
pmu = armv7_a5_pmu_init();
break;
-   case 0xC0F0:/* Cortex-A15 */
+   case PART_NUMBER_CORTEX_A15:
pmu = armv7_a15_pmu_init();
break;
-   case 0xC070:/* Cortex-A7 */
+   case PART_NUMBER_CORTEX_A7:
pmu = armv7_a7_pmu_init();
break;
}
/* Intel CPUs [xscale]. */
-   } else if (0x69 == implementor) {
+   } else if (implementor == IMPLEMENTOR_INTEL) {
 		part_number = (cpuid >> 13) & 0x7;
switch (part_number) {
-   case 1:
+   case PART_NUMBER_XSCALE1:
pmu = xscale1pmu_init();
break;
-   case 2:
+   case PART_NUMBER_XSCALE2:
pmu = xscale2pmu_init();
break;
}



[PATCH v3 04/14] KVM: ARM: Initial skeleton to compile KVM support

2012-10-22 Thread Christoffer Dall
Targets KVM support for Cortex A-15 processors.

Contains all the framework components, make files, header files, some
tracing functionality, and basic user space API.

Only supported core is Cortex-A15 for now.

Most functionality is in arch/arm/kvm/* or arch/arm/include/asm/kvm_*.h.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt  |   58 ++
 arch/arm/Kconfig   |2 
 arch/arm/Makefile  |1 
 arch/arm/include/asm/kvm_arm.h |   24 +++
 arch/arm/include/asm/kvm_asm.h |   58 ++
 arch/arm/include/asm/kvm_coproc.h  |   24 +++
 arch/arm/include/asm/kvm_emulate.h |   50 +
 arch/arm/include/asm/kvm_host.h|  120 +
 arch/arm/include/uapi/asm/kvm.h|   83 +
 arch/arm/kvm/Kconfig   |   44 +
 arch/arm/kvm/Makefile  |   21 ++
 arch/arm/kvm/arm.c |  343 
 arch/arm/kvm/coproc.c  |   22 ++
 arch/arm/kvm/emulate.c |  151 
 arch/arm/kvm/guest.c   |  221 +++
 arch/arm/kvm/init.S|   19 ++
 arch/arm/kvm/interrupts.S  |   19 ++
 arch/arm/kvm/mmu.c |   17 ++
 arch/arm/kvm/reset.c   |   74 
 arch/arm/kvm/trace.h   |   52 +
 include/uapi/linux/kvm.h   |7 +
 21 files changed, 1406 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_arm.h
 create mode 100644 arch/arm/include/asm/kvm_asm.h
 create mode 100644 arch/arm/include/asm/kvm_coproc.h
 create mode 100644 arch/arm/include/asm/kvm_emulate.h
 create mode 100644 arch/arm/include/asm/kvm_host.h
 create mode 100644 arch/arm/include/uapi/asm/kvm.h
 create mode 100644 arch/arm/kvm/Kconfig
 create mode 100644 arch/arm/kvm/Makefile
 create mode 100644 arch/arm/kvm/arm.c
 create mode 100644 arch/arm/kvm/coproc.c
 create mode 100644 arch/arm/kvm/emulate.c
 create mode 100644 arch/arm/kvm/guest.c
 create mode 100644 arch/arm/kvm/init.S
 create mode 100644 arch/arm/kvm/interrupts.S
 create mode 100644 arch/arm/kvm/mmu.c
 create mode 100644 arch/arm/kvm/reset.c
 create mode 100644 arch/arm/kvm/trace.h

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 4258180..b56bbd5 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -293,7 +293,7 @@ kvm_run' (see below).
 4.11 KVM_GET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (out)
 Returns: 0 on success, -1 on error
@@ -314,7 +314,7 @@ struct kvm_regs {
 4.12 KVM_SET_REGS
 
 Capability: basic
-Architectures: all
+Architectures: all except ARM
 Type: vcpu ioctl
 Parameters: struct kvm_regs (in)
 Returns: 0 on success, -1 on error
@@ -600,7 +600,7 @@ struct kvm_fpu {
 4.24 KVM_CREATE_IRQCHIP
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, ARM
 Type: vm ioctl
 Parameters: none
 Returns: 0 on success, -1 on error
@@ -608,7 +608,8 @@ Returns: 0 on success, -1 on error
 Creates an interrupt controller model in the kernel.  On x86, creates a virtual
 ioapic, a virtual PIC (two PICs, nested), and sets up future vcpus to have a
 local APIC.  IRQ routing for GSIs 0-15 is set to both PIC and IOAPIC; GSI 16-23
-only go to the IOAPIC.  On ia64, a IOSAPIC is created.
+only go to the IOAPIC.  On ia64, a IOSAPIC is created. On ARM, a GIC is
+created.
 
 
 4.25 KVM_IRQ_LINE
@@ -1774,6 +1775,14 @@ registers, find a list below:
   PPC   | KVM_REG_PPC_VPA_SLB   | 128
   PPC   | KVM_REG_PPC_VPA_DTL   | 128
 
+ARM registers are mapped using the lower 32 bits.  The upper 16 of that
+is the register group type, or coprocessor number:
+
+ARM core registers have the following id bit patterns:
+  0x4002 0000 0010 <index into the kvm_regs struct:16>
+
+
+
 4.69 KVM_GET_ONE_REG
 
 Capability: KVM_CAP_ONE_REG
@@ -1791,6 +1800,7 @@ The list of registers accessible using this interface is 
identical to the
 list in 4.68.
 
 
+
 4.70 KVM_KVMCLOCK_CTRL
 
 Capability: KVM_CAP_KVMCLOCK_CTRL
@@ -2072,6 +2082,46 @@ KVM_S390_INT_EXTERNAL_CALL (vcpu) - sigp external call; 
source cpu in parm
 Note that the vcpu ioctl is asynchronous to vcpu execution.
 
 
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+  EINVAL:    the target is unknown, or the combination of features is invalid.
+  ENOENT:    a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have.  This will cause a reset of the cpu
+registers to their initial values.  If this is not called, KVM_RUN 

[PATCH v3 05/14] KVM: ARM: Hypervisor initialization

2012-10-22 Thread Christoffer Dall
Sets up KVM code to handle all exceptions taken to Hyp mode.

When the kernel is booted in Hyp mode, calling hvc #0xff with r0 pointing to
the new vectors changes the HVBAR to point to those vectors.  This allows
subsystems (like KVM here) to execute code in Hyp-mode with the MMU disabled.

We initialize other Hyp-mode registers and enable the MMU for Hyp-mode from
the id-mapped hyp initialization code. Afterwards, the HVBAR is changed to
point to KVM Hyp vectors used to catch guest faults and to switch to Hyp mode
to perform a world-switch into a KVM guest.

If the KVM module is unloaded we call hvc #0xff once more to disable the MMU
in Hyp mode again and install a vector handler to change the HVBAR for a
subsequent reload of KVM or another hypervisor.

Also provides memory mapping code to map required code pages, data structures,
and I/O regions  accessed in Hyp mode at the same virtual address as the host
kernel virtual addresses, but which conforms to the architectural requirements
for translations in Hyp mode. This interface is added in arch/arm/kvm/arm_mmu.c
and comprises:
 - create_hyp_mappings(from, to);
 - create_hyp_io_mappings(from, to, phys_addr);
 - free_hyp_pmds();

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h  |  107 ++
 arch/arm/include/asm/kvm_asm.h  |   20 +++
 arch/arm/include/asm/kvm_mmu.h  |   39 +
 arch/arm/include/asm/pgtable-3level-hwdef.h |4 +
 arch/arm/kvm/arm.c  |  172 ++
 arch/arm/kvm/init.S |  107 ++
 arch/arm/kvm/interrupts.S   |   48 ++
 arch/arm/kvm/mmu.c  |  210 +++
 mm/memory.c |2 
 9 files changed, 709 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_mmu.h

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index c196a22..f6e8f6f 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -21,4 +21,111 @@
 
 #include asm/types.h
 
+/* Hyp Configuration Register (HCR) bits */
+#define HCR_TGE		(1 << 27)
+#define HCR_TVM		(1 << 26)
+#define HCR_TTLB	(1 << 25)
+#define HCR_TPU		(1 << 24)
+#define HCR_TPC		(1 << 23)
+#define HCR_TSW		(1 << 22)
+#define HCR_TAC		(1 << 21)
+#define HCR_TIDCP	(1 << 20)
+#define HCR_TSC		(1 << 19)
+#define HCR_TID3	(1 << 18)
+#define HCR_TID2	(1 << 17)
+#define HCR_TID1	(1 << 16)
+#define HCR_TID0	(1 << 15)
+#define HCR_TWE		(1 << 14)
+#define HCR_TWI		(1 << 13)
+#define HCR_DC		(1 << 12)
+#define HCR_BSU		(3 << 10)
+#define HCR_BSU_IS	(1 << 10)
+#define HCR_FB		(1 << 9)
+#define HCR_VA		(1 << 8)
+#define HCR_VI		(1 << 7)
+#define HCR_VF		(1 << 6)
+#define HCR_AMO		(1 << 5)
+#define HCR_IMO		(1 << 4)
+#define HCR_FMO		(1 << 3)
+#define HCR_PTW		(1 << 2)
+#define HCR_SWIO	(1 << 1)
+#define HCR_VM		1
+
+/*
+ * The bits we set in HCR:
+ * TAC:Trap ACTLR
+ * TSC:Trap SMC
+ * TSW:Trap cache operations by set/way
+ * TWI:Trap WFI
+ * TIDCP:  Trap L2CTLR/L2ECTLR
+ * BSU_IS: Upgrade barriers to the inner shareable domain
+ * FB: Force broadcast of all maintenance operations
+ * AMO:Override CPSR.A and enable signaling with VA
+ * IMO:Override CPSR.I and enable signaling with VI
+ * FMO:Override CPSR.F and enable signaling with VF
+ * SWIO:   Turn set/way invalidates into set/way clean+invalidate
+ */
+#define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
+   HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
+   HCR_SWIO | HCR_TIDCP)
+
+/* Hyp System Control Register (HSCTLR) bits */
+#define HSCTLR_TE	(1 << 30)
+#define HSCTLR_EE	(1 << 25)
+#define HSCTLR_FI	(1 << 21)
+#define HSCTLR_WXN	(1 << 19)
+#define HSCTLR_I	(1 << 12)
+#define HSCTLR_C	(1 << 2)
+#define HSCTLR_A	(1 << 1)
+#define HSCTLR_M	1
+#define HSCTLR_MASK	(HSCTLR_M | HSCTLR_A | HSCTLR_C | HSCTLR_I | \
+			 HSCTLR_WXN | HSCTLR_FI | HSCTLR_EE | HSCTLR_TE)
+
+/* TTBCR and HTCR Registers bits */
+#define TTBCR_EAE	(1 << 31)
+#define TTBCR_IMP	(1 << 30)
+#define TTBCR_SH1	(3 << 28)
+#define TTBCR_ORGN1	(3 << 26)
+#define TTBCR_IRGN1	(3 << 24)
+#define TTBCR_EPD1	(1 << 23)
+#define TTBCR_A1	(1 << 22)
+#define TTBCR_T1SZ	(3 << 16)
+#define TTBCR_SH0	(3 << 12)
+#define TTBCR_ORGN0	(3 << 10)
+#define TTBCR_IRGN0	(3 << 8)
+#define TTBCR_EPD0	(1 << 7)

[PATCH v3 06/14] KVM: ARM: Memory virtualization setup

2012-10-22 Thread Christoffer Dall
This commit introduces the framework for guest memory management
through the use of 2nd stage translation. Each VM has a pointer
to a level-1 table (the pgd field in struct kvm_arch) which is
used for the 2nd stage translations. Entries are added when handling
guest faults (later patch) and the table itself can be allocated and
freed through the following functions implemented in
arch/arm/kvm/arm_mmu.c:
 - kvm_alloc_stage2_pgd(struct kvm *kvm);
 - kvm_free_stage2_pgd(struct kvm *kvm);

Each entry in the TLBs and caches is tagged with a VMID identifier in
addition to the ASID.  The VMIDs are assigned consecutively to VMs in the
order that VMs are executed, and caches and TLBs are invalidated when
the VMID space has been exhausted, allowing for more than 255 simultaneously
running guests.

The 2nd stage pgd is allocated in kvm_arch_init_vm(). The table is
freed in kvm_arch_destroy_vm(). Both functions are called from the main
KVM code.

We pre-allocate page table memory to be able to synchronize using a
spinlock and be called under rcu_read_lock from the MMU notifiers.  We
steal the mmu_memory_cache implementation from x86 and adapt for our
specific usage.

We support MMU notifiers (thanks to Marc Zyngier) through
kvm_unmap_hva and kvm_set_spte_hva.

Finally, define kvm_phys_addr_ioremap() to map a device at a guest IPA,
which is used by VGIC support to map the virtual CPU interface registers
to the guest. This support is added by Marc Zyngier.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_asm.h  |2 
 arch/arm/include/asm/kvm_host.h |   19 ++
 arch/arm/include/asm/kvm_mmu.h  |9 +
 arch/arm/kvm/Kconfig|1 
 arch/arm/kvm/arm.c  |   37 
 arch/arm/kvm/interrupts.S   |   10 +
 arch/arm/kvm/mmu.c  |  393 +++
 arch/arm/kvm/trace.h|   46 +
 8 files changed, 515 insertions(+), 2 deletions(-)

diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 954bf7c..47a0e57 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -57,6 +57,7 @@
 #define ARM_EXCEPTION_HVC7
 
 #ifndef __ASSEMBLY__
+struct kvm;
 struct kvm_vcpu;
 
 extern char __kvm_hyp_init[];
@@ -71,6 +72,7 @@ extern char __kvm_hyp_code_start[];
 extern char __kvm_hyp_code_end[];
 
 extern void __kvm_flush_vm_context(void);
+extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
 #endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 15d4c0b..68d1005 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -117,4 +117,23 @@ int kvm_arm_copy_reg_indices(struct kvm_vcpu *vcpu, u64 
__user *indices);
 struct kvm_one_reg;
 int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
+u64 kvm_call_hyp(void *hypfn, ...);
+
+#define KVM_ARCH_WANT_MMU_NOTIFIER
+struct kvm;
+int kvm_unmap_hva(struct kvm *kvm, unsigned long hva);
+int kvm_unmap_hva_range(struct kvm *kvm,
+   unsigned long start, unsigned long end);
+void kvm_set_spte_hva(struct kvm *kvm, unsigned long hva, pte_t pte);
+
+/* We do not have shadow page tables, hence the empty hooks */
+static inline int kvm_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   return 0;
+}
+
+static inline int kvm_test_age_hva(struct kvm *kvm, unsigned long hva)
+{
+   return 0;
+}
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 741ab8f..9bd0508 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -33,6 +33,15 @@ int create_hyp_mappings(void *from, void *to);
 int create_hyp_io_mappings(void *from, void *to, phys_addr_t);
 void free_hyp_pmds(void);
 
+int kvm_alloc_stage2_pgd(struct kvm *kvm);
+void kvm_free_stage2_pgd(struct kvm *kvm);
+int kvm_phys_addr_ioremap(struct kvm *kvm, phys_addr_t guest_ipa,
+ phys_addr_t pa, unsigned long size);
+
+int kvm_handle_guest_abort(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+void kvm_mmu_free_memory_caches(struct kvm_vcpu *vcpu);
+
 unsigned long kvm_mmu_get_httbr(void);
 int kvm_mmu_init(void);
 void kvm_mmu_exit(void);
diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index a07ddcc..47c5500 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -36,6 +36,7 @@ config KVM_ARM_HOST
depends on KVM
depends on MMU
 	depends on CPU_V7 && ARM_VIRT_EXT
+   select  MMU_NOTIFIER
---help---
  Provides host support for ARM processors.
 
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 8e1ea2b..5ac3132 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -81,12 +81,33 @@ void 

[PATCH v3 07/14] KVM: ARM: Inject IRQs and FIQs from userspace

2012-10-22 Thread Christoffer Dall
From: Christoffer Dall cd...@cs.columbia.edu

All interrupt injection is now based on the VM ioctl KVM_IRQ_LINE.  This
works semantically well for the GIC as we in fact raise/lower a line on
a machine component (the gic).  The IOCTL uses the following struct.

struct kvm_irq_level {
union {
__u32 irq; /* GSI */
__s32 status;  /* not used for KVM_IRQ_LEVEL */
};
__u32 level;   /* 0 or 1 */
};

ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
specific cpus.  The irq field is interpreted like this:

  bits:  | 31 ... 24 | 23  ... 16 | 15...0 |
  field: | irq_type  | vcpu_index |   irq_number   |

The irq_type field has the following values:
- irq_type[0]: out-of-kernel GIC: irq_number 0 is IRQ, irq_number 1 is FIQ
- irq_type[1]: in-kernel GIC: SPI, irq_number between 32 and 1019 (incl.)
   (the vcpu_index field is ignored)
- irq_type[2]: in-kernel GIC: PPI, irq_number between 16 and 31 (incl.)

The irq_number thus corresponds to the IRQ ID as defined in the GICv2 specs.

This is documented in Documentation/kvm/api.txt.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   25 +++--
 arch/arm/include/asm/kvm_arm.h|1 +
 arch/arm/include/uapi/asm/kvm.h   |   21 +++
 arch/arm/kvm/arm.c|   70 +
 arch/arm/kvm/trace.h  |   25 +
 include/uapi/linux/kvm.h  |1 +
 6 files changed, 139 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index b56bbd5..4514292 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -615,15 +615,32 @@ created.
 4.25 KVM_IRQ_LINE
 
 Capability: KVM_CAP_IRQCHIP
-Architectures: x86, ia64
+Architectures: x86, ia64, arm
 Type: vm ioctl
 Parameters: struct kvm_irq_level
 Returns: 0 on success, -1 on error
 
 Sets the level of a GSI input to the interrupt controller model in the kernel.
-Requires that an interrupt controller model has been previously created with
-KVM_CREATE_IRQCHIP.  Note that edge-triggered interrupts require the level
-to be set to 1 and then back to 0.
+On some architectures it is required that an interrupt controller model has
+been previously created with KVM_CREATE_IRQCHIP.  Note that edge-triggered
+interrupts require the level to be set to 1 and then back to 0.
+
+ARM can signal an interrupt either at the CPU level, or at the in-kernel irqchip
+(GIC), and for in-kernel irqchip can tell the GIC to use PPIs designated for
+specific cpus.  The irq field is interpreted like this:
+
+  bits:  | 31 ... 24 | 23  ... 16 | 15...0 |
+  field: | irq_type  | vcpu_index | irq_id |
+
+The irq_type field has the following values:
+- irq_type[0]: out-of-kernel GIC: irq_id 0 is IRQ, irq_id 1 is FIQ
+- irq_type[1]: in-kernel GIC: SPI, irq_id between 32 and 1019 (incl.)
+   (the vcpu_index field is ignored)
+- irq_type[2]: in-kernel GIC: PPI, irq_id between 16 and 31 (incl.)
+
+(The irq_id field thus corresponds nicely to the IRQ ID in the ARM GIC specs)
+
+In both cases, level is used to raise/lower the line.
 
 struct kvm_irq_level {
union {
diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index f6e8f6f..4f54cda 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -68,6 +68,7 @@
 #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
HCR_SWIO | HCR_TIDCP)
+#define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
 /* Hyp System Control Register (HSCTLR) bits */
#define HSCTLR_TE  (1 << 30)
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index bfc2123..138a588c 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -22,6 +22,7 @@
#include <asm/types.h>
 
 #define __KVM_HAVE_GUEST_DEBUG
+#define __KVM_HAVE_IRQ_LINE
 
 #define KVM_REG_SIZE(id)   \
(1U << (((id) & KVM_REG_SIZE_MASK) >> KVM_REG_SIZE_SHIFT))
@@ -80,4 +81,24 @@ struct kvm_arch_memory_slot {
#define KVM_REG_ARM_CORE   (0x0010 << KVM_REG_ARM_COPROC_SHIFT)
 #define KVM_REG_ARM_CORE_REG(name) (offsetof(struct kvm_regs, name) / 4)
 
+/* KVM_IRQ_LINE irq field index values */
+#define KVM_ARM_IRQ_TYPE_SHIFT 24
+#define KVM_ARM_IRQ_TYPE_MASK  0xff
+#define KVM_ARM_IRQ_VCPU_SHIFT 16
+#define KVM_ARM_IRQ_VCPU_MASK  0xff
+#define KVM_ARM_IRQ_NUM_SHIFT  0
+#define KVM_ARM_IRQ_NUM_MASK   0xffff
+
+/* irq_type field */
+#define KVM_ARM_IRQ_TYPE_CPU   0

[PATCH v3 08/14] KVM: ARM: World-switch implementation

2012-10-22 Thread Christoffer Dall
Provides a complete world-switch implementation to switch to other guests
running in non-secure modes. Includes Hyp exception handlers that
capture the necessary exception information and store it on the VCPU
and KVM structures.

The following Hyp-ABI is also documented in the code:

Hyp-ABI: Calling HYP-mode functions from host (in SVC mode):
   Switching to Hyp mode is done through a simple HVC #0 instruction. The
   exception vector code will check that the HVC comes from VMID==0 and if
   so will push the necessary state (SPSR, lr_usr) on the Hyp stack.
   - r0 contains a pointer to a HYP function
   - r1, r2, and r3 contain arguments to the above function.
   - The HYP function will be called with its arguments in r0, r1 and r2.
   On HYP function return, we return directly to SVC.

A call to a function executing in Hyp mode is performed like the following:

svc code
ldr r0, =BSYM(my_hyp_fn)
ldr r1, =my_param
hvc #0  ; Call my_hyp_fn(my_param) from HYP mode
svc code

Otherwise, the world-switch is pretty straight-forward. All state that
can be modified by the guest is first backed up on the Hyp stack and the
VCPU values are loaded onto the hardware. State that is not loaded, but
is theoretically modifiable by the guest, is protected through the
virtualization features to generate a trap and cause software emulation.
Upon guest return, all state is restored from hardware onto the VCPU
struct and the original state is restored from the Hyp stack onto the
hardware.

SMP support using the VMPIDR calculated on the basis of the host MPIDR
and overriding the low bits with KVM vcpu_id contributed by Marc Zyngier.

Reuse of VMIDs has been implemented by Antonios Motakis and adapted from
a separate patch into the appropriate patches introducing the
functionality. Note that the VMIDs are stored per VM as required by the ARM
architecture reference manual.

To support VFP/NEON we trap those instructions using the HCPTR. When
we trap, we switch the FPU.  After a guest exit, the VFP state is
returned to the host.  When disabling access to floating point
instructions, we also mask FPEXC_EN in order to avoid the guest
receiving Undefined instruction exceptions before we have a chance to
switch back the floating point state.  We are reusing vfp_hard_struct,
so we depend on VFPv3 being enabled in the host kernel; if it is not, we
still trap cp10 and cp11 in order to inject an undefined instruction
exception whenever the guest tries to use VFP/NEON. VFP/NEON support was
developed by Antonios Motakis and Rusty Russell.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Antonios Motakis a.mota...@virtualopensystems.com
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h  |   38 
 arch/arm/include/asm/kvm_host.h |9 +
 arch/arm/kernel/asm-offsets.c   |   23 ++
 arch/arm/kvm/arm.c  |  165 
 arch/arm/kvm/interrupts.S   |  352 +-
 arch/arm/kvm/interrupts_head.S  |  409 +++
 6 files changed, 993 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/kvm/interrupts_head.S

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 4f54cda..aecd05f 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -98,6 +98,18 @@
 #define TTBCR_T0SZ 3
 #define HTCR_MASK  (TTBCR_T0SZ | TTBCR_IRGN0 | TTBCR_ORGN0 | TTBCR_SH0)
 
+/* Hyp System Trap Register */
+#define HSTR_T(x)  (1 << x)
+#define HSTR_TTEE  (1 << 16)
+#define HSTR_TJDBX (1 << 17)
+
+/* Hyp Coprocessor Trap Register */
+#define HCPTR_TCP(x)   (1 << x)
+#define HCPTR_TCP_MASK (0x3fff)
+#define HCPTR_TASE (1 << 15)
+#define HCPTR_TTA  (1 << 20)
+#define HCPTR_TCPAC(1 << 31)
+
 /* Hyp Debug Configuration Register bits */
#define HDCR_TDRA  (1 << 11)
#define HDCR_TDOSA (1 << 10)
@@ -128,5 +140,31 @@
#define VTTBR_X    (5 - VTCR_GUEST_T0SZ)
 #endif
 
+/* Hyp Syndrome Register (HSR) bits */
+#define HSR_EC_SHIFT   (26)
+#define HSR_EC (0x3fU << HSR_EC_SHIFT)
+#define HSR_IL (1U << 25)
+#define HSR_ISS    (HSR_IL - 1)
+#define HSR_ISV_SHIFT  (24)
+#define HSR_ISV    (1U << HSR_ISV_SHIFT)
+
+#define HSR_EC_UNKNOWN (0x00)
+#define HSR_EC_WFI (0x01)
+#define HSR_EC_CP15_32 (0x03)
+#define HSR_EC_CP15_64 (0x04)
+#define HSR_EC_CP14_MR (0x05)
+#define HSR_EC_CP14_LS (0x06)
+#define HSR_EC_CP_0_13 (0x07)
+#define HSR_EC_CP10_ID (0x08)
+#define HSR_EC_JAZELLE (0x09)
+#define HSR_EC_BXJ (0x0A)
+#define HSR_EC_CP14_64 (0x0C)
+#define HSR_EC_SVC_HYP (0x11)
+#define HSR_EC_HVC (0x12)
+#define HSR_EC_SMC (0x13)
+#define HSR_EC_IABT(0x20)
+#define HSR_EC_IABT_HYP(0x21)
+#define HSR_EC_DABT(0x24)
+#define HSR_EC_DABT_HYP(0x25)
 
 

[PATCH v3 09/14] KVM: ARM: Emulation framework and CP15 emulation

2012-10-22 Thread Christoffer Dall
Adds an important new function in the main KVM/ARM code called
handle_exit(), which is called from kvm_arch_vcpu_ioctl_run() on returns
from guest execution. This function examines the Hyp Syndrome Register
(HSR), which contains information telling KVM what caused the exit from
the guest.

Some exits are caused by CP15 accesses, which are not allowed from the
guest; this commit handles these exits by emulating the intended
operation in software and skipping the guest instruction.

Minor notes about the coproc register reset:
1) We reserve a value of 0 as an invalid cp15 offset, to catch bugs in our
   table, at cost of 4 bytes per vcpu.

2) Added comments on the table indicating how we handle each register, for
   simplicity of understanding.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |9 +
 arch/arm/include/asm/kvm_coproc.h  |   14 +
 arch/arm/include/asm/kvm_emulate.h |6 +
 arch/arm/include/asm/kvm_host.h|4 
 arch/arm/kvm/Makefile  |3 
 arch/arm/kvm/arm.c |  175 +
 arch/arm/kvm/coproc.c  |  363 
 arch/arm/kvm/coproc.h  |  153 +++
 arch/arm/kvm/coproc_a15.c  |  164 
 arch/arm/kvm/emulate.c |  218 ++
 arch/arm/kvm/trace.h   |   45 
 11 files changed, 1149 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/kvm/coproc.h
 create mode 100644 arch/arm/kvm/coproc_a15.c

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index aecd05f..6b8bb51 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -70,6 +70,11 @@
HCR_SWIO | HCR_TIDCP)
 #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
 
+/* System Control Register (SCTLR) bits */
+#define SCTLR_TE   (1 << 30)
+#define SCTLR_EE   (1 << 25)
+#define SCTLR_V    (1 << 13)
+
 /* Hyp System Control Register (HSCTLR) bits */
#define HSCTLR_TE  (1 << 30)
#define HSCTLR_EE  (1 << 25)
@@ -147,6 +152,10 @@
#define HSR_ISS    (HSR_IL - 1)
#define HSR_ISV_SHIFT  (24)
#define HSR_ISV    (1U << HSR_ISV_SHIFT)
+#define HSR_CV_SHIFT   (24)
+#define HSR_CV (1U << HSR_CV_SHIFT)
+#define HSR_COND_SHIFT (20)
+#define HSR_COND   (0xfU << HSR_COND_SHIFT)
 
 #define HSR_EC_UNKNOWN (0x00)
 #define HSR_EC_WFI (0x01)
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index b6d023d..bd1ace0 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -21,4 +21,18 @@
 
 void kvm_reset_coprocs(struct kvm_vcpu *vcpu);
 
+struct kvm_coproc_target_table {
+   unsigned target;
+   const struct coproc_reg *table;
+   size_t num;
+};
+void kvm_register_target_coproc_table(struct kvm_coproc_target_table *table);
+
+int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp_0_13_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_coproc_table_init(void);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index 7d3e904..ac48156 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -25,6 +25,12 @@
 u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
 
+int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
+void kvm_inject_undefined(struct kvm_vcpu *vcpu);
+void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
+void kvm_inject_pabt(struct kvm_vcpu *vcpu, unsigned long addr);
+
 static inline u32 *vcpu_pc(struct kvm_vcpu *vcpu)
 {
return &vcpu->arch.regs.pc;
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 5f5c975..2ce654c 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -94,6 +94,10 @@ struct kvm_vcpu_arch {
 * Anything that is not used directly from assembly code goes
 * here.
 */
+   /* dcache set/way operation pending */
+   int last_pcpu;
+   cpumask_t require_dcache_flush;
+
/* IO related fields */
struct {
bool sign_extend;   /* for byte/halfword loads */
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 7acf3ea..ea5b282 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -18,4 

[PATCH v3 10/14] KVM: ARM: User space API for getting/setting co-proc registers

2012-10-22 Thread Christoffer Dall
The following three ioctls are implemented:
 -  KVM_GET_REG_LIST
 -  KVM_GET_ONE_REG
 -  KVM_SET_ONE_REG

Now that we have a table for all the cp15 registers, we can drive a
generic API.

The register IDs carry the following encoding:

ARM registers are mapped using the lower 32 bits.  The upper 16 of that
is the register group type, or coprocessor number:

ARM 32-bit CP15 registers have the following id bit patterns:
  0x4002  000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>

ARM 64-bit CP15 registers have the following id bit patterns:
  0x4003  000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>

For futureproofing, we need to tell QEMU about the CP15 registers the
host lets the guest access.

It will need this information to restore a current guest on a future
CPU or perhaps a future KVM which allows some of these to be changed.

We use a separate table for these, as they're only for the userspace API.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   44 +
 arch/arm/include/asm/kvm_coproc.h |9 +
 arch/arm/include/asm/kvm_host.h   |4 
 arch/arm/kvm/coproc.c |  327 +
 arch/arm/kvm/guest.c  |9 +
 5 files changed, 389 insertions(+), 4 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 4514292..242ff3e 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1798,6 +1798,11 @@ is the register group type, or coprocessor number:
 ARM core registers have the following id bit patterns:
  0x4002  0010 <index into the kvm_regs struct:16>
 
+ARM 32-bit CP15 registers have the following id bit patterns:
+  0x4002  000F <zero:1> <crn:4> <crm:4> <opc1:4> <opc2:3>
+
+ARM 64-bit CP15 registers have the following id bit patterns:
+  0x4003  000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
 
 
 4.69 KVM_GET_ONE_REG
@@ -2139,6 +2144,45 @@ This ioctl returns the guest registers that are supported for the
 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
 
 
+4.77 KVM_ARM_VCPU_INIT
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_vcpu_init (in)
+Returns: 0 on success; -1 on error
+Errors:
+  EINVAL:    the target is unknown, or the combination of features is invalid.
+  ENOENT:    a features bit specified is unknown.
+
+This tells KVM what type of CPU to present to the guest, and what
+optional features it should have.  This will cause a reset of the cpu
+registers to their initial values.  If this is not called, KVM_RUN will
+return ENOEXEC for that vcpu.
+
+Note that because some registers reflect machine topology, all vcpus
+should be created before this ioctl is invoked.
+
+4.78 KVM_GET_REG_LIST
+
+Capability: basic
+Architectures: arm
+Type: vcpu ioctl
+Parameters: struct kvm_reg_list (in/out)
+Returns: 0 on success; -1 on error
+Errors:
+  E2BIG: the reg index list is too big to fit in the array specified by
+ the user (the number required will be written into n).
+
+struct kvm_reg_list {
+   __u64 n; /* number of registers in reg[] */
+   __u64 reg[0];
+};
+
+This ioctl returns the guest registers that are supported for the
+KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/arm/include/asm/kvm_coproc.h b/arch/arm/include/asm/kvm_coproc.h
index bd1ace0..4917c2f 100644
--- a/arch/arm/include/asm/kvm_coproc.h
+++ b/arch/arm/include/asm/kvm_coproc.h
@@ -34,5 +34,14 @@ int kvm_handle_cp14_load_store(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp14_access(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_32(struct kvm_vcpu *vcpu, struct kvm_run *run);
 int kvm_handle_cp15_64(struct kvm_vcpu *vcpu, struct kvm_run *run);
+
+unsigned long kvm_arm_num_guest_msrs(struct kvm_vcpu *vcpu);
+int kvm_arm_copy_msrindices(struct kvm_vcpu *vcpu, u64 __user *uindices);
 void kvm_coproc_table_init(void);
+
+struct kvm_one_reg;
+int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
+int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
 #endif /* __ARM_KVM_COPROC_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 2ce654c..606e21a 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -27,6 +27,7 @@
 #define KVM_MEMORY_SLOTS 32
 #define KVM_PRIVATE_MEM_SLOTS 4
 #define KVM_COALESCED_MMIO_PAGE_OFFSET 1
+#define KVM_HAVE_ONE_REG
 
 #define KVM_VCPU_MAX_FEATURES 0
 
@@ -139,6 +140,9 @@ int kvm_unmap_hva_range(struct kvm *kvm,
unsigned long start, unsigned long end);
 void kvm_set_spte_hva(struct kvm *kvm, unsigned long 

[PATCH v3 11/14] KVM: ARM: Demux CCSIDR in the userspace API

2012-10-22 Thread Christoffer Dall
The Cache Size Selection Register (CSSELR) selects the current Cache
Size ID Register (CCSIDR).  You write which cache you are interested
in to CSSELR, and read the information out of CCSIDR.

Which cache numbers are valid is known by reading the Cache Level ID
Register (CLIDR).

To export this state to userspace, we add a KVM_REG_ARM_DEMUX
numberspace (17), which uses 8 bits to represent which register is
being demultiplexed (0 for CCSIDR), and the lower 8 bits to represent
this demultiplexing (in our case, the CSSELR value, which is 4 bits).

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Rusty Russell rusty.russ...@linaro.org
---
 Documentation/virtual/kvm/api.txt |2 
 arch/arm/include/uapi/asm/kvm.h   |9 ++
 arch/arm/kvm/coproc.c |  163 -
 3 files changed, 171 insertions(+), 3 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 242ff3e..6f87d61 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1804,6 +1804,8 @@ ARM 32-bit CP15 registers have the following id bit patterns:
 ARM 64-bit CP15 registers have the following id bit patterns:
  0x4003  000F <zero:1> <zero:4> <crm:4> <opc1:4> <zero:3>
 
+ARM CCSIDR registers are demultiplexed by CSSELR value:
+  0x4002  0011 00 <csselr:8>
 
 4.69 KVM_GET_ONE_REG
 
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index 138a588c..d79d064 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -81,6 +81,15 @@ struct kvm_arch_memory_slot {
#define KVM_REG_ARM_CORE   (0x0010 << KVM_REG_ARM_COPROC_SHIFT)
 #define KVM_REG_ARM_CORE_REG(name) (offsetof(struct kvm_regs, name) / 4)
 
+/* Some registers need more space to represent values. */
+#define KVM_REG_ARM_DEMUX  (0x0011 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_DEMUX_ID_MASK  0xFF00
+#define KVM_REG_ARM_DEMUX_ID_SHIFT 8
+#define KVM_REG_ARM_DEMUX_ID_CCSIDR    (0x00 << KVM_REG_ARM_DEMUX_ID_SHIFT)
+#define KVM_REG_ARM_DEMUX_VAL_MASK 0x00FF
+#define KVM_REG_ARM_DEMUX_VAL_SHIFT    0
+
+
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
 #define KVM_ARM_IRQ_TYPE_MASK  0xff
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 95a0f5e..9ce5861 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -35,6 +35,12 @@
  * Co-processor emulation
  */
 
+/* 3 bits per cache level, as per CLIDR, but non-existent caches always 0 */
+static u32 cache_levels;
+
+/* CSSELR values; used to index KVM_REG_ARM_DEMUX_ID_CCSIDR */
+#define CSSELR_MAX 12
+
 int kvm_handle_cp10_id(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
kvm_inject_undefined(vcpu);
@@ -548,11 +554,112 @@ static int set_invariant_cp15(u64 id, void __user *uaddr)
return 0;
 }
 
+static bool is_valid_cache(u32 val)
+{
+   u32 level, ctype;
+
+   if (val >= CSSELR_MAX)
+   return false;
+
+   /* Bottom bit is Instruction or Data bit.  Next 3 bits are level. */
+   level = (val >> 1);
+   ctype = (cache_levels >> (level * 3)) & 7;
+
+   switch (ctype) {
+   case 0: /* No cache */
+   return false;
+   case 1: /* Instruction cache only */
+   return (val & 1);
+   case 2: /* Data cache only */
+   case 4: /* Unified cache */
+   return !(val & 1);
+   case 3: /* Separate instruction and data caches */
+   return true;
+   default: /* Reserved: we can't know instruction or data. */
+   return false;
+   }
+}
+
+/* Which cache CCSIDR represents depends on CSSELR value. */
+static u32 get_ccsidr(u32 csselr)
+{
+   u32 ccsidr;
+
+   /* Make sure no one else changes CSSELR during this! */
+   local_irq_disable();
+   /* Put value into CSSELR */
+   asm volatile("mcr p15, 2, %0, c0, c0, 0" : : "r" (csselr));
+   /* Read result out of CCSIDR */
+   asm volatile("mrc p15, 1, %0, c0, c0, 0" : "=r" (ccsidr));
+   local_irq_enable();
+
+   return ccsidr;
+}
+
+static int demux_c15_get(u64 id, void __user *uaddr)
+{
+   u32 val;
+   u32 __user *uval = uaddr;
+
+   /* Fail if we have unknown bits set. */
+   if (id & ~(KVM_REG_ARCH_MASK|KVM_REG_SIZE_MASK|KVM_REG_ARM_COPROC_MASK
+  | ((1 << KVM_REG_ARM_COPROC_SHIFT)-1)))
+   return -ENOENT;
+
+   switch (id & KVM_REG_ARM_DEMUX_ID_MASK) {
+   case KVM_REG_ARM_DEMUX_ID_CCSIDR:
+   if (KVM_REG_SIZE(id) != 4)
+   return -ENOENT;
+   val = (id & KVM_REG_ARM_DEMUX_VAL_MASK)
+   >> KVM_REG_ARM_DEMUX_VAL_SHIFT;
+   if (!is_valid_cache(val))
+   return -ENOENT;
+
+   return put_user(get_ccsidr(val), uval);
+ 

[PATCH v3 12/14] KVM: ARM: VFP userspace interface

2012-10-22 Thread Christoffer Dall
From: Rusty Russell rusty.russ...@linaro.org

We use space #18 for floating point regs.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Rusty Russell ru...@rustcorp.com.au
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |6 +
 arch/arm/include/uapi/asm/kvm.h   |   12 ++
 arch/arm/kvm/coproc.c |  178 +
 3 files changed, 196 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 6f87d61..764c5df 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1807,6 +1807,12 @@ ARM 64-bit CP15 registers have the following id bit patterns:
 ARM CCSIDR registers are demultiplexed by CSSELR value:
  0x4002  0011 00 <csselr:8>
 
+ARM 32-bit VFP control registers have the following id bit patterns:
+  0x4002  0012 1 <regno:12>
+
+ARM 64-bit FP registers have the following id bit patterns:
+  0x4002  0012 0 <regno:12>
+
 4.69 KVM_GET_ONE_REG
 
 Capability: KVM_CAP_ONE_REG
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index d79d064..fb41608 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -89,6 +89,18 @@ struct kvm_arch_memory_slot {
 #define KVM_REG_ARM_DEMUX_VAL_MASK 0x00FF
 #define KVM_REG_ARM_DEMUX_VAL_SHIFT0
 
+/* VFP registers: we could overload CP10 like ARM does, but that's ugly. */
+#define KVM_REG_ARM_VFP    (0x0012 << KVM_REG_ARM_COPROC_SHIFT)
+#define KVM_REG_ARM_VFP_MASK   0xFFFF
+#define KVM_REG_ARM_VFP_BASE_REG   0x0
+#define KVM_REG_ARM_VFP_FPSID  0x1000
+#define KVM_REG_ARM_VFP_FPSCR  0x1001
+#define KVM_REG_ARM_VFP_MVFR1  0x1006
+#define KVM_REG_ARM_VFP_MVFR0  0x1007
+#define KVM_REG_ARM_VFP_FPEXC  0x1008
+#define KVM_REG_ARM_VFP_FPINST 0x1009
+#define KVM_REG_ARM_VFP_FPINST20x100A
+
 
 /* KVM_IRQ_LINE irq field index values */
 #define KVM_ARM_IRQ_TYPE_SHIFT 24
diff --git a/arch/arm/kvm/coproc.c b/arch/arm/kvm/coproc.c
index 9ce5861..0b9b521 100644
--- a/arch/arm/kvm/coproc.c
+++ b/arch/arm/kvm/coproc.c
@@ -26,6 +26,8 @@
#include <asm/cacheflush.h>
#include <asm/cputype.h>
#include <trace/events/kvm.h>
+#include <asm/vfp.h>
+#include "../vfp/vfpinstr.h"
 
#include "trace.h"
#include "coproc.h"
@@ -652,6 +654,170 @@ static int demux_c15_set(u64 id, void __user *uaddr)
}
 }
 
+#ifdef CONFIG_VFPv3
+static const int vfp_sysregs[] = { KVM_REG_ARM_VFP_FPEXC,
+  KVM_REG_ARM_VFP_FPSCR,
+  KVM_REG_ARM_VFP_FPINST,
+  KVM_REG_ARM_VFP_FPINST2,
+  KVM_REG_ARM_VFP_MVFR0,
+  KVM_REG_ARM_VFP_MVFR1,
+  KVM_REG_ARM_VFP_FPSID };
+
+static unsigned int num_fp_regs(void)
+{
+   if (((fmrx(MVFR0) & MVFR0_A_SIMD_MASK) >> MVFR0_A_SIMD_BIT) == 2)
+   return 32;
+   else
+   return 16;
+}
+
+static unsigned int num_vfp_regs(void)
+{
+   /* Normal FP regs + control regs. */
+   return num_fp_regs() + ARRAY_SIZE(vfp_sysregs);
+}
+
+static int copy_vfp_regids(u64 __user *uindices)
+{
+   unsigned int i;
+   const u64 u32reg = KVM_REG_ARM | KVM_REG_SIZE_U32 | KVM_REG_ARM_VFP;
+   const u64 u64reg = KVM_REG_ARM | KVM_REG_SIZE_U64 | KVM_REG_ARM_VFP;
+
+   for (i = 0; i < num_fp_regs(); i++) {
+   if (put_user((u64reg | KVM_REG_ARM_VFP_BASE_REG) + i,
+uindices))
+   return -EFAULT;
+   uindices++;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(vfp_sysregs); i++) {
+   if (put_user(u32reg | vfp_sysregs[i], uindices))
+   return -EFAULT;
+   uindices++;
+   }
+
+   return num_vfp_regs();
+}
+
+static int vfp_get_reg(const struct kvm_vcpu *vcpu, u64 id, void __user *uaddr)
+{
+   u32 vfpid = (id & KVM_REG_ARM_VFP_MASK);
+   u32 val;
+
+   /* Fail if we have unknown bits set. */
+   if (id & ~(KVM_REG_ARCH_MASK|KVM_REG_SIZE_MASK|KVM_REG_ARM_COPROC_MASK
+  | ((1 << KVM_REG_ARM_COPROC_SHIFT)-1)))
+   return -ENOENT;
+
+   if (vfpid < num_fp_regs()) {
+   if (KVM_REG_SIZE(id) != 8)
+   return -ENOENT;
+   return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpregs[vfpid],
+  id);
+   }
+
+   /* FP control registers are all 32 bit. */
+   if (KVM_REG_SIZE(id) != 4)
+   return -ENOENT;
+
+   switch (vfpid) {
+   case KVM_REG_ARM_VFP_FPEXC:
+   return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpexc, id);
+   case KVM_REG_ARM_VFP_FPSCR:
+   return reg_to_user(uaddr, &vcpu->arch.vfp_guest.fpscr, id);
+   case 

[PATCH v3 14/14] KVM: ARM: Handle I/O aborts

2012-10-22 Thread Christoffer Dall
When the guest accesses I/O memory this will create data abort
exceptions and they are handled by decoding the HSR information
(physical address, read/write, length, register) and forwarding reads
and writes to QEMU which performs the device emulation.

Certain classes of load/store operations do not support the syndrome
information provided in the HSR and we therefore must be able to fetch
the offending instruction from guest memory and decode it manually.

We only support instruction decoding for valid, reasonable MMIO operations
where trapping them does not provide sufficient information in the HSR (no
16-bit Thumb instructions provide register writeback that we care about).

The following instruction types are NOT supported for MMIO operations
despite the HSR not containing decode info:
 - any Load/Store multiple
 - any load/store exclusive
 - any load/store dual
 - anything with the PC as the dest register

This requires changing the general flow somewhat since new calls to run
the VCPU must check if there's a pending MMIO load and perform the write
after userspace has made the data available.

Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
(1) Guest complicated mmio instruction traps.
(2) The hardware doesn't tell us enough, so we need to read the actual
instruction which was being executed.
(3) KVM maps the instruction virtual address to a physical address.
(4) The guest (SMP) swaps out that page, and fills it with something else.
(5) We read the physical address, but now that's the wrong thing.

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
Signed-off-by: Rusty Russell rusty.russ...@linaro.org
Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |3 
 arch/arm/include/asm/kvm_asm.h |2 
 arch/arm/include/asm/kvm_emulate.h |8 
 arch/arm/include/asm/kvm_host.h|3 
 arch/arm/include/asm/kvm_mmio.h|   51 +++
 arch/arm/kvm/Makefile  |2 
 arch/arm/kvm/arm.c |   14 +
 arch/arm/kvm/emulate.c |  581 
 arch/arm/kvm/interrupts.S  |   38 ++
 arch/arm/kvm/mmio.c|  152 +
 arch/arm/kvm/mmu.c |7 
 arch/arm/kvm/trace.h   |   21 +
 12 files changed, 878 insertions(+), 4 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_mmio.h
 create mode 100644 arch/arm/kvm/mmio.c

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 61d8a26..4f1bb01 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -152,8 +152,11 @@
#define HSR_ISS    (HSR_IL - 1)
#define HSR_ISV_SHIFT  (24)
#define HSR_ISV    (1U << HSR_ISV_SHIFT)
+#define HSR_SRT_SHIFT  (16)
+#define HSR_SRT_MASK   (0xf << HSR_SRT_SHIFT)
#define HSR_FSC    (0x3f)
#define HSR_FSC_TYPE   (0x3c)
+#define HSR_SSE    (1 << 21)
#define HSR_WNR    (1 << 6)
#define HSR_CV_SHIFT   (24)
#define HSR_CV (1U << HSR_CV_SHIFT)
diff --git a/arch/arm/include/asm/kvm_asm.h b/arch/arm/include/asm/kvm_asm.h
index 6fccdb3..99c0faf 100644
--- a/arch/arm/include/asm/kvm_asm.h
+++ b/arch/arm/include/asm/kvm_asm.h
@@ -77,6 +77,8 @@ extern void __kvm_flush_vm_context(void);
 extern void __kvm_tlb_flush_vmid(struct kvm *kvm);
 
 extern int __kvm_vcpu_run(struct kvm_vcpu *vcpu);
+
+extern u64 __kvm_va_to_pa(struct kvm_vcpu *vcpu, u32 va, bool priv);
 #endif
 
 #endif /* __ARM_KVM_ASM_H__ */
diff --git a/arch/arm/include/asm/kvm_emulate.h b/arch/arm/include/asm/kvm_emulate.h
index ac48156..b94863a 100644
--- a/arch/arm/include/asm/kvm_emulate.h
+++ b/arch/arm/include/asm/kvm_emulate.h
@@ -21,11 +21,14 @@
 
#include <linux/kvm_host.h>
#include <asm/kvm_asm.h>
+#include <asm/kvm_mmio.h>
 
 u32 *vcpu_reg(struct kvm_vcpu *vcpu, u8 reg_num);
 u32 *vcpu_spsr(struct kvm_vcpu *vcpu);
 
 int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run);
+int kvm_emulate_mmio_ls(struct kvm_vcpu *vcpu, phys_addr_t fault_ipa,
+   struct kvm_exit_mmio *mmio);
 void kvm_skip_instr(struct kvm_vcpu *vcpu, bool is_wide_instr);
 void kvm_inject_undefined(struct kvm_vcpu *vcpu);
 void kvm_inject_dabt(struct kvm_vcpu *vcpu, unsigned long addr);
@@ -53,4 +56,9 @@ static inline bool vcpu_mode_priv(struct kvm_vcpu *vcpu)
return cpsr_mode > USR_MODE;
 }
 
+static inline bool kvm_vcpu_reg_is_pc(struct kvm_vcpu *vcpu, int reg)
+{
+   return reg == 15;
+}
+
 #endif /* __ARM_KVM_EMULATE_H__ */
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 606e21a..2eddd96 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -99,6 +99,9 @@ struct kvm_vcpu_arch {
int last_pcpu;
cpumask_t require_dcache_flush;
 
+   /* Don't run the guest: see copy_current_insn() */
+   bool pause;
+
/* IO related 

[PATCH v3 00/13] KVM/ARM vGIC support

2012-10-22 Thread Christoffer Dall
The following series implements support for the virtual generic
interrupt controller architecture for KVM/ARM.

Changes since v2:
 - Get rid of hardcoded guest cpu and distributor physical addresses
   and instead provide the address through the KVM_SET_DEVICE_ADDRESS
   ioctl.
 - Fix level/edge bugs
 - Fix reboot bug: retire queued, disabled interrupts

This patch series can also be pulled from:
git://github.com/virtualopensystems/linux-kvm-arm.git
branch: kvm-arm-v13-vgic


---

Christoffer Dall (2):
  KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl
  ARM: KVM: VGIC accept vcpu and dist base addresses from user space

Marc Zyngier (11):
  ARM: KVM: Keep track of currently running vcpus
  ARM: KVM: Initial VGIC infrastructure support
  ARM: KVM: Initial VGIC MMIO support code
  ARM: KVM: VGIC distributor handling
  ARM: KVM: VGIC virtual CPU interface management
  ARM: KVM: vgic: retire queued, disabled interrupts
  ARM: KVM: VGIC interrupt injection
  ARM: KVM: VGIC control interface world switch
  ARM: KVM: VGIC initialisation code
  ARM: KVM: vgic: reduce the number of vcpu kick
  ARM: KVM: Add VGIC configuration option


 Documentation/virtual/kvm/api.txt |   37 +
 arch/arm/include/asm/kvm_arm.h|   12 
 arch/arm/include/asm/kvm_host.h   |   17 +
 arch/arm/include/asm/kvm_mmu.h|2 
 arch/arm/include/asm/kvm_vgic.h   |  320 +
 arch/arm/include/uapi/asm/kvm.h   |   13 
 arch/arm/kernel/asm-offsets.c |   12 
 arch/arm/kvm/Kconfig  |7 
 arch/arm/kvm/Makefile |1 
 arch/arm/kvm/arm.c|  138 
 arch/arm/kvm/interrupts.S |4 
 arch/arm/kvm/interrupts_head.S|   68 ++
 arch/arm/kvm/mmio.c   |3 
 arch/arm/kvm/vgic.c   | 1251 +
 include/uapi/linux/kvm.h  |8 
 virt/kvm/kvm_main.c   |5 
 16 files changed, 1893 insertions(+), 5 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_vgic.h
 create mode 100644 arch/arm/kvm/vgic.c

-- 
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 01/13] KVM: ARM: Introduce KVM_SET_DEVICE_ADDRESS ioctl

2012-10-22 Thread Christoffer Dall
On ARM (and possibly other architectures) some bits are specific to the
model being emulated for the guest and user space needs a way to tell
the kernel about those bits.  An example is mmio device base addresses,
where KVM must know the base address for a given device to properly
emulate mmio accesses within a certain address range or directly map a
device with virtualization extensions into the guest address space.

We try to make this API slightly more generic than for our specific use,
but so far only the VGIC uses this feature.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 Documentation/virtual/kvm/api.txt |   37 +
 arch/arm/include/uapi/asm/kvm.h   |   13 +
 arch/arm/kvm/arm.c|   24 +++-
 include/uapi/linux/kvm.h  |8 
 4 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/Documentation/virtual/kvm/api.txt 
b/Documentation/virtual/kvm/api.txt
index 764c5df..428b625 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -2191,6 +2191,43 @@ This ioctl returns the guest registers that are 
supported for the
 KVM_GET_ONE_REG/KVM_SET_ONE_REG calls.
 
 
+4.80 KVM_SET_DEVICE_ADDRESS
+
+Capability: KVM_CAP_SET_DEVICE_ADDRESS
+Architectures: arm
+Type: vm ioctl
+Parameters: struct kvm_device_address (in)
+Returns: 0 on success, -1 on error
+Errors:
+  ENODEV: The device id is unknown
+  ENXIO:  Device not supported on current system
+  EEXIST: Address already set
+  E2BIG:  Address outside guest physical address space
+
+struct kvm_device_address {
+   __u32 id;
+   __u64 addr;
+};
+
+Specify a device address in the guest's physical address space where guests
+can access emulated or directly exposed devices, which the host kernel needs
+to know about. The id field is an architecture specific identifier for a
+specific device.
+
+ARM divides the id field into two parts, a device id and an address type id
+specific to the individual device.
+
+  bits:  | 31...16 | 15...0 |
+  field: | device id   |  addr type id  |
+
+ARM currently only requires this when using the in-kernel GIC support for the
+hardware vGIC features, using KVM_ARM_DEVICE_VGIC_V2 as the device id.  When
+setting the base address for the guest's mapping of the vGIC virtual CPU
+and distributor interface, the ioctl must be called after calling
+KVM_CREATE_IRQCHIP, but before calling KVM_RUN on any of the VCPUs.  Calling
+this ioctl twice for any of the base addresses will return -EEXIST.
+
+
 5. The kvm_run structure
 
 
diff --git a/arch/arm/include/uapi/asm/kvm.h b/arch/arm/include/uapi/asm/kvm.h
index fb41608..a7ae073 100644
--- a/arch/arm/include/uapi/asm/kvm.h
+++ b/arch/arm/include/uapi/asm/kvm.h
@@ -42,6 +42,19 @@ struct kvm_regs {
 #define KVM_ARM_TARGET_CORTEX_A15  0
 #define KVM_ARM_NUM_TARGETS1
 
+/* KVM_SET_DEVICE_ADDRESS ioctl id encoding */
+#define KVM_DEVICE_TYPE_SHIFT  0
+#define KVM_DEVICE_TYPE_MASK   (0xffff << KVM_DEVICE_TYPE_SHIFT)
+#define KVM_DEVICE_ID_SHIFT    16
+#define KVM_DEVICE_ID_MASK (0xffff << KVM_DEVICE_ID_SHIFT)
+
+/* Supported device IDs */
+#define KVM_ARM_DEVICE_VGIC_V2 0
+
+/* Supported VGIC address types  */
+#define KVM_VGIC_V2_ADDR_TYPE_DIST 0
+#define KVM_VGIC_V2_ADDR_TYPE_CPU  1
+
 struct kvm_vcpu_init {
__u32 target;
__u32 features[7];
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index acdfa63..c192399 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -164,6 +164,9 @@ int kvm_dev_ioctl_check_extension(long ext)
case KVM_CAP_COALESCED_MMIO:
r = KVM_COALESCED_MMIO_PAGE_OFFSET;
break;
+   case KVM_CAP_SET_DEVICE_ADDR:
+   r = 1;
+   break;
default:
r = 0;
break;
@@ -776,10 +779,29 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
return -EINVAL;
 }
 
+static int kvm_vm_ioctl_set_device_address(struct kvm *kvm,
+  struct kvm_device_address *dev_addr)
+{
+   return -ENODEV;
+}
+
 long kvm_arch_vm_ioctl(struct file *filp,
   unsigned int ioctl, unsigned long arg)
 {
-   return -EINVAL;
	struct kvm *kvm = filp->private_data;
+   void __user *argp = (void __user *)arg;
+
+   switch (ioctl) {
+   case KVM_SET_DEVICE_ADDRESS: {
+   struct kvm_device_address dev_addr;
+
	if (copy_from_user(&dev_addr, argp, sizeof(dev_addr)))
+   return -EFAULT;
	return kvm_vm_ioctl_set_device_address(kvm, &dev_addr);
+   }
+   default:
+   return -EINVAL;
+   }
 }
 
 static void cpu_init_hyp_mode(void *vector)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 72f018b..1aaa15e 100644
--- 

[PATCH v3 02/13] ARM: KVM: Keep track of currently running vcpus

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

When an interrupt occurs for the guest, it is sometimes necessary
to find out which vcpu was running at that point.

Keep track of which vcpu is being run in kvm_arch_vcpu_ioctl_run(),
and allow the data to be retrieved using either:
- kvm_arm_get_running_vcpu(): returns the vcpu running at this point
  on the current CPU. Can only be used in a non-preemptible context.
- kvm_arm_get_running_vcpus(): returns the per-CPU variable holding
  the running vcpus, usable for per-CPU interrupts.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_host.h |   10 ++
 arch/arm/kvm/arm.c  |   30 ++
 2 files changed, 40 insertions(+)

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 2eddd96..c6f1102 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -156,4 +156,14 @@ static inline int kvm_test_age_hva(struct kvm *kvm, 
unsigned long hva)
 {
return 0;
 }
+
+struct kvm_vcpu *kvm_arm_get_running_vcpu(void);
+struct kvm_vcpu __percpu **kvm_get_running_vcpus(void);
+
+int kvm_arm_copy_coproc_indices(struct kvm_vcpu *vcpu, u64 __user *uindices);
+unsigned long kvm_arm_num_coproc_regs(struct kvm_vcpu *vcpu);
+struct kvm_one_reg;
+int kvm_arm_coproc_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+int kvm_arm_coproc_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *);
+
 #endif /* __ARM_KVM_HOST_H__ */
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index c192399..828b5af 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -53,11 +53,38 @@ static DEFINE_PER_CPU(unsigned long, 
kvm_arm_hyp_stack_page);
 static struct vfp_hard_struct __percpu *kvm_host_vfp_state;
 static unsigned long hyp_default_vectors;
 
+/* Per-CPU variable containing the currently running vcpu. */
+static DEFINE_PER_CPU(struct kvm_vcpu *, kvm_arm_running_vcpu);
+
 /* The VMID used in the VTTBR */
 static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u8 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
 
+static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
+{
+   BUG_ON(preemptible());
+   __get_cpu_var(kvm_arm_running_vcpu) = vcpu;
+}
+
+/**
+ * kvm_arm_get_running_vcpu - get the vcpu running on the current CPU.
+ * Must be called from non-preemptible context
+ */
+struct kvm_vcpu *kvm_arm_get_running_vcpu(void)
+{
+   BUG_ON(preemptible());
+   return __get_cpu_var(kvm_arm_running_vcpu);
+}
+
+/**
+ * kvm_get_running_vcpus - get the per-CPU array of currently running vcpus.
+ */
+struct kvm_vcpu __percpu **kvm_get_running_vcpus(void)
+{
+   return &kvm_arm_running_vcpu;
+}
+
 int kvm_arch_hardware_enable(void *garbage)
 {
return 0;
@@ -299,10 +326,13 @@ void kvm_arch_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
	cpumask_clear_cpu(cpu, &vcpu->arch.require_dcache_flush);
flush_cache_all(); /* We'd really want v7_flush_dcache_all() */
}
+
+   kvm_arm_set_running_vcpu(vcpu);
 }
 
 void kvm_arch_vcpu_put(struct kvm_vcpu *vcpu)
 {
+   kvm_arm_set_running_vcpu(NULL);
 }
 
 int kvm_arch_vcpu_ioctl_set_guest_debug(struct kvm_vcpu *vcpu,



[PATCH v3 03/13] ARM: KVM: Initial VGIC infrastructure support

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Wire the basic framework code for VGIC support. Nothing to enable
yet.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_host.h |7 
 arch/arm/include/asm/kvm_vgic.h |   70 +++
 arch/arm/kvm/arm.c  |   21 +++-
 arch/arm/kvm/interrupts.S   |4 ++
 arch/arm/kvm/mmio.c |3 ++
 virt/kvm/kvm_main.c |5 ++-
 6 files changed, 107 insertions(+), 3 deletions(-)
 create mode 100644 arch/arm/include/asm/kvm_vgic.h

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index c6f1102..9bbccdf 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -22,6 +22,7 @@
 #include <asm/kvm.h>
 #include <asm/kvm_asm.h>
 #include <asm/fpstate.h>
+#include <asm/kvm_vgic.h>
 
 #define KVM_MAX_VCPUS NR_CPUS
 #define KVM_MEMORY_SLOTS 32
@@ -57,6 +58,9 @@ struct kvm_arch {
 
/* Stage-2 page table */
pgd_t *pgd;
+
+   /* Interrupt controller */
+   struct vgic_dist        vgic;
 };
 
 #define KVM_NR_MEM_OBJS 40
@@ -91,6 +95,9 @@ struct kvm_vcpu_arch {
struct vfp_hard_struct vfp_guest;
struct vfp_hard_struct *vfp_host;
 
+   /* VGIC state */
+   struct vgic_cpu vgic_cpu;
+
/*
 * Anything that is not used directly from assembly code goes
 * here.
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
new file mode 100644
index 000..d75540a
--- /dev/null
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -0,0 +1,70 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Marc Zyngier marc.zyng...@arm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __ASM_ARM_KVM_VGIC_H
+#define __ASM_ARM_KVM_VGIC_H
+
+struct vgic_dist {
+};
+
+struct vgic_cpu {
+};
+
+struct kvm;
+struct kvm_vcpu;
+struct kvm_run;
+struct kvm_exit_mmio;
+
+#ifndef CONFIG_KVM_ARM_VGIC
+static inline int kvm_vgic_hyp_init(void)
+{
+   return 0;
+}
+
+static inline int kvm_vgic_init(struct kvm *kvm)
+{
+   return 0;
+}
+
+static inline int kvm_vgic_create(struct kvm *kvm)
+{
+   return 0;
+}
+
+static inline void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) {}
+static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {}
+static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {}
+
+static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
+{
+   return 0;
+}
+
+static inline bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+   struct kvm_exit_mmio *mmio)
+{
+   return false;
+}
+
+static inline int irqchip_in_kernel(struct kvm *kvm)
+{
+   return 0;
+}
+#endif
+
+#endif
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 828b5af..a57b107 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -183,6 +183,9 @@ int kvm_dev_ioctl_check_extension(long ext)
 {
int r;
switch (ext) {
+#ifdef CONFIG_KVM_ARM_VGIC
+   case KVM_CAP_IRQCHIP:
+#endif
case KVM_CAP_USER_MEMORY:
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
case KVM_CAP_ONE_REG:
@@ -304,6 +307,10 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 {
/* Force users to call KVM_ARM_VCPU_INIT */
	vcpu->arch.target = -1;
+
+   /* Set up VGIC */
+   kvm_vgic_vcpu_init(vcpu);
+
return 0;
 }
 
@@ -363,7 +370,7 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu,
  */
 int kvm_arch_vcpu_runnable(struct kvm_vcpu *v)
 {
-   return !!v->arch.irq_lines;
+   return !!v->arch.irq_lines || kvm_vgic_vcpu_pending_irq(v);
 }
 
 int kvm_arch_vcpu_in_guest_mode(struct kvm_vcpu *v)
@@ -632,6 +639,8 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
	update_vttbr(vcpu->kvm);
 
+   kvm_vgic_sync_to_cpu(vcpu);
+
local_irq_disable();
 
/*
@@ -644,6 +653,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct 
kvm_run *run)
 
	if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
+   kvm_vgic_sync_from_cpu(vcpu);
continue;
}
 
@@ -682,6 +692,8 @@ int kvm_arch_vcpu_ioctl_run(struct 

[PATCH v3 04/13] ARM: KVM: Initial VGIC MMIO support code

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Wire the initial in-kernel MMIO support code for the VGIC, used
for the distributor emulation.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_vgic.h |6 +-
 arch/arm/kvm/Makefile   |1 
 arch/arm/kvm/vgic.c |  138 +++
 3 files changed, 144 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/kvm/vgic.c

diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index d75540a..b444ecf 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -30,7 +30,11 @@ struct kvm_vcpu;
 struct kvm_run;
 struct kvm_exit_mmio;
 
-#ifndef CONFIG_KVM_ARM_VGIC
+#ifdef CONFIG_KVM_ARM_VGIC
+bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
+ struct kvm_exit_mmio *mmio);
+
+#else
 static inline int kvm_vgic_hyp_init(void)
 {
return 0;
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 574c67c..3370c09 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += $(addprefix ../../../virt/kvm/, 
kvm_main.o coalesc
 
 obj-$(CONFIG_KVM_ARM_HOST) += arm.o guest.o mmu.o emulate.o reset.o
 obj-$(CONFIG_KVM_ARM_HOST) += coproc.o coproc_a15.o mmio.o
+obj-$(CONFIG_KVM_ARM_VGIC) += vgic.o
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
new file mode 100644
index 000..26ada3b
--- /dev/null
+++ b/arch/arm/kvm/vgic.c
@@ -0,0 +1,138 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Marc Zyngier marc.zyng...@arm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+#include <linux/io.h>
+#include <asm/kvm_emulate.h>
+
+#define ACCESS_READ_VALUE  (1 << 0)
+#define ACCESS_READ_RAZ    (0 << 0)
+#define ACCESS_READ_MASK(x)    ((x) & (1 << 0))
+#define ACCESS_WRITE_IGNORED   (0 << 1)
+#define ACCESS_WRITE_SETBIT    (1 << 1)
+#define ACCESS_WRITE_CLEARBIT  (2 << 1)
+#define ACCESS_WRITE_VALUE (3 << 1)
+#define ACCESS_WRITE_MASK(x)   ((x) & (3 << 1))
+
+/**
+ * vgic_reg_access - access vgic register
+ * @mmio:   pointer to the data describing the mmio access
+ * @reg:pointer to the virtual backing of the vgic distributor struct
+ * @offset: least significant 2 bits used for word offset
+ * @mode:   ACCESS_ mode (see defines above)
+ *
+ * Helper to make vgic register access easier using one of the access
+ * modes defined for vgic register access
+ * (read,raz,write-ignored,setbit,clearbit,write)
+ */
+static void vgic_reg_access(struct kvm_exit_mmio *mmio, u32 *reg,
+   u32 offset, int mode)
+{
+   int word_offset = offset & 3;
+   int shift = word_offset * 8;
+   u32 mask;
+   u32 regval;
+
+   /*
+* Any alignment fault should have been delivered to the guest
+* directly (ARM ARM B3.12.7 Prioritization of aborts).
+*/
+
+   mask = (~0U) >> (word_offset * 8);
+   if (reg)
+   regval = *reg;
+   else {
+   BUG_ON(mode != (ACCESS_READ_RAZ | ACCESS_WRITE_IGNORED));
+   regval = 0;
+   }
+
+   if (mmio->is_write) {
+   u32 data = (*((u32 *)mmio->data) & mask) << shift;
+   switch (ACCESS_WRITE_MASK(mode)) {
+   case ACCESS_WRITE_IGNORED:
+   return;
+
+   case ACCESS_WRITE_SETBIT:
+   regval |= data;
+   break;
+
+   case ACCESS_WRITE_CLEARBIT:
+   regval &= ~data;
+   break;
+
+   case ACCESS_WRITE_VALUE:
+   regval = (regval & ~(mask << shift)) | data;
+   break;
+   }
+   *reg = regval;
+   } else {
+   switch (ACCESS_READ_MASK(mode)) {
+   case ACCESS_READ_RAZ:
+   regval = 0;
+   /* fall through */
+
+   case ACCESS_READ_VALUE:
+   *((u32 *)mmio->data) = (regval >> shift) & mask;
+   }
+   }
+}
+
+/* All this should be handled by kvm_bus_io_*()... FIXME!!! */
+struct mmio_range {
+   unsigned long base;
+   unsigned long len;
+   bool 

[PATCH v3 05/13] ARM: KVM: VGIC accept vcpu and dist base addresses from user space

2012-10-22 Thread Christoffer Dall
User space defines the model to emulate to a guest and should therefore
decide which addresses are used, both for the virtual CPU interface that
is directly mapped in the guest physical address space and for the emulated
distributor interface, which is mapped in software by the in-kernel VGIC
support.

Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_mmu.h  |2 +
 arch/arm/include/asm/kvm_vgic.h |9 ++
 arch/arm/kvm/arm.c  |   16 ++
 arch/arm/kvm/vgic.c |   61 +++
 4 files changed, 87 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_mmu.h b/arch/arm/include/asm/kvm_mmu.h
index 9bd0508..0800531 100644
--- a/arch/arm/include/asm/kvm_mmu.h
+++ b/arch/arm/include/asm/kvm_mmu.h
@@ -26,6 +26,8 @@
  * To save a bit of memory and to avoid alignment issues we assume 39-bit IPA
  * for now, but remember that the level-1 table must be aligned to its size.
  */
+#define KVM_PHYS_SHIFT (38)
+#define KVM_PHYS_MASK  ((1ULL << KVM_PHYS_SHIFT) - 1)
 #define PTRS_PER_PGD2  512
 #define PGD2_ORDER get_order(PTRS_PER_PGD2 * sizeof(pgd_t))
 
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index b444ecf..9ca8d21 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -20,6 +20,9 @@
 #define __ASM_ARM_KVM_VGIC_H
 
 struct vgic_dist {
+   /* Distributor and vcpu interface mapping in the guest */
+   phys_addr_t vgic_dist_base;
+   phys_addr_t vgic_cpu_base;
 };
 
 struct vgic_cpu {
@@ -31,6 +34,7 @@ struct kvm_run;
 struct kvm_exit_mmio;
 
 #ifdef CONFIG_KVM_ARM_VGIC
+int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
 bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  struct kvm_exit_mmio *mmio);
 
@@ -40,6 +44,11 @@ static inline int kvm_vgic_hyp_init(void)
return 0;
 }
 
+static inline int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 
addr)
+{
+   return 0;
+}
+
 static inline int kvm_vgic_init(struct kvm *kvm)
 {
return 0;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index a57b107..f92b4ec 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -61,6 +61,8 @@ static atomic64_t kvm_vmid_gen = ATOMIC64_INIT(1);
 static u8 kvm_next_vmid;
 static DEFINE_SPINLOCK(kvm_vmid_lock);
 
+static bool vgic_present;
+
 static void kvm_arm_set_running_vcpu(struct kvm_vcpu *vcpu)
 {
BUG_ON(preemptible());
@@ -824,7 +826,19 @@ int kvm_vm_ioctl_get_dirty_log(struct kvm *kvm, struct 
kvm_dirty_log *log)
 static int kvm_vm_ioctl_set_device_address(struct kvm *kvm,
   struct kvm_device_address *dev_addr)
 {
-   return -ENODEV;
+   unsigned long dev_id, type;
+
+   dev_id = (dev_addr->id & KVM_DEVICE_ID_MASK) >> KVM_DEVICE_ID_SHIFT;
+   type = (dev_addr->id & KVM_DEVICE_TYPE_MASK) >> KVM_DEVICE_TYPE_SHIFT;
+
+   switch (dev_id) {
+   case KVM_ARM_DEVICE_VGIC_V2:
+   if (!vgic_present)
+   return -ENXIO;
+   return kvm_vgic_set_addr(kvm, type, dev_addr->addr);
+   default:
+   return -ENODEV;
+   }
 }
 
 long kvm_arch_vm_ioctl(struct file *filp,
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 26ada3b..f85b275 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -22,6 +22,13 @@
 #include linux/io.h
 #include asm/kvm_emulate.h
 
+#define VGIC_ADDR_UNDEF        (-1)
+#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == (typeof(_x))VGIC_ADDR_UNDEF)
+
+#define VGIC_DIST_SIZE 0x1000
+#define VGIC_CPU_SIZE  0x2000
+
+
 #define ACCESS_READ_VALUE  (1  0)
 #define ACCESS_READ_RAZ(0  0)
 #define ACCESS_READ_MASK(x)((x)  (1  0))
@@ -136,3 +143,57 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct 
kvm_run *run, struct kvm_exi
 {
return KVM_EXIT_MMIO;
 }
+
+static bool vgic_ioaddr_overlap(struct kvm *kvm)
+{
+   phys_addr_t dist = kvm->arch.vgic.vgic_dist_base;
+   phys_addr_t cpu = kvm->arch.vgic.vgic_cpu_base;
+
+   if (IS_VGIC_ADDR_UNDEF(dist) || IS_VGIC_ADDR_UNDEF(cpu))
+   return false;
+   if ((dist <= cpu && dist + VGIC_DIST_SIZE > cpu) ||
+   (cpu <= dist && cpu + VGIC_CPU_SIZE > dist))
+   return true;
+   return false;
+}
+
+int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr)
+{
+   int r = 0;
+   struct vgic_dist *vgic = &kvm->arch.vgic;
+
+   if (addr & ~KVM_PHYS_MASK)
+   return -E2BIG;
+
+   if (addr & ~PAGE_MASK)
+   return -EINVAL;
+
+   mutex_lock(&kvm->lock);
+   switch (type) {
+   case KVM_VGIC_V2_ADDR_TYPE_DIST:
+   if (!IS_VGIC_ADDR_UNDEF(vgic->vgic_dist_base))
+   return -EEXIST;
+   if (addr + VGIC_DIST_SIZE < addr)
+   return -EINVAL;
+   

[PATCH v3 06/13] ARM: KVM: VGIC distributor handling

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Add the GIC distributor emulation code. A number of the GIC features
are simply ignored as they are not required to boot a Linux guest.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_vgic.h |  167 ++
 arch/arm/kvm/vgic.c |  471 +++
 2 files changed, 637 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 9ca8d21..9e60b1d 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -19,10 +19,177 @@
 #ifndef __ASM_ARM_KVM_VGIC_H
 #define __ASM_ARM_KVM_VGIC_H
 
+#include <linux/kernel.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/irqreturn.h>
+#include <linux/spinlock.h>
+#include <linux/types.h>
+
+#define VGIC_NR_IRQS   128
+#define VGIC_NR_SHARED_IRQS(VGIC_NR_IRQS - 32)
+#define VGIC_MAX_CPUS  NR_CPUS
+
+/* Sanity checks... */
+#if (VGIC_MAX_CPUS > 8)
+#error Invalid number of CPU interfaces
+#endif
+
+#if (VGIC_NR_IRQS & 31)
+#error VGIC_NR_IRQS must be a multiple of 32
+#endif
+
+#if (VGIC_NR_IRQS > 1024)
+#error VGIC_NR_IRQS must be <= 1024
+#endif
+
+/*
+ * The GIC distributor registers describing interrupts have two parts:
+ * - 32 per-CPU interrupts (SGI + PPI)
+ * - a bunch of shared interrupts (SPI)
+ */
+struct vgic_bitmap {
+   union {
+   u32 reg[1];
+   unsigned long reg_ul[0];
+   } percpu[VGIC_MAX_CPUS];
+   union {
+   u32 reg[VGIC_NR_SHARED_IRQS / 32];
+   unsigned long reg_ul[0];
+   } shared;
+};
+
+static inline u32 *vgic_bitmap_get_reg(struct vgic_bitmap *x,
+  int cpuid, u32 offset)
+{
+   offset >>= 2;
+   BUG_ON(offset > (VGIC_NR_IRQS / 32));
+   if (!offset)
+   return x->percpu[cpuid].reg;
+   else
+   return x->shared.reg + offset - 1;
+}
+
+static inline int vgic_bitmap_get_irq_val(struct vgic_bitmap *x,
+int cpuid, int irq)
+{
+   if (irq < 32)
+   return test_bit(irq, x->percpu[cpuid].reg_ul);
+
+   return test_bit(irq - 32, x->shared.reg_ul);
+}
+
+static inline void vgic_bitmap_set_irq_val(struct vgic_bitmap *x,
+  int cpuid, int irq, int val)
+{
+   unsigned long *reg;
+
+   if (irq < 32)
+   reg = x->percpu[cpuid].reg_ul;
+   else {
+   reg = x->shared.reg_ul;
+   irq -= 32;
+   }
+
+   if (val)
+   set_bit(irq, reg);
+   else
+   clear_bit(irq, reg);
+}
+
+static inline unsigned long *vgic_bitmap_get_cpu_map(struct vgic_bitmap *x,
+int cpuid)
+{
+   if (unlikely(cpuid >= VGIC_MAX_CPUS))
+   return NULL;
+   return x->percpu[cpuid].reg_ul;
+}
+
+static inline unsigned long *vgic_bitmap_get_shared_map(struct vgic_bitmap *x)
+{
+   return x->shared.reg_ul;
+}
+
+struct vgic_bytemap {
+   union {
+   u32 reg[8];
+   unsigned long reg_ul[0];
+   } percpu[VGIC_MAX_CPUS];
+   union {
+   u32 reg[VGIC_NR_SHARED_IRQS  / 4];
+   unsigned long reg_ul[0];
+   } shared;
+};
+
+static inline u32 *vgic_bytemap_get_reg(struct vgic_bytemap *x,
+   int cpuid, u32 offset)
+{
+   offset >>= 2;
+   BUG_ON(offset > (VGIC_NR_IRQS / 4));
+   if (offset < 8)
+   return x->percpu[cpuid].reg + offset;
+   else
+   return x->shared.reg + offset - 8;
+}
+
+static inline int vgic_bytemap_get_irq_val(struct vgic_bytemap *x,
+  int cpuid, int irq)
+{
+   u32 *reg, shift;
+   shift = (irq & 3) * 8;
+   reg = vgic_bytemap_get_reg(x, cpuid, irq);
+   return (*reg >> shift) & 0xff;
+}
+
+static inline void vgic_bytemap_set_irq_val(struct vgic_bytemap *x,
+   int cpuid, int irq, int val)
+{
+   u32 *reg, shift;
+   shift = (irq & 3) * 8;
+   reg = vgic_bytemap_get_reg(x, cpuid, irq);
+   *reg &= ~(0xff << shift);
+   *reg |= (val & 0xff) << shift;
+}
+
 struct vgic_dist {
+#ifdef CONFIG_KVM_ARM_VGIC
+   spinlock_t  lock;
+
+   /* Virtual control interface mapping */
+   void __iomem*vctrl_base;
+
/* Distributor and vcpu interface mapping in the guest */
phys_addr_t vgic_dist_base;
phys_addr_t vgic_cpu_base;
+
+   /* Distributor enabled */
+   u32 enabled;
+
+   /* Interrupt enabled (one bit per IRQ) */
+   struct vgic_bitmap  irq_enabled;
+
+   /* Interrupt 'pin' level */
+   struct vgic_bitmap  irq_state;
+
+   /* Level-triggered interrupt in 

[PATCH v3 07/13] ARM: KVM: VGIC virtual CPU interface management

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Add VGIC virtual CPU interface code, picking pending interrupts
from the distributor and stashing them in the VGIC control interface
list registers.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_vgic.h |   41 +++
 arch/arm/kvm/vgic.c |  226 +++
 2 files changed, 266 insertions(+), 1 deletion(-)

diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 9e60b1d..7229324 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -193,8 +193,45 @@ struct vgic_dist {
 };
 
 struct vgic_cpu {
+#ifdef CONFIG_KVM_ARM_VGIC
+   /* per IRQ to LR mapping */
+   u8  vgic_irq_lr_map[VGIC_NR_IRQS];
+
+   /* Pending interrupts on this VCPU */
+   DECLARE_BITMAP( pending, VGIC_NR_IRQS);
+
+   /* Bitmap of used/free list registers */
+   DECLARE_BITMAP( lr_used, 64);
+
+   /* Number of list registers on this CPU */
+   int nr_lr;
+
+   /* CPU vif control registers for world switch */
+   u32 vgic_hcr;
+   u32 vgic_vmcr;
+   u32 vgic_misr;  /* Saved only */
+   u32 vgic_eisr[2];   /* Saved only */
+   u32 vgic_elrsr[2];  /* Saved only */
+   u32 vgic_apr;
+   u32 vgic_lr[64];/* A15 has only 4... */
+#endif
 };
 
+#define VGIC_HCR_EN            (1 << 0)
+#define VGIC_HCR_UIE           (1 << 1)
+
+#define VGIC_LR_VIRTUALID      (0x3ff << 0)
+#define VGIC_LR_PHYSID_CPUID   (7 << 10)
+#define VGIC_LR_STATE          (3 << 28)
+#define VGIC_LR_PENDING_BIT    (1 << 28)
+#define VGIC_LR_ACTIVE_BIT     (1 << 29)
+#define VGIC_LR_EOI            (1 << 19)
+
+#define VGIC_MISR_EOI          (1 << 0)
+#define VGIC_MISR_U            (1 << 1)
+
+#define LR_EMPTY   0xff
+
 struct kvm;
 struct kvm_vcpu;
 struct kvm_run;
@@ -202,9 +239,13 @@ struct kvm_exit_mmio;
 
 #ifdef CONFIG_KVM_ARM_VGIC
 int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
+void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
+void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
+int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  struct kvm_exit_mmio *mmio);
 
+#define irqchip_in_kernel(k)   (!!((k)->arch.vgic.vctrl_base))
 #else
 static inline int kvm_vgic_hyp_init(void)
 {
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 82feee8..d7cdec5 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -587,7 +587,25 @@ static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 
reg)
 
 static int compute_pending_for_cpu(struct kvm_vcpu *vcpu)
 {
-   return 0;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   unsigned long *pending, *enabled, *pend;
+   int vcpu_id;
+
+   vcpu_id = vcpu->vcpu_id;
+   pend = vcpu->arch.vgic_cpu.pending;
+
+   pending = vgic_bitmap_get_cpu_map(&dist->irq_state, vcpu_id);
+   enabled = vgic_bitmap_get_cpu_map(&dist->irq_enabled, vcpu_id);
+   bitmap_and(pend, pending, enabled, 32);
+
+   pending = vgic_bitmap_get_shared_map(&dist->irq_state);
+   enabled = vgic_bitmap_get_shared_map(&dist->irq_enabled);
+   bitmap_and(pend + 1, pending, enabled, VGIC_NR_SHARED_IRQS);
+   bitmap_and(pend + 1, pend + 1,
+  vgic_bitmap_get_shared_map(&dist->irq_spi_target[vcpu_id]),
+  VGIC_NR_SHARED_IRQS);
+
+   return (find_first_bit(pend, VGIC_NR_IRQS) < VGIC_NR_IRQS);
 }
 
 /*
@@ -613,6 +631,212 @@ static void vgic_update_state(struct kvm *kvm)
}
 }
 
+#define LR_PHYSID(lr)  (((lr) & VGIC_LR_PHYSID_CPUID) >> 10)
+#define MK_LR_PEND(src, irq)   (VGIC_LR_PENDING_BIT | ((src) << 10) | (irq))
+/*
+ * Queue an interrupt to a CPU virtual interface. Return true on success,
+ * or false if it wasn't possible to queue it.
+ */
+static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
+{
+   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   int lr, is_level;
+
+   /* Sanitize the input... */
+   BUG_ON(sgi_source_id & ~7);
+   BUG_ON(sgi_source_id && irq > 15);
+   BUG_ON(irq >= VGIC_NR_IRQS);
+
+   kvm_debug(Queue IRQ%d\n, irq);
+
+   lr = vgic_cpu-vgic_irq_lr_map[irq];
+   is_level = !vgic_irq_is_edge(dist, irq);
+
+   /* Do we have an active interrupt for the same CPUID? */
+   if (lr != LR_EMPTY 
+   (LR_PHYSID(vgic_cpu-vgic_lr[lr]) == sgi_source_id)) {
+   kvm_debug(LR%d piggyback for IRQ%d %x\n, lr, irq, 
vgic_cpu-vgic_lr[lr]);
+   BUG_ON(!test_bit(lr, vgic_cpu-lr_used));
+   vgic_cpu-vgic_lr[lr] |= VGIC_LR_PENDING_BIT;
+   if (is_level)
+   vgic_cpu-vgic_lr[lr] |= 

[PATCH v3 08/13] ARM: KVM: vgic: retire queued, disabled interrupts

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

An interrupt may have been disabled after being made pending on the
CPU interface (the classic case is a timer running while we're
rebooting the guest - the interrupt would kick as soon as the CPU
interface gets enabled, with deadly consequences).

The solution is to examine already active LRs, and check the
interrupt is still enabled. If not, just retire it.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/kvm/vgic.c |   30 ++
 1 file changed, 30 insertions(+)

diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index d7cdec5..dda5623 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -633,6 +633,34 @@ static void vgic_update_state(struct kvm *kvm)
 
 #define LR_PHYSID(lr)  (((lr) & VGIC_LR_PHYSID_CPUID) >> 10)
 #define MK_LR_PEND(src, irq)   (VGIC_LR_PENDING_BIT | ((src) << 10) | (irq))
+
+/*
+ * An interrupt may have been disabled after being made pending on the
+ * CPU interface (the classic case is a timer running while we're
+ * rebooting the guest - the interrupt would kick as soon as the CPU
+ * interface gets enabled, with deadly consequences).
+ *
+ * The solution is to examine already active LRs, and check the
+ * interrupt is still enabled. If not, just retire it.
+ */
+static void vgic_retire_disabled_irqs(struct kvm_vcpu *vcpu)
+{
+   struct vgic_cpu *vgic_cpu = &vcpu->arch.vgic_cpu;
+   struct vgic_dist *dist = &vcpu->kvm->arch.vgic;
+   int lr;
+
+   for_each_set_bit(lr, vgic_cpu->lr_used, vgic_cpu->nr_lr) {
+   int irq = vgic_cpu->vgic_lr[lr] & VGIC_LR_VIRTUALID;
+
+   if (!vgic_bitmap_get_irq_val(&dist->irq_enabled,
+vcpu->vcpu_id, irq)) {
+   vgic_cpu->vgic_irq_lr_map[irq] = LR_EMPTY;
+   clear_bit(lr, vgic_cpu->lr_used);
+   vgic_cpu->vgic_lr[lr] &= ~VGIC_LR_STATE;
+   }
+   }
+}
+
 /*
  * Queue an interrupt to a CPU virtual interface. Return true on success,
  * or false if it wasn't possible to queue it.
@@ -696,6 +724,8 @@ static void __kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu)
 
 	vcpu_id = vcpu->vcpu_id;
 
+   vgic_retire_disabled_irqs(vcpu);
+
/*
 * We may not have any pending interrupt, or the interrupts
 * may have been serviced from another vcpu. In all cases,

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 09/13] ARM: KVM: VGIC interrupt injection

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Plug the interrupt injection code. Interrupts can now be generated
from user space.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_vgic.h |8 +++
 arch/arm/kvm/arm.c  |   29 +
 arch/arm/kvm/vgic.c |   90 +++
 3 files changed, 127 insertions(+)

diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 7229324..6e3d303 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -241,6 +241,8 @@ struct kvm_exit_mmio;
 int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
 void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
+int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
+   bool level);
 int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu);
 bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
  struct kvm_exit_mmio *mmio);
@@ -271,6 +273,12 @@ static inline void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu) {}
 static inline void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu) {}
 static inline void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu) {}
 
+static inline int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid,
+  unsigned int irq_num, bool level)
+{
+   return 0;
+}
+
 static inline int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
 {
return 0;
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index f92b4ec..877e285 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -763,10 +763,31 @@ int kvm_vm_ioctl_irq_line(struct kvm *kvm, struct kvm_irq_level *irq_level)
 
switch (irq_type) {
case KVM_ARM_IRQ_TYPE_CPU:
+   if (irqchip_in_kernel(kvm))
+   return -ENXIO;
+
 	if (irq_num > KVM_ARM_IRQ_CPU_FIQ)
return -EINVAL;
 
return vcpu_interrupt_line(vcpu, irq_num, level);
+#ifdef CONFIG_KVM_ARM_VGIC
+   case KVM_ARM_IRQ_TYPE_PPI:
+   if (!irqchip_in_kernel(kvm))
+   return -ENXIO;
+
+   if (irq_num < 16 || irq_num > 31)
+   return -EINVAL;
+
+   return kvm_vgic_inject_irq(kvm, vcpu->vcpu_id, irq_num, level);
+   case KVM_ARM_IRQ_TYPE_SPI:
+   if (!irqchip_in_kernel(kvm))
+   return -ENXIO;
+
+   if (irq_num < 32 || irq_num > KVM_ARM_IRQ_GIC_MAX)
+   return -EINVAL;
+
+   return kvm_vgic_inject_irq(kvm, 0, irq_num, level);
+#endif
}
 
return -EINVAL;
@@ -848,6 +869,14 @@ long kvm_arch_vm_ioctl(struct file *filp,
void __user *argp = (void __user *)arg;
 
switch (ioctl) {
+#ifdef CONFIG_KVM_ARM_VGIC
+   case KVM_CREATE_IRQCHIP: {
+   if (vgic_present)
+   return kvm_vgic_create(kvm);
+   else
+   return -EINVAL;
+   }
+#endif
case KVM_SET_DEVICE_ADDRESS: {
struct kvm_device_address dev_addr;
 
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index dda5623..70040bb 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -75,6 +75,7 @@
 #define ACCESS_WRITE_MASK(x)   ((x) & (3 << 1))
 
 static void vgic_update_state(struct kvm *kvm);
+static void vgic_kick_vcpus(struct kvm *kvm);
 static void vgic_dispatch_sgi(struct kvm_vcpu *vcpu, u32 reg);
 
 static inline int vgic_irq_is_edge(struct vgic_dist *dist, int irq)
@@ -542,6 +543,9 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exi
kvm_prepare_mmio(run, mmio);
kvm_handle_mmio_return(vcpu, run);
 
+   if (updated_state)
+   vgic_kick_vcpus(vcpu->kvm);
+
return true;
 }
 
@@ -867,6 +871,92 @@ int kvm_vgic_vcpu_pending_irq(struct kvm_vcpu *vcpu)
 	return test_bit(vcpu->vcpu_id, dist->irq_pending_on_cpu);
 }
 
+static void vgic_kick_vcpus(struct kvm *kvm)
+{
+   struct kvm_vcpu *vcpu;
+   int c;
+
+   /*
+* We've injected an interrupt, time to find out who deserves
+* a good kick...
+*/
+   kvm_for_each_vcpu(c, vcpu, kvm) {
+   if (kvm_vgic_vcpu_pending_irq(vcpu))
+   kvm_vcpu_kick(vcpu);
+   }
+}
+
+static bool vgic_update_irq_state(struct kvm *kvm, int cpuid,
+ unsigned int irq_num, bool level)
+{
+   struct vgic_dist *dist = &kvm->arch.vgic;
+   struct kvm_vcpu *vcpu;
+   int is_edge, is_level, state;
+   int enabled;
+   bool ret = true;
+
+   spin_lock(&dist->lock);
+
+   is_edge = vgic_irq_is_edge(dist, irq_num);
+   is_level = !is_edge;
+   state = vgic_bitmap_get_irq_val(&dist->irq_state, cpuid, irq_num);
+
+

[PATCH v3 10/13] ARM: KVM: VGIC control interface world switch

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Enable the VGIC control interface to be save-restored on world switch.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arm.h |   12 +++
 arch/arm/kernel/asm-offsets.c  |   12 +++
 arch/arm/kvm/interrupts_head.S |   68 
 3 files changed, 92 insertions(+)

diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
index 4f1bb01..e1e39d6 100644
--- a/arch/arm/include/asm/kvm_arm.h
+++ b/arch/arm/include/asm/kvm_arm.h
@@ -188,4 +188,16 @@
 #define HSR_EC_DABT	(0x24)
 #define HSR_EC_DABT_HYP	(0x25)
 
+/* GICH offsets */
+#define GICH_HCR   0x0
+#define GICH_VTR   0x4
+#define GICH_VMCR  0x8
+#define GICH_MISR  0x10
+#define GICH_EISR0 0x20
+#define GICH_EISR1 0x24
+#define GICH_ELRSR0	0x30
+#define GICH_ELRSR1	0x34
+#define GICH_APR   0xf0
+#define GICH_LR0   0x100
+
 #endif /* __ARM_KVM_ARM_H__ */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index cf97d92..fba332b 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -167,6 +167,18 @@ int main(void)
   DEFINE(VCPU_HxFAR,   offsetof(struct kvm_vcpu, arch.hxfar));
   DEFINE(VCPU_HPFAR,   offsetof(struct kvm_vcpu, arch.hpfar));
   DEFINE(VCPU_HYP_PC,  offsetof(struct kvm_vcpu, arch.hyp_pc));
+#ifdef CONFIG_KVM_ARM_VGIC
+  DEFINE(VCPU_VGIC_CPU,	offsetof(struct kvm_vcpu, arch.vgic_cpu));
+  DEFINE(VGIC_CPU_HCR, offsetof(struct vgic_cpu, vgic_hcr));
+  DEFINE(VGIC_CPU_VMCR,offsetof(struct vgic_cpu, vgic_vmcr));
+  DEFINE(VGIC_CPU_MISR,offsetof(struct vgic_cpu, vgic_misr));
+  DEFINE(VGIC_CPU_EISR,offsetof(struct vgic_cpu, vgic_eisr));
+  DEFINE(VGIC_CPU_ELRSR,   offsetof(struct vgic_cpu, vgic_elrsr));
+  DEFINE(VGIC_CPU_APR, offsetof(struct vgic_cpu, vgic_apr));
+  DEFINE(VGIC_CPU_LR,  offsetof(struct vgic_cpu, vgic_lr));
+  DEFINE(VGIC_CPU_NR_LR,   offsetof(struct vgic_cpu, nr_lr));
+  DEFINE(KVM_VGIC_VCTRL,   offsetof(struct kvm, arch.vgic.vctrl_base));
+#endif
   DEFINE(KVM_VTTBR,offsetof(struct kvm, arch.vttbr));
 #endif
   return 0; 
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 2ac8b4a..c2423d8 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -341,6 +341,45 @@
  * @vcpup: Register pointing to VCPU struct
  */
 .macro save_vgic_state vcpup
+#ifdef CONFIG_KVM_ARM_VGIC
+   /* Get VGIC VCTRL base into r2 */
+   ldr r2, [\vcpup, #VCPU_KVM]
+   ldr r2, [r2, #KVM_VGIC_VCTRL]
+   cmp r2, #0
+   beq 2f
+
+   /* Compute the address of struct vgic_cpu */
+   add r11, \vcpup, #VCPU_VGIC_CPU
+
+   /* Save all interesting registers */
+   ldr r3, [r2, #GICH_HCR]
+   ldr r4, [r2, #GICH_VMCR]
+   ldr r5, [r2, #GICH_MISR]
+   ldr r6, [r2, #GICH_EISR0]
+   ldr r7, [r2, #GICH_EISR1]
+   ldr r8, [r2, #GICH_ELRSR0]
+   ldr r9, [r2, #GICH_ELRSR1]
+   ldr r10, [r2, #GICH_APR]
+
+   str r3, [r11, #VGIC_CPU_HCR]
+   str r4, [r11, #VGIC_CPU_VMCR]
+   str r5, [r11, #VGIC_CPU_MISR]
+   str r6, [r11, #VGIC_CPU_EISR]
+   str r7, [r11, #(VGIC_CPU_EISR + 4)]
+   str r8, [r11, #VGIC_CPU_ELRSR]
+   str r9, [r11, #(VGIC_CPU_ELRSR + 4)]
+   str r10, [r11, #VGIC_CPU_APR]
+
+   /* Save list registers */
+   add r2, r2, #GICH_LR0
+   add r3, r11, #VGIC_CPU_LR
+   ldr r4, [r11, #VGIC_CPU_NR_LR]
+1: ldr r6, [r2], #4
+   str r6, [r3], #4
+	subs	r4, r4, #1
+   bne 1b
+2:
+#endif
 .endm
 
 /*
@@ -348,6 +387,35 @@
  * @vcpup: Register pointing to VCPU struct
  */
 .macro restore_vgic_state  vcpup
+#ifdef CONFIG_KVM_ARM_VGIC
+   /* Get VGIC VCTRL base into r2 */
+   ldr r2, [\vcpup, #VCPU_KVM]
+   ldr r2, [r2, #KVM_VGIC_VCTRL]
+   cmp r2, #0
+   beq 2f
+
+   /* Compute the address of struct vgic_cpu */
+   add r11, \vcpup, #VCPU_VGIC_CPU
+
+   /* We only restore a minimal set of registers */
+   ldr r3, [r11, #VGIC_CPU_HCR]
+   ldr r4, [r11, #VGIC_CPU_VMCR]
+   ldr r8, [r11, #VGIC_CPU_APR]
+
+   str r3, [r2, #GICH_HCR]
+   str r4, [r2, #GICH_VMCR]
+   str r8, [r2, #GICH_APR]
+
+   /* Restore list registers */
+   add r2, r2, #GICH_LR0
+   add r3, r11, #VGIC_CPU_LR
+   ldr r4, [r11, #VGIC_CPU_NR_LR]
+1: ldr r6, [r3], #4
+   str r6, [r2], #4
+	subs	r4, r4, #1
+   bne 1b
+2:
+#endif
 .endm
 
 /* Configures the HSTR (Hyp System Trap Register) on entry/return


[PATCH v3 11/13] ARM: KVM: VGIC initialisation code

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Add the init code for the hypervisor, the virtual machine, and
the virtual CPUs.

An interrupt handler is also wired up to handle the VGIC maintenance
interrupts, which are used to deal with level-triggered interrupts
and LR underflow.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_vgic.h |   11 ++
 arch/arm/kvm/arm.c  |   14 ++
 arch/arm/kvm/vgic.c |  237 +++
 3 files changed, 258 insertions(+), 4 deletions(-)

diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 6e3d303..1287f75 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -154,6 +154,7 @@ static inline void vgic_bytemap_set_irq_val(struct vgic_bytemap *x,
 struct vgic_dist {
 #ifdef CONFIG_KVM_ARM_VGIC
spinlock_t  lock;
+   bool		ready;
 
/* Virtual control interface mapping */
void __iomem*vctrl_base;
@@ -239,6 +240,10 @@ struct kvm_exit_mmio;
 
 #ifdef CONFIG_KVM_ARM_VGIC
 int kvm_vgic_set_addr(struct kvm *kvm, unsigned long type, u64 addr);
+int kvm_vgic_hyp_init(void);
+int kvm_vgic_init(struct kvm *kvm);
+int kvm_vgic_create(struct kvm *kvm);
+void kvm_vgic_vcpu_init(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_to_cpu(struct kvm_vcpu *vcpu);
 void kvm_vgic_sync_from_cpu(struct kvm_vcpu *vcpu);
 int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,
@@ -248,6 +253,7 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run 
*run,
  struct kvm_exit_mmio *mmio);
 
 #define irqchip_in_kernel(k)   (!!((k)->arch.vgic.vctrl_base))
+#define vgic_initialized(k)	((k)->arch.vgic.ready)
 #else
 static inline int kvm_vgic_hyp_init(void)
 {
@@ -294,6 +300,11 @@ static inline int irqchip_in_kernel(struct kvm *kvm)
 {
return 0;
 }
+
+static inline bool kvm_vgic_initialized(struct kvm *kvm)
+{
+   return true;
+}
 #endif
 
 #endif
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 877e285..d367831 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -187,6 +187,8 @@ int kvm_dev_ioctl_check_extension(long ext)
switch (ext) {
 #ifdef CONFIG_KVM_ARM_VGIC
case KVM_CAP_IRQCHIP:
+   r = vgic_present;
+   break;
 #endif
case KVM_CAP_USER_MEMORY:
case KVM_CAP_DESTROY_MEMORY_REGION_WORKS:
@@ -622,6 +624,14 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
if (unlikely(vcpu-arch.target  0))
return -ENOEXEC;
 
+   /* Initialize the VGIC before running a vcpu the first time on this VM */
+   if (unlikely(irqchip_in_kernel(vcpu->kvm) &&
+!vgic_initialized(vcpu->kvm))) {
+   ret = kvm_vgic_init(vcpu->kvm);
+   if (ret)
+   return ret;
+   }
+
 	if (run->exit_reason == KVM_EXIT_MMIO) {
 		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
if (ret)
@@ -1023,8 +1033,8 @@ static int init_hyp_mode(void)
 * Init HYP view of VGIC
 */
err = kvm_vgic_hyp_init();
-   if (err)
-   goto out_free_mappings;
+   if (!err)
+   vgic_present = true;
 
return 0;
 out_free_vfp:
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 70040bb..415ddb8 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -20,7 +20,14 @@
 #include <linux/kvm_host.h>
 #include <linux/interrupt.h>
 #include <linux/io.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_irq.h>
+
 #include <asm/kvm_emulate.h>
+#include <asm/hardware/gic.h>
+#include <asm/kvm_arm.h>
+#include <asm/kvm_mmu.h>
 
 /*
  * How the whole thing works (courtesy of Christoffer Dall):
@@ -59,11 +66,18 @@
  */
 
 #define VGIC_ADDR_UNDEF		(-1)
-#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == (typeof(_x))VGIC_ADDR_UNDEF)
+#define IS_VGIC_ADDR_UNDEF(_x)  ((_x) == VGIC_ADDR_UNDEF)
 
 #define VGIC_DIST_SIZE 0x1000
 #define VGIC_CPU_SIZE  0x2000
 
+/* Physical address of vgic virtual cpu interface */
+static phys_addr_t vgic_vcpu_base;
+
+/* Virtual control interface base address */
+static void __iomem *vgic_vctrl_base;
+
+static struct device_node *vgic_node;
 
 #define ACCESS_READ_VALUE	(1 << 0)
 #define ACCESS_READ_RAZ		(0 << 0)
@@ -527,7 +541,7 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run, struct kvm_exi
 
 	if (!irqchip_in_kernel(vcpu->kvm) ||
 	    mmio->phys_addr < base ||
-	    (mmio->phys_addr + mmio->len) > (base + dist->vgic_dist_size))
+	    (mmio->phys_addr + mmio->len) > (base + VGIC_DIST_SIZE))
return false;
 
range = find_matching_range(vgic_ranges, mmio, base);
@@ -957,6 +971,225 @@ int kvm_vgic_inject_irq(struct kvm *kvm, int cpuid, unsigned int irq_num,

[PATCH v3 12/13] ARM: KVM: vgic: reduce the number of vcpu kick

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

If we have level interrupts already programmed to fire on a vcpu,
there is no reason to kick it after injecting a new interrupt,
as we're guaranteed to exit when the level interrupt is EOIed
(VGIC_LR_EOI is set).

The exit will force a reload of the VGIC, injecting the new interrupts.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_vgic.h |   10 ++
 arch/arm/kvm/arm.c  |   10 +-
 arch/arm/kvm/vgic.c |   10 --
 3 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 1287f75..447ec7a 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -215,6 +215,9 @@ struct vgic_cpu {
u32 vgic_elrsr[2];  /* Saved only */
u32 vgic_apr;
 	u32		vgic_lr[64];	/* A15 has only 4... */
+
+   /* Number of level-triggered interrupts in progress */
+   atomic_t	irq_active_count;
 #endif
 };
 
@@ -254,6 +257,8 @@ bool vgic_handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *run,
 
 #define irqchip_in_kernel(k)   (!!((k)->arch.vgic.vctrl_base))
 #define vgic_initialized(k)((k)->arch.vgic.ready)
+#define vgic_active_irq(v)	(atomic_read(&(v)->arch.vgic_cpu.irq_active_count) == 0)
+
+
 #else
 static inline int kvm_vgic_hyp_init(void)
 {
@@ -305,6 +310,11 @@ static inline bool kvm_vgic_initialized(struct kvm *kvm)
 {
return true;
 }
+
+static inline int vgic_active_irq(struct kvm_vcpu *vcpu)
+{
+   return 0;
+}
 #endif
 
 #endif
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index d367831..cab9cb7 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -94,7 +94,15 @@ int kvm_arch_hardware_enable(void *garbage)
 
 int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
 {
-   return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE;
+   if (kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE) {
+   if (vgic_active_irq(vcpu) &&
+   cmpxchg(&vcpu->mode, EXITING_GUEST_MODE, IN_GUEST_MODE) == EXITING_GUEST_MODE)
+   return 0;
+
+   return 1;
+   }
+
+   return 0;
 }
 
 void kvm_arch_hardware_disable(void *garbage)
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 415ddb8..146de1d 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -705,8 +705,10 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 		kvm_debug("LR%d piggyback for IRQ%d %x\n", lr, irq, vgic_cpu->vgic_lr[lr]);
 		BUG_ON(!test_bit(lr, vgic_cpu->lr_used));
 		vgic_cpu->vgic_lr[lr] |= VGIC_LR_PENDING_BIT;
-		if (is_level)
+		if (is_level) {
 			vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI;
+			atomic_inc(&vgic_cpu->irq_active_count);
+		}
return true;
}
 
@@ -718,8 +720,10 @@ static bool vgic_queue_irq(struct kvm_vcpu *vcpu, u8 sgi_source_id, int irq)
 
 	kvm_debug("LR%d allocated for IRQ%d %x\n", lr, irq, sgi_source_id);
 	vgic_cpu->vgic_lr[lr] = MK_LR_PEND(sgi_source_id, irq);
-	if (is_level)
+	if (is_level) {
 		vgic_cpu->vgic_lr[lr] |= VGIC_LR_EOI;
+		atomic_inc(&vgic_cpu->irq_active_count);
+	}
 
 	vgic_cpu->vgic_irq_lr_map[irq] = lr;
 	clear_bit(lr, (unsigned long *)vgic_cpu->vgic_elrsr);
@@ -1011,6 +1015,8 @@ static irqreturn_t vgic_maintenance_handler(int irq, void *data)
 
 			vgic_bitmap_set_irq_val(&dist->irq_active,
 						vcpu->vcpu_id, irq, 0);
+			atomic_dec(&vgic_cpu->irq_active_count);
+			smp_mb();
 			vgic_cpu->vgic_lr[lr] &= ~VGIC_LR_EOI;
 			writel_relaxed(vgic_cpu->vgic_lr[lr],
 				       dist->vctrl_base + GICH_LR0 + (lr << 2));



[PATCH v3 13/13] ARM: KVM: Add VGIC configuration option

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

It is now possible to select the VGIC configuration option.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/kvm/Kconfig |7 +++
 1 file changed, 7 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 47c5500..867551e 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -40,6 +40,13 @@ config KVM_ARM_HOST
---help---
  Provides host support for ARM processors.
 
+config KVM_ARM_VGIC
+	bool "KVM support for Virtual GIC"
+	depends on KVM_ARM_HOST && OF
+	select HAVE_KVM_IRQCHIP
+	---help---
+	  Adds support for a hardware assisted, in-kernel GIC emulation.
+
 source "drivers/virtio/Kconfig"
 
 endif # VIRTUALIZATION



[PATCH v3 0/5] KVM/ARM Architected Timers support

2012-10-22 Thread Christoffer Dall
The following series implements support for the architected generic
timers for KVM/ARM.

This is an unmodified repost of the previously submitted series.

This patch series can also be pulled from:
git://github.com/virtualopensystems/linux-kvm-arm.git
branch: kvm-arm-v13-vgic-timers

---

Marc Zyngier (5):
  ARM: arch_timers: switch to physical timers if HYP mode is available
  ARM: KVM: arch_timers: Add minimal infrastructure
  ARM: KVM: arch_timers: Add guest timer core support
  ARM: KVM: arch_timers: Add timer world switch
  ARM: KVM: arch_timers: Wire the init code and config option


 arch/arm/include/asm/kvm_arch_timer.h |  100 
 arch/arm/include/asm/kvm_host.h   |5 +
 arch/arm/kernel/arch_timer.c  |7 +
 arch/arm/kernel/asm-offsets.c |8 +
 arch/arm/kvm/Kconfig  |7 +
 arch/arm/kvm/Makefile |1 
 arch/arm/kvm/arm.c|   14 ++
 arch/arm/kvm/interrupts.S |2 
 arch/arm/kvm/interrupts_head.S|   60 ++
 arch/arm/kvm/reset.c  |9 +
 arch/arm/kvm/timer.c  |  204 +
 arch/arm/kvm/vgic.c   |1 
 12 files changed, 417 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/include/asm/kvm_arch_timer.h
 create mode 100644 arch/arm/kvm/timer.c



[PATCH v3 1/5] ARM: arch_timers: switch to physical timers if HYP mode is available

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

If we're booted in HYP mode, it is possible that we'll run some
kind of virtualized environment. In this case, it is better to
switch to the physical timers, and leave the virtual timers to
guests.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
---
 arch/arm/kernel/arch_timer.c |7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/arch/arm/kernel/arch_timer.c b/arch/arm/kernel/arch_timer.c
index c8ef207..8adcd04 100644
--- a/arch/arm/kernel/arch_timer.c
+++ b/arch/arm/kernel/arch_timer.c
@@ -26,6 +26,7 @@
 #include <asm/arch_timer.h>
 #include <asm/system_info.h>
 #include <asm/sched_clock.h>
+#include <asm/virt.h>
 
 static unsigned long arch_timer_rate;
 
@@ -489,10 +490,14 @@ int __init arch_timer_of_register(void)
arch_timer_ppi[i] = irq_of_parse_and_map(np, i);
 
/*
+* If HYP mode is available, we know that the physical timer
+* has been configured to be accessible from PL1. Use it, so
+* that a guest can use the virtual timer instead.
+*
 * If no interrupt provided for virtual timer, we'll have to
 * stick to the physical timer. It'd better be accessible...
 */
-   if (!arch_timer_ppi[VIRT_PPI]) {
+   if (is_hyp_mode_available() || !arch_timer_ppi[VIRT_PPI]) {
arch_timer_use_virtual = false;
 
if (!arch_timer_ppi[PHYS_SECURE_PPI] ||



[PATCH v3 2/5] ARM: KVM: arch_timers: Add minimal infrastructure

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Add some very minimal architected timer related infrastructure.
For the moment, we just provide empty structures, and enable/disable
access to the physical timer across world switch.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arch_timer.h |   45 +
 arch/arm/include/asm/kvm_host.h   |5 
 arch/arm/kvm/interrupts.S |2 +
 arch/arm/kvm/interrupts_head.S|   19 ++
 4 files changed, 71 insertions(+)
 create mode 100644 arch/arm/include/asm/kvm_arch_timer.h

diff --git a/arch/arm/include/asm/kvm_arch_timer.h b/arch/arm/include/asm/kvm_arch_timer.h
new file mode 100644
index 000..513b852
--- /dev/null
+++ b/arch/arm/include/asm/kvm_arch_timer.h
@@ -0,0 +1,45 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Marc Zyngier marc.zyng...@arm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#ifndef __ASM_ARM_KVM_ARCH_TIMER_H
+#define __ASM_ARM_KVM_ARCH_TIMER_H
+
+struct arch_timer_kvm {
+};
+
+struct arch_timer_cpu {
+};
+
+#ifndef CONFIG_KVM_ARM_TIMER
+static inline int kvm_timer_hyp_init(void)
+{
+   return 0;
+};
+
+static inline int kvm_timer_init(struct kvm *kvm)
+{
+   return 0;
+}
+
+static inline void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu) {}
+static inline void kvm_timer_sync_to_cpu(struct kvm_vcpu *vcpu) {}
+static inline void kvm_timer_sync_from_cpu(struct kvm_vcpu *vcpu) {}
+static inline void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu) {}
+#endif
+
+#endif
diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 9bbccdf..7127fe7 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -23,6 +23,7 @@
 #include <asm/kvm_asm.h>
 #include <asm/fpstate.h>
 #include <asm/kvm_vgic.h>
+#include <asm/kvm_arch_timer.h>
 
 #define KVM_MAX_VCPUS NR_CPUS
 #define KVM_MEMORY_SLOTS 32
@@ -47,6 +48,9 @@ struct kvm_arch {
/* VTTBR value associated with below pgd and vmid */
 	u64	vttbr;
 
+   /* Timer */
+   struct arch_timer_kvm   timer;
+
/*
 * Anything that is not used directly from assembly code goes
 * here.
@@ -97,6 +101,7 @@ struct kvm_vcpu_arch {
 
/* VGIC state */
struct vgic_cpu vgic_cpu;
+   struct arch_timer_cpu timer_cpu;
 
/*
 * Anything that is not used directly from assembly code goes
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index e418c9b..5a09e89 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -92,6 +92,7 @@ ENTRY(__kvm_vcpu_run)
save_host_regs
 
restore_vgic_state r0
+   restore_timer_state r0
 
@ Store hardware CP15 state and load guest state
read_cp15_state
@@ -186,6 +187,7 @@ after_vfp_restore:
read_cp15_state 1, r1
write_cp15_state
 
+   save_timer_state r1
save_vgic_state r1
 
restore_host_regs
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index c2423d8..0003aab 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -418,6 +418,25 @@
 #endif
 .endm
 
+#define CNTHCTL_PL1PCTEN	(1 << 0)
+#define CNTHCTL_PL1PCEN		(1 << 1)
+
+.macro save_timer_state	vcpup
+   @ Allow physical timer/counter access for the host
+   mrc p15, 4, r2, c14, c1, 0  @ CNTHCTL
+   orr r2, r2, #(CNTHCTL_PL1PCEN | CNTHCTL_PL1PCTEN)
+   mcr p15, 4, r2, c14, c1, 0  @ CNTHCTL
+.endm
+
+.macro restore_timer_state vcpup
+   @ Disallow physical timer access for the guest
+   @ Physical counter access is allowed
+   mrc p15, 4, r2, c14, c1, 0  @ CNTHCTL
+   orr r2, r2, #CNTHCTL_PL1PCTEN
+   bic r2, r2, #CNTHCTL_PL1PCEN
+   mcr p15, 4, r2, c14, c1, 0  @ CNTHCTL
+.endm
+
 /* Configures the HSTR (Hyp System Trap Register) on entry/return
  * (hardware reset value is 0) */
 .macro set_hstr entry



[PATCH v3 4/5] ARM: KVM: arch_timers: Add timer world switch

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

Do the necessary save/restore dance for the timers in the world
switch code. In the process, allow the guest to read the physical
counter, which is useful for its own clock_event_device.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/kernel/asm-offsets.c  |8 
 arch/arm/kvm/arm.c |3 +++
 arch/arm/kvm/interrupts_head.S |   41 
 3 files changed, 52 insertions(+)

diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index fba332b..813d386 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -177,6 +177,14 @@ int main(void)
   DEFINE(VGIC_CPU_APR, offsetof(struct vgic_cpu, vgic_apr));
   DEFINE(VGIC_CPU_LR,  offsetof(struct vgic_cpu, vgic_lr));
   DEFINE(VGIC_CPU_NR_LR,   offsetof(struct vgic_cpu, nr_lr));
+#ifdef CONFIG_KVM_ARM_TIMER
+  DEFINE(VCPU_TIMER_CNTV_CTL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_ctl));
+  DEFINE(VCPU_TIMER_CNTV_CVALH,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_cval32.high));
+  DEFINE(VCPU_TIMER_CNTV_CVALL,	offsetof(struct kvm_vcpu, arch.timer_cpu.cntv_cval32.low));
+  DEFINE(KVM_TIMER_CNTVOFF_H,	offsetof(struct kvm, arch.timer.cntvoff32.high));
+  DEFINE(KVM_TIMER_CNTVOFF_L,	offsetof(struct kvm, arch.timer.cntvoff32.low));
+  DEFINE(KVM_TIMER_ENABLED,	offsetof(struct kvm, arch.timer.enabled));
+#endif
   DEFINE(KVM_VGIC_VCTRL,   offsetof(struct kvm, arch.vgic.vctrl_base));
 #endif
   DEFINE(KVM_VTTBR,offsetof(struct kvm, arch.vttbr));
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index cab9cb7..09b7072 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -660,6 +660,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
update_vttbr(vcpu-kvm);
 
kvm_vgic_sync_to_cpu(vcpu);
+   kvm_timer_sync_to_cpu(vcpu);
 
local_irq_disable();
 
@@ -673,6 +674,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 
 	if (ret <= 0 || need_new_vmid_gen(vcpu->kvm)) {
local_irq_enable();
+   kvm_timer_sync_from_cpu(vcpu);
kvm_vgic_sync_from_cpu(vcpu);
continue;
}
@@ -712,6 +714,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 * Back from guest
 */
 
+   kvm_timer_sync_from_cpu(vcpu);
kvm_vgic_sync_from_cpu(vcpu);
 
ret = handle_exit(vcpu, run, ret);
diff --git a/arch/arm/kvm/interrupts_head.S b/arch/arm/kvm/interrupts_head.S
index 0003aab..ece84d1 100644
--- a/arch/arm/kvm/interrupts_head.S
+++ b/arch/arm/kvm/interrupts_head.S
@@ -422,6 +422,25 @@
 #define CNTHCTL_PL1PCEN		(1 << 1)
 
 .macro save_timer_state	vcpup
+#ifdef CONFIG_KVM_ARM_TIMER
+   ldr r4, [\vcpup, #VCPU_KVM]
+   ldr r2, [r4, #KVM_TIMER_ENABLED]
+   cmp r2, #0
+   beq 1f
+
+   mrc p15, 0, r2, c14, c3, 1  @ CNTV_CTL
+   and r2, #3
+   str r2, [\vcpup, #VCPU_TIMER_CNTV_CTL]
+   bic r2, #1  @ Clear ENABLE
+   mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
+   isb
+
+   mrrc	p15, 3, r2, r3, c14	@ CNTV_CVAL
+   str r3, [\vcpup, #VCPU_TIMER_CNTV_CVALH]
+   str r2, [\vcpup, #VCPU_TIMER_CNTV_CVALL]
+
+1:
+#endif
@ Allow physical timer/counter access for the host
mrc p15, 4, r2, c14, c1, 0  @ CNTHCTL
orr r2, r2, #(CNTHCTL_PL1PCEN | CNTHCTL_PL1PCTEN)
@@ -435,6 +454,28 @@
orr r2, r2, #CNTHCTL_PL1PCTEN
bic r2, r2, #CNTHCTL_PL1PCEN
mcr p15, 4, r2, c14, c1, 0  @ CNTHCTL
+
+#ifdef CONFIG_KVM_ARM_TIMER
+   ldr r4, [\vcpup, #VCPU_KVM]
+   ldr r2, [r4, #KVM_TIMER_ENABLED]
+   cmp r2, #0
+   beq 1f
+
+   ldr r3, [r4, #KVM_TIMER_CNTVOFF_H]
+   ldr r2, [r4, #KVM_TIMER_CNTVOFF_L]
+   mcrr	p15, 4, r2, r3, c14	@ CNTVOFF
+   isb
+
+   ldr r3, [\vcpup, #VCPU_TIMER_CNTV_CVALH]
+   ldr r2, [\vcpup, #VCPU_TIMER_CNTV_CVALL]
+   mcrr	p15, 3, r2, r3, c14	@ CNTV_CVAL
+
+   ldr r2, [\vcpup, #VCPU_TIMER_CNTV_CTL]
+   and r2, #3
+   mcr p15, 0, r2, c14, c3, 1  @ CNTV_CTL
+   isb
+1:
+#endif
 .endm
 
 /* Configures the HSTR (Hyp System Trap Register) on entry/return

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v3 5/5] ARM: KVM: arch_timers: Wire the init code and config option

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

It is now possible to select CONFIG_KVM_ARM_TIMER to enable the
KVM architected timer support.

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/kvm/Kconfig  |7 +++
 arch/arm/kvm/Makefile |1 +
 arch/arm/kvm/arm.c|   11 +++
 arch/arm/kvm/vgic.c   |1 +
 4 files changed, 20 insertions(+)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 867551e..ade2673 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -47,6 +47,13 @@ config KVM_ARM_VGIC
---help---
  Adds support for a hardware assisted, in-kernel GIC emulation.
 
+config KVM_ARM_TIMER
+	bool "KVM support for Architected Timers"
+	depends on KVM_ARM_VGIC && ARM_ARCH_TIMER
+   select HAVE_KVM_IRQCHIP
+   ---help---
+ Adds support for the Architected Timers in virtual machines
+
 source drivers/virtio/Kconfig
 
 endif # VIRTUALIZATION
diff --git a/arch/arm/kvm/Makefile b/arch/arm/kvm/Makefile
index 3370c09..6b19e5c 100644
--- a/arch/arm/kvm/Makefile
+++ b/arch/arm/kvm/Makefile
@@ -21,3 +21,4 @@ obj-$(CONFIG_KVM_ARM_HOST) += $(addprefix ../../../virt/kvm/, 
kvm_main.o coalesc
 obj-$(CONFIG_KVM_ARM_HOST) += arm.o guest.o mmu.o emulate.o reset.o
 obj-$(CONFIG_KVM_ARM_HOST) += coproc.o coproc_a15.o mmio.o
 obj-$(CONFIG_KVM_ARM_VGIC) += vgic.o
+obj-$(CONFIG_KVM_ARM_TIMER) += timer.o
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 09b7072..69bec17 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -286,6 +286,7 @@ out:
 void kvm_arch_vcpu_free(struct kvm_vcpu *vcpu)
 {
kvm_mmu_free_memory_caches(vcpu);
+   kvm_timer_vcpu_terminate(vcpu);
kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
 
@@ -323,6 +324,9 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
/* Set up VGIC */
kvm_vgic_vcpu_init(vcpu);
 
+   /* Set up the timer */
+   kvm_timer_vcpu_init(vcpu);
+
return 0;
 }
 
@@ -1047,6 +1051,13 @@ static int init_hyp_mode(void)
if (!err)
vgic_present = true;
 
+   /*
+* Init HYP architected timer support
+*/
+   err = kvm_timer_hyp_init();
+   if (err)
+   goto out_free_mappings;
+
return 0;
 out_free_vfp:
free_percpu(kvm_host_vfp_state);
diff --git a/arch/arm/kvm/vgic.c b/arch/arm/kvm/vgic.c
index 146de1d..090ea79 100644
--- a/arch/arm/kvm/vgic.c
+++ b/arch/arm/kvm/vgic.c
@@ -1168,6 +1168,7 @@ int kvm_vgic_init(struct kvm *kvm)
	for (i = 32; i < VGIC_NR_IRQS; i += 4)
vgic_set_target_reg(kvm, 0, i);
 
+   kvm_timer_init(kvm);
	kvm->arch.vgic.ready = true;
 out:
	mutex_unlock(&kvm->lock);



[PATCH v3 3/5] ARM: KVM: arch_timers: Add guest timer core support

2012-10-22 Thread Christoffer Dall
From: Marc Zyngier marc.zyng...@arm.com

We can inject a timer interrupt into the guest as a result of
three possible events:
- The virtual timer interrupt has fired while we were still
  executing the guest
- The timer interrupt hasn't fired, but it expired while we
  were doing the world switch
- A hrtimer we programmed earlier has fired

Signed-off-by: Marc Zyngier marc.zyng...@arm.com
Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com
---
 arch/arm/include/asm/kvm_arch_timer.h |   57 +
 arch/arm/kvm/reset.c  |9 +
 arch/arm/kvm/timer.c  |  204 +
 3 files changed, 269 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/kvm/timer.c

diff --git a/arch/arm/include/asm/kvm_arch_timer.h 
b/arch/arm/include/asm/kvm_arch_timer.h
index 513b852..bd5e501 100644
--- a/arch/arm/include/asm/kvm_arch_timer.h
+++ b/arch/arm/include/asm/kvm_arch_timer.h
@@ -19,13 +19,68 @@
 #ifndef __ASM_ARM_KVM_ARCH_TIMER_H
 #define __ASM_ARM_KVM_ARCH_TIMER_H
 
+#include <linux/clocksource.h>
+#include <linux/hrtimer.h>
+#include <linux/workqueue.h>
+
 struct arch_timer_kvm {
+#ifdef CONFIG_KVM_ARM_TIMER
+   /* Is the timer enabled */
+   boolenabled;
+
+   /*
+	 * Virtual offset (the kernel accesses it through cntvoff, HYP
+	 * code accesses it as two 32-bit values).
+*/
+   union {
+   cycle_t cntvoff;
+   struct {
+   u32 low;/* Restored only */
+   u32 high;   /* Restored only */
+   } cntvoff32;
+   };
+#endif
 };
 
 struct arch_timer_cpu {
+#ifdef CONFIG_KVM_ARM_TIMER
+   /* Registers: control register, timer value */
+   u32 cntv_ctl;   /* Saved/restored */
+   union {
+   cycle_t cntv_cval;
+   struct {
+   u32 low;/* Saved/restored */
+   u32 high;   /* Saved/restored */
+   } cntv_cval32;
+   };
+
+   /*
+* Anything that is not used directly from assembly code goes
+* here.
+*/
+
+   /* Background timer used when the guest is not running */
+   struct hrtimer  timer;
+
+   /* Work queued when the above timer expires */
+   struct work_struct  expired;
+
+   /* Background timer active */
+   boolarmed;
+
+   /* Timer IRQ */
+   const struct kvm_irq_level  *irq;
+#endif
 };
 
-#ifndef CONFIG_KVM_ARM_TIMER
+#ifdef CONFIG_KVM_ARM_TIMER
+int kvm_timer_hyp_init(void);
+int kvm_timer_init(struct kvm *kvm);
+void kvm_timer_vcpu_init(struct kvm_vcpu *vcpu);
+void kvm_timer_sync_to_cpu(struct kvm_vcpu *vcpu);
+void kvm_timer_sync_from_cpu(struct kvm_vcpu *vcpu);
+void kvm_timer_vcpu_terminate(struct kvm_vcpu *vcpu);
+#else
 static inline int kvm_timer_hyp_init(void)
 {
return 0;
diff --git a/arch/arm/kvm/reset.c b/arch/arm/kvm/reset.c
index 290a13d..bb17def 100644
--- a/arch/arm/kvm/reset.c
+++ b/arch/arm/kvm/reset.c
@@ -37,6 +37,12 @@ static struct kvm_regs a15_regs_reset = {
.cpsr = SVC_MODE | PSR_A_BIT | PSR_I_BIT | PSR_F_BIT,
 };
 
+#ifdef CONFIG_KVM_ARM_TIMER
+static const struct kvm_irq_level a15_virt_timer_ppi = {
+   .irq= 27,   /* A7/A15 specific */
+   .level  = 1,
+};
+#endif
 
 
/***
  * Exported reset function
@@ -59,6 +65,9 @@ int kvm_reset_vcpu(struct kvm_vcpu *vcpu)
return -EINVAL;
cpu_reset = a15_regs_reset;
	vcpu->arch.midr = read_cpuid_id();
+#ifdef CONFIG_KVM_ARM_TIMER
+	vcpu->arch.timer_cpu.irq = &a15_virt_timer_ppi;
+#endif
break;
default:
return -ENODEV;
diff --git a/arch/arm/kvm/timer.c b/arch/arm/kvm/timer.c
new file mode 100644
index 000..a241298
--- /dev/null
+++ b/arch/arm/kvm/timer.c
@@ -0,0 +1,204 @@
+/*
+ * Copyright (C) 2012 ARM Ltd.
+ * Author: Marc Zyngier marc.zyng...@arm.com
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+
+#include <linux/of_irq.h>
+#include <linux/kvm.h>
+#include <linux/kvm_host.h>
+#include <linux/interrupt.h>
+
+#include <asm/arch_timer.h>
+

Re: 1.1.1 - 1.1.2 migrate /managedsave issue

2012-10-22 Thread Philipp Hahn
Hello Doug,

On Saturday 20 October 2012 00:46:43 Doug Goldstein wrote:
 I'm using libvirt 0.10.2 and I had qemu-kvm 1.1.1 running all my VMs.
...
 I had upgraded to qemu-kvm 1.1.2
... 
 qemu: warning: error while loading state for instance 0x0 of device 'ram'
 load of migration failed

That error can be from many things. For me it was that the PXE-ROM images for 
the network cards were updated as well. Their size changed over the next 
power-of-two size, so kvm needed to allocate less/more memory and changed 
some PCI configuration registers, where the size of the ROM region is stored.
On loading the saved state those sizes were compared and failed to validate. 
KVM then aborts loading the saved state with that little helpful message.

So you might want to check, if your case is similar to mine.

I diagnosed that using gdb to single step kvm until I found 
hw/pci.c#get_pci_config_device() returning -EINVAL.

Hope that helps.

Sincerely
Philipp
-- 
Philipp Hahn   Open Source Software Engineer  h...@univention.de
Univention GmbHbe open.   fon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen fax: +49 421 22 232-99
   http://www.univention.de/




Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-22 Thread Vasilis Liaskovitis
Hi,
On Thu, Oct 18, 2012 at 02:33:02PM +0200, Avi Kivity wrote:
 On 10/18/2012 11:27 AM, Vasilis Liaskovitis wrote:
  On Wed, Oct 17, 2012 at 12:03:51PM +0200, Avi Kivity wrote:
  On 10/17/2012 11:19 AM, Vasilis Liaskovitis wrote:
   
   I don't think so, but probably there's a limit of DIMMs that real
   controllers have, something like 8 max.
   
   In the case of i440fx specifically, do you mean that we should model the 
   DRB
   (Dram row boundary registers in section 3.2.19 of the i440fx spec) ?
   
   The i440fx DRB registers only supports up to 8 DRAM rows (let's say 1 row
   maps 1-1 to a DimmDevice for this discussion) and only supports up to 
   2GB of
   memory afaict (bit 31 and above is ignored).
   
   I 'd rather not model this part of the i440fx - having only 8 DIMMs 
   seems too
   restrictive. The rest of the patchset supports up to 255 DIMMs so it 
   would be a
   waste imho to model an old pc memory controller that only supports 8 
   DIMMs.
   
   There was also an old discussion about i440fx modeling here:
   https://lists.nongnu.org/archive/html/qemu-devel/2011-07/msg02705.html
   the general direction was that i440fx is too old and we don't want to 
   precisely
   emulate the DRB registers, since they lack flexibility.
   
   Possible solutions:
   
   1) is there a newer and more flexible chipset that we could model?
  
  Look for q35 on this list.
  
  thanks, I 'll take a look. It sounds like the other options below are more
  straightforward now, but let me know if you prefer q35 integration as a 
  priority.
 
 At least validate that what you're doing fits with how q35 works.

In terms of pmc modeling, the q35 page http://wiki.qemu.org/Features/Q35
mentions:

Refactor i440fx to create i440fx-pmc class
ich9: model ICH9 Super I/O chip
ich9: make i440fx-pmc a generic PCNorthBridge class and add support for ich9
northbridge 

is this still the plan? There was an old patchset creating i440fx-pmc here:
http://lists.gnu.org/archive/html/qemu-devel/2012-01/msg03501.html
but I am not sure if it has been dropped or worked on. v3 of the q35 patchset
doesn't include a pmc I think.

It would be good to know what the current plan regarding pmc modeling (for both
q35 and i440fx) is.

thanks,

- Vasilis



Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Fri, Oct 19, 2012 at 03:37:32PM +0800, Xiao Guangrong wrote:
 After commit b3356bf0dbb349 (KVM: emulator: optimize rep ins handling),
 the pieces of io data can be collected and write them to the guest memory
 or MMIO together.
 
 Unfortunately, kvm splits the mmio access into 8 bytes and store them to
 vcpu->mmio_fragments. If the guest uses rep ins to move large data, it
 will cause vcpu->mmio_fragments overflow
 
 The bug can be exposed by isapc (-M isapc):
 
 [23154.818733] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
 [ ..]
 [23154.858083] Call Trace:
 [23154.859874]  [a04f0e17] kvm_get_cr8+0x1d/0x28 [kvm]
 [23154.861677]  [a04fa6d4] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
 [23154.863604]  [a04f5a1a] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
 
 
 Actually, we can use one mmio_fragment to store a large mmio access for the
 mmio access is always continuous then split it when we pass the mmio-exit-info
 to userspace. After that, we only need two entries to store mmio info for
 the cross-mmio pages access
 
I wonder can we put the data into coalesced mmio buffer instead of
exiting for each 8 bytes? Is it worth the complexity?

 Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com
 ---
  arch/x86/kvm/x86.c   |  127 
 +-
  include/linux/kvm_host.h |   16 +-
  2 files changed, 84 insertions(+), 59 deletions(-)
 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 8b90dd5..41ceb51 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3779,9 +3779,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
  static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
  void *val, int bytes)
  {
 -	struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
 -
 -	memcpy(vcpu->run->mmio.data, frag->data, frag->len);
   return X86EMUL_CONTINUE;
  }
 
 @@ -3799,6 +3796,64 @@ static const struct read_write_emulator_ops write_emultor = {
   .write = true,
  };
 
 +static bool get_current_mmio_info(struct kvm_vcpu *vcpu, gpa_t *gpa,
 +   unsigned *len, void **data)
 +{
 + struct kvm_mmio_fragment *frag;
 +	int cur = vcpu->mmio_cur_fragment;
 +
 +	if (cur >= vcpu->mmio_nr_fragments)
 +		return false;
 +
 +	frag = &vcpu->mmio_fragments[cur];
 +	if (frag->pos >= frag->len) {
 +		if (++vcpu->mmio_cur_fragment >= vcpu->mmio_nr_fragments)
 +			return false;
 +		frag++;
 +	}
 +
 +	*gpa = frag->gpa + frag->pos;
 +	*data = frag->data + frag->pos;
 +	*len = min(8u, frag->len - frag->pos);
 +	return true;
 +}
 +
 +static void complete_current_mmio(struct kvm_vcpu *vcpu)
 +{
 + struct kvm_mmio_fragment *frag;
 + gpa_t gpa;
 + unsigned len;
 + void *data;
 +
 +	get_current_mmio_info(vcpu, &gpa, &len, &data);
 +
 +	if (!vcpu->mmio_is_write)
 +		memcpy(data, vcpu->run->mmio.data, len);
 +
 +	/* Increase frag->pos to switch to the next mmio. */
 +	frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
 +	frag->pos += len;
 +}
 +
 +static bool vcpu_fill_mmio_exit_info(struct kvm_vcpu *vcpu)
 +{
 + gpa_t gpa;
 + unsigned len;
 + void *data;
 +
 +	if (!get_current_mmio_info(vcpu, &gpa, &len, &data))
 +		return false;
 +
 +	vcpu->run->mmio.len = len;
 +	vcpu->run->mmio.is_write = vcpu->mmio_is_write;
 +	vcpu->run->exit_reason = KVM_EXIT_MMIO;
 +	vcpu->run->mmio.phys_addr = gpa;
 +
 +	if (vcpu->mmio_is_write)
 +		memcpy(vcpu->run->mmio.data, data, len);
 +	return true;
 +}
 +
  static int emulator_read_write_onepage(unsigned long addr, void *val,
  unsigned int bytes,
  struct x86_exception *exception,
 @@ -3834,18 +3889,12 @@ mmio:
   bytes -= handled;
   val += handled;
 
 -	while (bytes) {
 -		unsigned now = min(bytes, 8U);
 -
 -		frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++];
 -		frag->gpa = gpa;
 -		frag->data = val;
 -		frag->len = now;
 -
 -		gpa += now;
 -		val += now;
 -		bytes -= now;
 -	}
 +	WARN_ON(vcpu->mmio_nr_fragments >= KVM_MAX_MMIO_FRAGMENTS);
 +	frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++];
 +	frag->pos = 0;
 +	frag->gpa = gpa;
 +	frag->data = val;
 +	frag->len = bytes;
   return X86EMUL_CONTINUE;
  }
 
 @@ -3855,7 +3904,6 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
   const struct read_write_emulator_ops *ops)
  {
   struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt);
 - gpa_t gpa;
   int rc;
 
 	if (ops->read_write_prepare 
 @@ -3887,17 +3935,13 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr,
 	if (!vcpu->mmio_nr_fragments)
   return rc;
 
 - gpa = 

Re: [RFC PATCH v3 06/19] Implement -dimm command line option

2012-10-22 Thread Avi Kivity
On 10/19/2012 07:48 PM, Blue Swirl wrote:

 DIMMs would be allowed to be hotplugged in the generic mem-controller 
 scheme only
 (unless it makes sense to allow hotplug in the remaining pmc DRBs and
 start using the generic scheme once we run out of emulated DRBs)


 440fx seems a lost cause, so we can go wild and just implement pv dimms.
 
 Maybe. But what would be a PV DIMM? Do we need any DIMM-like
 granularity at all, instead the guest could be told to use a list of
 RAM regions with arbitrary start and end addresses? 

Guests are likely to support something that has the same constraints as
real hardware.  If we allow non-power-of-two DIMMs, we might find that
guests don't support them well.

 Isn't ballooning
 also related?

It is related in that it is also a memory hotplug technology.  But
ballooning is subtractive and fine-grained where classic hotplug is
additive and coarse grained.  We can use both together, but I don't
think any work is needed at the qemu level.

 
  For q35 I'd like to stay within the spec.
 
 That may not last forever when machines have terabytes of memory.

At least there's work for chipset implementers.  Or we can do PV-DIMMs
for q35 too.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Xiao Guangrong
On 10/22/2012 05:16 PM, Gleb Natapov wrote:
 On Fri, Oct 19, 2012 at 03:37:32PM +0800, Xiao Guangrong wrote:
 After commit b3356bf0dbb349 (KVM: emulator: optimize rep ins handling),
 the pieces of io data can be collected and write them to the guest memory
 or MMIO together.

 Unfortunately, kvm splits the mmio access into 8 bytes and store them to
 vcpu->mmio_fragments. If the guest uses rep ins to move large data, it
 will cause vcpu->mmio_fragments overflow

 The bug can be exposed by isapc (-M isapc):

 [23154.818733] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
 [ ..]
 [23154.858083] Call Trace:
 [23154.859874]  [a04f0e17] kvm_get_cr8+0x1d/0x28 [kvm]
 [23154.861677]  [a04fa6d4] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 
 [kvm]
 [23154.863604]  [a04f5a1a] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]


 Actually, we can use one mmio_fragment to store a large mmio access for the
 mmio access is always continuous then split it when we pass the 
 mmio-exit-info
 to userspace. After that, we only need two entries to store mmio info for
 the cross-mmio pages access

 I wonder can we put the data into coalesced mmio buffer instead of

If we put all mmio data into coalesced buffer, we should:
- ensure the userspace program uses KVM_REGISTER_COALESCED_MMIO to register
  all mmio regions.

- even if the MMIO region is not used by emulated-device, it also need to be
  registered.

It will breaks old version userspace program.

 exiting for each 8 bytes? Is it worth the complexity?

Simpler way is always better but i failed, so i appreciate your guys comments.




Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 07:09:38PM +0800, Xiao Guangrong wrote:
 On 10/22/2012 05:16 PM, Gleb Natapov wrote:
  On Fri, Oct 19, 2012 at 03:37:32PM +0800, Xiao Guangrong wrote:
  After commit b3356bf0dbb349 (KVM: emulator: optimize rep ins handling),
  the pieces of io data can be collected and write them to the guest memory
  or MMIO together.
 
  Unfortunately, kvm splits the mmio access into 8 bytes and store them to
  vcpu->mmio_fragments. If the guest uses rep ins to move large data, it
  will cause vcpu->mmio_fragments overflow
 
  The bug can be exposed by isapc (-M isapc):
 
  [23154.818733] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
  [ ..]
  [23154.858083] Call Trace:
  [23154.859874]  [a04f0e17] kvm_get_cr8+0x1d/0x28 [kvm]
  [23154.861677]  [a04fa6d4] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 
  [kvm]
  [23154.863604]  [a04f5a1a] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
 
 
  Actually, we can use one mmio_fragment to store a large mmio access for the
  mmio access is always continuous then split it when we pass the 
  mmio-exit-info
  to userspace. After that, we only need two entries to store mmio info for
  the cross-mmio pages access
 
  I wonder can we put the data into coalesced mmio buffer instead of
 
 If we put all mmio data into coalesced buffer, we should:
 - ensure the userspace program uses KVM_REGISTER_COALESCED_MMIO to register
   all mmio regions.
 
It appears to not be so.
Userspace calls kvm_flush_coalesced_mmio_buffer() after returning from
KVM_RUN which looks like this:

void kvm_flush_coalesced_mmio_buffer(void)
{
KVMState *s = kvm_state;

    if (s->coalesced_flush_in_progress) {
        return;
    }

    s->coalesced_flush_in_progress = true;

    if (s->coalesced_mmio_ring) {
        struct kvm_coalesced_mmio_ring *ring = s->coalesced_mmio_ring;
        while (ring->first != ring->last) {
            struct kvm_coalesced_mmio *ent;

            ent = &ring->coalesced_mmio[ring->first];

            cpu_physical_memory_write(ent->phys_addr, ent->data, ent->len);
            smp_wmb();
            ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
        }
    }

    s->coalesced_flush_in_progress = false;
}

Nowhere in this function we check that MMIO region was registered with
KVM_REGISTER_COALESCED_MMIO. We do not even check that the address is
MMIO.

 - even if the MMIO region is not used by emulated-device, it also need to be
   registered.
Same. I think writes to non registered region will be discarded.

 
 It will breaks old version userspace program.
 
  exiting for each 8 bytes? Is it worth the complexity?
 
 Simpler way is always better but i failed, so i appreciate your guys comments.
 
Why have you failed? Exiting for each 8 bytes is infinitely better than
buffer overflow.  My question about complexity was towards theoretically
more complex code that will use coalesced MMIO buffer.

--
Gleb.


Re: 1.1.1 - 1.1.2 migrate /managedsave issue

2012-10-22 Thread Avi Kivity
On 10/22/2012 09:04 AM, Philipp Hahn wrote:
 Hello Doug,
 
 On Saturday 20 October 2012 00:46:43 Doug Goldstein wrote:
 I'm using libvirt 0.10.2 and I had qemu-kvm 1.1.1 running all my VMs.
 ...
 I had upgraded to qemu-kvm 1.1.2
 ... 
 qemu: warning: error while loading state for instance 0x0 of device 'ram'
 load of migration failed
 
 That error can be from many things. For me it was that the PXE-ROM images for 
 the network cards were updated as well. Their size changed over the next 
 power-of-two size, so kvm needed to allocate less/more memory and changed 
 some PCI configuration registers, where the size of the ROM region is stored.
 On loading the saved state those sizes were compared and failed to validate. 
 KVM then aborts loading the saved state with that little helpful message.
 
 So you might want to check, if your case is similar to mine.
 
 I diagnosed that using gdb to single step kvm until I found 
 hw/pci.c#get_pci_config_device() returning -EINVAL.
 

Seems reasonable.  Doug, please verify to see if it's the same issue or
another one.

Juan, how can we fix this?  It's clear that the option ROM size has to
be fixed and not change whenever the blob is updated.  This will fix it
for future releases.  But what to do about the ones in the field?

-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Jan Kiszka
On 2012-10-22 13:23, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 07:09:38PM +0800, Xiao Guangrong wrote:
 On 10/22/2012 05:16 PM, Gleb Natapov wrote:
 On Fri, Oct 19, 2012 at 03:37:32PM +0800, Xiao Guangrong wrote:
 After commit b3356bf0dbb349 (KVM: emulator: optimize rep ins handling),
 the pieces of io data can be collected and write them to the guest memory
 or MMIO together.

 Unfortunately, kvm splits the mmio access into 8 bytes and store them to
 vcpu->mmio_fragments. If the guest uses rep ins to move large data, it
 will cause vcpu->mmio_fragments overflow

 The bug can be exposed by isapc (-M isapc):

 [23154.818733] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
 [ ..]
 [23154.858083] Call Trace:
 [23154.859874]  [a04f0e17] kvm_get_cr8+0x1d/0x28 [kvm]
 [23154.861677]  [a04fa6d4] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 
 [kvm]
 [23154.863604]  [a04f5a1a] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]


 Actually, we can use one mmio_fragment to store a large mmio access for the
 mmio access is always continuous then split it when we pass the 
 mmio-exit-info
 to userspace. After that, we only need two entries to store mmio info for
 the cross-mmio pages access

 I wonder can we put the data into coalesced mmio buffer instead of

 If we put all mmio data into coalesced buffer, we should:
 - ensure the userspace program uses KVM_REGISTER_COALESCED_MMIO to register
   all mmio regions.

 It appears to not be so.
 Userspace calls kvm_flush_coalesced_mmio_buffer() after returning from
 KVM_RUN which looks like this:

Nope, no longer, only on accesses to devices that actually use such
regions (and there are only two ATM). The current design of a global
coalesced mmio ring is horrible /wrt latency.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 01:35:56PM +0200, Jan Kiszka wrote:
 On 2012-10-22 13:23, Gleb Natapov wrote:
  On Mon, Oct 22, 2012 at 07:09:38PM +0800, Xiao Guangrong wrote:
  On 10/22/2012 05:16 PM, Gleb Natapov wrote:
  On Fri, Oct 19, 2012 at 03:37:32PM +0800, Xiao Guangrong wrote:
  After commit b3356bf0dbb349 (KVM: emulator: optimize rep ins handling),
  the pieces of io data can be collected and write them to the guest memory
  or MMIO together.
 
  Unfortunately, kvm splits the mmio access into 8 bytes and store them to
  vcpu->mmio_fragments. If the guest uses rep ins to move large data, it
  will cause vcpu->mmio_fragments overflow
 
  The bug can be exposed by isapc (-M isapc):
 
  [23154.818733] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
  [ ..]
  [23154.858083] Call Trace:
  [23154.859874]  [a04f0e17] kvm_get_cr8+0x1d/0x28 [kvm]
  [23154.861677]  [a04fa6d4] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 
  [kvm]
  [23154.863604]  [a04f5a1a] ? kvm_arch_vcpu_load+0x17b/0x180 
  [kvm]
 
 
  Actually, we can use one mmio_fragment to store a large mmio access for 
  the
  mmio access is always continuous then split it when we pass the 
  mmio-exit-info
  to userspace. After that, we only need two entries to store mmio info for
  the cross-mmio pages access
 
  I wonder can we put the data into coalesced mmio buffer instead of
 
  If we put all mmio data into coalesced buffer, we should:
  - ensure the userspace program uses KVM_REGISTER_COALESCED_MMIO to register
all mmio regions.
 
  It appears to not be so.
  Userspace calls kvm_flush_coalesced_mmio_buffer() after returning from
  KVM_RUN which looks like this:
 
 Nope, no longer, only on accesses to devices that actually use such
 regions (and there are only two ATM). The current design of a global
 coalesced mmio ring is horrible /wrt latency.
 
Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
is gone. So this will break new userspace, not old. By global you mean
shared between devices (or memory regions)?

--
Gleb.


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Jan Kiszka
On 2012-10-22 13:43, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 01:35:56PM +0200, Jan Kiszka wrote:
 On 2012-10-22 13:23, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 07:09:38PM +0800, Xiao Guangrong wrote:
 On 10/22/2012 05:16 PM, Gleb Natapov wrote:
 On Fri, Oct 19, 2012 at 03:37:32PM +0800, Xiao Guangrong wrote:
 After commit b3356bf0dbb349 (KVM: emulator: optimize rep ins handling),
 the pieces of io data can be collected and write them to the guest memory
 or MMIO together.

 Unfortunately, kvm splits the mmio access into 8 bytes and store them to
 vcpu->mmio_fragments. If the guest uses rep ins to move large data, it
 will cause vcpu->mmio_fragments overflow

 The bug can be exposed by isapc (-M isapc):

 [23154.818733] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
 [ ..]
 [23154.858083] Call Trace:
 [23154.859874]  [a04f0e17] kvm_get_cr8+0x1d/0x28 [kvm]
 [23154.861677]  [a04fa6d4] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 
 [kvm]
 [23154.863604]  [a04f5a1a] ? kvm_arch_vcpu_load+0x17b/0x180 
 [kvm]


 Actually, we can use one mmio_fragment to store a large mmio access for 
 the
 mmio access is always continuous then split it when we pass the 
 mmio-exit-info
 to userspace. After that, we only need two entries to store mmio info for
 the cross-mmio pages access

 I wonder can we put the data into coalesced mmio buffer instead of

 If we put all mmio data into coalesced buffer, we should:
 - ensure the userspace program uses KVM_REGISTER_COALESCED_MMIO to register
   all mmio regions.

 It appears to not be so.
 Userspace calls kvm_flush_coalesced_mmio_buffer() after returning from
 KVM_RUN which looks like this:

 Nope, no longer, only on accesses to devices that actually use such
 regions (and there are only two ATM). The current design of a global
 coalesced mmio ring is horrible /wrt latency.

 Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
 is gone. So this will break new userspace, not old. By global you mean
 shared between devices (or memory regions)?

Yes. We only have a single ring per VM, so we cannot flush multi-second
VGA access separately from other devices. In theory solvable by
introducing per-region rings that can be driven separately.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


[PATCH] update-linux-headers.sh: Handle new kernel uapi/ directories

2012-10-22 Thread Peter Maydell
Recent kernels have moved to keeping the userspace headers
in uapi/ subdirectories. This breaks the detection of whether an
architecture has KVM support in the kernel because kvm.h has
moved in the kernel source tree. Update the check to support
both the old and new locations.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
This would otherwise cause us to ignore the architectures which
have moved over to uapi/ (which for QEMU's purposes means everything
but x86...)

 scripts/update-linux-headers.sh |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 67be2ef..4c7b566 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -34,7 +34,8 @@ ARCHLIST=$(cd $linux/arch && echo *)
 
 for arch in $ARCHLIST; do
 # Discard anything which isn't a KVM-supporting architecture
-if ! [ -e $linux/arch/$arch/include/asm/kvm.h ]; then
+if ! [ -e $linux/arch/$arch/include/asm/kvm.h ] &&
+   ! [ -e $linux/arch/$arch/include/uapi/asm/kvm.h ] ; then
 continue
 fi
 
-- 
1.7.9.5



Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Avi Kivity
On 10/22/2012 01:45 PM, Jan Kiszka wrote:

 Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
 is gone. So this will break new userspace, not old. By global you mean
 shared between devices (or memory regions)?
 
 Yes. We only have a single ring per VM, so we cannot flush multi-second
 VGA access separately from other devices. In theory solvable by
 introducing per-region rings that can be driven separately.

But in practice unneeded.  Real time VMs can disable coalescing and not
use planar VGA modes.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] kvm, async_pf: exit idleness when handling KVM_PV_REASON_PAGE_NOT_PRESENT

2012-10-22 Thread Avi Kivity
On 10/19/2012 06:11 PM, Sasha Levin wrote:
 KVM_PV_REASON_PAGE_NOT_PRESENT kicks cpu out of idleness, but we haven't
 marked that spot as an exit from idleness.
 
 Not doing so can cause RCU warnings such as:
 
 [  732.788386] ===
 [  732.789803] [ INFO: suspicious RCU usage. ]
 [  732.790032] 3.7.0-rc1-next-20121019-sasha-2-g6d8d02d-dirty #63 Tainted: G W
 [  732.790032] ---
 [  732.790032] include/linux/rcupdate.h:738 rcu_read_lock() used illegally while idle!
 [  732.790032]
 [  732.790032] other info that might help us debug this:
 [  732.790032]
 [  732.790032]
 [  732.790032] RCU used illegally from idle CPU!
 [  732.790032] rcu_scheduler_active = 1, debug_locks = 1
 [  732.790032] RCU used illegally from extended quiescent state!
 [  732.790032] 2 locks held by trinity-child31/8252:
 [  732.790032]  #0:  (&rq->lock){-.-.-.}, at: [83a67528] __schedule+0x178/0x8f0
 [  732.790032]  #1:  (rcu_read_lock){.+.+..}, at: [81152bde] cpuacct_charge+0xe/0x200
 [  732.790032]
 [  732.790032] stack backtrace:
 [  732.790032] Pid: 8252, comm: trinity-child31 Tainted: G W 3.7.0-rc1-next-20121019-sasha-2-g6d8d02d-dirty #63
 [  732.790032] Call Trace:
 [  732.790032]  [8118266b] lockdep_rcu_suspicious+0x10b/0x120
 [  732.790032]  [81152c60] cpuacct_charge+0x90/0x200
 [  732.790032]  [81152bde] ? cpuacct_charge+0xe/0x200
 [  732.790032]  [81158093] update_curr+0x1a3/0x270
 [  732.790032]  [81158a6a] dequeue_entity+0x2a/0x210
 [  732.790032]  [81158ea5] dequeue_task_fair+0x45/0x130
 [  732.790032]  [8114ae29] dequeue_task+0x89/0xa0
 [  732.790032]  [8114bb9e] deactivate_task+0x1e/0x20
 [  732.790032]  [83a67c29] __schedule+0x879/0x8f0
 [  732.790032]  [8117e20d] ? trace_hardirqs_off+0xd/0x10
 [  732.790032]  [810a37a5] ? kvm_async_pf_task_wait+0x1d5/0x2b0
 [  732.790032]  [83a67cf5] schedule+0x55/0x60
 [  732.790032]  [810a37c4] kvm_async_pf_task_wait+0x1f4/0x2b0
 [  732.790032]  [81139e50] ? abort_exclusive_wait+0xb0/0xb0
 [  732.790032]  [81139c25] ? prepare_to_wait+0x25/0x90
 [  732.790032]  [810a3a66] do_async_page_fault+0x56/0xa0
 [  732.790032]  [83a6a6e8] async_page_fault+0x28/0x30

Thanks, applied to master for 3.7.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Jan Kiszka
On 2012-10-22 14:18, Avi Kivity wrote:
 On 10/22/2012 01:45 PM, Jan Kiszka wrote:
 
 Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
 is gone. So this will break new userspace, not old. By global you mean
 shared between devices (or memory regions)?

 Yes. We only have a single ring per VM, so we cannot flush multi-second
 VGA access separately from other devices. In theory solvable by
 introducing per-region rings that can be driven separately.
 
 But in practice unneeded.  Real time VMs can disable coalescing and not
 use planar VGA modes.

A) At least right now, we do not differentiate between the VGA modes and
if flushing is needed. So that device is generally taboo for RT cores of
the VM.
B) We need to disable coalescing in E1000 as well - if we want to use
that model.
C) Gleb seems to propose using coalescing far beyond those two use cases.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 02:45:37PM +0200, Jan Kiszka wrote:
 On 2012-10-22 14:18, Avi Kivity wrote:
  On 10/22/2012 01:45 PM, Jan Kiszka wrote:
  
  Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
  is gone. So this will break new userspace, not old. By global you mean
  shared between devices (or memory regions)?
 
  Yes. We only have a single ring per VM, so we cannot flush multi-second
  VGA access separately from other devices. In theory solvable by
  introducing per-region rings that can be driven separately.
  
  But in practice unneeded.  Real time VMs can disable coalescing and not
  use planar VGA modes.
 
 A) At least right now, we do not differentiate between the VGA modes and
 if flushing is needed. So that device is generally taboo for RT cores of
 the VM.
 B) We need to disable coalescing in E1000 as well - if we want to use
 that model.
 C) Gleb seems to propose using coalescing far beyond those two use cases.
 
Since the userspace change is needed the idea is dead, but if we could
implement it I do not see how it can hurt latency if it were the
only mechanism to use the coalesced mmio buffer. Checking that the ring buffer
is empty is cheap, and if it is not empty it means that the kernel just saved
you a lot of 8-byte exits, so even after iterating over all the entries there
you still saved a lot of time.

--
Gleb.


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Jan Kiszka
On 2012-10-22 14:53, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 02:45:37PM +0200, Jan Kiszka wrote:
 On 2012-10-22 14:18, Avi Kivity wrote:
 On 10/22/2012 01:45 PM, Jan Kiszka wrote:

 Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
 is gone. So this will break new userspace, not old. By global you mean
 shared between devices (or memory regions)?

 Yes. We only have a single ring per VM, so we cannot flush multi-second
 VGA access separately from other devices. In theory solvable by
 introducing per-region rings that can be driven separately.

 But in practice unneeded.  Real time VMs can disable coalescing and not
 use planar VGA modes.

 A) At least right now, we do not differentiate between the VGA modes and
 if flushing is needed. So that device is generally taboo for RT cores of
 the VM.
 B) We need to disable coalescing in E1000 as well - if we want to use
 that model.
 C) Gleb seems to propose using coalescing far beyond those two use cases.

 Since the userspace change is needed the idea is dead, but if we could
 implement it I do not see how it can hurt the latency if it would be the
 only mechanism to use coalesced mmio buffer. Checking that the ring buffer
 is empty is cheap and if it is not empty it means that kernel just saved
 you a lot of 8-byte exits so even after iterating over all the entries there
 you still saved a lot of time.

When taking an exit for A, I'm not interested in flushing stuff for B
unless I have a dependency. Thus, buffers would have to be per device
before extending their use.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Avi Kivity
On 10/22/2012 02:53 PM, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 02:45:37PM +0200, Jan Kiszka wrote:
 On 2012-10-22 14:18, Avi Kivity wrote:
  On 10/22/2012 01:45 PM, Jan Kiszka wrote:
  
  Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
  is gone. So this will break new userspace, not old. By global you mean
  shared between devices (or memory regions)?
 
  Yes. We only have a single ring per VM, so we cannot flush multi-second
  VGA access separately from other devices. In theory solvable by
  introducing per-region rings that can be driven separately.
  
  But in practice unneeded.  Real time VMs can disable coalescing and not
  use planar VGA modes.
 
 A) At least right now, we do not differentiate between the VGA modes and
 if flushing is needed. So that device is generally taboo for RT cores of
 the VM.
 B) We need to disable coalescing in E1000 as well - if we want to use
 that model.
 C) Gleb seems to propose using coalescing far beyond those two use cases.
 
 Since the userspace change is needed the idea is dead, but if we could
 implement it I do not see how it can hurt the latency if it would be the
 only mechanism to use coalesced mmio buffer. Checking that the ring buffer
 is empty is cheap and if it is not empty it means that kernel just saved
  you a lot of 8-byte exits so even after iterating over all the entries there
 you still saved a lot of time.

It's time where the guest cannot take interrupts, and time in a high
priority guest thread that is spent processing low guest priority requests.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Avi Kivity
On 10/22/2012 02:45 PM, Jan Kiszka wrote:
 On 2012-10-22 14:18, Avi Kivity wrote:
 On 10/22/2012 01:45 PM, Jan Kiszka wrote:
 
 Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
 is gone. So this will break new userspace, not old. By global you mean
 shared between devices (or memory regions)?

 Yes. We only have a single ring per VM, so we cannot flush multi-second
 VGA access separately from other devices. In theory solvable by
 introducing per-region rings that can be driven separately.
 
 But in practice unneeded.  Real time VMs can disable coalescing and not
 use planar VGA modes.
 
 A) At least right now, we do not differentiate between the VGA modes and
 if flushing is needed. So that device is generally taboo for RT cores of
 the VM.

In non-planar modes the memory will be direct mapped, which overrides
coalescing (since kvm or qemu never see an exit).

 B) We need to disable coalescing in E1000 as well - if we want to use
 that model.

True.

-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Avi Kivity
On 10/22/2012 02:55 PM, Jan Kiszka wrote:
 Since the userspace change is needed the idea is dead, but if we could
 implement it I do not see how it can hurt the latency if it would be the
 only mechanism to use coalesced mmio buffer. Checking that the ring buffer
 is empty is cheap and if it is not empty it means that kernel just saved
  you a lot of 8-byte exits so even after iterating over all the entries 
 there
 you still saved a lot of time.
 
 When taking an exit for A, I'm not interested in flushing stuff for B
 unless I have a dependency. Thus, buffers would have to be per device
 before extending their use.

Any mmio exit has to flush everything.  For example a DMA caused by an
e1000 write has to see any writes to the framebuffer, in case the guest
is transmitting its framebuffer to the outside world.

-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 02:55:14PM +0200, Jan Kiszka wrote:
 On 2012-10-22 14:53, Gleb Natapov wrote:
  On Mon, Oct 22, 2012 at 02:45:37PM +0200, Jan Kiszka wrote:
  On 2012-10-22 14:18, Avi Kivity wrote:
  On 10/22/2012 01:45 PM, Jan Kiszka wrote:
 
  Indeed. git pull, recheck and call for kvm_flush_coalesced_mmio_buffer()
  is gone. So this will break new userspace, not old. By global you mean
  shared between devices (or memory regions)?
 
  Yes. We only have a single ring per VM, so we cannot flush multi-second
  VGA access separately from other devices. In theory solvable by
  introducing per-region rings that can be driven separately.
 
  But in practice unneeded.  Real time VMs can disable coalescing and not
  use planar VGA modes.
 
  A) At least right now, we do not differentiate between the VGA modes and
  if flushing is needed. So that device is generally taboo for RT cores of
  the VM.
  B) We need to disable coalescing in E1000 as well - if we want to use
  that model.
  C) Gleb seems to propose using coalescing far beyond those two use cases.
 
  Since the userspace change is needed the idea is dead, but if we could
  implement it I do not see how it can hurt the latency if it would be the
  only mechanism to use coalesced mmio buffer. Checking that the ring buffer
  is empty is cheap and if it is not empty it means that kernel just saved
   you a lot of 8-byte exits so even after iterating over all the entries 
  there
  you still saved a lot of time.
 
  When taking an exit for A, I'm not interested in flushing stuff for B
 unless I have a dependency. Thus, buffers would have to be per device
 before extending their use.
 
But this is not what will happen (in the absence of other users of
coalesced mmio). What will happen is that instead of taking 200 exits for B
you will take 1 exit for B.

--
Gleb.


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 02:55:24PM +0200, Avi Kivity wrote:
 On 10/22/2012 02:53 PM, Gleb Natapov wrote:
  On Mon, Oct 22, 2012 at 02:45:37PM +0200, Jan Kiszka wrote:
  On 2012-10-22 14:18, Avi Kivity wrote:
   On 10/22/2012 01:45 PM, Jan Kiszka wrote:
   
   Indeed. git pull, recheck and call for 
   kvm_flush_coalesced_mmio_buffer()
   is gone. So this will break new userspace, not old. By global you mean
   shared between devices (or memory regions)?
  
   Yes. We only have a single ring per VM, so we cannot flush multi-second
   VGA access separately from other devices. In theory solvable by
   introducing per-region rings that can be driven separately.
   
   But in practice unneeded.  Real time VMs can disable coalescing and not
   use planar VGA modes.
  
  A) At least right now, we do not differentiate between the VGA modes and
  if flushing is needed. So that device is generally taboo for RT cores of
  the VM.
  B) We need to disable coalescing in E1000 as well - if we want to use
  that model.
  C) Gleb seems to propose using coalescing far beyond those two use cases.
  
  Since the userspace change is needed the idea is dead, but if we could
  implement it I do not see how it can hurt the latency if it would be the
  only mechanism to use coalesced mmio buffer. Checking that the ring buffer
  is empty is cheap and if it is not empty it means that kernel just saved
   you a lot of 8-byte exits so even after iterating over all the entries 
  there
  you still saved a lot of time.
 
 It's time where the guest cannot take interrupts, and time in a high
 priority guest thread that is spent processing low guest priority requests.
 
The proposed fix has exactly the same issue. Until all data is transferred to
userspace, no interrupt will be served.

--
Gleb.


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Avi Kivity
On 10/22/2012 03:01 PM, Gleb Natapov wrote:

 It's time where the guest cannot take interrupts, and time in a high
 priority guest thread that is spent processing low guest priority requests.
 
 Proposed fix has exactly same issue. Until all data is transfered to
 userspace no interrupt will be served.

For mmio_fragments that is okay.  It's the same guest instruction, and
it's still O(1).

It's not okay for general mmio coalescing.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Jan Kiszka
On 2012-10-22 14:58, Avi Kivity wrote:
 On 10/22/2012 02:55 PM, Jan Kiszka wrote:
 Since the userspace change is needed the idea is dead, but if we could
 implement it I do not see how it can hurt the latency if it would be the
 only mechanism to use coalesced mmio buffer. Checking that the ring buffer
 is empty is cheap and if it is not empty it means that kernel just saved
 you a lot of 8-byte exits so even after iterating over all the entries 
 there
 you still saved a lot of time.

 When taking an exit for A, I'm not interested in flushing stuff for B
 unless I have a dependency. Thus, buffers would have to be per device
 before extending their use.
 
 Any mmio exit has to flush everything.  For example a DMA caused by an
 e1000 write has to see any writes to the framebuffer, in case the guest
 is transmitting its framebuffer to the outside world.

We already flush when that crazy guest actually accesses the region, no
need to do this unconditionally.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 03:02:22PM +0200, Avi Kivity wrote:
 On 10/22/2012 03:01 PM, Gleb Natapov wrote:
 
  It's time where the guest cannot take interrupts, and time in a high
  priority guest thread that is spent processing low guest priority requests.
  
  Proposed fix has exactly same issue. Until all data is transfered to
  userspace no interrupt will be served.
 
 For mmio_fragments that is okay.  It's the same guest instruction, and
 it's still O(1).
 
 It's not okay for general mmio coalescing.
 
Ah, so optimizing mmio_fragments transmission to userspace using a
dedicated coalesced MMIO buffer should be fine then. Unfortunately,
since we cannot use the shared ring buffer that exists now, this is too
much work for a small gain that only new QEMU would be able to enjoy.

--
Gleb.


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 03:05:58PM +0200, Jan Kiszka wrote:
 On 2012-10-22 14:58, Avi Kivity wrote:
  On 10/22/2012 02:55 PM, Jan Kiszka wrote:
  Since the userspace change is needed the idea is dead, but if we could
  implement it I do not see how it can hurt the latency if it would be the
  only mechanism to use coalesced mmio buffer. Checking that the ring buffer
  is empty is cheap and if it is not empty it means that kernel just saved
  you a lot of 8-byte exits so even after iterating over all the entries 
  there
  you still saved a lot of time.
 
  When taking an exit for A, I'm not interested in flushing stuff for B
  unless I have a dependency. Thus, buffers would have to be per device
  before extending their use.
  
  Any mmio exit has to flush everything.  For example a DMA caused by an
  e1000 write has to see any writes to the framebuffer, in case the guest
  is transmitting its framebuffer to the outside world.
 
 We already flush when that crazy guest actually accesses the region, no
 need to do this unconditionally.
 
What if framebuffer is accessed from inside the kernel? Is this case handled?

--
Gleb.


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Jan Kiszka
On 2012-10-22 15:08, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 03:05:58PM +0200, Jan Kiszka wrote:
 On 2012-10-22 14:58, Avi Kivity wrote:
 On 10/22/2012 02:55 PM, Jan Kiszka wrote:
 Since the userspace change is needed the idea is dead, but if we could
 implement it I do not see how it can hurt the latency if it would be the
 only mechanism to use coalesced mmio buffer. Checking that the ring buffer
 is empty is cheap and if it is not empty it means that kernel just saved
 you a lot of 8-byte exits so even after iterating over all the entries 
 there
 you still saved a lot of time.

 When taking an exit for A, I'm not interested in flushing stuff for B
 unless I have a dependency. Thus, buffers would have to be per device
 before extending their use.

 Any mmio exit has to flush everything.  For example a DMA caused by an
 e1000 write has to see any writes to the framebuffer, in case the guest
 is transmitting its framebuffer to the outside world.

 We already flush when that crazy guest actually accesses the region, no
 need to do this unconditionally.

 What if framebuffer is accessed from inside the kernel? Is this case handled?

Unless I miss a case now, there is no direct access to the framebuffer
possible when we are also doing coalescing. Everything needs to go
through userspace.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


FW: cgroup blkio.weight working, but not for KVM guests

2012-10-22 Thread Ben Clay
Forwarding this to the KVM general list.  I doubt you folks can help me with
libvirt, but I was wondering if there’s some way to verify if the cache=none
parameter is being respected for my KVM guest’s disk image, or if there are
any other configuration/debug steps appropriate for KVM + virtio + cgroup.

Thanks.

Ben Clay
rbc...@ncsu.edu



From: Ben Clay [mailto:rbc...@ncsu.edu] 
Sent: Wednesday, October 17, 2012 11:31 AM
To: libvirt-us...@redhat.com
Subject: cgroup blkio.weight working, but not for KVM guests

I’m running libvirt 0.10.2 and qemu-kvm-1.2.0, both compiled from source, on
CentOS 6.  I’ve got a working blkio cgroup hierarchy which I’m attaching
guests to using the following XML guest configs:

VM1 (foreground):

  <cputune>
    <shares>2048</shares>
  </cputune>
  <blkiotune>
    <weight>1000</weight>
  </blkiotune>

VM2 (background):

  <cputune>
    <shares>2</shares>
  </cputune>
  <blkiotune>
    <weight>100</weight>
  </blkiotune>

I’ve tested write throughput on the host using cgexec and dd, demonstrating
that libvirt has correctly set up the cgroups:

cgexec -g blkio:libvirt/qemu/foreground time dd if=/dev/zero of=trash1.img oflag=direct bs=1M count=4096 &
cgexec -g blkio:libvirt/qemu/background time dd if=/dev/zero of=trash2.img oflag=direct bs=1M count=4096 &

Snap from iotop, showing an 8:1 ratio (should be 10:1, but 8:1 is
acceptable):

Total DISK READ: 0.00 B/s | Total DISK WRITE: 91.52 M/s
  TID  PRIO  USER DISK READ  DISK WRITE  SWAPIN IO    COMMAND
9602 be/4 root    0.00 B/s   10.71 M/s  0.00 % 98.54 % dd if=/dev/zero
of=trash2.img oflag=direct bs=1M count=4096
9601 be/4 root    0.00 B/s   80.81 M/s  0.00 % 97.76 % dd if=/dev/zero
of=trash1.img oflag=direct bs=1M count=4096

Further, checking the task list inside each cgroup shows the guest’s main
PID, plus those of the virtio kernel threads.  It’s hard to tell if all the
virtio kernel threads are listed, but all the ones I’ve hunted down appear
to be there.

However, when running the same dd commands inside the guests, I get
roughly-equal performance – nowhere near the ~8:1 relative bandwidth
enforcement I get from the host: (background ctrl-c’d right after foreground
finishes, both started within 1s of each other)

[ben@foreground ~]$ dd if=/dev/zero of=trash1.img oflag=direct bs=1M
count=4096
4096+0 records in
4096+0 records out
4294967296 bytes (4.3 GB) copied, 104.645 s, 41.0 MB/s

[ben@background ~]$ dd if=/dev/zero of=trash2.img oflag=direct bs=1M
count=4096
^C4052+0 records in
4052+0 records out
4248829952 bytes (4.2 GB) copied, 106.318 s, 40.0 MB/s

I thought based on this statement: “Currently, the Block I/O subsystem does
not work for buffered write operations. It is primarily targeted at direct
I/O, although it works for buffered read operations.” from this page:
https://access.redhat.com/knowledge/docs/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/ch-Subsystems_and_Tunable_Parameters.html
that this problem might be due to host-side buffering, but I have that
explicitly disabled in my guest configs:

  <devices>
    <emulator>/usr/bin/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/path/to/disk.img'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

Here is the qemu line from ps, showing that it’s clearly being passed
through from the guest XML config:

root  5110 20.8  4.3 4491352 349312 ?  Sl   11:58   0:38
/usr/bin/qemu-kvm -name background -S -M pc-1.2 -enable-kvm -m 2048
-smp 2,sockets=2,cores=1,threads=1 -uuid ea632741-c7be-36ab-bd69-da3cbe505b38
-no-user-config -nodefaults
-chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/background.monitor,server,nowait
-mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc
-no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2
-drive file=/path/to/disk.img,if=none,id=drive-virtio-disk0,format=raw,cache=none
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x4,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=22
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:11:22:33:44:55,bus=pci.0,addr=0x3
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0
-device usb-tablet,id=input0 -vnc 127.0.0.1:1 -vga cirrus
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x5

For fun I tried a few different cache options to try to force a bypass of the
host buffer cache, including writethrough and directsync, but the number of
virtio kernel threads appeared to explode (especially for directsync) and
the throughput dropped quite low: ~50% of “none” for writethrough and ~5%
for directsync.

With cache=none, when I generate write loads inside the VMs, I do see growth
in the host’s buffer cache.  Further, if I use non-direct I/O inside the
VMs, and inflate the balloon (forcing the guest’s buffer cache to flush), I
don’t see a 

KVM call agenda for 2012-10-23

2012-10-22 Thread Juan Quintela

Hi

Please send in any agenda topics you are interested in.

Later, Juan.


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Avi Kivity
On 10/19/2012 09:37 AM, Xiao Guangrong wrote:
 After commit b3356bf0dbb349 (KVM: emulator: optimize rep ins handling),
 the pieces of io data can be collected and written to guest memory or
 MMIO together.
 
 Unfortunately, kvm splits the mmio access into 8-byte pieces and stores them in
 vcpu->mmio_fragments. If the guest uses rep ins to move large data, it
 will cause vcpu->mmio_fragments to overflow.
 
 The bug can be exposed by isapc (-M isapc):
 
 [23154.818733] general protection fault:  [#1] SMP DEBUG_PAGEALLOC
 [ ..]
 [23154.858083] Call Trace:
 [23154.859874]  [a04f0e17] kvm_get_cr8+0x1d/0x28 [kvm]
 [23154.861677]  [a04fa6d4] kvm_arch_vcpu_ioctl_run+0xcda/0xe45 [kvm]
 [23154.863604]  [a04f5a1a] ? kvm_arch_vcpu_load+0x17b/0x180 [kvm]
 
 
 Actually, we can use one mmio_fragment to store a large mmio access, since the
 mmio access is always contiguous, then split it when we pass the mmio-exit-info
 to userspace.

Note, there are instructions that can access discontinuous areas.  We don't 
emulate them and they're unlikely to be used for mmio.

 After that, we only need two entries to store mmio info for
 accesses that cross mmio pages.

Patch is good, but is somewhat large for 3.7.  Maybe we can make it smaller 
with the following:

 
 diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
 index 8b90dd5..41ceb51 100644
 --- a/arch/x86/kvm/x86.c
 +++ b/arch/x86/kvm/x86.c
 @@ -3779,9 +3779,6 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
  static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa,
  void *val, int bytes)
  {
 -	struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0];
 -
 -	memcpy(vcpu->run->mmio.data, frag->data, frag->len);
   return X86EMUL_CONTINUE;
  }
 
 @@ -3799,6 +3796,64 @@ static const struct read_write_emulator_ops write_emultor = {
   .write = true,
  };
 
 +static bool get_current_mmio_info(struct kvm_vcpu *vcpu, gpa_t *gpa,
 +				  unsigned *len, void **data)
 +{
 +	struct kvm_mmio_fragment *frag;
 +	int cur = vcpu->mmio_cur_fragment;
 +
 +	if (cur >= vcpu->mmio_nr_fragments)
 +		return false;
 +
 +	frag = &vcpu->mmio_fragments[cur];
 +	if (frag->pos >= frag->len) {
 +		if (++vcpu->mmio_cur_fragment >= vcpu->mmio_nr_fragments)
 +			return false;
 +		frag++;
 +	}

Instead of having ->pos, just adjust ->gpa, ->data, and ->len in place.  Then
get_current_mmio_info would be unneeded, just the min() bit when accessing
->len.

 +
 + *gpa = frag->gpa + frag->pos;
 + *data = frag->data + frag->pos;
 + *len = min(8u, frag->len - frag->pos);
 + return true;
 +}
 +
 +static void complete_current_mmio(struct kvm_vcpu *vcpu)
 +{
 + struct kvm_mmio_fragment *frag;
 + gpa_t gpa;
 + unsigned len;
 + void *data;
 +
 + get_current_mmio_info(vcpu, &gpa, &len, &data);
 +
 + if (!vcpu->mmio_is_write)
 + memcpy(data, vcpu->run->mmio.data, len);
 +
 + /* Increase frag->pos to switch to the next mmio. */
 + frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
 + frag->pos += len;
 +}
 +


And this would be unneeded, just adjust the code that does mmio_cur_fragment++:

 static int complete_emulated_mmio(struct kvm_vcpu *vcpu)
 {
	struct kvm_run *run = vcpu->run;
-	struct kvm_mmio_fragment *frag;
+	struct kvm_mmio_fragment frag;

	BUG_ON(!vcpu->mmio_needed);

	/* Complete previous fragment */
-	frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++];
+	frag = vcpu->mmio_fragments[vcpu->mmio_cur_fragment];
+	if (frag.len <= 8) {
+		++vcpu->mmio_cur_fragment;
+	} else {
+		vcpu->mmio_fragments[vcpu->mmio_cur_fragment].len -= frag.len;
...
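
A minimal sketch of the in-place variant suggested here, assuming illustrative names rather than the real x86.c fields:

```c
/* Sketch of the in-place idea: drop the ->pos cursor entirely and
 * instead advance gpa/data and shrink len as each chunk completes.
 * Struct and helper names are illustrative, not the kvm code. */
struct frag {
	unsigned long gpa;
	unsigned char *data;
	unsigned len;
};

/* The per-exit chunk size is then simply min(8u, f->len). */
static unsigned chunk_len(const struct frag *f)
{
	return f->len < 8 ? f->len : 8;
}

/* Returns the remaining length; 0 means the fragment is finished
 * and mmio_cur_fragment can be advanced. */
static unsigned advance_fragment(struct frag *f, unsigned done)
{
	f->gpa  += done;
	f->data += done;
	f->len  -= done;
	return f->len;
}
```

With this shape there is no separate completion helper: the exit path consumes chunk_len() bytes and calls advance_fragment() when the exit completes.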





-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Gleb Natapov
On Mon, Oct 22, 2012 at 03:25:49PM +0200, Jan Kiszka wrote:
 On 2012-10-22 15:08, Gleb Natapov wrote:
  On Mon, Oct 22, 2012 at 03:05:58PM +0200, Jan Kiszka wrote:
  On 2012-10-22 14:58, Avi Kivity wrote:
  On 10/22/2012 02:55 PM, Jan Kiszka wrote:
 Since the userspace change is needed the idea is dead, but if we could
 implement it I do not see how it can hurt the latency if it would be the
 only mechanism to use the coalesced mmio buffer. Checking that the ring
 buffer is empty is cheap, and if it is not empty it means that the kernel
 just saved you a lot of 8-byte exits, so even after iterating over all
 the entries there you still saved a lot of time.
 
 When taking an exit for A, I'm not interested in flushing stuff for B
 unless I have a dependency. Thus, buffers would have to be per device
 before extending their use.
 
  Any mmio exit has to flush everything.  For example a DMA caused by an
  e1000 write has to see any writes to the framebuffer, in case the guest
  is transmitting its framebuffer to the outside world.
 
  We already flush when that crazy guest actually accesses the region, no
  need to do this unconditionally.
 
  What if the framebuffer is accessed from inside the kernel? Is this case
  handled?
 
 Unless I miss a case now, there is no direct access to the framebuffer
 possible when we are also doing coalescing. Everything needs to go
 through userspace.
 
Yes, with the frame buffer it seems to be the case. One can imagine a ROMD
device that is MMIO on write but can still be accessed for read from the
kernel; it cannot be coalesced even if the coalesced buffer is flushed
on every exit.
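
The "checking that the ring buffer is empty is cheap" point earlier in the thread can be sketched as a userspace drain loop. The structs below only mirror the spirit of the <linux/kvm.h> coalesced-MMIO ring layout, and handle_mmio_write is a hypothetical userspace handler:

```c
/* Sketch of a userspace drain of the coalesced-MMIO ring.  The kernel
 * appends entries at 'last'; userspace consumes from 'first'.  The
 * empty check is a single comparison, which is the cheap fast path. */
#define COALESCED_MMIO_MAX 64

struct coalesced_mmio {
	unsigned long long phys_addr;
	unsigned int len;
	unsigned int pad;
	unsigned char data[8];
};

struct coalesced_mmio_ring {
	unsigned int first;	/* consumed by userspace */
	unsigned int last;	/* produced by the kernel */
	struct coalesced_mmio entries[COALESCED_MMIO_MAX];
};

static unsigned drain_coalesced_ring(struct coalesced_mmio_ring *ring,
		void (*handle_mmio_write)(unsigned long long addr,
					  const void *data, unsigned len))
{
	unsigned handled = 0;

	while (ring->first != ring->last) {
		struct coalesced_mmio *e = &ring->entries[ring->first];

		handle_mmio_write(e->phys_addr, e->data, e->len);
		ring->first = (ring->first + 1) % COALESCED_MMIO_MAX;
		handled++;
	}
	return handled;
}
```

This is roughly what userspace already does on every coalesced-region access; the debate above is about whether to run it on every exit.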

--
Gleb.


Re: How to do fast accesses to LAPIC TPR under kvm?

2012-10-22 Thread Avi Kivity
On 10/20/2012 12:39 AM, Stefan Fritsch wrote:
 On Thursday 18 October 2012, Avi Kivity wrote:
 On 10/18/2012 11:35 AM, Gleb Natapov wrote:
  You misunderstood the description. V_INTR_MASKING=1 means that
  CR8 writes are not propagated to real HW APIC.
  
  But KVM does not trap CR8 accesses unconditionally. It enables
  the CR8 intercept only when there is a pending interrupt in the IRR
  that cannot be immediately delivered due to the current TPR value.
  This should eliminate 99% of CR8 intercepts.
 
 Right.  You will need to expose the alternate encoding of cr8 (IIRC
 lock mov reg, cr0) on AMD via cpuid, but otherwise it should just
 work.  Be aware that this will break cross-vendor migration.
 
 I get an exception and I am not sure why:
 
 kvm_entry: vcpu 0
 kvm_exit: reason write_cr8 rip 0xd0203788 info 0 0
 kvm_emulate_insn: 0:d0203788: f0 0f 22 c0 (prot32)
 kvm_inj_exception: #UD (0x0)
 
 This is qemu-kvm 1.1.2 on Linux 3.2.
 
 When I look at arch/x86/kvm/emulate.c (both the current and the v3.2 
 version), I don't see any special-case handling for 'lock mov reg, 
 cr0' to mean 'mov reg, cr8'.

emulate.c will #UD if the Lock flag is missing in the instruction decode
table.

 Before I spend lots of time on debugging my code, can you verify if 
 the alternate encoding of cr8 is actually supported in kvm or if it is 
 maybe missing? Thanks in advance.

With the decode table fix I think it should work.
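
For readers following along, the alternate encoding under discussion works like this: on AMD, a LOCK prefix on "mov reg, crN" selects CRN+8, so the trace's f0 0f 22 c0 should decode as a CR8 access. The decoder below is an illustrative sketch, not the real emulate.c, which also has to gate this on CPUID and validate the result:

```c
/* Sketch of the AMD alternate CR8 encoding: a LOCK prefix shifts the
 * control-register index by 8, so "lock mov reg, cr0" means cr8.
 * Returns the CR index, or -1 where real hardware would raise #UD. */
static int decode_cr_index(unsigned modrm_reg, int has_lock_prefix)
{
	unsigned cr = modrm_reg;	/* reg field of the ModRM byte */

	if (has_lock_prefix)
		cr += 8;		/* lock mov reg, cr0 -> cr8 */
	return cr <= 8 ? (int)cr : -1;
}
```

For f0 0f 22 c0 the ModRM reg field is 0, so with the LOCK prefix the decoder yields CR8, matching the trace above.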


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Jan Kiszka
On 2012-10-22 16:00, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 03:25:49PM +0200, Jan Kiszka wrote:
 On 2012-10-22 15:08, Gleb Natapov wrote:
 On Mon, Oct 22, 2012 at 03:05:58PM +0200, Jan Kiszka wrote:
 On 2012-10-22 14:58, Avi Kivity wrote:
 On 10/22/2012 02:55 PM, Jan Kiszka wrote:
 Since the userspace change is needed the idea is dead, but if we could
 implement it I do not see how it can hurt the latency if it would be the
 only mechanism to use the coalesced mmio buffer. Checking that the ring
 buffer is empty is cheap, and if it is not empty it means that the kernel
 just saved you a lot of 8-byte exits, so even after iterating over all
 the entries there you still saved a lot of time.

 When taking an exit for A, I'm not interested in flushing stuff for B
 unless I have a dependency. Thus, buffers would have to be per device
 before extending their use.

 Any mmio exit has to flush everything.  For example a DMA caused by an
 e1000 write has to see any writes to the framebuffer, in case the guest
 is transmitting its framebuffer to the outside world.

 We already flush when that crazy guest actually accesses the region, no
 need to do this unconditionally.

 What if the framebuffer is accessed from inside the kernel? Is this case
 handled?

 Unless I miss a case now, there is no direct access to the framebuffer
 possible when we are also doing coalescing. Everything needs to go
 through userspace.

 Yes, with the frame buffer it seems to be the case. One can imagine a ROMD
 device that is MMIO on write but can still be accessed for read from the
 kernel; it cannot be coalesced even if the coalesced buffer is flushed
 on every exit.

Usually, a ROMD device has stable content as long as it is in fast-read/
slow-write mode. Once it switches mode, it is slow read as well.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux


Re: [PATCH RFC V2 0/5] Separate consigned (expected steal) from steal time.

2012-10-22 Thread Rik van Riel

On 10/16/2012 10:23 PM, Michael Wolf wrote:

In the case where you have a system running in a capped or
overcommitted environment, the user may see steal time being
reported in accounting tools such as top or vmstat.  This can
cause confusion for the end user.


How do s390 and Power systems deal with reporting that kind
of information?

IMHO it would be good to see what those do, so we do not end
up re-inventing the wheel, and confusing admins with yet another
way of reporting the information...

--
All rights reversed


Re: [PATCH] KVM: x86: fix vcpu->mmio_fragments overflow

2012-10-22 Thread Avi Kivity
On 10/22/2012 04:00 PM, Gleb Natapov wrote:
 Yes, with the frame buffer it seems to be the case. One can imagine a ROMD
 device that is MMIO on write but can still be accessed for read from the
 kernel; it cannot be coalesced even if the coalesced buffer is flushed
 on every exit.

You cannot enable coalescing on such a device.


-- 
error compiling committee.c: too many arguments to function


Re: [PATCH 0/3] KVM_VCPU_GET_REG_LIST API

2012-10-22 Thread Will Deacon
On Mon, Oct 22, 2012 at 04:09:06AM +0100, Rusty Russell wrote:
 Christoffer Dall c.d...@virtualopensystems.com writes:
  On Fri, Oct 19, 2012 at 2:19 AM, Rusty Russell ru...@rustcorp.com.au 
  wrote:
  Wait, what?  kvm/arm isn't in kvm-next?
  Christoffer, is there anything I can help with?
 
  Specifically there are worries about the instruction decoding for the
  mmio instructions. My cycles are unfortunately too limited to change
  this right now and I'm also not sure I agree things will turn out
  nicer by unifying all decoding into a large complicated space ship,
  but it would be great if you could take a look. This discussion seems
  to be a good place to start:
 
  https://lists.cs.columbia.edu/pipermail/kvmarm/2012-September/003447.html
 
 They're still asking you to boil that ocean??
 
 I could create a struct and do simple decode into it for limited cases
 (ie. for kvm).  Will, do you want to see that?

Yes, I think that would be great! Basically, I'd like the code to be
reusable so that other subsystems (including uprobes as of last week) can
plug into it if possible. The actual plugging in of those subsystems is
obviously not up to you.

Dave posted an idea here:

  http://lists.infradead.org/pipermail/linux-arm-kernel/2012-October/123464.html

which could form a basic starting block for something like load/store
decoding. It looks like Christoffer has actually done a bunch of this for
v3.

 But unifying them all is a much larger task, and only when that's all
 done can you judge whether it was worthwhile.  I've spent half an hour
 looking and each case is subtly different, and the conversion has to be
 incredibly careful not to break them.  And converting opcodes.c is just
 ideology; it's great as it is.

opcodes.c may be where this stuff ultimately ends up. In the meantime, you
could try moving what you have for v3 into common ARM code so that other
people can try to use it. In fact, if you don't even want to do that, just
put it in its own file under arch/arm/kvm/ so that the interface to
emulate.c doesn't make it too hard to move around in future.

 I'm interested in the 64-bit ARM kvm, because it would be nice to unify
 the two implementations.  But the ABI will be different anyway (64 bit
 regs get their own id space even if nothing else changes).

Presumably you'll need a wrapper around the 32-bit ABI in order to run 32-bit
guests under a 64-bit kernel, so unification is definitely a good idea.

Will


Re: [PATCH 0/3] KVM_VCPU_GET_REG_LIST API

2012-10-22 Thread Christoffer Dall
On Mon, Oct 22, 2012 at 1:45 PM, Will Deacon will.dea...@arm.com wrote:
 On Mon, Oct 22, 2012 at 04:09:06AM +0100, Rusty Russell wrote:
 Christoffer Dall c.d...@virtualopensystems.com writes:
  On Fri, Oct 19, 2012 at 2:19 AM, Rusty Russell ru...@rustcorp.com.au 
  wrote:
  Wait, what?  kvm/arm isn't in kvm-next?
  Christoffer, is there anything I can help with?
 
  Specifically there are worries about the instruction decoding for the
  mmio instructions. My cycles are unfortunately too limited to change
  this right now and I'm also not sure I agree things will turn out
  nicer by unifying all decoding into a large complicated space ship,
  but it would be great if you could take a look. This discussion seems
  to be a good place to start:
 
  https://lists.cs.columbia.edu/pipermail/kvmarm/2012-September/003447.html

 They're still asking you to boil that ocean??

 I could create a struct and do simple decode into it for limited cases
 (ie. for kvm).  Will, do you want to see that?

 Yes, I think that would be great! Basically, I'd like the code to be
 reusable so that other subsystems (including uprobes as of last week) can
 plug into it if possible. The actual plugging in of those subsystems is
 obviously not up to you.

 Dave posted an idea here:

   
 http://lists.infradead.org/pipermail/linux-arm-kernel/2012-October/123464.html

 which could form a basic starting block for something like load/store
 decoding. It looks like Christoffer has actually done a bunch of this for
 v3.


The issue is that the decoding assumes that you're only going to
decode instructions that don't carry decode information in the HSR.
For example, we don't do anything special about unprivileged
load/store, because they would have failed in their stage 1
translation and we wouldn't even get to the decoding in KVM. There are
also a number of corner cases such as loading into the PC from an MMIO
address that we don't need to worry about as we simply dismiss it as
being insane guest behavior. Another example is all the checking of
the write-back cases, since we can simply assume that there will be a
write-back in case we're decoding anything.
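
The decode information carried in the HSR that Christoffer refers to looks roughly like this. Bit positions follow the ARMv7 ARM's data-abort ISS encoding as I recall it (double-check against the spec); the struct and helper are illustrative, not the kvm/arm code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch: extract load/store syndrome info from the HSR ISS field
 * for a data abort.  When ISV is set, the hardware has already done
 * most of the instruction decoding for us. */
struct mmio_decode {
	bool     valid;		/* ISV, bit 24: syndrome info present */
	unsigned len;		/* SAS, bits 23:22: 1, 2 or 4 bytes */
	bool     sign_extend;	/* SSE, bit 21 */
	unsigned rt;		/* SRT, bits 19:16: transfer register */
	bool     is_write;	/* WnR, bit 6 */
};

static struct mmio_decode decode_hsr_iss(uint32_t hsr)
{
	struct mmio_decode d;

	d.valid       = (hsr >> 24) & 1;
	d.len         = 1u << ((hsr >> 22) & 3);
	d.sign_extend = (hsr >> 21) & 1;
	d.rt          = (hsr >> 16) & 0xf;
	d.is_write    = (hsr >> 6) & 1;
	return d;
}
```

When ISV is clear (e.g. for load/store multiples), none of these fields are valid, which is exactly the case where software decoding of the faulting instruction would be needed.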

We also don't decode load/store multiples (although Antonios Motakis
did work up a patch for this some time ago for the ARM mode), since
the user space ABI for communicating the MMIO operations doesn't
support multiple registers, and reworking that at the
architecture-generic level for some theoretical non-existent guest is
simply not worth the time.

The point is that we cannot just take the code that is there now and
make it available generically within arch/arm without a lot of work,
including coming up with a test framework and verifying it, and as Rusty
says, even then we don't know whether the whole thing will look so
complicated that we deem it not worth the effort in the end.

And, to repeat my favorite argument, this can always be changed after
merging KVM. Alternatively, we can remove the mmio encoding completely
from the patch series and rely on your patch for mmio accessors in any
guest kernel, but I think this is a shame for people wanting to try
slightly more exotic things like an older kernel, when we have code
that's working.

Please, please, consider signing off on the current stages of the
patches if nothing else ground breaking pops up.

 But unifying them all is a much larger task, and only when that's all
 done can you judge whether it was worthwhile.  I've spent half an hour
 looking and each case is subtly different, and the conversion has to be
 incredibly careful not to break them.  And converting opcodes.c is just
 ideology; it's great as it is.

 opcodes.c may be where this stuff ultimately ends up. In the meantime, you
 could try moving what you have for v3 into common ARM code so that other
 people can try to use it. In fact, if you don't even want to do that, just
 put it in its own file under arch/arm/kvm/ so that the interface to
 emulate.c doesn't make it too hard to move around in future.

 I'm interested in the 64-bit ARM kvm, because it would be nice to unify
 the two implementations.  But the ABI will be different anyway (64 bit
 regs get their own id space even if nothing else changes).

 Presumably you'll need a wrapper around the 32-bit ABI in order to run 32-bit
 guests under a 64-bit kernel, so unification is definitely a good idea.

 Will


3.7-rc2 build failure on s390x

2012-10-22 Thread Alexander Graf
Hi Christian,

During our normal Factory kernel builds, s390x seems to choke:

/home/abuild/rpmbuild/BUILD/kernel-default-3.7.rc2/linux-3.7-rc2/arch/s390/include/asm/kvm_para.h:147:99:
 error: redefinition of 'kvm_arch_para_features'
/home/abuild/rpmbuild/BUILD/kernel-default-3.7.rc2/linux-3.7-rc2/arch/s390/include/asm/kvm_para.h:152:91:
 error: redefinition of 'kvm_check_and_clear_guest_paused'
/home/abuild/rpmbuild/BUILD/kernel-default-3.7.rc2/linux-3.7-rc2/include/asm-generic/kvm_para.h:11:91:
 note: previous definition of 'kvm_check_and_clear_guest_paused' was here
make[4]: *** [arch/s390/kvm/../../../virt/kvm/kvm_main.o] Error 1
make[3]: *** [arch/s390/kvm] Error 2
make[3]: *** Waiting for unfinished jobs

Any idea what the culprit for this could be?


Thanks,

Alex



fsfreeze support in qemu-ga win32

2012-10-22 Thread Abbas

(resending as plain-text)

To Michael Roth and KVM devs,

It is finally nice to be able to have a guest agent for Windows. I 
really appreciate your collective efforts.


Is it possible to have fs-freeze-thaw support in it too?

Trying to use libvirt's snapshot feature with the --quiesce switch, which 
depends on the guest-fsfreeze* commands. With this guest agent the 
snapshotting fails with the error:


error: internal error unable to execute QEMU command 
'guest-fsfreeze-freeze': this feature or command is not currently supported




/Abbas.



Re: [PATCH] vhost-blk: Add vhost-blk support v2

2012-10-22 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Thu, Oct 18, 2012 at 02:50:56PM +1030, Rusty Russell wrote:
 Asias He as...@redhat.com writes:
  +#define BLK_HDR 0
  
  What's this for, exactly? Please add a comment.
 
  The block header is in the first and separate buffer.
 
 Please don't assume this!  We're trying to fix all the assumptions in
 qemu at the moment.
 
 vhost_net handles this correctly, taking bytes off the descriptor chain
 as required.
 
 Thanks,
 Rusty.

 BTW are we agreed on the spec update that makes cmd 32 bytes?

vhost-blk doesn't handle scsi requests, does it?

But since we're forced to use a feature bit, we could just put the cmd
size in explicitly.  Though Paolo seems convinced that 32 is always
sufficient.  Whoever implements it gets to decide...
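
The "taking bytes off the descriptor chain as required" approach mentioned above can be sketched as a generic iovec helper: pull a header out of the chain even if it straddles buffer boundaries, and advance the chain past it. This is illustrative, not the vhost code:

```c
#include <string.h>
#include <sys/uio.h>

/* Sketch: copy up to 'count' bytes from the front of an iovec array
 * into 'buf', consuming them from the array in place.  The header may
 * span several buffers; no layout assumption is made.  Returns the
 * number of bytes actually pulled. */
static size_t iov_pull(struct iovec **iov, unsigned *niov,
		       void *buf, size_t count)
{
	size_t done = 0;

	while (done < count && *niov) {
		size_t n = (*iov)->iov_len;

		if (n > count - done)
			n = count - done;
		memcpy((char *)buf + done, (*iov)->iov_base, n);
		/* consume the bytes from this element */
		(*iov)->iov_base = (char *)(*iov)->iov_base + n;
		(*iov)->iov_len -= n;
		done += n;
		if ((*iov)->iov_len == 0) {	/* element exhausted */
			(*iov)++;
			(*niov)--;
		}
	}
	return done;
}
```

With a helper like this, a device can read its request header without caring whether the guest put it in its own descriptor or merged it with the payload.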

Here's my TODO list:
1) Create qemu helpers to efficiently handle iovecs.
2) Switch all the qemu devices to use them.
3) ... except a special hack for virtio-blk in old-layout mode.
4) Implement pci capability layout RFC for qemu.
   - Detect whether guest uses capabilities.
   - Device config in new mode is le.
   - Add strict checking mode for extra compliance checks?
5) Add explicit size-based accessors to virtio_config in kernel.
6) Update pci capability RFC patches for linux to match.
   - Use explicit accessors to allow for endian conversion.
7) Push virtio torture patch to test variable boundaries.
8) Update spec.

That should keep me amused for a while...

Cheers,
Rusty.