Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote:
> Gleb Natapov g...@redhat.com writes:
>
> > On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote:
> > > H. Peter Anvin h...@zytor.com writes:
> > >
> > > > On 06/05/2013 03:08 PM, Anthony Liguori wrote:
> > > > > > Definitely an option. However, we want to be able to boot from
> > > > > > native devices, too, so having an I/O BAR (which would not be
> > > > > > used by the OS driver) should still at the very least be an
> > > > > > option.
> > > > >
> > > > > What makes it so difficult to work with an MMIO bar for PCI-e?
> > > > >
> > > > > With legacy PCI, tracking allocation of MMIO vs. PIO is pretty
> > > > > straight forward. Is there something special about PCI-e here?
> > > >
> > > > It's not tracking allocation. It is that accessing memory above
> > > > 1 MiB is incredibly painful in the BIOS environment, which
> > > > basically means MMIO is inaccessible.
> > >
> > > Oh, you mean in real mode.
> > >
> > > SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout.
> > > There are loads of ASSERT32FLAT()s in the code to make sure of this.
> >
> > Well, not exactly. Initialization is done in 32bit, but disk
> > reads/writes are done in 16bit mode since it should work from int13
> > interrupt handler. The only way I know to access MMIO bars from 16 bit
> > is to use SMM which we do not have in KVM.
>
> Ah, if it's just the dataplane operations then there's another solution.
>
> We can introduce a virtqueue flag that asks the backend to poll for new
> requests. Then SeaBIOS can add the request to the queue and not worry
> about kicking or reading the ISR.

This will pin a host CPU.
If we do something timer based it will likely
both increase host CPU utilization and slow device down.

If we didn't care about performance at all we could
do config cycles for signalling, which is much
more elegant than polling in host, but I don't think
that's the case.

> SeaBIOS is polling for completion anyway.

I think that's different because a disk will normally respond
quickly. So it polls a bit, but then it stops as
there are no outstanding requests.
-- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] add irq priodrop support
This is the same Interrupt Priority Drop/Deactivation patch emailed some
time back (except that it is now against 3.10-rc4), used by the initial
device pass-through support.

When enabled, all IRQs on the host write to the GIC CPU interface EOIR and
DIR registers to de-prioritize and deactivate an interrupt. For a device
that is passed through, only the EOIR is written to drop the priority; the
Guest deactivates the interrupt when it handles its EOI. This supports
exitless EOI that is agnostic to bus type (i.e. not specific to PCI).

The patch has been tested for all configurations:

  Host: No Prio Drop   Guest: No Prio Drop
  Host: Prio Drop      Guest: No Prio Drop
  Host: Prio Drop      Guest: Prio Drop

- Mario

Signed-off-by: Mario Smarduch mario.smard...@huawei.com
---
 arch/arm/kvm/Kconfig            |   8 +++
 drivers/irqchip/irq-gic.c       | 145 ++-
 include/linux/irqchip/arm-gic.h |   6 ++
 3 files changed, 156 insertions(+), 3 deletions(-)

diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
index 370e1a8..c0c9f3c 100644
--- a/arch/arm/kvm/Kconfig
+++ b/arch/arm/kvm/Kconfig
@@ -59,6 +59,14 @@ config KVM_ARM_VGIC
 	---help---
 	  Adds support for a hardware assisted, in-kernel GIC emulation.
 
+config KVM_ARM_INT_PRIO_DROP
+	bool "KVM support for Interrupt pass-through"
+	depends on KVM_ARM_VGIC && OF
+	default n
+	---help---
+	  Separates interrupt priority drop and deactivation to enable device
+	  pass-through to Guests.
+
 config KVM_ARM_TIMER
 	bool "KVM support for Architected Timers"
 	depends on KVM_ARM_VGIC && ARM_ARCH_TIMER
diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index 1760ceb..9fb4ef3 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -41,10 +41,13 @@
 #include <linux/slab.h>
 #include <linux/irqchip/chained_irq.h>
 #include <linux/irqchip/arm-gic.h>
+#include <linux/irqflags.h>
+#include <linux/bitops.h>
 
 #include <asm/irq.h>
 #include <asm/exception.h>
 #include <asm/smp_plat.h>
+#include <asm/virt.h>
 
 #include "irqchip.h"
 
@@ -99,6 +102,20 @@ struct irq_chip gic_arch_extn = {
 
 static struct gic_chip_data gic_data[MAX_GIC_NR] __read_mostly;
 
+#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
+/*
+ * Priority drop/deactivation bit map, 1st 16 bits used for SGIs, this bit map
+ * is shared by several guests. If bit is set only execute EOI which drops
+ * current priority but not deactivation.
+ */
+static u32 gic_irq_prio_drop[DIV_ROUND_UP(1020, 32)] __read_mostly;
+static void gic_eoi_irq_priodrop(struct irq_data *);
+#endif
+
+static void gic_enable_gicc(void __iomem *);
+static void gic_eoi_sgi(u32, void __iomem *);
+static void gic_priodrop_remap_eoi(struct irq_chip *);
+
 #ifdef CONFIG_GIC_NON_BANKED
 static void __iomem *gic_get_percpu_base(union gic_base *base)
 {
@@ -296,7 +313,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
 			continue;
 		}
 		if (irqnr < 16) {
-			writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
+			gic_eoi_sgi(irqstat, cpu_base);
 #ifdef CONFIG_SMP
 			handle_IPI(irqnr, regs);
 #endif
@@ -450,7 +467,7 @@ static void __cpuinit gic_cpu_init(struct gic_chip_data *gic)
 		writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4 / 4);
 
 	writel_relaxed(0xf0, base + GIC_CPU_PRIMASK);
-	writel_relaxed(1, base + GIC_CPU_CTRL);
+	gic_enable_gicc(base);
 }
 
 #ifdef CONFIG_CPU_PM
@@ -585,7 +602,7 @@ static void gic_cpu_restore(unsigned int gic_nr)
 		writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4);
 
 	writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
-	writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
+	gic_enable_gicc(cpu_base);
 }
 
 static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
@@ -666,6 +683,7 @@ void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
 static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
 				irq_hw_number_t hw)
 {
+	gic_priodrop_remap_eoi(&gic_chip);
 	if (hw < 32) {
 		irq_set_percpu_devid(irq);
 		irq_set_chip_and_handler(irq, &gic_chip,
@@ -857,4 +875,125 @@ IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
 IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
 IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
 
+#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
+/* If HYP mode enabled and PRIO DROP set EOIR function to handle PRIO DROP */
+static inline void gic_priodrop_remap_eoi(struct irq_chip *chip)
+{
+	if (is_hyp_mode_available())
+		chip->irq_eoi = gic_eoi_irq_priodrop;
+}
+
+/* If HYP mode set enable interrupt priority drop/deactivation, and mark
+ * SGIs to deactivate through writes to GICC_DIR. For Guest only enable normal
+ * mode.
+ */
+static void gic_enable_gicc(void __iomem *gicc_base)
+{
+
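The bitmap described in the patch comment (one bit per interrupt ID, set when the host should only drop priority and leave deactivation to the Guest) can be illustrated with a small user-space sketch. This is not the patch's implementation: the register writes are replaced by counters, and `fake_gicc`, `mark_passthrough`, `is_passthrough` and `host_eoi` are hypothetical names introduced only for this example.

```c
#include <stdbool.h>
#include <stdint.h>

#define GIC_MAX_IRQS 1020                        /* same bound as in the patch */
#define BITMAP_WORDS ((GIC_MAX_IRQS + 31) / 32)  /* DIV_ROUND_UP(1020, 32) */

/* One bit per interrupt ID; a set bit means "priority drop only". */
static uint32_t prio_drop_map[BITMAP_WORDS];

/* Stand-in for the CPU interface: counts writes instead of touching MMIO. */
struct fake_gicc {
    unsigned eoir_writes;   /* writes that would go to EOIR (priority drop) */
    unsigned dir_writes;    /* writes that would go to DIR (deactivation) */
};

/* Mark or unmark an interrupt as passed through to a Guest. */
static void mark_passthrough(unsigned irq, bool on)
{
    if (irq >= GIC_MAX_IRQS)
        return;
    if (on)
        prio_drop_map[irq / 32] |= 1u << (irq % 32);
    else
        prio_drop_map[irq / 32] &= ~(1u << (irq % 32));
}

static bool is_passthrough(unsigned irq)
{
    return irq < GIC_MAX_IRQS &&
           ((prio_drop_map[irq / 32] >> (irq % 32)) & 1u);
}

/*
 * Host-side end of interrupt: always drop priority; deactivate only when
 * the interrupt is host-owned, otherwise the Guest deactivates it later.
 */
static void host_eoi(struct fake_gicc *gicc, unsigned irq)
{
    gicc->eoir_writes++;
    if (!is_passthrough(irq))
        gicc->dir_writes++;
}
```

With interrupt 34 marked as passed through, `host_eoi()` counts one EOIR write and no DIR write for it, while an unmarked interrupt gets both.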
[PATCH 2/2] add initial kvm dev passhtrough support
This is the initial device pass-through support.
At this time only host == guest is supported.
Basic Operation:

- QEMU parameters: -device kvm-device-assign,host=<device name>
  for example - kvm-device-assign,host='arm-sp804'. Essentially
  any device that does PIO should be supported.
- The host DTS contains the node for the device to be passed through.
  The host driver is unbound or not compiled in.
- For the Guest, the intent is to add a DTS node that QEMU can parse to
  find the guest attributes (Mem. resource, IRQs). For now these values
  default to the host's. Getting this working on boards other than
  vexpress is a future work item.
- The physical interrupt is always passed through to the CPU where the
  target vCPU executes or will execute. The current approach pins vCPUs to
  physical CPUs; when the Guest updates CPU affinity, it is updated in the
  KVM vgic dist code. A future work item for IRQ affinity is to allow the
  vCPU to float and to handle IRQ affinity on schedule-in. For high IRQ
  rates (e.g. wireless NEs) static binding may be used. For some other
  devices (env. mgmt IPMI) where latency is not important, dynamic binding
  may be used; it should be up to the user.
- To support flexible affinity a mask is introduced (QEMU param, although
  not used here yet):
  o vCPU affinity - vCPU -- CPU binding; the IRQ physical CPU binding
    follows vCPU binding dynamically.
- Obviously DMA is not supported. Early DMA may be supported through a 1:1
  mapping, but it's unsafe, and so far we don't know of any hardware that's
  not behind an SMMU. This option may be useful in some embedded/wireless
  environments, where the guest may want to swap, secure isolation may not
  be an issue, or a device like a look-aside crypto engine is not behind
  an IOMMU.
- IOMMU/VFIO support is key and the next item for us to work on. Especially
  for ETSI NFV, VFIO is key since 4G/IMS NEs pull packets off the wire and
  switch them directly in user space.

The patch has been tested on fast models in a couple of ways:

- UP Guest with sp804 timer only - works consistently
- SMP Guest with sp804 timer - works consistently. Writes to
  '/proc/irq/<sp804 irq>/smp_affinity' confirm dynamic CPU affinity.
- IRQ rates (maybe not that important given it's an emulated env) reached
  in excess of 500.

There is a QEMU piece, very simple for now, that I will email later in case
someone would like to test.

- Mario

Signed-off-by: Mario Smarduch mario.smard...@huawei.com
---
 arch/arm/include/asm/kvm_host.h |  14 +++
 arch/arm/include/asm/kvm_vgic.h |  10 +++
 arch/arm/kvm/Makefile           |   1 +
 arch/arm/kvm/arm.c              |  60 +
 arch/arm/kvm/assign-dev.c       | 189 +++
 arch/arm/kvm/vgic.c             | 106 ++
 include/linux/irqchip/arm-gic.h |   1 +
 include/uapi/linux/kvm.h        |  33 +++
 8 files changed, 414 insertions(+)
 create mode 100644 arch/arm/kvm/assign-dev.c

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 57cb786..c6ad3a3 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -67,6 +67,10 @@ struct kvm_arch {
 
 	/* Interrupt controller */
 	struct vgic_dist	vgic;
+
+	/* Device Passthrough Fields */
+	struct list_head	assigned_dev_head;
+	struct mutex		dev_pasthru_lock;
 };
 
 #define KVM_NR_MEM_OBJS 40
@@ -146,6 +150,13 @@ struct kvm_vcpu_stat {
 	u32 halt_wakeup;
 };
 
+struct kvm_arm_assigned_dev_kernel {
+	struct list_head list;
+	struct kvm_arm_assigned_device dev;
+	irqreturn_t (*irq_handler)(int, void *);
+	void *irq_arg;
+};
+
 struct kvm_vcpu_init;
 int kvm_vcpu_set_target(struct kvm_vcpu *vcpu,
 			const struct kvm_vcpu_init *init);
@@ -156,6 +167,9 @@ int kvm_arm_get_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 int kvm_arm_set_reg(struct kvm_vcpu *vcpu, const struct kvm_one_reg *reg);
 u64 kvm_call_hyp(void *hypfn, ...);
 void force_vm_exit(const cpumask_t *mask);
+int kvm_arm_get_device_resources(struct kvm *,
+				 struct kvm_arm_get_device_resources *);
+int kvm_arm_assign_device(struct kvm *, struct kvm_arm_assigned_device *);
 
 #define KVM_ARCH_WANT_MMU_NOTIFIER
 struct kvm;
diff --git a/arch/arm/include/asm/kvm_vgic.h b/arch/arm/include/asm/kvm_vgic.h
index 343744e..c4370ae 100644
--- a/arch/arm/include/asm/kvm_vgic.h
+++ b/arch/arm/include/asm/kvm_vgic.h
@@ -107,6 +107,16 @@ struct vgic_dist {
 
 	/* Bitmap indicating which CPU has something pending */
 	unsigned long		irq_pending_on_cpu;
+
+	/* Device passthrough fields */
+	/* Host irq to guest irq mapping */
+	u8			guest_irq[VGIC_NR_SHARED_IRQS];
+
+	/* Pending passthrough irq */
+	struct vgic_bitmap	pasthru_spi_pending;
+
+	/* At least one passthrough IRQ pending for some vCPU */
+	u32			pasthru_pending;
 #endif
 };
 
diff
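The three fields added to `struct vgic_dist` (a host-to-guest IRQ table, a pending bitmap, and an "anything pending" flag) suggest a bookkeeping pattern along the following lines. This is only a guess at the intended use, since the code that consumes these fields is cut off in this archive. The names `pt_dist`, `pt_map` and `pt_host_irq_fired` are hypothetical, the value 96 for the number of shared interrupts is an assumption, and using 0 to mean "unmapped" is a choice made for this sketch.

```c
#include <stdint.h>

#define NR_PRIVATE_IRQS 32  /* SGIs and PPIs; shared interrupts start at ID 32 */
#define NR_SHARED_IRQS  96  /* assumed value standing in for VGIC_NR_SHARED_IRQS */

/* User-space stand-in for the passthrough fields added to struct vgic_dist. */
struct pt_dist {
    uint8_t  guest_irq[NR_SHARED_IRQS];               /* host SPI -> guest IRQ, 0 = unmapped */
    uint32_t spi_pending[(NR_SHARED_IRQS + 31) / 32]; /* one bit per shared IRQ */
    uint32_t pending;                                 /* nonzero if any bit above is set */
};

/* Convert an interrupt ID to an index into the shared-IRQ arrays, or -1. */
static int pt_spi_index(unsigned host_irq)
{
    if (host_irq < NR_PRIVATE_IRQS ||
        host_irq >= NR_PRIVATE_IRQS + NR_SHARED_IRQS)
        return -1;
    return (int)(host_irq - NR_PRIVATE_IRQS);
}

/* Record that host_irq is delivered to the Guest as guest_irq. */
static int pt_map(struct pt_dist *d, unsigned host_irq, uint8_t guest_irq)
{
    int idx = pt_spi_index(host_irq);

    if (idx < 0)
        return -1;
    d->guest_irq[idx] = guest_irq;
    return 0;
}

/*
 * Called when a physical interrupt arrives: mark it pending for the Guest
 * and return the guest IRQ number, or -1 if the interrupt is not assigned.
 */
static int pt_host_irq_fired(struct pt_dist *d, unsigned host_irq)
{
    int idx = pt_spi_index(host_irq);

    if (idx < 0 || d->guest_irq[idx] == 0)
        return -1;
    d->spi_pending[idx / 32] |= 1u << (idx % 32);
    d->pending = 1;
    return d->guest_irq[idx];
}
```

In the host == guest case described in the cover text, the table would simply hold an identity mapping, for example `pt_map(&d, 34, 34)`.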
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. If we do something timer based it will likely both increase host CPU utilization and slow device down. If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case. I wouldn't call BIOS int13 interface performance critical. SeaBIOS is polling for completion anyway. I think that's different because a disk will normally respond quickly. 
So it polls a bit, but then it stops as there are no outstanding requests. -- MST -- Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. If we do something timer based it will likely both increase host CPU utilization and slow device down. If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case. I wouldn't call BIOS int13 interface performance critical. 
So the plan always was to
- add an MMIO BAR
- add a register for pci-config based access to devices

hpa felt performance does matter there but didn't clarify why ...

SeaBIOS is polling for completion anyway. I think that's different because a disk will normally respond quickly. So it polls a bit, but then it stops as there are no outstanding requests. -- MST -- Gleb.
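As background for the "pci-config based access" option: on x86, PCI configuration space can be reached through I/O ports 0xCF8 and 0xCFC (configuration mechanism #1), so it needs no memory access above 1 MiB and is usable from 16-bit code. The sketch below shows only how the 32-bit address value written to port 0xCF8 is formed. The helper name `pci_cfg_addr` is hypothetical, the port I/O itself is omitted, and nothing here describes the register layout the thread was proposing.

```c
#include <stdint.h>

#define PCI_CONFIG_ADDRESS 0x0CF8  /* write the address value here ...   */
#define PCI_CONFIG_DATA    0x0CFC  /* ... then read or write the data here */

/*
 * Build the CONFIG_ADDRESS value for configuration mechanism #1:
 * bit 31 = enable, bits 23:16 = bus, 15:11 = device, 10:8 = function,
 * bits 7:2 = dword-aligned register offset.
 */
static uint32_t pci_cfg_addr(uint8_t bus, uint8_t dev, uint8_t fn, uint8_t reg)
{
    return 0x80000000u
         | ((uint32_t)bus << 16)
         | ((uint32_t)(dev & 0x1f) << 11)
         | ((uint32_t)(fn & 0x07) << 8)
         | (uint32_t)(reg & 0xfc);
}
```

For instance, the first BAR (offset 0x10) of device 3, function 0 on bus 0 corresponds to the address value 0x80001810.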
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. If we do something timer based it will likely both increase host CPU utilization and slow device down. If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case. 
I wouldn't call BIOS int13 interface performance critical. So the plan always was to - add an MMIO BAR - add a register for pci-config based access to devices hpa felt performance does matter there but didn't clarify why ... You do not want to make it too slow, obviously; this is the interface that is used to load the OS during boot. SeaBIOS is polling for completion anyway. I think that's different because a disk will normally respond quickly. So it polls a bit, but then it stops as there are no outstanding requests. -- MST -- Gleb. -- Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. If we do something timer based it will likely both increase host CPU utilization and slow device down. If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case. 
I wouldn't call BIOS int13 interface performance critical. So the plan always was to - add an MMIO BAR - add a register for pci-config based access to devices hpa felt performance does matter there but didn't clarify why ... Also <gleb> mst, well the question is if it is safe to call int13 in the middle of pci bus enumeration/configuration <gleb> mst, and int13 predates PCI, so who knows SeaBIOS is polling for completion anyway. I think that's different because a disk will normally respond quickly. So it polls a bit, but then it stops as there are no outstanding requests. -- MST -- Gleb.
Dev Passthrough QEMU patch
This patch is for testing only and goes along with the other two patches
for priodrop and dev passthrough; it should apply against 1.4.5.

diff --git a/cpus.c b/cpus.c
index c15ff6c..0c19214 100644
--- a/cpus.c
+++ b/cpus.c
@@ -737,6 +737,26 @@ static void *qemu_kvm_cpu_thread_fn(void *arg)
     CPUState *cpu = ENV_GET_CPU(env);
     int r;
 
+    /* For now just do a 1:1 vCPU binding as they come online for device
+     * pass through
+     */
+    cpu_set_t cpuset;
+    int ret, i;
+    unsigned long cpu_index = kvm_arch_vcpu_id(cpu);
+
+    CPU_ZERO(&cpuset);
+    CPU_SET(cpu_index, &cpuset);
+    ret = pthread_setaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
+    if(ret != 0) {
+        printf("pthread_setaffinity_np failed to setaffinity to CPU 0\n");
+        exit(-1);
+    }
+
+    CPU_ZERO(&cpuset);
+    pthread_getaffinity_np(pthread_self(), sizeof(cpu_set_t), &cpuset);
+    if(CPU_ISSET(cpu_index,&cpuset))
+        printf("Binding: vCPU %ld -- CPU %d\n", cpu_index, i);
+
     qemu_mutex_lock(&qemu_global_mutex);
     qemu_thread_get_self(cpu->thread);
     cpu->thread_id = qemu_get_thread_id();
diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index caca979..46c2c59 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -904,6 +904,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_HTAB_FD _IOW(KVMIO, 0xaa, struct kvm_get_htab_fd)
 /* Available with KVM_CAP_ARM_SET_DEVICE_ADDR */
 #define KVM_ARM_SET_DEVICE_ADDR _IOW(KVMIO, 0xab, struct kvm_arm_device_addr)
+#define KVM_ARM_GET_DEVICE_RESOURCES _IOW(KVMIO, 0xe1, struct kvm_arm_get_device_resources)
+#define KVM_ARM_ASSIGN_DEVICE _IOW(KVMIO, 0xe2, struct kvm_arm_assigned_device)
 
 /*
  * ioctls for vcpu fds
@@ -1013,6 +1015,7 @@ struct kvm_assigned_irq {
 	};
 };
 
+
 struct kvm_assigned_msix_nr {
 	__u32 assigned_dev_id;
 	__u16 entry_nr;
@@ -1027,4 +1030,33 @@ struct kvm_assigned_msix_entry {
 	__u16 padding[3];
 };
 
+
+/* MAX 6 MMIO resources per device */
+#define MAX_RES_PER_DEVICE 6
+struct kvm_arm_get_device_resources {
+	char	devname[128];
+	__u32	resource_cnt;
+	struct {
+		__u64	hpa;
+		__u32	size;
+		__u32	attr;
+		char	host_name[64];
+	} host_resources[MAX_RES_PER_DEVICE];
+	struct {
+		__u32	hwirq;
+		__u32	attr;
+		char	host_name[64];
+	} hostirq;
+};
+
+struct kvm_guest_device_resources {
+	__u64	gpa[MAX_RES_PER_DEVICE];
+	__u32	girq;
+};
+
+struct kvm_arm_assigned_device {
+	struct kvm_arm_get_device_resources dev_res;
+	struct kvm_guest_device_resources guest_res;
+};
+
 #endif /* __LINUX_KVM_H */
diff --git a/target-arm/Makefile.objs b/target-arm/Makefile.objs
index d89b57c..9aee84e 100644
--- a/target-arm/Makefile.objs
+++ b/target-arm/Makefile.objs
@@ -1,5 +1,5 @@
 obj-y += arm-semi.o
 obj-$(CONFIG_SOFTMMU) += machine.o
-obj-$(CONFIG_KVM) += kvm.o
+obj-$(CONFIG_KVM) += kvm.o device-assign.o
 obj-y += translate.o op_helper.o helper.o cpu.o
 obj-y += neon_helper.o iwmmxt_helper.o
diff --git a/target-arm/device-assign.c b/target-arm/device-assign.c
new file mode 100644
index 000..e4d0e97
--- /dev/null
+++ b/target-arm/device-assign.c
@@ -0,0 +1,118 @@
+
+#include "hw/sysbus.h"
+#include "qemu-common.h"
+#include "hw/qdev.h"
+#include "hw/ptimer.h"
+#include "kvm_arm.h"
+#include "qemu/error-report.h"
+
+#define IORESOURCE_TYPE_BITS    0x1f00  /* Resource type */
+#define IORESOURCE_IO           0x0100  /* PCI/ISA I/O ports */
+#define IORESOURCE_MEM          0x0200
+#define IORESOURCE_REG          0x0300  /* Register offsets */
+#define IORESOURCE_IRQ          0x0400
+#define IORESOURCE_DMA          0x0800
+
+#define IORESOURCE_PREFETCH     0x2000  /* No side effects */
+#define IORESOURCE_READONLY     0x4000
+#define IORESOURCE_CACHEABLE    0x8000
+
+typedef struct {
+    SysBusDevice busdev;
+    char *devname;
+    uint64_t hpa, gpa;
+    uint32_t dev_size;
+    uint32_t hirq,girq;
+} AssignedDevice;
+
+static Property device_assign_properties[] = {
+    DEFINE_PROP_STRING("host", AssignedDevice, devname),
+    DEFINE_PROP_UINT64("hpa", AssignedDevice, hpa, 0),
+    DEFINE_PROP_UINT64("gpa", AssignedDevice, gpa, 0),
+    DEFINE_PROP_UINT32("size", AssignedDevice, dev_size, 0),
+    DEFINE_PROP_UINT32("hostirq", AssignedDevice, hirq, 0),
+    DEFINE_PROP_UINT32("guestirq", AssignedDevice, girq, 0),
+    DEFINE_PROP_END_OF_LIST(),
+};
+
+static int assign_device(AssignedDevice *dev)
+{
+    int ret,i;
+    struct kvm_arm_get_device_resources dev_res;
+    struct kvm_arm_assigned_device dev_assigned;
+    char *p, c='-';
+
+    memset(&dev_res,0,sizeof(dev_res));
+    memset(&dev_assigned,0,sizeof(dev_assigned));
+
+    if((p = strstr(dev->devname, (char *)&c)) != (char *) NULL)
+        *p = ',';
+
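The 1:1 vCPU binding in the cpus.c hunk boils down to pinning the calling thread to a single host CPU and reading the mask back. A minimal stand-alone sketch of that step follows. It is not the patch's code: it uses `sched_setaffinity(0, ...)`, which acts on the calling thread on Linux, instead of `pthread_setaffinity_np`, and the helper names `pin_self_to_cpu` and `first_allowed_cpu` are hypothetical. It assumes Linux with glibc.

```c
#define _GNU_SOURCE   /* needed for cpu_set_t and the CPU_* macros in glibc */
#include <assert.h>
#include <sched.h>

/* Pin the calling thread to one host CPU; returns 0 on success, -1 on error. */
static int pin_self_to_cpu(int host_cpu)
{
    cpu_set_t set;

    assert(host_cpu >= 0 && host_cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(host_cpu, &set);
    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
static int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}
```

Reading the mask back and reporting the CPU actually found in it avoids printing a CPU number that was never assigned.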
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. If we do something timer based it will likely both increase host CPU utilization and slow device down. 
If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case. I wouldn't call BIOS int13 interface performance critical. So the plan always was to - add an MMIO BAR - add a register for pci-config based access to devices hpa felt performance does matter there but didn't clarify why ... You do not want to make it too slow, obviously; this is the interface that is used to load the OS during boot. And possibly installation? SeaBIOS is polling for completion anyway. I think that's different because a disk will normally respond quickly. So it polls a bit, but then it stops as there are no outstanding requests. -- MST -- Gleb. -- Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. If we do something timer based it will likely both increase host CPU utilization and slow device down. 
If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case. I wouldn't call BIOS int13 interface performance critical. So the plan always was to - add an MMIO BAR - add a register for pci-config based access to devices hpa felt performance does matter there but didn't clarify why ... You do not want to make it too slow, obviously; this is the interface that is used to load the OS during boot. And possibly installation? Only the stage that reads files from the CDROM. IIRC the actual installation runs with native drivers. This is why Windows asks you to provide a floppy with a driver at a very early stage of installation. -- Gleb.
Re: [PATCH 2/2] add initial kvm dev passhtrough support
On 11.06.2013 at 09:43, Mario Smarduch mario.smard...@huawei.com wrote:

> This is the initial device pass through support.
> At this time host == guest only is supported.
> Basic Operation:
>
> - QEMU parameters: -device kvm-device-assign,host=<device name>
>   for example - kvm-device-assign,host='arm-sp804'. Essentially
>   any device that does PIO should be supported.

Yikes! Over the last few years we've worked very hard to get rid of the
unfortunate intertwining of device assignment and KVM. There are a number
of reasons it's a bad idea:

- kvm access is a potential privilege escalation
- device assignment is limited to kvm

The solution to both of the above is VFIO. You get a completely separate
interface for accessing your devices with a few connecting bits (irqfd,
eventfd) to communicate quickly between vfio and kvm.

Is there any particular reason you're not going down that path for your
ARM implementation? On the embedded PPC side we've been discussing vfio and
how it fits into a device tree, non-PCI world for a while. If you like, we
can dive into more detail on that, either via email or via phone.

Alex
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 11:22:37AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. If we do something timer based it will likely both increase host CPU utilization and slow device down. 
If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case.

I wouldn't call BIOS int13 interface performance critical.

So the plan always was to
- add an MMIO BAR
- add a register for pci-config based access to devices

hpa felt performance does matter there but didn't clarify why ...

You do not want to make it too slow obviously, this is the interface that is used to load the OS during boot.

And possibly installation?

Only the stage that reads files from CDROM. IIRC actual installation runs with native drivers. This is why Windows asks you to provide a floppy with a driver at a very early stage of installation.

Have any numbers to tell us how much time is spent there? E.g. if it's slowed down by a factor of 2, is it a problem? How about a factor of 10?

-- Gleb.
Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
On Tue, Jun 11, 2013 at 11:30:11AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 11:22:37AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 11:19:46AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 11:03:50AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 11:02:26AM +0300, Michael S. Tsirkin wrote: On Tue, Jun 11, 2013 at 10:53:48AM +0300, Gleb Natapov wrote: On Tue, Jun 11, 2013 at 10:10:47AM +0300, Michael S. Tsirkin wrote: On Thu, Jun 06, 2013 at 10:02:14AM -0500, Anthony Liguori wrote: Gleb Natapov g...@redhat.com writes: On Wed, Jun 05, 2013 at 07:41:17PM -0500, Anthony Liguori wrote: H. Peter Anvin h...@zytor.com writes: On 06/05/2013 03:08 PM, Anthony Liguori wrote: Definitely an option. However, we want to be able to boot from native devices, too, so having an I/O BAR (which would not be used by the OS driver) should still at the very least be an option. What makes it so difficult to work with an MMIO bar for PCI-e? With legacy PCI, tracking allocation of MMIO vs. PIO is pretty straight forward. Is there something special about PCI-e here? It's not tracking allocation. It is that accessing memory above 1 MiB is incredibly painful in the BIOS environment, which basically means MMIO is inaccessible. Oh, you mean in real mode. SeaBIOS runs the virtio code in 32-bit mode with a flat memory layout. There are loads of ASSERT32FLAT()s in the code to make sure of this. Well, not exactly. Initialization is done in 32bit, but disk reads/writes are done in 16bit mode since it should work from int13 interrupt handler. The only way I know to access MMIO bars from 16 bit is to use SMM which we do not have in KVM. Ah, if it's just the dataplane operations then there's another solution. We can introduce a virtqueue flag that asks the backend to poll for new requests. Then SeaBIOS can add the request to the queue and not worry about kicking or reading the ISR. This will pin a host CPU. 
If we do something timer based it will likely both increase host CPU utilization and slow device down. If we didn't care about performance at all we could do config cycles for signalling, which is much more elegant than polling in host, but I don't think that's the case.

I wouldn't call BIOS int13 interface performance critical.

So the plan always was to
- add an MMIO BAR
- add a register for pci-config based access to devices

hpa felt performance does matter there but didn't clarify why ...

You do not want to make it too slow obviously, this is the interface that is used to load the OS during boot.

And possibly installation?

Only the stage that reads files from CDROM. IIRC actual installation runs with native drivers. This is why Windows asks you to provide a floppy with a driver at a very early stage of installation.

Have any numbers to tell us how much time is spent there? E.g. if it's slowed down by a factor of 2, is it a problem? How about a factor of 10?

No I do not.

-- Gleb.
Re: [PATCH v3 3/6] KVM: MMU: make return value of mmio page fault handler more readable
On Mon, Jun 10, 2013 at 10:16:04PM +0900, Takuya Yoshikawa wrote:
> On Mon, 10 Jun 2013 10:57:50 +0300 Gleb Natapov g...@redhat.com wrote:
>> On Fri, Jun 07, 2013 at 04:51:25PM +0800, Xiao Guangrong wrote:
>>> +
>>> +/*
>>> + * Return values of handle_mmio_page_fault_common:
>>> + * RET_MMIO_PF_EMULATE: it is a real mmio page fault, emulate the instruction
>>> + *			directly.
>>> + * RET_MMIO_PF_RETRY: let CPU fault again on the address.
>>> + * RET_MMIO_PF_BUG: bug is detected.
>>> + */
>>> +enum {
>>> +	RET_MMIO_PF_EMULATE = 1,
>>> +	RET_MMIO_PF_RETRY = 0,
>>> +	RET_MMIO_PF_BUG = -1
>>> +};
>>
>> I would order them from -1 to 1 and rename RET_MMIO_PF_BUG to RET_MMIO_PF_ERROR, but no need to resend just for that.
>
> Why not just let compilers select the values? -- It's an enum. Any reason to make it start from -1?

I am fine with this too as an additional patch. It makes sense to preserve original values like Xiao did for initial patch, since it is easier to verify that the patch is just a mechanical change.

-- Gleb.
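To make the two options in this exchange concrete, here is a minimal userspace sketch, not taken from the patch. The first enum keeps the explicit values from Xiao's patch but lists them from -1 to 1, as Gleb suggests. The second shows what "let compilers select the values" would give. The AUTO_* names and the mmio_pf_action() helper are made up for illustration only.

```c
#include <assert.h>
#include <string.h>

/* Same values as in the quoted patch, listed in ascending order. */
enum {
	RET_MMIO_PF_BUG = -1,
	RET_MMIO_PF_RETRY = 0,
	RET_MMIO_PF_EMULATE = 1
};

/* Hypothetical variant: compiler-chosen values start at 0 and count up. */
enum { AUTO_EMULATE, AUTO_RETRY, AUTO_BUG };

/* Hypothetical caller: maps a handler return value to an action name. */
static const char *mmio_pf_action(int ret)
{
	switch (ret) {
	case RET_MMIO_PF_EMULATE:
		return "emulate";
	case RET_MMIO_PF_RETRY:
		return "retry";
	case RET_MMIO_PF_BUG:
		return "bug";
	default:
		return "unknown";
	}
}
```

With the explicit values kept, any existing caller that tests the raw numbers (negative, zero, positive) behaves as before, which is the "mechanical change" argument made above. The compiler-chosen variant would change the numeric values and so belongs in a separate patch.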
Re: [PATCH net 1/2] vhost: check owner before we overwrite ubuf_info
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 6 Jun 2013 15:20:39 +0300

> If device has an owner, we shouldn't touch ubuf_info since it might be in use.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com

Applied.
Re: [PATCH net 2/2] vhost: fix ubuf_info cleanup
From: Michael S. Tsirkin m...@redhat.com
Date: Thu, 6 Jun 2013 15:20:46 +0300

> vhost_net_clear_ubuf_info didn't clear ubuf_info after kfree, this could trigger double free. Fix this and simplify this code to make it more robust: make sure ubuf info is always freed through vhost_net_clear_ubuf_info.
>
> Reported-by: Tommi Rantala tt.rant...@gmail.com
> Signed-off-by: Michael S. Tsirkin m...@redhat.com

Applied.
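The bug class described in this commit message (pointer left dangling after a free, then freed again on a later cleanup path) can be shown with a small userspace sketch. This is not the vhost code: struct fake_vq, alloc_ubuf_info() and clear_ubuf_info() are hypothetical stand-ins, and free() stands in for kfree().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a queue that owns a ubuf_info array. */
struct fake_vq {
	int *ubuf_info;
};

static int alloc_ubuf_info(struct fake_vq *vq, size_t n)
{
	vq->ubuf_info = calloc(n, sizeof(*vq->ubuf_info));
	return vq->ubuf_info ? 0 : -1;
}

/* Single cleanup path: free, then clear, so repeated calls are harmless. */
static void clear_ubuf_info(struct fake_vq *vq)
{
	free(vq->ubuf_info);	/* free(NULL) is a no-op */
	vq->ubuf_info = NULL;	/* a second call can no longer double free */
}
```

Routing every release through one helper that also resets the pointer is the same idea as "make sure ubuf info is always freed through vhost_net_clear_ubuf_info".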
Re: [PATCH v3 03/13] nEPT: Add EPT tables support to paging_tmpl.h
On Tue, May 21, 2013 at 03:52:12PM +0800, Xiao Guangrong wrote:
> On 05/19/2013 12:52 PM, Jun Nakajima wrote:
>> From: Nadav Har'El n...@il.ibm.com
>>
>> This is the first patch in a series which adds nested EPT support to KVM's nested VMX. Nested EPT means emulating EPT for an L1 guest so that L1 can use EPT when running a nested guest L2. When L1 uses EPT, it allows the L2 guest to set its own cr3 and take its own page faults without either of L0 or L1 getting involved. This often significantly improves L2's performance over the previous two alternatives (shadow page tables over EPT, and shadow page tables over shadow page tables).
>>
>> This patch adds EPT support to paging_tmpl.h.
>>
>> paging_tmpl.h contains the code for reading and writing page tables. The code for 32-bit and 64-bit tables is very similar, but not identical, so paging_tmpl.h is #include'd twice in mmu.c, once with PTTYPE=32 and once with PTTYPE=64, and this generates the two sets of similar functions.
>>
>> There are subtle but important differences between the format of EPT tables and that of ordinary x86 64-bit page tables, so for nested EPT we need a third set of functions to read the guest EPT table and to write the shadow EPT table.
>>
>> So this patch adds a third PTTYPE, PTTYPE_EPT, which creates functions (prefixed with EPT) which correctly read and write EPT tables.
>>
>> Signed-off-by: Nadav Har'El n...@il.ibm.com
>> Signed-off-by: Jun Nakajima jun.nakaj...@intel.com
>> Signed-off-by: Xinhao Xu xinhao...@intel.com
>> ---
>>  arch/x86/kvm/mmu.c         |  5 +
>>  arch/x86/kvm/paging_tmpl.h | 43 +--
>>  2 files changed, 46 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index 117233f..6c1670f 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -3397,6 +3397,11 @@ static inline bool is_last_gpte(struct kvm_mmu *mmu, unsigned level, unsigned gp
>>  	return mmu->last_pte_bitmap & (1 << index);
>>  }
>>
>> +#define PTTYPE_EPT 18 /* arbitrary */
>> +#define PTTYPE PTTYPE_EPT
>> +#include "paging_tmpl.h"
>> +#undef PTTYPE
>> +
>>  #define PTTYPE 64
>>  #include "paging_tmpl.h"
>>  #undef PTTYPE
>> diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h
>> index df34d4a..4c45654 100644
>> --- a/arch/x86/kvm/paging_tmpl.h
>> +++ b/arch/x86/kvm/paging_tmpl.h
>> @@ -50,6 +50,22 @@
>>  	#define PT_LEVEL_BITS PT32_LEVEL_BITS
>>  	#define PT_MAX_FULL_LEVELS 2
>>  	#define CMPXCHG cmpxchg
>> +#elif PTTYPE == PTTYPE_EPT
>> +	#define pt_element_t u64
>> +	#define guest_walker guest_walkerEPT
>> +	#define FNAME(name) EPT_##name
>> +	#define PT_BASE_ADDR_MASK PT64_BASE_ADDR_MASK
>> +	#define PT_LVL_ADDR_MASK(lvl) PT64_LVL_ADDR_MASK(lvl)
>> +	#define PT_LVL_OFFSET_MASK(lvl) PT64_LVL_OFFSET_MASK(lvl)
>> +	#define PT_INDEX(addr, level) PT64_INDEX(addr, level)
>> +	#define PT_LEVEL_BITS PT64_LEVEL_BITS
>> +	#ifdef CONFIG_X86_64
>> +	#define PT_MAX_FULL_LEVELS 4
>> +	#define CMPXCHG cmpxchg
>> +	#else
>> +	#define CMPXCHG cmpxchg64
>
> CMPXCHG is only used in FNAME(cmpxchg_gpte), but you commented it later. Do we really need it?
>
>> +	#define PT_MAX_FULL_LEVELS 2
>
> And the SDM says: "It uses a page-walk length of 4, meaning that at most 4 EPT paging-structure entries are accessed to translate a guest-physical address." Is my SDM obsolete? Which kind of processor supports page-walk length = 2?
>
> It seems your patch is not able to handle the case that the guest uses walk-length = 2 which is running on the host with walk-length = 4.
> (please refer to how to handle sp->role.quadrant in FNAME(get_level1_sp_gpa) in the current code.)

But since EPT always has 4 levels on all existing cpus it is not an issue and the only case that we should worry about is guest walk-length == host walk-length == 4, or have I misunderstood what you mean here?

-- Gleb.
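For readers unfamiliar with the paging_tmpl.h trick discussed in this thread, the sketch below imitates it in a single file. The real code includes one header several times with a different PTTYPE each time. Here a function-generating macro is swapped in so the example is self-contained. DEFINE_PT_INDEX and the generated function names are hypothetical. The bit widths (10 index bits per level for 32-bit tables, 9 for 64-bit tables, 12 offset bits) are the usual x86 values, and EPT is given the 64-bit index layout because the quoted patch maps PT_INDEX to PT64_INDEX.

```c
#include <assert.h>
#include <stdint.h>

/*
 * "Template" body: stamps out one index function per page-table format.
 * prefix selects the function name, level_bits the index width per level.
 */
#define DEFINE_PT_INDEX(prefix, level_bits)				\
static unsigned prefix##_pt_index(uint64_t addr, int level)		\
{									\
	unsigned shift = 12 + (unsigned)(level - 1) * (level_bits);	\
	return (unsigned)((addr >> shift) & ((1u << (level_bits)) - 1));	\
}

DEFINE_PT_INDEX(paging32, 10)	/* 1024 entries per table */
DEFINE_PT_INDEX(paging64, 9)	/* 512 entries per table */
DEFINE_PT_INDEX(ept, 9)		/* same index layout as 64-bit tables */
```

Each expansion produces a separately named function, much as FNAME(name) produces EPT_-prefixed functions in the patch. Where the formats differ (entry bits, permissions), the real template uses #if PTTYPE == ... blocks inside the shared body.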
[GIT PULL] KVM fixes for 3.10-rc5
Linus,

Please pull from

git://git.kernel.org/pub/scm/virt/kvm/kvm.git fixes

To receive the following KVM bug fixes. There is one more fix for MIPS KVM ABI here, MIPS and PPC build breakage fixes and a couple of PPC bug fixes.

David Daney (2):
      kvm: Add definition of KVM_REG_MIPS
      mips/kvm: Use KVM_REG_MIPS and proper size indicators for *_ONE_REG

James Hogan (1):
      KVM: add kvm_para_available to asm-generic/kvm_para.h

Mihai Caraman (1):
      kvm/ppc/booke64: Fix AltiVec interrupt numbers and build breakage

Scott Wood (3):
      kvm/ppc/booke64: Disable e6500 support
      kvm/ppc/booke: Hold srcu lock when calling gfn functions
      kvm/ppc/booke64: Fix lazy ee handling in kvmppc_handle_exit()

 arch/mips/include/uapi/asm/kvm.h   | 81 +--
 arch/mips/kvm/kvm_mips.c           | 83 +++-
 arch/powerpc/include/asm/kvm_asm.h | 16 ---
 arch/powerpc/kvm/44x_tlb.c         |  5 +++
 arch/powerpc/kvm/booke.c           | 18
 arch/powerpc/kvm/e500_mmu.c        |  5 +++
 arch/powerpc/kvm/e500mc.c          |  2 -
 include/asm-generic/kvm_para.h     |  5 +++
 include/uapi/linux/kvm.h           |  1 +
 9 files changed, 137 insertions(+), 79 deletions(-)

-- Gleb.
Re: [PATCH 1/2] add irq priodrop support
On Tue, 11 Jun 2013 09:37:24 +0200, Mario Smarduch mario.smard...@huawei.com wrote:
> This is the same Interrupt Priority Drop/Deactivation patch emailed some time back (except for 3.10-rc4) used by the initial device pass-through support.
>
> When enabled all IRQs on host write to distributor EOIR and DIR reg to de-prioritize/de-activate an interrupt. For device that's passed through only the EOIR is written to drop the priority, the Guest deactivates it when it handles its EOI. This supports exitless EOI that's agnostic to bus type (i.e. PCI)
>
> The patch has been tested for all configurations:
> Host: No Prio Drop    Guest: No Prio Drop
> Host: Prio DROP       Guest: No Prio Drop
> Host: Prio Drop       Guest: Prio Drop
>
> - Mario
>
> Signed-off-by: Mario Smarduch mario.smard...@huawei.com

Hi Mario,

Comments below. I'm rather weak on how irq passthrough is intended to work, so I don't have a lot of comments on that, but I did notice some things in this patch that should be addressed.

> ---
>  arch/arm/kvm/Kconfig            |   8 +++
>  drivers/irqchip/irq-gic.c       | 145 ++-
>  include/linux/irqchip/arm-gic.h |   6 ++
>  3 files changed, 156 insertions(+), 3 deletions(-)
>
> diff --git a/arch/arm/kvm/Kconfig b/arch/arm/kvm/Kconfig
> index 370e1a8..c0c9f3c 100644
> --- a/arch/arm/kvm/Kconfig
> +++ b/arch/arm/kvm/Kconfig
> @@ -59,6 +59,14 @@ config KVM_ARM_VGIC
>  	---help---
>  	  Adds support for a hardware assisted, in-kernel GIC emulation.
>
> +config KVM_ARM_INT_PRIO_DROP
> +bool "KVM support for Interrupt pass-through"
> +depends on KVM_ARM_VGIC && OF
> +default n
> +---help---
> +  Seperates interrupt priority drop and deactivation to enable device
> +  pass-through to Guests.
> +

Nit: check your whitespace (tabs vs.
spaces)

>  config KVM_ARM_TIMER
>  	bool "KVM support for Architected Timers"
>  	depends on KVM_ARM_VGIC && ARM_ARCH_TIMER
> diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
> index 1760ceb..9fb4ef3 100644
> --- a/drivers/irqchip/irq-gic.c
> +++ b/drivers/irqchip/irq-gic.c
> @@ -41,10 +41,13 @@
>  #include <linux/slab.h>
>  #include <linux/irqchip/chained_irq.h>
>  #include <linux/irqchip/arm-gic.h>
> +#include <linux/irqflags.h>
> +#include <linux/bitops.h>
>
>  #include <asm/irq.h>
>  #include <asm/exception.h>
>  #include <asm/smp_plat.h>
> +#include <asm/virt.h>
>
>  #include "irqchip.h"
>
> @@ -99,6 +102,20 @@ struct irq_chip gic_arch_extn = {
>
>  static struct gic_chip_data gic_data[MAX_GIC_NR] __read_mostly;
>
> +#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
> +/*
> + * Priority drop/deactivation bit map, 1st 16 bits used for SGIs, this bit map
> + * is shared by several guests. If bit is set only execute EOI which drops
> + * current priority but not deactivation.
> + */
> +static u32 gic_irq_prio_drop[DIV_ROUND_UP(1020, 32)] __read_mostly;

I believe it is possible to have more than one GIC in a system. This map assumes only one. The prio_drop map should probably be part of gic_chip_data so that it is per-instance. Also, as discussed below, the code should be using DECLARE_BITMAP()

> +static void gic_eoi_irq_priodrop(struct irq_data *);
> +#endif
> +
> +static void gic_enable_gicc(void __iomem *);
> +static void gic_eoi_sgi(u32, void __iomem *);
> +static void gic_priodrop_remap_eoi(struct irq_chip *);
> +

The typical pattern here is to actually define the static functions above the code that uses them so that forward declarations are not required.
>  #ifdef CONFIG_GIC_NON_BANKED
>  static void __iomem *gic_get_percpu_base(union gic_base *base)
>  {
> @@ -296,7 +313,7 @@ static asmlinkage void __exception_irq_entry gic_handle_irq(struct pt_regs *regs
>  			continue;
>  		}
>  		if (irqnr < 16) {
> -			writel_relaxed(irqstat, cpu_base + GIC_CPU_EOI);
> +			gic_eoi_sgi(irqstat, cpu_base);
>  #ifdef CONFIG_SMP
>  			handle_IPI(irqnr, regs);
>  #endif
> @@ -450,7 +467,7 @@ static void __cpuinit gic_cpu_init(struct gic_chip_data *gic)
>  		writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4 / 4);
>
>  	writel_relaxed(0xf0, base + GIC_CPU_PRIMASK);
> -	writel_relaxed(1, base + GIC_CPU_CTRL);
> +	gic_enable_gicc(base);
>  }
>
>  #ifdef CONFIG_CPU_PM
> @@ -585,7 +602,7 @@ static void gic_cpu_restore(unsigned int gic_nr)
>  		writel_relaxed(0xa0a0a0a0, dist_base + GIC_DIST_PRI + i * 4);
>
>  	writel_relaxed(0xf0, cpu_base + GIC_CPU_PRIMASK);
> -	writel_relaxed(1, cpu_base + GIC_CPU_CTRL);
> +	gic_enable_gicc(cpu_base);
>  }
>
>  static int gic_notifier(struct notifier_block *self, unsigned long cmd, void *v)
> @@ -666,6 +683,7 @@ void gic_raise_softirq(const struct cpumask *mask, unsigned int irq)
>  static int gic_irq_domain_map(struct irq_domain *d, unsigned int irq,
>  				irq_hw_number_t hw)
>  {
> +	gic_priodrop_remap_eoi(&gic_chip);

gic_priodrop_remap_eoi()
Re: [PATCH 2/2] add initial kvm dev passhtrough support
On 6/11/2013 10:28 AM, Alexander Graf wrote:
> Is there any particular reason you're not going down that path for your ARM implementation?

We see this as a good starting point to build on, we need baseline numbers for performance, latency, interrupt throughput on real hardware ASAP to build competency for NFV, which has demanding Dev. Passthrough requirements. Over time we plan to contribute to SMMU and VFIO as well (we're looking into this now).

FYI NFV is an initiative wireless/fixed network operators are working towards - to virtualize Core, likely Radio Access and even Home Network equipment, this is an epic undertaking (i.e. Network Function Virtualization). So far VMware has taken the lead (mostly x86).

> On the embedded PPC side we've been discussing vfio and how it fits into a device tree, non-PCI world for a while. If you like, we can dive into more detail on that, either via email or via phone.

I'll email you offline, I'd like to know more about what you've done on this and see where we can align/leverage the effort.

- Mario

> Alex
KVM minutes for 2013-06-11
2013-06-11
----------

- move ACPI table generation to QEMU
  - code sharing with SeaBIOS
  - easier to generate there
  Anthony: it is the same whether it is put in QEMU or SeaBIOS
  Michael: some information is not easily available in SeaBIOS (hot plug)
  Anthony: transfer the QOM tree to SeaBIOS, the current interface shows its age.
  - hardcoded information that changes over time: this is easier in QEMU
    Example, bus device number: keep the device number stable over migration

  It is easier to maintain the mostly static tables in QEMU. QEMU knows the whole device tree, so it is easy to generate.

  Where are we now? Do we have enough ACPI support in QEMU?
  Anthony wants a mergeable tree before starting. Still think it is the wrong approach.
  Create a new serial port and enable it through ACPI?
  Using iasl at run time and source tables is problematic. iasl doesn't work on big endian hosts.

- VFIO? How to do it (Alex), will be discussed on the list

Later, Juan.
Re: [PATCH 2/2] add initial kvm dev passhtrough support
On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:
> On 6/11/2013 10:28 AM, Alexander Graf wrote:
>> Is there any particular reason you're not going down that path for your ARM implementation?
>
> We see this as a good starting point to build on, we need baseline numbers for performance, latency, interrupt throughput on real hardware ASAP to build competency for NFV, which has demanding Dev. Passthrough requirements. Over time we plan to contribute to SMMU and VFIO as well (we're looking into this now).
>
> FYI NFV is an initiative wireless/fixed network operators are working towards - to virtualize Core, likely Radio Access and even Home Network equipment, this is an epic undertaking (i.e. Network Function Virtualization). So far VMware has taken the lead (mostly x86).
>
>> On the embedded PPC side we've been discussing vfio and how it fits into a device tree, non-PCI world for a while. If you like, we can dive into more detail on that, either via email or via phone.
>
> I'll email you offline, I'd like to know more about what you've done on this and see where we can align/leverage the effort.

Yes, please let's use VFIO rather than continue to use or invent new device assignment interfaces for KVM. Antonios Motakis (cc'd) already contacted me about VFIO for ARM. IIRC, his initial impression was that the IOMMU backend was almost entirely reusable for ARM (a couple PCI assumptions implicit in the IOMMU API to handle) and my hope was that ARM and PPC could work together on a common VFIO device tree backend. Thanks,

Alex
Re: [PATCH 2/2] add initial kvm dev passhtrough support
I know Antonios very well. Yes, our intent is definitely to use VFIO.

- Mario

On 6/11/2013 4:52 PM, Alex Williamson wrote:
> On Tue, 2013-06-11 at 16:13 +0200, Mario Smarduch wrote:
>> On 6/11/2013 10:28 AM, Alexander Graf wrote:
>>> Is there any particular reason you're not going down that path for your ARM implementation?
>>
>> We see this as a good starting point to build on, we need baseline numbers for performance, latency, interrupt throughput on real hardware ASAP to build competency for NFV, which has demanding Dev. Passthrough requirements. Over time we plan to contribute to SMMU and VFIO as well (we're looking into this now).
>>
>> FYI NFV is an initiative wireless/fixed network operators are working towards - to virtualize Core, likely Radio Access and even Home Network equipment, this is an epic undertaking (i.e. Network Function Virtualization). So far VMware has taken the lead (mostly x86).
>>
>>> On the embedded PPC side we've been discussing vfio and how it fits into a device tree, non-PCI world for a while. If you like, we can dive into more detail on that, either via email or via phone.
>>
>> I'll email you offline, I'd like to know more about what you've done on this and see where we can align/leverage the effort.
>
> Yes, please let's use VFIO rather than continue to use or invent new device assignment interfaces for KVM. Antonios Motakis (cc'd) already contacted me about VFIO for ARM. IIRC, his initial impression was that the IOMMU backend was almost entirely reusable for ARM (a couple PCI assumptions implicit in the IOMMU API to handle) and my hope was that ARM and PPC could work together on a common VFIO device tree backend. Thanks,
>
> Alex
Re: [PATCH 1/2] add irq priodrop support
Hi Grant,

appreciate the strong feedback, I agree with all the coding observations and will make the changes. I have a few inline responses.

>> +static u32 gic_irq_prio_drop[DIV_ROUND_UP(1020, 32)] __read_mostly;
>
> I believe it is possible to have more than one GIC in a system. This map assumes only one. The prio_drop map should probably be part of gic_chip_data so that it is per-instance. Also, as discussed below, the code should be using DECLARE_BITMAP()

Agree.

> gic_priodrop_remap_eoi() is used exactly once. You should instead put the body of it inline like so:
>
> 	if (IS_ENABLED(CONFIG_KVM_ARM_INT_PRIO_DROP) && is_hyp_mode_available())
> 		chip->irq_eoi = gic_eoi_irq_priodrop;

Yes much cleaner.

> However, this block is problematic. For each map call it modifies the /global/ gic_chip. It's not a per-interrupt thing, but rather changes the callback for all gic interrupts, on *any* gic in the system. Is this really what you want? If it is, then I would expect the callback to be modified once sometime around gic_init_bases() time.

Yes, need to move it up, now it's being set for each IRQ domain mapping call.

> If it is not, and what you really want is per-irq behaviour, then what you need to do is have a separate gic_priodrop_chip that can be used on a per-irq basis instead of the gic_chip.

Prio drop/deactivate is per CPU and all IRQs are affected including SGIs. It's possible to run mixed CPU modes, but this patch enables all CPUs for device passthrough, similar to hyp mode enable. Another way would be the reverse - set all non-passthrough irqs to gic_priodrop_chip and the passed through IRQ to gic_chip. I think keeping it in one function and just setting a bit to enable/disable is cleaner.
>>  	if (hw < 32) {
>>  		irq_set_percpu_devid(irq);
>>  		irq_set_chip_and_handler(irq, &gic_chip,
>> @@ -857,4 +875,125 @@ IRQCHIP_DECLARE(cortex_a9_gic, "arm,cortex-a9-gic", gic_of_init);
>>  IRQCHIP_DECLARE(msm_8660_qgic, "qcom,msm-8660-qgic", gic_of_init);
>>  IRQCHIP_DECLARE(msm_qgic2, "qcom,msm-qgic2", gic_of_init);
>>
>> +#ifdef CONFIG_KVM_ARM_INT_PRIO_DROP
>> +/* If HYP mode enabled and PRIO DROP set EOIR function to handle PRIO DROP */
>> +static inline void gic_priodrop_remap_eoi(struct irq_chip *chip)
>> +{
>> +	if (is_hyp_mode_available())
>> +		chip->irq_eoi = gic_eoi_irq_priodrop;
>> +}
>> +
>> +/* If HYP mode set enable interrupt priority drop/deactivation, and mark
>> + * SGIs to deactive through writes to GCICC_DIR. For Guest only enable normal
>> + * mode.
>> + */
>
> Nit: Read Documentation/kernel-doc-nano-HOWTO.txt. It's a good idea to stick to that format when writing function documentation. Also, convention is for multiline comments to have an empty /* line before the first line of text.

Will do.

>> +}
>> +
>> +void gic_spi_clr_priodrop(int irq)
>> +{
>> +	struct irq_data *d = irq_get_irq_data(irq);
>> +	if (likely(irq >= 32 && irq < 1019)) {
>
> < 1019 ...
>
>> +		clear_bit(irq % 32, (void *) &gic_irq_prio_drop[irq/32]);
>> +		writel_relaxed(irq, gic_cpu_base(d) + GIC_CPU_DIR);
>> +	}
>> +}
>> +
>> +int gic_spi_get_priodrop(int irq)
>> +{
>> +	if (likely(irq >= 32 && irq <= 1019))
>
> ... <= 1019
>
> Looks like some off-by-one errors going on here. Also, the rest of the gic code uses 1020, not 1019 as the upper limit. What is the reason for the difference in this code block?

Hmmm a mistake.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
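Two of the review points in this thread (a per-instance map instead of one global array, and a single consistent upper bound) can be illustrated with a small userspace sketch. This is not the GIC driver: struct fake_gic and the helper names are hypothetical, a plain unsigned long array stands in for the kernel's DECLARE_BITMAP(), and the exclusive bound of 1020 is taken from the review comment that the rest of the gic code uses 1020.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define GIC_MAX_IRQS	1020	/* exclusive upper limit, per the review */
#define BITS_PER_WORD	(sizeof(unsigned long) * CHAR_BIT)
#define WORDS(n)	(((n) + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical per-instance state: each GIC carries its own map. */
struct fake_gic {
	unsigned long prio_drop[WORDS(GIC_MAX_IRQS)];
};

/* One range check shared by set, clear and get, so they cannot disagree. */
static bool is_spi(unsigned int irq)
{
	return irq >= 32 && irq < GIC_MAX_IRQS;
}

static void set_priodrop(struct fake_gic *g, unsigned int irq)
{
	if (is_spi(irq))
		g->prio_drop[irq / BITS_PER_WORD] |= 1UL << (irq % BITS_PER_WORD);
}

static void clr_priodrop(struct fake_gic *g, unsigned int irq)
{
	if (is_spi(irq))
		g->prio_drop[irq / BITS_PER_WORD] &= ~(1UL << (irq % BITS_PER_WORD));
}

static bool get_priodrop(const struct fake_gic *g, unsigned int irq)
{
	return is_spi(irq) &&
	       (g->prio_drop[irq / BITS_PER_WORD] >> (irq % BITS_PER_WORD)) & 1UL;
}
```

Keeping the bound in one helper avoids the mismatch the review points out, where one function tested against 1019 with one comparison operator and another used a different one.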
Re: KVM call agenda for 2013-06-11
On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
> Juan is not available now, and Anthony asked for agenda to be sent early. So here comes:
>
> Agenda for the meeting Tue, June 11:
> - Generating acpi tables, redux

Not so much notes as a quick summary of the call:

There are the following reasons to generate ACPI tables in QEMU:

- sharing code with e.g. ovmf
  Anthony thinks this is not a valid argument

- so we can make tables more dynamic and move away from iasl
  Anthony thinks this is not a valid reason too, since qemu and seabios have access to the same info
  MST noted several pieces of info not accessible to the bios.
  Anthony said they can be added, e.g. by exposing QOM to the bios.

- even though most tables are static and hardcoded, they are likely to change over time
  Anthony sees this as justified

To summarize, there's a consensus now that generating ACPI tables in QEMU is a good idea.

Two issues that need to be addressed:
- original patches break cross-version migration. Need to fix that.
- Anthony requested that the patchset is merged together with some new feature. I'm not sure the reasoning is clear: the current version intentionally generates tables that are bug for bug compatible with seabios, to simplify testing. It seems clear we have users for this such as hotplug of devices behind pci bridges, so why keep the infrastructure out of tree?
  Looking for something additional, smaller as the hotplug patch is a bit big, so might delay merging.

Going forward - would we want to move smbios as well? Everyone seems to think it's a good idea.

-- MST
KVM call agenda for 2013-06-25
Hi

We have now moved to one call every other week.

Please send any topic that you are interested in covering.

Thanks, Juan.

PS. If you want to attend and you don't have the call details, contact me.
[PATCH 3/3] vfio-pci: Avoid deadlock on remove
If an attempt is made to unbind a device from vfio-pci while that device is in use, the request is blocked until the device becomes unused. Unfortunately, that unbind path still grabs the device_lock, which certain things like __pci_reset_function() also want to take. This means we need to try to acquire the locks ourselves and use the pre-locked version, __pci_reset_function_locked().

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/pci/vfio_pci.c | 23 +--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
index ac37254..41023e4 100644
--- a/drivers/vfio/pci/vfio_pci.c
+++ b/drivers/vfio/pci/vfio_pci.c
@@ -137,8 +137,27 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
 	 */
 	pci_write_config_word(pdev, PCI_COMMAND, PCI_COMMAND_INTX_DISABLE);

-	if (vdev->reset_works)
-		__pci_reset_function(pdev);
+	/*
+	 * Careful, device_lock may already be held. This is the case if
+	 * a driver unbind is blocked. Try to get the locks ourselves to
+	 * prevent a deadlock.
+	 */
+	if (vdev->reset_works) {
+		bool reset_done = false;
+
+		if (pci_cfg_access_trylock(pdev)) {
+			if (device_trylock(&pdev->dev)) {
+				__pci_reset_function_locked(pdev);
+				reset_done = true;
+				device_unlock(&pdev->dev);
+			}
+			pci_cfg_access_unlock(pdev);
+		}
+
+		if (!reset_done)
+			pr_warn("%s: Unable to acquire locks for reset of %s\n",
+				__func__, dev_name(&pdev->dev));
+	}

 	pci_restore_state(pdev);
 }
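The nested trylock shape used in this patch can be tried out in userspace. The sketch below is only an illustration under stated assumptions: the toy_trylock()/toy_unlock() helpers are hypothetical and built on C11 atomic_flag, standing in for pci_cfg_access_trylock() and device_trylock(), and a counter stands in for the locked reset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy locks: a set flag means "held". Not the kernel primitives. */
static atomic_flag cfg_lock = ATOMIC_FLAG_INIT;
static atomic_flag dev_lock = ATOMIC_FLAG_INIT;
static int resets_done;

static bool toy_trylock(atomic_flag *l)
{
	return !atomic_flag_test_and_set(l);	/* true if we took it */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Returns true only if both locks were taken and the reset ran. */
static bool try_reset(void)
{
	bool reset_done = false;

	if (toy_trylock(&cfg_lock)) {
		if (toy_trylock(&dev_lock)) {
			resets_done++;	/* stands in for the locked reset */
			reset_done = true;
			toy_unlock(&dev_lock);
		}
		toy_unlock(&cfg_lock);
	}
	return reset_done;
}
```

If either lock is already held, for example by a blocked unbind, try_reset() gives up and reports failure instead of waiting, which is the trade the patch makes: a skipped reset with a warning rather than a deadlock.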
[PATCH 1/3] vfio: Don't overreact to DEL_DEVICE
BUS_NOTIFY_DEL_DEVICE triggers IOMMU drivers to remove devices from their iommu group, but there's really nothing we can do about it at this point. If the device is in use, then the vfio sub-driver will block the device_del from completing until it's released. If the device is not in use or not owned by a vfio sub-driver, then we really don't care that it's being removed.

The current code can be triggered just by unloading an sr-iov driver (ex. igb) while the VFs are attached to vfio-pci because it makes an incorrect assumption about the ordering of driver remove callbacks vs the DEL_DEVICE notification.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio.c | 29 +++--
 1 file changed, 7 insertions(+), 22 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 6d78736..1bed313 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -492,27 +492,6 @@ static int vfio_group_nb_add_dev(struct vfio_group *group, struct device *dev)
 	return 0;
 }

-static int vfio_group_nb_del_dev(struct vfio_group *group, struct device *dev)
-{
-	struct vfio_device *device;
-
-	/*
-	 * Expect to fall out here. If a device was in use, it would
-	 * have been bound to a vfio sub-driver, which would have blocked
-	 * in .remove at vfio_del_group_dev. Sanity check that we no
-	 * longer track the device, so it's safe to remove.
-	 */
-	device = vfio_group_get_device(group, dev);
-	if (likely(!device))
-		return 0;
-
-	WARN("Device %s removed from live group %d!\n", dev_name(dev),
-	     iommu_group_id(group->iommu_group));
-
-	vfio_device_put(device);
-	return 0;
-}
-
 static int vfio_group_nb_verify(struct vfio_group *group, struct device *dev)
 {
 	/* We don't care what happens when the group isn't in use */
@@ -543,7 +522,13 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 		vfio_group_nb_add_dev(group, dev);
 		break;
 	case IOMMU_GROUP_NOTIFY_DEL_DEVICE:
-		vfio_group_nb_del_dev(group, dev);
+		/*
+		 * Nothing to do here. If the device is in use, then the
+		 * vfio sub-driver should block the remove callback until
+		 * it is unused. If the device is unused or attached to a
+		 * stub driver, then it should be released and we don't
+		 * care that it will be going away.
+		 */
 		break;
 	case IOMMU_GROUP_NOTIFY_BIND_DRIVER:
 		pr_debug("%s: Device %s, group %d binding to driver\n",
[PATCH 2/3] vfio: Ignore sprurious notifies
Remove debugging WARN_ON if we get a spurious notify for a group that
no longer exists.  No reports of anyone hitting this, but it would
likely be a race and not a bug if they did.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
 drivers/vfio/vfio.c |8 +++-
 1 file changed, 3 insertions(+), 5 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 1bed313..2edfecc 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -508,13 +508,11 @@ static int vfio_iommu_group_notifier(struct notifier_block *nb,
 	struct device *dev = data;
 
 	/*
-	 * Need to go through a group_lock lookup to get a reference or
-	 * we risk racing a group being removed.  Leave a WARN_ON for
-	 * debuging, but if the group no longer exists, a spurious notify
-	 * is harmless.
+	 * Need to go through a group_lock lookup to get a reference or we
+	 * risk racing a group being removed.  Ignore spurious notifies.
 	 */
 	group = vfio_group_try_get(group);
-	if (WARN_ON(!group))
+	if (!group)
 		return NOTIFY_OK;
 
 	switch (action) {
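For readers outside the vfio code, the "try to take a reference, and quietly ignore the event if the object is already gone" pattern from this patch can be sketched in plain C. This is only an illustration under stated assumptions: `struct group`, `group_try_get()`, `group_put()` and `group_notifier()` are hypothetical stand-ins, the refcount is not atomic, and nothing here is the actual vfio implementation.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; not the vfio definitions. */
enum { NOTIFY_OK = 1 };

struct group {
	int refcount;	/* 0 means the group is already being torn down */
	int events;	/* number of notifications actually handled */
};

/* Take a reference only if the group is still live. */
static struct group *group_try_get(struct group *g)
{
	if (g == NULL || g->refcount == 0)
		return NULL;
	g->refcount++;
	return g;
}

/* Drop a reference taken by group_try_get(). */
static void group_put(struct group *g)
{
	g->refcount--;
}

/* A late notification for a dead group is ignored without a warning. */
static int group_notifier(struct group *g)
{
	g = group_try_get(g);
	if (!g)
		return NOTIFY_OK;	/* spurious notify: harmless, stay quiet */

	g->events++;		/* real work would go here */
	group_put(g);
	return NOTIFY_OK;
}
```

The point of the sketch is that a failed lookup is treated as an expected race rather than as a bug, which is why no warning is printed on that path.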
[PATCH 0/3] vfio: fixup notifiers and avoid possible deadlock
Cleanup a couple of the notifier paths to remove bogus WARN_ON calls.
One is pretty easy to hit and neither really signifies a problem.
Fix remove path to avoid potential deadlock with other device_lock
holders.  Thanks,

Alex

---

Alex Williamson (3):
      vfio: Don't overreact to DEL_DEVICE
      vfio: Ignore sprurious notifies
      vfio-pci: Avoid deadlock on remove

 drivers/vfio/pci/vfio_pci.c | 23 +--
 drivers/vfio/vfio.c | 37 ++---
 2 files changed, 31 insertions(+), 29 deletions(-)
[Bug 59521] KVM linux guest reads uninitialized pvclock values before executing rdmsr MSR_KVM_WALL_CLOCK
https://bugzilla.kernel.org/show_bug.cgi?id=59521

--- Comment #1 from Eugene Batalov eabatalo...@gmail.com 2013-06-11 16:03:55 ---
I have reconstructed the backtrace of the uninitialized pvclock read.
File and line references are for the Ubuntu raring kernel,
git://kernel.ubuntu.com/ubuntu/ubuntu-raring.git, tag Ubuntu-3.8.0-19.30.

bp: 0xf3ccbe68 ip: 0xc103cfbd arch/x86/include/asm/pvclock.h:78 arch/x86/kernel/pvclock.c:74
bp: 0xf3ccbe70 ip: 0xc103c057 arch/x86/kernel/kvmclock.c:91
bp: 0xf3ccbe78 ip: 0xc1017598 arch/x86/kernel/tsc.c:58
bp: 0xf3ccbea8 ip: 0xc107e98d kernel/sched/clock.c:248
bp: 0xf3ccbeb8 ip: 0xc107ea35 kernel/sched/clock.c:342
bp: 0xf3ccbf08 ip: 0xc104ad85 kernel/printk.c:356
bp: 0xf3ccbf50 ip: 0xc104c4e1 kernel/printk.c:1607
bp: 0xf3ccbf70 ip: 0xc1609bb6 kernel/printk.c:1688
bp: 0xf3ccbf90 ip: 0xc1600a51 arch/x86/include/asm/bitops.h:321 arch/x86/kernel/cpu/common.c:1325
bp: 0xf3ccbfb4 ip: 0xc1604000 ??
bp: 0x

kernel/printk.c:356 calls local_clock()
  calls sched_clock_cpu()
  calls sched_clock()
  calls paravirt_sched_clock()
  calls, indirectly, kvm_clock_read()
  the uninitialized pv_clock is read here

vcpu kvmclock initialization is performed in kvm_register_clock.
kvm_register_clock is called from
static void __init kvm_smp_prepare_boot_cpu(void),
which is called from ./init/main.c:524 as smp_prepare_boot_cpu.

I'll think about a proper fix soon.
We should probably either fix the order of the cpu initialization
stages or disable use of pvclock before it is initialized.
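The second option mentioned above (do not use pvclock before it is initialized) could look roughly like the following. This is a minimal sketch, not a proposed kernel fix: `struct pv_clock_state`, its `registered` flag and `guarded_clock_read()` are hypothetical names invented for illustration, and the real pvclock data structures and read path are different.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-CPU clock state; the real pvclock structures differ. */
struct pv_clock_state {
	bool registered;	/* set once the registration step has run */
	uint64_t system_time;	/* only meaningful after registration */
};

/*
 * Return the paravirtual clock value only once it has been registered;
 * before that, return a caller-supplied fallback instead of reading
 * uninitialized data.
 */
static uint64_t guarded_clock_read(const struct pv_clock_state *s,
				   uint64_t fallback_ns)
{
	if (!s->registered)
		return fallback_ns;
	return s->system_time;
}
```

With a guard of this kind an early caller such as printk would get the fallback value until registration has happened, at the cost of one extra branch on every read.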
[PATCH v2] kvm/ppc/booke: Delay kvmppc_lazy_ee_enable
kvmppc_lazy_ee_enable() should be called as late as possible,
or else we get things like WARN_ON(preemptible()) in enable_kernel_fp()
in configurations where preemptible() works.

Note that book3s_pr already waits until just before __kvmppc_vcpu_run
to call kvmppc_lazy_ee_enable().

Signed-off-by: Scott Wood scottw...@freescale.com
---
Rebased without patches 5 and 6 in the previous patchset.

 arch/powerpc/kvm/booke.c |3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 5cd7ad0..1a1b511 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -673,7 +673,6 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 		ret = s;
 		goto out;
 	}
-	kvmppc_lazy_ee_enable();
 
 	kvm_guest_enter();
 
@@ -699,6 +698,8 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
 	kvmppc_load_guest_fp(vcpu);
 #endif
 
+	kvmppc_lazy_ee_enable();
+
 	ret = __kvmppc_vcpu_run(kvm_run, vcpu);
 
 	/* No need for kvm_guest_exit. It's done in handle_exit.
--
1.7.10.4
Re: [Qemu-devel] KVM call agenda for 2013-06-25
I don't think my presence on the call is necessary, but I would
appreciate it if you put RDMA on the agenda. The patches have been
thoroughly bug-tested and reviewed.

- Michael

On 06/11/2013 11:52 AM, Juan Quintela wrote:
> Hi
>
> Now we have moved to one call every other week.
> Please, send any topic that you are interested in covering.
>
> Thanks, Juan.
>
> PS. If you want to attend and you don't have the call details,
> contact me.
Re: KVM call agenda for 2013-06-11
On 06/11/13 17:45, Michael S. Tsirkin wrote:

> To summarize, there's a consensus now that generating ACPI tables in
> QEMU is a good idea.
>
> Two issues that need to be addressed:
> - original patches break cross-version migration. Need to fix that.
>
> - Anthony requested that patchset is merged together with some new
>   feature. I'm not sure the reasoning is clear: the current version
>   intentionally generates tables that are bug for bug compatible
>   with seabios, to simplify testing.

Sorry about not following the series more closely -- is there now a
qemu interface available that allows any firmware to just take the
tables, maybe to fix them up blindly / algorithmically, and to install
them?

IOW, is the interface at such a point that in OVMF we could start
looking at throwing out specific code, in favor of implementing the
generic fw-side algorithm?

> It seems clear we have users for this such as hotplug of devices
> behind pci bridges, so why keep the infrastructure out of tree?
>
> Looking for something additional, smaller as the hotplug patch is a
> bit big, so might delay merging.
>
> Going forward - would we want to move smbios as well? Everyone seems
> to think it's a good idea.

I think the current fw_cfg interface for SMBIOS tables is already good
enough to save a lot of work in OVMF. Namely, if all required tables
were generated (table template + field-wise patching) in qemu, and
then all exported over fw_cfg as verbatim tables, my SMBIOS series
currently pending for OVMF should be able to install them.

This would save OVMF the coding of templates (and any necessary
patching) for types 3, 4 (especially nasty), 9, 16, 17, 19, and 32.
(Basically all except type 0 and type 1, which are already implemented
(but verbatim tables from qemu would take priority even for type 0 and
type 1). Type 7 can be left out apparently; IIRC dmidecode doesn't
report it even under SeaBIOS.)

I'm not implying anyone should start working on this (myself included
:)), but yeah, moving SMBIOS would save work in OVMF.

(Provided there was any reason to support said SMBIOS tables in OVMF :))

Thanks,
Laszlo
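To make "install verbatim tables" a little more concrete: a firmware that receives a blob of SMBIOS structures has to find where each one ends before it can copy it. Per the SMBIOS layout, each structure starts with a 4-byte header (type, length of the formatted area, 16-bit handle) and is followed by a string set terminated by a double NUL. The helper below is a sketch of that boundary search only; the name `smbios_struct_size()` is hypothetical, and it is taken neither from the OVMF series nor from qemu.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Return the total size in bytes (formatted area plus string set) of
 * the SMBIOS structure starting at p, or 0 if it does not fit in the
 * avail bytes that remain in the buffer.
 */
static size_t smbios_struct_size(const uint8_t *p, size_t avail)
{
	size_t len, i;

	if (avail < 4)
		return 0;

	len = p[1];		/* byte 1: length of the formatted area */
	if (len < 4 || len + 2 > avail)
		return 0;

	/* Scan for the double NUL that terminates the string set. */
	for (i = len; i + 1 < avail; i++) {
		if (p[i] == 0 && p[i + 1] == 0)
			return i + 2;
	}
	return 0;
}
```

A caller would use the returned size to step from one structure to the next and would stop on a return value of 0, treating that as a truncated or malformed table.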
Re: [Qemu-devel] KVM call agenda for 2013-06-25
On 11.06.2013, at 17:52, Juan Quintela wrote:

> Hi
>
> Now we have moved to one call every other week.
> Please, send any topic that you are interested in covering.

VFIO for device tree based platforms

Alex

> Thanks, Juan.
>
> PS. If you want to attend and you don't have the call details,
> contact me.
Re: KVM call agenda for 2013-06-11
Michael S. Tsirkin m...@redhat.com writes:

> On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
> > Juan is not available now, and Anthony asked for agenda to be sent
> > early. So here comes:
> >
> > Agenda for the meeting Tue, June 11:
> >
> > - Generating acpi tables, redux
>
> Not so much notes as a quick summary of the call:
>
> There are the following reasons to generate ACPI tables in QEMU:
>
> - sharing code with e.g. ovmf
>   Anthony thinks this is not a valid argument
>
> - so we can make tables more dynamic and move away from iasl
>   Anthony thinks this is not a valid reason too, since qemu and
>   seabios have access to same info
>   MST noted several pieces of info not accessible to bios.
>   Anthony said they can be added, e.g. by exposing QOM to the bios.
>
> - even though most tables are static, hardcoded they are likely to
>   change over time
>   Anthony sees this as justified
>
> To summarize, there's a consensus now that generating ACPI tables in
> QEMU is a good idea.

I would say best worst idea ;-)

I am deeply concerned about the complexity it introduces but I don't
see many other options.

> Two issues that need to be addressed:
> - original patches break cross-version migration. Need to fix that.
>
> - Anthony requested that patchset is merged together with some new
>   feature. I'm not sure the reasoning is clear: the current version
>   intentionally generates tables that are bug for bug compatible
>   with seabios, to simplify testing.

I expect that there will be additional issues that need to be worked
out and want to see a feature that actually uses the infrastructure
before we add it.

> It seems clear we have users for this such as hotplug of devices
> behind pci bridges, so why keep the infrastructure out of tree?

It's hard to evaluate the infrastructure without a user.

> Looking for something additional, smaller as the hotplug patch is a
> bit big, so might delay merging.
>
> Going forward - would we want to move smbios as well? Everyone seems
> to think it's a good idea.

Yes, independent of ACPI, I think QEMU should be generating the SMBIOS
tables.

Regards,

Anthony Liguori

> --
> MST
Re: KVM call agenda for 2013-06-11
On Tue, Jun 11, 2013 at 08:06:15PM +0200, Laszlo Ersek wrote:
> On 06/11/13 17:45, Michael S. Tsirkin wrote:
>
> > To summarize, there's a consensus now that generating ACPI tables
> > in QEMU is a good idea.
> >
> > Two issues that need to be addressed:
> > - original patches break cross-version migration. Need to fix that.
> >
> > - Anthony requested that patchset is merged together with some new
> >   feature. I'm not sure the reasoning is clear: the current version
> >   intentionally generates tables that are bug for bug compatible
> >   with seabios, to simplify testing.
>
> Sorry about not following the series more closely -- is there now a
> qemu interface available that allows any firmware to just take the
> tables, maybe to fix them up blindly / algorithmically, and to
> install them?

Yes.

> IOW, is the interface at such a point that in OVMF we could start
> looking at throwing out specific code, in favor of implementing the
> generic fw-side algorithm?
>
> > It seems clear we have users for this such as hotplug of devices
> > behind pci bridges, so why keep the infrastructure out of tree?
> >
> > Looking for something additional, smaller as the hotplug patch is
> > a bit big, so might delay merging.
> >
> > Going forward - would we want to move smbios as well? Everyone
> > seems to think it's a good idea.
>
> I think the current fw_cfg interface for SMBIOS tables is already
> good enough to save a lot of work in OVMF. Namely, if all required
> tables were generated (table template + field-wise patching) in
> qemu, and then all exported over fw_cfg as verbatim tables, my SMBIOS
> series currently pending for OVMF should be able to install them.
>
> This would save OVMF the coding of templates (and any necessary
> patching) for types 3, 4 (especially nasty), 9, 16, 17, 19, and 32.
> (Basically all except type 0 and type 1, which are already
> implemented (but verbatim tables from qemu would take priority even
> for type 0 and type 1). Type 7 can be left out apparently; IIRC
> dmidecode doesn't report it even under SeaBIOS.)
>
> I'm not implying anyone should start working on this (myself
> included :)), but yeah, moving SMBIOS would save work in OVMF.
>
> (Provided there was any reason to support said SMBIOS tables in
> OVMF :))
>
> Thanks,
> Laszlo
Re: KVM call agenda for 2013-06-11
On Tue, Jun 11, 2013 at 01:38:11PM -0500, Anthony Liguori wrote:
> Michael S. Tsirkin m...@redhat.com writes:
>
> > On Tue, Jun 04, 2013 at 04:24:31PM +0300, Michael S. Tsirkin wrote:
> > > Juan is not available now, and Anthony asked for agenda to be
> > > sent early. So here comes:
> > >
> > > Agenda for the meeting Tue, June 11:
> > >
> > > - Generating acpi tables, redux
> >
> > Not so much notes as a quick summary of the call:
> >
> > There are the following reasons to generate ACPI tables in QEMU:
> >
> > - sharing code with e.g. ovmf
> >   Anthony thinks this is not a valid argument
> >
> > - so we can make tables more dynamic and move away from iasl
> >   Anthony thinks this is not a valid reason too, since qemu and
> >   seabios have access to same info
> >   MST noted several pieces of info not accessible to bios.
> >   Anthony said they can be added, e.g. by exposing QOM to the bios.
> >
> > - even though most tables are static, hardcoded they are likely to
> >   change over time
> >   Anthony sees this as justified
> >
> > To summarize, there's a consensus now that generating ACPI tables
> > in QEMU is a good idea.
>
> I would say best worst idea ;-)
>
> I am deeply concerned about the complexity it introduces but I don't
> see many other options.
>
> > Two issues that need to be addressed:
> > - original patches break cross-version migration. Need to fix that.
> >
> > - Anthony requested that patchset is merged together with some new
> >   feature. I'm not sure the reasoning is clear: the current version
> >   intentionally generates tables that are bug for bug compatible
> >   with seabios, to simplify testing.
>
> I expect that there will be additional issues that need to be worked
> out and want to see a feature that actually uses the infrastructure
> before we add it.

So please look at it, that code has been posted. See:
	[PATCH] qemu: piix: PCI bridge ACPI hotplug support
It does not seem to show any major issues to work out besides the
cross-version migration issue that we know about.

> > It seems clear we have users for this such as hotplug of devices
> > behind pci bridges, so why keep the infrastructure out of tree?
>
> It's hard to evaluate the infrastructure without a user.

But the user has been posted. Even if there are still issues to work
out with it, that should be enough to evaluate the infrastructure; the
user itself does not need to be merged for this.
So please evaluate and give feedback.

> > Looking for something additional, smaller as the hotplug patch is
> > a bit big, so might delay merging.
> >
> > Going forward - would we want to move smbios as well? Everyone
> > seems to think it's a good idea.
>
> Yes, independent of ACPI, I think QEMU should be generating the
> SMBIOS tables.
>
> Regards,
>
> Anthony Liguori
>
> > --
> > MST
Re: [PATCH] vhost-scsi: return -ENOENT when no matching tcm_vhost_tpg found
cc to Greg for 3.9.

> On Tue, May 28, 2013 at 04:54:44PM +0800, Wenchao Xia wrote:
> > The ioctl VHOST_SCSI_SET_ENDPOINT reports a "file exists" error
> > when I forget to set it up correctly in configfs, which confuses
> > the user. It actually fails to find a matching one, so change the
> > error value.
> >
> > Signed-off-by: Wenchao Xia wenchaoli...@gmail.com
>
> Acked-by: Asias He as...@redhat.com
>
> BTW, it would be nice to print more informative info in qemu when
> wwpn is not available as well.
>
> > ---
> >  drivers/vhost/scsi.c |2 +-
> >  1 files changed, 1 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/vhost/scsi.c b/drivers/vhost/scsi.c
> > index 7014202..6325b1d 100644
> > --- a/drivers/vhost/scsi.c
> > +++ b/drivers/vhost/scsi.c
> > @@ -1219,7 +1219,7 @@ static int vhost_scsi_set_endpoint(
> >  		}
> >  		ret = 0;
> >  	} else {
> > -		ret = -EEXIST;
> > +		ret = -ENOENT;
> >  	}
> >
> >  	/*
> > --
> > 1.7.1
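The distinction the patch draws is easy to see in a small userspace analogue: a lookup that finds nothing should report ENOENT ("No such file or directory"), not EEXIST ("File exists"). The sketch below is illustrative only. `find_target()` and the target names are hypothetical and are not taken from drivers/vhost/scsi.c.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical lookup: return 0 if `wanted` is among the configured
 * targets, or a negative errno value if it is not.
 */
static int find_target(const char *const *targets, size_t n,
		       const char *wanted)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (strcmp(targets[i], wanted) == 0)
			return 0;
	}
	return -ENOENT;	/* nothing matched: "No such file or directory" */
}
```

A caller that passes the negated return value to strerror() then prints a message that matches what actually went wrong, namely that the requested target was never configured.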
KVM: x86: handle idiv overflow at kvm_write_tsc
It's possible that idivl overflows (due to large delta stored in usdiff,
valid scenario).

Create an exception handler to catch the overflow exception (division by zero
is protected by vcpu->arch.virtual_tsc_khz check), and interpret it accordingly
(delta is larger than USEC_PER_SEC).

Fixes https://bugzilla.redhat.com/show_bug.cgi?id=969644

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 094b5d9..64a4b03 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1194,20 +1194,37 @@ void kvm_write_tsc(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	elapsed = ns - kvm->arch.last_tsc_nsec;
 
 	if (vcpu->arch.virtual_tsc_khz) {
+		int faulted = 0;
+
 		/* n.b - signed multiplication and division required */
 		usdiff = data - kvm->arch.last_tsc_write;
 #ifdef CONFIG_X86_64
 		usdiff = (usdiff * 1000) / vcpu->arch.virtual_tsc_khz;
 #else
 		/* do_div() only does unsigned */
-		asm("idivl %2; xor %%edx, %%edx"
-		    : "=A"(usdiff)
-		    : "A"(usdiff * 1000), "rm"(vcpu->arch.virtual_tsc_khz));
+		asm("1: idivl %[divisor]\n"
+		    "2: xor %%edx, %%edx\n"
+		    "   movl $0, %[faulted]\n"
+		    "3:\n"
+		    ".section .fixup,\"ax\"\n"
+		    "4: movl $1, %[faulted]\n"
+		    "   jmp 3b\n"
+		    ".previous\n"
+
+		_ASM_EXTABLE(1b, 4b)
+
+		: "=A"(usdiff), [faulted] "=r" (faulted)
+		: "A"(usdiff * 1000), [divisor] "rm"(vcpu->arch.virtual_tsc_khz));
+
 #endif
 		do_div(elapsed, 1000);
 		usdiff -= elapsed;
 		if (usdiff < 0)
 			usdiff = -usdiff;
+
+		/* idivl overflow => difference is larger than USEC_PER_SEC */
+		if (faulted)
+			usdiff = USEC_PER_SEC;
 	} else
 		usdiff = USEC_PER_SEC; /* disable TSC match window below */
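For context, a 32-bit idivl divides a 64-bit dividend (edx:eax) by a 32-bit divisor and raises a divide error when the quotient does not fit in 32 signed bits. The patch above catches that fault with an exception table entry. The sketch below shows a different, portable technique that checks the range up front instead of trapping, so it is not what the patch does. `div_s64_to_s32()` is a hypothetical helper written only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Divide a signed 64-bit value by a signed 32-bit divisor, as a 32-bit
 * idivl would, but report failure instead of faulting.  Returns false
 * if the divisor is zero or the quotient does not fit in int32_t.
 */
static bool div_s64_to_s32(int64_t dividend, int32_t divisor,
			   int32_t *quotient)
{
	int64_t q;

	if (divisor == 0)
		return false;
	/* INT64_MIN / -1 is itself undefined in C, so reject it first. */
	if (dividend == INT64_MIN && divisor == -1)
		return false;

	q = dividend / divisor;
	if (q > INT32_MAX || q < INT32_MIN)
		return false;	/* the case where idivl would raise #DE */

	*quotient = (int32_t)q;
	return true;
}
```

A caller in the spirit of the patch would treat a false return the same way the patch treats `faulted`: the delta is too large to match, so the difference is set to USEC_PER_SEC.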
Re: [PATCH] vhost-scsi: return -ENOENT when no matching tcm_vhost_tpg found
On Wed, Jun 12, 2013 at 09:39:50AM +0800, wenchao wrote:
> cc to Greg for 3.9.

<formletter>

This is not the correct way to submit patches for inclusion in the
stable kernel tree.  Please read Documentation/stable_kernel_rules.txt
for how to do this properly.

</formletter>
Re: [PATCH 0/4 v3] KVM: PPC: IOMMU in-kernel handling
On Wed, 2013-06-05 at 16:11 +1000, Alexey Kardashevskiy wrote:
> Ben, ping! :)
>
> This series has tiny fixes (capability and ioctl numbers,
> changed documentation, compile errors in some configuration).
> More details are in the commit messages.
> Rebased on v3.10-rc4.

Alex, I assume you'll merge that once I ack it ?

Cheers,
Ben.

> Alexey Kardashevskiy (4):
>   KVM: PPC: Add support for multiple-TCE hcalls
>   powerpc: Prepare to support kernel handling of IOMMU map/unmap
>   KVM: PPC: Add support for IOMMU in-kernel handling
>   KVM: PPC: Add hugepage support for IOMMU in-kernel handling
>
>  Documentation/virtual/kvm/api.txt        |   45 +++
>  arch/powerpc/include/asm/kvm_host.h      |    7 +
>  arch/powerpc/include/asm/kvm_ppc.h       |   40 ++-
>  arch/powerpc/include/asm/pgtable-ppc64.h |    4 +
>  arch/powerpc/include/uapi/asm/kvm.h      |    7 +
>  arch/powerpc/kvm/book3s_64_vio.c         |  398 -
>  arch/powerpc/kvm/book3s_64_vio_hv.c      |  471 --
>  arch/powerpc/kvm/book3s_hv.c             |   39 +++
>  arch/powerpc/kvm/book3s_hv_rmhandlers.S  |    6 +
>  arch/powerpc/kvm/book3s_pr_papr.c        |   37 ++-
>  arch/powerpc/kvm/powerpc.c               |   15 +
>  arch/powerpc/mm/init_64.c                |   77 -
>  include/uapi/linux/kvm.h                 |    3 +
>  13 files changed, 1121 insertions(+), 28 deletions(-)