[PATCH] Use clockevent multiplier and shifter for decrementer
The time for which the hrtimer is started for decrementer emulation is
calculated using tb_ticks_per_usec, whereas the hrtimer uses the clockevent
for DEC reprogramming (if needed), and the clockevent calculates timebase
ticks using the multiplier and shifter mechanism implemented within the
clockevent layer.

It was observed that this conversion (timebase->time->timebase) is not
correct because the two mechanisms are not consistent. In our setup it adds
2% jitter.

With this patch, the clockevent multiplier and shifter mechanism is used
when starting the hrtimer for decrementer emulation. Now the jitter is 0.5%.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/time.h |    2 ++
 arch/powerpc/kernel/time.c      |    6 ++++++
 arch/powerpc/kvm/emulate.c      |    5 +++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 7eb10fb..6d631b2 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -202,6 +202,8 @@ extern u64 mulhdu(u64, u64);
 extern void div128_by_32(u64 dividend_high, u64 dividend_low,
 			 unsigned divisor, struct div_result *dr);
 
+extern void get_clockevent_mult(u64 *multi, u64 *shift);
+
 /* Used to store Processor Utilization register (purr) values */
 
 struct cpu_usage {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 567dd7c..d229edd 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -910,6 +910,12 @@ static void __init init_decrementer_clockevent(void)
 	register_decrementer_clockevent(cpu);
 }
 
+void get_clockevent_mult(u64 *multi, u64 *shift)
+{
+	*multi = decrementer_clockevent.mult;
+	*shift = decrementer_clockevent.shift;
+}
+
 void secondary_cpu_time_init(void)
 {
 	/* Start the decrementer on CPUs that have manual control
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index afc9154..4bfcaa1 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -76,6 +76,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
 	unsigned long dec_nsec;
 	unsigned long long dec_time;
+	u64 mult, shift;
 
 	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
 	hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
@@ -103,9 +104,9 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 		 * host ticks.
 		 */
 
+		get_clockevent_mult(&mult, &shift);
 		dec_time = vcpu->arch.dec;
-		dec_time *= 1000;
-		do_div(dec_time, tb_ticks_per_usec);
+		dec_time = (dec_time << shift) / mult;
 		dec_nsec = do_div(dec_time, NSEC_PER_SEC);
 		hrtimer_start(&vcpu->arch.dec_timer, ktime_set(dec_time, dec_nsec),
 			      HRTIMER_MODE_REL);
-- 
1.7.0.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
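Editorial sketch (not part of the patch): the inconsistency the commit message describes can be illustrated in plain user-space C. The helper names, the 66,666,666 Hz timebase and the shift of 32 below are made-up assumptions for illustration; they are not the values of the reporter's setup and do not reproduce the 2% or 0.5% figures. The sketch only shows that dividing by a truncated ticks-per-microsecond value and inverting a mult/shift scaling give different nanosecond results for the same tick count, so only the second one round-trips through the clockevent conversion.

```c
#include <stdint.h>

/* Hypothetical helper: derive a clockevent-style multiplier so that
 * ticks = (ns * mult) >> shift for a timebase of freq_hz.
 * Assumes freq_hz << shift does not overflow 64 bits. */
static uint64_t sketch_mult_for_freq(uint64_t freq_hz, unsigned shift)
{
    return (freq_hz << shift) / 1000000000ULL;
}

/* Old path in the patch: scale by 1000, divide by whole ticks per usec.
 * The integer tb_ticks_per_usec drops the fractional part of the rate. */
static uint64_t ticks_to_ns_per_usec(uint64_t ticks, uint64_t tb_ticks_per_usec)
{
    return ticks * 1000 / tb_ticks_per_usec;
}

/* New path in the patch: invert the mult/shift scaling.
 * Assumes ticks << shift does not overflow 64 bits. */
static uint64_t ticks_to_ns_mult_shift(uint64_t ticks, uint64_t mult, unsigned shift)
{
    return (ticks << shift) / mult;
}

/* Forward direction, as a clockevent-style layer would compute it. */
static uint64_t ns_to_ticks_mult_shift(uint64_t ns, uint64_t mult, unsigned shift)
{
    return (ns * mult) >> shift;
}
```

With the assumed 66,666,666 Hz timebase, tb_ticks_per_usec truncates to 66, so the old path is off by about 1% for any tick count, while the inverted mult/shift path converts back to within one tick of the original value.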
Re: Questing regarding KVM Guest PMU
On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla shashank.rachama...@gmail.com wrote:
On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com wrote:
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:

I guess things are working fine with perf. But why not with oprofile ?

Looks like it. I never tried oprofile. Will try to reproduce your problem
and see what oprofile is doing.

I am using ubuntu 10.04 with 2.6.32-21-generic kernel as guest and
oprofile 0.9.6. Also, I have tried to capture kvm-events ( perf patch ) in
host while running oprofile and perf in guest. Please see the attachment.
I have run the tests in three cases for around 5 secs. There are more MSR
reads and writes in case of perf which I think is normal. However, there
are very few MSR reads and writes with oprofile. Also, the number of NMI
exceptions is too high in case of oprofile.

Which host kernel are you using? Try latest kvm.git and check if you see
something unusual in dmesg.

Currently running 3.3.0-rc5. will try with the latest source from kvm git
and let you know.

Thanks, there were some fixes that didn't make it into 3.3. rdpmc
instruction emulation fix is one of them. If oprofile uses it this can
explain the problem.

I have tried with latest kvm source from git and also with 3.0 guest
kernel but oprofile fails to collect any samples on guest. I am using a
core2duo processor which is considered by oprofile as pentium pro model.

core2duo on the host or the guest? What is your qemu command line?

both. qemu command line below.

sudo /usr/local/bin/qemu-system-x86_64 -drive file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net user

please find more info ( /proc/cpuinfo and uname of both host and guest )
in attached files.

oprofile does not work for me even on the host. After trying to use it I
can see why perf was written in the first place.

--
			Gleb.
Re: [Qemu-devel] KVM call minutes April 3
On 04/04/2012 03:18, Michael Roth wrote:
> Attacking the IDL/schema side first is the more rational approach. From
> there we can potentially generate ASN.1 BER/DER visitors for the
> protocol side, or potentially even just vmstate bindings as a start.
> I've recently started looking into the latter... it's completely
> feasible, the only downside is it complicates the IDL due to requiring
> support for a lot of what are very much vmstate-specific items, but it
> should be possible to do this in a manner where those annotations are
> self-contained and ignorable if we opted to replace vmstate-style
> declarations.

We can also keep the current vmstate descriptions, but access fields from
the automatically-generated visitors instead of struct fields. This keeps
the IDL simple.

Paolo
Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming
On Mon, Apr 02, 2012 at 12:00:45PM -0700, Tejun Heo wrote:
> Hello, James.
>
> On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote:
> > So if we're agreed no other devices going forwards should ever use
> > this interface, is there any point unifying the interface? No matter
> > how many caveats you hedge it round with, putting the API in a
> > central place will be a bit like a honey trap for careless bears. It
> > might be safer just to leave it buried in the three current drivers.
>
> Yeah, that was my hope but I think it would be easier to enforce to
> have a common function which is clearly marked legacy so that new
> driver writers can go look for the naming code in the existing ones,
> find out they're all using the same function which is marked legacy
> and explains what to do for newer drivers.
>
> Thanks.

I think I'm not the only one to be confused about the preferred direction
here. James, do you agree to the approach above?

It would be nice to fix virtio block for 3.4, so how about this:
- I'll just apply the original bugfix patch for 3.4 - it only affects
  virtio
- Ren will repost the refactoring patch on top, and we can keep up the
  discussion

Ren, if you agree, can you make this a two patch series please?

> --
> tejun

___
Virtualization mailing list
virtualizat...@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization
[Bug 42636] PCI passthrough does not work with AMD iommu for PCI device
https://bugzilla.kernel.org/show_bug.cgi?id=42636

Alexandrov Stanislav n...@nya.ai changed:

           What    |Removed                     |Added
                 CC|                            |n...@nya.ai

--- Comment #8 from Alexandrov Stanislav n...@nya.ai 2012-04-04 08:22:51 ---
I have the same problems with PCI passthrough. While my PCI-e video card
hd7750 with hda (01:00.0/1) and the integrated network card (05:00.0) are
successfully assigned in the guest, when I try to add a USB controller I
get the same error about FLR:

Unable to reset PCI device :00:12.0: no FLR, PM reset or bus reset available

With xen it works fine.

lspci -tv
-[:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port B)
       +-00.2  Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
       +-02.0-[01]--+-00.0  Advanced Micro Devices [AMD] nee ATI Device 683f
       |            \-00.1  Advanced Micro Devices [AMD] nee ATI Device aab0
       +-09.0-[02]00.0  Etron Technology, Inc. EJ168 USB 3.0 Host Controller
       +-0b.0-[03]--+-00.0  nVidia Corporation GF110 [GeForce GTX 560 Ti]
       |            \-00.1  nVidia Corporation GF110 High Definition Audio Controller
       +-11.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
       +-12.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
       +-12.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
       +-13.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
       +-13.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
       +-14.0  Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller
       +-14.2  Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
       +-14.3  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
       +-14.4-[04]--
       +-14.5  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
       +-15.0-[05]00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
       +-15.1-[06]--
       +-15.2-[07]--
       +-15.3-[08]--
       +-16.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
       +-16.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
       +-18.0  Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
       +-18.1  Advanced Micro Devices [AMD] Family 10h Processor Address Map
       +-18.2  Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
       +-18.3  Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
       \-18.4  Advanced Micro Devices [AMD] Family 10h Processor Link Control

qemu-kvm-1.0, linux-3.3
motherboard is GA-990FXA-D3

--
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Tue, Apr 03, 2012 at 06:47:49PM +0200, Jan Kiszka wrote:
> On 2012-04-03 18:27, Avi Kivity wrote:
> > On 03/29/2012 09:14 PM, Jan Kiszka wrote:
> > > Currently, MSI messages can only be injected to in-kernel irqchips
> > > by defining a corresponding IRQ route for each message. This is not
> > > only unhandy if the MSI messages are generated on the fly by user
> > > space, IRQ routes are a limited resource that user space has to
> > > manage carefully.
> > >
> > > By providing a direct injection path, we can both avoid using up
> > > limited resources and simplify the necessary steps for user land.
> > >
> > > diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
> > > index 81ff39f..ed27d1b 100644
> > > --- a/Documentation/virtual/kvm/api.txt
> > > +++ b/Documentation/virtual/kvm/api.txt
> > > @@ -1482,6 +1482,27 @@
> > >  See KVM_ASSIGN_DEV_IRQ for the data structure. The target device is specified
> > >  by assigned_dev_id. In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
> > >  evaluated.
> > >
> > > +4.61 KVM_SIGNAL_MSI
> > > +
> > > +Capability: KVM_CAP_SIGNAL_MSI
> > > +Architectures: x86
> > > +Type: vm ioctl
> > > +Parameters: struct kvm_msi (in)
> > > +Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
> > > +
> > > +Directly inject a MSI message. Only valid with in-kernel irqchip that handles
> > > +MSI messages.
> > > +
> > > +struct kvm_msi {
> > > +	__u32 address_lo;
> > > +	__u32 address_hi;
> > > +	__u32 data;
> > > +	__u32 flags;
> > > +	__u8  pad[16];
> > > +};
> > > +
> > > +No flags are defined so far. The corresponding field must be 0.
> >
> > There are two ways in which this can be generalized:
> >
> > struct kvm_general_irq {
> >     __u32 type; // line | MSI
> >     __u32 op;   // raise/lower/trigger
> >     union {
> >         ... line;
> >         struct kvm_msi msi;
> >     }
> > };
> >
> > so we have a single ioctl for all interrupt handling. This allows
> > eventual removal of the line-oriented ioctls.
> >
> > The other alternative is to have a dma interface, similar to the
> > kvm_run mmio interface but with the kernel acting as destination. The
> > advantage here is that we can handle dma from a device to any
> > kernel-emulated device, not just the APIC MSI range. A downside is
> > that we can't return values related to interrupt coalescing.
>
> Due to lacking injection feedback, I'm in favor of option 1. Will have
> a look.
>
> > A performance note: delivering an interrupt needs to search all vcpus
> > for an APIC ID match. The previous plan was to cache (or
> > pre-calculate) this lookup in the irq routing table. Now it looks
> > like we'll need a separate cache for this.
>
> As this is non-existent until today, we don't regress here. And it can
> still be added on top later on, transparently.

I always worry about hash collisions and the cost of calculating good
hash functions.

We could instead return an index in the cache on injection, maintain in
userspace and use it for fast path on the next injection.

Will make it easy to use an array index instead of a hash here, and
fallback to a slower ID lookup on mismatch.

Until we do have this fast path we can just fill this value with zeros,
so kernel patch (almost) does not need to change for this - just the
header.

> > (yes, I said on the call I don't anticipate objections but preparing
> > to apply a patch always triggers more critical thinking)
>
> Well, we make progress, though slower than I was hoping. :)
>
> Jan
>
> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
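Editorial sketch (not part of the thread): how user space might fill the quoted structure and interpret the ioctl result. This assumes the "Returns" line of the quoted documentation means ">0 on delivery", with the comparison sign apparently lost in this archive copy. The struct below is a local stand-in that mirrors the quoted struct kvm_msi layout so that the sketch needs no kernel headers; msi_sketch_make and msi_sketch_classify are hypothetical helper names, and no real ioctl is issued.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the struct kvm_msi layout quoted in the patch
 * (an assumption made to keep the sketch free of kernel headers). */
struct msi_sketch {
    uint32_t address_lo;
    uint32_t address_hi;
    uint32_t data;
    uint32_t flags;
    uint8_t  pad[16];
};

enum msi_outcome { MSI_DELIVERED, MSI_BLOCKED, MSI_ERROR };

/* Fill a message; flags and padding stay zero because the quoted text
 * says no flags are defined and the field must be 0. */
static struct msi_sketch msi_sketch_make(uint64_t address, uint32_t data)
{
    struct msi_sketch m;

    memset(&m, 0, sizeof(m));
    m.address_lo = (uint32_t)(address & 0xffffffffu);
    m.address_hi = (uint32_t)(address >> 32);
    m.data = data;
    return m;
}

/* Map the assumed return convention: >0 delivered, 0 blocked by the
 * guest, negative on error. */
static enum msi_outcome msi_sketch_classify(int ret)
{
    if (ret > 0)
        return MSI_DELIVERED;
    if (ret == 0)
        return MSI_BLOCKED;
    return MSI_ERROR;
}
```

The three-way result is the "injection feedback" Jan refers to: a caller can tell a blocked message apart from a delivered one, which the DMA-style alternative could not report.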
Re: [RFC][PATCH v2 0/4] uq/master: Basic MSI support for in-kernel irqchip mode
On Tue, Apr 03, 2012 at 07:27:36PM +0200, Jan Kiszka wrote:
> On 2012-04-03 15:06, Michael S. Tsirkin wrote:
> > On Tue, Apr 03, 2012 at 09:23:12AM +0200, Jan Kiszka wrote:
> > > This is v2 of the RFC, fixing a memory leak in
> > > kvm_flush_dynamic_msi_routes and adding support for the proposed
> > > KVM_SIGNAL_MSI IOCTL.
> > >
> > > This series depends on "kvm: set gsi_bits and max_gsi correctly"
> > > (http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88906).
> >
> > Looks good to me. How hard would it be to add irqfd support?
>
> Shouldn't be, but the changes will be a bit bigger. I'm thinking about
> a revamped interface between the MSI core and affected devices for a
> while. Will try to put down in some lines of code what I have in mind -
> once the dynamic MSI injection topic has settled.
>
> Jan

Yes it's not an objection - just a question.

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
Re: [RFC][PATCH v2 0/4] uq/master: Basic MSI support for in-kernel irqchip mode
On Tue, Apr 03, 2012 at 09:17:30PM +0200, Jan Kiszka wrote:
> On 2012-04-03 09:23, Jan Kiszka wrote:
> > This is v2 of the RFC, fixing a memory leak in
> > kvm_flush_dynamic_msi_routes and adding support for the proposed
> > KVM_SIGNAL_MSI IOCTL.
> >
> > This series depends on "kvm: set gsi_bits and max_gsi correctly"
> > (http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88906).
> >
> > Jan Kiszka (4):
> >   kvm: Refactor KVMState::max_gsi to gsi_count
> >   kvm: Introduce basic MSI support for in-kernel irqchips
> >   KVM: x86: Wire up MSI support for in-kernel irqchip
> >   kvm: Add support for direct MSI injections
> >
> >  hw/apic.c     |    3 +
> >  hw/kvm/apic.c |   33 +-
> >  hw/pc.c       |    5 --
> >  kvm-all.c     |  195 +++--
> >  kvm.h         |    1 +
> >  5 files changed, 225 insertions(+), 12 deletions(-)
>
> As we obviously agreed on the general direction regarding an MSI
> injection interface, I think patches 1-3 can lose their RFC tags and
> are ready for uq/master (provided there are no further review
> comments). Patch 4 will be reworked once the kernel interface is
> finalized.
>
> Jan

I agree.

Acked-by: Michael S. Tsirkin <m...@redhat.com>

> --
> Siemens AG, Corporate Technology, CT T DE IT 1
> Corporate Competence Center Embedded Linux
[PATCH] Use the lower four bytes while restoring guest readable SPRGs.
While restoring the hardware copies of guest SPRG4-7 registers we must use
the lower 4 bytes of the 64 bit software copies maintained by KVM.

Signed-off-by: Varun Sethi <varun.se...@freescale.com>
---
 arch/powerpc/kvm/booke_interrupts.S |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index c8c4b87..feda1bb 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -419,13 +419,13 @@ lightweight_exit:
 	 * written directly to the shared area, so we
 	 * need to reload them here with the guest's values.
 	 */
-	lwz	r3, VCPU_SHARED_SPRG4(r5)
+	lwz	r3, (VCPU_SHARED_SPRG4 + 4)(r5)
 	mtspr	SPRN_SPRG4W, r3
-	lwz	r3, VCPU_SHARED_SPRG5(r5)
+	lwz	r3, (VCPU_SHARED_SPRG5 + 4)(r5)
 	mtspr	SPRN_SPRG5W, r3
-	lwz	r3, VCPU_SHARED_SPRG6(r5)
+	lwz	r3, (VCPU_SHARED_SPRG6 + 4)(r5)
 	mtspr	SPRN_SPRG6W, r3
-	lwz	r3, VCPU_SHARED_SPRG7(r5)
+	lwz	r3, (VCPU_SHARED_SPRG7 + 4)(r5)
 	mtspr	SPRN_SPRG7W, r3
 
 #ifdef CONFIG_KVM_EXIT_TIMING
-- 
1.7.2.2
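Editorial sketch (not part of the patch): why the "+ 4" offset selects the lower four bytes. It assumes a big-endian memory layout for the 64-bit software copies, which is what the patch's offsets imply; store_be64 and load_be32 are hypothetical helpers that model the layout with explicit byte shifts so the sketch behaves the same on any host.

```c
#include <stdint.h>

/* Store a 64-bit value the way a big-endian CPU lays it out in memory:
 * most significant byte at the lowest address. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a big-endian 32-bit word from byte offset off, modelling what a
 * 32-bit load such as lwz would fetch from that address. */
static uint32_t load_be32(const uint8_t *p, unsigned off)
{
    return ((uint32_t)p[off] << 24) | ((uint32_t)p[off + 1] << 16) |
           ((uint32_t)p[off + 2] << 8) | (uint32_t)p[off + 3];
}
```

Under that assumption, a 32-bit load at offset 0 of a 64-bit field returns the upper half (zero for any value that fits in 32 bits), and only the load at offset 4 returns the value the guest actually wrote.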
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 04/04/2012 11:38 AM, Michael S. Tsirkin wrote:
> > > A performance note: delivering an interrupt needs to search all
> > > vcpus for an APIC ID match. The previous plan was to cache (or
> > > pre-calculate) this lookup in the irq routing table. Now it looks
> > > like we'll need a separate cache for this.
> >
> > As this is non-existent until today, we don't regress here. And it
> > can still be added on top later on, transparently.
>
> I always worry about hash collisions and the cost of calculating good
> hash functions.
>
> We could instead return an index in the cache on injection, maintain
> in userspace and use it for fast path on the next injection.

Ahem, that is almost the existing routing table to a T.

> Will make it easy to use an array index instead of a hash here, and
> fallback to a slower ID lookup on mismatch.

Need a free ioctl so we can reuse IDs.

> Until we do have this fast path we can just fill this value with
> zeros, so kernel patch (almost) does not need to change for this -
> just the header.

Partially implemented interfaces invite breakage.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 04/03/2012 08:24 PM, Jan Kiszka wrote:
> > A performance note: delivering an interrupt needs to search all vcpus
> > for an APIC ID match. The previous plan was to cache (or
> > pre-calculate) this lookup in the irq routing table. Now it looks
> > like we'll need a separate cache for this.
>
> As this is non-existent until today, we don't regress here. And it can
> still be added on top later on, transparently.

Yes, it's just a note, not an objection. The cache lookup will be slower
than the gsi lookup (hash table vs. array) but still O(1) vs. the current
O(n).

> If you are concerned about performance in this path, wouldn't a DMA
> interface for MSI injection be counterproductive?

Yes, it would. The lack of coalescing reporting support is also
problematic. I just mentioned this idea as food for thought.

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
[PATCH] Use the lower four bytes while restoring guest readable SPRGs.
From: Varun Sethi <varun.se...@freescale.com>

While restoring the hardware copies of guest SPRG4-7 registers we must use
the lower 4 bytes of the 64 bit software copies maintained by KVM.

Signed-off-by: Varun Sethi <varun.se...@freescale.com>
---
 arch/powerpc/kvm/booke_interrupts.S |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index c8c4b87..feda1bb 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -419,13 +419,13 @@ lightweight_exit:
 	 * written directly to the shared area, so we
 	 * need to reload them here with the guest's values.
 	 */
-	lwz	r3, VCPU_SHARED_SPRG4(r5)
+	lwz	r3, (VCPU_SHARED_SPRG4 + 4)(r5)
 	mtspr	SPRN_SPRG4W, r3
-	lwz	r3, VCPU_SHARED_SPRG5(r5)
+	lwz	r3, (VCPU_SHARED_SPRG5 + 4)(r5)
 	mtspr	SPRN_SPRG5W, r3
-	lwz	r3, VCPU_SHARED_SPRG6(r5)
+	lwz	r3, (VCPU_SHARED_SPRG6 + 4)(r5)
 	mtspr	SPRN_SPRG6W, r3
-	lwz	r3, VCPU_SHARED_SPRG7(r5)
+	lwz	r3, (VCPU_SHARED_SPRG7 + 4)(r5)
 	mtspr	SPRN_SPRG7W, r3
 
 #ifdef CONFIG_KVM_EXIT_TIMING
-- 
1.7.2.2
[PATCH] kvm-autotest: Allow migration of multiple machines.
Change the migration_multi_host test to migrate all vms at the same time.

https://github.com/autotest/autotest/pull/270

Signed-off-by: Jiří Župka <jzu...@redhat.com>
---
 client/tests/kvm/tests/migration_multi_host.py |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests/migration_multi_host.py b/client/tests/kvm/tests/migration_multi_host.py
index fe6e29a..2dcb098 100644
--- a/client/tests/kvm/tests/migration_multi_host.py
+++ b/client/tests/kvm/tests/migration_multi_host.py
@@ -19,8 +19,9 @@ def run_migration_multi_host(test, params, env):
         def migration_scenario(self):
             srchost = self.params.get("hosts")[0]
             dsthost = self.params.get("hosts")[1]
+            vms = params.get("vms").split()
 
-            self.migrate_wait(["vm1"], srchost, dsthost)
+            self.migrate_wait(vms, srchost, dsthost)
 
     mig = TestMultihostMigration(test, params, env)
     mig.run()
-- 
1.7.7.6
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On Wed, Apr 04, 2012 at 11:44:23AM +0300, Avi Kivity wrote:
> On 04/04/2012 11:38 AM, Michael S. Tsirkin wrote:
> > > > A performance note: delivering an interrupt needs to search all
> > > > vcpus for an APIC ID match. The previous plan was to cache (or
> > > > pre-calculate) this lookup in the irq routing table. Now it looks
> > > > like we'll need a separate cache for this.
> > >
> > > As this is non-existent until today, we don't regress here. And it
> > > can still be added on top later on, transparently.
> >
> > I always worry about hash collisions and the cost of calculating good
> > hash functions.
> >
> > We could instead return an index in the cache on injection, maintain
> > in userspace and use it for fast path on the next injection.
>
> Ahem, that is almost the existing routing table to a T.
>
> > Will make it easy to use an array index instead of a hash here, and
> > fallback to a slower ID lookup on mismatch.
>
> Need a free ioctl so we can reuse IDs.

No, it could be kernel controlled not userspace controlled.
We get both an address and an index:

	if (table[u.i].addr == u.addr && table[u.i].data == u.data) {
		return table[u.i].id;
	}

	u.i = find_lru_idx(table);
	table[u.i].addr = u.addr;
	table[u.i].data = u.data;
	table[u.i].id = find_id(u.addr, u.data);
	return table[u.i].id;

> > Until we do have this fast path we can just fill this value with
> > zeros, so kernel patch (almost) does not need to change for this -
> > just the header.
>
> Partially implemented interfaces invite breakage.

Hmm true. OK scrap this idea then, it's not clear whether we are going to
optimize this anyway.

> --
> I have a truly marvellous patch that fixes the bug which this
> signature is too narrow to contain.
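Editorial sketch (not part of the thread): the kernel-controlled lookup outlined in the message above, written out as self-contained C. Everything here is a hypothetical illustration: the slot count, the struct and function names, the LRU stamp, and in particular sketch_find_id, which merely stands in for the slow search over all vcpus by deriving a number from the address. The caller passes an index hint; on a match the cached id is returned without the slow search, otherwise a slot is recycled and the hint is updated.

```c
#include <stdint.h>

#define SKETCH_SLOTS 4

struct sketch_entry {
    uint64_t addr;
    uint32_t data;
    int      id;        /* cached destination id */
    uint64_t last_use;  /* LRU stamp */
    int      valid;
};

struct sketch_cache {
    struct sketch_entry table[SKETCH_SLOTS];
    uint64_t clock;
    unsigned slow_lookups;  /* counts trips through the slow path */
};

/* Hypothetical stand-in for the slow search over all vcpus. */
static int sketch_find_id(struct sketch_cache *c, uint64_t addr, uint32_t data)
{
    (void)data;
    c->slow_lookups++;
    return (int)((addr >> 12) & 0xff);
}

/* Pick an unused slot if there is one, else the least recently used. */
static unsigned sketch_find_lru_idx(const struct sketch_cache *c)
{
    unsigned best = 0;

    for (unsigned i = 0; i < SKETCH_SLOTS; i++) {
        if (!c->table[i].valid)
            return i;
        if (c->table[i].last_use < c->table[best].last_use)
            best = i;
    }
    return best;
}

/* *idx is the hint the caller passes in; it is updated on a miss. */
static int sketch_inject(struct sketch_cache *c, unsigned *idx,
                         uint64_t addr, uint32_t data)
{
    struct sketch_entry *e;

    if (*idx < SKETCH_SLOTS) {
        e = &c->table[*idx];
        if (e->valid && e->addr == addr && e->data == data) {
            e->last_use = ++c->clock;   /* fast path: hint still matches */
            return e->id;
        }
    }

    *idx = sketch_find_lru_idx(c);      /* mismatch: recycle a slot */
    e = &c->table[*idx];
    e->addr = addr;
    e->data = data;
    e->id = sketch_find_id(c, addr, data);
    e->valid = 1;
    e->last_use = ++c->clock;
    return e->id;
}
```

Because the kernel side owns slot allocation, no separate "free" call is needed to reuse entries; a stale hint only costs one slow lookup. That is also why the index would have to be part of the interface from the start, which is the point of the "partially implemented interfaces" objection.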
[PATCH RFC] virtio-net: remove useless disable on freeze
disable_cb is just an optimization: it can not guarantee that there are no
callbacks.

I didn't yet figure out whether a callback in freeze will trigger a bug,
but disable_cb won't address it in any case. So let's remove the useless
calls as a first step.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/net/virtio_net.c |    5 -----
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 019da01..971931e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1182,11 +1182,6 @@ static int virtnet_freeze(struct virtio_device *vdev)
 {
 	struct virtnet_info *vi = vdev->priv;
 
-	virtqueue_disable_cb(vi->rvq);
-	virtqueue_disable_cb(vi->svq);
-	if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
-		virtqueue_disable_cb(vi->cvq);
-
 	netif_device_detach(vi->dev);
 	cancel_delayed_work_sync(&vi->refill);
 
-- 
1.7.9.111.gf3fb0
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2012-04-04 10:53, Michael S. Tsirkin wrote:
> On Wed, Apr 04, 2012 at 11:44:23AM +0300, Avi Kivity wrote:
> > On 04/04/2012 11:38 AM, Michael S. Tsirkin wrote:
> > > > > A performance note: delivering an interrupt needs to search all
> > > > > vcpus for an APIC ID match. The previous plan was to cache (or
> > > > > pre-calculate) this lookup in the irq routing table. Now it
> > > > > looks like we'll need a separate cache for this.
> > > >
> > > > As this is non-existent until today, we don't regress here. And
> > > > it can still be added on top later on, transparently.
> > >
> > > I always worry about hash collisions and the cost of calculating
> > > good hash functions.
> > >
> > > We could instead return an index in the cache on injection,
> > > maintain in userspace and use it for fast path on the next
> > > injection.
> >
> > Ahem, that is almost the existing routing table to a T.
> >
> > > Will make it easy to use an array index instead of a hash here, and
> > > fallback to a slower ID lookup on mismatch.
> >
> > Need a free ioctl so we can reuse IDs.
>
> No, it could be kernel controlled not userspace controlled.
> We get both an address and an index:
>
> 	if (table[u.i].addr == u.addr && table[u.i].data == u.data) {
> 		return table[u.i].id;
> 	}
>
> 	u.i = find_lru_idx(table);
> 	table[u.i].addr = u.addr;
> 	table[u.i].data = u.data;
> 	table[u.i].id = find_id(u.addr, u.data);
> 	return table[u.i].id;
>
> > > Until we do have this fast path we can just fill this value with
> > > zeros, so kernel patch (almost) does not need to change for this -
> > > just the header.
> >
> > Partially implemented interfaces invite breakage.
>
> Hmm true. OK scrap this idea then, it's not clear whether we are going
> to optimize this anyway.

Also, the problem is that keeping that ID in userspace requires an
infrastructure like the MSIRoutingCache that I proposed originally. Not
much won wrt invasiveness there. So we should really do the routing
optimization in the kernel - one day.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)
On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:
> Michael S. Tsirkin noticed that we could run the refill work after
> ndo_close, which can re-enable napi - we don't disable it until
> virtnet_remove. This is clearly wrong, so move the workqueue control
> to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).
>
> One subtle point: virtnet_probe() could simply fail if it couldn't
> allocate a receive buffer, but that's less polite in virtnet_open() so
> we schedule a refill as we do in the normal receive path if we run out
> of memory.
>
> Signed-off-by: Rusty Russell ru...@rustcorp.com.au

Doh.
napi_disable does not prevent the following napi_schedule, does it?

Can someone confirm that I am not seeing things please?

And this means this hack does not work: try_fill_recv can still run in
parallel with napi, corrupting the vq.

I suspect we need to resurrect a patch that used a dedicated flag to
avoid this race.

Comments?

> ---
>  drivers/net/virtio_net.c |   17 +++++++++++++----
>  1 file changed, 13 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct
>  	return err;
>  }
>  
> -/* Returns false if we couldn't fill entirely (OOM). */
> +/*
> + * Returns false if we couldn't fill entirely (OOM).
> + *
> + * Normally run in the receive path, but can also be run from ndo_open
> + * before we're receiving packets, or from refill_work which is
> + * careful to disable receiving (using napi_disable).
> + */
>  static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
>  {
>  	int err;
> @@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  
> +	/* Make sure we have some buffers: if oom use wq. */
> +	if (!try_fill_recv(vi, GFP_KERNEL))
> +		schedule_delayed_work(&vi->refill, 0);
> +
>  	virtnet_napi_enable(vi);
>  	return 0;
>  }
> @@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
>  {
>  	struct virtnet_info *vi = netdev_priv(dev);
>  
> +	/* Make sure refill_work doesn't re-enable napi! */
> +	cancel_delayed_work_sync(&vi->refill);
>  	napi_disable(&vi->napi);
>  
>  	return 0;
> @@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
>  unregister:
>  	unregister_netdev(dev);
> -	cancel_delayed_work_sync(&vi->refill);
>  free_vqs:
>  	vdev->config->del_vqs(vdev);
>  free_stats:
> @@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
>  	/* Stop all the virtqueues. */
>  	vdev->config->reset(vdev);
>  
> -
>  	unregister_netdev(vi->dev);
> -	cancel_delayed_work_sync(&vi->refill);
>  
>  	/* Free unused buffers in both send and recv, if any. */
>  	free_unused_bufs(vi);
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 04/04/2012 12:22 PM, Jan Kiszka wrote:
> > > > Until we do have this fast path we can just fill this value with
> > > > zeros, so kernel patch (almost) does not need to change for this
> > > > - just the header.
> > >
> > > Partially implemented interfaces invite breakage.
> >
> > Hmm true. OK scrap this idea then, it's not clear whether we are
> > going to optimize this anyway.
>
> Also, the problem is that keeping that ID in userspace requires an
> infrastructure like the MSIRoutingCache that I proposed originally. Not
> much won /wrt invasiveness there.

Internal qemu refactorings are not a driver for kvm interface changes.

> So we should really do the routing optimization in the kernel - one
> day.

No, we need to make a choice:

explicit handles: array lookup, more expensive setup
no handles: hash lookup, more expensive, but no setup, and no
            artificial limits

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2012-04-04 11:36, Avi Kivity wrote:
On 04/04/2012 12:22 PM, Jan Kiszka wrote:

Until we do have this fast path we can just fill this value with zeros, so the kernel patch (almost) does not need to change for this - just the header.

Partially implemented interfaces invite breakage.

Hmm, true. OK, scrap this idea then; it's not clear whether we are going to optimize this anyway.

Also, the problem is that keeping that ID in userspace requires an infrastructure like the MSIRoutingCache that I proposed originally. Not much is won w.r.t. invasiveness there.

Internal qemu refactorings are not a driver for kvm interface changes.

No, but qemu demonstrates the applicability and handiness of the kernel interfaces.

So we should really do the routing optimization in the kernel - one day.

No, we need to make a choice:

  explicit handles: array lookup, more expensive setup
  no handles: hash lookup, more expensive, but no setup, and no artificial limits

...and I think we should head for option 2.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)
On Wed, Apr 04, 2012 at 12:32:29PM +0300, Michael S. Tsirkin wrote:
On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:

Michael S. Tsirkin noticed that we could run the refill work after ndo_close, which can re-enable napi - we don't disable it until virtnet_remove. This is clearly wrong, so move the workqueue control to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).

One subtle point: virtnet_probe() could simply fail if it couldn't allocate a receive buffer, but that's less polite in virtnet_open() so we schedule a refill as we do in the normal receive path if we run out of memory.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au

Doh. napi_disable does not prevent the following napi_schedule, does it? Can someone confirm that I am not seeing things please?

Yes, I *was* seeing things. After napi_disable, NAPI_STATE_SCHED is set, so napi_schedule does nothing. Sorry about the noise.

And this means this hack does not work: try_fill_recv can still run in parallel with napi, corrupting the vq. I suspect we need to resurrect a patch that used a dedicated flag to avoid this race. Comments?

---
 drivers/net/virtio_net.c |   17 +++++++++++++----
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct
 	return err;
 }
 
-/* Returns false if we couldn't fill entirely (OOM). */
+/*
+ * Returns false if we couldn't fill entirely (OOM).
+ *
+ * Normally run in the receive path, but can also be run from ndo_open
+ * before we're receiving packets, or from refill_work which is
+ * careful to disable receiving (using napi_disable).
+ */
 static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
 {
 	int err;
@@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
+	/* Make sure we have some buffers: if oom use wq. */
+	if (!try_fill_recv(vi, GFP_KERNEL))
+		schedule_delayed_work(&vi->refill, 0);
+
 	virtnet_napi_enable(vi);
 	return 0;
 }
@@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
 {
 	struct virtnet_info *vi = netdev_priv(dev);
 
+	/* Make sure refill_work doesn't re-enable napi! */
+	cancel_delayed_work_sync(&vi->refill);
 	napi_disable(&vi->napi);
 
 	return 0;
@@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
 
 unregister:
 	unregister_netdev(dev);
-	cancel_delayed_work_sync(&vi->refill);
 free_vqs:
 	vdev->config->del_vqs(vdev);
 free_stats:
@@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
 	/* Stop all the virtqueues. */
 	vdev->config->reset(vdev);
 
-
 	unregister_netdev(vi->dev);
-	cancel_delayed_work_sync(&vi->refill);
 
 	/* Free unused buffers in both send and recv, if any. */
 	free_unused_bufs(vi);
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 04/04/2012 12:38 PM, Jan Kiszka wrote:
On 2012-04-04 11:36, Avi Kivity wrote:
On 04/04/2012 12:22 PM, Jan Kiszka wrote:

Until we do have this fast path we can just fill this value with zeros, so the kernel patch (almost) does not need to change for this - just the header.

Partially implemented interfaces invite breakage.

Hmm, true. OK, scrap this idea then; it's not clear whether we are going to optimize this anyway.

Also, the problem is that keeping that ID in userspace requires an infrastructure like the MSIRoutingCache that I proposed originally. Not much is won w.r.t. invasiveness there.

Internal qemu refactorings are not a driver for kvm interface changes.

No, but qemu demonstrates the applicability and handiness of the kernel interfaces.

True.

So we should really do the routing optimization in the kernel - one day.

No, we need to make a choice:

  explicit handles: array lookup, more expensive setup
  no handles: hash lookup, more expensive, but no setup, and no artificial limits

...and I think we should head for option 2.

I'm not so sure anymore. Sorry about the U-turn, but remind me why? In the long term it will be slower.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: Questing regarding KVM Guest PMU
On Wed, Apr 4, 2012 at 12:34 PM, Gleb Natapov g...@redhat.com wrote:
On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla shashank.rachama...@gmail.com wrote:
On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com wrote:
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:

I guess things are working fine with perf. But why not with oprofile?

Looks like it. I never tried oprofile. Will try to reproduce your problem and see what oprofile is doing.

I am using Ubuntu 10.04 with a 2.6.32-21-generic kernel as guest and oprofile 0.9.6. Also, I have tried to capture kvm-events (perf patch) in the host while running oprofile and perf in the guest. Please see the attachment. I have run the tests in three cases for around 5 secs. There are more MSR reads and writes in the case of perf, which I think is normal. However, there are very few MSR reads and writes with oprofile. Also, the number of NMI exceptions is too high in the case of oprofile.

Which host kernel are you using? Try latest kvm.git and check if you see something unusual in dmesg.

Currently running 3.3.0-rc5. Will try with the latest source from kvm git and let you know.

Thanks, there were some fixes that didn't make it into 3.3. The rdpmc instruction emulation fix is one of them. If oprofile uses it this can explain the problem.

I have tried with the latest kvm source from git and also with a 3.0 guest kernel, but oprofile fails to collect any samples on the guest. I am using a core2duo processor, which oprofile considers a Pentium Pro model.

core2duo on the host or the guest? What is your qemu command line?

Both. qemu command line below.

sudo /usr/local/bin/qemu-system-x86_64 -drive file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net user

Please find more info (/proc/cpuinfo and uname of both host and guest) in the attached files.

oprofile does not work for me even on the host. After trying to use it I can see why perf was written in the first place.

OK, seems so. Will move over to perf as it's working fine inside the guest.

--
Gleb.
Re: Questing regarding KVM Guest PMU
On Wed, Apr 04, 2012 at 03:49:42PM +0530, shashank rachamalla wrote:
On Wed, Apr 4, 2012 at 12:34 PM, Gleb Natapov g...@redhat.com wrote:
On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla shashank.rachama...@gmail.com wrote:
On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com wrote:
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:

I guess things are working fine with perf. But why not with oprofile?

Looks like it. I never tried oprofile. Will try to reproduce your problem and see what oprofile is doing.

I am using Ubuntu 10.04 with a 2.6.32-21-generic kernel as guest and oprofile 0.9.6. Also, I have tried to capture kvm-events (perf patch) in the host while running oprofile and perf in the guest. Please see the attachment. I have run the tests in three cases for around 5 secs. There are more MSR reads and writes in the case of perf, which I think is normal. However, there are very few MSR reads and writes with oprofile. Also, the number of NMI exceptions is too high in the case of oprofile.

Which host kernel are you using? Try latest kvm.git and check if you see something unusual in dmesg.

Currently running 3.3.0-rc5. Will try with the latest source from kvm git and let you know.

Thanks, there were some fixes that didn't make it into 3.3. The rdpmc instruction emulation fix is one of them. If oprofile uses it this can explain the problem.

I have tried with the latest kvm source from git and also with a 3.0 guest kernel, but oprofile fails to collect any samples on the guest. I am using a core2duo processor, which oprofile considers a Pentium Pro model.

core2duo on the host or the guest? What is your qemu command line?

Both. qemu command line below.

sudo /usr/local/bin/qemu-system-x86_64 -drive file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net user

Please find more info (/proc/cpuinfo and uname of both host and guest) in the attached files.

oprofile does not work for me even on the host. After trying to use it I can see why perf was written in the first place.

OK, seems so. Will move over to perf as it's working fine inside the guest.

Good riddance IMO. I managed to run it on a guest (but not on my host!). The thing is buggy. It does not use the global ctrl MSR to enable counters, and kvm has all of them disabled by default. I didn't find what value this MSR should have after reset, so this may be either a kvm bug, or real BIOSes enable all counters in the global ctrl MSR for PMUv1 compatibility. Doing wrmsr 0x38f 0x7000f solves this problem.

The second problem is that oprofile reprograms PMU counters without disabling them first, and this is explicitly prohibited by the Intel SDM. The patch below solves that, but oprofile is the one that should be fixed.

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a73f0c1..be05028 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -396,6 +396,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 				(pmc = get_fixed_pmc(pmu, index))) {
 			data = (s64)(s32)data;
 			pmc->counter += data - read_pmc(pmc);
+			reprogram_gp_counter(pmc, pmc->eventsel);
 			return 0;
 		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
 			if (data == pmc->eventsel)

--
Gleb.
[PATCH] KVM: PPC: bookehv: Fix save/restore of guest accessible SPRGs.
For guest-accessible SPRGs 4-7, save/restore must be handled differently for the 64-bit and non-64-bit cases. The registers are maintained as 64-bit copies by KVM. While saving/restoring for the non-64-bit case we should always take the lower 4 bytes.

Signed-off-by: Varun Sethi varun.se...@freescale.com
---
 arch/powerpc/kvm/bookehv_interrupts.S |   48 ++++++++++++++++++++++++++++++++++++++++--------
 1 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 909e96e..c1c0bae 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -320,13 +320,29 @@ _GLOBAL(kvmppc_resume_host)
 	PPC_STL	r5, VCPU_LR(r4)
 	mfspr	r7, SPRN_SPRG5
 	PPC_STL	r3, VCPU_VRSAVE(r4)
-	PPC_STL	r6, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+	std	r6, VCPU_SHARED_SPRG4(r11)
+#else
+	stw	r6, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
 	mfspr	r8, SPRN_SPRG6
-	PPC_STL	r7, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+	std	r7, VCPU_SHARED_SPRG5(r11)
+#else
+	stw	r7, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
 	mfspr	r9, SPRN_SPRG7
-	PPC_STL	r8, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+	std	r8, VCPU_SHARED_SPRG6(r11)
+#else
+	stw	r8, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
 	mfxer	r3
-	PPC_STL	r9, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+	std	r9, VCPU_SHARED_SPRG7(r11)
+#else
+	stw	r9, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif
 
 	/* save guest MAS registers and restore host mas4 & mas6 */
 	mfspr	r5, SPRN_MAS0
@@ -549,13 +565,29 @@ lightweight_exit:
 	 * SPRGs, so we need to reload them here with the guest's values.
 	 */
 	lwz	r3, VCPU_VRSAVE(r4)
-	lwz	r5, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+	ld	r5, VCPU_SHARED_SPRG4(r11)
+#else
+	lwz	r5, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
 	mtspr	SPRN_VRSAVE, r3
-	lwz	r6, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+	ld	r6, VCPU_SHARED_SPRG5(r11)
+#else
+	lwz	r6, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
 	mtspr	SPRN_SPRG4W, r5
-	lwz	r7, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+	ld	r7, VCPU_SHARED_SPRG6(r11)
+#else
+	lwz	r7, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
 	mtspr	SPRN_SPRG5W, r6
-	lwz	r8, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+	ld	r8, VCPU_SHARED_SPRG7(r11)
+#else
+	lwz	r8, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif
 	mtspr	SPRN_SPRG6W, r7
 	mtspr	SPRN_SPRG7W, r8
 
--
1.7.2.2
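Editorial sketch of why the 32-bit paths in the patch above use an offset of `+ 4`: with a 64-bit field stored in big-endian byte order, the lower 4 bytes of the value sit at byte offset 4. The helper names below are invented for illustration, and the byte packing is done by hand so the example behaves the same on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Store a 64-bit value in big-endian byte order (most significant first). */
static void store_be64(uint8_t *buf, uint64_t v)
{
	assert(buf != NULL);
	for (size_t i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a 32-bit big-endian word starting at buf. */
static uint32_t load_be32(const uint8_t *buf)
{
	assert(buf != NULL);
	return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
	       ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
}
```

Reading a word at offset 0 of such a slot returns the upper half of the 64-bit copy, while offset 4 returns the lower half, which is the half the commit message says the non-64-bit case should use.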
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2012-04-04 11:55, Avi Kivity wrote:
On 04/04/2012 12:38 PM, Jan Kiszka wrote:
On 2012-04-04 11:36, Avi Kivity wrote:
On 04/04/2012 12:22 PM, Jan Kiszka wrote:

Until we do have this fast path we can just fill this value with zeros, so the kernel patch (almost) does not need to change for this - just the header.

Partially implemented interfaces invite breakage.

Hmm, true. OK, scrap this idea then; it's not clear whether we are going to optimize this anyway.

Also, the problem is that keeping that ID in userspace requires an infrastructure like the MSIRoutingCache that I proposed originally. Not much is won w.r.t. invasiveness there.

Internal qemu refactorings are not a driver for kvm interface changes.

No, but qemu demonstrates the applicability and handiness of the kernel interfaces.

True.

So we should really do the routing optimization in the kernel - one day.

No, we need to make a choice:

  explicit handles: array lookup, more expensive setup
  no handles: hash lookup, more expensive, but no setup, and no artificial limits

...and I think we should head for option 2.

I'm not so sure anymore. Sorry about the U-turn, but remind me why? In the long term it will be slower.

Likely not measurably slower. If you look at a message through the arch glasses, you can usually spot the destination directly, specifically if a message targets a single processor - no need for hashing and table lookups in the common case.

In contrast, the maintenance costs for the current explicit route-based model are significant, as we see now.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
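Editorial sketch of what "spotting the destination directly" can look like for an x86 MSI message in the classic xAPIC format, where the destination ID and vector are plain bit fields of the address and data words. The struct and function names are invented for illustration; x2APIC and interrupt-remapped formats are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an xAPIC-format MSI message (illustrative subset). */
struct msi_fields {
	uint8_t dest_id;	/* address bits 19:12 */
	uint8_t dest_mode;	/* address bit 2: 0 = physical, 1 = logical */
	uint8_t vector;		/* data bits 7:0 */
	uint8_t delivery_mode;	/* data bits 10:8 */
};

/* Decode the low address word and data word of an MSI message. */
static struct msi_fields msi_decode(uint32_t addr_lo, uint32_t data)
{
	struct msi_fields f;

	/* MSI addresses on x86 live in the 0xFEExxxxx window. */
	assert((addr_lo & 0xfff00000u) == 0xfee00000u);

	f.dest_id = (uint8_t)((addr_lo >> 12) & 0xff);
	f.dest_mode = (uint8_t)((addr_lo >> 2) & 0x1);
	f.vector = (uint8_t)(data & 0xff);
	f.delivery_mode = (uint8_t)((data >> 8) & 0x7);
	return f;
}
```

With physical destination mode, `dest_id` names a single APIC, so no table of routes is needed to find the target; mapping that APIC ID to a vcpu is the part the follow-up replies discuss.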
Re: [Qemu-devel] KVM call minutes April 3
On 04/04/2012 01:37 PM, Michael Roth wrote:
On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com wrote:
On 04/04/2012 03:18, Michael Roth wrote:

Attacking the IDL/schema side first is the more rational approach. From there we can potentially generate ASN.1 BER/DER visitors for the protocol side, or potentially even just vmstate bindings as a start. I've recently started looking into the latter... it's completely feasible; the only downside is that it complicates the IDL due to requiring support for a lot of what are very much vmstate-specific items, but it should be possible to do this in a manner where those annotations are self-contained and ignorable if we opted to replace vmstate-style declarations.

We can also keep the current vmstate descriptions, but access fields from the automatically-generated visitors instead of struct fields. This keeps the IDL simple.

It may be worthwhile as an incremental step though. One nice thing about automatically generated bindings is that, with the QIDL Anthony prototyped a while back, we assume we serialize by default, so changes in annotated structs automatically trigger changes in the generated bindings unless you explicitly mark fields as immutable/derivable/etc., which we can tie into the build or make check to automatically detect and bring attention to changes in vmstate. This may be worth the effort if we adopt the proposed 4-year migration support cycle for pc-1.0, since that'll continue to rely on vmstate even after we move on to an IDL and newer protocol.

Beyond ASL/IDL I'd like to be sure that we're not just translating one format to another representation, but that we instead introduce some new functionality, like:

 - Ability to negotiate the protocol version
 - Bi-directional data exchange; the sender will send data as a function of the target release
 - Include the machine type too
 - Synchronize the entire qemu cmdline and don't rely on management to set it up.
 - Along the way, deal w/ hotplug events.

Paolo
Re: [Qemu-devel] KVM call minutes April 3
On 04/03/2012 03:43 PM, Dor Laor wrote:
On 04/03/2012 05:43 PM, Markus Armbruster wrote:

I'm afraid my notes are rather rough...

* 1.1
  soft freeze apr 15th (less than two weeks)
  hard freeze may 1
  three months cycle for 1.2
  stable machine types only every few releases? pc-next

* Maintainers, got distracted and my notes make no sense, sorry

* MSI injection to KVM irqchips from userspace devices models

* qemu-kvm tree: working towards upstream merge
  not much left, mostly device assignment

* Migration: vmstate and visitors, decoupling the wire format
  why not ASN.1

Curiosity is killing me, waiting for next week's meeting to get the answer.

ASN.1 is an IDL format. It's encoded in many ways, including BER. I think there's widespread agreement that the next migration wire format should be encoded with BER, which means it could be described via ASN.1, but I don't think we intend to use ASN.1 within the code base.

I don't think using ASN.1 to describe devices makes sense. There really aren't very good Open Source ASN.1 compilers. I also don't think the syntax is flexible enough to fully describe a device either.

Regards,

Anthony Liguori

* qtest: test cases wanted
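Editorial sketch of what a BER/DER encoding looks like on the wire, for readers unfamiliar with the tag-length-value layout mentioned above. It encodes a single non-negative integer as a DER INTEGER; the function name is invented for illustration and nothing here is taken from a proposed QEMU wire format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encode a non-negative integer as a DER INTEGER (tag 0x02).
 * Returns the number of bytes written; out must hold at least 7 bytes.
 */
static size_t der_encode_uint32(uint32_t v, uint8_t *out)
{
	uint8_t content[5];	/* value bytes, least significant first */
	size_t n = 0;

	assert(out != NULL);

	/* Collect the minimal number of value bytes. */
	do {
		content[n++] = (uint8_t)(v & 0xff);
		v >>= 8;
	} while (v != 0);

	/* INTEGER is signed: pad with 0x00 if the top bit would be set. */
	if (content[n - 1] & 0x80)
		content[n++] = 0x00;

	out[0] = 0x02;		/* tag: INTEGER */
	out[1] = (uint8_t)n;	/* short-form length (n <= 5 here) */
	for (size_t i = 0; i < n; i++)
		out[2 + i] = content[n - 1 - i];	/* big-endian content */

	return 2 + n;
}
```

For example, 300 encodes as the four bytes 02 02 01 2C, and 128 as 02 02 00 80, where the extra 00 keeps the value from being read as negative.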
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 04/04/2012 01:48 PM, Jan Kiszka wrote:

I'm not so sure anymore. Sorry about the U-turn, but remind me why? In the long term it will be slower.

Likely not measurably slower. If you look at a message through the arch glasses, you can usually spot the destination directly, specifically if a message targets a single processor - no need for hashing and table lookups in the common case.

Not on x86. The APIC ID is guest-provided. In x2apic mode it can be quite large.

In contrast, the maintenance costs for the current explicit route-based model are significant, as we see now.

You mean in amount of code in userspace? That doesn't get solved since we need to keep compatibility.

--
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: [Qemu-devel] KVM call minutes April 3
On 04/04/2012 05:53 AM, Dor Laor wrote:
On 04/04/2012 01:37 PM, Michael Roth wrote:
On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com wrote:
On 04/04/2012 03:18, Michael Roth wrote:

Attacking the IDL/schema side first is the more rational approach. From there we can potentially generate ASN.1 BER/DER visitors for the protocol side, or potentially even just vmstate bindings as a start. I've recently started looking into the latter... it's completely feasible; the only downside is that it complicates the IDL due to requiring support for a lot of what are very much vmstate-specific items, but it should be possible to do this in a manner where those annotations are self-contained and ignorable if we opted to replace vmstate-style declarations.

We can also keep the current vmstate descriptions, but access fields from the automatically-generated visitors instead of struct fields. This keeps the IDL simple.

It may be worthwhile as an incremental step though. One nice thing about automatically generated bindings is that, with the QIDL Anthony prototyped a while back, we assume we serialize by default, so changes in annotated structs automatically trigger changes in the generated bindings unless you explicitly mark fields as immutable/derivable/etc., which we can tie into the build or make check to automatically detect and bring attention to changes in vmstate. This may be worth the effort if we adopt the proposed 4-year migration support cycle for pc-1.0, since that'll continue to rely on vmstate even after we move on to an IDL and newer protocol.

Beyond ASL/IDL I'd like to be sure that we're not just translating one format to another representation, but that we instead introduce some new functionality, like:

 - Ability to negotiate the protocol version

Ack.

 - Bi-directional data exchange; the sender will send data as a function of the target release

The reason bi-directional data exchange doesn't exist is that it would add latency to the critical path. I think we should avoid bi-directional data exchange unless there's an extremely compelling reason to do so.

 - Include the machine type too
 - Synchronize the entire qemu cmdline and don't rely on management to set it up.
 - Along the way, deal w/ hotplug events.

This will be addressed via QOM. As we convert backends and machine types, we should be able to dump out the full configuration and send it over the wire.

Regards,

Anthony Liguori

Paolo
Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips
On 2012-04-04 13:50, Avi Kivity wrote:
On 04/04/2012 01:48 PM, Jan Kiszka wrote:

I'm not so sure anymore. Sorry about the U-turn, but remind me why? In the long term it will be slower.

Likely not measurably slower. If you look at a message through the arch glasses, you can usually spot the destination directly, specifically if a message targets a single processor - no need for hashing and table lookups in the common case.

Not on x86. The APIC ID is guest-provided.

...but is still a rather stable mapping on the physical ID.

In x2apic mode it can be quite large.

Yes, but then you can at least hash/search/cache inside that group only, with a smaller scope.

In contrast, the maintenance costs for the current explicit route-based model are significant, as we see now.

You mean in amount of code in userspace? That doesn't get solved since we need to keep compatibility.

We do not need to track MSI origins to correlate them with routes (with the exception of 3 special devices: vhost-based virtio, kvm device assignment, and vfio device assignment). We emulate this centrally with a handful of LOC in the kvm layer, and we bypass it with the advent of a direct injection API. Compare this to my original series that introduced MSIRoutingCaches to cope with the current kernel API.

Jan

--
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux
Re: [Qemu-devel] KVM call minutes April 3
On 04/04/2012 02:52 PM, Anthony Liguori wrote:
On 04/04/2012 05:53 AM, Dor Laor wrote:
On 04/04/2012 01:37 PM, Michael Roth wrote:
On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com wrote:
On 04/04/2012 03:18, Michael Roth wrote:

Attacking the IDL/schema side first is the more rational approach. From there we can potentially generate ASN.1 BER/DER visitors for the protocol side, or potentially even just vmstate bindings as a start. I've recently started looking into the latter... it's completely feasible; the only downside is that it complicates the IDL due to requiring support for a lot of what are very much vmstate-specific items, but it should be possible to do this in a manner where those annotations are self-contained and ignorable if we opted to replace vmstate-style declarations.

We can also keep the current vmstate descriptions, but access fields from the automatically-generated visitors instead of struct fields. This keeps the IDL simple.

It may be worthwhile as an incremental step though. One nice thing about automatically generated bindings is that, with the QIDL Anthony prototyped a while back, we assume we serialize by default, so changes in annotated structs automatically trigger changes in the generated bindings unless you explicitly mark fields as immutable/derivable/etc., which we can tie into the build or make check to automatically detect and bring attention to changes in vmstate. This may be worth the effort if we adopt the proposed 4-year migration support cycle for pc-1.0, since that'll continue to rely on vmstate even after we move on to an IDL and newer protocol.

Beyond ASL/IDL I'd like to be sure that we're not just translating one format to another representation, but that we instead introduce some new functionality, like:

 - Ability to negotiate the protocol version

Ack.

 - Bi-directional data exchange; the sender will send data as a function of the target release

The reason bi-directional data exchange doesn't exist is that it would add latency to the critical path. I think we should avoid bi-directional

Not necessarily. There is no need to do the exchange during the downtime; you can do it ahead of time, during the initial connection, and a few additional msec or even a second won't change much.

data exchange unless there's an extremely compelling reason to do so.

The key advantage is that you'll be able to migrate to an old qemu that may not be compatible w/ the standard protocol, and the source will be able to discover this and adjust. At the moment I don't have anything more concrete than that, but I think this has happened in the past and will continue to happen, and we can add the required hook into the protocol.

 - Include the machine type too
 - Synchronize the entire qemu cmdline and don't rely on management to set it up.
 - Along the way, deal w/ hotplug events.

This will be addressed via QOM. As we convert backends and machine types, we should be able to dump out the full configuration and send it over the wire.

Regards,

Anthony Liguori

Paolo
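Editorial sketch of the kind of version negotiation discussed above, done once at connection setup rather than on the downtime path. Everything here is hypothetical: the `version_range` struct, the function name, and the rule "pick the highest version both sides support" are assumptions for illustration, not a description of any existing QEMU protocol.

```c
#include <assert.h>
#include <stdint.h>

/* Range of protocol versions one side can speak (hypothetical). */
struct version_range {
	uint32_t min;	/* oldest supported version, at least 1 */
	uint32_t max;	/* newest supported version */
};

/*
 * Pick the highest version supported by both source and target.
 * Returns 0 if the two ranges do not overlap.
 */
static uint32_t negotiate_version(struct version_range src,
				  struct version_range dst)
{
	assert(src.min >= 1 && src.min <= src.max);
	assert(dst.min >= 1 && dst.min <= dst.max);

	uint32_t lo = src.min > dst.min ? src.min : dst.min;
	uint32_t hi = src.max < dst.max ? src.max : dst.max;

	return lo <= hi ? hi : 0;
}
```

A source that speaks versions 1 to 3 talking to an older target that speaks 1 to 2 would settle on 2 and format its data accordingly; a result of 0 lets the source fail early, before any guest downtime.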
Re: [Qemu-devel] KVM call minutes April 3
On Wed, Apr 04, 2012 at 01:53:34PM +0300, Dor Laor wrote:
On 04/04/2012 01:37 PM, Michael Roth wrote:
On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com wrote:
On 04/04/2012 03:18, Michael Roth wrote:

Attacking the IDL/schema side first is the more rational approach. From there we can potentially generate ASN.1 BER/DER visitors for the protocol side, or potentially even just vmstate bindings as a start. I've recently started looking into the latter... it's completely feasible; the only downside is that it complicates the IDL due to requiring support for a lot of what are very much vmstate-specific items, but it should be possible to do this in a manner where those annotations are self-contained and ignorable if we opted to replace vmstate-style declarations.

We can also keep the current vmstate descriptions, but access fields from the automatically-generated visitors instead of struct fields. This keeps the IDL simple.

It may be worthwhile as an incremental step though. One nice thing about automatically generated bindings is that, with the QIDL Anthony prototyped a while back, we assume we serialize by default, so changes in annotated structs automatically trigger changes in the generated bindings unless you explicitly mark fields as immutable/derivable/etc., which we can tie into the build or make check to automatically detect and bring attention to changes in vmstate. This may be worth the effort if we adopt the proposed 4-year migration support cycle for pc-1.0, since that'll continue to rely on vmstate even after we move on to an IDL and newer protocol.

Beyond ASL/IDL I'd like to be sure that we're not just translating one format to another representation, but that we instead introduce some new functionality, like:

 - Ability to negotiate the protocol version
 - Bi-directional data exchange; the sender will send data as a function of the target release
 - Include the machine type too

I've been toying with the notion of having the target start up a limited QMP server that the source talks to in order to orchestrate negotiation. We could potentially even send the device state by taking our QIDL-generated visitors and serializing state via QmpOutputVisitor. QMP can be made aware of the format of the device state input by taking the intermediate step of generating QAPI schemas via QIDL, and using the QAPI code generators to generate the visitors rather than QIDL directly.

This would also address the protocol side: just use QMP rather than ASN.1. It's not as compact, but device state is such a small amount of data compared to memory/disk that I don't think it's worth optimizing that aspect, though we could use compression at the protocol layer if we were inclined.

Anything more suited to an out-of-band protocol, like memory/disk, could be orchestrated via this interface... the source can ask the target for a port to handle such things, negotiate stuff like XBZRLE, etc.

 - Synchronize the entire qemu cmdline and don't rely on management to set it up.
 - Along the way, deal w/ hotplug events.

My initial plan for the QIDL-generated visitors is to associate a QOM property, state, with each device, and to serialize device state by walking the QOM composition tree, the main rationale being that if we extend that serialization to include other QOM properties, I believe we have everything we need to recreate all the devices on the target: parent-child relationships, types, properties set via cmdline, device state...

A simpler alternative would be to just send the cmdline options over to the target and assume that this results in the same underlying machine, then just send off the device state. Much simpler actually... but the above approach should work regardless of changes to the command-line options on the source... having an internally stable cmdline scheme might work as well...

I'm not sure what the right approach is here, but whatever we decide on, I think being able to automatically generate visitors from annotations is a good first step and should tie into any foreseeable approaches.

Paolo
Re: kvm: RCU warning in async pf
On Tue, Apr 03, 2012 at 01:52:26PM +0300, Gleb Natapov wrote:
> On Mon, Apr 02, 2012 at 08:54:32PM -0400, Sasha Levin wrote:
>> Hi all,
>>
>> I got the spew at the bottom of the mail in a KVM guest using the KVM tools and running trinity.
>>
>> I'm not quite sure how default_idle managed to trigger a pagefault, so that part looks odd to me.
>
> This is not regular page fault. This is async page fault that tells the guest that a page, previously swapped out by hypervisor, is now swapped back in and it can happen while vcpu is idle. The code does not leave idle state properly though. We probably need to call rcu_irq_enter() there. Will look into it.

The patch below solves it for me:

Page ready async PF can kick vcpu out of idle state much like IRQ. We need to tell RCU about this.

Signed-off-by: Gleb Natapov <g...@redhat.com>
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f0c6fd6..380079f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -38,6 +38,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
+#include <asm/idle.h>
 
 static int kvmapf = 1;
 
@@ -253,7 +254,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
 		kvm_async_pf_task_wait((u32)read_cr2());
 		break;
 	case KVM_PV_REASON_PAGE_READY:
+		rcu_irq_enter();
+		exit_idle();
 		kvm_async_pf_task_wake((u32)read_cr2());
+		rcu_irq_exit();
 		break;
 	}
 }
--
			Gleb.
Re: [PATCH 0/2] Improve iteration through sptes from rmap
On Wed, 21 Mar 2012 23:48:23 +0900 Takuya Yoshikawa <takuya.yoshik...@gmail.com> wrote:
> By removing sptep from rmap_iterator, I could achieve 15% performance improvement without inlining.
>
> Takuya Yoshikawa (3):
>   KVM: MMU: Make pte_list_desc fit cache lines well
>   KVM: MMU: Improve iteration through sptes from rmap

ping

	Takuya
Re: [Qemu-devel] KVM call minutes April 3
On 04/04/2012 02:14 PM, Michael Roth wrote:
> A simpler alternative would be to just send the cmdline options over to the target and assume that it results in the same underlying machine, then just send off the device state. Much simpler actually... but the above approach should work regardless of changes to the command-line options on the source... having an internally stable cmdline scheme might work as well...

Will command line take into account hot-plugged devices?

--
Igor
Re: kvm: RCU warning in async pf
On Wed, Apr 04, 2012 at 03:30:33PM +0300, Gleb Natapov wrote:
> The patch below solves it for me:
>
> Page ready async PF can kick vcpu out of idle state much like IRQ. We need to tell RCU about this.

This is invoked from an exception or interrupt handler, not from process-level code? If so:

Reviewed-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>
Re: kvm: RCU warning in async pf
On Wed, Apr 04, 2012 at 07:04:16AM -0700, Paul E. McKenney wrote:
> This is invoked from an exception or interrupt handler, not from process-level code? If so:

From an exception.

> Reviewed-by: Paul E. McKenney <paul...@linux.vnet.ibm.com>

--
			Gleb.
Re: [Qemu-devel] KVM call minutes April 3
On Wed, Apr 04, 2012 at 03:21:26PM +0200, Igor Mammedov wrote:
> On 04/04/2012 02:14 PM, Michael Roth wrote:
>> A simpler alternative would be to just send the cmdline options over to the target and assume that it results in the same underlying machine, then just send off the device state. Much simpler actually... but the above approach should work regardless of changes to the command-line options on the source... having an internally stable cmdline scheme might work as well...
>
> Will command line take into account hot-plugged devices?

No, that's a good point. We'd probably need to generate the options required to ensure the devices are created on the target, and we'd only be able to do that just before sending the device state. That means we need a way to create machines after we've completed tasks like memory migration, which probably has similar requirements to just being able to instantiate a machine from a serialized QOM composition tree.
Re: [PATCH] Fix warning
Hi Bjorn,
Did you have a chance to look at this one?
Regards,
Tadeusz

On 01/03/12 17:18, tadeusz.st...@intel.com wrote:
> From: Tadeusz Struk <tadeusz.st...@intel.com>
> Date: Mon, 14 Feb 2011 14:38:18 +
> Subject: [PATCH] Fixed warning
>
> This patch fixes the following warning.
>
> # virsh start fedora16-64
> kernel: [ 133.324565] pci-stub :02:01.1: claimed by stub
> kernel: [ 134.163769] pci-stub :02:01.1: enabling device ( - 0002)
> kernel: [ 164.282679] [ cut here ]
> kernel: [ 164.282685] WARNING: at drivers/pci/search.c:46 pci_find_upstream_pcie_bridge+0x87/0x9f()
> kernel: [ 164.282687] Hardware name: SandyBridge Platform
> kernel: [ 164.282689] Modules linked in: sha512_generic sha256_generic icp_qa_al(O) nfs fscache auth_rpcgss nfs_acl mga drm ip6table_filter ip6_tables ebtable_nat ebtables lockd ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM iptable_mangle tun bridge stp llc sunrpc btrfs zlib_deflate libcrc32c virtio_net kvm_intel kvm uinput matroxfb_base matroxfb_DAC1064 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc e1000e iTCO_wdt iTCO_vendor_support igb microcode i2c_i801 shpchp serio_raw i2c_core pcspkr dca [last unloaded: scsi_wait_scan]
> kernel: [ 164.282724] Pid: 1233, comm: qemu-kvm Tainted: G O 3.2.5 #10
> kernel: [ 164.282726] Call Trace:
> kernel: [ 164.282732] [8104eac6] warn_slowpath_common+0x83/0x9b
> kernel: [ 164.282735] [8104eaf8] warn_slowpath_null+0x1a/0x1c
> kernel: [ 164.282737] [8123e028] pci_find_upstream_pcie_bridge+0x87/0x9f
> kernel: [ 164.282741] [813bf5eb] domain_context_mapping+0x50/0xe6
> kernel: [ 164.282744] [813bf6c5] domain_add_dev_info+0x44/0xe3
> kernel: [ 164.282747] [813bfcda] intel_iommu_attach_device+0x14f/0x15c
> kernel: [ 164.282750] [813bb48b] iommu_attach_device+0x1c/0x1e
> kernel: [ 164.282764] [a00f43aa] kvm_assign_device+0x4a/0x114 [kvm]
> kernel: [ 164.282773] [a00f3963] kvm_vm_ioctl_assigned_device+0x434/0xb25 [kvm]
> kernel: [ 164.282777] [810f0fee] ? __do_fault+0x351/0x38b
> kernel: [ 164.282781] [8107c05b] ? arch_local_irq_save+0x15/0x1b
> kernel: [ 164.282784] [814b26e4] ? _raw_spin_unlock_irqrestore+0x17/0x19
> kernel: [ 164.282787] [813c0f35] ? pci_conf1_read+0xe1/0xee
> kernel: [ 164.282794] [a00f07df] kvm_vm_ioctl+0x377/0x3ac [kvm]
> kernel: [ 164.282797] [8123eb7c] ? pci_read_config+0xa2/0x1bd
> kernel: [ 164.282801] [8110edc2] ? virt_to_head_page+0xe/0x31
> kernel: [ 164.282804] [8112f210] do_vfs_ioctl+0x45d/0x49e
> kernel: [ 164.282808] [811208da] ? fsnotify_access+0x5f/0x67
> kernel: [ 164.282811] [8112f2a7] sys_ioctl+0x56/0x7b
> kernel: [ 164.282814] [814b8e42] system_call_fastpath+0x16/0x1b
> kernel: [ 164.282816] ---[ end trace 6a834ec5ac21cba8 ]---
>
> Signed-off-by: Tadeusz Struk <tadeusz.st...@intel.com>
> ---
>  drivers/pci/search.c |    7 +++++--
>  1 files changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/search.c b/drivers/pci/search.c
> index 9d75dc8..7847c6b 100644
> --- a/drivers/pci/search.c
> +++ b/drivers/pci/search.c
> @@ -26,6 +26,7 @@ struct pci_dev *
>  pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
>  {
>  	struct pci_dev *tmp = NULL;
> +	struct pci_dev *vf = pdev;
>  
>  	if (pci_is_pcie(pdev))
>  		return NULL;
> @@ -40,8 +41,10 @@ pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
>  	}
>  	/* PCI device should connect to a PCIe bridge */
>  	if (pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE) {
> -		/* Busted hardware? */
> -		WARN_ON_ONCE(1);
> +		if (!vf->is_virtfn) {
> +			/* Busted hardware? */
> +			WARN_ON_ONCE(1);
> +		}
>  		return NULL;
>  	}
>  	return pdev;
Re: kvm: RCU warning in async pf
On Wed, Apr 4, 2012 at 2:30 PM, Gleb Natapov <g...@redhat.com> wrote:
> On Tue, Apr 03, 2012 at 01:52:26PM +0300, Gleb Natapov wrote:
>> On Mon, Apr 02, 2012 at 08:54:32PM -0400, Sasha Levin wrote:
>>> Hi all,
>>>
>>> I got the spew at the bottom of the mail in a KVM guest using the KVM tools and running trinity.
>>>
>>> I'm not quite sure how default_idle managed to trigger a pagefault, so that part looks odd to me.
>>
>> This is not regular page fault. This is async page fault that tells the guest that a page, previously swapped out by hypervisor, is now swapped back in and it can happen while vcpu is idle. The code does not leave idle state properly though. We probably need to call rcu_irq_enter() there. Will look into it.
>
> The patch below solves it for me:
>
> Page ready async PF can kick vcpu out of idle state much like IRQ. We need to tell RCU about this.

Looks good here.
Re: [PATCH] Fix warning
On Wed, Apr 4, 2012 at 9:30 AM, Tadeusz Struk <tadeusz.st...@intel.com> wrote:
> Hi Bjorn,
> Did you have a chance to look at this one?

Yep. It needs a changelog. "Fixed warning" is inadequate. It needs an explanation of what the VF connection is.

It would also be nice if you fixed the function comment, which is unintelligible, incorrect (a root bus need not be bus 0), and doesn't match what the function does.
Re: Questing regarding KVM Guest PMU
On Wed, Apr 4, 2012 at 3:59 PM, Gleb Natapov g...@redhat.com wrote:
On Wed, Apr 04, 2012 at 03:49:42PM +0530, shashank rachamalla wrote:
On Wed, Apr 4, 2012 at 12:34 PM, Gleb Natapov g...@redhat.com wrote:
On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla shashank.rachama...@gmail.com wrote:
On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com wrote:
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:

I guess things are working fine with perf. But why not with oprofile?

Looks like it. I never tried oprofile. Will try to reproduce your problem and see what oprofile is doing.

I am using ubuntu 10.04 with 2.6.32-21-generic kernel as guest and oprofile 0.9.6. Also, I have tried to capture kvm-events (perf patch) in host while running oprofile and perf in guest. Please see the attachment. I have run the tests in three cases for around 5 secs. There are more MSR reads and writes in case of perf, which I think is normal. However, there are very few MSR reads and writes with oprofile. Also, the number of NMI exceptions is too high in case of oprofile.

Which host kernel are you using? Try latest kvm.git and check if you see something unusual in dmesg.

Currently running 3.3.0-rc5. Will try with the latest source from kvm git and let you know.

Thanks, there were some fixes that didn't make it into 3.3. rdpmc instruction emulation fix is one of them. If oprofile uses it this can explain the problem.

I have tried with latest kvm source from git and also with 3.0 guest kernel but oprofile fails to collect any samples on guest. I am using a core2duo processor which is considered by oprofile as pentium pro model.

core2duo on the host or the guest? What is your qemu command line?

both. qemu command line below.

sudo /usr/local/bin/qemu-system-x86_64 -drive file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net user

please find more info (/proc/cpuinfo and uname of both host and guest) in attached files.

oprofile does not work for me even on the host. After trying to use it I can see why perf was written in the first place.

ok. seems to be. will move over to perf as it's working fine inside guest.

Good riddance IMO. I managed to run it on a guest (but not on my host!). The thing is buggy. It does not use global ctrl MSR to enable counters and kvm has all of them disabled by default. I didn't find what value this MSR should have after reset, so this may be either kvm bug or real BIOSes enable all counters in global ctrl MSR for PMUv1 compatibility. Doing wrmsr 0x38f 0x7000f solves this problem. The second problem is that oprofile reprograms PMU counters without disabling them first and this is explicitly prohibited by Intel SDM. The patch below solves that, but oprofile is the one who should be fixed.

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a73f0c1..be05028 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -396,6 +396,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
 				(pmc = get_fixed_pmc(pmu, index))) {
 			data = (s64)(s32)data;
 			pmc->counter += data - read_pmc(pmc);
+			reprogram_gp_counter(pmc, pmc->eventsel);
 			return 0;
 		} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
 			if (data == pmc->eventsel)
--
			Gleb.

thanks for the patch. will check it out.
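Editorial note on the counter-write path visible in the hunk's context lines: the cast `(s64)(s32)data` suggests that only the low 32 bits of the written value are used and that they are sign-extended to 64 bits before the counter is adjusted. The following is a minimal user-space sketch of that arithmetic only. The `toy_pmc` type and the `toy_*` helpers are hypothetical names for illustration and are not KVM code; in particular, the real `read_pmc()` also accounts for the backing perf event, which this model leaves out.

```c
#include <stdint.h>

/* Hypothetical stand-in for a virtual performance counter (not KVM's struct). */
struct toy_pmc {
    uint64_t counter;
};

/* In this toy model the visible value is just the stored counter. */
static uint64_t toy_read_pmc(const struct toy_pmc *pmc)
{
    return pmc->counter;
}

/*
 * Mirror the two context lines of the patch: keep the low 32 bits of the
 * written value, sign-extend them to 64 bits, then adjust the counter so
 * that a subsequent read returns the sign-extended value.
 * The uint32_t -> int32_t conversion relies on the usual two's-complement
 * behavior of mainstream compilers.
 */
static void toy_write_pmc(struct toy_pmc *pmc, uint64_t data)
{
    data = (uint64_t)(int64_t)(int32_t)(uint32_t)data;
    pmc->counter += data - toy_read_pmc(pmc);
}
```

Under these assumptions, writing 0xFFFFFF00 leaves the counter at 0xFFFFFFFFFFFFFF00, which is how a profiler arms a counter to overflow after 256 events.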
[PATCH v2] kvm: Disable MSI/MSI-X in assigned device reset path
We've hit a kernel host panic, when issuing a 'system_reset' with an 82576 nic assigned and a Windows guest. Host system is a PowerEdge R815. [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32993 [Hardware Error]: APEI generic hardware error status [Hardware Error]: severity: 1, fatal [Hardware Error]: section: 0, severity: 1, fatal [Hardware Error]: flags: 0x01 [Hardware Error]: primary [Hardware Error]: section_type: PCIe error [Hardware Error]: port_type: 0, PCIe end point [Hardware Error]: version: 1.0 [Hardware Error]: command: 0x, status: 0x0010 [Hardware Error]: device_id: :08:00.0 [Hardware Error]: slot: 1 [Hardware Error]: secondary_bus: 0x00 [Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9 [Hardware Error]: class_code: 02 [Hardware Error]: aer_status: 0x0010, aer_mask: 0x00018000 [Hardware Error]: Unsupported Request [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID [Hardware Error]: aer_uncor_severity: 0x00067011 [Hardware Error]: aer_tlp_header: 40001001 002f edbf800c 0100 [Hardware Error]: section: 1, severity: 1, fatal [Hardware Error]: flags: 0x01 [Hardware Error]: primary [Hardware Error]: section_type: PCIe error [Hardware Error]: port_type: 0, PCIe end point [Hardware Error]: version: 1.0 [Hardware Error]: command: 0x, status: 0x0010 [Hardware Error]: device_id: :08:00.0 [Hardware Error]: slot: 1 [Hardware Error]: secondary_bus: 0x00 [Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9 [Hardware Error]: class_code: 02 [Hardware Error]: aer_status: 0x0010, aer_mask: 0x00018000 [Hardware Error]: Unsupported Request [Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID [Hardware Error]: aer_uncor_severity: 0x00067011 [Hardware Error]: aer_tlp_header: 40001001 002f edbf800c 0100 Kernel panic - not syncing: Fatal hardware error! Pid: 0, comm: swapper Not tainted 2.6.32-242.el6.x86_64 #1 Call Trace: NMI [814f2fe5] ? panic+0xa0/0x168 [812f919c] ? 
ghes_notify_nmi+0x17c/0x180 [814f91d5] ? notifier_call_chain+0x55/0x80 [814f923a] ? atomic_notifier_call_chain+0x1a/0x20 [8109667e] ? notify_die+0x2e/0x30 [814f6e81] ? do_nmi+0x1a1/0x2b0 [814f6760] ? nmi+0x20/0x30 [8103762b] ? native_safe_halt+0xb/0x10 EOE [8101495d] ? default_idle+0x4d/0xb0 [81009e06] ? cpu_idle+0xb6/0x110 [814da63a] ? rest_init+0x7a/0x80 [81c1ff7b] ? start_kernel+0x424/0x430 [81c1f33a] ? x86_64_start_reservations+0x125/0x129 [81c1f438] ? x86_64_start_kernel+0xfa/0x109 The root cause of the problem is that the 'reset_assigned_device()' code first writes a 0 to the command register. Then, when qemu subsequently does a kvm_deassign_irq() (called by assign_irq(), in the system_reset path), the kernel ends up calling '__msix_mask_irq()', which performs a write to the memory mapped msi vector space. Since, we've explicitly told the device to disallow mmio access (via the 0 write to the command register), we end up with the above 'Unsupported Request'. The fix here is to first disable MSI-X, before doing the reset. We also disable MSI, leaving the device in INTx mode. In this way, the device is a known state after reset, and we avoid touching msi memory mapped space on any subsequent 'kvm_deassign_irq()'. Thanks to Michael S. Tsirkin for help in understanding what was going on here and Jason Baron, the original debugger of this problem. Signed-off-by: Alex Williamson alex.william...@redhat.com --- Jason is out of the office for a couple weeks, so I'll try to resolve this while he's away. Somehow the emulated config updates were lost in Jason's original posting, so I've fixed that and taken Jan's suggestion to simply call into the update functions instead of open coding the interrupt disable. I think there still may be some disagreements about how to handle guest generated errors in the host, but that's a large project whereas this is something we should be doing at reset anyway, and even if only a workaround, resolves the problem above. 
 hw/device-assignment.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 89823f1..2e6b93e 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1613,6 +1613,29 @@ static void reset_assigned_device(DeviceState *dev)
     const char reset[] = "1";
     int fd, ret;

+    /*
+     * If a guest is reset without being shutdown, MSI/MSI-X can still
+     * be running.  We want to return the device to a known state on
+     * reset, so disable those here.  We especially do not want MSI-X
+     * enabled since it lives in MMIO space, which is about to get
+     * disabled.
+     */
+    if (adev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) {
+        uint16_t ctrl = pci_get_word(pci_dev->config +
+                                     pci_dev->msix_cap +
CPU softlockup due to smp_call_function()
On Wed, 4 Apr 2012 about 22:12:36 +0200, Sasha Levin wrote:
> I've started seeing soft lockups resulting from smp_call_function()
> calls. I've attached two different backtraces of this happening with
> different code paths.
>
> This is running inside a KVM guest with the trinity fuzzer, using
> today's linux-next kernel.

Hi Sasha.

You have two different call sites (arch/x86/mm/pageattr.c cpa_flush_range and net/core/dev.c netdev_run_todo), and both call on_each_cpu with wait=1.

I tried a few options but can't get close enough to your compiled length of 2a0 to know if the code is spinning on the first csd_lock_wait in csd_lock or in the second csd_lock_wait after the call to arch_send_call_function_ipi_mask (aka smp_ops + 0x44 in my x86_64 compile). Please check your disassembly and report.

If it's the first lock, then the current stack is an innocent victim. In either case we need to find what the cpu(s) holding up the reporting cpu's call function data (cfd_data per_cpu var) is (are) doing.

Since interrupts are on, we could read the time at entry (even jiffies) and report both the function and the mask of cpus that have not processed the cpu's entry if the elapsed time has exceeded some threshold.

I described the call flow of smp_call_function_many and outlined some debug sanity checks that could be added at [1] if you suspect the function list is getting corrupted.

Let me know if you need help creating this debug code.

[1] https://lkml.org/lkml/2012/1/13/308

milton
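As a rough illustration of the timing check suggested above, here is a user-space sketch of the bookkeeping. It is not kernel code: `struct call_watch` and its helpers are hypothetical names, and a plain tick counter stands in for jiffies.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical record of one outstanding cross-CPU call (not a kernel struct). */
struct call_watch {
    const char *func_name;   /* name of the function that was requested */
    uint64_t    entry_ticks; /* tick counter sampled when the wait began */
    uint64_t    pending;     /* bit n set: CPU n has not run the function yet */
};

/* Mark CPU `cpu` as having processed the request. */
static void call_watch_ack(struct call_watch *w, unsigned cpu)
{
    assert(cpu < 64);
    w->pending &= ~(UINT64_C(1) << cpu);
}

/*
 * Return the mask of CPUs to report as stalled: non-zero only when the
 * wait has lasted longer than `threshold` ticks and some CPU is pending.
 * Unsigned subtraction keeps the comparison correct across counter wrap.
 */
static uint64_t call_watch_stalled(const struct call_watch *w,
                                   uint64_t now_ticks, uint64_t threshold)
{
    if (now_ticks - w->entry_ticks <= threshold)
        return 0;
    return w->pending;
}

/* Print one diagnostic line if anything is stalled; returns the mask. */
static uint64_t call_watch_report(const struct call_watch *w,
                                  uint64_t now_ticks, uint64_t threshold)
{
    uint64_t mask = call_watch_stalled(w, now_ticks, threshold);

    if (mask)
        printf("%s: waited %llu ticks, pending cpu mask 0x%llx\n",
               w->func_name,
               (unsigned long long)(now_ticks - w->entry_ticks),
               (unsigned long long)mask);
    return mask;
}
```

In a real debug patch the report would presumably be made from inside the wait loop, and the mask would come from the cpumask of the call function data rather than a 64-bit word.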
[PATCH] Use clockevent multiplier and shifter for decrementer
The time for which the hrtimer is started for decrementer emulation is calculated using tb_ticks_per_usec, while the hrtimer uses the clockevent for DEC reprogramming (if needed), which calculates timebase ticks using the multiplier and shifter mechanism implemented within the clockevent layer. It was observed that this conversion (timebase->time->timebase) is not correct because the two mechanisms are not consistent. In our setup it adds 2% jitter.

With this patch the clockevent multiplier and shifter mechanism is used when starting the hrtimer for decrementer emulation. Now the jitter is 0.5%.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/time.h | 2 ++
 arch/powerpc/kernel/time.c | 6 ++
 arch/powerpc/kvm/emulate.c | 5 +++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 7eb10fb..6d631b2 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -202,6 +202,8 @@ extern u64 mulhdu(u64, u64);
 extern void div128_by_32(u64 dividend_high, u64 dividend_low,
 			 unsigned divisor, struct div_result *dr);

+extern void get_clockevent_mult(u64 *multi, u64 *shift);
+
 /* Used to store Processor Utilization register (purr) values */

 struct cpu_usage {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 567dd7c..d229edd 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -910,6 +910,12 @@ static void __init init_decrementer_clockevent(void)
 	register_decrementer_clockevent(cpu);
 }

+void get_clockevent_mult(u64 *multi, u64 *shift)
+{
+	*multi = decrementer_clockevent.mult;
+	*shift = decrementer_clockevent.shift;
+}
+
 void secondary_cpu_time_init(void)
 {
 	/* Start the decrementer on CPUs that have manual control
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index afc9154..4bfcaa1 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -76,6 +76,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
 	unsigned long dec_nsec;
 	unsigned long long dec_time;
+	u64 mult, shift;

 	pr_debug("mtDEC: %x\n", vcpu->arch.dec);
 	hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
@@ -103,9 +104,9 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 		 * host ticks. */

+		get_clockevent_mult(&mult, &shift);
 		dec_time = vcpu->arch.dec;
-		dec_time *= 1000;
-		do_div(dec_time, tb_ticks_per_usec);
+		dec_time = (dec_time << shift) / mult;
 		dec_nsec = do_div(dec_time, NSEC_PER_SEC);
 		hrtimer_start(&vcpu->arch.dec_timer,
 			ktime_set(dec_time, dec_nsec), HRTIMER_MODE_REL);
--
1.7.0.4
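The difference between the two conversions can be seen in a small user-space sketch. It assumes the usual clockevent relation, ticks = (ns * mult) >> shift, with mult derived as (freq << shift) / NSEC_PER_SEC; the helper names and any timebase frequency fed to them are hypothetical, not taken from the patch or from the author's setup.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* mult for a given tick rate and shift: (freq << shift) / NSEC_PER_SEC. */
static uint64_t clockevent_mult(uint64_t tb_hz, unsigned shift)
{
    return (tb_hz << shift) / NSEC_PER_SEC;
}

/* Forward direction used when programming the device: ns -> ticks. */
static uint64_t ns_to_ticks(uint64_t ns, uint64_t mult, unsigned shift)
{
    return (ns * mult) >> shift;
}

/*
 * Inverse direction, as in the patch: ticks -> ns.
 * A 32-bit DEC value shifted left by up to 32 still fits in 64 bits;
 * larger inputs or shifts would overflow in this simple form.
 */
static uint64_t ticks_to_ns(uint64_t ticks, uint64_t mult, unsigned shift)
{
    assert(mult != 0);
    return (ticks << shift) / mult;
}

/* The older path: scale by 1000 and divide by whole ticks per microsecond. */
static uint64_t ticks_to_ns_usec(uint64_t ticks, uint64_t tb_ticks_per_usec)
{
    assert(tb_ticks_per_usec != 0);
    return ticks * 1000 / tb_ticks_per_usec;
}
```

With an illustrative 66,666,666 Hz timebase and shift 32, `tb_ticks_per_usec` truncates to 66, so the older path turns 1,000,000 ticks into 15,151,515 ns while `ticks_to_ns()` gives 15,000,000 ns. These figures only show how integer truncation makes the two paths disagree; they are not the jitter measurements quoted in the patch description.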
[PATCH] KVM: PPC: bookehv: Fix save/restore of guest accessible SPRGs.
From: Varun Sethi <varun.se...@freescale.com>

For guest accessible SPRGs 4-7, save/restore must be handled differently for the 64-bit and non-64-bit cases. The registers are maintained as 64-bit copies by KVM. While saving/restoring for the non-64-bit case we should always take the lower 4 bytes.

Signed-off-by: Varun Sethi <varun.se...@freescale.com>
---
 arch/powerpc/kvm/bookehv_interrupts.S | 48 +++-
 1 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 909e96e..c1c0bae 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -320,13 +320,29 @@ _GLOBAL(kvmppc_resume_host)
 	PPC_STL	r5, VCPU_LR(r4)
 	mfspr	r7, SPRN_SPRG5
 	PPC_STL	r3, VCPU_VRSAVE(r4)
-	PPC_STL	r6, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+	std	r6, VCPU_SHARED_SPRG4(r11)
+#else
+	stw	r6, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
 	mfspr	r8, SPRN_SPRG6
-	PPC_STL	r7, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+	std	r7, VCPU_SHARED_SPRG5(r11)
+#else
+	stw	r7, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
 	mfspr	r9, SPRN_SPRG7
-	PPC_STL	r8, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+	std	r8, VCPU_SHARED_SPRG6(r11)
+#else
+	stw	r8, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
 	mfxer	r3
-	PPC_STL	r9, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+	std	r9, VCPU_SHARED_SPRG7(r11)
+#else
+	stw	r9, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif

 	/* save guest MAS registers and restore host mas4 & mas6 */
 	mfspr	r5, SPRN_MAS0
@@ -549,13 +565,29 @@ lightweight_exit:
 	 * SPRGs, so we need to reload them here with the guest's values.
 	 */
 	lwz	r3, VCPU_VRSAVE(r4)
-	lwz	r5, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+	ld	r5, VCPU_SHARED_SPRG4(r11)
+#else
+	lwz	r5, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
 	mtspr	SPRN_VRSAVE, r3
-	lwz	r6, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+	ld	r6, VCPU_SHARED_SPRG5(r11)
+#else
+	lwz	r6, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
 	mtspr	SPRN_SPRG4W, r5
-	lwz	r7, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+	ld	r7, VCPU_SHARED_SPRG6(r11)
+#else
+	lwz	r7, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
 	mtspr	SPRN_SPRG5W, r6
-	lwz	r8, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+	ld	r8, VCPU_SHARED_SPRG7(r11)
+#else
+	lwz	r8, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif
 	mtspr	SPRN_SPRG6W, r7
 	mtspr	SPRN_SPRG7W, r8
--
1.7.2.2
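Why the 32-bit accesses need the "+ 4" can be shown with a short user-space sketch. It assumes a big-endian host, where the low-order word of a 64-bit field sits at byte offset 4; `store_be64()` and `load_be32()` are hypothetical helpers that model the memory layout, not kernel functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lay out a 64-bit value in big-endian byte order, most significant byte first. */
static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* 32-bit big-endian load at byte offset `off`, modelling `lwz rX, off(rY)`. */
static uint32_t load_be32(const uint8_t *p, size_t off)
{
    assert(off + 4 <= 8); /* stay inside the 8-byte slot */
    return ((uint32_t)p[off] << 24) | ((uint32_t)p[off + 1] << 16) |
           ((uint32_t)p[off + 2] << 8) | (uint32_t)p[off + 3];
}
```

For a slot holding a 32-bit guest value zero-extended to 64 bits, a word load at offset 0 returns the upper half (zero), while a load at offset 4 returns the guest value. That matches the `(VCPU_SHARED_SPRGn + 4)` addressing used for the non-64-bit case.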
[PATCH] KVM: PPC: booke: Use the lower four bytes while restoring guest readable SPRGs.
From: Varun Sethi <varun.se...@freescale.com>

While restoring the hardware copies of the guest SPRG4-7 registers, we must use the lower 4 bytes of the 64-bit software copies maintained by KVM.

Signed-off-by: Varun Sethi <varun.se...@freescale.com>
---
 arch/powerpc/kvm/booke_interrupts.S | 8
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index c8c4b87..feda1bb 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -419,13 +419,13 @@ lightweight_exit:
 	 * written directly to the shared area, so we
 	 * need to reload them here with the guest's values.
 	 */
-	lwz	r3, VCPU_SHARED_SPRG4(r5)
+	lwz	r3, (VCPU_SHARED_SPRG4 + 4)(r5)
 	mtspr	SPRN_SPRG4W, r3
-	lwz	r3, VCPU_SHARED_SPRG5(r5)
+	lwz	r3, (VCPU_SHARED_SPRG5 + 4)(r5)
 	mtspr	SPRN_SPRG5W, r3
-	lwz	r3, VCPU_SHARED_SPRG6(r5)
+	lwz	r3, (VCPU_SHARED_SPRG6 + 4)(r5)
 	mtspr	SPRN_SPRG6W, r3
-	lwz	r3, VCPU_SHARED_SPRG7(r5)
+	lwz	r3, (VCPU_SHARED_SPRG7 + 4)(r5)
 	mtspr	SPRN_SPRG7W, r3

 #ifdef CONFIG_KVM_EXIT_TIMING
--
1.7.2.2