[COMMIT] [WIN-GUEST-DRIVERS] Clean up prefast (MS static code analyzer tool) errors.
repository: C:/dev/kvm-guest-drivers-windows branch: master commit 2a08ca42e31cb525a7179c71fbb7f9e19edc7bb3 Author: Yan Vugenfirer yvuge...@redhat.com Date: Tue Oct 27 18:28:43 2009 +0200 [WIN-GUEST-DRIVERS] Clean up prefast (MS static code analyzer tool) errors. Signed-off-by: Yan Vugenfirer yvuge...@redhat.com diff --git a/NetKVM/Common/ParaNdis-Debug.c b/NetKVM/Common/ParaNdis-Debug.c index 22d3b73..71494ef 100644 --- a/NetKVM/Common/ParaNdis-Debug.c +++ b/NetKVM/Common/ParaNdis-Debug.c @@ -24,12 +24,7 @@ int bDebugPrint = 1; static NDIS_SPIN_LOCK CrashLock; -static VOID ParaNdis_OnBugCheck( -IN KBUGCHECK_CALLBACK_REASON Reason, -IN PKBUGCHECK_REASON_CALLBACK_RECORD Record, -IN OUT PVOID ReasonSpecificData, -IN ULONG ReasonSpecificDataLength -); +static KBUGCHECK_REASON_CALLBACK_ROUTINE ParaNdis_OnBugCheck; static VOID ParaNdis_PrepareBugCheckData(); typedef BOOLEAN (*KeRegisterBugCheckReasonCallbackType) ( diff --git a/NetKVM/wlh/ParaNdis6-Driver.c b/NetKVM/wlh/ParaNdis6-Driver.c index 30357a2..1ceaef6 100644 --- a/NetKVM/wlh/ParaNdis6-Driver.c +++ b/NetKVM/wlh/ParaNdis6-Driver.c @@ -56,7 +56,7 @@ static KeRegisterProcessorChangeCallbackType static KeDeregisterProcessorChangeCallbackType ParaNdis2008_DeregisterCallback = DummyDeregisterProcessorChangeCallback; -static PVOID ProcessorChangeCallbackHandle = NULL; +static PROCESSOR_CALLBACK_FUNCTION * ProcessorChangeCallbackHandle = NULL; static VOID ParaNdis_OnCPUChange( __in PVOID CallbackContext, diff --git a/NetKVM/wxp/ParaNdis5-Driver.c b/NetKVM/wxp/ParaNdis5-Driver.c index 2e36a78..12aeeef 100644 --- a/NetKVM/wxp/ParaNdis5-Driver.c +++ b/NetKVM/wxp/ParaNdis5-Driver.c @@ -22,6 +22,8 @@ static NDIS_HANDLE DriverHandle; static ULONG gID = 0; +static DRIVER_UNLOAD ParaVirtualNICUnload; + /** Unload handler, only responsibility is cleanup WPP ***/ -- To unsubscribe from this list: send the line unsubscribe kvm-commits in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost-net patches
On Mon, Oct 26, 2009 at 02:34:49PM -0700, Shirley Ma wrote: Hello Miachel, On Mon, 2009-10-26 at 22:05 +0200, Michael S. Tsirkin wrote: Shirley, could you please test the following patch? With this patch, the performance has gained from 1xxx to 2xxx Mb/s, still has some performance gap compared to without vhost. It was 3xxxMb/s before from guest to host on my set up. At some point my guest had a runaway nash-hotplug process consuming 100% CPU. Could you please verify this does not happen to you? Looks like your git tree virtio_net has fixed the skb_xmit panic I have seen before as well, good news. Thanks Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Data is copied from irq_rt onto the stack and this copy is accessed outside critical section. - Because the pointer is accessed outside of the read-side critical section. There are two basic patterns we can use to fix this bug: 1) Switch to sleeping-rcu and encompass the -set() access within the read-side critical section, OR 2) Add reference counting to the irq_rt structure, and simply acquire the reference from within the RSCS. This patch implements solution (1). Signed-off-by: Gregory Haskins ghask...@novell.com --- include/linux/kvm_host.h |6 +- virt/kvm/irq_comm.c | 50 +++--- virt/kvm/kvm_main.c |1 + 3 files changed, 35 insertions(+), 22 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bd5a616..1fe135d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -185,7 +185,10 @@ struct kvm { struct mutex irq_lock; #ifdef CONFIG_HAVE_KVM_IRQCHIP - struct kvm_irq_routing_table *irq_routing; + struct { + struct srcu_structsrcu; + struct kvm_irq_routing_table *table; + } irq_routing; struct hlist_head mask_notifier_list; struct hlist_head irq_ack_notifier_list; #endif @@ -541,6 +544,7 @@ int kvm_set_irq_routing(struct kvm *kvm, const struct kvm_irq_routing_entry *entries, unsigned nr, unsigned flags); +void kvm_init_irq_routing(struct kvm *kvm); void kvm_free_irq_routing(struct kvm *kvm); #else diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 00c68d2..db2553f 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -144,10 +144,11 @@ static int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e, */ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level) { - struct kvm_kernel_irq_routing_entry *e, irq_set[KVM_NR_IRQCHIPS]; - int ret = -1, i = 0; + struct kvm_kernel_irq_routing_entry *e; + int ret = -1; struct kvm_irq_routing_table *irq_rt; struct hlist_node *n; + int idx; trace_kvm_set_irq(irq, level, irq_source_id); @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level) * IOAPIC. So set the bit in both. The guest will ignore * writes to the unused one. */ - rcu_read_lock(); - irq_rt = rcu_dereference(kvm-irq_routing); + idx = srcu_read_lock(kvm-irq_routing.srcu); + irq_rt = rcu_dereference(kvm-irq_routing.table); if (irq irq_rt-nr_rt_entries) - hlist_for_each_entry(e, n, irq_rt-map[irq], link) - irq_set[i++] = *e; - rcu_read_unlock(); + hlist_for_each_entry(e, n, irq_rt-map[irq], link) { + int r; - while(i--) { - int r; - r = irq_set[i].set(irq_set[i], kvm, irq_source_id, level); - if (r 0) - continue; + r = e-set(e, kvm, irq_source_id, level); + if (r 0) + continue; - ret = r + ((ret 0) ? 0 : ret); - } + ret = r + ((ret 0) ? 0 : ret); + } + srcu_read_unlock(kvm-irq_routing.srcu, idx); return ret; } @@ -179,17 +178,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned irqchip, unsigned pin) struct kvm_irq_ack_notifier *kian; struct hlist_node *n; int gsi; + int idx; trace_kvm_ack_irq(irqchip, pin); - rcu_read_lock(); - gsi = rcu_dereference(kvm-irq_routing)-chip[irqchip][pin]; + idx = srcu_read_lock(kvm-irq_routing.srcu); + gsi = rcu_dereference(kvm-irq_routing.table)-chip[irqchip][pin]; if (gsi != -1) hlist_for_each_entry_rcu(kian, n, kvm-irq_ack_notifier_list,
[ kvm-Bugs-2886941 ] Extreme slow down using -cpu host
Bugs item #2886941, was opened at 2009-10-27 08:26 Message generated for change (Tracker Item Submitted) made by nwxi You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: nwxi (nwxi) Assigned to: Nobody/Anonymous (nobody) Summary: Extreme slow down using -cpu host Initial Comment: Hi! I've currently experienced a massive slowdown when using flag -cpu host on an AMD Phenom 905e with qemu-kvm 0.11.0 and kernel 2.6.31.3 x86_64. This affects network (tested e1000 and virtio) and i/o performance (virtio). Further there are dozens of log messages from kvm module complaining about: cpu0/1 unhandled rdmsr: 0xc0010055 Both, the slowdown and the rdmsr-message do not happen when using qemu64 as cpu, but also occur when using -cpu phenom. Thank you for your comments! regards, Michael -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] kvm-kmod-2.6.31.5
Alexander Graf wrote: On 26.10.2009, at 18:26, Jan Kiszka wrote: This package contains the kvm external modules, using the sources from latest stable Linux release 2.6.31.5. It can be used to update the kernel-side support of kvm without upgrading the host kernel. This release has been tested on x86 down to host kernel 2.6.27 and builds down to 2.6.24. Building against older kernels is expected to be broken, but if anyone provides patches to fix it, I'm open to merge them. Aww - I'm missing the awesome changelogs :-). Your wish is my command: Major KVM changes since kvm-kmod-2.6.30.1: - MSI-X support - MCE support - SVM: Reworkedfixed IRQ injection, added NMI injection - Host security: Limit lapic periodic timer frequency - Guest security: CPL check on hypercalls and debug register access - x86 emulator: lcall and soft INT support, interrupt shadow handling - Guest migration fixes (I probably missed some more important stuff) Major kvm-kmod changes: - Reanimated support down to 2.6.24 hosts - Fixed building against out-of-tree kernel builds / split kernels - udev script with auto-modprobe on boot - Installation fixes (DESTDIR) Wanted to attach this information to the tarball as well, but sourceforge's FRS is still in a almost completely broken state. Sigh. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
On 10/27/2009 01:21 AM, Olof Johansson wrote: On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote: For some reason, I'm not seeing this build break, but the patch is obviously correct. Acked-by: Hollis Blanchard holl...@us.ibm.com I saw it when building with pasemi_defconfig + manually enabled KVM options (all available). Alex, can you fold this patch in? No need to repost; just update your git tree. (btw, please make sure the patchset is bisectable). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM: VMX: move CR3/PDPTR update to vmx_set_cr3
On 10/26/2009 08:48 PM, Marcelo Tosatti wrote: GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from outside guest context. Similarly pdptrs are updated via load_pdptrs. Let kvm_set_cr3 perform the update, removing it from the vcpu_run fast path. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration
On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote: Hi Michael, I am reviewing your patchset and have just a minor remark to make here: On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldishmgold...@redhat.com wrote: This patch adds a new test that checks the timedrift introduced by migrations. It uses the same parameters used by the timedrift test to get the guest time. In addition, the number of migrations the test performs is controlled by the parameter 'migration_iterations'. Signed-off-by: Michael Goldishmgold...@redhat.com --- client/tests/kvm/kvm_tests.cfg.sample | 33 --- client/tests/kvm/tests/timedrift_with_migration.py | 95 2 files changed, 115 insertions(+), 13 deletions(-) create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py diff --git a/client/tests/kvm/kvm_tests.cfg.sample b/client/tests/kvm/kvm_tests.cfg.sample index 540d0a2..618c21e 100644 --- a/client/tests/kvm/kvm_tests.cfg.sample +++ b/client/tests/kvm/kvm_tests.cfg.sample @@ -100,19 +100,26 @@ variants: type = linux_s3 - timedrift:install setup -type = timedrift extra_params += -rtc-td-hack -# Pin the VM and host load to CPU #0 -cpu_mask = 0x1 -# Set the load and rest durations -load_duration = 20 -rest_duration = 20 -# Fail if the drift after load is higher than 50% -drift_threshold = 50 -# Fail if the drift after the rest period is higher than 10% -drift_threshold_after_rest = 10 -# For now, make sure this test is executed alone -used_cpus = 100 +variants: +- with_load: +type = timedrift +# Pin the VM and host load to CPU #0 +cpu_mask = 0x1 Let's use -smp 2 always. btw: we need not to parallel the load test with standard tests. +# Set the load and rest durations +load_duration = 20 +rest_duration = 20 Even the default duration here seems way too brief here, is there any reason why 20s was chosen instead of, let's say, 1800s? I am under the impression that 20s of load won't be enough to cause any noticeable drift... +# Fail if the drift after load is higher than 50% +drift_threshold = 50 +# Fail if the drift after the rest period is higher than 10% +drift_threshold_after_rest = 10 I am also curious about those tresholds and the reasoning behind them. Is there any official agreement on what we consider to be an unreasonable drift? Another thing that struck me out is drift calculation: On the original timedrift test, the guest drift is normalized against the host drift: drift = 100.0 * (host_delta - guest_delta) / host_delta While in the new drift tests, we consider only the guest drift. I believe is better to normalize all tests based on one drift calculation criteria, and those values should be reviewed, and at least a certain level of agreement on our development community should be reached. I think we don't need to calculate drift ratio. We should define a threshold in seconds, let's say 2 seconds. Beyond that, there should not be any drift. Do we support migration to a different host? We should, especially in this test too. The destination host reading should also be used. Apart for that, good patchset, and good thing you refactored some of the code to shared utils. Other than this concern that came to my mind, the new tests look good and work fine here. I had to do a slight rebase in one of the patches, very minor stuff. The default values and the drift calculation can be changed on a later time. Thanks! -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: xp guest, blue screen c0000221 on boot
On 10/26/2009 11:09 PM, Andrew Olney wrote: Hangs on boot, xp guest: STOP: c221 Unknown Hard Error \SystemRoot\System32\ntdll.dll Will boot into safe mode, but _not_ into safe mode with networking. According to http://support.microsoft.com/kb/314474, this file is corrupted. If you have another VM with the same service pack, you might try copying the file over in safe mode, or you can try extracting it from the service pack and copying it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Virtio block module slower than IDE
On 10/27/2009 02:13 AM, Floris Bos wrote: Hi, I am running Proxmox 1.4 (which uses the 2.6.30.1 kvm modules) and am experiencing performance problems with Linux guests using the virtio_blk module. Especially with random IO it is a lot slower than IDE. Try switching the host I/O scheduler to deadline. You can do that at runtime in /sys/block/.../queue. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: -cpu host AMD Host
On 10/26/2009 09:55 PM, Martin Gallant wrote: Is “–cpu host” supported on AMD hosts? Yes. Whenever I try to use this option on a Windows Vista/7 client, I get blue screen. Removing the option, the client works fine. Host kernel 2.6.31.4. Userspace is qemu-kvm-0.11.0. (Previous versions fail too) /proc/cpuinfo snippet: vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ Please post the flags : field as well. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance regression in virtio-net in 2.6.32-rc4
On 10/26/2009 08:48 PM, Michael S. Tsirkin wrote: Hi! I noticed a performance regression in virtio net: going from 2.6.31 to 2.6.32-rc4 I see this, for guest to host communication: Any tips on debugging this? Lacking better advice, a bisect can help as a last resort. 'git bisect start -- drivers/net drivers/virtio' will probably find it fastest. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm problems on new hardware
On 10/26/2009 12:06 PM, Danny ter Haar wrote: Hello, I have a KVM virtualization problem. I've put together new hardware (supermicro) server with 2 E5530 cpu's and memory disk to start experimenting with virtualization. I intend to use the www.proxmox.com system/setup. I installed proxmox and started stress testing the hardware: parallel kernel compiles in a loop (concurrency_level=32) memtest86+ during the night etc. The hardware/os performs rocksolid when i stress test it, but the moment i start a virtual guest (eg debian netinstall) i get the first screen of the installation procedure in a vnc screen. I choose either normal install or expert install , the guest screen goes blank with only a cursor and the kvm process prints an error on the console and starts to eat cpu cycles. So the host OS is not barfing, only the kvm process is giving problems and the guest is frozen. Does this happen for all guests (different OSes), or just this one? Please provide a link to the install media. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2886941 ] Extreme slow down using -cpu host
Bugs item #2886941, was opened at 2009-10-27 09:26 Message generated for change (Comment added) made by avik You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: nwxi (nwxi) Assigned to: Nobody/Anonymous (nobody) Summary: Extreme slow down using -cpu host Initial Comment: Hi! I've currently experienced a massive slowdown when using flag -cpu host on an AMD Phenom 905e with qemu-kvm 0.11.0 and kernel 2.6.31.3 x86_64. This affects network (tested e1000 and virtio) and i/o performance (virtio). Further there are dozens of log messages from kvm module complaining about: cpu0/1 unhandled rdmsr: 0xc0010055 Both, the slowdown and the rdmsr-message do not happen when using qemu64 as cpu, but also occur when using -cpu phenom. Thank you for your comments! regards, Michael -- Comment By: Avi Kivity (avik) Date: 2009-10-27 11:52 Message: This is fixed in 2.6.32. I'll send a patch to 2.6.31-stable. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2886941 ] Extreme slow down using -cpu host
Bugs item #2886941, was opened at 2009-10-27 09:26 Message generated for change (Settings changed) made by avik You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Closed Resolution: Fixed Priority: 5 Private: No Submitted By: nwxi (nwxi) Assigned to: Nobody/Anonymous (nobody) Summary: Extreme slow down using -cpu host Initial Comment: Hi! I've currently experienced a massive slowdown when using flag -cpu host on an AMD Phenom 905e with qemu-kvm 0.11.0 and kernel 2.6.31.3 x86_64. This affects network (tested e1000 and virtio) and i/o performance (virtio). Further there are dozens of log messages from kvm module complaining about: cpu0/1 unhandled rdmsr: 0xc0010055 Both, the slowdown and the rdmsr-message do not happen when using qemu64 as cpu, but also occur when using -cpu phenom. Thank you for your comments! regards, Michael -- Comment By: Avi Kivity (avik) Date: 2009-10-27 11:56 Message: Should be fixed in 2.6.31.6 when it comes out. -- Comment By: Avi Kivity (avik) Date: 2009-10-27 11:52 Message: This is fixed in 2.6.32. I'll send a patch to 2.6.31-stable. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: -cpu host AMD Host
On 10/27/2009 11:39 AM, Avi Kivity wrote: On 10/26/2009 09:55 PM, Martin Gallant wrote: Is “–cpu host” supported on AMD hosts? Yes. Whenever I try to use this option on a Windows Vista/7 client, I get blue screen. Removing the option, the client works fine. Host kernel 2.6.31.4. Userspace is qemu-kvm-0.11.0. (Previous versions fail too) /proc/cpuinfo snippet: vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ Please post the flags : field as well. You might try the attached patch. -- error compiling committee.c: too many arguments to function From d8e9cf4f4f688873456b297d78be70c5c328489c Mon Sep 17 00:00:00 2001 From: Andre Przywara andre.przyw...@amd.com Date: Wed, 24 Jun 2009 12:44:34 +0200 Subject: [PATCH -stable] KVM: ignore reads from AMDs C1E enabled MSR If the Linux kernel detects an C1E capable AMD processor (K8 RevF and higher), it will access a certain MSR on every attempt to go to halt. Explicitly handle this read and return 0 to let KVM run a Linux guest with the native AMD host CPU propagated to the guest. Signed-off-by: Andre Przywara andre.przyw...@amd.com Signed-off-by: Avi Kivity a...@redhat.com (cherry picked from commit 1fdbd48c242db996107f72ae4140ffe8163e26a8) --- arch/x86/kvm/x86.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 8aafb62..aa4c46f 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -949,6 +949,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata) case MSR_P6_EVNTSEL0: case MSR_P6_EVNTSEL1: case MSR_K7_EVNTSEL0: + case MSR_K8_INT_PENDING_MSG: data = 0; break; case MSR_MTRRcap: -- 1.6.5.2
Re: kvm problems on new hardware
On Tue, 2009-10-27 at 11:45 +0200, Avi Kivity wrote: Does this happen for all guests (different OSes), or just this one? I tried an iso of both X86_64 and i386 of debian. I even burned the iso image to a real cd to try if it would boot (it did) Then i tried a rescuecd image: same behaviour. Please provide a link to the install media. It's really a standard cd image i got from http://cdimage.debian.org/debian-cd/5.0.3/amd64/iso-cd/debian-503-amd64-netinst.iso I opened a question at the proxmox forum (first): http://www.proxmox.com/forum/showthread.php?t=2384 I just followed up with this posting: http://www.proxmox.com/forum/showthread.php?p=13192#post13192 which may give more detailed/useful information. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm problems on new hardware
On Tue, 2009-10-27 at 11:45 +0200, Avi Kivity wrote: Does this happen for all guests (different OSes), or just this one? I just tried a windows7 cd image Same error: vmbr0: port 2(vmtab105i0) entering learning state vmbr0: topology change detected, propagating vmbr0: port 2(vmtab105i0) entering forwarding state handle_exception: unexpected, vectoring info 0x8410 intr info 0x8b0d I am thinking it's either a bios setting (i tried (afaik) all possible combinations). I even have a not-yet-released version of the bios. All still won't let me use virtualization. Could it be i have faulty cpu's ? Is there a way to debug what happens when ik start the kvm guest process ? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm problems on new hardware
On 10/27/2009 12:27 PM, Danny ter Haar wrote: On Tue, 2009-10-27 at 11:45 +0200, Avi Kivity wrote: Does this happen for all guests (different OSes), or just this one? I just tried a windows7 cd image Same error: vmbr0: port 2(vmtab105i0) entering learning state vmbr0: topology change detected, propagating vmbr0: port 2(vmtab105i0) entering forwarding state handle_exception: unexpected, vectoring info 0x8410 intr info 0x8b0d I'm not able to reproduce this on a similar processor. Can you post your qemu command line? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] kvm-kmod-2.6.31.5
On 10/27/2009 10:25 AM, Jan Kiszka wrote: Wanted to attach this information to the tarball as well, but sourceforge's FRS is still in a almost completely broken state. Sigh. It can be done - upload the changelog file, make it as a changelog, and then go to the tarball and link the newly marked changelog. Messy. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Jan Kiszka to maintain kvm-kmod
On 10/26/2009 07:51 PM, Jan Kiszka wrote: That forecast already promises the next rain That would be me. I think it's easily fixable, either by emulating user return notifiers using preempt notifiers (sched_out - on_user_return), or by noping out user return notifiers completely and hooking the kvm sched_out callback to call kvm_on_user_return. Of course, we won't see the performance gain, but that's expected. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: kvm problems on new hardware
On Tue, 2009-10-27 at 12:34 +0200, Avi Kivity wrote: I'm not able to reproduce this on a similar processor. Can you post your qemu command line? vhost1:/var/tmp# ps axuw |grep kvm root 5843 42.7 0.0 621048 13560 ?Sl 11:35 0:06 /usr/bin/kvm -monitor unix:/var/run/qemu-server/105.mon,server,nowait -vnc unix:/var/run/qemu-server/105.vnc,password -pidfile /var/run/qemu-server/105.pid -daemonize -usbdevice tablet -name win7 -smp sockets=1,cores=1 -vga cirrus -tdf -localtime -rtc-td-hack -k en-us -drive file=/var/lib/vz/images/105/vm-105-disk-1.raw,if=ide,index=0,boot=on -drive file=/var/lib/vz/template/iso/windows7.iso,if=ide,index=2,media=cdrom -m 512 -net tap,vlan=0,ifname=vmtab105i0,script=/var/lib/qemu-server/bridge-vlan -net nic,vlan=0,model=e1000,macaddr=BA:29:E3:CC:95:F6 -id 105 -cpuunits 1000 vhost1:/etc/qemu-server# cat 105.conf name: win7 ide2: local:iso/windows.iso,media=cdrom vlan0: e1000=BA:29:E3:CC:95:F6 bootdisk: ide0 ostype: w2k8 ide0: local:105/vm-105-disk-1.raw memory: 512 sockets: 1 I started the debian install again, attached a strace and then hit enter in the vnc screen. The output of the strace can be found here: http://dth.net/supermicro/strace_debian_install Don't know if it helps, _i_ don't know what i'm looking at ;-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] kvm-kmod-2.6.31.5
Avi Kivity wrote: On 10/27/2009 10:25 AM, Jan Kiszka wrote: Wanted to attach this information to the tarball as well, but sourceforge's FRS is still in a almost completely broken state. Sigh. It can be done - upload the changelog file, make it as a changelog, and then go to the tarball and link the newly marked changelog. Unless the web interface refuses to work with my Firefox (interestingly, it works with IE6 under Windows?!) and the shell access reports the frs folder as detached. Messy. Indeed, beside the bugs, the new concept looks like a step backward: You now have to download the changelog to view it. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Jan Kiszka to maintain kvm-kmod
Avi Kivity wrote: On 10/26/2009 07:51 PM, Jan Kiszka wrote: That forecast already promises the next rain That would be me. I think it's easily fixable, either by emulating user return notifiers using preempt notifiers (sched_out - on_user_return), or by noping out user return notifiers completely and hooking the kvm sched_out callback to call kvm_on_user_return. Of course, we won't see the performance gain, but that's expected. I was playing with such an approach already. But something goes wrong (GPF in kvm_set_shared_msr) when booting Win7. Will check without kvm-kmod next. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Autotest] [PATCH 1/2] KVM-test: Add execute permission to qemu-ifup script
Ooops fixed, thanks! On Tue, Oct 27, 2009 at 2:07 AM, Amos Kong ak...@redhat.com wrote: qemu-ifup is a script for setting network bridge. If no execute permission, always face this problem: autotest/client/tests/kvm/scripts/qemu-ifup: could not launch network script Could not initialize device 'tap Signed-off-by: Amos Kong ak...@redhat.com --- 0 files changed, 0 insertions(+), 0 deletions(-) mode change 100644 = 100755 client/tests/kvm/scripts/qemu-ifup diff --git a/client/tests/kvm/scripts/qemu-ifup b/client/tests/kvm/scripts/qemu-ifup old mode 100644 new mode 100755 -- 1.5.5.6 ___ Autotest mailing list autot...@test.kernel.org http://test.kernel.org/cgi-bin/mailman/listinfo/autotest -- Lucas -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
BUG with Win7 and user-return-notifier
Hi Avi, just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test of Windows 7. Already during Starting Windows I get this: ... general protection fault: [#1] PREEMPT SMP last sysfs file: /sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:01/power_supply/CMB2/charge_full CPU 0 Modules linked in: kvm_intel kvm ip6t_LOG xt_pkttype xt_TCPMSS xt_tcpudp ipt_LOG xt_limit iptable_nat nf_nat xt_physdev sco bridge stp bnep rfcomm l2cap snd_pcm_oss snd_mixer_oss crc16 snd_seq coretemp snd_seq_device i915 drm_kms_helper drm i2c_algo_bit af_packet ip6t_REJECT nf_conntrack_ipv6 ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode fuse ohci_hcd loop snd_hda_codec_realtek arc4 ecb snd_hda_intel ath5k snd_hda_codec mac80211 ath snd_hwdep btusb pcmcia snd_pcm sdhci_pci video snd_timer cfg80211 iTCO_wdt rtc_cmos ppdev bluetooth sdhci yenta_socket iTCO_vendor_support i2c_i801 snd rsrc_nonstatic fujitsu_laptop rtc_core soundcore mmc_core output ohci1394 parport_pc battery rfkill ac rtc_lib i2c_core snd_page_alloc pcmcia_core intel_agp parport joydev serio_raw led_class ieee1394 pcspkr button sky2 sg sha256 _generic aes_generic cbc dm_crypt usbhid hid ehci_hcd uhci_hcd sd_mod crc_t10dif usbcore dm_snapshot dm_mod edd ext3 mbcache jbd fan thermal_sys hwmon ata_piix ahci libata scsi_mod Pid: 7404, comm: qemu-system-x86 Not tainted 2.6.32-rc5 #8 LIFEBOOK E8110 RIP: 0010:[a05853ad] [a05853ad] kvm_set_shared_msr+0x51/0x7a [kvm] RSP: 0018:880060947cc8 EFLAGS: 00010246 RAX: RBX: 0003 RCX: c080 RDX: RSI: RDI: 0003 RBP: 880060947ce8 R08: R09: R10: 0012 R11: 880062b01158 R12: 88000601dad0 R13: R14: 880062b00058 R15: 880062b00059 FS: 7f91606df950() GS:88000600() knlGS: CS: 0010 DS: 002b ES: 002b CR0: 80050033 CR2: CR3: 665ab000 CR4: 26f0 DR0: DR1: DR2: DR3: DR6: 0ff0 DR7: 0400 Process qemu-system-x86 (pid: 7404, threadinfo 880060946000, task 880062989950) Stack: 60947cf8 0004 880062b0 0 880060947d08 a05d1ef9 880060991f80 880062b0 0 880060947dd8 a0582e40 880060947d38 0001db70 Call Trace: [a05d1ef9] vmx_save_host_state+0x141/0x150 [kvm_intel] [a0582e40] kvm_arch_vcpu_ioctl_run+0x510/0xb25 [kvm] [a0571dda] kvm_vcpu_ioctl+0xfb/0x722 [kvm] [a05743de] ? kvm_vm_ioctl+0x33a/0x36b [kvm] [81330022] ? sub_preempt_count+0x9/0x83 [810cb698] ? fire_user_return_notifiers+0x50/0x6b [81122bf4] vfs_ioctl+0x2f/0x7d [81123112] do_vfs_ioctl+0x450/0x48d [81330022] ? sub_preempt_count+0x9/0x83 [811231a9] sys_ioctl+0x5a/0x7c [8100bc5b] system_call_fastpath+0x16/0x1b Code: 03 24 c5 60 a8 6f 81 89 d8 48 8d 50 04 4d 3b 2c d4 74 38 4d 89 2c d4 48 c1 e0 04 4c 89 ea 8b 88 f8 50 5a a0 48 c1 ea 20 44 89 e8 0f 30 41 80 7c 24 18 00 75 16 49 c7 04 24 d6 53 58 a0 4c 89 e7 RIP [a05853ad] kvm_set_shared_msr+0x51/0x7a [kvm] RSP 880060947cc8 ---[ end trace 44d1410c7cb2d885 ]--- note: qemu-system-x86[7404] exited with preempt_count 1 So the problem is not kvm-kmod related. Any ideas? Need .config or other additional information? Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG with Win7 and user-return-notifier
On 10/27/2009 02:52 PM, Jan Kiszka wrote: Hi Avi, just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test of Windows 7. Already during Starting Windows I get this: x86 or x64 7? RAX: RBX: 0003 RCX: c080 RDX: RSI: RDI: 0003 Call Trace: [a05d1ef9] vmx_save_host_state+0x141/0x150 [kvm_intel] [a0582e40] kvm_arch_vcpu_ioctl_run+0x510/0xb25 [kvm] [a0571dda] kvm_vcpu_ioctl+0xfb/0x722 [kvm] [a05743de] ? kvm_vm_ioctl+0x33a/0x36b [kvm] [81330022] ? sub_preempt_count+0x9/0x83 [810cb698] ? fire_user_return_notifiers+0x50/0x6b [81122bf4] vfs_ioctl+0x2f/0x7d [81123112] do_vfs_ioctl+0x450/0x48d [81330022] ? sub_preempt_count+0x9/0x83 [811231a9] sys_ioctl+0x5a/0x7c [8100bc5b] system_call_fastpath+0x16/0x1b Code: 03 24 c5 60 a8 6f 81 89 d8 48 8d 50 04 4d 3b 2c d4 74 38 4d 89 2c d4 48 c1 e0 04 4c 89 ea 8b 88 f8 50 5a a0 48 c1 ea 20 44 89 e80f 30 41 80 7c 24 18 00 75 16 49 c7 04 24 d6 53 58 a0 4c 89 e7 RIP [a05853ad] kvm_set_shared_msr+0x51/0x7a [kvm] That's a wrmsr(EFER, 0), which should definitely fault. update_transition_efer() should have taken care of it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG with Win7 and user-return-notifier
Avi Kivity wrote: On 10/27/2009 02:52 PM, Jan Kiszka wrote: Hi Avi, just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test of Windows 7. Already during Starting Windows I get this: x86 or x64 7? x64. RAX: RBX: 0003 RCX: c080 RDX: RSI: RDI: 0003 Call Trace: [a05d1ef9] vmx_save_host_state+0x141/0x150 [kvm_intel] [a0582e40] kvm_arch_vcpu_ioctl_run+0x510/0xb25 [kvm] [a0571dda] kvm_vcpu_ioctl+0xfb/0x722 [kvm] [a05743de] ? kvm_vm_ioctl+0x33a/0x36b [kvm] [81330022] ? sub_preempt_count+0x9/0x83 [810cb698] ? fire_user_return_notifiers+0x50/0x6b [81122bf4] vfs_ioctl+0x2f/0x7d [81123112] do_vfs_ioctl+0x450/0x48d [81330022] ? sub_preempt_count+0x9/0x83 [811231a9] sys_ioctl+0x5a/0x7c [8100bc5b] system_call_fastpath+0x16/0x1b Code: 03 24 c5 60 a8 6f 81 89 d8 48 8d 50 04 4d 3b 2c d4 74 38 4d 89 2c d4 48 c1 e0 04 4c 89 ea 8b 88 f8 50 5a a0 48 c1 ea 20 44 89 e80f 30 41 80 7c 24 18 00 75 16 49 c7 04 24 d6 53 58 a0 4c 89 e7 RIP [a05853ad] kvm_set_shared_msr+0x51/0x7a [kvm] That's a wrmsr(EFER, 0), which should definitely fault. update_transition_efer() should have taken care of it. Jan -- Siemens AG, Corporate Technology, CT SE 2 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG with Win7 and user-return-notifier
On 10/27/2009 03:13 PM, Jan Kiszka wrote: Avi Kivity wrote: On 10/27/2009 02:52 PM, Jan Kiszka wrote: Hi Avi, just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test of Windows 7. Already during Starting Windows I get this: x86 or x64 7? x64. Worked for me - getting to the initial prompt. Do you have CONFIG_USER_RETURN_NOTIFIER=y in your .config? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: BUG with Win7 and user-return-notifier
On 10/27/2009 03:24 PM, Avi Kivity wrote: Worked for me - getting to the initial prompt. Do you have CONFIG_USER_RETURN_NOTIFIER=y in your .config? If you do, send your own .config, will try to reproduce. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Hi Paul, Paul E. McKenney wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ - Because the pointer is accessed outside of the read-side critical section. There are two basic patterns we can use to fix this bug: 1) Switch to sleeping-rcu and encompass the -set() access within the read-side critical section, OR 2) Add reference counting to the irq_rt structure, and simply acquire the reference from within the RSCS. This patch implements solution (1). Looks like a good transformation! A few questions interspersed below. Thanks for the review. I would have CC'd you but I figured I pestered you enough with my RCU reviews in the past, and didn't want to annoy you ;) I will be sure to CC you in the future, unless you ask otherwise. Signed-off-by: Gregory Haskins ghask...@novell.com --- include/linux/kvm_host.h |6 +- virt/kvm/irq_comm.c | 50 +++--- virt/kvm/kvm_main.c |1 + 3 files changed, 35 insertions(+), 22 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bd5a616..1fe135d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -185,7 +185,10 @@ struct kvm { struct mutex irq_lock; #ifdef CONFIG_HAVE_KVM_IRQCHIP -struct kvm_irq_routing_table *irq_routing; +struct { +struct srcu_structsrcu; Each structure has its own SRCU domain. This is OK, but just asking if that is the intent. It does look like the SRCU primitives are passed a pointer to the correct structure, and that the return value from srcu_read_lock() gets passed into the matching srcu_read_unlock() like it needs to be, so that is good. Yeah, it was intentional. Technically the table is per-guest, and thus the locking is too, which is the desired/intentional granularity. On that note, I tried to denote that kvm-irq_routing.srcu and kvm-irq_routing.table were related to one another, but then went ahead and modified the hunks that touched kvm-irq_ack_notifier_list, too. In retrospect, this was probably a mistake. I should leave the rcu usage outside of -irq_routing.table alone. +struct kvm_irq_routing_table *table; +} irq_routing; struct hlist_head mask_notifier_list; struct hlist_head irq_ack_notifier_list; #endif [ . . . ] @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level) * IOAPIC. So set the bit in both. The guest will ignore * writes to the unused one. */ -rcu_read_lock(); -irq_rt = rcu_dereference(kvm-irq_routing); +idx = srcu_read_lock(kvm-irq_routing.srcu); +irq_rt = rcu_dereference(kvm-irq_routing.table); if (irq irq_rt-nr_rt_entries) -hlist_for_each_entry(e, n, irq_rt-map[irq], link) -irq_set[i++] = *e; -rcu_read_unlock(); +hlist_for_each_entry(e, n, irq_rt-map[irq], link) { What prevents the above list from changing while we are traversing it? (Yes, presumably whatever was preventing it from changing before this patch, but what?) Mostly kvm-lock is held, but not always. And if kvm-lock were held all the time, there would be no point in using SRCU. ;-) This is protected by kvm-irq_lock within kvm_set_irq_routing(). Entries are added to a copy of the list, and the top-level table pointer is swapped (via rcu_assign_pointer(), as it should be) while holding the lock. Finally, we synchronize with the RSCS before deleting the old copy. It looks to me like the original author got this part right, so I didn't modify it outside of converting to SRCU. +int r; -while(i--) { -int r; -r = irq_set[i].set(irq_set[i], kvm, irq_source_id, level); -if (r 0) -continue; +r = e-set(e, kvm, irq_source_id, level); +if (r 0) +continue; -ret = r + ((ret 0) ? 0 : ret); -} +ret = r + ((ret 0) ? 0 : ret); +} +
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. Yes, irq_rt is not accessed outside the RSCS. However, the entry pointers stored in the irq_rt-map are, and this is equally problematic afaict. In this particular case we seem to never delete entries at run-time once they are established. Therefore, while perhaps sloppy, its technically safe to leave them unprotected from this perspective. The issue is more related to shutdown since a kvm_set_irq() caller could be within the aforementioned race-region and call entry-set() after the guest is gone. Or did I miss something? Data is copied from irq_rt onto the stack and this copy is accessed outside critical section. As mentioned above, I do not believe this really protect us. And even if it did, the copy is just a work-around to avoid sleeping within the standard RCU RSCS, which is what SRCU is designed for. So rather than inventing an awkward two-phased stack based solution, it's better to reuse the provided tools, IMO. To flip it around: Is there any reason why an SRCU would not work here, and thus we were forced to use something like the stack-copy approach? Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
On 27.10.2009, at 09:56, Avi Kivity wrote: On 10/27/2009 01:21 AM, Olof Johansson wrote: On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote: For some reason, I'm not seeing this build break, but the patch is obviously correct. Acked-by: Hollis Blanchard holl...@us.ibm.com I saw it when building with pasemi_defconfig + manually enabled KVM options (all available). Alex, can you fold this patch in? I can, but it's only partly related. My patches don't even touch timing.c. The only thing I can imagine resulting in a breakage is that my patches allow for an =M setting. So IMHO this patch should be applied before my series. Should I stick it as first patch in my git tree? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: -cpu host AMD Host
I still see the problem with the kernel patch applied. /proc/cpuinfo attached Will post a formal bug report with all dumps in a few minutes. Thanks for the quick response. -- Marty -Original Message- From: Avi Kivity [mailto:a...@redhat.com] Sent: Tuesday, October 27, 2009 4:58 AM To: Martin Gallant Cc: kvm@vger.kernel.org Subject: Re: -cpu host AMD Host On 10/27/2009 11:39 AM, Avi Kivity wrote: On 10/26/2009 09:55 PM, Martin Gallant wrote: Is -cpu host supported on AMD hosts? Yes. Whenever I try to use this option on a Windows Vista/7 client, I get blue screen. Removing the option, the client works fine. Host kernel 2.6.31.4. Userspace is qemu-kvm-0.11.0. (Previous versions fail too) /proc/cpuinfo snippet: vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ Please post the flags : field as well. You might try the attached patch. -- error compiling committee.c: too many arguments to function processor : 0 vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ stepping: 2 cpu MHz : 2700.000 cache size : 512 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch bogomips: 5429.84 TLB size: 1024 4K pages clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp tm stc 100mhzsteps processor : 1 vendor_id : AuthenticAMD cpu family : 15 model : 107 model name : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+ stepping: 2 cpu MHz : 2700.000 cache size : 512 KB physical id : 0 siblings: 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy 3dnowprefetch bogomips: 5429.84 TLB size: 1024 4K pages clflush size: 64 cache_alignment : 64 address sizes : 40 bits physical, 48 bits virtual power management: ts fid vid ttp tm stc 100mhzsteps
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. Here is a revised problem statement thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); entry_cache = get_entries(irq_rt); rcu_read_unlock(); invalidate_entries(irq_rt); for_each_entry(entry_cache) entry-set(); /* bad */ - invalidate_entries() may be any operation that deletes an entry at run-time (doesn't exist today), or as the guest is shutting down. As far as I can tell, the current code does not protect us from either condition, and my proposed patch protects us from both. Did I miss anything? HTH -Greg signature.asc Description: OpenPGP digital signature
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. A little is underestimation :) There is not /* bad */ line in the code! Yes, irq_rt is not accessed outside the RSCS. However, the entry pointers stored in the irq_rt-map are, and this is equally problematic afaict. The pointer is in text and can't disappear without kvm_set_irq() disappearing too. In this particular case we seem to never delete entries at run-time once they are established. Therefore, while perhaps sloppy, its technically safe to leave them unprotected from this perspective. The issue is more related to shutdown since a kvm_set_irq() caller could be within the aforementioned race-region and call entry-set() after the guest is gone. Or did I miss something? The caller of kvm_set_irq() should hold reference to kvm instance, so it can't disappear while you are inside kvm_set_irq(). RCU protects only kvm-irq_routing not kvm structure itself. Data is copied from irq_rt onto the stack and this copy is accessed outside critical section. As mentioned above, I do not believe this really protect us. And even I don't see the prove it doesn't, so I assume it does. if it did, the copy is just a work-around to avoid sleeping within the It is not a work-around. There was two solutions to the problem one is to call -set() outside rcu critical section another is to use SRCU. I decided to use the first one. This way the code is much simpler and I remember asking Paul what are the disadvantages of using SRCU and there was something. standard RCU RSCS, which is what SRCU is designed for. So rather than inventing an awkward two-phased stack based solution, it's better to reuse the provided tools, IMO. To flip it around: Is there any reason why an SRCU would not work here, and thus we were forced to use something like the stack-copy approach? If SRCU has no disadvantage comparing to RCU why not use it always? :) -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2887189 ] -cpu host Blue-Screens M$ Guests
Bugs item #2887189, was opened at 2009-10-27 09:04 Message generated for change (Tracker Item Submitted) made by martyg7 You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2887189group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: amd Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Martin Gallant (martyg7) Assigned to: Nobody/Anonymous (nobody) Summary: -cpu host Blue-Screens M$ Guests Initial Comment: On my 64-bit AMD host, whenever I start a Windows guest machine with the -cpu host flag set, the guest blue-screens during startup. I have reproduced this with 32-bit vista, 32-bit W7, and 64-bit W7 boot disks. Everything is fine with the -cpu host flag removed - I have been running this setup since the beginning of 2009. CPU - AMD 64 x2 (/proc/cpuinfo attached) Userspace - qemu-kvm-0.11.0 Kernel - v2.6.31.5 (amd64) Invocations - attached Problem persists with -no-kvm-irqchip or -no-kvm-pit switch With -no-kvm, invocation returns Unable to find x86 CPU definition I keep up-to-date git trees for the kernel and kvm userspace - Will gladly test patches -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2887189group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote: Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. Here is a revised problem statement thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); entry_cache = get_entries(irq_rt); rcu_read_unlock(); invalidate_entries(irq_rt); for_each_entry(entry_cache) entry-set(); /* bad */ - invalidate_entries() may be any operation that deletes an entry at run-time (doesn't exist today), or as the guest is shutting down. As far as I can tell, the current code does not protect us from either condition, and my proposed patch protects us from both. Did I miss anything? Yes. What happened to irq_rt is completely irrelevant at the point you marked /* bad */. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost-net patches
Hello Michael, On Tue, 2009-10-27 at 08:43 +0200, Michael S. Tsirkin wrote: At some point my guest had a runaway nash-hotplug process consuming 100% CPU. Could you please verify this does not happen to you? What I have found that the start_xmit stopped and restarted too often. There is no vring descriptor available for adding the new buf. The buf release is not able to keep up after vhost patch? I saw lots of Unexpected full queue from dmesg. I haven't had time to debug it yet. You might have some idea here? /* This can happen with OOM and indirect buffers. */ if (unlikely(capacity 0)) { netif_stop_queue(dev); dev_warn(dev-dev, Unexpected full queue\n); if (unlikely(!vi-svq-vq_ops-enable_cb(vi-svq))) { vi-svq-vq_ops-disable_cb(vi-svq); netif_start_queue(dev); goto again; } return NETDEV_TX_BUSY; } Thanks Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Gleb Natapov wrote: On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. A little is underestimation :) There is not /* bad */ line in the code! Sorry, that was my own highlighting, not trying to reflect actual code. Yes, irq_rt is not accessed outside the RSCS. However, the entry pointers stored in the irq_rt-map are, and this is equally problematic afaict. The pointer is in text and can't disappear without kvm_set_irq() disappearing too. No, the entry* pointer is .text _AND_ .data, and is subject to standard synchronization rules like most other objects. Unless I am misreading the code, the entry* pointers point to heap within the irq_rt pointer. Therefore, the kfree(irq_rt) I mention above effectively invalidates the entire set of entry* pointers that you are caching, and is thus an issue. In this particular case we seem to never delete entries at run-time once they are established. Therefore, while perhaps sloppy, its technically safe to leave them unprotected from this perspective. Note: I was wrong in this statement. I forgot that it's not safe at run-time either since the entry objects are part of irq_rt. The issue is more related to shutdown since a kvm_set_irq() caller could be within the aforementioned race-region and call entry-set() after the guest is gone. Or did I miss something? The caller of kvm_set_irq() should hold reference to kvm instance, so it can't disappear while you are inside kvm_set_irq(). RCU protects only kvm-irq_routing not kvm structure itself. Agreed, but this has nothing to do with protecting the entry* pointers. Data is copied from irq_rt onto the stack and this copy is accessed outside critical section. As mentioned above, I do not believe this really protect us. And even I don't see the prove it doesn't, so I assume it does. What would you like to see beyond what I've already provided you? I can show how the entry pointers are allocated as part of the irq_rt, and I can show how the irq_rt (via entry-set) access is racy against invalidation. if it did, the copy is just a work-around to avoid sleeping within the It is not a work-around. There was two solutions to the problem one is to call -set() outside rcu critical section This is broken afaict without taking additional precautions, such as a reference count on the irq_rt structure, but I mentioned this alternate solution in my header. another is to use SRCU. I decided to use the first one. This way the code is much simpler simpler is debatable, but ok. SRCU is an established pattern available in the upstream kernel, so I don't think its particularly complicated or controversial to use. and I remember asking Paul what are the disadvantages of using SRCU and there was something. The disadvantages to my knowledge are as follows: 1) rcu_read_lock is something like 4x faster than srcu_read_lock(), but we are talking about nanoseconds on modern hardware (I think Paul quoted me 10ns vs 45ns on his rig). I don't think either overhead is something to be concerned about in this case. 2) standard rcu supports deferred synchronization (call_rcu()), as well as barriers (synchronize_rcu()), whereas SRCU only supports barriers (synchronize_srcu()). We only use the barrier type in this code path, so that is not an issue. 3) SRCU requires explicit initialization/cleanup, whereas regular RCU does not. Trivially solved in my patch since KVM has plenty of init/cleanup hook points. standard RCU RSCS, which is what SRCU is designed for. So rather than inventing an awkward two-phased stack based solution, it's better to reuse the provided tools, IMO. To flip it around: Is there any reason why an SRCU would not work here, and thus we were forced to use something like the stack-copy approach? If SRCU has no disadvantage comparing to RCU why not use it always? :) No one is debating that
Re: vhost-net patches
On Tue, 2009-10-27 at 08:38 +0200, Michael S. Tsirkin wrote: Yes but you need to make host send packets out to tap as well, somehow. One way to do this is to assign IP address in a separate subnet to tap in host and to eth device in guest. Thanks for the hint, I will make a try. Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Tue, Oct 27, 2009 at 04:02:37PM +0200, Gleb Natapov wrote: On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote: [ . . . ] standard RCU RSCS, which is what SRCU is designed for. So rather than inventing an awkward two-phased stack based solution, it's better to reuse the provided tools, IMO. To flip it around: Is there any reason why an SRCU would not work here, and thus we were forced to use something like the stack-copy approach? If SRCU has no disadvantage comparing to RCU why not use it always? :) The disadvantages of SRCU compared to RCU include the following: 1. SRCU requires that the return value of srcu_read_lock() be fed into srcu_read_unlock(). This is usually not a problem, but can be painful if there are multiple levels of function call separating the two. 2. SRCU's grace periods are about 4x slower than those of RCU. And they also don't scale all that well with extremely large numbers of CPUs (but this can be fixed when/if it becomes a real problem). 3. SRCU's read-side primitives are also significantly slower than those of RCU. 4. SRCU does not have a call_srcu(). One could be provided, but its semantics would be a bit strange due to the need to limit the number of callbacks, given that general blocking is permitted in SRCU read-side critical sections. (And it would take some doing to convince me to supply an SRCU!) 5. The current SRCU has no reasonable way to implement read-side priority boosting, as there is no record of which task is read-holding which SRCU. Hey, you asked! ;-) Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Gleb Natapov wrote: On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote: Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. Here is a revised problem statement thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); entry_cache = get_entries(irq_rt); rcu_read_unlock(); invalidate_entries(irq_rt); for_each_entry(entry_cache) entry-set(); /* bad */ - invalidate_entries() may be any operation that deletes an entry at run-time (doesn't exist today), or as the guest is shutting down. As far as I can tell, the current code does not protect us from either condition, and my proposed patch protects us from both. Did I miss anything? Yes. What happened to irq_rt is completely irrelevant at the point you marked /* bad */. kfree() happened to irq_rt, and thus to the objects behind the pointers in entry_cache at the point I marked /* bad */. That certainly isn't /* good */ ;) -Greg signature.asc Description: OpenPGP digital signature
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Thanks for this, Paul. Some questions and statements below. Paul E. McKenney wrote: On Tue, Oct 27, 2009 at 04:02:37PM +0200, Gleb Natapov wrote: On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote: [ . . . ] standard RCU RSCS, which is what SRCU is designed for. So rather than inventing an awkward two-phased stack based solution, it's better to reuse the provided tools, IMO. To flip it around: Is there any reason why an SRCU would not work here, and thus we were forced to use something like the stack-copy approach? If SRCU has no disadvantage comparing to RCU why not use it always? :) The disadvantages of SRCU compared to RCU include the following: 1.SRCU requires that the return value of srcu_read_lock() be fed into srcu_read_unlock(). This is usually not a problem, but can be painful if there are multiple levels of function call separating the two. Right, and this is simple/neat w.r.t. its usage in irq_routing, so no problem there. 2.SRCU's grace periods are about 4x slower than those of RCU. And they also don't scale all that well with extremely large numbers of CPUs (but this can be fixed when/if it becomes a real problem). The irq_routing update path is extremely infrequent, so this should not be an issue. 3.SRCU's read-side primitives are also significantly slower than those of RCU. Are the 10ns vs 45ns numbers that I mentioned in my last reply the proper ballpark? How do these compare to an atomic-op, say an uncontended spinlock on modern hardware? The assumption is that srcu_read_lock() should be significantly cheaper than a read-lock(). If its not, then we might as well use something else, I suppose. But if its not, I guess you probably wouldn't have bothered to invent it in the first place ;) 4.SRCU does not have a call_srcu(). One could be provided, but its semantics would be a bit strange due to the need to limit the number of callbacks, given that general blocking is permitted in SRCU read-side critical sections. (And it would take some doing to convince me to supply an SRCU!) This is not an issue in our design. 5.The current SRCU has no reasonable way to implement read-side priority boosting, as there is no record of which task is read-holding which SRCU. Given the infrequency of the update path, I do not see this as a problem. Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: Virtio block module slower than IDE
Hi, On Tue, 27 Oct 2009 11:34:57 +0200, Avi Kivity a...@redhat.com wrote: On 10/27/2009 02:13 AM, Floris Bos wrote: I am running Proxmox 1.4 (which uses the 2.6.30.1 kvm modules) and am experiencing performance problems with Linux guests using the virtio_blk module. Especially with random IO it is a lot slower than IDE. Try switching the host I/O scheduler to deadline. You can do that at runtime in /sys/block/.../queue. Thanks for the tip. Seems to improve random IO performance by 11 times! == Version 1.03c --Sequential Output-- --Sequential Input- --Random- -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks-- MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP ubuntu910 16G 38776 97 61414 5 32451 7 34711 73 95325 9 7174 23 --Sequential Create-- Random Create -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete-- files /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP 16 + +++ + +++ + +++ + +++ + +++ + +++ ubuntu910,16G,38776,97,61414,5,32451,7,34711,73,95325,9,7174.2,23,16,+,+++,+,+++,+,+++,+,+++,+,+++,+,+++ == Yours sincerely, Floris Bos -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Tue, Oct 27, 2009 at 10:50:45AM -0400, Gregory Haskins wrote: Gleb Natapov wrote: On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote: Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. Here is a revised problem statement thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); entry_cache = get_entries(irq_rt); rcu_read_unlock(); invalidate_entries(irq_rt); for_each_entry(entry_cache) entry-set(); /* bad */ - invalidate_entries() may be any operation that deletes an entry at run-time (doesn't exist today), or as the guest is shutting down. As far as I can tell, the current code does not protect us from either condition, and my proposed patch protects us from both. Did I miss anything? Yes. What happened to irq_rt is completely irrelevant at the point you marked /* bad */. kfree() happened to irq_rt, and thus to the objects behind the pointers in entry_cache at the point I marked /* bad */. The entire entry is cached not a pointer to an entry! kfree(). That certainly isn't /* good */ ;) It looks like we are looking at different code :) -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 0/3] ioctl fixes
See individual patches for details. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 1/3] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP. CC: sta...@kernel.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -2285,6 +2285,9 @@ long kvm_arch_vm_ioctl(struct file *filp goto out; break; case KVM_CREATE_IRQCHIP: + r = -EEXIST; + if (kvm-arch.vpic) + goto out; r = -ENOMEM; kvm-arch.vpic = kvm_create_pic(kvm); if (kvm-arch.vpic) { @@ -2300,6 +2303,8 @@ long kvm_arch_vm_ioctl(struct file *filp if (r) { kfree(kvm-arch.vpic); kfree(kvm-arch.vioapic); + kvm-arch.vpic = NULL; + kvm-arch.vioapic = NULL; goto out; } break; -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 2/3] KVM: x86: disallow KVM_{SET,GET}_LAPIC without in kernel irqchip
Otherwise kvm might attempt to dereference a NULL pointer. CC: sta...@kernel.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -1815,6 +1815,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi switch (ioctl) { case KVM_GET_LAPIC: { + r = -EINVAL; + if (!irqchip_in_kernel(vcpu-kvm)) + goto out; lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); r = -ENOMEM; @@ -1830,6 +1833,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi break; } case KVM_SET_LAPIC: { + r = -EINVAL; + if (!irqchip_in_kernel(vcpu-kvm)) + goto out; lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); r = -ENOMEM; if (!lapic) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch 3/3] KVM: only clear irq_source_id if irqchip is present
Otherwise kvm might attempt to dereference a NULL pointer. CC: sta...@kernel.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/virt/kvm/irq_comm.c === --- kvm.orig/virt/kvm/irq_comm.c +++ kvm/virt/kvm/irq_comm.c @@ -243,6 +243,10 @@ void kvm_free_irq_source_id(struct kvm * printk(KERN_ERR kvm: IRQ source ID out of range!\n); goto unlock; } + clear_bit(irq_source_id, kvm-arch.irq_sources_bitmap); + if (!irqchip_in_kernel(kvm)) + goto unlock; + for (i = 0; i KVM_IOAPIC_NUM_PINS; i++) { clear_bit(irq_source_id, kvm-arch.vioapic-irq_states[i]); if (i = 16) @@ -251,7 +255,6 @@ void kvm_free_irq_source_id(struct kvm * clear_bit(irq_source_id, pic_irqchip(kvm)-irq_states[i]); #endif } - clear_bit(irq_source_id, kvm-arch.irq_sources_bitmap); unlock: mutex_unlock(kvm-irq_lock); } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] megaraid_sas HBA emulation
Hi all, this patchset implements an emulation for the megaraid_sas HBA. It provides emulates an LSI MegaRAID SAS 8708EM2 HBA, ie presenting to the guest a virtual SCSI adapter. Internally it is using aio for read/write requests and either SG_IO or SCSI command emulation for everything else. The reason for choosing the megaraid_sas HBA and not, say, implementing a virtio scsi interface is because: - the megaraid_sas is using a very simple firmware interface, comparable to virtio - the HBA driver are already existent, so I only have to write the backend :-) The device can be accessed by -drive if=raid,file=XXX In order to support SCSI command emulation I had to update / patch up the existing SCSI disk support. This might be not to everyones taste, so I'm open to alternative suggestions. But I certainly do _not_ want to update the SCSI disk emulation, as this is really quite tied to the SCSI parallel interface used by the old lsi53c895a.c. Plus it doesn't do scatter-gather list handling, which is quite impossible to fix without proper documentation. Of course, if anyone else would step in here, I won't object :-) It currently runs guests with 2.6.27 and up; Windows XP support is not quite there yet. Anything else might work; if not, enable debugging and sent me the logfile. As usual, comment / suggestions etc welcome. Cheers, Hannes -- Dr. Hannes Reinecke zSeries Storage h...@suse.de +49 911 74053 688 SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: Markus Rex, HRB 16746 (AG Nürnberg) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] Add 'raid' interface class
This patch adds a 'raid' interface class. It is basically a clone of the existing 'scsi' interface, only allowing up to 128 disks. Signed-off-by: Hannes Reinecke h...@suse.de --- hw/pc.c |5 + hw/pci-hotplug.c |1 + hw/scsi-disk.c | 17 + hw/scsi-disk.h | 20 +++- qemu-config.c|2 +- sysemu.h |3 ++- vl.c |8 ++-- 7 files changed, 51 insertions(+), 5 deletions(-) diff --git a/hw/pc.c b/hw/pc.c index 83012a9..26aad4c 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -1345,6 +1345,11 @@ static void pc_init1(ram_addr_t ram_size, for (bus = 0; bus = max_bus; bus++) { pci_create_simple(pci_bus, -1, lsi53c895a); } + + max_bus = drive_get_max_bus(IF_RAID); + for (bus = 0; bus = max_bus; bus++) { + pci_create_simple(pci_bus, -1, megasas); + } } if (extboot_drive) { diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c index 410fa3f..855a1ad 100644 --- a/hw/pci-hotplug.c +++ b/hw/pci-hotplug.c @@ -85,6 +85,7 @@ void drive_hot_add(Monitor *mon, const QDict *qdict) switch (type) { case IF_SCSI: +case IF_RAID: if (pci_read_devaddr(mon, pci_addr, dom, pci_bus, slot)) { goto err; } diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 2a9268a..68b4e83 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -41,6 +41,7 @@ do { fprintf(stderr, scsi-disk: fmt , ## __VA_ARGS__); } while (0) #define SCSI_DMA_BUF_SIZE131072 #define SCSI_MAX_INQUIRY_LEN 256 +#define SCSI_SENSE_LEN 18 #define SCSI_REQ_STATUS_RETRY 0x01 @@ -136,6 +137,22 @@ static SCSIRequest *scsi_find_request(SCSIDiskState *s, uint32_t tag) return r; } +/* Helper function to build a sense block */ +int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t sense) +{ +memset(sense_buf, 0, SCSI_SENSE_LEN); +if (!sense) + return 0; + +sense_buf[0] = 0xf0; /* current, fixed format */ +sense_buf[2] = (sense 16) 0x0F; +sense_buf[7] = 10; +sense_buf[12] = (sense 8 ) 0xFF; +sense_buf[13] = sense 0xFF; + +return SCSI_SENSE_LEN; +} + /* Helper function for command completion. */ static void scsi_command_complete(SCSIRequest *r, int status, int sense) { diff --git a/hw/scsi-disk.h b/hw/scsi-disk.h index b6b6c12..5b54272 100644 --- a/hw/scsi-disk.h +++ b/hw/scsi-disk.h @@ -9,6 +9,23 @@ enum scsi_reason { SCSI_REASON_DATA /* Transfer complete, more data required. */ }; +/* LUN not ready, Manual intervention required */ +#define SENSE_LUN_NOT_READY 0x020403 +/* Hardware error, I/O process terminated */ +#define SENSE_IO_ERROR 0x040006 +/* Hardware error, I_T Nexus loss occured */ +#define SENSE_TAG_NOT_FOUND 0x042907 +/* Hardware error, internal target failure */ +#define SENSE_TARGET_FAILURE 0x044400 +/* Illegal request, invalid command operation code */ +#define SENSE_INVALID_OPCODE 0x052000 +/* Illegal request, LBA out of range */ +#define SENSE_LBA_OUT_OF_RANGE 0x052100 +/* Illegal request, Invalid field in CDB */ +#define SENSE_INVALID_FIELD 0x052400 +/* Illegal request, LUN not supported */ +#define SENSE_LUN_NOT_SUPPORTED 0x052500 + typedef struct SCSIBus SCSIBus; typedef struct SCSIDevice SCSIDevice; typedef struct SCSIDeviceInfo SCSIDeviceInfo; @@ -49,7 +66,7 @@ struct SCSIBus { int tcq, ndev; scsi_completionfn complete; -SCSIDevice *devs[8]; +SCSIDevice *devs[128]; }; void scsi_bus_new(SCSIBus *bus, DeviceState *host, int tcq, int ndev, @@ -63,5 +80,6 @@ static inline SCSIBus *scsi_bus_from_device(SCSIDevice *d) SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, DriveInfo *dinfo, int unit); void scsi_bus_legacy_handle_cmdline(SCSIBus *bus); +int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t sense); #endif diff --git a/qemu-config.c b/qemu-config.c index 4fb7898..8d7a137 100644 --- a/qemu-config.c +++ b/qemu-config.c @@ -18,7 +18,7 @@ QemuOptsList qemu_drive_opts = { },{ .name = if, .type = QEMU_OPT_STRING, -.help = interface (ide, scsi, sd, mtd, floppy, pflash, virtio), +.help = interface (ide, scsi, raid, sd, mtd, floppy, pflash, virtio), },{ .name = index, .type = QEMU_OPT_NUMBER, diff --git a/sysemu.h b/sysemu.h index 2ef3797..8ed0b8c 100644 --- a/sysemu.h +++ b/sysemu.h @@ -159,7 +159,7 @@ extern unsigned int nb_prom_envs; typedef enum { IF_NONE, -IF_IDE, IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN, +IF_IDE, IF_SCSI, IF_RAID, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN, IF_COUNT } BlockInterfaceType; @@ -185,6 +185,7 @@ typedef struct DriveInfo { #define MAX_IDE_DEVS 2 #define MAX_SCSI_DEVS 7 +#define MAX_RAID_DEVS 128 #define MAX_DRIVES 32 extern QTAILQ_HEAD(drivelist, DriveInfo) drives; diff --git a/vl.c b/vl.c index 5dc7b2b..404afc3 100644 --- a/vl.c +++ b/vl.c @@ -2065,6 +2065,9 @@
[PATCH 2/4] megasas: LSI MegaRAID SAS HBA emulation
This patch add an emulation for the LSI MegaRAID SAS HBA. It is using SG_IO to forward / pass through SCSI commands to the underlying block driver, so no emulation is done currently. Signed-off-by: Hannes Reinecke h...@suse.de --- Makefile.hw |2 +- hw/megasas.c | 1134 ++ hw/pci_ids.h |2 + 3 files changed, 1137 insertions(+), 1 deletions(-) create mode 100644 hw/megasas.c diff --git a/Makefile.hw b/Makefile.hw index de1db31..cae35f9 100644 --- a/Makefile.hw +++ b/Makefile.hw @@ -33,7 +33,7 @@ obj-y += wdt_i6300esb.o obj-y += ne2000.o # SCSI layer -obj-y += lsi53c895a.o +obj-y += lsi53c895a.o megasas.o obj-$(CONFIG_ESP) += esp.o obj-y += dma-helpers.o sysbus.o isa-bus.o diff --git a/hw/megasas.c b/hw/megasas.c new file mode 100644 index 000..a57e8e0 --- /dev/null +++ b/hw/megasas.c @@ -0,0 +1,1134 @@ +/* + * QEMU MegaRAID SAS 8708EM2 Host Bus Adapter emulation + * + * Copyright (c) 2009 Hannes Reinecke, SUSE Linux Products GmbH + * + * This code is licenced under the LGPL. + */ + + +#include assert.h + +#include hw.h +#include pci.h +#include scsi.h +#include scsi-disk.h +#include block_int.h +#ifdef __linux__ +# include scsi/sg.h +#endif + +#undef DEBUG_MEGASAS +#undef DEBUG_MEGASAS_REG +#undef DEBUG_MEGASAS_QUEUE +#undef DEBUG_MEGASAS_MFI + +#ifdef DEBUG_MEGASAS +#define DPRINTF(fmt, ...) \ +do { printf(megasas: fmt , ## __VA_ARGS__); } while (0) +#define BADF(fmt, ...) \ +do { fprintf(stderr, megasas: error: fmt , ## __VA_ARGS__); exit(1);} while (0) +#else +#define DPRINTF(fmt, ...) do {} while(0) +#define BADF(fmt, ...) \ +do { fprintf(stderr, megasas: error: fmt , ## __VA_ARGS__);} while (0) +#endif + +/* Static definitions */ +#define MEGASAS_MAX_FRAMES 64 +#define MEGASAS_MAX_SGE 8 + +/* SCSI definitions */ +#define SAM_STAT_GOOD0x00 +#define SAM_STAT_CHECK_CONDITION 0x02 +#define SAM_STAT_CONDITION_MET 0x04 +#define SAM_STAT_BUSY0x08 +#define SAM_STAT_TASK_ABORTED0x40 + +/* Register definitions */ +#defineMEGASAS_INBOUND_MSG_0 0x0010 +#defineMEGASAS_INBOUND_MSG_1 0x0014 +#defineMEGASAS_OUTBOUND_MSG_0 0x0018 +#defineMEGASAS_OUTBOUND_MSG_1 0x001C +#defineMEGASAS_INBOUND_DOORBELL0x0020 +#defineMEGASAS_INBOUND_INTR_STATUS 0x0024 +#defineMEGASAS_INBOUND_INTR_MASK 0x0028 +#defineMEGASAS_OUTBOUND_DOORBELL 0x002C +#defineMEGASAS_OUTBOUND_INTR_STATUS0x0030 +#defineMEGASAS_OUTBOUND_INTR_MASK 0x0034 +#defineMEGASAS_INBOUND_QUEUE_PORT 0x0040 +#defineMEGASAS_OUTBOUND_QUEUE_PORT 0x0044 +#defineMEGASAS_OUTBOUND_DOORBELL_CLEAR 0x00A0 +#defineMEGASAS_OUTBOUND_SCRATCH_PAD0x00B0 +#defineMEGASAS_INBOUND_LOW_QUEUE_PORT 0x00C0 +#defineMEGASAS_INBOUND_HIGH_QUEUE_PORT 0x00C4 + +/* FW commands */ +#define MFI_INIT_ABORT 0x0001 +#define MFI_INIT_READY 0x0002 +#define MFI_INIT_MFIMODE 0x0004 +#define MFI_INIT_CLEAR_HANDSHAKE 0x0008 +#define MFI_INIT_HOTPLUG 0x0010 +#define MFI_STOP_ADP 0x0020 + +/* MFI states */ +#define MFI_STATE_UNDEFINED0x0 +#define MFI_STATE_BB_INIT 0x1 +#define MFI_STATE_FW_INIT 0x4 +#define MFI_STATE_WAIT_HANDSHAKE 0x6 +#define MFI_STATE_FW_INIT_20x7 +#define MFI_STATE_DEVICE_SCAN 0x8 +#define MFI_STATE_BOOT_MESSAGE_PENDING 0x9 +#define MFI_STATE_FLUSH_CACHE 0xA +#define MFI_STATE_READY0xB +#define MFI_STATE_OPERATIONAL 0xC +#define MFI_STATE_FAULT0xF + +/* + * MFI command opcodes + */ +#define MFI_CMD_INIT 0x00 +#define MFI_CMD_LD_READ0x01 +#define MFI_CMD_LD_WRITE 0x02 +#define MFI_CMD_LD_SCSI_IO 0x03 +#define MFI_CMD_PD_SCSI_IO 0x04 +#define MFI_CMD_DCMD 0x05 +#define MFI_CMD_ABORT 0x06 +#define MFI_CMD_SMP0x07 +#define MFI_CMD_STP0x08 + +#define MR_DCMD_CTRL_GET_INFO 0x0101 + +#define MR_DCMD_CTRL_CACHE_FLUSH 0x01101000 +#define MR_FLUSH_CTRL_CACHE0x01 +#define MR_FLUSH_DISK_CACHE0x02 + +#define MR_DCMD_CTRL_SHUTDOWN 0x0105 +#define MR_DCMD_HIBERNATE_SHUTDOWN 0x0106 +#define MR_ENABLE_DRIVE_SPINDOWN 0x01 + +#define
[PATCH 3/4] scsi-disk: Factor out SCSI command emulation
Other drives might want to use SCSI command emulation without going through the SCSI disk abstraction, as this imposes too many limits on the emulation. Signed-off-by: Hannes Reinecke h...@suse.de --- block.c| 15 ++ block.h|3 + block_int.h|1 + hw/scsi-disk.c | 610 ++-- hw/scsi-disk.h |3 + 5 files changed, 346 insertions(+), 286 deletions(-) diff --git a/block.c b/block.c index 33f3d65..06f92c4 100644 --- a/block.c +++ b/block.c @@ -930,6 +930,21 @@ int bdrv_is_sg(BlockDriverState *bs) return bs-sg; } +void bdrv_set_sg(BlockDriverState *bs, int set) +{ +bs-sg = set; +} + +int bdrv_get_tcq(BlockDriverState *bs) +{ +return bs-tcq; +} + +void bdrv_set_tcq(BlockDriverState *bs, int set) +{ +bs-tcq = set; +} + int bdrv_enable_write_cache(BlockDriverState *bs) { return bs-enable_write_cache; diff --git a/block.h b/block.h index a966afb..7862fa0 100644 --- a/block.h +++ b/block.h @@ -134,9 +134,12 @@ void bdrv_get_geometry_hint(BlockDriverState *bs, int *pcyls, int *pheads, int *psecs); int bdrv_get_type_hint(BlockDriverState *bs); int bdrv_get_translation_hint(BlockDriverState *bs); +int bdrv_get_tcq(BlockDriverState *bs); +void bdrv_set_tcq(BlockDriverState *bs, int set); int bdrv_is_removable(BlockDriverState *bs); int bdrv_is_read_only(BlockDriverState *bs); int bdrv_is_sg(BlockDriverState *bs); +void bdrv_set_sg(BlockDriverState *bs, int set); int bdrv_enable_write_cache(BlockDriverState *bs); int bdrv_is_inserted(BlockDriverState *bs); int bdrv_media_changed(BlockDriverState *bs); diff --git a/block_int.h b/block_int.h index 8e72abe..e5ee57b 100644 --- a/block_int.h +++ b/block_int.h @@ -129,6 +129,7 @@ struct BlockDriverState { int encrypted; /* if true, the media is encrypted */ int valid_key; /* if true, a valid encryption key has been set */ int sg;/* if true, the device is a /dev/sg* */ +int tcq; /* if true, the device supports tagged command queueing */ /* event callback when inserting/removing */ void (*change_cb)(void *opaque); void *change_opaque; diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 68b4e83..3e68518 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -56,7 +56,7 @@ typedef struct SCSIRequest { /* Both sector and sector_count are in terms of qemu 512 byte blocks. */ uint64_t sector; uint32_t sector_count; -struct iovec iov; +struct iovec *iov; QEMUIOVector qiov; BlockDriverAIOCB *aiocb; struct SCSIRequest *next; @@ -72,7 +72,8 @@ struct SCSIDiskState This is the number of 512 byte blocks in a single scsi sector. */ int cluster_size; uint64_t max_lba; -int sense; +uint8_t sense[SCSI_SENSE_LEN]; +uint8_t sense_len; char drive_serial_str[21]; QEMUBH *bh; }; @@ -90,13 +91,12 @@ static SCSIRequest *scsi_new_request(SCSIDevice *d, uint32_t tag) free_requests = r-next; } else { r = qemu_malloc(sizeof(SCSIRequest)); -r-iov.iov_base = qemu_memalign(512, SCSI_DMA_BUF_SIZE); + r-iov = NULL; } r-bus = scsi_bus_from_device(d); r-dev = s; r-tag = tag; r-sector_count = 0; -r-iov.iov_len = 0; r-aiocb = NULL; r-status = 0; @@ -126,6 +126,17 @@ static void scsi_remove_request(SCSIRequest *r) free_requests = r; } +static void *scsi_allocate_iovec(SCSIRequest *r) { +if (!r-iov) { + r-iov = qemu_malloc(sizeof(struct iovec)); + if (!r-iov) + return NULL; + r-iov-iov_base = qemu_memalign(512, SCSI_DMA_BUF_SIZE); + r-iov-iov_len = SCSI_DMA_BUF_SIZE; +} +return r-iov; +} + static SCSIRequest *scsi_find_request(SCSIDiskState *s, uint32_t tag) { SCSIRequest *r; @@ -137,12 +148,11 @@ static SCSIRequest *scsi_find_request(SCSIDiskState *s, uint32_t tag) return r; } -/* Helper function to build a sense block */ int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t sense) { memset(sense_buf, 0, SCSI_SENSE_LEN); if (!sense) - return 0; + return 0; sense_buf[0] = 0xf0; /* current, fixed format */ sense_buf[2] = (sense 16) 0x0F; @@ -154,15 +164,19 @@ int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t sense) } /* Helper function for command completion. */ -static void scsi_command_complete(SCSIRequest *r, int status, int sense) +static void scsi_command_complete(SCSIRequest *r, int status, uint32_t sense) { SCSIDiskState *s = r-dev; uint32_t tag; -DPRINTF(Command complete tag=0x%x status=%d sense=%d\n, r-tag, status, sense); -s-sense = sense; + +DPRINTF(Command complete tag=0x%x status=%d sense=%d\n, r-tag, + status, s-sense_len); +if (status == STATUS_CHECK_CONDITION) { + s-sense_len = scsi_build_sense(s-sense, sense); +} tag = r-tag; scsi_remove_request(r); -r-bus-complete(r-bus,
[PATCH 4/4] megasas: Add SCSI command emulation
Now that we can use SCSI command emulation without using the SCSI disk abstraction we can easily add it to the megasas HBA. Signed-off-by: Hannes Reinecke h...@suse.de --- hw/megasas.c | 88 +++--- 1 files changed, 53 insertions(+), 35 deletions(-) diff --git a/hw/megasas.c b/hw/megasas.c index a57e8e0..f32b313 100644 --- a/hw/megasas.c +++ b/hw/megasas.c @@ -661,40 +661,55 @@ static int megasas_handle_scsi(MPTState *s, uint8_t fcmd, } } -memset(cmd-hdr, 0, sizeof(struct sg_io_hdr)); -cmd-hdr.interface_id = 'S'; -cmd-hdr.cmd_len = cdb_len; -cmd-hdr.cmdp = cdb; -cmd-hdr.iovec_count = cmd-sge_count; -cmd-hdr.dxferp = cmd-iov; -for (n = 0; n cmd-sge_count; n++) - cmd-hdr.dxfer_len += cmd-iov[n].iov_len; -if (cmd-sge_count) { - if (dir) - cmd-hdr.dxfer_direction = SG_DXFER_TO_DEV; - else - cmd-hdr.dxfer_direction = SG_DXFER_FROM_DEV; -} else { - cmd-hdr.dxfer_direction = SG_DXFER_NONE; -} -cmd-hdr.sbp = cmd-sense; -cmd-hdr.mx_sb_len = cmd-sense_len; +if (bdrv_is_sg(cmd-lun-bdrv)) { + memset(cmd-hdr, 0, sizeof(struct sg_io_hdr)); + cmd-hdr.interface_id = 'S'; + cmd-hdr.cmd_len = cdb_len; + cmd-hdr.cmdp = cdb; + cmd-hdr.iovec_count = cmd-sge_count; + cmd-hdr.dxferp = cmd-iov; + for (n = 0; n cmd-sge_count; n++) + cmd-hdr.dxfer_len += cmd-iov[n].iov_len; + if (cmd-sge_count) { + if (dir) + cmd-hdr.dxfer_direction = SG_DXFER_TO_DEV; + else + cmd-hdr.dxfer_direction = SG_DXFER_FROM_DEV; + } else { + cmd-hdr.dxfer_direction = SG_DXFER_NONE; + } + cmd-hdr.sbp = cmd-sense; + cmd-hdr.mx_sb_len = cmd-sense_len; -ret = bdrv_ioctl(cmd-lun-bdrv, SG_IO, cmd-hdr); -if (ret) { - DPRINTF(SCSI pthru dev %x lun %x failed with %d\n, - target, lun, errno); - sense_len = scsi_build_sense(cmd-sense, SENSE_IO_ERROR); - cmd-sge_size = 0; - scsi_status = SAM_STAT_CHECK_CONDITION; -} else if (cmd-hdr.status) { - sense_len = cmd-hdr.sb_len_wr; - scsi_status = cmd-hdr.status; - cmd-sge_size = cmd-hdr.dxfer_len; - scsi_status = SAM_STAT_CHECK_CONDITION; + ret = bdrv_ioctl(cmd-lun-bdrv, SG_IO, cmd-hdr); + if (ret) { + DPRINTF(SCSI pthru dev %x lun %x failed with %d\n, + target, lun, errno); + sense_len = scsi_build_sense(cmd-sense, SENSE_IO_ERROR); + cmd-sge_size = 0; + scsi_status = SAM_STAT_CHECK_CONDITION; + } else if (cmd-hdr.status) { + sense_len = cmd-hdr.sb_len_wr; + scsi_status = cmd-hdr.status; + cmd-sge_size = cmd-hdr.dxfer_len; + scsi_status = SAM_STAT_CHECK_CONDITION; + } else { + sense_len = 0; + cmd-sge_size = cmd-hdr.dxfer_len; + } } else { - sense_len = 0; - cmd-sge_size = cmd-hdr.dxfer_len; + uint32_t sense; + + DPRINTF(Emulate SCSI pthru cmd %x\n, cdb[0]); + sense = scsi_emulate_command(cmd-lun-bdrv, 0, cdb, +cmd-iov[0].iov_len, +cmd-iov[0].iov_base, +cmd-sge_size); + sense_len = scsi_build_sense(cmd-sense, sense); + if (sense_len) + scsi_status = SAM_STAT_CHECK_CONDITION; + else + scsi_status = SAM_STAT_GOOD; } out: megasas_unmap_sense(cmd, sense_len); @@ -1105,13 +1120,16 @@ static int megasas_scsi_init(PCIDevice *dev) lun-bdrv = NULL; continue; } + bdrv_set_tcq(lun-bdrv, 1); /* check if we can use SG_IO */ ret = bdrv_ioctl(lun-bdrv, SG_IO, hdr); if (ret) { - DPRINTF(SCSI cmd passthrough not available on dev %d (error %d)\n, + DPRINTF(Using SCSI cmd emulation on dev %d (error %d)\n, unit, ret); - lun-sdev = NULL; - lun-bdrv = NULL; + bdrv_set_sg(lun-bdrv, 0); + } else { + DPRINTF(Using SCSI cmd passthrough on dev %d\n, unit); + bdrv_set_sg(lun-bdrv, 1); } } register_savevm(megasas, -1, 0, megasas_scsi_save, megasas_scsi_load, s); -- 1.6.0.2 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: vhost-net patches
On Tue, Oct 27, 2009 at 07:46:59AM -0700, Shirley Ma wrote: Hello Michael, On Tue, 2009-10-27 at 08:43 +0200, Michael S. Tsirkin wrote: At some point my guest had a runaway nash-hotplug process consuming 100% CPU. Could you please verify this does not happen to you? What I have found that the start_xmit stopped and restarted too often. There is no vring descriptor available for adding the new buf. The buf release is not able to keep up after vhost patch? Possibly GFP_ATOMIC allocations in vring_add_indirect are failing? Is there a chance you are tight on guest memory for some reason? with vhost, virtio does currently consume a bit more memory than with userspace backend. I saw lots of Unexpected full queue from dmesg. I haven't had time to debug it yet. You might have some idea here? /* This can happen with OOM and indirect buffers. */ if (unlikely(capacity 0)) { netif_stop_queue(dev); dev_warn(dev-dev, Unexpected full queue\n); if (unlikely(!vi-svq-vq_ops-enable_cb(vi-svq))) { vi-svq-vq_ops-disable_cb(vi-svq); netif_start_queue(dev); goto again; } return NETDEV_TX_BUSY; } Thanks Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Tue, Oct 27, 2009 at 10:47:49AM -0400, Gregory Haskins wrote: Gleb Natapov wrote: On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. A little is underestimation :) There is not /* bad */ line in the code! Sorry, that was my own highlighting, not trying to reflect actual code. Yes, irq_rt is not accessed outside the RSCS. However, the entry pointers stored in the irq_rt-map are, and this is equally problematic afaict. The pointer is in text and can't disappear without kvm_set_irq() disappearing too. No, the entry* pointer is .text _AND_ .data, and is subject to standard synchronization rules like most other objects. Unless I am misreading the code, the entry* pointers point to heap within the irq_rt pointer. Therefore, the kfree(irq_rt) I mention above effectively invalidates the entire set of entry* pointers that you are caching, and is thus an issue. I think you are missing that the content of the entry is copied, not pointer to the entry: irq_set[i++] = *e; In this particular case we seem to never delete entries at run-time once they are established. Therefore, while perhaps sloppy, its technically safe to leave them unprotected from this perspective. Note: I was wrong in this statement. I forgot that it's not safe at run-time either since the entry objects are part of irq_rt. The issue is more related to shutdown since a kvm_set_irq() caller could be within the aforementioned race-region and call entry-set() after the guest is gone. Or did I miss something? The caller of kvm_set_irq() should hold reference to kvm instance, so it can't disappear while you are inside kvm_set_irq(). RCU protects only kvm-irq_routing not kvm structure itself. Agreed, but this has nothing to do with protecting the entry* pointers. There are not used outside critical section. Data is copied from irq_rt onto the stack and this copy is accessed outside critical section. As mentioned above, I do not believe this really protect us. And even I don't see the prove it doesn't, so I assume it does. What would you like to see beyond what I've already provided you? I can show how the entry pointers are allocated as part of the irq_rt, and I can show how the irq_rt (via entry-set) access is racy against invalidation. if it did, the copy is just a work-around to avoid sleeping within the It is not a work-around. There was two solutions to the problem one is to call -set() outside rcu critical section This is broken afaict without taking additional precautions, such as a reference count on the irq_rt structure, but I mentioned this alternate solution in my header. another is to use SRCU. I decided to use the first one. This way the code is much simpler simpler is debatable, but ok. SRCU is an established pattern available in the upstream kernel, so I don't think its particularly complicated or controversial to use. and I remember asking Paul what are the disadvantages of using SRCU and there was something. The disadvantages to my knowledge are as follows: 1) rcu_read_lock is something like 4x faster than srcu_read_lock(), but we are talking about nanoseconds on modern hardware (I think Paul quoted me 10ns vs 45ns on his rig). I don't think either overhead is something to be concerned about in this case. If we can avoid why not? Nanoseconds tend to add up. 2) standard rcu supports deferred synchronization (call_rcu()), as well as barriers (synchronize_rcu()), whereas SRCU only supports barriers (synchronize_srcu()). We only use the barrier type in this code path, so that is not an issue. Agree. 3) SRCU requires explicit initialization/cleanup, whereas regular RCU does not. Trivially solved in my patch since KVM has plenty of init/cleanup hook points. No problem
[PATCH v2] fix qemu-kvm sigsegv at exit
Michael reported a qemu-kvm SIGSEGV at shutdown: Program received signal SIGSEGV, Segmentation fault. [Switching to Thread 0x411d0940 (LWP 14446)] 0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0, expire_time=62275467335) at /home/mst/scm/qemu-kvm/vl.c:1009 1009if ((alarm_timer-flags ALARM_FLAG_EXPIRED) == 0) { (gdb) l 1004ts-next = *pt; 1005*pt = ts; 1006 1007/* Rearm if necessary */ 1008if (pt == active_timers[ts-clock-type]) { 1009if ((alarm_timer-flags ALARM_FLAG_EXPIRED) == 0) { 1010qemu_rearm_alarm_timer(alarm_timer); 1011} 1012/* Interrupt execution to force deadline recalculation. */ 1013if (use_icount) (gdb) p alarm_timer $1 = (struct qemu_alarm_timer *) 0x0 Problem is kvm_main_loop_wait(env, 0) can process the stop request (signalling iothread that vcpu is stopped, so its OK to exit) and continue to kvm_cpu_exec. Reorder kvm_main_loop_wait and kvm_cpu_exec, as suggested by Gleb. Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Reported-by: Michael S. Tsirkin m...@redhat.com diff --git a/qemu-kvm.c b/qemu-kvm.c index 4c13628..809fd65 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -1867,8 +1867,8 @@ static int kvm_main_loop_cpu(CPUState *env) run_cpu = !env-halted; } if (run_cpu) { -kvm_main_loop_wait(env, 0); kvm_cpu_exec(env); +kvm_main_loop_wait(env, 0); } else { kvm_main_loop_wait(env, 1000); } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Gleb Natapov wrote: On Tue, Oct 27, 2009 at 10:50:45AM -0400, Gregory Haskins wrote: Gleb Natapov wrote: On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote: Gregory Haskins wrote: Gleb Natapov wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ This is not what happens. irq_rt is never accessed outside read-side critical section. Sorry, I was generalizing to keep the comments short. I figured it would be clear what I was actually saying, but realize in retrospect that I was a little ambiguous. Here is a revised problem statement thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); entry_cache = get_entries(irq_rt); rcu_read_unlock(); invalidate_entries(irq_rt); for_each_entry(entry_cache) entry-set(); /* bad */ - invalidate_entries() may be any operation that deletes an entry at run-time (doesn't exist today), or as the guest is shutting down. As far as I can tell, the current code does not protect us from either condition, and my proposed patch protects us from both. Did I miss anything? Yes. What happened to irq_rt is completely irrelevant at the point you marked /* bad */. kfree() happened to irq_rt, and thus to the objects behind the pointers in entry_cache at the point I marked /* bad */. The entire entry is cached not a pointer to an entry! kfree(). light bulb goes off Ah, I see. I missed that detail that it was a structure copy, not a pointer copy. My bad. You are right, and I am wrong. I retract the 1/3 patch. Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
On 10/27/2009 03:42 PM, Alexander Graf wrote: I can, but it's only partly related. My patches don't even touch timing.c. The only thing I can imagine resulting in a breakage is that my patches allow for an =M setting. So IMHO this patch should be applied before my series. Should I stick it as first patch in my git tree? No need, I'll apply it independently. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/5]: Fix kdump under KVM
This patch series aims to get kdump working inside a KVM guest. The current problem with using kdump is that KVM always delivers PIT interrupts to the BSP, and the BSP only. While this is technically allowed by the MPS spec, most motherboards actually deliver timer interrupts to *any* LAPIC in virtual wire mode. Since a crash can occur on any CPU, timer interrupts must be able to reach any CPU in order for kdump to work properly. Therefore, this patch series kicks all of the relevant vCPUs when delivering a timer interrupt. With these patches in place, kdump in a RHEL-5 guest works properly. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/5] Fix up some comments around the source tree.
Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 34b700f... ba61f27... M arch/x86/kvm/svm.c :100644 100644 38a2d20... cd6f92b... M virt/kvm/ioapic.c :100644 100644 bd44fb4... c22bc17... M virt/kvm/kvm_main.c arch/x86/kvm/svm.c |2 +- virt/kvm/ioapic.c |2 +- virt/kvm/kvm_main.c |2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 34b700f..ba61f27 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -2608,7 +2608,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm) break; case SVM_EXITINTINFO_TYPE_EXEPT: /* In case of software exception do not reinject an exception - vector, but re-execute and instruction instead */ + vector, but re-execute an instruction instead */ if (is_nested(svm)) break; if (kvm_exception_is_soft(vector)) diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index 38a2d20..cd6f92b 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -164,7 +164,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) irqe.shorthand = 0; #ifdef CONFIG_X86 - /* Always delivery PIT interrupt to vcpu 0 */ + /* Always deliver PIT interrupt to vcpu 0 */ if (irq == 0) { irqe.dest_mode = 0; /* Physical mode. */ /* need to read apic_id from apic regiest since diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index bd44fb4..c22bc17 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1180,7 +1180,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu) } /* - * Creates some virtual cpus. Good luck creating more than one. + * Creates a virtual cpu. */ static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id) { -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] Remove references to VCPU in i8254
Conceptually, the i8254 is hooked to a PIC or IOAPIC. Therefore, this patch removes most references to vcpu in i8254.c. There are two exceptions to this: 1) In pit_timer_fn, we still have to kick the BSP to wake it out of idle. This will be changed in a later patch. 2) In __kvm_migrate_pit_timer, we have to migrate the PIT around with the BSP, since hrtimers work on a per-CPU basis. I've added a comment here to clarify why this is needed. Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 fab7440... d5c08fa... M arch/x86/kvm/i8254.c :100644 100644 d4c1c7f... d7bc40b... M arch/x86/kvm/i8254.h :100644 100644 96dfbb6... ab3a56e... M arch/x86/kvm/irq.c :100644 100644 c025a23... e16b968... M arch/x86/kvm/irq.h :100644 100644 55c7524... ba39e25... M arch/x86/kvm/kvm_timer.h arch/x86/kvm/i8254.c | 50 ++--- arch/x86/kvm/i8254.h |4 ++- arch/x86/kvm/irq.c |4 +- arch/x86/kvm/irq.h |2 - arch/x86/kvm/kvm_timer.h |3 ++ 5 files changed, 50 insertions(+), 13 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index fab7440..d5c08fa 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -33,6 +33,7 @@ #include irq.h #include i8254.h +#include kvm_timer.h #ifndef CONFIG_X86_64 #define mod_64(x, y) ((x) - (y) * div64_u64(x, y)) @@ -227,12 +228,13 @@ static void pit_latch_status(struct kvm *kvm, int channel) } } -int pit_has_pending_timer(struct kvm_vcpu *vcpu) +int pit_has_pending_timer(struct kvm *kvm) { - struct kvm_pit *pit = vcpu-kvm-arch.vpit; + struct kvm_pit *pit = kvm-arch.vpit; - if (pit kvm_vcpu_is_bsp(vcpu) pit-pit_state.irq_ack) + if (pit pit-pit_state.irq_ack) return atomic_read(pit-pit_state.pit_timer.pending); + return 0; } @@ -252,6 +254,13 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu) struct kvm_pit *pit = vcpu-kvm-arch.vpit; struct hrtimer *timer; + /* +* technically, the PIT isn't hooked to a particular VCPU; +* the logical structure is PIT - [IOA]PIC - CPU[s]. However, +* hrtimers expire on a per-cpu basis, and since we initially +* created the hrtimer during BSP creation, we move it around +* with the BSP. +*/ if (!kvm_vcpu_is_bsp(vcpu) || !pit) return; @@ -277,6 +286,33 @@ static struct kvm_timer_ops kpit_ops = { .is_periodic = kpit_is_periodic, }; +static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) +{ + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + int restart_timer = 0; + + /* +* There is a race window between reading and incrementing, but we do +* not care about potentially losing timer events in the !reinject +* case anyway. +*/ + if (ktimer-reinject || !atomic_read(ktimer-pending)) + atomic_inc(ktimer-pending); + + if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq)) + wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + restart_timer = 1; + } + + if (restart_timer) + return HRTIMER_RESTART; + else + return HRTIMER_NORESTART; +} + static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) { struct kvm_timer *pt = ps-pit_timer; @@ -291,10 +327,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period) pt-period = interval; ps-is_periodic = is_period; - pt-timer.function = kvm_timer_fn; + pt-timer.function = pit_timer_fn; pt-t_ops = kpit_ops; pt-kvm = ps-pit-kvm; - pt-vcpu = pt-kvm-bsp_vcpu; atomic_set(pt-pending, 0); ps-irq_ack = 1; @@ -705,10 +740,9 @@ static void __inject_pit_timer_intr(struct kvm *kvm) kvm_apic_nmi_wd_deliver(vcpu); } -void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu) +void kvm_inject_pit_timer_irqs(struct kvm *kvm) { - struct kvm_pit *pit = vcpu-kvm-arch.vpit; - struct kvm *kvm = vcpu-kvm; + struct kvm_pit *pit = kvm-arch.vpit; struct kvm_kpit_state *ps; if (pit) { diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h index d4c1c7f..d7bc40b 100644 --- a/arch/x86/kvm/i8254.h +++ b/arch/x86/kvm/i8254.h @@ -49,10 +49,12 @@ struct kvm_pit { #define KVM_MAX_PIT_INTR_INTERVAL HZ / 100 #define KVM_PIT_CHANNEL_MASK 0x3 -void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu); +int pit_has_pending_timer(struct kvm *kvm); +void kvm_inject_pit_timer_irqs(struct kvm *kvm); void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int hpet_legacy_start); struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags); void kvm_free_pit(struct kvm *kvm);
[PATCH 5/5] Fix kdump under KVM.
This patch is the main point of the series. In order for kdump to properly work inside a KVM guest, we need to make sure that all VCPUs in virtual wire APIC mode get kicked to try and pick up the timer interrupts. To do this, we iterate over the CPUs and deliver interrupts to the proper VCPUs. I don't love the concept of doing kvm_irq_kick_vcpus() from within pit_timer_fn(). A PIT is not connected to a CPU at all, only to a PIC or APIC. However, if a CPU enters idle, this is the only way to wake it up to check for the interrupt. Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 d5c08fa... fe08823... M arch/x86/kvm/i8254.c :100644 100644 40f... 5b699c1... M arch/x86/kvm/lapic.c :100644 100644 40010b0... 9c4e52b... M arch/x86/kvm/lapic.h :100644 100644 053e49f... 975b0d6... M include/linux/kvm_host.h :100644 100644 cd6f92b... 717d265... M virt/kvm/ioapic.c :100644 100644 0d454d3... d24ac91... M virt/kvm/irq_comm.c arch/x86/kvm/i8254.c |3 +-- arch/x86/kvm/lapic.c | 12 arch/x86/kvm/lapic.h |1 + include/linux/kvm_host.h |2 ++ virt/kvm/ioapic.c|9 - virt/kvm/irq_comm.c | 12 6 files changed, 28 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c index d5c08fa..fe08823 100644 --- a/arch/x86/kvm/i8254.c +++ b/arch/x86/kvm/i8254.c @@ -299,8 +299,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer *data) if (ktimer-reinject || !atomic_read(ktimer-pending)) atomic_inc(ktimer-pending); - if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq)) - wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq); + kvm_irq_kick_vcpus(ktimer-kvm); if (ktimer-t_ops-is_periodic(ktimer)) { hrtimer_add_expires_ns(ktimer-timer, ktimer-period); diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index 40f..5b699c1 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1031,6 +1031,18 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu) kvm_apic_local_deliver(apic, APIC_LVT0); } +int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu) +{ + if (kvm_lapic_enabled(vcpu)) { + u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0); + if ((lvt0 APIC_LVT_MASKED) == 0 + GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT) + return 1; + } + + return 0; +} + static struct kvm_timer_ops lapic_timer_ops = { .is_periodic = lapic_is_periodic, }; diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 40010b0..9c4e52b 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -30,6 +30,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8); void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value); u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu); void kvm_apic_set_version(struct kvm_vcpu *vcpu); +int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu); int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest); int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 053e49f..975b0d6 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -542,6 +542,8 @@ int kvm_set_irq_routing(struct kvm *kvm, unsigned flags); void kvm_free_irq_routing(struct kvm *kvm); +void kvm_irq_kick_vcpus(struct kvm *kvm); + #else static inline void kvm_free_irq_routing(struct kvm *kvm) {} diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c index cd6f92b..717d265 100644 --- a/virt/kvm/ioapic.c +++ b/virt/kvm/ioapic.c @@ -163,15 +163,6 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq) irqe.level = 1; irqe.shorthand = 0; -#ifdef CONFIG_X86 - /* Always deliver PIT interrupt to vcpu 0 */ - if (irq == 0) { - irqe.dest_mode = 0; /* Physical mode. */ - /* need to read apic_id from apic regiest since -* it can be rewritten */ - irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id; - } -#endif return kvm_irq_delivery_to_apic(ioapic-kvm, NULL, irqe); } diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c index 0d454d3..d24ac91 100644 --- a/virt/kvm/irq_comm.c +++ b/virt/kvm/irq_comm.c @@ -293,6 +293,18 @@ void kvm_free_irq_routing(struct kvm *kvm) kfree(kvm-irq_routing); } +void kvm_irq_kick_vcpus(struct kvm *kvm) +{ + int i; + struct kvm_vcpu *vcpu; + + kvm_for_each_vcpu(i, vcpu, kvm) { + if (kvm_apic_in_virtual_wire_mode(vcpu)) + if (waitqueue_active(vcpu-wq)) + wake_up_interruptible(vcpu-wq); + } +} + static int setup_routing_entry(struct kvm_irq_routing_table *rt, struct kvm_kernel_irq_routing_entry *e,
[PATCH 4/5] Remove timer.c
The code in arch/x86/kvm/timer.c is not similar enough between the various implementations to really share it. Move the implementation into the LAPIC code, and then remove timer.c Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 31a7035... 8d9adf6... M arch/x86/kvm/Makefile :100644 100644 ba39e25... 55b2dab... M arch/x86/kvm/kvm_timer.h :100644 100644 cd60c0b... 40f... M arch/x86/kvm/lapic.c :100644 00 72b5144... 000... D arch/x86/kvm/timer.c arch/x86/kvm/Makefile|3 +- arch/x86/kvm/kvm_timer.h |3 -- arch/x86/kvm/lapic.c | 34 - arch/x86/kvm/timer.c | 47 -- 4 files changed, 34 insertions(+), 53 deletions(-) diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile index 31a7035..8d9adf6 100644 --- a/arch/x86/kvm/Makefile +++ b/arch/x86/kvm/Makefile @@ -10,8 +10,7 @@ kvm-y += $(addprefix ../../../virt/kvm/, kvm_main.o ioapic.o \ assigned-dev.o) kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o) -kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \ - i8254.o timer.o +kvm-y += x86.o mmu.o emulate.o i8259.o irq.o lapic.o i8254.o kvm-intel-y+= vmx.o kvm-amd-y += svm.o diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h index ba39e25..55b2dab 100644 --- a/arch/x86/kvm/kvm_timer.h +++ b/arch/x86/kvm/kvm_timer.h @@ -15,7 +15,4 @@ struct kvm_timer_ops { bool (*is_periodic)(struct kvm_timer *); }; - -enum hrtimer_restart kvm_timer_fn(struct hrtimer *data); - #endif diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index cd60c0b..40f 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -1040,6 +1040,38 @@ static const struct kvm_io_device_ops apic_mmio_ops = { .write= apic_mmio_write, }; +static enum hrtimer_restart lapic_timer_fn(struct hrtimer *data) +{ + struct kvm_vcpu *vcpu; + struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); + int restart_timer = 0; + + vcpu = ktimer-vcpu; + if (!vcpu) + return HRTIMER_NORESTART; + + /* +* There is a race window between reading and incrementing, but we do +* not care about potentially losing timer events in the !reinject +* case anyway. +*/ + if (ktimer-reinject || !atomic_read(ktimer-pending)) + atomic_inc(ktimer-pending); + + if (waitqueue_active(vcpu-wq)) + wake_up_interruptible(vcpu-wq); + + if (ktimer-t_ops-is_periodic(ktimer)) { + hrtimer_add_expires_ns(ktimer-timer, ktimer-period); + restart_timer = 1; + } + + if (restart_timer) + return HRTIMER_RESTART; + else + return HRTIMER_NORESTART; +} + int kvm_create_lapic(struct kvm_vcpu *vcpu) { struct kvm_lapic *apic; @@ -1065,7 +1097,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu) hrtimer_init(apic-lapic_timer.timer, CLOCK_MONOTONIC, HRTIMER_MODE_ABS); - apic-lapic_timer.timer.function = kvm_timer_fn; + apic-lapic_timer.timer.function = lapic_timer_fn; apic-lapic_timer.t_ops = lapic_timer_ops; apic-lapic_timer.kvm = vcpu-kvm; apic-lapic_timer.vcpu = vcpu; diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c deleted file mode 100644 index 72b5144..000 --- a/arch/x86/kvm/timer.c +++ /dev/null @@ -1,47 +0,0 @@ -#include linux/kvm_host.h -#include linux/kvm.h -#include linux/hrtimer.h -#include asm/atomic.h -#include kvm_timer.h - -static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer) -{ - int restart_timer = 0; - wait_queue_head_t *q = vcpu-wq; - - /* -* There is a race window between reading and incrementing, but we do -* not care about potentially loosing timer events in the !reinject -* case anyway. -*/ - if (ktimer-reinject || !atomic_read(ktimer-pending)) - atomic_inc(ktimer-pending); - - if (waitqueue_active(q)) - wake_up_interruptible(q); - - if (ktimer-t_ops-is_periodic(ktimer)) { - hrtimer_add_expires_ns(ktimer-timer, ktimer-period); - restart_timer = 1; - } - - return restart_timer; -} - -enum hrtimer_restart kvm_timer_fn(struct hrtimer *data) -{ - int restart_timer; - struct kvm_vcpu *vcpu; - struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer); - - vcpu = ktimer-vcpu; - if (!vcpu) - return HRTIMER_NORESTART; - - restart_timer = __kvm_timer_fn(vcpu, ktimer); - if (restart_timer) - return HRTIMER_RESTART; - else - return HRTIMER_NORESTART; -} - -- 1.6.0.6 -- To unsubscribe from this
[PATCH 2/5] Remove KVM_REQ_PENDING_TIMER.
KVM_REQ_PENDING_TIMER is set and cleared in a couple of places, but it never seems to be actually checked. Remove it. Signed-off-by: Chris Lalancette clala...@redhat.com --- :100644 100644 eea4043... 72b5144... M arch/x86/kvm/timer.c :100644 100644 2ef39062.. 93a65b4... M arch/x86/kvm/x86.c :100644 100644 bd5a616... 053e49f... M include/linux/kvm_host.h arch/x86/kvm/timer.c |5 + arch/x86/kvm/x86.c |1 - include/linux/kvm_host.h |1 - 3 files changed, 1 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c index eea4043..72b5144 100644 --- a/arch/x86/kvm/timer.c +++ b/arch/x86/kvm/timer.c @@ -14,11 +14,8 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer) * not care about potentially loosing timer events in the !reinject * case anyway. */ - if (ktimer-reinject || !atomic_read(ktimer-pending)) { + if (ktimer-reinject || !atomic_read(ktimer-pending)) atomic_inc(ktimer-pending); - /* FIXME: this code should not know anything about vcpus */ - set_bit(KVM_REQ_PENDING_TIMER, vcpu-requests); - } if (waitqueue_active(q)) wake_up_interruptible(q); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2ef3906..93a65b4 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3906,7 +3906,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) if (r = 0) break; - clear_bit(KVM_REQ_PENDING_TIMER, vcpu-requests); if (kvm_cpu_has_pending_timer(vcpu)) kvm_inject_pending_timer_irqs(vcpu); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bd5a616..053e49f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -33,7 +33,6 @@ #define KVM_REQ_REPORT_TPR_ACCESS 2 #define KVM_REQ_MMU_RELOAD 3 #define KVM_REQ_TRIPLE_FAULT 4 -#define KVM_REQ_PENDING_TIMER 5 #define KVM_REQ_UNHALT 6 #define KVM_REQ_MMU_SYNC 7 #define KVM_REQ_KVMCLOCK_UPDATE8 -- 1.6.0.6 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation
Hi, The device can be accessed by -drive if=raid,file=XXX Don't extend that qemu automagic please. The new way to handle this is: -drive if=none,id=mydisk,file=/path/to/some/disk.img -device megasas,id=raid -device scsi-disk,bus=raid.0,scsi-id=1,drive=mydisk In order to support SCSI command emulation I had to update / patch up the existing SCSI disk support. This might be not to everyones taste, so I'm open to alternative suggestions. But I certainly do _not_ want to update the SCSI disk emulation, as this is really quite tied to the SCSI parallel interface used by the old lsi53c895a.c. --verbose please. I'd prefer to fix scsi-disk bugs and/or limitations instead of working around them. Plus it doesn't do scatter-gather list handling, Which should be fixed anyway. cheers, Gerd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
Gleb Natapov wrote: 1) rcu_read_lock is something like 4x faster than srcu_read_lock(), but we are talking about nanoseconds on modern hardware (I think Paul quoted me 10ns vs 45ns on his rig). I don't think either overhead is something to be concerned about in this case. If we can avoid why not? Nanoseconds tend to add up. BTW: I didn't mean to imply that we should be cavalier in adding overhead. My point was that adding overhead is sometimes _necessary_ to prevent a race above and beyond an RCU pointer acquisition, and 35ns is not _terrible_ IMO. I was suggesting to solve this by switching to SRCU, but an alternative is copying the structure (when permitted with immutable objects), which seems to work in this particular case. It should be noted that the copy has its own unquantified overhead beyond basic RCU as well, so its not truly free (I'd guess its = as the switch to SRCU without copies, though). IOW: sync hurts, but its sometimes a necessary evil ;) Kind Regards, -Greg signature.asc Description: OpenPGP digital signature
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Tue, Oct 27, 2009 at 09:34:41AM -0400, Gregory Haskins wrote: Hi Paul, Paul E. McKenney wrote: On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote: The current code suffers from the following race condition: thread-1thread-2 --- kvm_set_irq() { rcu_read_lock() irq_rt = rcu_dereference(table); rcu_read_unlock(); kvm_set_irq_routing() { mutex_lock(); irq_rt = table; rcu_assign_pointer(); mutex_unlock(); synchronize_rcu(); kfree(irq_rt); irq_rt-entry-set(); /* bad */ - Because the pointer is accessed outside of the read-side critical section. There are two basic patterns we can use to fix this bug: 1) Switch to sleeping-rcu and encompass the -set() access within the read-side critical section, OR 2) Add reference counting to the irq_rt structure, and simply acquire the reference from within the RSCS. This patch implements solution (1). Looks like a good transformation! A few questions interspersed below. Thanks for the review. I would have CC'd you but I figured I pestered you enough with my RCU reviews in the past, and didn't want to annoy you ;) I will be sure to CC you in the future, unless you ask otherwise. No problem either way. ;-) Thanx, Paul Signed-off-by: Gregory Haskins ghask...@novell.com --- include/linux/kvm_host.h |6 +- virt/kvm/irq_comm.c | 50 +++--- virt/kvm/kvm_main.c |1 + 3 files changed, 35 insertions(+), 22 deletions(-) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index bd5a616..1fe135d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -185,7 +185,10 @@ struct kvm { struct mutex irq_lock; #ifdef CONFIG_HAVE_KVM_IRQCHIP - struct kvm_irq_routing_table *irq_routing; + struct { + struct srcu_structsrcu; Each structure has its own SRCU domain. This is OK, but just asking if that is the intent. It does look like the SRCU primitives are passed a pointer to the correct structure, and that the return value from srcu_read_lock() gets passed into the matching srcu_read_unlock() like it needs to be, so that is good. Yeah, it was intentional. Technically the table is per-guest, and thus the locking is too, which is the desired/intentional granularity. On that note, I tried to denote that kvm-irq_routing.srcu and kvm-irq_routing.table were related to one another, but then went ahead and modified the hunks that touched kvm-irq_ack_notifier_list, too. In retrospect, this was probably a mistake. I should leave the rcu usage outside of -irq_routing.table alone. + struct kvm_irq_routing_table *table; + } irq_routing; struct hlist_head mask_notifier_list; struct hlist_head irq_ack_notifier_list; #endif [ . . . ] @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level) * IOAPIC. So set the bit in both. The guest will ignore * writes to the unused one. */ - rcu_read_lock(); - irq_rt = rcu_dereference(kvm-irq_routing); + idx = srcu_read_lock(kvm-irq_routing.srcu); + irq_rt = rcu_dereference(kvm-irq_routing.table); if (irq irq_rt-nr_rt_entries) - hlist_for_each_entry(e, n, irq_rt-map[irq], link) - irq_set[i++] = *e; - rcu_read_unlock(); + hlist_for_each_entry(e, n, irq_rt-map[irq], link) { What prevents the above list from changing while we are traversing it? (Yes, presumably whatever was preventing it from changing before this patch, but what?) Mostly kvm-lock is held, but not always. And if kvm-lock were held all the time, there would be no point in using SRCU. ;-) This is protected by kvm-irq_lock within kvm_set_irq_routing(). Entries are added to a copy of the list, and the top-level table pointer is swapped (via rcu_assign_pointer(), as it should be) while holding the lock. Finally, we synchronize with the RSCS before deleting the old copy. It looks to me like the original author got this part right, so I didn't modify it outside of converting to SRCU. + int r; - while(i--) { - int r; - r = irq_set[i].set(irq_set[i], kvm, irq_source_id, level); - if (r 0) - continue; + r = e-set(e, kvm,
Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic
On Tue, Oct 27, 2009 at 11:02:23AM -0400, Gregory Haskins wrote: Thanks for this, Paul. Some questions and statements below. Paul E. McKenney wrote: On Tue, Oct 27, 2009 at 04:02:37PM +0200, Gleb Natapov wrote: On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote: [ . . . ] standard RCU RSCS, which is what SRCU is designed for. So rather than inventing an awkward two-phased stack based solution, it's better to reuse the provided tools, IMO. To flip it around: Is there any reason why an SRCU would not work here, and thus we were forced to use something like the stack-copy approach? If SRCU has no disadvantage comparing to RCU why not use it always? :) The disadvantages of SRCU compared to RCU include the following: 1. SRCU requires that the return value of srcu_read_lock() be fed into srcu_read_unlock(). This is usually not a problem, but can be painful if there are multiple levels of function call separating the two. Right, and this is simple/neat w.r.t. its usage in irq_routing, so no problem there. Fair enough! 2. SRCU's grace periods are about 4x slower than those of RCU. And they also don't scale all that well with extremely large numbers of CPUs (but this can be fixed when/if it becomes a real problem). The irq_routing update path is extremely infrequent, so this should not be an issue. Sounds good! 3. SRCU's read-side primitives are also significantly slower than those of RCU. Are the 10ns vs 45ns numbers that I mentioned in my last reply the proper ballpark? How do these compare to an atomic-op, say an uncontended spinlock on modern hardware? The assumption is that srcu_read_lock() should be significantly cheaper than a read-lock(). If its not, then we might as well use something else, I suppose. But if its not, I guess you probably wouldn't have bothered to invent it in the first place ;) SRCU read-side critical sections should indeed be quite a bit cheaper than uncontended spinlock, particularly if the spinlock was last released by some other CPU. There are those who insist that uncontended spinlocks and atomic operations will soon be free, but I will believe this when I see it. ;-) 4. SRCU does not have a call_srcu(). One could be provided, but its semantics would be a bit strange due to the need to limit the number of callbacks, given that general blocking is permitted in SRCU read-side critical sections. (And it would take some doing to convince me to supply an SRCU!) This is not an issue in our design. Very good! 5. The current SRCU has no reasonable way to implement read-side priority boosting, as there is no record of which task is read-holding which SRCU. Given the infrequency of the update path, I do not see this as a problem. Sounds like you have it covered, then! Thanx, Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] Fix kdump under KVM.
On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote: This patch is the main point of the series. In order for kdump to properly work inside a KVM guest, we need to make sure that all VCPUs in virtual wire APIC mode get kicked to try and pick up the timer interrupts. To do this, we iterate over the CPUs and deliver interrupts to the proper VCPUs. I don't love the concept of doing kvm_irq_kick_vcpus() from within pit_timer_fn(). A PIT is not connected to a CPU at all, only to a PIC or APIC. However, if a CPU enters idle, this is the only way to wake it up to check for the interrupt. The reason the PIT interrupt was fixed to BSP is: http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the destination overwrite in case its programmed by the guest to a single CPU would fix it? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 3/3] KVM: Directly inject interrupts if they support lockless operation
On Mon, Oct 26, 2009 at 12:22:08PM -0400, Gregory Haskins wrote: IRQFD currently uses a deferred workqueue item to execute the injection operation. It was originally designed this way because kvm_set_irq() required the caller to hold the irq_lock mutex, and the eventfd callback is invoked from within a non-preemptible critical section. With the advent of lockless injection support for certain GSIs, the deferment mechanism is no longer technically needed in all cases. Since context switching to the workqueue is a source of interrupt latency, lets switch to a direct method whenever possible. Fortunately for us, the most common use of irqfd (MSI-based GSIs) readily support lockless injection. Signed-off-by: Gregory Haskins ghask...@novell.com This is a useful optimization I think. Some comments below. --- virt/kvm/eventfd.c | 31 +++ 1 files changed, 27 insertions(+), 4 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 30f70fd..e6cc958 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -51,20 +51,34 @@ struct _irqfd { wait_queue_t wait; struct work_structinject; struct work_structshutdown; + void (*execute)(struct _irqfd *); }; static struct workqueue_struct *irqfd_cleanup_wq; static void -irqfd_inject(struct work_struct *work) +irqfd_inject(struct _irqfd *irqfd) { - struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); struct kvm *kvm = irqfd-kvm; kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1); kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0); } +static void +irqfd_deferred_inject(struct work_struct *work) +{ + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); + + irqfd_inject(irqfd); +} + +static void +irqfd_schedule(struct _irqfd *irqfd) +{ + schedule_work(irqfd-inject); +} + /* * Race-free decouple logic (ordering is critical) */ @@ -126,7 +140,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) if (flags POLLIN) /* An event has been signaled, inject an interrupt */ - schedule_work(irqfd-inject); + irqfd-execute(irqfd); if (flags POLLHUP) { /* The eventfd is closing, detach from KVM */ @@ -179,7 +193,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi) irqfd-kvm = kvm; irqfd-gsi = gsi; INIT_LIST_HEAD(irqfd-list); - INIT_WORK(irqfd-inject, irqfd_inject); + INIT_WORK(irqfd-inject, irqfd_deferred_inject); INIT_WORK(irqfd-shutdown, irqfd_shutdown); file = eventfd_fget(fd); @@ -209,6 +223,15 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi) list_add_tail(irqfd-list, kvm-irqfds.items); spin_unlock_irq(kvm-irqfds.lock); + ret = kvm_irq_check_lockless(kvm, gsi); + if (ret 0) + goto fail; + + if (ret) + irqfd-execute = irqfd_inject; + else + irqfd-execute = irqfd_schedule; + Can't gsi get converted from lockless to non-lockless after it's checked (by the routing ioctl)? Kernel will crash then. How about, each time we get event from eventfd, we implement kvm_irqfd_toggle_lockless, which does a single scan, and returns true/false status (and I really mean toggle, let's not do set 1 / set 0 as well) telling us whether interrupts could be delivered in a lockless manner? Then we can either just ignore the error (no one uses eventfd this way), or handle the mostly irrelevant case of level by means of the workqueue, like we did previously. /* * Check if there was an event already pending on the eventfd * before we registered, and trigger it as if we didn't miss it. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] introduce VMSTATE_U64
On Tue, Oct 20, 2009 at 08:40:26AM +0900, Avi Kivity wrote: On 10/17/2009 04:27 AM, Glauber Costa wrote: This is a patch actually written by Juan, which, according to him, he plans on posting to qemu.git. Problem is that linux defines u64 in a way that is type-uncompatible with uint64_t. I am including it here, because it is a dependency to my patch series that follows. Why can't we store these values in qemu as uint64_ts? Because then we have to redefine the whole structure in qemu. the proposal is to simply pick the structures directly from linux. I believe it is much easier, and the versioning scheme in vmstate will help us get around any changes they might suffer in the future. I get build errors with this. Is there something extra I need to do? I am currently running a rhel54 2.6.18 kernel. CCqdev-properties.o savevm.c: In function ‘get_u64’: savevm.c:856: error: ‘__u64’ undeclared (first use in this function) savevm.c:856: error: (Each undeclared identifier is reported only once savevm.c:856: error: for each function it appears in.) savevm.c:856: error: ‘v’ undeclared (first use in this function) savevm.c: In function ‘put_u64’: savevm.c:863: error: ‘__u64’ undeclared (first use in this function) savevm.c:863: error: ‘v’ undeclared (first use in this function) make[1]: *** [savevm.o] Error 1 make[1]: *** Waiting for unfinished jobs make: *** [build-all] Error 2 Thanks, -Andrew -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 1/3] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP
On Tue, Oct 27, 2009 at 01:10:43PM -0200, Marcelo Tosatti wrote: Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP. CC: sta...@kernel.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -2285,6 +2285,9 @@ long kvm_arch_vm_ioctl(struct file *filp goto out; break; case KVM_CREATE_IRQCHIP: + r = -EEXIST; + if (kvm-arch.vpic) + goto out; r = -ENOMEM; kvm-arch.vpic = kvm_create_pic(kvm); if (kvm-arch.vpic) { @@ -2300,6 +2303,8 @@ long kvm_arch_vm_ioctl(struct file *filp if (r) { kfree(kvm-arch.vpic); kfree(kvm-arch.vioapic); + kvm-arch.vpic = NULL; + kvm-arch.vioapic = NULL; goto out; } break; Is there a lock that protects this structure? Can memory leak still occur if multiple threads call KVM_CREATE_IRQCHIP in parallel? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch 2/3] KVM: x86: disallow KVM_{SET,GET}_LAPIC without in kernel irqchip
On Tue, Oct 27, 2009 at 01:10:44PM -0200, Marcelo Tosatti wrote: Otherwise kvm might attempt to dereference a NULL pointer. CC: sta...@kernel.org Signed-off-by: Marcelo Tosatti mtosa...@redhat.com Index: kvm/arch/x86/kvm/x86.c === --- kvm.orig/arch/x86/kvm/x86.c +++ kvm/arch/x86/kvm/x86.c @@ -1815,6 +1815,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi switch (ioctl) { case KVM_GET_LAPIC: { + r = -EINVAL; + if (!irqchip_in_kernel(vcpu-kvm)) + goto out; lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); r = -ENOMEM; @@ -1830,6 +1833,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi break; } case KVM_SET_LAPIC: { + r = -EINVAL; + if (!irqchip_in_kernel(vcpu-kvm)) + goto out; lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL); r = -ENOMEM; if (!lapic) Can the value of irqchip_in_kernel be changed by userspace after we have checked it? If yes, this check won't help ... -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [KVM PATCH v3 3/3] KVM: Directly inject interrupts if they support lockless operation
Michael S. Tsirkin wrote: On Mon, Oct 26, 2009 at 12:22:08PM -0400, Gregory Haskins wrote: IRQFD currently uses a deferred workqueue item to execute the injection operation. It was originally designed this way because kvm_set_irq() required the caller to hold the irq_lock mutex, and the eventfd callback is invoked from within a non-preemptible critical section. With the advent of lockless injection support for certain GSIs, the deferment mechanism is no longer technically needed in all cases. Since context switching to the workqueue is a source of interrupt latency, lets switch to a direct method whenever possible. Fortunately for us, the most common use of irqfd (MSI-based GSIs) readily support lockless injection. Signed-off-by: Gregory Haskins ghask...@novell.com This is a useful optimization I think. Some comments below. --- virt/kvm/eventfd.c | 31 +++ 1 files changed, 27 insertions(+), 4 deletions(-) diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c index 30f70fd..e6cc958 100644 --- a/virt/kvm/eventfd.c +++ b/virt/kvm/eventfd.c @@ -51,20 +51,34 @@ struct _irqfd { wait_queue_t wait; struct work_structinject; struct work_structshutdown; +void (*execute)(struct _irqfd *); }; static struct workqueue_struct *irqfd_cleanup_wq; static void -irqfd_inject(struct work_struct *work) +irqfd_inject(struct _irqfd *irqfd) { -struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); struct kvm *kvm = irqfd-kvm; kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1); kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0); } +static void +irqfd_deferred_inject(struct work_struct *work) +{ +struct _irqfd *irqfd = container_of(work, struct _irqfd, inject); + +irqfd_inject(irqfd); +} + +static void +irqfd_schedule(struct _irqfd *irqfd) +{ +schedule_work(irqfd-inject); +} + /* * Race-free decouple logic (ordering is critical) */ @@ -126,7 +140,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, void *key) if (flags POLLIN) /* An event has been signaled, inject an interrupt */ -schedule_work(irqfd-inject); +irqfd-execute(irqfd); if (flags POLLHUP) { /* The eventfd is closing, detach from KVM */ @@ -179,7 +193,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi) irqfd-kvm = kvm; irqfd-gsi = gsi; INIT_LIST_HEAD(irqfd-list); -INIT_WORK(irqfd-inject, irqfd_inject); +INIT_WORK(irqfd-inject, irqfd_deferred_inject); INIT_WORK(irqfd-shutdown, irqfd_shutdown); file = eventfd_fget(fd); @@ -209,6 +223,15 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi) list_add_tail(irqfd-list, kvm-irqfds.items); spin_unlock_irq(kvm-irqfds.lock); +ret = kvm_irq_check_lockless(kvm, gsi); +if (ret 0) +goto fail; + +if (ret) +irqfd-execute = irqfd_inject; +else +irqfd-execute = irqfd_schedule; + Can't gsi get converted from lockless to non-lockless after it's checked (by the routing ioctl)? I think I protect against this in patch 2/3 by ensuring that any vectors that are added have to conform to the same locking rules. The code doesn't support deleting routes, so we really only need to make sure that new routes do not change. Kernel will crash then. How about, each time we get event from eventfd, we implement kvm_irqfd_toggle_lockless, which does a single scan, and returns true/false status (and I really mean toggle, let's not do set 1 / set 0 as well) telling us whether interrupts could be delivered in a lockless manner? I am not sure I like this idea in general given that I believe I already handle the error case you are concerned with. However, the concept of providing a toggle option so we can avoid scanning the list twice is a good one. That can be done as a new patch series, but it would be a nice addition. Thanks Michael, -Greg signature.asc Description: OpenPGP digital signature
[PATCH] [RFC] KVM test: Major control file cleanup
As pointed out before, the KVM reference control files could use a little clean up. This patch implements severe cleanup of the main control file by: * Refactoring the code present there, moving it to the kvm_utils.py library * Treat the build test exactly the same way as other tests, moving the config stuff that used to be in the control file realm out to its own configuration file, for the sake of consistency. * Since a frequent complaint amongst users is that by default the build test will build from release, and that build might be unsuccessful depending on the system being used (older kernel versions and other sorts of stuff), change the default build method to 'noinstall'. This way the control file becomes way shorter, fairly well organized, and we have a consistent configuration schema across the board, based on configuration files. If people are OK with this change, final path will change the control.parallel file as well. Important: For the default control file, I removed the part of the code that parses new strings for the test configuration because I believe those modifications belong to the test config files, they could be just commented. For cases where keeping all stuff inside the config file is not possible, appropriate code will be added (in other example control files). Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com --- client/tests/kvm/control | 203 +++-- client/tests/kvm/kvm_build.cfg.sample | 82 + client/tests/kvm/kvm_utils.py | 112 ++ 3 files changed, 211 insertions(+), 186 deletions(-) create mode 100644 client/tests/kvm/kvm_build.cfg.sample diff --git a/client/tests/kvm/control b/client/tests/kvm/control index 431baf9..7acec14 100644 --- a/client/tests/kvm/control +++ b/client/tests/kvm/control @@ -20,194 +20,25 @@ KVM (both kernelspace and userspace) code. For online docs, please refer to http://www.linux-kvm.org/page/KVM-Autotest +import sys, os, logging +# Add the KVM tests dir to the python path +kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests/kvm') +sys.path.append(kvm_test_dir) +import kvm_utils, kvm_config -import sys, os -#- -# set English environment (command output might be localized, need to be safe) -#- -os.environ['LANG'] = 'en_US.UTF-8' +## Prepare the environment for the KVM test. +kvm_utils.prepare_test_run(basedir=kvm_test_dir, + rootdir='/tmp/kvm_autotest_root') -#- -# Enable modules import from current directory (tests/kvm) -#- -pwd = os.path.join(os.environ['AUTODIR'],'tests/kvm') -sys.path.append(pwd) +cfg_names = ['kvm_build.cfg', 'kvm_tests.cfg'] -# -# create required symlinks -# -# When dispatching tests from autotest-server the links we need do not exist on -# the host (the client). The following lines create those symlinks. Change -# 'rootdir' here and/or mount appropriate directories in it. -# -# When dispatching tests on local host (client mode) one can either setup kvm -# links, or same as server mode use rootdir and set all appropriate links and -# mount-points there. For example, guest installation tests need to know where -# to find the iso-files. -# -# We create the links only if not already exist, so if one already set up the -# links for client/local run we do not touch the links. -rootdir='/tmp/kvm_autotest_root' -iso=os.path.join(rootdir, 'iso') -images=os.path.join(rootdir, 'images') -qemu=os.path.join(rootdir, 'qemu') -qemu_img=os.path.join(rootdir, 'qemu-img') +for cfg_name in cfg_names: +cfg_path = os.path.join(kvm_test_dir, cfg_name) +cfg = kvm_config.config(cfg_path) +list = kvm_utils.get_test_list(cfg, cfg_name, kvm_test_dir) +logging.info(Running test set defined on config file %s, cfg_name) +kvm_utils.run_tests(list, job) - -def link_if_not_exist(ldir, target, link_name): -t = target -l = os.path.join(ldir, link_name) -if not os.path.exists(l): -os.system('ln -s %s %s' % (t, l)) - -# Create links only if not already exist -link_if_not_exist(pwd, '../../', 'autotest') -link_if_not_exist(pwd, iso, 'isos') -link_if_not_exist(pwd, images, 'images') -link_if_not_exist(pwd, qemu, 'qemu') -link_if_not_exist(pwd, qemu_img, 'qemu-img') - -# -# Params that will be passed to the KVM install/build test -# -params = { -name: build, -shortname: build, -type: build, -mode: release, -#mode: snapshot, -#mode: localtar, -#mode: localsrc, -#mode: git, -#mode: noinstall, -#mode: koji, - -## Are we going to
Re: vhost-net patches
On Tue, Oct 27, 2009 at 09:36:18AM -0700, Shirley Ma wrote: Hello Michael, On Tue, 2009-10-27 at 17:27 +0200, Michael S. Tsirkin wrote: Possibly GFP_ATOMIC allocations in vring_add_indirect are failing? Is there a chance you are tight on guest memory for some reason? with vhost, virtio does currently consume a bit more memory than with userspace backend. I did see memory leak on host every time after exiting guest. I don't know where. Do you see it? I didn't notice. I'll check this. Anyway after I reboot host and restart guest with large memory allocation, How large is large here? I usually allocate 1G. I do see performance improves to 3xxxMb/s, and occasionally reaches 40xxMb/s. This is same as userspace, isn't it? But queue full still exists, I can avoid the problem by increasing send queue size from qemu. And what performance do you get then? I will apply deferring skb allocation patch on guest to see any performance gain after your vhost patch. Thanks Shirley -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: performance regression in virtio-net in 2.6.32-rc4
On Tue, 27 Oct 2009 05:18:35 am Michael S. Tsirkin wrote: Hi! I noticed a performance regression in virtio net: going from 2.6.31 to 2.6.32-rc4 I see this, for guest to host communication: ... Size SizeSize Time Throughput bytes bytes bytessecs.10^6bits/sec 87380 16384 1638410.207806.48 ... 87380 16384 1638410.006814.60 Hmm, needs a bisect I'd say. Thanks, Rusty. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM
On 2009/10/21 14:13, MORITA Kazutaka wrote: Hi everyone, Sheepdog is a distributed storage system for KVM/QEMU. It provides highly available block level storage volumes to VMs like Amazon EBS. Sheepdog supports advanced volume management features such as snapshot, cloning, and thin provisioning. Sheepdog runs on several tens or hundreds of nodes, and the architecture is fully symmetric; there is no central node such as a meta-data server. We added some pages to Sheepdog website: Design: http://www.osrg.net/sheepdog/design.html FAQ : http://www.osrg.net/sheepdog/faq.html Sheepdog mailing list is also ready to use (thanks for Tomasz) Subscribe/Unsubscribe/Preferences http://lists.wpkg.org/mailman/listinfo/sheepdog Archive http://lists.wpkg.org/pipermail/sheepdog/ We are always looking for developers or users interested in participating in Sheepdog project! Thanks. MORITA Kazutaka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
On 10/27/2009 01:21 AM, Olof Johansson wrote: On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote: For some reason, I'm not seeing this build break, but the patch is obviously correct. Acked-by: Hollis Blanchard holl...@us.ibm.com I saw it when building with pasemi_defconfig + manually enabled KVM options (all available). Alex, can you fold this patch in? No need to repost; just update your git tree. (btw, please make sure the patchset is bisectable). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
On 27.10.2009, at 09:56, Avi Kivity wrote: On 10/27/2009 01:21 AM, Olof Johansson wrote: On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote: For some reason, I'm not seeing this build break, but the patch is obviously correct. Acked-by: Hollis Blanchard holl...@us.ibm.com I saw it when building with pasemi_defconfig + manually enabled KVM options (all available). Alex, can you fold this patch in? I can, but it's only partly related. My patches don't even touch timing.c. The only thing I can imagine resulting in a breakage is that my patches allow for an =M setting. So IMHO this patch should be applied before my series. Should I stick it as first patch in my git tree? Alex -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5
On 10/27/2009 03:42 PM, Alexander Graf wrote: I can, but it's only partly related. My patches don't even touch timing.c. The only thing I can imagine resulting in a breakage is that my patches allow for an =M setting. So IMHO this patch should be applied before my series. Should I stick it as first patch in my git tree? No need, I'll apply it independently. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html