date:20091027

[COMMIT] [WIN-GUEST-DRIVERS] Clean up prefast (MS static code analyzer tool) errors.

2009-10-27 Thread Yan Vugenfirer

repository: C:/dev/kvm-guest-drivers-windows
branch: master
commit 2a08ca42e31cb525a7179c71fbb7f9e19edc7bb3
Author: Yan Vugenfirer yvuge...@redhat.com
Date:   Tue Oct 27 18:28:43 2009 +0200

[WIN-GUEST-DRIVERS] Clean up prefast (MS static code analyzer tool) errors.

Signed-off-by: Yan Vugenfirer yvuge...@redhat.com

diff --git a/NetKVM/Common/ParaNdis-Debug.c b/NetKVM/Common/ParaNdis-Debug.c
index 22d3b73..71494ef 100644
--- a/NetKVM/Common/ParaNdis-Debug.c
+++ b/NetKVM/Common/ParaNdis-Debug.c
@@ -24,12 +24,7 @@ int bDebugPrint = 1;
 
 static NDIS_SPIN_LOCK CrashLock;
 
-static VOID ParaNdis_OnBugCheck(
-IN KBUGCHECK_CALLBACK_REASON Reason,
-IN PKBUGCHECK_REASON_CALLBACK_RECORD Record,
-IN OUT PVOID ReasonSpecificData,
-IN ULONG ReasonSpecificDataLength
-);
+static KBUGCHECK_REASON_CALLBACK_ROUTINE ParaNdis_OnBugCheck;
 static VOID ParaNdis_PrepareBugCheckData();
 
 typedef BOOLEAN (*KeRegisterBugCheckReasonCallbackType) (
diff --git a/NetKVM/wlh/ParaNdis6-Driver.c b/NetKVM/wlh/ParaNdis6-Driver.c
index 30357a2..1ceaef6 100644
--- a/NetKVM/wlh/ParaNdis6-Driver.c
+++ b/NetKVM/wlh/ParaNdis6-Driver.c
@@ -56,7 +56,7 @@ static KeRegisterProcessorChangeCallbackType
 static KeDeregisterProcessorChangeCallbackType
ParaNdis2008_DeregisterCallback = 
DummyDeregisterProcessorChangeCallback;
 
-static PVOID ProcessorChangeCallbackHandle = NULL;
+static PROCESSOR_CALLBACK_FUNCTION * ProcessorChangeCallbackHandle = NULL;
 
 static VOID ParaNdis_OnCPUChange(
__in PVOID CallbackContext,
diff --git a/NetKVM/wxp/ParaNdis5-Driver.c b/NetKVM/wxp/ParaNdis5-Driver.c
index 2e36a78..12aeeef 100644
--- a/NetKVM/wxp/ParaNdis5-Driver.c
+++ b/NetKVM/wxp/ParaNdis5-Driver.c
@@ -22,6 +22,8 @@
 static NDIS_HANDLE DriverHandle;
 static ULONG   gID = 0;
 
+static DRIVER_UNLOAD ParaVirtualNICUnload;
+
 /**
 Unload handler, only responsibility is cleanup WPP
 ***/
--
To unsubscribe from this list: send the line unsubscribe kvm-commits in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vhost-net patches

2009-10-27 Thread Michael S. Tsirkin

On Mon, Oct 26, 2009 at 02:34:49PM -0700, Shirley Ma wrote:
 Hello Miachel,
 
 On Mon, 2009-10-26 at 22:05 +0200, Michael S. Tsirkin wrote:
  Shirley, could you please test the following patch?
 
 With this patch, the performance has gained from 1xxx to 2xxx Mb/s,
 still has some performance gap compared to without vhost. It was
 3xxxMb/s before from guest to host on my set up.

At some point my guest had a runaway nash-hotplug process
consuming 100% CPU. Could you please verify this
does not happen to you?

 Looks like your git tree virtio_net has fixed the skb_xmit panic I have
 seen before as well, good news.
 
 Thanks
 Shirley
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gleb Natapov

On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:
 
 thread-1thread-2
 ---
 
 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
rcu_read_unlock();
 
kvm_set_irq_routing() {
   mutex_lock();
   irq_rt = table;
   rcu_assign_pointer();
   mutex_unlock();
   synchronize_rcu();
 
   kfree(irq_rt);
 
irq_rt-entry-set(); /* bad */
 
This is not what happens. irq_rt is never accessed outside read-side
critical section. Data is copied from irq_rt onto the stack and this
copy is accessed outside critical section.

 -
 
 Because the pointer is accessed outside of the read-side critical
 section.  There are two basic patterns we can use to fix this bug:
 
 1) Switch to sleeping-rcu and encompass the -set() access within the
read-side critical section,
 
OR
 
 2) Add reference counting to the irq_rt structure, and simply acquire
the reference from within the RSCS.
 
 This patch implements solution (1).
 
 Signed-off-by: Gregory Haskins ghask...@novell.com
 ---
 
  include/linux/kvm_host.h |6 +-
  virt/kvm/irq_comm.c  |   50 
 +++---
  virt/kvm/kvm_main.c  |1 +
  3 files changed, 35 insertions(+), 22 deletions(-)
 
 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index bd5a616..1fe135d 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -185,7 +185,10 @@ struct kvm {
  
   struct mutex irq_lock;
  #ifdef CONFIG_HAVE_KVM_IRQCHIP
 - struct kvm_irq_routing_table *irq_routing;
 + struct {
 + struct srcu_structsrcu;
 + struct kvm_irq_routing_table *table;
 + } irq_routing;
   struct hlist_head mask_notifier_list;
   struct hlist_head irq_ack_notifier_list;
  #endif
 @@ -541,6 +544,7 @@ int kvm_set_irq_routing(struct kvm *kvm,
   const struct kvm_irq_routing_entry *entries,
   unsigned nr,
   unsigned flags);
 +void kvm_init_irq_routing(struct kvm *kvm);
  void kvm_free_irq_routing(struct kvm *kvm);
  
  #else
 diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
 index 00c68d2..db2553f 100644
 --- a/virt/kvm/irq_comm.c
 +++ b/virt/kvm/irq_comm.c
 @@ -144,10 +144,11 @@ static int kvm_set_msi(struct 
 kvm_kernel_irq_routing_entry *e,
   */
  int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
  {
 - struct kvm_kernel_irq_routing_entry *e, irq_set[KVM_NR_IRQCHIPS];
 - int ret = -1, i = 0;
 + struct kvm_kernel_irq_routing_entry *e;
 + int ret = -1;
   struct kvm_irq_routing_table *irq_rt;
   struct hlist_node *n;
 + int idx;
  
   trace_kvm_set_irq(irq, level, irq_source_id);
  
 @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
 irq, int level)
* IOAPIC.  So set the bit in both. The guest will ignore
* writes to the unused one.
*/
 - rcu_read_lock();
 - irq_rt = rcu_dereference(kvm-irq_routing);
 + idx = srcu_read_lock(kvm-irq_routing.srcu);
 + irq_rt = rcu_dereference(kvm-irq_routing.table);
   if (irq  irq_rt-nr_rt_entries)
 - hlist_for_each_entry(e, n, irq_rt-map[irq], link)
 - irq_set[i++] = *e;
 - rcu_read_unlock();
 + hlist_for_each_entry(e, n, irq_rt-map[irq], link) {
 + int r;
  
 - while(i--) {
 - int r;
 - r = irq_set[i].set(irq_set[i], kvm, irq_source_id, level);
 - if (r  0)
 - continue;
 + r = e-set(e, kvm, irq_source_id, level);
 + if (r  0)
 + continue;
  
 - ret = r + ((ret  0) ? 0 : ret);
 - }
 + ret = r + ((ret  0) ? 0 : ret);
 + }
 + srcu_read_unlock(kvm-irq_routing.srcu, idx);
  
   return ret;
  }
 @@ -179,17 +178,18 @@ void kvm_notify_acked_irq(struct kvm *kvm, unsigned 
 irqchip, unsigned pin)
   struct kvm_irq_ack_notifier *kian;
   struct hlist_node *n;
   int gsi;
 + int idx;
  
   trace_kvm_ack_irq(irqchip, pin);
  
 - rcu_read_lock();
 - gsi = rcu_dereference(kvm-irq_routing)-chip[irqchip][pin];
 + idx = srcu_read_lock(kvm-irq_routing.srcu);
 + gsi = rcu_dereference(kvm-irq_routing.table)-chip[irqchip][pin];
   if (gsi != -1)
   hlist_for_each_entry_rcu(kian, n, kvm-irq_ack_notifier_list,

[ kvm-Bugs-2886941 ] Extreme slow down using -cpu host

2009-10-27 Thread SourceForge.net

Bugs item #2886941, was opened at 2009-10-27 08:26
Message generated for change (Tracker Item Submitted) made by nwxi
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: nwxi (nwxi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extreme slow down using -cpu host

Initial Comment:
Hi!

I've currently experienced a massive slowdown when using flag -cpu host on an 
AMD Phenom 905e with qemu-kvm 0.11.0 and kernel 2.6.31.3 x86_64. This affects 
network (tested e1000 and virtio) and i/o performance (virtio). Further there 
are dozens of log messages from kvm module complaining about:

 cpu0/1 unhandled rdmsr: 0xc0010055

Both, the slowdown and the rdmsr-message do not happen when using qemu64 as 
cpu, but also occur when using -cpu phenom. Thank you for your comments!

regards, Michael

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ANNOUNCE] kvm-kmod-2.6.31.5

2009-10-27 Thread Jan Kiszka

Alexander Graf wrote:
 
 On 26.10.2009, at 18:26, Jan Kiszka wrote:
 
 This package contains the kvm external modules, using the sources from
 latest stable Linux release 2.6.31.5. It can be used to update the
 kernel-side support of kvm without upgrading the host kernel.

 This release has been tested on x86 down to host kernel 2.6.27 and
 builds down to 2.6.24. Building against older kernels is expected to be
 broken, but if anyone provides patches to fix it, I'm open to merge them.
 
 Aww - I'm missing the awesome changelogs :-).
 

Your wish is my command:

Major KVM changes since kvm-kmod-2.6.30.1:
 - MSI-X support
 - MCE support
 - SVM: Reworkedfixed IRQ injection, added NMI injection
 - Host security: Limit lapic periodic timer frequency
 - Guest security: CPL check on hypercalls and debug register access
 - x86 emulator: lcall and soft INT support, interrupt shadow handling
 - Guest migration fixes
(I probably missed some more important stuff)

Major kvm-kmod changes:
 - Reanimated support down to 2.6.24 hosts
 - Fixed building against out-of-tree kernel builds / split kernels
 - udev script with auto-modprobe on boot
 - Installation fixes (DESTDIR)

Wanted to attach this information to the tarball as well, but
sourceforge's FRS is still in a almost completely broken state. Sigh.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-27 Thread Avi Kivity


On 10/27/2009 01:21 AM, Olof Johansson wrote:

On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:



For some reason, I'm not seeing this build break, but the patch is
obviously correct.

Acked-by: Hollis Blanchard holl...@us.ibm.com


I saw it when building with pasemi_defconfig + manually enabled KVM 
options (all available).




Alex, can you fold this patch in?

No need to repost; just update your git tree.

(btw, please make sure the patchset is bisectable).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: KVM: VMX: move CR3/PDPTR update to vmx_set_cr3

2009-10-27 Thread Avi Kivity


On 10/26/2009 08:48 PM, Marcelo Tosatti wrote:

GUEST_CR3 is updated via kvm_set_cr3 whenever CR3 is modified from
outside guest context. Similarly pdptrs are updated via load_pdptrs.

Let kvm_set_cr3 perform the update, removing it from the vcpu_run
fast path.
   


Applied, thanks.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [KVM-AUTOTEST PATCH 3/7] KVM test: new test timedrift_with_migration

2009-10-27 Thread Dor Laor


On 10/12/2009 05:28 PM, Lucas Meneghel Rodrigues wrote:

Hi Michael, I am reviewing your patchset and have just a minor remark
to make here:

On Wed, Oct 7, 2009 at 2:54 PM, Michael Goldishmgold...@redhat.com  wrote:

This patch adds a new test that checks the timedrift introduced by migrations.
It uses the same parameters used by the timedrift test to get the guest time.
In addition, the number of migrations the test performs is controlled by the
parameter 'migration_iterations'.

Signed-off-by: Michael Goldishmgold...@redhat.com
---
  client/tests/kvm/kvm_tests.cfg.sample  |   33 ---
  client/tests/kvm/tests/timedrift_with_migration.py |   95 
  2 files changed, 115 insertions(+), 13 deletions(-)
  create mode 100644 client/tests/kvm/tests/timedrift_with_migration.py

diff --git a/client/tests/kvm/kvm_tests.cfg.sample 
b/client/tests/kvm/kvm_tests.cfg.sample
index 540d0a2..618c21e 100644
--- a/client/tests/kvm/kvm_tests.cfg.sample
+++ b/client/tests/kvm/kvm_tests.cfg.sample
@@ -100,19 +100,26 @@ variants:
 type = linux_s3

 - timedrift:install setup
-type = timedrift
 extra_params +=  -rtc-td-hack
-# Pin the VM and host load to CPU #0
-cpu_mask = 0x1
-# Set the load and rest durations
-load_duration = 20
-rest_duration = 20
-# Fail if the drift after load is higher than 50%
-drift_threshold = 50
-# Fail if the drift after the rest period is higher than 10%
-drift_threshold_after_rest = 10
-# For now, make sure this test is executed alone
-used_cpus = 100
+variants:
+- with_load:
+type = timedrift
+# Pin the VM and host load to CPU #0
+cpu_mask = 0x1



Let's use -smp 2 always.

btw: we need not to parallel the load test with standard tests.


+# Set the load and rest durations
+load_duration = 20
+rest_duration = 20


Even the default duration here seems way too brief here, is there any
reason why 20s was chosen instead of, let's say, 1800s? I am under the
impression that 20s of load won't be enough to cause any noticeable
drift...


+# Fail if the drift after load is higher than 50%
+drift_threshold = 50
+# Fail if the drift after the rest period is higher than 10%
+drift_threshold_after_rest = 10


I am also curious about those tresholds and the reasoning behind them.
Is there any official agreement on what we consider to be an
unreasonable drift?

Another thing that struck me out is drift calculation: On the original
timedrift test, the guest drift is normalized against the host drift:

drift = 100.0 * (host_delta - guest_delta) / host_delta

While in the new drift tests, we consider only the guest drift. I
believe is better to normalize all tests based on one drift
calculation criteria, and those values should be reviewed, and at
least a certain level of agreement on our development community should
be reached.


I think we don't need to calculate drift ratio. We should define a 
threshold in seconds, let's say 2 seconds. Beyond that, there should not 
be any drift.


Do we support migration to a different host? We should, especially in 
this test too. The destination host reading should also be used.


Apart for that, good patchset, and good thing you refactored some of the 
code to shared utils.




Other than this concern that came to my mind, the new tests look good
and work fine here. I had to do a slight rebase in one of the patches,
very minor stuff. The default values and the drift calculation can be
changed on a later time. Thanks!
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: xp guest, blue screen c0000221 on boot

2009-10-27 Thread Avi Kivity


On 10/26/2009 11:09 PM, Andrew Olney wrote:

Hangs on boot, xp guest:

STOP: c221 Unknown Hard Error
 \SystemRoot\System32\ntdll.dll

Will boot into safe mode, but _not_ into safe mode with networking.



According to http://support.microsoft.com/kb/314474, this file is 
corrupted.  If you have another VM with the same service pack, you might 
try copying the file over in safe mode, or you can try extracting it 
from the service pack and copying it.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Virtio block module slower than IDE

2009-10-27 Thread Avi Kivity


On 10/27/2009 02:13 AM, Floris Bos wrote:

Hi,

I am running Proxmox 1.4 (which uses the 2.6.30.1 kvm modules) and am
experiencing performance problems with Linux guests using the virtio_blk
module.
Especially with random IO it is a lot slower than IDE.


   


Try switching the host I/O scheduler to deadline.  You can do that at 
runtime in /sys/block/.../queue.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: -cpu host AMD Host

2009-10-27 Thread Avi Kivity


On 10/26/2009 09:55 PM, Martin Gallant wrote:

Is “–cpu host” supported on AMD hosts?

   


Yes.


Whenever I try to use this option on a Windows Vista/7 client, I get blue
screen.
Removing the option, the client works fine.

Host kernel 2.6.31.4.  Userspace is qemu-kvm-0.11.0.  (Previous versions
fail too)

/proc/cpuinfo snippet:
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
   


Please post the flags : field as well.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: performance regression in virtio-net in 2.6.32-rc4

2009-10-27 Thread Avi Kivity


On 10/26/2009 08:48 PM, Michael S. Tsirkin wrote:

Hi!
I noticed a performance regression in virtio net: going from
2.6.31 to 2.6.32-rc4 I see this, for guest to host communication:

Any tips on debugging this?
   


Lacking better advice, a bisect can help as a last resort.  'git bisect 
start -- drivers/net drivers/virtio' will probably find it fastest.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kvm problems on new hardware

2009-10-27 Thread Avi Kivity


On 10/26/2009 12:06 PM, Danny ter Haar wrote:

Hello,
I have a KVM virtualization problem.
I've put together new hardware (supermicro) server with 2 E5530 cpu's
and memory  disk to start experimenting with virtualization.

I intend to use the www.proxmox.com system/setup.
I installed proxmox and started stress testing the hardware:
parallel kernel compiles in a loop (concurrency_level=32)
memtest86+ during the night etc.
The hardware/os performs rocksolid when i stress test it, but the moment
i start a virtual guest (eg debian netinstall) i get the first screen of the
installation procedure in a vnc screen. I choose either normal install or
expert install , the guest screen goes blank with only a cursor and the
kvm process prints an error on the console and starts to eat cpu cycles.
So the host OS is not barfing, only the kvm process is giving problems and the
guest is frozen.

   


Does this happen for all guests (different OSes), or just this one?

Please provide a link to the install media.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2886941 ] Extreme slow down using -cpu host

2009-10-27 Thread SourceForge.net

Bugs item #2886941, was opened at 2009-10-27 09:26
Message generated for change (Comment added) made by avik
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: nwxi (nwxi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extreme slow down using -cpu host

Initial Comment:
Hi!

I've currently experienced a massive slowdown when using flag -cpu host on an 
AMD Phenom 905e with qemu-kvm 0.11.0 and kernel 2.6.31.3 x86_64. This affects 
network (tested e1000 and virtio) and i/o performance (virtio). Further there 
are dozens of log messages from kvm module complaining about:

 cpu0/1 unhandled rdmsr: 0xc0010055

Both, the slowdown and the rdmsr-message do not happen when using qemu64 as 
cpu, but also occur when using -cpu phenom. Thank you for your comments!

regards, Michael

--

Comment By: Avi Kivity (avik)
Date: 2009-10-27 11:52

Message:
This is fixed in 2.6.32. I'll send a patch to 2.6.31-stable.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2886941 ] Extreme slow down using -cpu host

2009-10-27 Thread SourceForge.net

Bugs item #2886941, was opened at 2009-10-27 09:26
Message generated for change (Settings changed) made by avik
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: qemu
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: nwxi (nwxi)
Assigned to: Nobody/Anonymous (nobody)
Summary: Extreme slow down using -cpu host

Initial Comment:
Hi!

I've currently experienced a massive slowdown when using flag -cpu host on an 
AMD Phenom 905e with qemu-kvm 0.11.0 and kernel 2.6.31.3 x86_64. This affects 
network (tested e1000 and virtio) and i/o performance (virtio). Further there 
are dozens of log messages from kvm module complaining about:

 cpu0/1 unhandled rdmsr: 0xc0010055

Both, the slowdown and the rdmsr-message do not happen when using qemu64 as 
cpu, but also occur when using -cpu phenom. Thank you for your comments!

regards, Michael

--

Comment By: Avi Kivity (avik)
Date: 2009-10-27 11:56

Message:
Should be fixed in 2.6.31.6 when it comes out.

--

Comment By: Avi Kivity (avik)
Date: 2009-10-27 11:52

Message:
This is fixed in 2.6.32. I'll send a patch to 2.6.31-stable.

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2886941group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: -cpu host AMD Host

2009-10-27 Thread Avi Kivity


On 10/27/2009 11:39 AM, Avi Kivity wrote:

On 10/26/2009 09:55 PM, Martin Gallant wrote:

Is “–cpu host” supported on AMD hosts?



Yes.

Whenever I try to use this option on a Windows Vista/7 client, I get 
blue

screen.
Removing the option, the client works fine.

Host kernel 2.6.31.4.  Userspace is qemu-kvm-0.11.0.  (Previous versions
fail too)

/proc/cpuinfo snippet:
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+


Please post the flags : field as well.



You might try the attached patch.


--
error compiling committee.c: too many arguments to function

From d8e9cf4f4f688873456b297d78be70c5c328489c Mon Sep 17 00:00:00 2001
From: Andre Przywara andre.przyw...@amd.com
Date: Wed, 24 Jun 2009 12:44:34 +0200
Subject: [PATCH -stable] KVM: ignore reads from AMDs C1E enabled MSR

If the Linux kernel detects an C1E capable AMD processor (K8 RevF and
higher), it will access a certain MSR on every attempt to go to halt.
Explicitly handle this read and return 0 to let KVM run a Linux guest
with the native AMD host CPU propagated to the guest.

Signed-off-by: Andre Przywara andre.przyw...@amd.com
Signed-off-by: Avi Kivity a...@redhat.com
(cherry picked from commit 1fdbd48c242db996107f72ae4140ffe8163e26a8)
---
 arch/x86/kvm/x86.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8aafb62..aa4c46f 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -949,6 +949,7 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
 	case MSR_P6_EVNTSEL0:
 	case MSR_P6_EVNTSEL1:
 	case MSR_K7_EVNTSEL0:
+	case MSR_K8_INT_PENDING_MSG:
 		data = 0;
 		break;
 	case MSR_MTRRcap:
-- 
1.6.5.2

Re: kvm problems on new hardware

2009-10-27 Thread Danny ter Haar

On Tue, 2009-10-27 at 11:45 +0200, Avi Kivity wrote:
 Does this happen for all guests (different OSes), or just this one?
I tried an iso of both X86_64 and i386 of debian.
I even burned the iso image to a real cd to try if it would boot (it
did)
Then i tried a rescuecd image: same behaviour.

 Please provide a link to the install media.

It's really a standard cd image i got from 

http://cdimage.debian.org/debian-cd/5.0.3/amd64/iso-cd/debian-503-amd64-netinst.iso

I opened a question at the proxmox forum (first):

http://www.proxmox.com/forum/showthread.php?t=2384

I just followed up with this posting:

http://www.proxmox.com/forum/showthread.php?p=13192#post13192

which may give more detailed/useful information.





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kvm problems on new hardware

2009-10-27 Thread Danny ter Haar

On Tue, 2009-10-27 at 11:45 +0200, Avi Kivity wrote:
 Does this happen for all guests (different OSes), or just this one?

I just tried a windows7 cd image 
Same error:

vmbr0: port 2(vmtab105i0) entering learning state
vmbr0: topology change detected, propagating
vmbr0: port 2(vmtab105i0) entering forwarding state
handle_exception: unexpected, vectoring info 0x8410 intr info
0x8b0d

I am thinking it's either a bios setting (i tried (afaik) all possible
combinations). I even have a not-yet-released version of the bios.
All still won't let me use virtualization.
Could it be i have faulty cpu's ?
Is there a way to debug what happens when ik start the kvm guest
process ?

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kvm problems on new hardware

2009-10-27 Thread Avi Kivity


On 10/27/2009 12:27 PM, Danny ter Haar wrote:

On Tue, 2009-10-27 at 11:45 +0200, Avi Kivity wrote:
   

Does this happen for all guests (different OSes), or just this one?
 

I just tried a windows7 cd image
Same error:

vmbr0: port 2(vmtab105i0) entering learning state
vmbr0: topology change detected, propagating
vmbr0: port 2(vmtab105i0) entering forwarding state
handle_exception: unexpected, vectoring info 0x8410 intr info
0x8b0d
   


I'm not able to reproduce this on a similar processor.  Can you post 
your qemu command line?


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ANNOUNCE] kvm-kmod-2.6.31.5

2009-10-27 Thread Avi Kivity


On 10/27/2009 10:25 AM, Jan Kiszka wrote:

Wanted to attach this information to the tarball as well, but
sourceforge's FRS is still in a almost completely broken state. Sigh.
   


It can be done - upload the changelog file, make it as a changelog, and 
then go to the tarball and link the newly marked changelog.


Messy.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Jan Kiszka to maintain kvm-kmod

2009-10-27 Thread Avi Kivity


On 10/26/2009 07:51 PM, Jan Kiszka wrote:

That forecast already promises the next rain


That would be me.  I think it's easily fixable, either by emulating user 
return notifiers using preempt notifiers (sched_out - on_user_return), 
or by noping out user return notifiers completely and hooking the kvm 
sched_out callback to call kvm_on_user_return.  Of course, we won't see 
the performance gain, but that's expected.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: kvm problems on new hardware

2009-10-27 Thread Danny ter Haar

On Tue, 2009-10-27 at 12:34 +0200, Avi Kivity wrote:
 I'm not able to reproduce this on a similar processor.  Can you post 
 your qemu command line?

vhost1:/var/tmp# ps axuw |grep kvm
root  5843 42.7  0.0 621048 13560 ?Sl   11:35   0:06 /usr/bin/kvm 
-monitor unix:/var/run/qemu-server/105.mon,server,nowait -vnc 
unix:/var/run/qemu-server/105.vnc,password -pidfile 
/var/run/qemu-server/105.pid -daemonize -usbdevice tablet -name win7 -smp 
sockets=1,cores=1 -vga cirrus -tdf -localtime -rtc-td-hack -k en-us -drive 
file=/var/lib/vz/images/105/vm-105-disk-1.raw,if=ide,index=0,boot=on -drive 
file=/var/lib/vz/template/iso/windows7.iso,if=ide,index=2,media=cdrom -m 512 
-net tap,vlan=0,ifname=vmtab105i0,script=/var/lib/qemu-server/bridge-vlan -net 
nic,vlan=0,model=e1000,macaddr=BA:29:E3:CC:95:F6 -id 105 -cpuunits 1000


vhost1:/etc/qemu-server# cat 105.conf 

name: win7
ide2: local:iso/windows.iso,media=cdrom
vlan0: e1000=BA:29:E3:CC:95:F6
bootdisk: ide0
ostype: w2k8
ide0: local:105/vm-105-disk-1.raw
memory: 512
sockets: 1

I started the debian install again,
attached a strace and then hit enter in the vnc screen.
The output of the strace can be found here:
http://dth.net/supermicro/strace_debian_install

Don't know if it helps, _i_ don't know what i'm looking at ;-)




--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ANNOUNCE] kvm-kmod-2.6.31.5

2009-10-27 Thread Jan Kiszka

Avi Kivity wrote:
 On 10/27/2009 10:25 AM, Jan Kiszka wrote:
 Wanted to attach this information to the tarball as well, but
 sourceforge's FRS is still in a almost completely broken state. Sigh.

 
 It can be done - upload the changelog file, make it as a changelog, and 
 then go to the tarball and link the newly marked changelog.

Unless the web interface refuses to work with my Firefox (interestingly,
it works with IE6 under Windows?!) and the shell access reports the frs
folder as detached.

 
 Messy.
 

Indeed, beside the bugs, the new concept looks like a step backward: You
now have to download the changelog to view it.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: Jan Kiszka to maintain kvm-kmod

2009-10-27 Thread Jan Kiszka

Avi Kivity wrote:
 On 10/26/2009 07:51 PM, Jan Kiszka wrote:
 That forecast already promises the next rain
 
 That would be me.  I think it's easily fixable, either by emulating user 
 return notifiers using preempt notifiers (sched_out - on_user_return), 
 or by noping out user return notifiers completely and hooking the kvm 
 sched_out callback to call kvm_on_user_return.  Of course, we won't see 
 the performance gain, but that's expected.

I was playing with such an approach already. But something goes wrong
(GPF in kvm_set_shared_msr) when booting Win7. Will check without
kvm-kmod next.

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Autotest] [PATCH 1/2] KVM-test: Add execute permission to qemu-ifup script

2009-10-27 Thread Lucas Meneghel Rodrigues

Ooops fixed, thanks!

On Tue, Oct 27, 2009 at 2:07 AM, Amos Kong ak...@redhat.com wrote:

 qemu-ifup is a script for setting network bridge.
 If no execute permission, always face this problem:

 autotest/client/tests/kvm/scripts/qemu-ifup: could not launch network script
 Could not initialize device 'tap

 Signed-off-by: Amos Kong ak...@redhat.com
 ---
  0 files changed, 0 insertions(+), 0 deletions(-)
  mode change 100644 = 100755 client/tests/kvm/scripts/qemu-ifup

 diff --git a/client/tests/kvm/scripts/qemu-ifup 
 b/client/tests/kvm/scripts/qemu-ifup
 old mode 100644
 new mode 100755
 --
 1.5.5.6

 ___
 Autotest mailing list
 autot...@test.kernel.org
 http://test.kernel.org/cgi-bin/mailman/listinfo/autotest




-- 
Lucas
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

BUG with Win7 and user-return-notifier

2009-10-27 Thread Jan Kiszka

Hi Avi,

just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test
of Windows 7. Already during Starting Windows I get this:

...
general protection fault:  [#1] PREEMPT SMP
last sysfs file: 
/sys/devices/LNXSYSTM:00/LNXSYBUS:00/PNP0C0A:01/power_supply/CMB2/charge_full
CPU 0
Modules linked in: kvm_intel kvm ip6t_LOG xt_pkttype xt_TCPMSS xt_tcpudp 
ipt_LOG xt_limit iptable_nat nf_nat xt_physdev sco bridge stp bnep rfcomm l2cap 
snd_pcm_oss snd_mixer_oss crc16 snd_seq coretemp snd_seq_device i915 
drm_kms_helper drm i2c_algo_bit af_packet ip6t_REJECT nf_conntrack_ipv6 
ip6table_raw xt_NOTRACK ipt_REJECT xt_state iptable_raw iptable_filter 
ip6table_mangle nf_conntrack_netbios_ns nf_conntrack_ipv4 nf_conntrack 
nf_defrag_ipv4 ip_tables ip6table_filter ip6_tables x_tables ipv6 microcode 
fuse ohci_hcd loop snd_hda_codec_realtek arc4 ecb snd_hda_intel ath5k 
snd_hda_codec mac80211 ath snd_hwdep btusb pcmcia snd_pcm sdhci_pci video 
snd_timer cfg80211 iTCO_wdt rtc_cmos ppdev bluetooth sdhci yenta_socket 
iTCO_vendor_support i2c_i801 snd rsrc_nonstatic fujitsu_laptop rtc_core 
soundcore mmc_core output ohci1394 parport_pc battery rfkill ac rtc_lib 
i2c_core snd_page_alloc pcmcia_core intel_agp parport joydev serio_raw 
led_class ieee1394 pcspkr button sky2 sg sha256
_generic aes_generic cbc dm_crypt usbhid hid ehci_hcd uhci_hcd sd_mod 
crc_t10dif usbcore dm_snapshot dm_mod edd ext3 mbcache jbd fan thermal_sys 
hwmon ata_piix ahci libata scsi_mod
Pid: 7404, comm: qemu-system-x86 Not tainted 2.6.32-rc5 #8 LIFEBOOK E8110
RIP: 0010:[a05853ad]  [a05853ad] 
kvm_set_shared_msr+0x51/0x7a [kvm]
RSP: 0018:880060947cc8  EFLAGS: 00010246
RAX:  RBX: 0003 RCX: c080
RDX:  RSI:  RDI: 0003
RBP: 880060947ce8 R08:  R09: 
R10: 0012 R11: 880062b01158 R12: 88000601dad0
R13:  R14: 880062b00058 R15: 880062b00059
FS:  7f91606df950() GS:88000600() knlGS:
CS:  0010 DS: 002b ES: 002b CR0: 80050033
CR2:  CR3: 665ab000 CR4: 26f0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process qemu-system-x86 (pid: 7404, threadinfo 880060946000, task 
880062989950)
Stack:
 60947cf8 0004 880062b0 
0 880060947d08 a05d1ef9 880060991f80 880062b0
0 880060947dd8 a0582e40 880060947d38 0001db70
Call Trace:
 [a05d1ef9] vmx_save_host_state+0x141/0x150 [kvm_intel]
 [a0582e40] kvm_arch_vcpu_ioctl_run+0x510/0xb25 [kvm]
 [a0571dda] kvm_vcpu_ioctl+0xfb/0x722 [kvm]
 [a05743de] ? kvm_vm_ioctl+0x33a/0x36b [kvm]
 [81330022] ? sub_preempt_count+0x9/0x83
 [810cb698] ? fire_user_return_notifiers+0x50/0x6b
 [81122bf4] vfs_ioctl+0x2f/0x7d
 [81123112] do_vfs_ioctl+0x450/0x48d
 [81330022] ? sub_preempt_count+0x9/0x83
 [811231a9] sys_ioctl+0x5a/0x7c
 [8100bc5b] system_call_fastpath+0x16/0x1b
Code: 03 24 c5 60 a8 6f 81 89 d8 48 8d 50 04 4d 3b 2c d4 74 38 4d 89 2c d4 48 
c1 e0 04 4c 89 ea 8b 88 f8 50 5a a0 48 c1 ea 20 44 89 e8 0f 30 41 80 7c 24 18 
00 75 16 49 c7 04 24 d6 53 58 a0 4c 89 e7
RIP  [a05853ad] kvm_set_shared_msr+0x51/0x7a [kvm]
 RSP 880060947cc8
---[ end trace 44d1410c7cb2d885 ]---
note: qemu-system-x86[7404] exited with preempt_count 1


So the problem is not kvm-kmod related. Any ideas? Need .config or other
additional information?

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BUG with Win7 and user-return-notifier

2009-10-27 Thread Avi Kivity


On 10/27/2009 02:52 PM, Jan Kiszka wrote:

Hi Avi,

just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test
of Windows 7. Already during Starting Windows I get this:
   


x86 or x64 7?


RAX:  RBX: 0003 RCX: c080
RDX:  RSI:  RDI: 0003

Call Trace:
  [a05d1ef9] vmx_save_host_state+0x141/0x150 [kvm_intel]
  [a0582e40] kvm_arch_vcpu_ioctl_run+0x510/0xb25 [kvm]
  [a0571dda] kvm_vcpu_ioctl+0xfb/0x722 [kvm]
  [a05743de] ? kvm_vm_ioctl+0x33a/0x36b [kvm]
  [81330022] ? sub_preempt_count+0x9/0x83
  [810cb698] ? fire_user_return_notifiers+0x50/0x6b
  [81122bf4] vfs_ioctl+0x2f/0x7d
  [81123112] do_vfs_ioctl+0x450/0x48d
  [81330022] ? sub_preempt_count+0x9/0x83
  [811231a9] sys_ioctl+0x5a/0x7c
  [8100bc5b] system_call_fastpath+0x16/0x1b
Code: 03 24 c5 60 a8 6f 81 89 d8 48 8d 50 04 4d 3b 2c d4 74 38 4d 89 2c d4 48 c1 e0 
04 4c 89 ea 8b 88 f8 50 5a a0 48 c1 ea 20 44 89 e80f  30 41 80 7c 24 18 00 75 
16 49 c7 04 24 d6 53 58 a0 4c 89 e7
RIP  [a05853ad] kvm_set_shared_msr+0x51/0x7a [kvm]
   


That's a wrmsr(EFER, 0), which should definitely fault.  
update_transition_efer() should have taken care of it.


--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BUG with Win7 and user-return-notifier

2009-10-27 Thread Jan Kiszka

Avi Kivity wrote:
 On 10/27/2009 02:52 PM, Jan Kiszka wrote:
 Hi Avi,

 just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test
 of Windows 7. Already during Starting Windows I get this:

 
 x86 or x64 7?

x64.

 
 RAX:  RBX: 0003 RCX: c080
 RDX:  RSI:  RDI: 0003

 Call Trace:
   [a05d1ef9] vmx_save_host_state+0x141/0x150 [kvm_intel]
   [a0582e40] kvm_arch_vcpu_ioctl_run+0x510/0xb25 [kvm]
   [a0571dda] kvm_vcpu_ioctl+0xfb/0x722 [kvm]
   [a05743de] ? kvm_vm_ioctl+0x33a/0x36b [kvm]
   [81330022] ? sub_preempt_count+0x9/0x83
   [810cb698] ? fire_user_return_notifiers+0x50/0x6b
   [81122bf4] vfs_ioctl+0x2f/0x7d
   [81123112] do_vfs_ioctl+0x450/0x48d
   [81330022] ? sub_preempt_count+0x9/0x83
   [811231a9] sys_ioctl+0x5a/0x7c
   [8100bc5b] system_call_fastpath+0x16/0x1b
 Code: 03 24 c5 60 a8 6f 81 89 d8 48 8d 50 04 4d 3b 2c d4 74 38 4d 89 2c d4 
 48 c1 e0 04 4c 89 ea 8b 88 f8 50 5a a0 48 c1 ea 20 44 89 e80f  30 41 80 7c 
 24 18 00 75 16 49 c7 04 24 d6 53 58 a0 4c 89 e7
 RIP  [a05853ad] kvm_set_shared_msr+0x51/0x7a [kvm]

 
 That's a wrmsr(EFER, 0), which should definitely fault.  
 update_transition_efer() should have taken care of it.
 

Jan

-- 
Siemens AG, Corporate Technology, CT SE 2
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BUG with Win7 and user-return-notifier

2009-10-27 Thread Avi Kivity


On 10/27/2009 03:13 PM, Jan Kiszka wrote:

Avi Kivity wrote:
   

On 10/27/2009 02:52 PM, Jan Kiszka wrote:
 

Hi Avi,

just booted kvm.git master (974ae8d7ff) as host and re-ran my boot test
of Windows 7. Already during Starting Windows I get this:

   

x86 or x64 7?
 

x64.
   


Worked for me - getting to the initial prompt.  Do you have

   CONFIG_USER_RETURN_NOTIFIER=y

in your .config?

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: BUG with Win7 and user-return-notifier

2009-10-27 Thread Avi Kivity


On 10/27/2009 03:24 PM, Avi Kivity wrote:


Worked for me - getting to the initial prompt.  Do you have

   CONFIG_USER_RETURN_NOTIFIER=y

in your .config?



If you do, send your own .config, will try to reproduce.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Hi Paul,

Paul E. McKenney wrote:
 On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
rcu_read_unlock();

kvm_set_irq_routing() {
   mutex_lock();
   irq_rt = table;
   rcu_assign_pointer();
   mutex_unlock();
   synchronize_rcu();

   kfree(irq_rt);

irq_rt-entry-set(); /* bad */

 -

 Because the pointer is accessed outside of the read-side critical
 section.  There are two basic patterns we can use to fix this bug:

 1) Switch to sleeping-rcu and encompass the -set() access within the
read-side critical section,

OR

 2) Add reference counting to the irq_rt structure, and simply acquire
the reference from within the RSCS.

 This patch implements solution (1).
 
 Looks like a good transformation!  A few questions interspersed below.

Thanks for the review.  I would have CC'd you but I figured I pestered
you enough with my RCU reviews in the past, and didn't want to annoy you ;)

I will be sure to CC you in the future, unless you ask otherwise.

 
 Signed-off-by: Gregory Haskins ghask...@novell.com
 ---

  include/linux/kvm_host.h |6 +-
  virt/kvm/irq_comm.c  |   50 
 +++---
  virt/kvm/kvm_main.c  |1 +
  3 files changed, 35 insertions(+), 22 deletions(-)

 diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
 index bd5a616..1fe135d 100644
 --- a/include/linux/kvm_host.h
 +++ b/include/linux/kvm_host.h
 @@ -185,7 +185,10 @@ struct kvm {

  struct mutex irq_lock;
  #ifdef CONFIG_HAVE_KVM_IRQCHIP
 -struct kvm_irq_routing_table *irq_routing;
 +struct {
 +struct srcu_structsrcu;
 
 Each structure has its own SRCU domain.  This is OK, but just asking
 if that is the intent.  It does look like the SRCU primitives are
 passed a pointer to the correct structure, and that the return value
 from srcu_read_lock() gets passed into the matching srcu_read_unlock()
 like it needs to be, so that is good.

Yeah, it was intentional.  Technically the table is per-guest, and thus
the locking is too, which is the desired/intentional granularity.

On that note, I tried to denote that kvm-irq_routing.srcu and
kvm-irq_routing.table were related to one another, but then went ahead
and modified the hunks that touched kvm-irq_ack_notifier_list, too.  In
retrospect, this was probably a mistake.  I should leave the rcu usage
outside of -irq_routing.table alone.

 
 +struct kvm_irq_routing_table *table;
 +} irq_routing;
  struct hlist_head mask_notifier_list;
  struct hlist_head irq_ack_notifier_list;
  #endif
 
 [ . . . ]
 
 @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, 
 u32 irq, int level)
   * IOAPIC.  So set the bit in both. The guest will ignore
   * writes to the unused one.
   */
 -rcu_read_lock();
 -irq_rt = rcu_dereference(kvm-irq_routing);
 +idx = srcu_read_lock(kvm-irq_routing.srcu);
 +irq_rt = rcu_dereference(kvm-irq_routing.table);
  if (irq  irq_rt-nr_rt_entries)
 -hlist_for_each_entry(e, n, irq_rt-map[irq], link)
 -irq_set[i++] = *e;
 -rcu_read_unlock();
 +hlist_for_each_entry(e, n, irq_rt-map[irq], link) {
 
 What prevents the above list from changing while we are traversing it?
 (Yes, presumably whatever was preventing it from changing before this
 patch, but what?)
 
 Mostly kvm-lock is held, but not always.  And if kvm-lock were held
 all the time, there would be no point in using SRCU.  ;-)

This is protected by kvm-irq_lock within kvm_set_irq_routing().
Entries are added to a copy of the list, and the top-level table pointer
is swapped (via rcu_assign_pointer(), as it should be) while holding the
lock.  Finally, we synchronize with the RSCS before deleting the old
copy.  It looks to me like the original author got this part right, so I
didn't modify it outside of converting to SRCU.

 
 +int r;

 -while(i--) {
 -int r;
 -r = irq_set[i].set(irq_set[i], kvm, irq_source_id, level);
 -if (r  0)
 -continue;
 +r = e-set(e, kvm, irq_source_id, level);
 +if (r  0)
 +continue;

 -ret = r + ((ret  0) ? 0 : ret);
 -}
 +ret = r + ((ret  0) ? 0 : ret);
 +}
 +

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Gleb Natapov wrote:
 On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
rcu_read_unlock();

kvm_set_irq_routing() {
   mutex_lock();
   irq_rt = table;
   rcu_assign_pointer();
   mutex_unlock();
   synchronize_rcu();

   kfree(irq_rt);

irq_rt-entry-set(); /* bad */

 This is not what happens. irq_rt is never accessed outside read-side
 critical section.

Sorry, I was generalizing to keep the comments short.  I figured it
would be clear what I was actually saying, but realize in retrospect
that I was a little ambiguous.

Yes, irq_rt is not accessed outside the RSCS.  However, the entry
pointers stored in the irq_rt-map are, and this is equally problematic
afaict.

In this particular case we seem to never delete entries at run-time once
they are established.  Therefore, while perhaps sloppy, its technically
safe to leave them unprotected from this perspective.  The issue is more
related to shutdown since a kvm_set_irq() caller could be within the
aforementioned race-region and call entry-set() after the guest is
gone.  Or did I miss something?

 Data is copied from irq_rt onto the stack and this copy is accessed
 outside critical section.

As mentioned above, I do not believe this really protect us.  And even
if it did, the copy is just a work-around to avoid sleeping within the
standard RCU RSCS, which is what SRCU is designed for.  So rather than
inventing an awkward two-phased stack based solution, it's better to
reuse the provided tools, IMO.

To flip it around:  Is there any reason why an SRCU would not work here,
and thus we were forced to use something like the stack-copy approach?

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-27 Thread Alexander Graf



On 27.10.2009, at 09:56, Avi Kivity wrote:


On 10/27/2009 01:21 AM, Olof Johansson wrote:

On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:



For some reason, I'm not seeing this build break, but the patch is
obviously correct.

Acked-by: Hollis Blanchard holl...@us.ibm.com


I saw it when building with pasemi_defconfig + manually enabled KVM  
options (all available).




Alex, can you fold this patch in?


I can, but it's only partly related. My patches don't even touch  
timing.c. The only thing I can imagine resulting in a breakage is that  
my patches allow for an =M setting.


So IMHO this patch should be applied before my series. Should I stick  
it as first patch in my git tree?


Alex
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

RE: -cpu host AMD Host

2009-10-27 Thread Martin Gallant

I still see the problem with the kernel patch applied.

/proc/cpuinfo attached

Will post a formal bug report with all dumps in a few minutes.

Thanks for the quick response.

--
Marty

-Original Message-
From: Avi Kivity [mailto:a...@redhat.com] 
Sent: Tuesday, October 27, 2009 4:58 AM
To: Martin Gallant
Cc: kvm@vger.kernel.org
Subject: Re: -cpu host AMD Host

On 10/27/2009 11:39 AM, Avi Kivity wrote:
 On 10/26/2009 09:55 PM, Martin Gallant wrote:
 Is -cpu host supported on AMD hosts?


 Yes.

 Whenever I try to use this option on a Windows Vista/7 client, I get 
 blue screen.
 Removing the option, the client works fine.

 Host kernel 2.6.31.4.  Userspace is qemu-kvm-0.11.0.  (Previous 
 versions fail too)

 /proc/cpuinfo snippet:
 vendor_id   : AuthenticAMD
 cpu family  : 15
 model   : 107
 model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+

 Please post the flags : field as well.


You might try the attached patch.


--
error compiling committee.c: too many arguments to function

processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
stepping: 2
cpu MHz : 2700.000
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 0
cpu cores   : 2
apicid  : 0
initial apicid  : 0
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 
3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic 
cr8_legacy 3dnowprefetch
bogomips: 5429.84
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

processor   : 1
vendor_id   : AuthenticAMD
cpu family  : 15
model   : 107
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 5200+
stepping: 2
cpu MHz : 2700.000
cache size  : 512 KB
physical id : 0
siblings: 2
core id : 1
cpu cores   : 2
apicid  : 1
initial apicid  : 1
fpu : yes
fpu_exception   : yes
cpuid level : 1
wp  : yes
flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov 
pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 
3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic 
cr8_legacy 3dnowprefetch
bogomips: 5429.84
TLB size: 1024 4K pages
clflush size: 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc 100mhzsteps

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Gregory Haskins wrote:
 Gleb Natapov wrote:
 On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
rcu_read_unlock();

kvm_set_irq_routing() {
   mutex_lock();
   irq_rt = table;
   rcu_assign_pointer();
   mutex_unlock();
   synchronize_rcu();

   kfree(irq_rt);

irq_rt-entry-set(); /* bad */

 This is not what happens. irq_rt is never accessed outside read-side
 critical section.
 
 Sorry, I was generalizing to keep the comments short.  I figured it
 would be clear what I was actually saying, but realize in retrospect
 that I was a little ambiguous.

Here is a revised problem statement

thread-1thread-2
---

kvm_set_irq() {
   rcu_read_lock()
   irq_rt = rcu_dereference(table);
   entry_cache = get_entries(irq_rt);
   rcu_read_unlock();

invalidate_entries(irq_rt);

   for_each_entry(entry_cache)
  entry-set(); /* bad */

-


invalidate_entries() may be any operation that deletes an entry at
run-time (doesn't exist today), or as the guest is shutting down.  As
far as I can tell, the current code does not protect us from either
condition, and my proposed patch protects us from both.  Did I miss
anything?

HTH
-Greg




signature.asc
Description: OpenPGP digital signature

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gleb Natapov

On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote:
 Gleb Natapov wrote:
  On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
  The current code suffers from the following race condition:
 
  thread-1thread-2
  ---
 
  kvm_set_irq() {
 rcu_read_lock()
 irq_rt = rcu_dereference(table);
 rcu_read_unlock();
 
 kvm_set_irq_routing() {
mutex_lock();
irq_rt = table;
rcu_assign_pointer();
mutex_unlock();
synchronize_rcu();
 
kfree(irq_rt);
 
 irq_rt-entry-set(); /* bad */
 
  This is not what happens. irq_rt is never accessed outside read-side
  critical section.
 
 Sorry, I was generalizing to keep the comments short.  I figured it
 would be clear what I was actually saying, but realize in retrospect
 that I was a little ambiguous.
 
A little is underestimation :) There is not /* bad */ line in the code!

 Yes, irq_rt is not accessed outside the RSCS.  However, the entry
 pointers stored in the irq_rt-map are, and this is equally problematic
 afaict.
The pointer is in text and can't disappear without kvm_set_irq()
disappearing too.

 
 In this particular case we seem to never delete entries at run-time once
 they are established.  Therefore, while perhaps sloppy, its technically
 safe to leave them unprotected from this perspective.  The issue is more
 related to shutdown since a kvm_set_irq() caller could be within the
 aforementioned race-region and call entry-set() after the guest is
 gone.  Or did I miss something?
 
The caller of kvm_set_irq() should hold reference to kvm instance, so it
can't disappear while you are inside kvm_set_irq(). RCU protects only
kvm-irq_routing not kvm structure itself.

  Data is copied from irq_rt onto the stack and this copy is accessed
  outside critical section.
 
 As mentioned above, I do not believe this really protect us.  And even
I don't see the prove it doesn't, so I assume it does.

 if it did, the copy is just a work-around to avoid sleeping within the
It is not a work-around. There was two solutions to the problem one is
to call -set() outside rcu critical section another is to use SRCU. I
decided to use the first one. This way the code is much simpler and I
remember asking Paul what are the disadvantages of using SRCU and there
was something.

 standard RCU RSCS, which is what SRCU is designed for.  So rather than
 inventing an awkward two-phased stack based solution, it's better to
 reuse the provided tools, IMO.
 
 To flip it around:  Is there any reason why an SRCU would not work here,
 and thus we were forced to use something like the stack-copy approach?
 
If SRCU has no disadvantage comparing to RCU why not use it always? :)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[ kvm-Bugs-2887189 ] -cpu host Blue-Screens M$ Guests

2009-10-27 Thread SourceForge.net

Bugs item #2887189, was opened at 2009-10-27 09:04
Message generated for change (Tracker Item Submitted) made by martyg7
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2887189group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: amd
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Martin Gallant (martyg7)
Assigned to: Nobody/Anonymous (nobody)
Summary: -cpu host Blue-Screens M$ Guests

Initial Comment:
On my 64-bit AMD host, whenever I start a Windows guest machine with the -cpu 
host flag set, the guest blue-screens during startup.
I have reproduced this with 32-bit vista, 32-bit W7, and 64-bit W7 boot disks.
Everything is fine with the -cpu host flag removed - I have been running this 
setup since the beginning of 2009.

CPU - AMD 64 x2 (/proc/cpuinfo attached)
Userspace - qemu-kvm-0.11.0
Kernel - v2.6.31.5 (amd64)
Invocations - attached
Problem persists with -no-kvm-irqchip or -no-kvm-pit switch
With -no-kvm, invocation returns Unable to find x86 CPU definition

I keep up-to-date git trees for the kernel and kvm userspace - Will gladly test 
patches

--

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detailatid=893831aid=2887189group_id=180599
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gleb Natapov

On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote:
 Gregory Haskins wrote:
  Gleb Natapov wrote:
  On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
  The current code suffers from the following race condition:
 
  thread-1thread-2
  ---
 
  kvm_set_irq() {
 rcu_read_lock()
 irq_rt = rcu_dereference(table);
 rcu_read_unlock();
 
 kvm_set_irq_routing() {
mutex_lock();
irq_rt = table;
rcu_assign_pointer();
mutex_unlock();
synchronize_rcu();
 
kfree(irq_rt);
 
 irq_rt-entry-set(); /* bad */
 
  This is not what happens. irq_rt is never accessed outside read-side
  critical section.
  
  Sorry, I was generalizing to keep the comments short.  I figured it
  would be clear what I was actually saying, but realize in retrospect
  that I was a little ambiguous.
 
 Here is a revised problem statement
 
 thread-1thread-2
 ---
 
 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
entry_cache = get_entries(irq_rt);
rcu_read_unlock();
 
 invalidate_entries(irq_rt);
 
for_each_entry(entry_cache)
 entry-set(); /* bad */
 
 -
 
 
 invalidate_entries() may be any operation that deletes an entry at
 run-time (doesn't exist today), or as the guest is shutting down.  As
 far as I can tell, the current code does not protect us from either
 condition, and my proposed patch protects us from both.  Did I miss
 anything?
 
Yes. What happened to irq_rt is completely irrelevant at the point you
marked /* bad */.

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vhost-net patches

2009-10-27 Thread Shirley Ma

Hello Michael,

On Tue, 2009-10-27 at 08:43 +0200, Michael S. Tsirkin wrote:
 At some point my guest had a runaway nash-hotplug process
 consuming 100% CPU. Could you please verify this
 does not happen to you?

What I have found that the start_xmit stopped and restarted too often.
There is no vring descriptor available for adding the new buf. The buf
release is not able to keep up after vhost patch?

I saw lots of Unexpected full queue from dmesg. I haven't had time to
debug it yet. You might have some idea here?

/* This can happen with OOM and indirect buffers. */
if (unlikely(capacity  0)) {
netif_stop_queue(dev);
dev_warn(dev-dev, Unexpected full queue\n);
if (unlikely(!vi-svq-vq_ops-enable_cb(vi-svq))) {
vi-svq-vq_ops-disable_cb(vi-svq);
netif_start_queue(dev);
goto again;
}
return NETDEV_TX_BUSY;
}

Thanks
Shirley

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Gleb Natapov wrote:
 On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote:
 Gleb Natapov wrote:
 On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
rcu_read_unlock();

kvm_set_irq_routing() {
   mutex_lock();
   irq_rt = table;
   rcu_assign_pointer();
   mutex_unlock();
   synchronize_rcu();

   kfree(irq_rt);

irq_rt-entry-set(); /* bad */

 This is not what happens. irq_rt is never accessed outside read-side
 critical section.
 Sorry, I was generalizing to keep the comments short.  I figured it
 would be clear what I was actually saying, but realize in retrospect
 that I was a little ambiguous.

 A little is underestimation :) There is not /* bad */ line in the code!

Sorry, that was my own highlighting, not trying to reflect actual code.

 
 Yes, irq_rt is not accessed outside the RSCS.  However, the entry
 pointers stored in the irq_rt-map are, and this is equally problematic
 afaict.
 The pointer is in text and can't disappear without kvm_set_irq()
 disappearing too.

No, the entry* pointer is .text _AND_ .data, and is subject to standard
synchronization rules like most other objects.

Unless I am misreading the code, the entry* pointers point to heap
within the irq_rt pointer.  Therefore, the kfree(irq_rt) I mention
above effectively invalidates the entire set of entry* pointers that you
are caching, and is thus an issue.

 
 In this particular case we seem to never delete entries at run-time once
 they are established.  Therefore, while perhaps sloppy, its technically
 safe to leave them unprotected from this perspective.

Note: I was wrong in this statement.  I forgot that it's not safe at
run-time either since the entry objects are part of irq_rt.

 The issue is more
 related to shutdown since a kvm_set_irq() caller could be within the
 aforementioned race-region and call entry-set() after the guest is
 gone.  Or did I miss something?

 The caller of kvm_set_irq() should hold reference to kvm instance, so it
 can't disappear while you are inside kvm_set_irq(). RCU protects only
 kvm-irq_routing not kvm structure itself.

Agreed, but this has nothing to do with protecting the entry* pointers.

 
 Data is copied from irq_rt onto the stack and this copy is accessed
 outside critical section.
 As mentioned above, I do not believe this really protect us.  And even
 I don't see the prove it doesn't, so I assume it does.

What would you like to see beyond what I've already provided you?  I can
show how the entry pointers are allocated as part of the irq_rt, and I
can show how the irq_rt (via entry-set) access is racy against
invalidation.

 
 if it did, the copy is just a work-around to avoid sleeping within the
 It is not a work-around. There was two solutions to the problem one is
 to call -set() outside rcu critical section

This is broken afaict without taking additional precautions, such as a
reference count on the irq_rt structure, but I mentioned this alternate
solution in my header.

 another is to use SRCU. I
 decided to use the first one. This way the code is much simpler

simpler is debatable, but ok.  SRCU is an established pattern
available in the upstream kernel, so I don't think its particularly
complicated or controversial to use.

 and I remember asking Paul what are the disadvantages of using SRCU and there
 was something.
 

The disadvantages to my knowledge are as follows:

1) rcu_read_lock is something like 4x faster than srcu_read_lock(), but
we are talking about nanoseconds on modern hardware (I think Paul quoted
me 10ns vs 45ns on his rig).  I don't think either overhead is something
to be concerned about in this case.

2) standard rcu supports deferred synchronization (call_rcu()), as well
as barriers (synchronize_rcu()), whereas SRCU only supports barriers
(synchronize_srcu()).  We only use the barrier type in this code path,
so that is not an issue.

3) SRCU requires explicit initialization/cleanup, whereas regular RCU
does not.  Trivially solved in my patch since KVM has plenty of
init/cleanup hook points.

 standard RCU RSCS, which is what SRCU is designed for.  So rather than
 inventing an awkward two-phased stack based solution, it's better to
 reuse the provided tools, IMO.

 To flip it around:  Is there any reason why an SRCU would not work here,
 and thus we were forced to use something like the stack-copy approach?

 If SRCU has no disadvantage comparing to RCU why not use it always? :)

No one is debating that

Re: vhost-net patches

2009-10-27 Thread Shirley Ma

On Tue, 2009-10-27 at 08:38 +0200, Michael S. Tsirkin wrote:
 Yes but you need to make host send packets out to tap as well,
 somehow. One way to do this is to assign IP address in
 a separate subnet to tap in host and to eth device in guest.

Thanks for the hint, I will make a try.

Shirley

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Paul E. McKenney

On Tue, Oct 27, 2009 at 04:02:37PM +0200, Gleb Natapov wrote:
 On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote:

[ . . . ]

  standard RCU RSCS, which is what SRCU is designed for.  So rather than
  inventing an awkward two-phased stack based solution, it's better to
  reuse the provided tools, IMO.
  
  To flip it around:  Is there any reason why an SRCU would not work here,
  and thus we were forced to use something like the stack-copy approach?
  
 If SRCU has no disadvantage comparing to RCU why not use it always? :)

The disadvantages of SRCU compared to RCU include the following:

1.  SRCU requires that the return value of srcu_read_lock()
be fed into srcu_read_unlock().  This is usually not a problem,
but can be painful if there are multiple levels of function
call separating the two.

2.  SRCU's grace periods are about 4x slower than those of RCU.
And they also don't scale all that well with extremely large
numbers of CPUs (but this can be fixed when/if it becomes a
real problem).

3.  SRCU's read-side primitives are also significantly slower than
those of RCU.

4.  SRCU does not have a call_srcu().  One could be provided, but
its semantics would be a bit strange due to the need to limit
the number of callbacks, given that general blocking is
permitted in SRCU read-side critical sections.  (And it would
take some doing to convince me to supply an SRCU!)

5.  The current SRCU has no reasonable way to implement read-side
priority boosting, as there is no record of which task
is read-holding which SRCU.

Hey, you asked!  ;-)

Thanx, Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Gleb Natapov wrote:
 On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote:
 Gregory Haskins wrote:
 Gleb Natapov wrote:
 On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
rcu_read_unlock();

kvm_set_irq_routing() {
   mutex_lock();
   irq_rt = table;
   rcu_assign_pointer();
   mutex_unlock();
   synchronize_rcu();

   kfree(irq_rt);

irq_rt-entry-set(); /* bad */

 This is not what happens. irq_rt is never accessed outside read-side
 critical section.
 Sorry, I was generalizing to keep the comments short.  I figured it
 would be clear what I was actually saying, but realize in retrospect
 that I was a little ambiguous.
 Here is a revised problem statement

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
entry_cache = get_entries(irq_rt);
rcu_read_unlock();

 invalidate_entries(irq_rt);

for_each_entry(entry_cache)
entry-set(); /* bad */

 -


 invalidate_entries() may be any operation that deletes an entry at
 run-time (doesn't exist today), or as the guest is shutting down.  As
 far as I can tell, the current code does not protect us from either
 condition, and my proposed patch protects us from both.  Did I miss
 anything?

 Yes. What happened to irq_rt is completely irrelevant at the point you
 marked /* bad */.

kfree() happened to irq_rt, and thus to the objects behind the pointers
in entry_cache at the point I marked /* bad */.

That certainly isn't /* good */ ;)

-Greg



signature.asc
Description: OpenPGP digital signature

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Thanks for this, Paul.

Some questions and statements below.

Paul E. McKenney wrote:
 On Tue, Oct 27, 2009 at 04:02:37PM +0200, Gleb Natapov wrote:
 On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote:
 
 [ . . . ]
 
 standard RCU RSCS, which is what SRCU is designed for.  So rather than
 inventing an awkward two-phased stack based solution, it's better to
 reuse the provided tools, IMO.

 To flip it around:  Is there any reason why an SRCU would not work here,
 and thus we were forced to use something like the stack-copy approach?

 If SRCU has no disadvantage comparing to RCU why not use it always? :)
 
 The disadvantages of SRCU compared to RCU include the following:
 
 1.SRCU requires that the return value of srcu_read_lock()
   be fed into srcu_read_unlock().  This is usually not a problem,
   but can be painful if there are multiple levels of function
   call separating the two.

Right, and this is simple/neat w.r.t. its usage in irq_routing, so no
problem there.

 
 2.SRCU's grace periods are about 4x slower than those of RCU.
   And they also don't scale all that well with extremely large
   numbers of CPUs (but this can be fixed when/if it becomes a
   real problem).

The irq_routing update path is extremely infrequent, so this should not
be an issue.

 
 3.SRCU's read-side primitives are also significantly slower than
   those of RCU.
 

Are the 10ns vs 45ns numbers that I mentioned in my last reply the
proper ballpark?  How do these compare to an atomic-op, say an
uncontended spinlock on modern hardware?  The assumption is that
srcu_read_lock() should be significantly cheaper than a read-lock().  If
its not, then we might as well use something else, I suppose.  But if
its not, I guess you probably wouldn't have bothered to invent it in the
first place ;)

 4.SRCU does not have a call_srcu().  One could be provided, but
   its semantics would be a bit strange due to the need to limit
   the number of callbacks, given that general blocking is
   permitted in SRCU read-side critical sections.  (And it would
   take some doing to convince me to supply an SRCU!)

This is not an issue in our design.

 
 5.The current SRCU has no reasonable way to implement read-side
   priority boosting, as there is no record of which task
   is read-holding which SRCU.

Given the infrequency of the update path, I do not see this as a problem.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature

Re: Virtio block module slower than IDE

2009-10-27 Thread Floris Bos


Hi,



On Tue, 27 Oct 2009 11:34:57 +0200, Avi Kivity a...@redhat.com wrote:

 On 10/27/2009 02:13 AM, Floris Bos wrote:

 I am running Proxmox 1.4 (which uses the 2.6.30.1 kvm modules) and am

 experiencing performance problems with Linux guests using the virtio_blk

 module.

 Especially with random IO it is a lot slower than IDE.



 

 Try switching the host I/O scheduler to deadline.  You can do that at 

 runtime in /sys/block/.../queue.



Thanks for the tip.

Seems to improve random IO performance by 11 times!



==

Version 1.03c   --Sequential Output-- --Sequential Input-

--Random-

-Per Chr- --Block-- -Rewrite- -Per Chr- --Block--

--Seeks--

MachineSize K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec

%CP

ubuntu910   16G 38776  97 61414   5 32451   7 34711  73 95325   9  7174

 23

--Sequential Create-- Random

Create

-Create-- --Read--- -Delete-- -Create-- --Read---

-Delete--

  files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec

%CP

 16 + +++ + +++ + +++ + +++ + +++ +

+++

ubuntu910,16G,38776,97,61414,5,32451,7,34711,73,95325,9,7174.2,23,16,+,+++,+,+++,+,+++,+,+++,+,+++,+,+++

==





Yours sincerely,



Floris Bos
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gleb Natapov

On Tue, Oct 27, 2009 at 10:50:45AM -0400, Gregory Haskins wrote:
 Gleb Natapov wrote:
  On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote:
  Gregory Haskins wrote:
  Gleb Natapov wrote:
  On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
  The current code suffers from the following race condition:
 
  thread-1thread-2
  ---
 
  kvm_set_irq() {
 rcu_read_lock()
 irq_rt = rcu_dereference(table);
 rcu_read_unlock();
 
 kvm_set_irq_routing() {
mutex_lock();
irq_rt = table;
rcu_assign_pointer();
mutex_unlock();
synchronize_rcu();
 
kfree(irq_rt);
 
 irq_rt-entry-set(); /* bad */
 
  This is not what happens. irq_rt is never accessed outside read-side
  critical section.
  Sorry, I was generalizing to keep the comments short.  I figured it
  would be clear what I was actually saying, but realize in retrospect
  that I was a little ambiguous.
  Here is a revised problem statement
 
  thread-1thread-2
  ---
 
  kvm_set_irq() {
 rcu_read_lock()
 irq_rt = rcu_dereference(table);
 entry_cache = get_entries(irq_rt);
 rcu_read_unlock();
 
  invalidate_entries(irq_rt);
 
 for_each_entry(entry_cache)
   entry-set(); /* bad */
 
  -
 
 
  invalidate_entries() may be any operation that deletes an entry at
  run-time (doesn't exist today), or as the guest is shutting down.  As
  far as I can tell, the current code does not protect us from either
  condition, and my proposed patch protects us from both.  Did I miss
  anything?
 
  Yes. What happened to irq_rt is completely irrelevant at the point you
  marked /* bad */.
 
 kfree() happened to irq_rt, and thus to the objects behind the pointers
 in entry_cache at the point I marked /* bad */.
The entire entry is cached not a pointer to an entry! kfree().

 
 That certainly isn't /* good */ ;)
 
It looks like we are looking at different code :)

--
Gleb.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 0/3] ioctl fixes

2009-10-27 Thread Marcelo Tosatti

See individual patches for details.


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 1/3] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP

2009-10-27 Thread Marcelo Tosatti

Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.

CC: sta...@kernel.org
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -2285,6 +2285,9 @@ long kvm_arch_vm_ioctl(struct file *filp
goto out;
break;
case KVM_CREATE_IRQCHIP:
+   r = -EEXIST;
+   if (kvm-arch.vpic)
+   goto out;
r = -ENOMEM;
kvm-arch.vpic = kvm_create_pic(kvm);
if (kvm-arch.vpic) {
@@ -2300,6 +2303,8 @@ long kvm_arch_vm_ioctl(struct file *filp
if (r) {
kfree(kvm-arch.vpic);
kfree(kvm-arch.vioapic);
+   kvm-arch.vpic = NULL;
+   kvm-arch.vioapic = NULL;
goto out;
}
break;


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 2/3] KVM: x86: disallow KVM_{SET,GET}_LAPIC without in kernel irqchip

2009-10-27 Thread Marcelo Tosatti

Otherwise kvm might attempt to dereference a NULL pointer.

CC: sta...@kernel.org
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/arch/x86/kvm/x86.c
===
--- kvm.orig/arch/x86/kvm/x86.c
+++ kvm/arch/x86/kvm/x86.c
@@ -1815,6 +1815,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi
 
switch (ioctl) {
case KVM_GET_LAPIC: {
+   r = -EINVAL;
+   if (!irqchip_in_kernel(vcpu-kvm))
+   goto out;
lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
 
r = -ENOMEM;
@@ -1830,6 +1833,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi
break;
}
case KVM_SET_LAPIC: {
+   r = -EINVAL;
+   if (!irqchip_in_kernel(vcpu-kvm))
+   goto out;
lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
r = -ENOMEM;
if (!lapic)


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[patch 3/3] KVM: only clear irq_source_id if irqchip is present

2009-10-27 Thread Marcelo Tosatti

Otherwise kvm might attempt to dereference a NULL pointer.

CC: sta...@kernel.org
Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

Index: kvm/virt/kvm/irq_comm.c
===
--- kvm.orig/virt/kvm/irq_comm.c
+++ kvm/virt/kvm/irq_comm.c
@@ -243,6 +243,10 @@ void kvm_free_irq_source_id(struct kvm *
printk(KERN_ERR kvm: IRQ source ID out of range!\n);
goto unlock;
}
+   clear_bit(irq_source_id, kvm-arch.irq_sources_bitmap);
+   if (!irqchip_in_kernel(kvm))
+   goto unlock;
+
for (i = 0; i  KVM_IOAPIC_NUM_PINS; i++) {
clear_bit(irq_source_id, kvm-arch.vioapic-irq_states[i]);
if (i = 16)
@@ -251,7 +255,6 @@ void kvm_free_irq_source_id(struct kvm *
clear_bit(irq_source_id, pic_irqchip(kvm)-irq_states[i]);
 #endif
}
-   clear_bit(irq_source_id, kvm-arch.irq_sources_bitmap);
 unlock:
mutex_unlock(kvm-irq_lock);
 }


--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/4] megaraid_sas HBA emulation

2009-10-27 Thread Hannes Reinecke

Hi all,

this patchset implements an emulation for the megaraid_sas HBA.
It provides emulates an LSI MegaRAID SAS 8708EM2 HBA, ie
presenting to the guest a virtual SCSI adapter.
Internally it is using aio for read/write requests and
either SG_IO or SCSI command emulation for everything else.

The reason for choosing the megaraid_sas HBA and not, say,
implementing a virtio scsi interface is because:
- the megaraid_sas is using a very simple firmware interface,
  comparable to virtio
- the HBA driver are already existent, so I only have to
  write the backend :-)

The device can be accessed by

-drive if=raid,file=XXX

In order to support SCSI command emulation I had to update /
patch up the existing SCSI disk support. This might be
not to everyones taste, so I'm open to alternative
suggestions.

But I certainly do _not_ want to update the SCSI disk
emulation, as this is really quite tied to the SCSI parallel
interface used by the old lsi53c895a.c.
Plus it doesn't do scatter-gather list handling, which
is quite impossible to fix without proper documentation.

Of course, if anyone else would step in here, I won't object :-)

It currently runs guests with 2.6.27 and up; Windows XP
support is not quite there yet. Anything else might work;
if not, enable debugging and sent me the logfile.

As usual, comment / suggestions  etc welcome.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke   zSeries  Storage
h...@suse.de  +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/4] Add 'raid' interface class

2009-10-27 Thread Hannes Reinecke


This patch adds a 'raid' interface class. It is basically a clone
of the existing 'scsi' interface, only allowing up to 128 disks.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 hw/pc.c  |5 +
 hw/pci-hotplug.c |1 +
 hw/scsi-disk.c   |   17 +
 hw/scsi-disk.h   |   20 +++-
 qemu-config.c|2 +-
 sysemu.h |3 ++-
 vl.c |8 ++--
 7 files changed, 51 insertions(+), 5 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 83012a9..26aad4c 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -1345,6 +1345,11 @@ static void pc_init1(ram_addr_t ram_size,
for (bus = 0; bus = max_bus; bus++) {
 pci_create_simple(pci_bus, -1, lsi53c895a);
 }
+
+   max_bus = drive_get_max_bus(IF_RAID);
+   for (bus = 0; bus = max_bus; bus++) {
+   pci_create_simple(pci_bus, -1, megasas);
+   }
 }
 
 if (extboot_drive) {
diff --git a/hw/pci-hotplug.c b/hw/pci-hotplug.c
index 410fa3f..855a1ad 100644
--- a/hw/pci-hotplug.c
+++ b/hw/pci-hotplug.c
@@ -85,6 +85,7 @@ void drive_hot_add(Monitor *mon, const QDict *qdict)
 
 switch (type) {
 case IF_SCSI:
+case IF_RAID:
 if (pci_read_devaddr(mon, pci_addr, dom, pci_bus, slot)) {
 goto err;
 }
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 2a9268a..68b4e83 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -41,6 +41,7 @@ do { fprintf(stderr, scsi-disk:  fmt , ## __VA_ARGS__); } 
while (0)
 
 #define SCSI_DMA_BUF_SIZE131072
 #define SCSI_MAX_INQUIRY_LEN 256
+#define SCSI_SENSE_LEN 18
 
 #define SCSI_REQ_STATUS_RETRY 0x01
 
@@ -136,6 +137,22 @@ static SCSIRequest *scsi_find_request(SCSIDiskState *s, 
uint32_t tag)
 return r;
 }
 
+/* Helper function to build a sense block */
+int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t sense)
+{
+memset(sense_buf, 0, SCSI_SENSE_LEN);
+if (!sense)
+   return 0;
+
+sense_buf[0] = 0xf0; /* current, fixed format */
+sense_buf[2] = (sense  16)  0x0F;
+sense_buf[7] = 10;
+sense_buf[12] = (sense  8 )  0xFF;
+sense_buf[13] = sense  0xFF;
+
+return SCSI_SENSE_LEN;
+}
+
 /* Helper function for command completion.  */
 static void scsi_command_complete(SCSIRequest *r, int status, int sense)
 {
diff --git a/hw/scsi-disk.h b/hw/scsi-disk.h
index b6b6c12..5b54272 100644
--- a/hw/scsi-disk.h
+++ b/hw/scsi-disk.h
@@ -9,6 +9,23 @@ enum scsi_reason {
 SCSI_REASON_DATA  /* Transfer complete, more data required.  */
 };
 
+/* LUN not ready, Manual intervention required */
+#define SENSE_LUN_NOT_READY 0x020403
+/* Hardware error, I/O process terminated */
+#define SENSE_IO_ERROR 0x040006
+/* Hardware error, I_T Nexus loss occured */
+#define SENSE_TAG_NOT_FOUND 0x042907
+/* Hardware error, internal target failure */
+#define SENSE_TARGET_FAILURE 0x044400
+/* Illegal request, invalid command operation code */
+#define SENSE_INVALID_OPCODE 0x052000
+/* Illegal request, LBA out of range */
+#define SENSE_LBA_OUT_OF_RANGE 0x052100
+/* Illegal request, Invalid field in CDB */
+#define SENSE_INVALID_FIELD 0x052400
+/* Illegal request, LUN not supported */
+#define SENSE_LUN_NOT_SUPPORTED 0x052500
+
 typedef struct SCSIBus SCSIBus;
 typedef struct SCSIDevice SCSIDevice;
 typedef struct SCSIDeviceInfo SCSIDeviceInfo;
@@ -49,7 +66,7 @@ struct SCSIBus {
 int tcq, ndev;
 scsi_completionfn complete;
 
-SCSIDevice *devs[8];
+SCSIDevice *devs[128];
 };
 
 void scsi_bus_new(SCSIBus *bus, DeviceState *host, int tcq, int ndev,
@@ -63,5 +80,6 @@ static inline SCSIBus *scsi_bus_from_device(SCSIDevice *d)
 
 SCSIDevice *scsi_bus_legacy_add_drive(SCSIBus *bus, DriveInfo *dinfo, int 
unit);
 void scsi_bus_legacy_handle_cmdline(SCSIBus *bus);
+int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t sense);
 
 #endif
diff --git a/qemu-config.c b/qemu-config.c
index 4fb7898..8d7a137 100644
--- a/qemu-config.c
+++ b/qemu-config.c
@@ -18,7 +18,7 @@ QemuOptsList qemu_drive_opts = {
 },{
 .name = if,
 .type = QEMU_OPT_STRING,
-.help = interface (ide, scsi, sd, mtd, floppy, pflash, virtio),
+.help = interface (ide, scsi, raid, sd, mtd, floppy, pflash, 
virtio),
 },{
 .name = index,
 .type = QEMU_OPT_NUMBER,
diff --git a/sysemu.h b/sysemu.h
index 2ef3797..8ed0b8c 100644
--- a/sysemu.h
+++ b/sysemu.h
@@ -159,7 +159,7 @@ extern unsigned int nb_prom_envs;
 
 typedef enum {
 IF_NONE,
-IF_IDE, IF_SCSI, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, IF_XEN,
+IF_IDE, IF_SCSI, IF_RAID, IF_FLOPPY, IF_PFLASH, IF_MTD, IF_SD, IF_VIRTIO, 
IF_XEN,
 IF_COUNT
 } BlockInterfaceType;
 
@@ -185,6 +185,7 @@ typedef struct DriveInfo {
 
 #define MAX_IDE_DEVS   2
 #define MAX_SCSI_DEVS  7
+#define MAX_RAID_DEVS  128
 #define MAX_DRIVES 32
 
 extern QTAILQ_HEAD(drivelist, DriveInfo) drives;
diff --git a/vl.c b/vl.c
index 5dc7b2b..404afc3 100644
--- a/vl.c
+++ b/vl.c
@@ -2065,6 +2065,9 @@

[PATCH 2/4] megasas: LSI MegaRAID SAS HBA emulation

2009-10-27 Thread Hannes Reinecke


This patch add an emulation for the LSI MegaRAID SAS HBA. It is
using SG_IO to forward / pass through SCSI commands to the
underlying block driver, so no emulation is done currently.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 Makefile.hw  |2 +-
 hw/megasas.c | 1134 ++
 hw/pci_ids.h |2 +
 3 files changed, 1137 insertions(+), 1 deletions(-)
 create mode 100644 hw/megasas.c

diff --git a/Makefile.hw b/Makefile.hw
index de1db31..cae35f9 100644
--- a/Makefile.hw
+++ b/Makefile.hw
@@ -33,7 +33,7 @@ obj-y += wdt_i6300esb.o
 obj-y += ne2000.o
 
 # SCSI layer
-obj-y += lsi53c895a.o
+obj-y += lsi53c895a.o megasas.o
 obj-$(CONFIG_ESP) += esp.o
 
 obj-y += dma-helpers.o sysbus.o isa-bus.o
diff --git a/hw/megasas.c b/hw/megasas.c
new file mode 100644
index 000..a57e8e0
--- /dev/null
+++ b/hw/megasas.c
@@ -0,0 +1,1134 @@
+/*
+ * QEMU MegaRAID SAS 8708EM2 Host Bus Adapter emulation
+ *
+ * Copyright (c) 2009 Hannes Reinecke, SUSE Linux Products GmbH
+ *
+ * This code is licenced under the LGPL.
+ */
+
+
+#include assert.h
+
+#include hw.h
+#include pci.h
+#include scsi.h
+#include scsi-disk.h
+#include block_int.h
+#ifdef __linux__
+# include scsi/sg.h
+#endif
+
+#undef DEBUG_MEGASAS
+#undef DEBUG_MEGASAS_REG
+#undef DEBUG_MEGASAS_QUEUE
+#undef DEBUG_MEGASAS_MFI
+
+#ifdef DEBUG_MEGASAS
+#define DPRINTF(fmt, ...) \
+do { printf(megasas:  fmt , ## __VA_ARGS__); } while (0)
+#define BADF(fmt, ...) \
+do { fprintf(stderr, megasas: error:  fmt , ## __VA_ARGS__); exit(1);} while 
(0)
+#else
+#define DPRINTF(fmt, ...) do {} while(0)
+#define BADF(fmt, ...) \
+do { fprintf(stderr, megasas: error:  fmt , ## __VA_ARGS__);} while (0)
+#endif
+
+/* Static definitions */
+#define MEGASAS_MAX_FRAMES 64
+#define MEGASAS_MAX_SGE 8
+
+/* SCSI definitions */
+#define SAM_STAT_GOOD0x00
+#define SAM_STAT_CHECK_CONDITION 0x02
+#define SAM_STAT_CONDITION_MET   0x04
+#define SAM_STAT_BUSY0x08
+#define SAM_STAT_TASK_ABORTED0x40
+
+/* Register definitions */
+#defineMEGASAS_INBOUND_MSG_0   0x0010
+#defineMEGASAS_INBOUND_MSG_1   0x0014
+#defineMEGASAS_OUTBOUND_MSG_0  0x0018
+#defineMEGASAS_OUTBOUND_MSG_1  0x001C
+#defineMEGASAS_INBOUND_DOORBELL0x0020
+#defineMEGASAS_INBOUND_INTR_STATUS 0x0024
+#defineMEGASAS_INBOUND_INTR_MASK   0x0028
+#defineMEGASAS_OUTBOUND_DOORBELL   0x002C
+#defineMEGASAS_OUTBOUND_INTR_STATUS0x0030
+#defineMEGASAS_OUTBOUND_INTR_MASK  0x0034
+#defineMEGASAS_INBOUND_QUEUE_PORT  0x0040
+#defineMEGASAS_OUTBOUND_QUEUE_PORT 0x0044
+#defineMEGASAS_OUTBOUND_DOORBELL_CLEAR 0x00A0
+#defineMEGASAS_OUTBOUND_SCRATCH_PAD0x00B0
+#defineMEGASAS_INBOUND_LOW_QUEUE_PORT  0x00C0
+#defineMEGASAS_INBOUND_HIGH_QUEUE_PORT 0x00C4
+
+/* FW commands */
+#define MFI_INIT_ABORT 0x0001
+#define MFI_INIT_READY 0x0002
+#define MFI_INIT_MFIMODE   0x0004
+#define MFI_INIT_CLEAR_HANDSHAKE   0x0008
+#define MFI_INIT_HOTPLUG   0x0010
+#define MFI_STOP_ADP   0x0020
+
+/* MFI states */
+#define MFI_STATE_UNDEFINED0x0
+#define MFI_STATE_BB_INIT  0x1
+#define MFI_STATE_FW_INIT  0x4
+#define MFI_STATE_WAIT_HANDSHAKE   0x6
+#define MFI_STATE_FW_INIT_20x7
+#define MFI_STATE_DEVICE_SCAN  0x8
+#define MFI_STATE_BOOT_MESSAGE_PENDING 0x9
+#define MFI_STATE_FLUSH_CACHE  0xA
+#define MFI_STATE_READY0xB
+#define MFI_STATE_OPERATIONAL  0xC
+#define MFI_STATE_FAULT0xF
+
+/*
+ * MFI command opcodes
+ */
+#define MFI_CMD_INIT   0x00
+#define MFI_CMD_LD_READ0x01
+#define MFI_CMD_LD_WRITE   0x02
+#define MFI_CMD_LD_SCSI_IO 0x03
+#define MFI_CMD_PD_SCSI_IO 0x04
+#define MFI_CMD_DCMD   0x05
+#define MFI_CMD_ABORT  0x06
+#define MFI_CMD_SMP0x07
+#define MFI_CMD_STP0x08
+
+#define MR_DCMD_CTRL_GET_INFO  0x0101
+
+#define MR_DCMD_CTRL_CACHE_FLUSH   0x01101000
+#define MR_FLUSH_CTRL_CACHE0x01
+#define MR_FLUSH_DISK_CACHE0x02
+
+#define MR_DCMD_CTRL_SHUTDOWN  0x0105
+#define MR_DCMD_HIBERNATE_SHUTDOWN 0x0106
+#define MR_ENABLE_DRIVE_SPINDOWN   0x01
+
+#define

[PATCH 3/4] scsi-disk: Factor out SCSI command emulation

2009-10-27 Thread Hannes Reinecke


Other drives might want to use SCSI command emulation without
going through the SCSI disk abstraction, as this imposes too
many limits on the emulation.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 block.c|   15 ++
 block.h|3 +
 block_int.h|1 +
 hw/scsi-disk.c |  610 ++--
 hw/scsi-disk.h |3 +
 5 files changed, 346 insertions(+), 286 deletions(-)

diff --git a/block.c b/block.c
index 33f3d65..06f92c4 100644
--- a/block.c
+++ b/block.c
@@ -930,6 +930,21 @@ int bdrv_is_sg(BlockDriverState *bs)
 return bs-sg;
 }
 
+void bdrv_set_sg(BlockDriverState *bs, int set)
+{
+bs-sg = set;
+}
+
+int bdrv_get_tcq(BlockDriverState *bs)
+{
+return bs-tcq;
+}
+
+void bdrv_set_tcq(BlockDriverState *bs, int set)
+{
+bs-tcq = set;
+}
+
 int bdrv_enable_write_cache(BlockDriverState *bs)
 {
 return bs-enable_write_cache;
diff --git a/block.h b/block.h
index a966afb..7862fa0 100644
--- a/block.h
+++ b/block.h
@@ -134,9 +134,12 @@ void bdrv_get_geometry_hint(BlockDriverState *bs,
 int *pcyls, int *pheads, int *psecs);
 int bdrv_get_type_hint(BlockDriverState *bs);
 int bdrv_get_translation_hint(BlockDriverState *bs);
+int bdrv_get_tcq(BlockDriverState *bs);
+void bdrv_set_tcq(BlockDriverState *bs, int set);
 int bdrv_is_removable(BlockDriverState *bs);
 int bdrv_is_read_only(BlockDriverState *bs);
 int bdrv_is_sg(BlockDriverState *bs);
+void bdrv_set_sg(BlockDriverState *bs, int set);
 int bdrv_enable_write_cache(BlockDriverState *bs);
 int bdrv_is_inserted(BlockDriverState *bs);
 int bdrv_media_changed(BlockDriverState *bs);
diff --git a/block_int.h b/block_int.h
index 8e72abe..e5ee57b 100644
--- a/block_int.h
+++ b/block_int.h
@@ -129,6 +129,7 @@ struct BlockDriverState {
 int encrypted; /* if true, the media is encrypted */
 int valid_key; /* if true, a valid encryption key has been set */
 int sg;/* if true, the device is a /dev/sg* */
+int tcq;   /* if true, the device supports tagged command queueing */
 /* event callback when inserting/removing */
 void (*change_cb)(void *opaque);
 void *change_opaque;
diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c
index 68b4e83..3e68518 100644
--- a/hw/scsi-disk.c
+++ b/hw/scsi-disk.c
@@ -56,7 +56,7 @@ typedef struct SCSIRequest {
 /* Both sector and sector_count are in terms of qemu 512 byte blocks.  */
 uint64_t sector;
 uint32_t sector_count;
-struct iovec iov;
+struct iovec *iov;
 QEMUIOVector qiov;
 BlockDriverAIOCB *aiocb;
 struct SCSIRequest *next;
@@ -72,7 +72,8 @@ struct SCSIDiskState
This is the number of 512 byte blocks in a single scsi sector.  */
 int cluster_size;
 uint64_t max_lba;
-int sense;
+uint8_t sense[SCSI_SENSE_LEN];
+uint8_t sense_len;
 char drive_serial_str[21];
 QEMUBH *bh;
 };
@@ -90,13 +91,12 @@ static SCSIRequest *scsi_new_request(SCSIDevice *d, 
uint32_t tag)
 free_requests = r-next;
 } else {
 r = qemu_malloc(sizeof(SCSIRequest));
-r-iov.iov_base = qemu_memalign(512, SCSI_DMA_BUF_SIZE);
+   r-iov = NULL;
 }
 r-bus = scsi_bus_from_device(d);
 r-dev = s;
 r-tag = tag;
 r-sector_count = 0;
-r-iov.iov_len = 0;
 r-aiocb = NULL;
 r-status = 0;
 
@@ -126,6 +126,17 @@ static void scsi_remove_request(SCSIRequest *r)
 free_requests = r;
 }
 
+static void *scsi_allocate_iovec(SCSIRequest *r) {
+if (!r-iov) {
+   r-iov = qemu_malloc(sizeof(struct iovec));
+   if (!r-iov)
+   return NULL;
+   r-iov-iov_base = qemu_memalign(512, SCSI_DMA_BUF_SIZE);
+   r-iov-iov_len = SCSI_DMA_BUF_SIZE;
+}
+return r-iov;
+}
+
 static SCSIRequest *scsi_find_request(SCSIDiskState *s, uint32_t tag)
 {
 SCSIRequest *r;
@@ -137,12 +148,11 @@ static SCSIRequest *scsi_find_request(SCSIDiskState *s, 
uint32_t tag)
 return r;
 }
 
-/* Helper function to build a sense block */
 int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t sense)
 {
 memset(sense_buf, 0, SCSI_SENSE_LEN);
 if (!sense)
-   return 0;
+   return 0;
 
 sense_buf[0] = 0xf0; /* current, fixed format */
 sense_buf[2] = (sense  16)  0x0F;
@@ -154,15 +164,19 @@ int32_t scsi_build_sense(uint8_t *sense_buf, uint32_t 
sense)
 }
 
 /* Helper function for command completion.  */
-static void scsi_command_complete(SCSIRequest *r, int status, int sense)
+static void scsi_command_complete(SCSIRequest *r, int status, uint32_t sense)
 {
 SCSIDiskState *s = r-dev;
 uint32_t tag;
-DPRINTF(Command complete tag=0x%x status=%d sense=%d\n, r-tag, status, 
sense);
-s-sense = sense;
+
+DPRINTF(Command complete tag=0x%x status=%d sense=%d\n, r-tag,
+   status, s-sense_len);
+if (status == STATUS_CHECK_CONDITION) {
+   s-sense_len = scsi_build_sense(s-sense, sense);
+}
 tag = r-tag;
 scsi_remove_request(r);
-r-bus-complete(r-bus,

[PATCH 4/4] megasas: Add SCSI command emulation

2009-10-27 Thread Hannes Reinecke


Now that we can use SCSI command emulation without using the SCSI
disk abstraction we can easily add it to the megasas HBA.

Signed-off-by: Hannes Reinecke h...@suse.de
---
 hw/megasas.c |   88 +++---
 1 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/hw/megasas.c b/hw/megasas.c
index a57e8e0..f32b313 100644
--- a/hw/megasas.c
+++ b/hw/megasas.c
@@ -661,40 +661,55 @@ static int megasas_handle_scsi(MPTState *s, uint8_t fcmd,
 }
 }
 
-memset(cmd-hdr, 0, sizeof(struct sg_io_hdr));
-cmd-hdr.interface_id = 'S';
-cmd-hdr.cmd_len = cdb_len;
-cmd-hdr.cmdp = cdb;
-cmd-hdr.iovec_count = cmd-sge_count;
-cmd-hdr.dxferp = cmd-iov;
-for (n = 0; n  cmd-sge_count; n++)
-   cmd-hdr.dxfer_len += cmd-iov[n].iov_len;
-if (cmd-sge_count) {
-   if (dir)
-   cmd-hdr.dxfer_direction = SG_DXFER_TO_DEV;
-   else
-   cmd-hdr.dxfer_direction = SG_DXFER_FROM_DEV;
-} else {
-   cmd-hdr.dxfer_direction = SG_DXFER_NONE;
-}
-cmd-hdr.sbp = cmd-sense;
-cmd-hdr.mx_sb_len = cmd-sense_len;
+if (bdrv_is_sg(cmd-lun-bdrv)) {
+   memset(cmd-hdr, 0, sizeof(struct sg_io_hdr));
+   cmd-hdr.interface_id = 'S';
+   cmd-hdr.cmd_len = cdb_len;
+   cmd-hdr.cmdp = cdb;
+   cmd-hdr.iovec_count = cmd-sge_count;
+   cmd-hdr.dxferp = cmd-iov;
+   for (n = 0; n  cmd-sge_count; n++)
+   cmd-hdr.dxfer_len += cmd-iov[n].iov_len;
+   if (cmd-sge_count) {
+   if (dir)
+   cmd-hdr.dxfer_direction = SG_DXFER_TO_DEV;
+   else
+   cmd-hdr.dxfer_direction = SG_DXFER_FROM_DEV;
+   } else {
+   cmd-hdr.dxfer_direction = SG_DXFER_NONE;
+   }
+   cmd-hdr.sbp = cmd-sense;
+   cmd-hdr.mx_sb_len = cmd-sense_len;
 
-ret = bdrv_ioctl(cmd-lun-bdrv, SG_IO, cmd-hdr);
-if (ret) {
-   DPRINTF(SCSI pthru dev %x lun %x failed with %d\n,
-   target, lun, errno);
-   sense_len = scsi_build_sense(cmd-sense, SENSE_IO_ERROR);
-   cmd-sge_size = 0;
-   scsi_status = SAM_STAT_CHECK_CONDITION;
-} else if (cmd-hdr.status) {
-   sense_len = cmd-hdr.sb_len_wr;
-   scsi_status = cmd-hdr.status;
-   cmd-sge_size = cmd-hdr.dxfer_len;
-   scsi_status = SAM_STAT_CHECK_CONDITION;
+   ret = bdrv_ioctl(cmd-lun-bdrv, SG_IO, cmd-hdr);
+   if (ret) {
+   DPRINTF(SCSI pthru dev %x lun %x failed with %d\n,
+   target, lun, errno);
+   sense_len = scsi_build_sense(cmd-sense, SENSE_IO_ERROR);
+   cmd-sge_size = 0;
+   scsi_status = SAM_STAT_CHECK_CONDITION;
+   } else if (cmd-hdr.status) {
+   sense_len = cmd-hdr.sb_len_wr;
+   scsi_status = cmd-hdr.status;
+   cmd-sge_size = cmd-hdr.dxfer_len;
+   scsi_status = SAM_STAT_CHECK_CONDITION;
+   } else {
+   sense_len = 0;
+   cmd-sge_size = cmd-hdr.dxfer_len;
+   }
 } else {
-   sense_len = 0;
-   cmd-sge_size = cmd-hdr.dxfer_len;
+   uint32_t sense;
+
+   DPRINTF(Emulate SCSI pthru cmd %x\n, cdb[0]);
+   sense = scsi_emulate_command(cmd-lun-bdrv, 0, cdb,
+cmd-iov[0].iov_len,
+cmd-iov[0].iov_base,
+cmd-sge_size);
+   sense_len = scsi_build_sense(cmd-sense, sense);
+   if (sense_len)
+   scsi_status = SAM_STAT_CHECK_CONDITION;
+   else
+   scsi_status = SAM_STAT_GOOD;
 }
 out:
 megasas_unmap_sense(cmd, sense_len);
@@ -1105,13 +1120,16 @@ static int megasas_scsi_init(PCIDevice *dev)
lun-bdrv = NULL;
continue;
}
+   bdrv_set_tcq(lun-bdrv, 1);
/* check if we can use SG_IO */
ret = bdrv_ioctl(lun-bdrv, SG_IO, hdr);
if (ret) {
-   DPRINTF(SCSI cmd passthrough not available on dev %d (error %d)\n,
+   DPRINTF(Using SCSI cmd emulation on dev %d (error %d)\n,
unit, ret);
-   lun-sdev = NULL;
-   lun-bdrv = NULL;
+   bdrv_set_sg(lun-bdrv, 0);
+   } else {
+   DPRINTF(Using SCSI cmd passthrough on dev %d\n, unit);
+   bdrv_set_sg(lun-bdrv, 1);
}
 }
 register_savevm(megasas, -1, 0, megasas_scsi_save, megasas_scsi_load, s);
-- 
1.6.0.2

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: vhost-net patches

2009-10-27 Thread Michael S. Tsirkin

On Tue, Oct 27, 2009 at 07:46:59AM -0700, Shirley Ma wrote:
 Hello Michael,
 
 On Tue, 2009-10-27 at 08:43 +0200, Michael S. Tsirkin wrote:
  At some point my guest had a runaway nash-hotplug process
  consuming 100% CPU. Could you please verify this
  does not happen to you?
 
 What I have found that the start_xmit stopped and restarted too often.
 There is no vring descriptor available for adding the new buf. The buf
 release is not able to keep up after vhost patch?

Possibly GFP_ATOMIC allocations in vring_add_indirect are failing?
Is there a chance you are tight on guest memory for some reason?
with vhost, virtio does currently consume a bit more memory than
with userspace backend.

 I saw lots of Unexpected full queue from dmesg. I haven't had time to
 debug it yet. You might have some idea here?
 
 /* This can happen with OOM and indirect buffers. */
 if (unlikely(capacity  0)) {
 netif_stop_queue(dev);
 dev_warn(dev-dev, Unexpected full queue\n);
 if (unlikely(!vi-svq-vq_ops-enable_cb(vi-svq))) {
 vi-svq-vq_ops-disable_cb(vi-svq);
 netif_start_queue(dev);
 goto again;
 }
 return NETDEV_TX_BUSY;
 }
 
 Thanks
 Shirley
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gleb Natapov

On Tue, Oct 27, 2009 at 10:47:49AM -0400, Gregory Haskins wrote:
 Gleb Natapov wrote:
  On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote:
  Gleb Natapov wrote:
  On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
  The current code suffers from the following race condition:
 
  thread-1thread-2
  ---
 
  kvm_set_irq() {
 rcu_read_lock()
 irq_rt = rcu_dereference(table);
 rcu_read_unlock();
 
 kvm_set_irq_routing() {
mutex_lock();
irq_rt = table;
rcu_assign_pointer();
mutex_unlock();
synchronize_rcu();
 
kfree(irq_rt);
 
 irq_rt-entry-set(); /* bad */
 
  This is not what happens. irq_rt is never accessed outside read-side
  critical section.
  Sorry, I was generalizing to keep the comments short.  I figured it
  would be clear what I was actually saying, but realize in retrospect
  that I was a little ambiguous.
 
  A little is underestimation :) There is not /* bad */ line in the code!
 
 Sorry, that was my own highlighting, not trying to reflect actual code.
 
  
  Yes, irq_rt is not accessed outside the RSCS.  However, the entry
  pointers stored in the irq_rt-map are, and this is equally problematic
  afaict.
  The pointer is in text and can't disappear without kvm_set_irq()
  disappearing too.
 
 No, the entry* pointer is .text _AND_ .data, and is subject to standard
 synchronization rules like most other objects.
 
 Unless I am misreading the code, the entry* pointers point to heap
 within the irq_rt pointer.  Therefore, the kfree(irq_rt) I mention
 above effectively invalidates the entire set of entry* pointers that you
 are caching, and is thus an issue.
 
I think you are missing that the content of the entry is copied, not
pointer to the entry:
irq_set[i++] = *e;
  
  In this particular case we seem to never delete entries at run-time once
  they are established.  Therefore, while perhaps sloppy, its technically
  safe to leave them unprotected from this perspective.
 
 Note: I was wrong in this statement.  I forgot that it's not safe at
 run-time either since the entry objects are part of irq_rt.
 
  The issue is more
  related to shutdown since a kvm_set_irq() caller could be within the
  aforementioned race-region and call entry-set() after the guest is
  gone.  Or did I miss something?
 
  The caller of kvm_set_irq() should hold reference to kvm instance, so it
  can't disappear while you are inside kvm_set_irq(). RCU protects only
  kvm-irq_routing not kvm structure itself.
 
 Agreed, but this has nothing to do with protecting the entry* pointers.
 
There are not used outside critical section.

  
  Data is copied from irq_rt onto the stack and this copy is accessed
  outside critical section.
  As mentioned above, I do not believe this really protect us.  And even
  I don't see the prove it doesn't, so I assume it does.
 
 What would you like to see beyond what I've already provided you?  I can
 show how the entry pointers are allocated as part of the irq_rt, and I
 can show how the irq_rt (via entry-set) access is racy against
 invalidation.
 
  
  if it did, the copy is just a work-around to avoid sleeping within the
  It is not a work-around. There was two solutions to the problem one is
  to call -set() outside rcu critical section
 
 This is broken afaict without taking additional precautions, such as a
 reference count on the irq_rt structure, but I mentioned this alternate
 solution in my header.
 
  another is to use SRCU. I
  decided to use the first one. This way the code is much simpler
 
 simpler is debatable, but ok.  SRCU is an established pattern
 available in the upstream kernel, so I don't think its particularly
 complicated or controversial to use.
 
  and I remember asking Paul what are the disadvantages of using SRCU and 
  there
  was something.
  
 
 The disadvantages to my knowledge are as follows:
 
 1) rcu_read_lock is something like 4x faster than srcu_read_lock(), but
 we are talking about nanoseconds on modern hardware (I think Paul quoted
 me 10ns vs 45ns on his rig).  I don't think either overhead is something
 to be concerned about in this case.
 
If we can avoid why not? Nanoseconds tend to add up.

 2) standard rcu supports deferred synchronization (call_rcu()), as well
 as barriers (synchronize_rcu()), whereas SRCU only supports barriers
 (synchronize_srcu()).  We only use the barrier type in this code path,
 so that is not an issue.
Agree.

 
 3) SRCU requires explicit initialization/cleanup, whereas regular RCU
 does not.  Trivially solved in my patch since KVM has plenty of
 init/cleanup hook points.
 
No problem

[PATCH v2] fix qemu-kvm sigsegv at exit

2009-10-27 Thread Marcelo Tosatti


Michael reported a qemu-kvm SIGSEGV at shutdown:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x411d0940 (LWP 14446)]
0x0040afb4 in qemu_mod_timer (ts=0x19f0fd0,
expire_time=62275467335)
at /home/mst/scm/qemu-kvm/vl.c:1009
1009if ((alarm_timer-flags  ALARM_FLAG_EXPIRED) == 0)
{
(gdb) l
1004ts-next = *pt;
1005*pt = ts;
1006
1007/* Rearm if necessary  */
1008if (pt == active_timers[ts-clock-type]) {
1009if ((alarm_timer-flags  ALARM_FLAG_EXPIRED) == 0)
{
1010qemu_rearm_alarm_timer(alarm_timer);
1011}
1012/* Interrupt execution to force deadline
recalculation.  */
1013if (use_icount)
(gdb) p alarm_timer
$1 = (struct qemu_alarm_timer *) 0x0

Problem is kvm_main_loop_wait(env, 0) can process the stop request
(signalling iothread that vcpu is stopped, so its OK to exit) and
continue to kvm_cpu_exec.

Reorder kvm_main_loop_wait and kvm_cpu_exec, as suggested by Gleb.

Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
Reported-by: Michael S. Tsirkin m...@redhat.com

diff --git a/qemu-kvm.c b/qemu-kvm.c
index 4c13628..809fd65 100644
--- a/qemu-kvm.c
+++ b/qemu-kvm.c
@@ -1867,8 +1867,8 @@ static int kvm_main_loop_cpu(CPUState *env)
 run_cpu = !env-halted;
 }
 if (run_cpu) {
-kvm_main_loop_wait(env, 0);
 kvm_cpu_exec(env);
+kvm_main_loop_wait(env, 0);
 } else {
 kvm_main_loop_wait(env, 1000);
 }
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Gleb Natapov wrote:
 On Tue, Oct 27, 2009 at 10:50:45AM -0400, Gregory Haskins wrote:
 Gleb Natapov wrote:
 On Tue, Oct 27, 2009 at 10:00:15AM -0400, Gregory Haskins wrote:
 Gregory Haskins wrote:
 Gleb Natapov wrote:
 On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
 The current code suffers from the following race condition:

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
rcu_read_unlock();

kvm_set_irq_routing() {
   mutex_lock();
   irq_rt = table;
   rcu_assign_pointer();
   mutex_unlock();
   synchronize_rcu();

   kfree(irq_rt);

irq_rt-entry-set(); /* bad */

 This is not what happens. irq_rt is never accessed outside read-side
 critical section.
 Sorry, I was generalizing to keep the comments short.  I figured it
 would be clear what I was actually saying, but realize in retrospect
 that I was a little ambiguous.
 Here is a revised problem statement

 thread-1thread-2
 ---

 kvm_set_irq() {
rcu_read_lock()
irq_rt = rcu_dereference(table);
entry_cache = get_entries(irq_rt);
rcu_read_unlock();

 invalidate_entries(irq_rt);

for_each_entry(entry_cache)
  entry-set(); /* bad */

 -


 invalidate_entries() may be any operation that deletes an entry at
 run-time (doesn't exist today), or as the guest is shutting down.  As
 far as I can tell, the current code does not protect us from either
 condition, and my proposed patch protects us from both.  Did I miss
 anything?

 Yes. What happened to irq_rt is completely irrelevant at the point you
 marked /* bad */.
 kfree() happened to irq_rt, and thus to the objects behind the pointers
 in entry_cache at the point I marked /* bad */.
 The entire entry is cached not a pointer to an entry! kfree().

light bulb goes off

Ah, I see.  I missed that detail that it was a structure copy, not a
pointer copy.

My bad.  You are right, and I am wrong.  I retract the 1/3 patch.

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-27 Thread Avi Kivity


On 10/27/2009 03:42 PM, Alexander Graf wrote:


I can, but it's only partly related. My patches don't even touch 
timing.c. The only thing I can imagine resulting in a breakage is that 
my patches allow for an =M setting.


So IMHO this patch should be applied before my series. Should I stick 
it as first patch in my git tree?




No need, I'll apply it independently.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 0/5]: Fix kdump under KVM

2009-10-27 Thread Chris Lalancette


This patch series aims to get kdump working inside a KVM guest.
The current problem with using kdump is that KVM always delivers
PIT interrupts to the BSP, and the BSP only.  While this is
technically allowed by the MPS spec, most motherboards actually
deliver timer interrupts to *any* LAPIC in virtual wire mode.
Since a crash can occur on any CPU, timer interrupts must
be able to reach any CPU in order for kdump to work properly.

Therefore, this patch series kicks all of the relevant vCPUs
when delivering a timer interrupt.  With these patches in
place, kdump in a RHEL-5 guest works properly.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 1/5] Fix up some comments around the source tree.

2009-10-27 Thread Chris Lalancette

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 34b700f... ba61f27... M  arch/x86/kvm/svm.c
:100644 100644 38a2d20... cd6f92b... M  virt/kvm/ioapic.c
:100644 100644 bd44fb4... c22bc17... M  virt/kvm/kvm_main.c
 arch/x86/kvm/svm.c  |2 +-
 virt/kvm/ioapic.c   |2 +-
 virt/kvm/kvm_main.c |2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 34b700f..ba61f27 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -2608,7 +2608,7 @@ static void svm_complete_interrupts(struct vcpu_svm *svm)
break;
case SVM_EXITINTINFO_TYPE_EXEPT:
/* In case of software exception do not reinject an exception
-  vector, but re-execute and instruction instead */
+  vector, but re-execute an instruction instead */
if (is_nested(svm))
break;
if (kvm_exception_is_soft(vector))
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index 38a2d20..cd6f92b 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -164,7 +164,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.shorthand = 0;
 
 #ifdef CONFIG_X86
-   /* Always delivery PIT interrupt to vcpu 0 */
+   /* Always deliver PIT interrupt to vcpu 0 */
if (irq == 0) {
irqe.dest_mode = 0; /* Physical mode. */
/* need to read apic_id from apic regiest since
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index bd44fb4..c22bc17 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -1180,7 +1180,7 @@ static int create_vcpu_fd(struct kvm_vcpu *vcpu)
 }
 
 /*
- * Creates some virtual cpus.  Good luck creating more than one.
+ * Creates a virtual cpu.
  */
 static int kvm_vm_ioctl_create_vcpu(struct kvm *kvm, u32 id)
 {
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[PATCH 3/5] Remove references to VCPU in i8254

2009-10-27 Thread Chris Lalancette

Conceptually, the i8254 is hooked to a PIC or IOAPIC.  Therefore,
this patch removes most references to vcpu in i8254.c.
There are two exceptions to this:

1)  In pit_timer_fn, we still have to kick the BSP to wake it out
of idle.  This will be changed in a later patch.

2)  In __kvm_migrate_pit_timer, we have to migrate the PIT
around with the BSP, since hrtimers work on a per-CPU basis.
I've added a comment here to clarify why this is needed.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 fab7440... d5c08fa... M  arch/x86/kvm/i8254.c
:100644 100644 d4c1c7f... d7bc40b... M  arch/x86/kvm/i8254.h
:100644 100644 96dfbb6... ab3a56e... M  arch/x86/kvm/irq.c
:100644 100644 c025a23... e16b968... M  arch/x86/kvm/irq.h
:100644 100644 55c7524... ba39e25... M  arch/x86/kvm/kvm_timer.h
 arch/x86/kvm/i8254.c |   50 ++---
 arch/x86/kvm/i8254.h |4 ++-
 arch/x86/kvm/irq.c   |4 +-
 arch/x86/kvm/irq.h   |2 -
 arch/x86/kvm/kvm_timer.h |3 ++
 5 files changed, 50 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index fab7440..d5c08fa 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -33,6 +33,7 @@
 
 #include irq.h
 #include i8254.h
+#include kvm_timer.h
 
 #ifndef CONFIG_X86_64
 #define mod_64(x, y) ((x) - (y) * div64_u64(x, y))
@@ -227,12 +228,13 @@ static void pit_latch_status(struct kvm *kvm, int channel)
}
 }
 
-int pit_has_pending_timer(struct kvm_vcpu *vcpu)
+int pit_has_pending_timer(struct kvm *kvm)
 {
-   struct kvm_pit *pit = vcpu-kvm-arch.vpit;
+   struct kvm_pit *pit = kvm-arch.vpit;
 
-   if (pit  kvm_vcpu_is_bsp(vcpu)  pit-pit_state.irq_ack)
+   if (pit  pit-pit_state.irq_ack)
return atomic_read(pit-pit_state.pit_timer.pending);
+
return 0;
 }
 
@@ -252,6 +254,13 @@ void __kvm_migrate_pit_timer(struct kvm_vcpu *vcpu)
struct kvm_pit *pit = vcpu-kvm-arch.vpit;
struct hrtimer *timer;
 
+   /*
+* technically, the PIT isn't hooked to a particular VCPU;
+* the logical structure is PIT - [IOA]PIC - CPU[s].  However,
+* hrtimers expire on a per-cpu basis, and since we initially
+* created the hrtimer during BSP creation, we move it around
+* with the BSP.
+*/
if (!kvm_vcpu_is_bsp(vcpu) || !pit)
return;
 
@@ -277,6 +286,33 @@ static struct kvm_timer_ops kpit_ops = {
.is_periodic = kpit_is_periodic,
 };
 
+static enum hrtimer_restart pit_timer_fn(struct hrtimer *data)
+{
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   int restart_timer = 0;
+
+   /*
+* There is a race window between reading and incrementing, but we do
+* not care about potentially losing timer events in the !reinject
+* case anyway.
+*/
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
+   atomic_inc(ktimer-pending);
+
+   if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq))
+   wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   restart_timer = 1;
+   }
+
+   if (restart_timer)
+   return HRTIMER_RESTART;
+   else
+   return HRTIMER_NORESTART;
+}
+
 static void create_pit_timer(struct kvm_kpit_state *ps, u32 val, int is_period)
 {
struct kvm_timer *pt = ps-pit_timer;
@@ -291,10 +327,9 @@ static void create_pit_timer(struct kvm_kpit_state *ps, 
u32 val, int is_period)
pt-period = interval;
ps-is_periodic = is_period;
 
-   pt-timer.function = kvm_timer_fn;
+   pt-timer.function = pit_timer_fn;
pt-t_ops = kpit_ops;
pt-kvm = ps-pit-kvm;
-   pt-vcpu = pt-kvm-bsp_vcpu;
 
atomic_set(pt-pending, 0);
ps-irq_ack = 1;
@@ -705,10 +740,9 @@ static void __inject_pit_timer_intr(struct kvm *kvm)
kvm_apic_nmi_wd_deliver(vcpu);
 }
 
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu)
+void kvm_inject_pit_timer_irqs(struct kvm *kvm)
 {
-   struct kvm_pit *pit = vcpu-kvm-arch.vpit;
-   struct kvm *kvm = vcpu-kvm;
+   struct kvm_pit *pit = kvm-arch.vpit;
struct kvm_kpit_state *ps;
 
if (pit) {
diff --git a/arch/x86/kvm/i8254.h b/arch/x86/kvm/i8254.h
index d4c1c7f..d7bc40b 100644
--- a/arch/x86/kvm/i8254.h
+++ b/arch/x86/kvm/i8254.h
@@ -49,10 +49,12 @@ struct kvm_pit {
 #define KVM_MAX_PIT_INTR_INTERVAL   HZ / 100
 #define KVM_PIT_CHANNEL_MASK   0x3
 
-void kvm_inject_pit_timer_irqs(struct kvm_vcpu *vcpu);
+int pit_has_pending_timer(struct kvm *kvm);
+void kvm_inject_pit_timer_irqs(struct kvm *kvm);
 void kvm_pit_load_count(struct kvm *kvm, int channel, u32 val, int 
hpet_legacy_start);
 struct kvm_pit *kvm_create_pit(struct kvm *kvm, u32 flags);
 void kvm_free_pit(struct kvm *kvm);

[PATCH 5/5] Fix kdump under KVM.

2009-10-27 Thread Chris Lalancette

This patch is the main point of the series.  In order for
kdump to properly work inside a KVM guest, we need to make
sure that all VCPUs in virtual wire APIC mode get kicked
to try and pick up the timer interrupts.  To do this,
we iterate over the CPUs and deliver interrupts to the
proper VCPUs.

I don't love the concept of doing kvm_irq_kick_vcpus() from
within pit_timer_fn().  A PIT is not connected to a CPU at all,
only to a PIC or APIC.  However, if a CPU enters idle, this is
the only way to wake it up to check for the interrupt.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 d5c08fa... fe08823... M  arch/x86/kvm/i8254.c
:100644 100644 40f... 5b699c1... M  arch/x86/kvm/lapic.c
:100644 100644 40010b0... 9c4e52b... M  arch/x86/kvm/lapic.h
:100644 100644 053e49f... 975b0d6... M  include/linux/kvm_host.h
:100644 100644 cd6f92b... 717d265... M  virt/kvm/ioapic.c
:100644 100644 0d454d3... d24ac91... M  virt/kvm/irq_comm.c
 arch/x86/kvm/i8254.c |3 +--
 arch/x86/kvm/lapic.c |   12 
 arch/x86/kvm/lapic.h |1 +
 include/linux/kvm_host.h |2 ++
 virt/kvm/ioapic.c|9 -
 virt/kvm/irq_comm.c  |   12 
 6 files changed, 28 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kvm/i8254.c b/arch/x86/kvm/i8254.c
index d5c08fa..fe08823 100644
--- a/arch/x86/kvm/i8254.c
+++ b/arch/x86/kvm/i8254.c
@@ -299,8 +299,7 @@ static enum hrtimer_restart pit_timer_fn(struct hrtimer 
*data)
if (ktimer-reinject || !atomic_read(ktimer-pending))
atomic_inc(ktimer-pending);
 
-   if (waitqueue_active(ktimer-kvm-bsp_vcpu-wq))
-   wake_up_interruptible(ktimer-kvm-bsp_vcpu-wq);
+   kvm_irq_kick_vcpus(ktimer-kvm);
 
if (ktimer-t_ops-is_periodic(ktimer)) {
hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 40f..5b699c1 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1031,6 +1031,18 @@ void kvm_apic_nmi_wd_deliver(struct kvm_vcpu *vcpu)
kvm_apic_local_deliver(apic, APIC_LVT0);
 }
 
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu)
+{
+   if (kvm_lapic_enabled(vcpu)) {
+   u32 lvt0 = apic_get_reg(vcpu-arch.apic, APIC_LVT0);
+   if ((lvt0  APIC_LVT_MASKED) == 0 
+   GET_APIC_DELIVERY_MODE(lvt0) == APIC_MODE_EXTINT)
+   return 1;
+   }
+
+   return 0;
+}
+
 static struct kvm_timer_ops lapic_timer_ops = {
.is_periodic = lapic_is_periodic,
 };
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 40010b0..9c4e52b 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -30,6 +30,7 @@ void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long 
cr8);
 void kvm_lapic_set_base(struct kvm_vcpu *vcpu, u64 value);
 u64 kvm_lapic_get_base(struct kvm_vcpu *vcpu);
 void kvm_apic_set_version(struct kvm_vcpu *vcpu);
+int kvm_apic_in_virtual_wire_mode(struct kvm_vcpu *vcpu);
 
 int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
 int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 053e49f..975b0d6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -542,6 +542,8 @@ int kvm_set_irq_routing(struct kvm *kvm,
unsigned flags);
 void kvm_free_irq_routing(struct kvm *kvm);
 
+void kvm_irq_kick_vcpus(struct kvm *kvm);
+
 #else
 
 static inline void kvm_free_irq_routing(struct kvm *kvm) {}
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index cd6f92b..717d265 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -163,15 +163,6 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.level = 1;
irqe.shorthand = 0;
 
-#ifdef CONFIG_X86
-   /* Always deliver PIT interrupt to vcpu 0 */
-   if (irq == 0) {
-   irqe.dest_mode = 0; /* Physical mode. */
-   /* need to read apic_id from apic regiest since
-* it can be rewritten */
-   irqe.dest_id = ioapic-kvm-bsp_vcpu-vcpu_id;
-   }
-#endif
return kvm_irq_delivery_to_apic(ioapic-kvm, NULL, irqe);
 }
 
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 0d454d3..d24ac91 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -293,6 +293,18 @@ void kvm_free_irq_routing(struct kvm *kvm)
kfree(kvm-irq_routing);
 }
 
+void kvm_irq_kick_vcpus(struct kvm *kvm)
+{
+   int i;
+   struct kvm_vcpu *vcpu;
+
+   kvm_for_each_vcpu(i, vcpu, kvm) {
+   if (kvm_apic_in_virtual_wire_mode(vcpu))
+   if (waitqueue_active(vcpu-wq))
+   wake_up_interruptible(vcpu-wq);
+   }
+}
+
 static int setup_routing_entry(struct kvm_irq_routing_table *rt,
   struct kvm_kernel_irq_routing_entry *e,

[PATCH 4/5] Remove timer.c

2009-10-27 Thread Chris Lalancette

The code in arch/x86/kvm/timer.c is not similar enough between
the various implementations to really share it.  Move the
implementation into the LAPIC code, and then remove timer.c

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 31a7035... 8d9adf6... M  arch/x86/kvm/Makefile
:100644 100644 ba39e25... 55b2dab... M  arch/x86/kvm/kvm_timer.h
:100644 100644 cd60c0b... 40f... M  arch/x86/kvm/lapic.c
:100644 00 72b5144... 000... D  arch/x86/kvm/timer.c
 arch/x86/kvm/Makefile|3 +-
 arch/x86/kvm/kvm_timer.h |3 --
 arch/x86/kvm/lapic.c |   34 -
 arch/x86/kvm/timer.c |   47 --
 4 files changed, 34 insertions(+), 53 deletions(-)

diff --git a/arch/x86/kvm/Makefile b/arch/x86/kvm/Makefile
index 31a7035..8d9adf6 100644
--- a/arch/x86/kvm/Makefile
+++ b/arch/x86/kvm/Makefile
@@ -10,8 +10,7 @@ kvm-y += $(addprefix ../../../virt/kvm/, 
kvm_main.o ioapic.o \
assigned-dev.o)
 kvm-$(CONFIG_IOMMU_API)+= $(addprefix ../../../virt/kvm/, iommu.o)
 
-kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o \
-  i8254.o timer.o
+kvm-y  += x86.o mmu.o emulate.o i8259.o irq.o lapic.o i8254.o
 kvm-intel-y+= vmx.o
 kvm-amd-y  += svm.o
 
diff --git a/arch/x86/kvm/kvm_timer.h b/arch/x86/kvm/kvm_timer.h
index ba39e25..55b2dab 100644
--- a/arch/x86/kvm/kvm_timer.h
+++ b/arch/x86/kvm/kvm_timer.h
@@ -15,7 +15,4 @@ struct kvm_timer_ops {
 bool (*is_periodic)(struct kvm_timer *);
 };
 
-
-enum hrtimer_restart kvm_timer_fn(struct hrtimer *data);
-
 #endif
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index cd60c0b..40f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1040,6 +1040,38 @@ static const struct kvm_io_device_ops apic_mmio_ops = {
.write= apic_mmio_write,
 };
 
+static enum hrtimer_restart lapic_timer_fn(struct hrtimer *data)
+{
+   struct kvm_vcpu *vcpu;
+   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
+   int restart_timer = 0;
+
+   vcpu = ktimer-vcpu;
+   if (!vcpu)
+   return HRTIMER_NORESTART;
+
+   /*
+* There is a race window between reading and incrementing, but we do
+* not care about potentially losing timer events in the !reinject
+* case anyway.
+*/
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
+   atomic_inc(ktimer-pending);
+
+   if (waitqueue_active(vcpu-wq))
+   wake_up_interruptible(vcpu-wq);
+
+   if (ktimer-t_ops-is_periodic(ktimer)) {
+   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
+   restart_timer = 1;
+   }
+
+   if (restart_timer)
+   return HRTIMER_RESTART;
+   else
+   return HRTIMER_NORESTART;
+}
+
 int kvm_create_lapic(struct kvm_vcpu *vcpu)
 {
struct kvm_lapic *apic;
@@ -1065,7 +1097,7 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu)
 
hrtimer_init(apic-lapic_timer.timer, CLOCK_MONOTONIC,
 HRTIMER_MODE_ABS);
-   apic-lapic_timer.timer.function = kvm_timer_fn;
+   apic-lapic_timer.timer.function = lapic_timer_fn;
apic-lapic_timer.t_ops = lapic_timer_ops;
apic-lapic_timer.kvm = vcpu-kvm;
apic-lapic_timer.vcpu = vcpu;
diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
deleted file mode 100644
index 72b5144..000
--- a/arch/x86/kvm/timer.c
+++ /dev/null
@@ -1,47 +0,0 @@
-#include linux/kvm_host.h
-#include linux/kvm.h
-#include linux/hrtimer.h
-#include asm/atomic.h
-#include kvm_timer.h
-
-static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct kvm_timer *ktimer)
-{
-   int restart_timer = 0;
-   wait_queue_head_t *q = vcpu-wq;
-
-   /*
-* There is a race window between reading and incrementing, but we do
-* not care about potentially loosing timer events in the !reinject
-* case anyway.
-*/
-   if (ktimer-reinject || !atomic_read(ktimer-pending))
-   atomic_inc(ktimer-pending);
-
-   if (waitqueue_active(q))
-   wake_up_interruptible(q);
-
-   if (ktimer-t_ops-is_periodic(ktimer)) {
-   hrtimer_add_expires_ns(ktimer-timer, ktimer-period);
-   restart_timer = 1;
-   }
-
-   return restart_timer;
-}
-
-enum hrtimer_restart kvm_timer_fn(struct hrtimer *data)
-{
-   int restart_timer;
-   struct kvm_vcpu *vcpu;
-   struct kvm_timer *ktimer = container_of(data, struct kvm_timer, timer);
-
-   vcpu = ktimer-vcpu;
-   if (!vcpu)
-   return HRTIMER_NORESTART;
-
-   restart_timer = __kvm_timer_fn(vcpu, ktimer);
-   if (restart_timer)
-   return HRTIMER_RESTART;
-   else
-   return HRTIMER_NORESTART;
-}
-
-- 
1.6.0.6

--
To unsubscribe from this

[PATCH 2/5] Remove KVM_REQ_PENDING_TIMER.

2009-10-27 Thread Chris Lalancette

KVM_REQ_PENDING_TIMER is set and cleared in a couple of places,
but it never seems to be actually checked.  Remove it.

Signed-off-by: Chris Lalancette clala...@redhat.com
---
:100644 100644 eea4043... 72b5144... M  arch/x86/kvm/timer.c
:100644 100644 2ef39062.. 93a65b4... M  arch/x86/kvm/x86.c
:100644 100644 bd5a616... 053e49f... M  include/linux/kvm_host.h
 arch/x86/kvm/timer.c |5 +
 arch/x86/kvm/x86.c   |1 -
 include/linux/kvm_host.h |1 -
 3 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/timer.c b/arch/x86/kvm/timer.c
index eea4043..72b5144 100644
--- a/arch/x86/kvm/timer.c
+++ b/arch/x86/kvm/timer.c
@@ -14,11 +14,8 @@ static int __kvm_timer_fn(struct kvm_vcpu *vcpu, struct 
kvm_timer *ktimer)
 * not care about potentially loosing timer events in the !reinject
 * case anyway.
 */
-   if (ktimer-reinject || !atomic_read(ktimer-pending)) {
+   if (ktimer-reinject || !atomic_read(ktimer-pending))
atomic_inc(ktimer-pending);
-   /* FIXME: this code should not know anything about vcpus */
-   set_bit(KVM_REQ_PENDING_TIMER, vcpu-requests);
-   }
 
if (waitqueue_active(q))
wake_up_interruptible(q);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2ef3906..93a65b4 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3906,7 +3906,6 @@ static int __vcpu_run(struct kvm_vcpu *vcpu)
if (r = 0)
break;
 
-   clear_bit(KVM_REQ_PENDING_TIMER, vcpu-requests);
if (kvm_cpu_has_pending_timer(vcpu))
kvm_inject_pending_timer_irqs(vcpu);
 
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index bd5a616..053e49f 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -33,7 +33,6 @@
 #define KVM_REQ_REPORT_TPR_ACCESS  2
 #define KVM_REQ_MMU_RELOAD 3
 #define KVM_REQ_TRIPLE_FAULT   4
-#define KVM_REQ_PENDING_TIMER  5
 #define KVM_REQ_UNHALT 6
 #define KVM_REQ_MMU_SYNC   7
 #define KVM_REQ_KVMCLOCK_UPDATE8
-- 
1.6.0.6

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [Qemu-devel] [PATCH 0/4] megaraid_sas HBA emulation

2009-10-27 Thread Gerd Hoffmann


  Hi,


The device can be accessed by

-drive if=raid,file=XXX


Don't extend that qemu automagic please.  The new way to handle this is:

  -drive if=none,id=mydisk,file=/path/to/some/disk.img
  -device megasas,id=raid
  -device scsi-disk,bus=raid.0,scsi-id=1,drive=mydisk


In order to support SCSI command emulation I had to update /
patch up the existing SCSI disk support. This might be
not to everyones taste, so I'm open to alternative
suggestions.

But I certainly do _not_ want to update the SCSI disk
emulation, as this is really quite tied to the SCSI parallel
interface used by the old lsi53c895a.c.


--verbose please.  I'd prefer to fix scsi-disk bugs and/or limitations 
instead of working around them.



Plus it doesn't do scatter-gather list handling,


Which should be fixed anyway.

cheers,
  Gerd
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Gregory Haskins

Gleb Natapov wrote:

 1) rcu_read_lock is something like 4x faster than srcu_read_lock(), but
 we are talking about nanoseconds on modern hardware (I think Paul quoted
 me 10ns vs 45ns on his rig).  I don't think either overhead is something
 to be concerned about in this case.

 If we can avoid why not? Nanoseconds tend to add up.
 

BTW: I didn't mean to imply that we should be cavalier in adding
overhead.  My point was that adding overhead is sometimes _necessary_ to
prevent a race above and beyond an RCU pointer acquisition, and 35ns is
not _terrible_ IMO.

I was suggesting to solve this by switching to SRCU, but an alternative
is copying the structure (when permitted with immutable objects), which
seems to work in this particular case.  It should be noted that the copy
has its own unquantified overhead beyond basic RCU as well, so its not
truly free (I'd guess its = as the switch to SRCU without copies, though).

IOW: sync hurts, but its sometimes a necessary evil ;)

Kind Regards,
-Greg



signature.asc
Description: OpenPGP digital signature

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Paul E. McKenney

On Tue, Oct 27, 2009 at 09:34:41AM -0400, Gregory Haskins wrote:
 Hi Paul,
 
 Paul E. McKenney wrote:
  On Mon, Oct 26, 2009 at 12:21:57PM -0400, Gregory Haskins wrote:
  The current code suffers from the following race condition:
 
  thread-1thread-2
  ---
 
  kvm_set_irq() {
 rcu_read_lock()
 irq_rt = rcu_dereference(table);
 rcu_read_unlock();
 
 kvm_set_irq_routing() {
mutex_lock();
irq_rt = table;
rcu_assign_pointer();
mutex_unlock();
synchronize_rcu();
 
kfree(irq_rt);
 
 irq_rt-entry-set(); /* bad */
 
  -
 
  Because the pointer is accessed outside of the read-side critical
  section.  There are two basic patterns we can use to fix this bug:
 
  1) Switch to sleeping-rcu and encompass the -set() access within the
 read-side critical section,
 
 OR
 
  2) Add reference counting to the irq_rt structure, and simply acquire
 the reference from within the RSCS.
 
  This patch implements solution (1).
  
  Looks like a good transformation!  A few questions interspersed below.
 
 Thanks for the review.  I would have CC'd you but I figured I pestered
 you enough with my RCU reviews in the past, and didn't want to annoy you ;)
 
 I will be sure to CC you in the future, unless you ask otherwise.

No problem either way.  ;-)

Thanx, Paul

  Signed-off-by: Gregory Haskins ghask...@novell.com
  ---
 
   include/linux/kvm_host.h |6 +-
   virt/kvm/irq_comm.c  |   50 
  +++---
   virt/kvm/kvm_main.c  |1 +
   3 files changed, 35 insertions(+), 22 deletions(-)
 
  diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
  index bd5a616..1fe135d 100644
  --- a/include/linux/kvm_host.h
  +++ b/include/linux/kvm_host.h
  @@ -185,7 +185,10 @@ struct kvm {
 
 struct mutex irq_lock;
   #ifdef CONFIG_HAVE_KVM_IRQCHIP
  -  struct kvm_irq_routing_table *irq_routing;
  +  struct {
  +  struct srcu_structsrcu;
  
  Each structure has its own SRCU domain.  This is OK, but just asking
  if that is the intent.  It does look like the SRCU primitives are
  passed a pointer to the correct structure, and that the return value
  from srcu_read_lock() gets passed into the matching srcu_read_unlock()
  like it needs to be, so that is good.
 
 Yeah, it was intentional.  Technically the table is per-guest, and thus
 the locking is too, which is the desired/intentional granularity.
 
 On that note, I tried to denote that kvm-irq_routing.srcu and
 kvm-irq_routing.table were related to one another, but then went ahead
 and modified the hunks that touched kvm-irq_ack_notifier_list, too.  In
 retrospect, this was probably a mistake.  I should leave the rcu usage
 outside of -irq_routing.table alone.
 
  
  +  struct kvm_irq_routing_table *table;
  +  } irq_routing;
 struct hlist_head mask_notifier_list;
 struct hlist_head irq_ack_notifier_list;
   #endif
  
  [ . . . ]
  
  @@ -155,21 +156,19 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, 
  u32 irq, int level)
  * IOAPIC.  So set the bit in both. The guest will ignore
  * writes to the unused one.
  */
  -  rcu_read_lock();
  -  irq_rt = rcu_dereference(kvm-irq_routing);
  +  idx = srcu_read_lock(kvm-irq_routing.srcu);
  +  irq_rt = rcu_dereference(kvm-irq_routing.table);
 if (irq  irq_rt-nr_rt_entries)
  -  hlist_for_each_entry(e, n, irq_rt-map[irq], link)
  -  irq_set[i++] = *e;
  -  rcu_read_unlock();
  +  hlist_for_each_entry(e, n, irq_rt-map[irq], link) {
  
  What prevents the above list from changing while we are traversing it?
  (Yes, presumably whatever was preventing it from changing before this
  patch, but what?)
  
  Mostly kvm-lock is held, but not always.  And if kvm-lock were held
  all the time, there would be no point in using SRCU.  ;-)
 
 This is protected by kvm-irq_lock within kvm_set_irq_routing().
 Entries are added to a copy of the list, and the top-level table pointer
 is swapped (via rcu_assign_pointer(), as it should be) while holding the
 lock.  Finally, we synchronize with the RSCS before deleting the old
 copy.  It looks to me like the original author got this part right, so I
 didn't modify it outside of converting to SRCU.
 
  
  +  int r;
 
  -  while(i--) {
  -  int r;
  -  r = irq_set[i].set(irq_set[i], kvm, irq_source_id, level);
  -  if (r  0)
  -  continue;
  +  r = e-set(e, kvm,

Re: [KVM PATCH v3 1/3] KVM: fix race in irq_routing logic

2009-10-27 Thread Paul E. McKenney

On Tue, Oct 27, 2009 at 11:02:23AM -0400, Gregory Haskins wrote:
 Thanks for this, Paul.
 
 Some questions and statements below.
 
 Paul E. McKenney wrote:
  On Tue, Oct 27, 2009 at 04:02:37PM +0200, Gleb Natapov wrote:
  On Tue, Oct 27, 2009 at 09:39:03AM -0400, Gregory Haskins wrote:
  
  [ . . . ]
  
  standard RCU RSCS, which is what SRCU is designed for.  So rather than
  inventing an awkward two-phased stack based solution, it's better to
  reuse the provided tools, IMO.
 
  To flip it around:  Is there any reason why an SRCU would not work here,
  and thus we were forced to use something like the stack-copy approach?
 
  If SRCU has no disadvantage comparing to RCU why not use it always? :)
  
  The disadvantages of SRCU compared to RCU include the following:
  
  1.  SRCU requires that the return value of srcu_read_lock()
  be fed into srcu_read_unlock().  This is usually not a problem,
  but can be painful if there are multiple levels of function
  call separating the two.
 
 Right, and this is simple/neat w.r.t. its usage in irq_routing, so no
 problem there.

Fair enough!

  
  2.  SRCU's grace periods are about 4x slower than those of RCU.
  And they also don't scale all that well with extremely large
  numbers of CPUs (but this can be fixed when/if it becomes a
  real problem).
 
 The irq_routing update path is extremely infrequent, so this should not
 be an issue.

Sounds good!

  3.  SRCU's read-side primitives are also significantly slower than
  those of RCU.
 
 Are the 10ns vs 45ns numbers that I mentioned in my last reply the
 proper ballpark?  How do these compare to an atomic-op, say an
 uncontended spinlock on modern hardware?  The assumption is that
 srcu_read_lock() should be significantly cheaper than a read-lock().  If
 its not, then we might as well use something else, I suppose.  But if
 its not, I guess you probably wouldn't have bothered to invent it in the
 first place ;)

SRCU read-side critical sections should indeed be quite a bit cheaper than
uncontended spinlock, particularly if the spinlock was last released by
some other CPU.  There are those who insist that uncontended spinlocks
and atomic operations will soon be free, but I will believe this when
I see it.  ;-)

  4.  SRCU does not have a call_srcu().  One could be provided, but
  its semantics would be a bit strange due to the need to limit
  the number of callbacks, given that general blocking is
  permitted in SRCU read-side critical sections.  (And it would
  take some doing to convince me to supply an SRCU!)
 
 This is not an issue in our design.

Very good!

  5.  The current SRCU has no reasonable way to implement read-side
  priority boosting, as there is no record of which task
  is read-holding which SRCU.
 
 Given the infrequency of the update path, I do not see this as a problem.

Sounds like you have it covered, then!

Thanx, Paul
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 5/5] Fix kdump under KVM.

2009-10-27 Thread Marcelo Tosatti

On Tue, Oct 27, 2009 at 05:41:07PM +0100, Chris Lalancette wrote:
 This patch is the main point of the series.  In order for
 kdump to properly work inside a KVM guest, we need to make
 sure that all VCPUs in virtual wire APIC mode get kicked
 to try and pick up the timer interrupts.  To do this,
 we iterate over the CPUs and deliver interrupts to the
 proper VCPUs.
 
 I don't love the concept of doing kvm_irq_kick_vcpus() from
 within pit_timer_fn().  A PIT is not connected to a CPU at all,
 only to a PIC or APIC.  However, if a CPU enters idle, this is
 the only way to wake it up to check for the interrupt.

The reason the PIT interrupt was fixed to BSP is:

http://www.mail-archive.com/kvm-de...@lists.sourceforge.net/msg13250.html

Perhaps enhancing ioapic_deliver's (irq == 0) check to bypass the   
destination overwrite in case its programmed by the guest to 
a single CPU would fix it?



--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 3/3] KVM: Directly inject interrupts if they support lockless operation

2009-10-27 Thread Michael S. Tsirkin

On Mon, Oct 26, 2009 at 12:22:08PM -0400, Gregory Haskins wrote:
 IRQFD currently uses a deferred workqueue item to execute the injection
 operation.  It was originally designed this way because kvm_set_irq()
 required the caller to hold the irq_lock mutex, and the eventfd callback
 is invoked from within a non-preemptible critical section.
 
 With the advent of lockless injection support for certain GSIs, the
 deferment mechanism is no longer technically needed in all cases.
 Since context switching to the workqueue is a source of interrupt
 latency, lets switch to a direct method whenever possible.  Fortunately
 for us, the most common use of irqfd (MSI-based GSIs) readily support
 lockless injection.
 
 Signed-off-by: Gregory Haskins ghask...@novell.com

This is a useful optimization I think.
Some comments below.

 ---
 
  virt/kvm/eventfd.c |   31 +++
  1 files changed, 27 insertions(+), 4 deletions(-)
 
 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 30f70fd..e6cc958 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -51,20 +51,34 @@ struct _irqfd {
   wait_queue_t  wait;
   struct work_structinject;
   struct work_structshutdown;
 + void (*execute)(struct _irqfd *);
  };
  
  static struct workqueue_struct *irqfd_cleanup_wq;
  
  static void
 -irqfd_inject(struct work_struct *work)
 +irqfd_inject(struct _irqfd *irqfd)
  {
 - struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
   struct kvm *kvm = irqfd-kvm;
  
   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
   kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
  }
  
 +static void
 +irqfd_deferred_inject(struct work_struct *work)
 +{
 + struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
 +
 + irqfd_inject(irqfd);
 +}
 +
 +static void
 +irqfd_schedule(struct _irqfd *irqfd)
 +{
 + schedule_work(irqfd-inject);
 +}
 +
  /*
   * Race-free decouple logic (ordering is critical)
   */
 @@ -126,7 +140,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int sync, 
 void *key)
  
   if (flags  POLLIN)
   /* An event has been signaled, inject an interrupt */
 - schedule_work(irqfd-inject);
 + irqfd-execute(irqfd);
  
   if (flags  POLLHUP) {
   /* The eventfd is closing, detach from KVM */
 @@ -179,7 +193,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
   irqfd-kvm = kvm;
   irqfd-gsi = gsi;
   INIT_LIST_HEAD(irqfd-list);
 - INIT_WORK(irqfd-inject, irqfd_inject);
 + INIT_WORK(irqfd-inject, irqfd_deferred_inject);
   INIT_WORK(irqfd-shutdown, irqfd_shutdown);
  
   file = eventfd_fget(fd);
 @@ -209,6 +223,15 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
   list_add_tail(irqfd-list, kvm-irqfds.items);
   spin_unlock_irq(kvm-irqfds.lock);
  
 + ret = kvm_irq_check_lockless(kvm, gsi);
 + if (ret  0)
 + goto fail;
 +
 + if (ret)
 + irqfd-execute = irqfd_inject;
 + else
 + irqfd-execute = irqfd_schedule;
 +

Can't gsi get converted from lockless to non-lockless
after it's checked (by the routing ioctl)?  Kernel will crash then.

How about, each time we get event from eventfd, we implement
kvm_irqfd_toggle_lockless, which does a single scan, and returns
true/false status (and I really mean toggle, let's not do set 1 / set 0
as well) telling us whether interrupts could be delivered in a lockless
manner?

Then we can either just ignore the error (no one uses eventfd this way),
or handle the mostly irrelevant case of level by means of the workqueue,
like we did previously.


   /*
* Check if there was an event already pending on the eventfd
* before we registered, and trigger it as if we didn't miss it.
 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/3] introduce VMSTATE_U64

2009-10-27 Thread Andrew Theurer


 On Tue, Oct 20, 2009 at 08:40:26AM +0900, Avi Kivity wrote:
  On 10/17/2009 04:27 AM, Glauber Costa wrote:
  This is a patch actually written by Juan, which, according to him,
  he plans on posting to qemu.git. Problem is that linux defines
  u64 in a way that is type-uncompatible with uint64_t.
 
  I am including it here, because it is a dependency to my patch series
  that follows.
 
 
 
  Why can't we store these values in qemu as uint64_ts?
 Because then we have to redefine the whole structure in qemu.
 
 the proposal is to simply pick the structures directly from linux. I believe
 it is much easier, and the versioning scheme in vmstate will help us get 
 around any changes they might suffer in the future.

I get build errors with this.  Is there something extra I need to do?  I am 
currently running a rhel54 2.6.18 kernel.

  CCqdev-properties.o
savevm.c: In function ‘get_u64’:
savevm.c:856: error: ‘__u64’ undeclared (first use in this function)
savevm.c:856: error: (Each undeclared identifier is reported only once
savevm.c:856: error: for each function it appears in.)
savevm.c:856: error: ‘v’ undeclared (first use in this function)
savevm.c: In function ‘put_u64’:
savevm.c:863: error: ‘__u64’ undeclared (first use in this function)
savevm.c:863: error: ‘v’ undeclared (first use in this function)
make[1]: *** [savevm.o] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [build-all] Error 2


Thanks,

-Andrew





--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 1/3] KVM: x86: disallow multiple KVM_CREATE_IRQCHIP

2009-10-27 Thread Michael S. Tsirkin

On Tue, Oct 27, 2009 at 01:10:43PM -0200, Marcelo Tosatti wrote:
 Otherwise kvm will leak memory on multiple KVM_CREATE_IRQCHIP.
 
 CC: sta...@kernel.org
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: kvm/arch/x86/kvm/x86.c
 ===
 --- kvm.orig/arch/x86/kvm/x86.c
 +++ kvm/arch/x86/kvm/x86.c
 @@ -2285,6 +2285,9 @@ long kvm_arch_vm_ioctl(struct file *filp
   goto out;
   break;
   case KVM_CREATE_IRQCHIP:
 + r = -EEXIST;
 + if (kvm-arch.vpic)
 + goto out;
   r = -ENOMEM;
   kvm-arch.vpic = kvm_create_pic(kvm);
   if (kvm-arch.vpic) {
 @@ -2300,6 +2303,8 @@ long kvm_arch_vm_ioctl(struct file *filp
   if (r) {
   kfree(kvm-arch.vpic);
   kfree(kvm-arch.vioapic);
 + kvm-arch.vpic = NULL;
 + kvm-arch.vioapic = NULL;
   goto out;
   }
   break;

Is there a lock that protects this structure?
Can memory leak still occur if multiple threads call
KVM_CREATE_IRQCHIP in parallel?

 
 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [patch 2/3] KVM: x86: disallow KVM_{SET,GET}_LAPIC without in kernel irqchip

2009-10-27 Thread Michael S. Tsirkin

On Tue, Oct 27, 2009 at 01:10:44PM -0200, Marcelo Tosatti wrote:
 Otherwise kvm might attempt to dereference a NULL pointer.
 
 CC: sta...@kernel.org
 Signed-off-by: Marcelo Tosatti mtosa...@redhat.com
 
 Index: kvm/arch/x86/kvm/x86.c
 ===
 --- kvm.orig/arch/x86/kvm/x86.c
 +++ kvm/arch/x86/kvm/x86.c
 @@ -1815,6 +1815,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi
  
   switch (ioctl) {
   case KVM_GET_LAPIC: {
 + r = -EINVAL;
 + if (!irqchip_in_kernel(vcpu-kvm))
 + goto out;
   lapic = kzalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
  
   r = -ENOMEM;
 @@ -1830,6 +1833,9 @@ long kvm_arch_vcpu_ioctl(struct file *fi
   break;
   }
   case KVM_SET_LAPIC: {
 + r = -EINVAL;
 + if (!irqchip_in_kernel(vcpu-kvm))
 + goto out;
   lapic = kmalloc(sizeof(struct kvm_lapic_state), GFP_KERNEL);
   r = -ENOMEM;
   if (!lapic)
 

Can the value of irqchip_in_kernel be changed by userspace
after we have checked it? If yes, this check won't help ...


 --
 To unsubscribe from this list: send the line unsubscribe kvm in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [KVM PATCH v3 3/3] KVM: Directly inject interrupts if they support lockless operation

2009-10-27 Thread Gregory Haskins

Michael S. Tsirkin wrote:
 On Mon, Oct 26, 2009 at 12:22:08PM -0400, Gregory Haskins wrote:
 IRQFD currently uses a deferred workqueue item to execute the injection
 operation.  It was originally designed this way because kvm_set_irq()
 required the caller to hold the irq_lock mutex, and the eventfd callback
 is invoked from within a non-preemptible critical section.

 With the advent of lockless injection support for certain GSIs, the
 deferment mechanism is no longer technically needed in all cases.
 Since context switching to the workqueue is a source of interrupt
 latency, lets switch to a direct method whenever possible.  Fortunately
 for us, the most common use of irqfd (MSI-based GSIs) readily support
 lockless injection.

 Signed-off-by: Gregory Haskins ghask...@novell.com
 
 This is a useful optimization I think.
 Some comments below.
 
 ---

  virt/kvm/eventfd.c |   31 +++
  1 files changed, 27 insertions(+), 4 deletions(-)

 diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
 index 30f70fd..e6cc958 100644
 --- a/virt/kvm/eventfd.c
 +++ b/virt/kvm/eventfd.c
 @@ -51,20 +51,34 @@ struct _irqfd {
  wait_queue_t  wait;
  struct work_structinject;
  struct work_structshutdown;
 +void (*execute)(struct _irqfd *);
  };
  
  static struct workqueue_struct *irqfd_cleanup_wq;
  
  static void
 -irqfd_inject(struct work_struct *work)
 +irqfd_inject(struct _irqfd *irqfd)
  {
 -struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
  struct kvm *kvm = irqfd-kvm;
  
  kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 1);
  kvm_set_irq(kvm, KVM_USERSPACE_IRQ_SOURCE_ID, irqfd-gsi, 0);
  }
  
 +static void
 +irqfd_deferred_inject(struct work_struct *work)
 +{
 +struct _irqfd *irqfd = container_of(work, struct _irqfd, inject);
 +
 +irqfd_inject(irqfd);
 +}
 +
 +static void
 +irqfd_schedule(struct _irqfd *irqfd)
 +{
 +schedule_work(irqfd-inject);
 +}
 +
  /*
   * Race-free decouple logic (ordering is critical)
   */
 @@ -126,7 +140,7 @@ irqfd_wakeup(wait_queue_t *wait, unsigned mode, int 
 sync, void *key)
  
  if (flags  POLLIN)
  /* An event has been signaled, inject an interrupt */
 -schedule_work(irqfd-inject);
 +irqfd-execute(irqfd);
  
  if (flags  POLLHUP) {
  /* The eventfd is closing, detach from KVM */
 @@ -179,7 +193,7 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
  irqfd-kvm = kvm;
  irqfd-gsi = gsi;
  INIT_LIST_HEAD(irqfd-list);
 -INIT_WORK(irqfd-inject, irqfd_inject);
 +INIT_WORK(irqfd-inject, irqfd_deferred_inject);
  INIT_WORK(irqfd-shutdown, irqfd_shutdown);
  
  file = eventfd_fget(fd);
 @@ -209,6 +223,15 @@ kvm_irqfd_assign(struct kvm *kvm, int fd, int gsi)
  list_add_tail(irqfd-list, kvm-irqfds.items);
  spin_unlock_irq(kvm-irqfds.lock);
  
 +ret = kvm_irq_check_lockless(kvm, gsi);
 +if (ret  0)
 +goto fail;
 +
 +if (ret)
 +irqfd-execute = irqfd_inject;
 +else
 +irqfd-execute = irqfd_schedule;
 +
 
 Can't gsi get converted from lockless to non-lockless
 after it's checked (by the routing ioctl)?

I think I protect against this in patch 2/3 by ensuring that any vectors
that are added have to conform to the same locking rules.  The code
doesn't support deleting routes, so we really only need to make sure
that new routes do not change.

 Kernel will crash then.
 
 How about, each time we get event from eventfd, we implement
 kvm_irqfd_toggle_lockless, which does a single scan, and returns
 true/false status (and I really mean toggle, let's not do set 1 / set 0
 as well) telling us whether interrupts could be delivered in a lockless
 manner?

I am not sure I like this idea in general given that I believe I already
handle the error case you are concerned with.

However, the concept of providing a toggle option so we can avoid
scanning the list twice is a good one.  That can be done as a new patch
series, but it would be a nice addition.

Thanks Michael,
-Greg



signature.asc
Description: OpenPGP digital signature

[PATCH] [RFC] KVM test: Major control file cleanup

2009-10-27 Thread Lucas Meneghel Rodrigues

As pointed out before, the KVM reference control files
could use a little clean up. This patch implements severe
cleanup of the main control file by:

* Refactoring the code present there, moving it to the
kvm_utils.py library
* Treat the build test exactly the same way as other
tests, moving the config stuff that used to be in the
control file realm out to its own configuration file,
for the sake of consistency.
* Since a frequent complaint amongst users is that by
default the build test will build from release, and
that build might be unsuccessful depending on the system
being used (older kernel versions and other sorts of
stuff), change the default build method to 'noinstall'.

This way the control file becomes way shorter, fairly
well organized, and we have a consistent configuration
schema across the board, based on configuration files.

If people are OK with this change, final path will
change the control.parallel file as well.

Important: For the default control file, I removed
the part of the code that parses new strings for the
test configuration because I believe those
modifications belong to the test config files, they
could be just commented. For cases where keeping
all stuff inside the config file is not possible,
appropriate code will be added (in other example
control files).

Signed-off-by: Lucas Meneghel Rodrigues l...@redhat.com
---
 client/tests/kvm/control  |  203 +++--
 client/tests/kvm/kvm_build.cfg.sample |   82 +
 client/tests/kvm/kvm_utils.py |  112 ++
 3 files changed, 211 insertions(+), 186 deletions(-)
 create mode 100644 client/tests/kvm/kvm_build.cfg.sample

diff --git a/client/tests/kvm/control b/client/tests/kvm/control
index 431baf9..7acec14 100644
--- a/client/tests/kvm/control
+++ b/client/tests/kvm/control
@@ -20,194 +20,25 @@ KVM (both kernelspace and userspace) code.
 For online docs, please refer to http://www.linux-kvm.org/page/KVM-Autotest
 
 
+import sys, os, logging
+# Add the KVM tests dir to the python path
+kvm_test_dir = os.path.join(os.environ['AUTODIR'],'tests/kvm')
+sys.path.append(kvm_test_dir)
+import kvm_utils, kvm_config
 
-import sys, os
 
-#-
-# set English environment (command output might be localized, need to be safe)
-#-
-os.environ['LANG'] = 'en_US.UTF-8'
+## Prepare the environment for the KVM test.
+kvm_utils.prepare_test_run(basedir=kvm_test_dir,
+   rootdir='/tmp/kvm_autotest_root')
 
-#-
-# Enable modules import from current directory (tests/kvm)
-#-
-pwd = os.path.join(os.environ['AUTODIR'],'tests/kvm')
-sys.path.append(pwd)
+cfg_names = ['kvm_build.cfg', 'kvm_tests.cfg']
 
-# 
-# create required symlinks
-# 
-# When dispatching tests from autotest-server the links we need do not exist on
-# the host (the client). The following lines create those symlinks. Change
-# 'rootdir' here and/or mount appropriate directories in it.
-#
-# When dispatching tests on local host (client mode) one can either setup kvm
-# links, or same as server mode use rootdir and set all appropriate links and
-# mount-points there. For example, guest installation tests need to know where
-# to find the iso-files.
-#
-# We create the links only if not already exist, so if one already set up the
-# links for client/local run we do not touch the links.
-rootdir='/tmp/kvm_autotest_root'
-iso=os.path.join(rootdir, 'iso')
-images=os.path.join(rootdir, 'images')
-qemu=os.path.join(rootdir, 'qemu')
-qemu_img=os.path.join(rootdir, 'qemu-img')
+for cfg_name in cfg_names:
+cfg_path = os.path.join(kvm_test_dir, cfg_name)
+cfg = kvm_config.config(cfg_path)
+list = kvm_utils.get_test_list(cfg, cfg_name, kvm_test_dir)
+logging.info(Running test set defined on config file %s, cfg_name)
+kvm_utils.run_tests(list, job)
 
-
-def link_if_not_exist(ldir, target, link_name):
-t = target
-l = os.path.join(ldir, link_name)
-if not os.path.exists(l):
-os.system('ln -s %s %s' % (t, l))
-
-# Create links only if not already exist
-link_if_not_exist(pwd, '../../', 'autotest')
-link_if_not_exist(pwd, iso, 'isos')
-link_if_not_exist(pwd, images, 'images')
-link_if_not_exist(pwd, qemu, 'qemu')
-link_if_not_exist(pwd, qemu_img, 'qemu-img')
-
-# 
-# Params that will be passed to the KVM install/build test
-# 
-params = {
-name: build,
-shortname: build,
-type: build,
-mode: release,
-#mode: snapshot,
-#mode: localtar,
-#mode: localsrc,
-#mode: git,
-#mode: noinstall,
-#mode: koji,
-
-## Are we going to

Re: vhost-net patches

2009-10-27 Thread Michael S. Tsirkin

On Tue, Oct 27, 2009 at 09:36:18AM -0700, Shirley Ma wrote:
 Hello Michael,
 
 On Tue, 2009-10-27 at 17:27 +0200, Michael S. Tsirkin wrote:
  Possibly GFP_ATOMIC allocations in vring_add_indirect are failing?
  Is there a chance you are tight on guest memory for some reason?
  with vhost, virtio does currently consume a bit more memory than
  with userspace backend.
 
 I did see memory leak on host every time after exiting guest. I don't
 know where. Do you see it?

I didn't notice. I'll check this.

 Anyway after I reboot host and restart guest with large memory
 allocation,

How large is large here? I usually allocate 1G.

 I do see performance improves to 3xxxMb/s, and occasionally
 reaches 40xxMb/s.

This is same as userspace, isn't it?

 But queue full still exists, I can avoid the problem
 by increasing send queue size from qemu.

And what performance do you get then?

 I will apply deferring skb allocation patch on guest to see any
 performance gain after your vhost patch.
 
 Thanks
 Shirley
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: performance regression in virtio-net in 2.6.32-rc4

2009-10-27 Thread Rusty Russell

On Tue, 27 Oct 2009 05:18:35 am Michael S. Tsirkin wrote:
 Hi!
 I noticed a performance regression in virtio net: going from
 2.6.31 to 2.6.32-rc4 I see this, for guest to host communication:
...
 Size   SizeSize Time Throughput
 bytes  bytes   bytessecs.10^6bits/sec
 
  87380  16384  1638410.207806.48
...
  87380  16384  1638410.006814.60

Hmm, needs a bisect I'd say.

Thanks,
Rusty.
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [ANNOUNCE] Sheepdog: Distributed Storage System for KVM

2009-10-27 Thread MORITA Kazutaka


On 2009/10/21 14:13, MORITA Kazutaka wrote:

Hi everyone,

Sheepdog is a distributed storage system for KVM/QEMU. It provides
highly available block level storage volumes to VMs like Amazon EBS.
Sheepdog supports advanced volume management features such as snapshot,
cloning, and thin provisioning. Sheepdog runs on several tens or hundreds
of nodes, and the architecture is fully symmetric; there is no central
node such as a meta-data server.


We added some pages to Sheepdog website:

 Design: http://www.osrg.net/sheepdog/design.html
 FAQ   : http://www.osrg.net/sheepdog/faq.html

Sheepdog mailing list is also ready to use (thanks for Tomasz)

 Subscribe/Unsubscribe/Preferences
   http://lists.wpkg.org/mailman/listinfo/sheepdog
 Archive
   http://lists.wpkg.org/pipermail/sheepdog/

We are always looking for developers or users interested in
participating in Sheepdog project!

Thanks.

MORITA Kazutaka
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-27 Thread Avi Kivity


On 10/27/2009 01:21 AM, Olof Johansson wrote:

On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:



For some reason, I'm not seeing this build break, but the patch is
obviously correct.

Acked-by: Hollis Blanchard holl...@us.ibm.com


I saw it when building with pasemi_defconfig + manually enabled KVM 
options (all available).




Alex, can you fold this patch in?

No need to repost; just update your git tree.

(btw, please make sure the patchset is bisectable).

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-27 Thread Alexander Graf



On 27.10.2009, at 09:56, Avi Kivity wrote:


On 10/27/2009 01:21 AM, Olof Johansson wrote:

On Oct 26, 2009, at 6:20 PM, Hollis Blanchard wrote:



For some reason, I'm not seeing this build break, but the patch is
obviously correct.

Acked-by: Hollis Blanchard holl...@us.ibm.com


I saw it when building with pasemi_defconfig + manually enabled KVM  
options (all available).




Alex, can you fold this patch in?


I can, but it's only partly related. My patches don't even touch  
timing.c. The only thing I can imagine resulting in a breakage is that  
my patches allow for an =M setting.


So IMHO this patch should be applied before my series. Should I stick  
it as first patch in my git tree?


Alex
--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Re: [PATCH 00/27] Add KVM support for Book3s_64 (PPC64) hosts v5

2009-10-27 Thread Avi Kivity


On 10/27/2009 03:42 PM, Alexander Graf wrote:


I can, but it's only partly related. My patches don't even touch 
timing.c. The only thing I can imagine resulting in a breakage is that 
my patches allow for an =M setting.


So IMHO this patch should be applied before my series. Should I stick 
it as first patch in my git tree?




No need, I'll apply it independently.

--
error compiling committee.c: too many arguments to function

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

83 matches

Mail list logo