[PATCH] Use clockevent multiplier and shifter for decrementer

2012-04-04 Thread Bharat Bhushan
The time for which the hrtimer is started for decrementer emulation is
calculated using tb_ticks_per_usec, while the hrtimer reprograms the DEC
(if needed) through the clockevent layer, which converts back to timebase
ticks using its own multiplier and shifter mechanism. We observed that this
round-trip conversion (timebase -> time -> timebase) is not exact because
the two mechanisms are not consistent. In our setup it adds 2% jitter.

With this patch, the clockevent multiplier and shifter are also used when
starting the hrtimer for decrementer emulation. The jitter is now < 0.5%.

Signed-off-by: Bharat Bhushan <bharat.bhus...@freescale.com>
---
 arch/powerpc/include/asm/time.h |2 ++
 arch/powerpc/kernel/time.c  |6 ++
 arch/powerpc/kvm/emulate.c  |5 +++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 7eb10fb..6d631b2 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -202,6 +202,8 @@ extern u64 mulhdu(u64, u64);
 extern void div128_by_32(u64 dividend_high, u64 dividend_low,
 unsigned divisor, struct div_result *dr);
 
+extern void get_clockevent_mult(u64 *multi, u64 *shift);
+
 /* Used to store Processor Utilization register (purr) values */
 
 struct cpu_usage {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 567dd7c..d229edd 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -910,6 +910,12 @@ static void __init init_decrementer_clockevent(void)
register_decrementer_clockevent(cpu);
 }
 
+void get_clockevent_mult(u64 *multi, u64 *shift)
+{
+   *multi = decrementer_clockevent.mult;
+   *shift = decrementer_clockevent.shift;
+}
+
 void secondary_cpu_time_init(void)
 {
/* Start the decrementer on CPUs that have manual control
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index afc9154..4bfcaa1 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -76,6 +76,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
unsigned long dec_nsec;
unsigned long long dec_time;
+   u64 mult, shift;
 
pr_debug("mtDEC: %x\n", vcpu->arch.dec);
hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
@@ -103,9 +104,9 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 * host ticks.
 */
 
+   get_clockevent_mult(&mult, &shift);
dec_time = vcpu->arch.dec;
-   dec_time *= 1000;
-   do_div(dec_time, tb_ticks_per_usec);
+   dec_time = (dec_time << shift) / mult;
dec_nsec = do_div(dec_time, NSEC_PER_SEC);
hrtimer_start(&vcpu->arch.dec_timer,
ktime_set(dec_time, dec_nsec), HRTIMER_MODE_REL);
-- 
1.7.0.4
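
A minimal sketch of the inconsistency described in the commit message, under stated assumptions: a hypothetical 41.666666 MHz timebase (not a whole number of MHz, so tb_ticks_per_usec truncates to 41) and a hypothetical clockevent shift of 32. The real decrementer clockevent derives its own mult and shift. The sketch converts one second's worth of DEC ticks to nanoseconds with the old and the patched formulas, then maps the result back to ticks the way a clockevent-programmed hrtimer would:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical timebase of 41.666666 MHz; real boards differ. */
#define TB_FREQ_HZ    41666666ULL
#define NSEC_PER_SEC  1000000000ULL
#define CE_SHIFT      32u   /* hypothetical clockevent shift */

/* Old path: integer ticks per microsecond, truncated (41 here, not 41.67). */
static const uint64_t tb_ticks_per_usec = TB_FREQ_HZ / 1000000ULL;

/* Clockevent-style multiplier so that ticks = (ns * mult) >> shift. */
static const uint64_t ce_mult = (TB_FREQ_HZ << CE_SHIFT) / NSEC_PER_SEC;

/* ns -> ticks, the direction the clockevent layer uses to program DEC.
 * No overflow for ns below roughly 100 seconds with these constants. */
static uint64_t ns_to_ticks(uint64_t ns)
{
	return (ns * ce_mult) >> CE_SHIFT;
}

/* Old emulation: ns = ticks * 1000 / tb_ticks_per_usec. */
static uint64_t ticks_to_ns_old(uint64_t ticks)
{
	return ticks * 1000 / tb_ticks_per_usec;
}

/* Patched emulation: invert the clockevent math, ns = (ticks << shift) / mult. */
static uint64_t ticks_to_ns_new(uint64_t ticks)
{
	assert(ticks <= UINT32_MAX); /* DEC is 32 bits, so ticks << 32 fits in u64 */
	return (ticks << CE_SHIFT) / ce_mult;
}
```

With these assumed numbers, the old formula turns 41666666 ticks into 1016260146 ns, which maps back to roughly 42.34 million ticks (about 1.6% too many), while the patched formula lands within one tick. The exact figures depend on the real timebase frequency and mult/shift, so treat them as illustrative only.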


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Questing regarding KVM Guest PMU

2012-04-04 Thread Gleb Natapov
On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
 On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla
 shashank.rachama...@gmail.com wrote:
  On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
  On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
  On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com wrote:
   On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
   On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:
 I guess things are working fine with perf. But why not with
 oprofile?

 Looks like it. I never tried oprofile. Will try to reproduce your
 problem and see what oprofile is doing.
   
I am using Ubuntu 10.04 with the 2.6.32-21-generic kernel as guest and
oprofile 0.9.6.
Also, I have tried to capture kvm-events (perf patch) in the host while
running oprofile and perf in the guest.
Please see the attachment. I have run the tests in three cases for
around 5 secs.

There are more MSR reads and writes in the case of perf, which I
think is normal. However, there are very few MSR reads and writes
with oprofile. Also, the number of NMI exceptions is too high in the
case of oprofile.

Which host kernel are you using? Try the latest kvm.git and check if you
see something unusual in dmesg.
  
   Currently running 3.3.0-rc5. Will try with the latest source from kvm
   git and let you know.
  
  
   Thanks, there were some fixes that didn't make it into 3.3. rdpmc
   instruction emulation fix is one of them. If oprofile uses it this can
   explain the problem.
  
  I have tried with the latest kvm source from git and also with a 3.0 guest
  kernel, but oprofile fails to collect any samples on the guest. I am using
  a core2duo processor, which oprofile treats as a Pentium Pro
  model.
 
  core2duo on the host or the guest? What is your qemu command line?
 
  Both. qemu command line below:
  sudo /usr/local/bin/qemu-system-x86_64 -drive file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net user
 
 
 Please find more info (/proc/cpuinfo and uname of both host and guest)
 in the attached files.
 
oprofile does not work for me even on the host. After trying to use it I can
see why perf was written in the first place.

--
Gleb.


Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Paolo Bonzini
Il 04/04/2012 03:18, Michael Roth ha scritto:
 Attacking the IDL/schema side first is the more rational approach. From
 there we can potentially generate ASN.1 BER/DER visitors for the protocol
 side, or potentially even just vmstate bindings as a start. I've recently
 started looking into the latter... it's completely feasible, the only
 downside is it complicates the IDL due to requiring support for a lot of
 what are very much vmstate-specific items, but it should be possible to
 do this in a manner where those annotations are self-contained and
 ignorable if we opted to replace vmstate-style declarations.

We can also keep the current vmstate descriptions, but access fields
from the automatically-generated visitors instead of struct fields.
This keeps the IDL simple.

Paolo


Re: [PATCH 4/4] virtio_blk: use disk_name_format() to support mass of disks naming

2012-04-04 Thread Michael S. Tsirkin
On Mon, Apr 02, 2012 at 12:00:45PM -0700, Tejun Heo wrote:
 Hello, James.
 
 On Mon, Apr 02, 2012 at 11:56:18AM -0700, James Bottomley wrote:
  So if we're agreed no other devices going forwards should ever use this
  interface, is there any point unifying the interface?  No matter how
  many caveats you hedge it round with, putting the API in a central place
  will be a bit like a honey trap for careless bears.   It might be safer
  just to leave it buried in the three current drivers.
 
 Yeah, that was my hope but I think it would be easier to enforce to
 have a common function which is clearly marked legacy so that new
 driver writers can go look for the naming code in the existing ones,
 find out they're all using the same function which is marked legacy
 and explains what to do for newer drivers.
 
 Thanks.

I think I'm not the only one to be confused about the
preferred direction here.
James, do you agree to the approach above?

It would be nice to fix virtio block for 3.4, so
how about this:
- I'll just apply the original bugfix patch for 3.4 -
  it only affects virtio
- Ren will repost the refactoring patch on top, and we can
  keep up the discussion

Ren, if you agree, can you make this a two-patch series, please?

 -- 
 tejun
 ___
 Virtualization mailing list
 virtualizat...@lists.linux-foundation.org
 https://lists.linuxfoundation.org/mailman/listinfo/virtualization


[Bug 42636] PCI passthrough does not work with AMD iommu for PCI device

2012-04-04 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=42636


Alexandrov Stanislav <n...@nya.ai> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |n...@nya.ai




--- Comment #8 from Alexandrov Stanislav <n...@nya.ai>  2012-04-04 08:22:51 ---
I have the same problem with PCI passthrough: my PCIe video card (HD7750)
with HDA (01:00.0/1) and the integrated network card (05:00.0) are
successfully assigned to the guest, but when I try to add a USB controller,
I get the same error about FLR:

Unable to reset PCI device :00:12.0: no FLR, PM reset or bus reset available

With Xen it works fine.

lspci -tv
-[:00]-+-00.0  Advanced Micro Devices [AMD] nee ATI RD890 PCI to PCI bridge (external gfx0 port B)
       +-00.2  Advanced Micro Devices [AMD] nee ATI RD990 I/O Memory Management Unit (IOMMU)
       +-02.0-[01]--+-00.0  Advanced Micro Devices [AMD] nee ATI Device 683f
       |            \-00.1  Advanced Micro Devices [AMD] nee ATI Device aab0
       +-09.0-[02]----00.0  Etron Technology, Inc. EJ168 USB 3.0 Host Controller
       +-0b.0-[03]--+-00.0  nVidia Corporation GF110 [GeForce GTX 560 Ti]
       |            \-00.1  nVidia Corporation GF110 High Definition Audio Controller
       +-11.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 SATA Controller [AHCI mode]
       +-12.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
       +-12.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
       +-13.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
       +-13.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
       +-14.0  Advanced Micro Devices [AMD] nee ATI SBx00 SMBus Controller
       +-14.2  Advanced Micro Devices [AMD] nee ATI SBx00 Azalia (Intel HDA)
       +-14.3  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 LPC host controller
       +-14.4-[04]--
       +-14.5  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI2 Controller
       +-15.0-[05]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168B PCI Express Gigabit Ethernet controller
       +-15.1-[06]--
       +-15.2-[07]--
       +-15.3-[08]--
       +-16.0  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB OHCI0 Controller
       +-16.2  Advanced Micro Devices [AMD] nee ATI SB7x0/SB8x0/SB9x0 USB EHCI Controller
       +-18.0  Advanced Micro Devices [AMD] Family 10h Processor HyperTransport Configuration
       +-18.1  Advanced Micro Devices [AMD] Family 10h Processor Address Map
       +-18.2  Advanced Micro Devices [AMD] Family 10h Processor DRAM Controller
       +-18.3  Advanced Micro Devices [AMD] Family 10h Processor Miscellaneous Control
       \-18.4  Advanced Micro Devices [AMD] Family 10h Processor Link Control


qemu-kvm-1.0, linux-3.3
motherboard is GA-990FXA-D3

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.


Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Michael S. Tsirkin
On Tue, Apr 03, 2012 at 06:47:49PM +0200, Jan Kiszka wrote:
 On 2012-04-03 18:27, Avi Kivity wrote:
  On 03/29/2012 09:14 PM, Jan Kiszka wrote:
  Currently, MSI messages can only be injected to in-kernel irqchips by
  defining a corresponding IRQ route for each message. This is not only
  unhandy if the MSI messages are generated on the fly by user space,
  IRQ routes are a limited resource that user space has to manage
  carefully.
 
  By providing a direct injection path, we can both avoid using up limited
  resources and simplify the necessary steps for user land.
 
  diff --git a/Documentation/virtual/kvm/api.txt 
  b/Documentation/virtual/kvm/api.txt
  index 81ff39f..ed27d1b 100644
  --- a/Documentation/virtual/kvm/api.txt
  +++ b/Documentation/virtual/kvm/api.txt
  @@ -1482,6 +1482,27 @@ See KVM_ASSIGN_DEV_IRQ for the data structure.  The 
  target device is specified
   by assigned_dev_id.  In the flags field, only KVM_DEV_ASSIGN_MASK_INTX is
   evaluated.
   
  +4.61 KVM_SIGNAL_MSI
  +
  +Capability: KVM_CAP_SIGNAL_MSI
  +Architectures: x86
  +Type: vm ioctl
  +Parameters: struct kvm_msi (in)
  +Returns: >0 on delivery, 0 if guest blocked the MSI, and -1 on error
  +
  +Directly inject a MSI message. Only valid with in-kernel irqchip that 
  handles
  +MSI messages.
  +
  +struct kvm_msi {
  +  __u32 address_lo;
  +  __u32 address_hi;
  +  __u32 data;
  +  __u32 flags;
  +  __u8  pad[16];
  +};
  +
  +No flags are defined so far. The corresponding field must be 0.
 
  
  There are two ways in which this can be generalized:
  
  struct kvm_general_irq {
__u32 type; // line | MSI
__u32 op;  // raise/lower/trigger
union {
   ... line;
   struct kvm_msi msi;
}
  };
  
  so we have a single ioctl for all interrupt handling.  This allows
  eventual removal of the line-oriented ioctls.
  
  The other alternative is to have a dma interface, similar to the kvm_run
  mmio interface but with the kernel acting as destination.  The advantage
  here is that we can handle dma from a device to any kernel-emulated
  device, not just the APIC MSI range.  A downside is that we can't return
  values related to interrupt coalescing.
 
 Due to lacking injection feedback, I'm in favor of option 1. Will have a
 look.
 
  
  A performance note: delivering an interrupt needs to search all vcpus
  for an APIC ID match.  The previous plan was to cache (or pre-calculate)
  this lookup in the irq routing table.  Now it looks like we'll need a
  separate cache for this.
 
 As this is non-existent until today, we don't regress here. And it can
 still be added on top later on, transparently.

I always worry about hash collisions and the cost of
calculating good hash functions.

We could instead return an index in the cache on injection, maintain in
userspace and use it for fast path on the next injection.
Will make it easy to use an array index instead of a hash here,
and fallback to a slower ID lookup on mismatch.

Until we do have this fast path we can just fill this value with zeros,
so kernel patch (almost) does not need to change for this -
just the header.

  
  (yes, I said on the call I don't anticipate objections but preparing to
  apply a patch always triggers more critical thinking)
  
 
 Well, we make progress, though slower than I was hoping. :)
 
 Jan
 
 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH v2 0/4] uq/master: Basic MSI support for in-kernel irqchip mode

2012-04-04 Thread Michael S. Tsirkin
On Tue, Apr 03, 2012 at 07:27:36PM +0200, Jan Kiszka wrote:
 On 2012-04-03 15:06, Michael S. Tsirkin wrote:
  On Tue, Apr 03, 2012 at 09:23:12AM +0200, Jan Kiszka wrote:
  This is v2 of the RFC, fixing a memory leak in
  kvm_flush_dynamic_msi_routes and adding support for the proposed
  KVM_SIGNAL_MSI IOCTL.
 
  This series depends on kvm: set gsi_bits and max_gsi correctly
  (http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88906).
  
  Looks good to me.
  How hard would it be to add irqfd support?
 
 Shouldn't be, but the changes will be a bit bigger.
 
 I've been thinking about a revamped interface between the MSI core and
 affected devices for a while. Will try to put down in some lines of code
 what I have in mind - once the dynamic MSI injection topic has settled.
 
 Jan

Yes it's not an objection - just a question.

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


Re: [RFC][PATCH v2 0/4] uq/master: Basic MSI support for in-kernel irqchip mode

2012-04-04 Thread Michael S. Tsirkin
On Tue, Apr 03, 2012 at 09:17:30PM +0200, Jan Kiszka wrote:
 On 2012-04-03 09:23, Jan Kiszka wrote:
  This is v2 of the RFC, fixing a memory leak in
  kvm_flush_dynamic_msi_routes and adding support for the proposed
  KVM_SIGNAL_MSI IOCTL.
  
  This series depends on kvm: set gsi_bits and max_gsi correctly
  (http://thread.gmane.org/gmane.comp.emulators.kvm.devel/88906).
  
  Jan Kiszka (4):
kvm: Refactor KVMState::max_gsi to gsi_count
kvm: Introduce basic MSI support for in-kernel irqchips
KVM: x86: Wire up MSI support for in-kernel irqchip
kvm: Add support for direct MSI injections
  
   hw/apic.c |3 +
   hw/kvm/apic.c |   33 +-
   hw/pc.c   |5 --
   kvm-all.c |  195 
  +++--
   kvm.h |1 +
   5 files changed, 225 insertions(+), 12 deletions(-)
  
 
 As we obviously agreed on the general direction regarding an MSI
 injection interface, I think patches 1-3 can lose their RFC tags and are
 ready for uq/master (provided there are no further review comments).
 Patch 4 will be reworked once the kernel interface is finalized.
 
 Jan

I agree.
Acked-by: Michael S. Tsirkin <m...@redhat.com>

 -- 
 Siemens AG, Corporate Technology, CT T DE IT 1
 Corporate Competence Center Embedded Linux


[PATCH] Use the lower four bytes while restoring guest readable SPRGs.

2012-04-04 Thread Varun Sethi
When restoring the hardware copies of the guest SPRG4-7 registers, we must
use the lower 4 bytes of the 64-bit software copies maintained by KVM.

Signed-off-by: Varun Sethi <varun.se...@freescale.com>
---
 arch/powerpc/kvm/booke_interrupts.S |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index c8c4b87..feda1bb 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -419,13 +419,13 @@ lightweight_exit:
 * written directly to the shared area, so we
 * need to reload them here with the guest's values.
 */
-   lwz r3, VCPU_SHARED_SPRG4(r5)
+   lwz r3, (VCPU_SHARED_SPRG4 + 4)(r5)
mtspr   SPRN_SPRG4W, r3
-   lwz r3, VCPU_SHARED_SPRG5(r5)
+   lwz r3, (VCPU_SHARED_SPRG5 + 4)(r5)
mtspr   SPRN_SPRG5W, r3
-   lwz r3, VCPU_SHARED_SPRG6(r5)
+   lwz r3, (VCPU_SHARED_SPRG6 + 4)(r5)
mtspr   SPRN_SPRG6W, r3
-   lwz r3, VCPU_SHARED_SPRG7(r5)
+   lwz r3, (VCPU_SHARED_SPRG7 + 4)(r5)
mtspr   SPRN_SPRG7W, r3
 
 #ifdef CONFIG_KVM_EXIT_TIMING
-- 
1.7.2.2
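
The +4 offset works under the assumption that these Book E KVM hosts run big-endian: the least significant 32 bits of a 64-bit field then sit at byte offset 4, and lwz loads 32 bits. A host-independent sketch of that layout follows; store_be64(), load_be32() and sprg_low_word() are hypothetical helpers written out by hand so the result does not depend on the build machine's own byte order:

```c
#include <stdint.h>

/*
 * Hypothetical helper: lay out a 64-bit value in big-endian byte order
 * (most significant byte first), independent of the host's byte order.
 */
static void store_be64(uint8_t buf[8], uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		buf[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Hypothetical helper: big-endian 32-bit load, like lwz on PowerPC. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Mirrors "lwz r3, (VCPU_SHARED_SPRGn + 4)(r5)": read the low word. */
static uint32_t sprg_low_word(const uint8_t slot[8])
{
	return load_be32(slot + 4);
}
```

Storing 0x1122334455667788 this way and loading 32 bits at offset 0 yields the upper half, 0x11223344; at offset 4 it yields 0x55667788, which is the value the SPRG restore needs.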




Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Avi Kivity
On 04/04/2012 11:38 AM, Michael S. Tsirkin wrote:
  
   
   A performance note: delivering an interrupt needs to search all vcpus
   for an APIC ID match.  The previous plan was to cache (or pre-calculate)
   this lookup in the irq routing table.  Now it looks like we'll need a
   separate cache for this.
  
  As this is non-existent until today, we don't regress here. And it can
  still be added on top later on, transparently.

 I always worry about hash collisions and the cost of
 calculating good hash functions.

 We could instead return an index in the cache on injection, maintain in
 userspace and use it for fast path on the next injection.

Ahem, that is almost the existing routing table to a T.

 Will make it easy to use an array index instead of a hash here,
 and fallback to a slower ID lookup on mismatch.

Need a free ioctl so we can reuse IDs.

 Until we do have this fast path we can just fill this value with zeros,
 so kernel patch (almost) does not need to change for this -
 just the header.

Partially implemented interfaces invite breakage.


-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Avi Kivity
On 04/03/2012 08:24 PM, Jan Kiszka wrote:
  
 
  A performance note: delivering an interrupt needs to search all vcpus
  for an APIC ID match.  The previous plan was to cache (or pre-calculate)
  this lookup in the irq routing table.  Now it looks like we'll need a
  separate cache for this.
 
  As this is non-existent until today, we don't regress here. And it can
  still be added on top later on, transparently.
  
  Yes, it's just a note, not an objection.  The cache lookup will be
  slower than the gsi lookup (hash table vs. array) but still O(1) vs. the
  current O(n).

 If you are concerned about performance in this path, wouldn't a DMA
 interface for MSI injection be counterproductive?

Yes, it would.  The lack of coalescing reporting support is also
problematic.  I just mentioned this idea as food for thought.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



[PATCH] Use the lower four bytes while restoring guest readable SPRGs.

2012-04-04 Thread b16395
From: Varun Sethi <varun.se...@freescale.com>

When restoring the hardware copies of the guest SPRG4-7 registers, we must
use the lower 4 bytes of the 64-bit software copies maintained by KVM.

Signed-off-by: Varun Sethi <varun.se...@freescale.com>
---
 arch/powerpc/kvm/booke_interrupts.S |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/booke_interrupts.S b/arch/powerpc/kvm/booke_interrupts.S
index c8c4b87..feda1bb 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -419,13 +419,13 @@ lightweight_exit:
 * written directly to the shared area, so we
 * need to reload them here with the guest's values.
 */
-   lwz r3, VCPU_SHARED_SPRG4(r5)
+   lwz r3, (VCPU_SHARED_SPRG4 + 4)(r5)
mtspr   SPRN_SPRG4W, r3
-   lwz r3, VCPU_SHARED_SPRG5(r5)
+   lwz r3, (VCPU_SHARED_SPRG5 + 4)(r5)
mtspr   SPRN_SPRG5W, r3
-   lwz r3, VCPU_SHARED_SPRG6(r5)
+   lwz r3, (VCPU_SHARED_SPRG6 + 4)(r5)
mtspr   SPRN_SPRG6W, r3
-   lwz r3, VCPU_SHARED_SPRG7(r5)
+   lwz r3, (VCPU_SHARED_SPRG7 + 4)(r5)
mtspr   SPRN_SPRG7W, r3
 
 #ifdef CONFIG_KVM_EXIT_TIMING
-- 
1.7.2.2




[PATCH] kvm-autotest: Allow migration of multiple machines.

2012-04-04 Thread Jiří Župka
Change the migration_multi_host test to migrate all VMs at the same time.

https://github.com/autotest/autotest/pull/270

Signed-off-by: Jiří Župka <jzu...@redhat.com>
---
 client/tests/kvm/tests/migration_multi_host.py |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/client/tests/kvm/tests/migration_multi_host.py b/client/tests/kvm/tests/migration_multi_host.py
index fe6e29a..2dcb098 100644
--- a/client/tests/kvm/tests/migration_multi_host.py
+++ b/client/tests/kvm/tests/migration_multi_host.py
@@ -19,8 +19,9 @@ def run_migration_multi_host(test, params, env):
 def migration_scenario(self):
 srchost = self.params.get("hosts")[0]
 dsthost = self.params.get("hosts")[1]
+vms = params.get("vms").split()
 
-self.migrate_wait(["vm1"], srchost, dsthost)
+self.migrate_wait(vms, srchost, dsthost)
 
 mig = TestMultihostMigration(test, params, env)
 mig.run()
-- 
1.7.7.6



Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Michael S. Tsirkin
On Wed, Apr 04, 2012 at 11:44:23AM +0300, Avi Kivity wrote:
 On 04/04/2012 11:38 AM, Michael S. Tsirkin wrote:
   

A performance note: delivering an interrupt needs to search all vcpus
for an APIC ID match.  The previous plan was to cache (or pre-calculate)
this lookup in the irq routing table.  Now it looks like we'll need a
separate cache for this.
   
   As this is non-existent until today, we don't regress here. And it can
   still be added on top later on, transparently.
 
  I always worry about hash collisions and the cost of
  calculating good hash functions.
 
  We could instead return an index in the cache on injection, maintain in
  userspace and use it for fast path on the next injection.
 
 Ahem, that is almost the existing routing table to a T.
 
  Will make it easy to use an array index instead of a hash here,
  and fallback to a slower ID lookup on mismatch.
 
 Need a free ioctl so we can reuse IDs.

No, it could be kernel-controlled, not userspace-controlled. We get both
an address and an index:

if (table[u.i].addr == u.addr && table[u.i].data == u.data) {
return table[u.i].id;
}

u.i = find_lru_idx(table);
table[u.i].addr = u.addr;
table[u.i].data = u.data;
table[u.i].id = find_id(u.addr, u.data);
return table[u.i].id;
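
A minimal, self-contained C sketch of the kernel-controlled cache outlined above. All names (struct msi_req, msi_lookup(), MSI_CACHE_SIZE) are hypothetical, and find_id() is a stand-in that only decodes the destination ID field (bits 19:12 of an x86 MSI address) instead of searching all vcpus for an APIC ID match. The caller keeps the returned index in the request and sends it back on the next injection:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MSI_CACHE_SIZE 8          /* hypothetical table size */

struct msi_cache_entry {
	uint64_t addr;
	uint32_t data;
	int id;                   /* cached slow-lookup result, -1 = empty */
	uint64_t last_used;       /* for LRU replacement */
};

struct msi_cache {
	struct msi_cache_entry table[MSI_CACHE_SIZE];
	uint64_t clock;
	unsigned slow_lookups;    /* counts fallbacks, for illustration */
};

/* What userspace passes in and keeps between injections (hypothetical). */
struct msi_req {
	uint64_t addr;
	uint32_t data;
	uint32_t i;               /* index hint from the previous injection */
};

/* Hypothetical stand-in for the slow path: decode MSI address bits 19:12. */
static int find_id(uint64_t addr, uint32_t data)
{
	(void)data;
	return (int)((addr >> 12) & 0xff);
}

static void msi_cache_init(struct msi_cache *c)
{
	memset(c, 0, sizeof(*c));
	for (size_t k = 0; k < MSI_CACHE_SIZE; k++)
		c->table[k].id = -1;   /* mark every slot empty */
}

static size_t find_lru_idx(const struct msi_cache *c)
{
	size_t lru = 0;

	for (size_t k = 1; k < MSI_CACHE_SIZE; k++)
		if (c->table[k].last_used < c->table[lru].last_used)
			lru = k;
	return lru;
}

/* Returns the destination ID and updates u->i so the next call can hit. */
static int msi_lookup(struct msi_cache *c, struct msi_req *u)
{
	struct msi_cache_entry *e;

	c->clock++;
	if (u->i < MSI_CACHE_SIZE) {
		e = &c->table[u->i];
		if (e->id >= 0 && e->addr == u->addr && e->data == u->data) {
			e->last_used = c->clock;
			return e->id;  /* fast path: plain array index */
		}
	}

	/* Miss or stale hint: recycle the LRU slot and do the slow lookup. */
	u->i = (uint32_t)find_lru_idx(c);
	e = &c->table[u->i];
	e->addr = u->addr;
	e->data = u->data;
	e->id = find_id(u->addr, u->data);
	e->last_used = c->clock;
	c->slow_lookups++;
	return e->id;
}
```

Because the kernel rechecks addr and data before trusting a slot, a stale or out-of-range hint costs one slow lookup rather than misrouting the interrupt, and slots are recycled by LRU instead of through an explicit free call.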


  Until we do have this fast path we can just fill this value with zeros,
  so kernel patch (almost) does not need to change for this -
  just the header.
 
 Partially implemented interfaces invite breakage.

Hmm true. OK scrap this idea then, it's not clear
whether we are going to optimize this anyway.

 
 -- 
 I have a truly marvellous patch that fixes the bug which this
 signature is too narrow to contain.


[PATCH RFC] virtio-net: remove useless disable on freeze

2012-04-04 Thread Michael S. Tsirkin
disable_cb is just an optimization: it
cannot guarantee that there are no callbacks.

I didn't yet figure out whether a callback
in freeze will trigger a bug, but disable_cb
won't address it in any case. So let's remove
the useless calls as a first step.

Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
---
 drivers/net/virtio_net.c |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 019da01..971931e5 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -1182,11 +1182,6 @@ static int virtnet_freeze(struct virtio_device *vdev)
 {
struct virtnet_info *vi = vdev->priv;
 
-   virtqueue_disable_cb(vi->rvq);
-   virtqueue_disable_cb(vi->svq);
-   if (virtio_has_feature(vi->vdev, VIRTIO_NET_F_CTRL_VQ))
-   virtqueue_disable_cb(vi->cvq);
-
netif_device_detach(vi->dev);
cancel_delayed_work_sync(&vi->refill);
 
-- 
1.7.9.111.gf3fb0


Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Jan Kiszka
On 2012-04-04 10:53, Michael S. Tsirkin wrote:
 On Wed, Apr 04, 2012 at 11:44:23AM +0300, Avi Kivity wrote:
 On 04/04/2012 11:38 AM, Michael S. Tsirkin wrote:


 A performance note: delivering an interrupt needs to search all vcpus
 for an APIC ID match.  The previous plan was to cache (or pre-calculate)
 this lookup in the irq routing table.  Now it looks like we'll need a
 separate cache for this.

 As this is non-existent until today, we don't regress here. And it can
 still be added on top later on, transparently.

 I always worry about hash collisions and the cost of
 calculating good hash functions.

 We could instead return an index in the cache on injection, maintain in
 userspace and use it for fast path on the next injection.

 Ahem, that is almost the existing routing table to a T.

 Will make it easy to use an array index instead of a hash here,
 and fallback to a slower ID lookup on mismatch.

 Need a free ioctl so we can reuse IDs.
 
 No, it could be kernel-controlled, not userspace-controlled. We get both
 an address and an index:
 
 if (table[u.i].addr == u.addr && table[u.i].data == u.data) {
   return table[u.i].id;
 }
 
 u.i = find_lru_idx(table);
 table[u.i].addr = u.addr;
 table[u.i].data = u.data;
 table[u.i].id = find_id(u.addr, u.data);
 return table[u.i].id;
 
 
 Until we do have this fast path we can just fill this value with zeros,
 so kernel patch (almost) does not need to change for this -
 just the header.

 Partially implemented interfaces invite breakage.
 
 Hmm true. OK scrap this idea then, it's not clear
 whether we are going to optimize this anyway.
 

Also, the problem is that keeping that ID in userspace requires an
infrastructure like the MSIRoutingCache that I proposed originally. Not
much is won w.r.t. invasiveness there. So we should really do the routing
optimization in the kernel - one day.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)

2012-04-04 Thread Michael S. Tsirkin
On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:
 Michael S. Tsirkin noticed that we could run the refill work after
 ndo_close, which can re-enable napi - we don't disable it until
 virtnet_remove.  This is clearly wrong, so move the workqueue control
 to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).
 
 One subtle point: virtnet_probe() could simply fail if it couldn't
 allocate a receive buffer, but that's less polite in virtnet_open() so
 we schedule a refill as we do in the normal receive path if we run out
 of memory.
 
 Signed-off-by: Rusty Russell <ru...@rustcorp.com.au>

Doh.
napi_disable does not prevent the following
napi_schedule, does it?

Can someone confirm that I am not seeing things please?

And this means this hack does not work:
try_fill_recv can still run in parallel with
napi, corrupting the vq.

I suspect we need to resurrect a patch that used a
dedicated flag to avoid this race.

Comments?

 ---
  drivers/net/virtio_net.c |   17 +
  1 file changed, 13 insertions(+), 4 deletions(-)
 
 diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
 --- a/drivers/net/virtio_net.c
 +++ b/drivers/net/virtio_net.c
 @@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct 
   return err;
  }
  
 -/* Returns false if we couldn't fill entirely (OOM). */
 +/*
 + * Returns false if we couldn't fill entirely (OOM).
 + *
 + * Normally run in the receive path, but can also be run from ndo_open
 + * before we're receiving packets, or from refill_work which is
 + * careful to disable receiving (using napi_disable).
 + */
  static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
  {
   int err;
 @@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
  {
   struct virtnet_info *vi = netdev_priv(dev);
  
 + /* Make sure we have some buffers: if oom use wq. */
 + if (!try_fill_recv(vi, GFP_KERNEL))
 + schedule_delayed_work(&vi->refill, 0);
 +
   virtnet_napi_enable(vi);
   return 0;
  }
 @@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
  {
   struct virtnet_info *vi = netdev_priv(dev);
  
 + /* Make sure refill_work doesn't re-enable napi! */
 + cancel_delayed_work_sync(&vi->refill);
   napi_disable(&vi->napi);
  
   return 0;
 @@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
  
  unregister:
   unregister_netdev(dev);
 - cancel_delayed_work_sync(&vi->refill);
  free_vqs:
   vdev->config->del_vqs(vdev);
  free_stats:
 @@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
   /* Stop all the virtqueues. */
   vdev->config->reset(vdev);
  
 -
   unregister_netdev(vi->dev);
 - cancel_delayed_work_sync(&vi->refill);
  
   /* Free unused buffers in both send and recv, if any. */
   free_unused_bufs(vi);
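Why the cancel has to come before napi_disable() in virtnet_close() can be seen in a small single-threaded user-space model. This is only a sketch: struct model_vi and the model_* functions are hypothetical stand-ins, and the refill work is assumed to disable NAPI, refill, and then re-enable NAPI unconditionally, which is the behaviour the commit message and the new comment on try_fill_recv() describe.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the state virtnet_open()/virtnet_close() juggle. */
struct model_vi {
	bool napi_enabled;   /* stands in for the NAPI instance state */
	bool refill_pending; /* stands in for a queued delayed refill work */
	int  free_bufs;      /* receive buffers we could still allocate */
	int  rx_bufs;        /* receive buffers posted to the ring */
};

/* Stand-in for try_fill_recv(): returns false if we ran out of memory. */
bool model_try_fill_recv(struct model_vi *vi, int want)
{
	while (vi->rx_bufs < want) {
		if (vi->free_bufs == 0)
			return false;
		vi->free_bufs--;
		vi->rx_bufs++;
	}
	return true;
}

/* Stand-in for refill_work(): disables NAPI around the refill, then
 * re-enables it, which is what makes running it after close dangerous. */
void model_refill_work(struct model_vi *vi, int want)
{
	vi->refill_pending = false;
	vi->napi_enabled = false;
	if (!model_try_fill_recv(vi, want))
		vi->refill_pending = true;   /* try again later */
	vi->napi_enabled = true;
}

void model_open(struct model_vi *vi, int want)
{
	/* Make sure we have some buffers: if OOM, defer to the work item. */
	if (!model_try_fill_recv(vi, want))
		vi->refill_pending = true;
	vi->napi_enabled = true;
}

/* Close as in the patch: cancel pending work first, then disable NAPI. */
void model_close_fixed(struct model_vi *vi)
{
	vi->refill_pending = false;  /* cancel_delayed_work_sync() */
	vi->napi_enabled = false;    /* napi_disable() */
}

/* Close without cancelling: queued work may still run afterwards. */
void model_close_buggy(struct model_vi *vi)
{
	vi->napi_enabled = false;
}

/* Run whatever work is still queued, as the workqueue eventually would. */
void model_run_pending(struct model_vi *vi, int want)
{
	if (vi->refill_pending)
		model_refill_work(vi, want);
}
```

With only one buffer available, model_open() leaves a refill queued; after model_close_buggy() that queued refill turns NAPI back on, while after model_close_fixed() it never runs.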
--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Avi Kivity
On 04/04/2012 12:22 PM, Jan Kiszka wrote:
  
  Until we do have this fast path we can just fill this value with zeros,
  so kernel patch (almost) does not need to change for this -
  just the header.
 
  Partially implemented interfaces invite breakage.
  
  Hmm true. OK scrap this idea then, it's not clear
  whether we are going to optimize this anyway.
  

 Also, the problem is that keeping that ID in userspace requires an
 infrastructure like the MSIRoutingCache that I proposed originally. Not
 much won /wrt invasiveness there. 

Internal qemu refactorings are not a driver for kvm interface changes.

 So we should really do the routing
 optimization in the kernel - one day.

No, we need to make a choice:

explicit handles: array lookup, more expensive setup
no handles: hash lookup, more expensive, but no setup, and no artificial
limits
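A rough user-space sketch of the two options, only to make the trade-off concrete. Everything here is hypothetical and not the kvm interface: with explicit handles, userspace registers a route once and then injects by index (an O(1) array lookup, but a setup step and a fixed table size); without handles, each injection looks up the (address, data) pair in a chained hash table that creates entries on demand, so there is no setup and no hard limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical MSI message as a device emits it. */
struct msi_msg {
	uint64_t address;
	uint32_t data;
};

/* Option 1: explicit handles. */
#define MAX_ROUTES 1024   /* the "artificial limit" of a fixed table */

struct route_table {
	struct msi_msg routes[MAX_ROUTES];
	int nr;
};

/* Setup cost: the route must be registered and the handle remembered. */
int route_add(struct route_table *t, struct msi_msg m)
{
	if (t->nr == MAX_ROUTES)
		return -1;                /* table full */
	t->routes[t->nr] = m;
	return t->nr++;
}

/* Injection cost: a bounds check and an array index. */
const struct msi_msg *route_lookup(const struct route_table *t, int handle)
{
	return (handle >= 0 && handle < t->nr) ? &t->routes[handle] : NULL;
}

/* Option 2: no handles, the message itself is the key. */
#define NR_BUCKETS 64

struct msi_entry {
	struct msi_msg msg;
	int hits;                     /* injections seen for this message */
	struct msi_entry *next;
};

struct msi_cache {
	struct msi_entry *bucket[NR_BUCKETS];
};

static unsigned msi_hash(struct msi_msg m)
{
	uint64_t k = m.address ^ ((uint64_t)m.data << 32) ^ m.data;

	k *= 0x9E3779B97F4A7C15ull;   /* Fibonacci hashing */
	return (unsigned)(k >> 58);   /* top 6 bits select one of 64 buckets */
}

/* No setup: the first injection of a message creates its entry. */
struct msi_entry *msi_lookup(struct msi_cache *c, struct msi_msg m)
{
	unsigned h = msi_hash(m);
	struct msi_entry *e;

	for (e = c->bucket[h]; e; e = e->next) {
		if (e->msg.address == m.address && e->msg.data == m.data) {
			e->hits++;
			return e;
		}
	}
	e = calloc(1, sizeof(*e));
	if (!e)
		return NULL;
	e->msg = m;
	e->hits = 1;
	e->next = c->bucket[h];
	c->bucket[h] = e;
	return e;
}
```

The hash path pays for hashing and a chain walk on every injection, which is the "more expensive" part; the array path pays at setup time and in the fixed MAX_ROUTES bound.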

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Jan Kiszka
On 2012-04-04 11:36, Avi Kivity wrote:
 On 04/04/2012 12:22 PM, Jan Kiszka wrote:

 Until we do have this fast path we can just fill this value with zeros,
 so kernel patch (almost) does not need to change for this -
 just the header.

 Partially implemented interfaces invite breakage.

 Hmm true. OK scrap this idea then, it's not clear
 whether we are going to optimize this anyway.


 Also, the problem is that keeping that ID in userspace requires an
 infrastructure like the MSIRoutingCache that I proposed originally. Not
 much won /wrt invasiveness there. 
 
 Internal qemu refactorings are not a driver for kvm interface changes.

No, but qemu demonstrates the applicability and handiness of the kernel
interfaces.

 
 So we should really do the routing
 optimization in the kernel - one day.
 
 No, we need to make a choice:
 
 explicit handles: array lookup, more expensive setup
 no handles: hash lookup, more expensive, but no setup, and no artificial
 limits

...and I think we should head for option 2.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: question about napi_disable (was Re: [PATCH] virtio_net: set/cancel work on ndo_open/ndo_stop)

2012-04-04 Thread Michael S. Tsirkin
On Wed, Apr 04, 2012 at 12:32:29PM +0300, Michael S. Tsirkin wrote:
 On Thu, Dec 29, 2011 at 09:12:38PM +1030, Rusty Russell wrote:
  Michael S. Tsirkin noticed that we could run the refill work after
  ndo_close, which can re-enable napi - we don't disable it until
  virtnet_remove.  This is clearly wrong, so move the workqueue control
  to ndo_open and ndo_stop (aka. virtnet_open and virtnet_close).
  
  One subtle point: virtnet_probe() could simply fail if it couldn't
  allocate a receive buffer, but that's less polite in virtnet_open() so
  we schedule a refill as we do in the normal receive path if we run out
  of memory.
  
  Signed-off-by: Rusty Russell ru...@rustcorp.com.au
 
 Doh.
 napi_disable does not prevent the following
 napi_schedule, does it?
 
 Can someone confirm that I am not seeing things please?

Yes, I *was* seeing things. After napi_disable, NAPI_STATE_SCHED
is set, so napi_schedule does nothing.
Sorry about the noise.
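The point can be modelled with a single atomic state word. This is a simplified user-space sketch loosely following the kernel's NAPI code, not a copy of it: napi_disable() takes the SCHED bit (waiting out any running poll) and keeps it, and scheduling only succeeds for whoever sets SCHED first, which is what napi_schedule_prep() does with a test-and-set on NAPI_STATE_SCHED. The model_* names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_STATE_SCHED   (1u << 0)  /* poll scheduled, or NAPI disabled */
#define MODEL_STATE_DISABLE (1u << 1)  /* a disable is in progress */

/* Hypothetical stand-in for struct napi_struct's state word. */
struct model_napi {
	atomic_uint state;
};

void model_init(struct model_napi *n)
{
	atomic_init(&n->state, 0u);
}

/* Like napi_schedule_prep(): only the caller that sets SCHED may poll. */
bool model_schedule_prep(struct model_napi *n)
{
	if (atomic_load(&n->state) & MODEL_STATE_DISABLE)
		return false;
	return !(atomic_fetch_or(&n->state, MODEL_STATE_SCHED) & MODEL_STATE_SCHED);
}

/* Like napi_disable(): grab SCHED and keep holding it afterwards. */
void model_disable(struct model_napi *n)
{
	atomic_fetch_or(&n->state, MODEL_STATE_DISABLE);
	while (atomic_fetch_or(&n->state, MODEL_STATE_SCHED) & MODEL_STATE_SCHED)
		; /* a poll is running; the kernel sleeps here instead of spinning */
	atomic_fetch_and(&n->state, ~MODEL_STATE_DISABLE);
}

/* Like napi_complete()/napi_enable(): release SCHED. */
void model_release(struct model_napi *n)
{
	atomic_fetch_and(&n->state, ~MODEL_STATE_SCHED);
}
```

Because the disabled state is represented by SCHED being held, a schedule attempt between disable and enable fails the test-and-set and is dropped, so the poll routine cannot run concurrently with the refill.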

 And this means this hack does not work:
 try_fill_recv can still run in parallel with
 napi, corrupting the vq.
 
 I suspect we need to resurrect a patch that used a
 dedicated flag to avoid this race.
 
 Comments?
 
  ---
   drivers/net/virtio_net.c |   17 +
   1 file changed, 13 insertions(+), 4 deletions(-)
  
  diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
  --- a/drivers/net/virtio_net.c
  +++ b/drivers/net/virtio_net.c
  @@ -439,7 +439,13 @@ static int add_recvbuf_mergeable(struct 
  return err;
   }
   
  -/* Returns false if we couldn't fill entirely (OOM). */
  +/*
  + * Returns false if we couldn't fill entirely (OOM).
  + *
  + * Normally run in the receive path, but can also be run from ndo_open
  + * before we're receiving packets, or from refill_work which is
  + * careful to disable receiving (using napi_disable).
  + */
   static bool try_fill_recv(struct virtnet_info *vi, gfp_t gfp)
   {
  int err;
  @@ -719,6 +725,10 @@ static int virtnet_open(struct net_devic
   {
  struct virtnet_info *vi = netdev_priv(dev);
   
  +   /* Make sure we have some buffers: if oom use wq. */
  +   if (!try_fill_recv(vi, GFP_KERNEL))
  +   schedule_delayed_work(&vi->refill, 0);
  +
  virtnet_napi_enable(vi);
  return 0;
   }
  @@ -772,6 +782,8 @@ static int virtnet_close(struct net_devi
   {
  struct virtnet_info *vi = netdev_priv(dev);
   
  +   /* Make sure refill_work doesn't re-enable napi! */
  +   cancel_delayed_work_sync(&vi->refill);
  napi_disable(&vi->napi);
   
  return 0;
  @@ -1082,7 +1094,6 @@ static int virtnet_probe(struct virtio_d
   
   unregister:
  unregister_netdev(dev);
  -   cancel_delayed_work_sync(&vi->refill);
   free_vqs:
  vdev->config->del_vqs(vdev);
   free_stats:
  @@ -1121,9 +1132,7 @@ static void __devexit virtnet_remove(str
  /* Stop all the virtqueues. */
  vdev->config->reset(vdev);
   
  -
  unregister_netdev(vi->dev);
  -   cancel_delayed_work_sync(&vi->refill);
   
  /* Free unused buffers in both send and recv, if any. */
  free_unused_bufs(vi);


Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Avi Kivity
On 04/04/2012 12:38 PM, Jan Kiszka wrote:
 On 2012-04-04 11:36, Avi Kivity wrote:
  On 04/04/2012 12:22 PM, Jan Kiszka wrote:
 
  Until we do have this fast path we can just fill this value with zeros,
  so kernel patch (almost) does not need to change for this -
  just the header.
 
  Partially implemented interfaces invite breakage.
 
  Hmm true. OK scrap this idea then, it's not clear
  whether we are going to optimize this anyway.
 
 
  Also, the problem is that keeping that ID in userspace requires an
  infrastructure like the MSIRoutingCache that I proposed originally. Not
  much won /wrt invasiveness there. 
  
  Internal qemu refactorings are not a driver for kvm interface changes.

 No, but qemu demonstrates the applicability and handiness of the kernel
 interfaces.

True.

  
  So we should really do the routing
  optimization in the kernel - one day.
  
  No, we need to make a choice:
  
  explicit handles: array lookup, more expensive setup
  no handles: hash lookup, more expensive, but no setup, and no artificial
  limits

 ...and I think we should head for option 2.

I'm not so sure anymore.  Sorry about the U turn, but remind me why?  In
the long term it will be slower.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: Questing regarding KVM Guest PMU

2012-04-04 Thread shashank rachamalla
On Wed, Apr 4, 2012 at 12:34 PM, Gleb Natapov g...@redhat.com wrote:
 On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
 On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla
 shashank.rachama...@gmail.com wrote:
  On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
  On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
  On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com wrote:
   On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
   On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:
 I guess things are working fine with perf. But why not with 
 oprofile ?

 Looks like it. I never tried oprofile. Will try to reproduce your
 problem and see what oprofile is doing.
   
I am using ubuntu 10.04 with 2.6.32-21-generic kernel as guest and
oprofile 0.9.6.
Also, I have tried to capture kvm-events ( perf patch ) in host while
running oprofile and perf in guest.
Please see the attachment. I have run the tests in three cases for the
around 5 secs.
   
There are more number of MSR reads and writes in case of perf which I
think is normal. However, there are very few MSR reads and writes with
oprofile. Also, the number of NMI exceptions are too high in case of
oprofile.
   
Which host kernel are you using? Try latest kvm.git and check if you see
something unusual in dmesg.
  
   Currently running 3.3.0-rc5. Will try with the latest source from kvm
   git and let you know.
  
  
   Thanks, there were some fixes that didn't make it into 3.3. rdpmc
   instruction emulation fix is one of them. If oprofile uses it this can
   explain the problem.
  
  I have tried with latest kvm source from git and also with 3.0 guest
  kernel but oprofile fails to collect any samples on guest. I am using
  a core2duo processor which is considered by oprofile as pentium pro
  model.
 
  core2duo on the host or the guest? What is your qemu command line?
 
  both. qemu command line below.
  sudo /usr/local/bin/qemu-system-x86_64 -drive
  file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net
  user
 

 please find more info ( /proc/cpuinfo and uname of both host and guest
 ) in attached files.

 oprofile does not work for me even on the host. After trying to use it I can
 see why perf was written in the first place.

ok. seems to be. will move over to perf as it's working fine inside guest.

 --
                        Gleb.


Re: Questing regarding KVM Guest PMU

2012-04-04 Thread Gleb Natapov
On Wed, Apr 04, 2012 at 03:49:42PM +0530, shashank rachamalla wrote:
 
 On Wed, Apr 4, 2012 at 12:34 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
  On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla
  shashank.rachama...@gmail.com wrote:
   On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
   On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
   On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com wrote:
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com wrote:
 On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla wrote:
  I guess things are working fine with perf. But why not with oprofile ?
 
  Looks like it. I never tried oprofile. Will try to reproduce your
  problem and see what oprofile is doing.

 I am using ubuntu 10.04 with 2.6.32-21-generic kernel as guest and
 oprofile 0.9.6.
 Also, I have tried to capture kvm-events ( perf patch ) in host while
 running oprofile and perf in guest.
 Please see the attachment. I have run the tests in three cases for the
 around 5 secs.

 There are more number of MSR reads and writes in case of perf which I
 think is normal. However, there are very few MSR reads and writes with
 oprofile. Also, the number of NMI exceptions are too high in case of
 oprofile.

 Which host kernel are you using? Try latest kvm.git and check if you see
 something unusual in dmesg.
   
Currently running 3.3.0-rc5. Will try with the latest source from kvm
git and let you know.
   
   
Thanks, there were some fixes that didn't make it into 3.3. rdpmc
instruction emulation fix is one of them. If oprofile uses it this can
explain the problem.
   
   I have tried with latest kvm source from git and also with 3.0 guest
   kernel but oprofile fails to collect any samples on guest. I am using
   a core2duo processor which is considered by oprofile as pentium pro
   model.
  
   core2duo on the host or the guest? What is your qemu command line?
  
   both. qemu command line below.
   sudo /usr/local/bin/qemu-system-x86_64 -drive
   file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net
   user
  
 
  please find more info ( /proc/cpuinfo and uname of both host and guest
  ) in attached files.
 
  oprofile does not work for me even on the host. After trying to use it I can
  see why perf was written in the first place.
 
 ok. seems to be. will move over to perf as it's working fine inside guest.
 
Good riddance IMO. I managed to run it on a guest (but not on my
host!). The thing is buggy. It does not use the global ctrl MSR to enable
counters, and kvm has all of them disabled by default. I didn't find what
value this MSR should have after reset, so this is either a kvm bug, or
real BIOSes enable all counters in the global ctrl MSR for PMUv1
compatibility. Doing wrmsr 0x38f 0x7000f solves this problem. The
second problem is that oprofile reprograms PMU counters without
disabling them first, and this is explicitly prohibited by the Intel SDM.
The patch below solves that, but oprofile is the one that should be fixed.

diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
index a73f0c1..be05028 100644
--- a/arch/x86/kvm/pmu.c
+++ b/arch/x86/kvm/pmu.c
@@ -396,6 +396,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
(pmc = get_fixed_pmc(pmu, index))) {
data = (s64)(s32)data;
pmc->counter += data - read_pmc(pmc);
+   reprogram_gp_counter(pmc, pmc->eventsel);
return 0;
} else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
if (data == pmc->eventsel)

--
Gleb.
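For comparison, a sketch of a guest-side sequence that avoids both problems: set the enable bits in the global ctrl MSR, and clear a counter's enable bit before rewriting it. The MSR numbers and bit positions are assumed from the Intel SDM (IA32_PERF_GLOBAL_CTRL 0x38f, IA32_PERFEVTSEL0 0x186, IA32_PMC0 0xc1, EN at bit 22 of the event select, general-purpose counter enables in the low bits and fixed-counter enables from bit 32). fake_msr and the model_* helpers are hypothetical stand-ins for rdmsr/wrmsr so the sketch can run in user space.

```c
#include <assert.h>
#include <stdint.h>

#define MSR_PERF_GLOBAL_CTRL 0x38f
#define MSR_PERFEVTSEL0      0x186
#define MSR_PMC0             0x0c1
#define EVTSEL_EN            (1ull << 22)

/* Hypothetical MSR space standing in for rdmsr/wrmsr in a guest kernel. */
static uint64_t fake_msr[0x400];
static int writes_while_enabled;  /* counter writes made with EN still set */

static uint64_t model_rdmsr(uint32_t msr)
{
	return fake_msr[msr];
}

static void model_wrmsr(uint32_t msr, uint64_t val)
{
	/* Count the pattern the SDM prohibits: rewriting a running counter. */
	if (msr >= MSR_PMC0 && msr < MSR_PMC0 + 8 &&
	    (fake_msr[MSR_PERFEVTSEL0 + (msr - MSR_PMC0)] & EVTSEL_EN))
		writes_while_enabled++;
	fake_msr[msr] = val;
}

/* GP counter i is enabled by bit i, fixed counter j by bit 32 + j. */
uint64_t global_ctrl_mask(unsigned nr_gp, unsigned nr_fixed)
{
	return ((1ull << nr_gp) - 1) | (((1ull << nr_fixed) - 1) << 32);
}

/* Problem 1: counters only count once global ctrl enables them. */
void enable_global(unsigned nr_gp, unsigned nr_fixed)
{
	model_wrmsr(MSR_PERF_GLOBAL_CTRL, global_ctrl_mask(nr_gp, nr_fixed));
}

/* Problem 2: stop GP counter idx before rewriting it, then restart it. */
void reprogram_counter(unsigned idx, uint64_t evtsel, uint64_t count)
{
	uint32_t sel = MSR_PERFEVTSEL0 + idx;

	model_wrmsr(sel, model_rdmsr(sel) & ~EVTSEL_EN);
	model_wrmsr(MSR_PMC0 + idx, count);
	model_wrmsr(sel, evtsel | EVTSEL_EN);
}
```

In this model, writing the counter MSR directly while EN is set (what oprofile does, according to the analysis above) is the only path that bumps writes_while_enabled.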


[PATCH] KVM: PPC: bookehv: Fix save/restore of guest accessible SPRGs.

2012-04-04 Thread Varun Sethi
For guest-accessible SPRGs 4-7, save/restore must be handled differently for
the 64-bit and non-64-bit cases. KVM maintains the registers as 64-bit copies.
When saving/restoring in the non-64-bit case, we should always use the lower
4 bytes.

Signed-off-by: Varun Sethi varun.se...@freescale.com
---
 arch/powerpc/kvm/bookehv_interrupts.S |   48 +++-
 1 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S b/arch/powerpc/kvm/bookehv_interrupts.S
index 909e96e..c1c0bae 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -320,13 +320,29 @@ _GLOBAL(kvmppc_resume_host)
PPC_STL r5, VCPU_LR(r4)
mfspr   r7, SPRN_SPRG5
PPC_STL r3, VCPU_VRSAVE(r4)
-   PPC_STL r6, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+   std r6, VCPU_SHARED_SPRG4(r11)
+#else
+   stw r6, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
mfspr   r8, SPRN_SPRG6
-   PPC_STL r7, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+   std r7, VCPU_SHARED_SPRG5(r11)
+#else
+   stw r7, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
mfspr   r9, SPRN_SPRG7
-   PPC_STL r8, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+   std r8, VCPU_SHARED_SPRG6(r11)
+#else
+   stw r8, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
mfxer   r3
-   PPC_STL r9, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+   std r9, VCPU_SHARED_SPRG7(r11)
+#else
+   stw r9, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif
 
/* save guest MAS registers and restore host mas4  mas6 */
mfspr   r5, SPRN_MAS0
@@ -549,13 +565,29 @@ lightweight_exit:
 * SPRGs, so we need to reload them here with the guest's values.
 */
lwz r3, VCPU_VRSAVE(r4)
-   lwz r5, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+   ld  r5, VCPU_SHARED_SPRG4(r11)
+#else
+   lwz r5, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
mtspr   SPRN_VRSAVE, r3
-   lwz r6, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+   ld  r6, VCPU_SHARED_SPRG5(r11)
+#else
+   lwz r6, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
mtspr   SPRN_SPRG4W, r5
-   lwz r7, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+   ld  r7, VCPU_SHARED_SPRG6(r11)
+#else
+   lwz r7, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
mtspr   SPRN_SPRG5W, r6
-   lwz r8, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+   ld  r8, VCPU_SHARED_SPRG7(r11)
+#else
+   lwz r8, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif
mtspr   SPRN_SPRG6W, r7
mtspr   SPRN_SPRG7W, r8
 
-- 
1.7.2.2
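Why the 32-bit path uses `+ 4`: on big-endian PowerPC the least significant 32 bits of a 64-bit field live at byte offset 4, so `stw`/`lwz` at `(VCPU_SHARED_SPRGn + 4)` touch exactly the low word of KVM's 64-bit copy, whereas a 32-bit access at offset 0 hits the upper word. A small portable C sketch of that layout, emulating big-endian storage with an explicit byte array (the helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Lay out a 64-bit value the way a big-endian std would store it. */
void store_be64(uint8_t buf[8], uint64_t v)
{
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load 32 bits big-endian from an arbitrary offset, like lwz. */
uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Store 32 bits big-endian at an arbitrary offset, like stw. */
void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}
```

Reading offset 4 of a stored 0x1122334455667788 yields 0x55667788, while offset 0 yields 0x11223344; a 32-bit store at offset 4 leaves the upper word untouched.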




Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Jan Kiszka
On 2012-04-04 11:55, Avi Kivity wrote:
 On 04/04/2012 12:38 PM, Jan Kiszka wrote:
 On 2012-04-04 11:36, Avi Kivity wrote:
 On 04/04/2012 12:22 PM, Jan Kiszka wrote:

 Until we do have this fast path we can just fill this value with zeros,
 so kernel patch (almost) does not need to change for this -
 just the header.

 Partially implemented interfaces invite breakage.

 Hmm true. OK scrap this idea then, it's not clear
 whether we are going to optimize this anyway.


 Also, the problem is that keeping that ID in userspace requires an
 infrastructure like the MSIRoutingCache that I proposed originally. Not
 much won /wrt invasiveness there. 

 Internal qemu refactorings are not a driver for kvm interface changes.

 No, but qemu demonstrates the applicability and handiness of the kernel
 interfaces.
 
 True.
 

 So we should really do the routing
 optimization in the kernel - one day.

 No, we need to make a choice:

 explicit handles: array lookup, more expensive setup
 no handles: hash lookup, more expensive, but no setup, and no artificial
 limits

 ...and I think we should head for option 2.
 
 I'm not so sure anymore.  Sorry about the U turn, but remind me why?  In
 the long term it will be slower.

Likely not measurably slower. If you look at a message through the arch
glasses, you can usually spot the destination directly, specifically if
a message targets a single processor - no need for hashing and table
lookups in the common case.

In contrast, the maintenance costs for the current explicit route based
model are significant as we see now.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Dor Laor

On 04/04/2012 01:37 PM, Michael Roth wrote:


On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com
mailto:pbonz...@redhat.com wrote:
 
  Il 04/04/2012 03:18, Michael Roth ha scritto:
   Attacking the IDL/schema side first is the more rational approach. From
   there we can potentially generate ASN.1 BER/DER visitors for the protocol
   side, or potentially even just vmstate bindings as a start. I've recently
   started looking into the latter... it's completely feasible, the only
   downside is it complicates the IDL due to requiring support for a lot of
   what are very much vmstate-specific items, but it should be possible to
   do this in a manner where those annotations are self-contained and
   ignorable if we opted to replace vmstate-style declarations.
 
  We can also keep the current vmstate descriptions, but access fields
  from the automatically-generated visitors instead of struct fields.
  This keeps the IDL simple.

It may be worthwhile as an incremental step though, one nice thing about
automatically generated bindings is that with the QIDL Anthony
prototyped a while back we assume we serialize by default, so changes in
annotated structs automatically trigger changes in the generated
bindings unless you explicitly mark fields as immutable/derivable/etc,
which we can tie into the build or make check to automatically detect
and bring attention to changes in vmstate. This may be worth the effort
if we adopt the proposed 4 year migration support cycle for pc-1.0,
since that'll continue to rely on vmstate even after we move on to an
IDL and newer protocol.
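One way to picture tying this into the build or make check: derive a fingerprint from the fields that are actually serialized, skipping fields marked immutable or derived, and compare it with a recorded value so that any change to the migrated layout fails the check. This is a hypothetical sketch; the descriptor format, flag names, and choice of FNV-1a hashing are illustrative and not QIDL's design.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical field annotations; 0 means "serialized", the default. */
enum field_flags {
	FIELD_IMMUTABLE = 1 << 0,   /* never changes after setup: not sent */
	FIELD_DERIVED   = 1 << 1,   /* recomputed on the target: not sent */
};

struct field_desc {
	const char *name;
	size_t      size;
	unsigned    flags;
};

static uint64_t fnv1a(uint64_t h, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ull;          /* 64-bit FNV prime */
	}
	return h;
}

/* Fingerprint only what goes on the wire: names and sizes of sent fields. */
uint64_t vmstate_fingerprint(const struct field_desc *f, size_t n)
{
	uint64_t h = 0xcbf29ce484222325ull; /* 64-bit FNV offset basis */

	for (size_t i = 0; i < n; i++) {
		uint64_t sz = f[i].size;

		if (f[i].flags & (FIELD_IMMUTABLE | FIELD_DERIVED))
			continue;
		h = fnv1a(h, f[i].name, strlen(f[i].name) + 1); /* include NUL */
		h = fnv1a(h, &sz, sizeof(sz));  /* host byte order: fine for a build check */
	}
	return h;
}
```

Adding a derived field leaves the fingerprint unchanged, while adding a serialized one changes it, which is the signal a check step would flag.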


Beyond ASL/IDL, I'd like to be sure that we're not just translating one
format to another representation but instead introduce some new
functionality, like:

 - Ability to negotiate the protocol version
 - Bi-directional data exchange: the sender will send data as a function
   of the target release
   - Include the machine type too
 - Synchronize the entire qemu cmdline and don't rely on management to
   set it up.
   - Along the way, deal w/ hotplug events.
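A minimal sketch of the first item, assuming the negotiation happens once at connection time: each side advertises the range of protocol versions it supports plus its machine type, and both pick the highest common version. The message layout and names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical handshake message sent by each side on connect. */
struct mig_hello {
	uint32_t min_version;
	uint32_t max_version;
	char     machine[32];   /* e.g. "pc-1.0" */
};

/* Returns the agreed version, or 0 if versions or machine types don't match. */
uint32_t mig_negotiate(const struct mig_hello *src, const struct mig_hello *dst)
{
	uint32_t lo, hi;

	if (strncmp(src->machine, dst->machine, sizeof(src->machine)) != 0)
		return 0;
	lo = src->min_version > dst->min_version ? src->min_version : dst->min_version;
	hi = src->max_version < dst->max_version ? src->max_version : dst->max_version;
	return lo <= hi ? hi : 0;   /* version 0 is reserved for "no match" */
}
```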



 
  Paolo
 





Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Anthony Liguori

On 04/03/2012 03:43 PM, Dor Laor wrote:

On 04/03/2012 05:43 PM, Markus Armbruster wrote:

I'm afraid my notes are rather rough...

* 1.1
soft freeze apr 15th (less than two weeks)
hard freeze may 1
three months cycle for 1.2
stable machine types only every few releases? pc-next

* Maintainers, got distracted and my notes make no sense, sorry

* MSI injection to KVM irqchips from userspace devices models

* qemu-kvm tree: working towards upstream merge

not much left, mostly device assignment

* Migration: vmstate and visitors, decoupling the wire format
why not ASN.1


Curiosity kills me of waiting for next week's meeting to get the answer


ASN.1 is an IDL format.  It's encoded in many ways, including BER.  I think
there's widespread agreement that the next migration wire format should be
encoded with BER, which means it could be described via ASN.1, but I don't
think we intend to use ASN.1 within the code base.


I don't think using ASN.1 to describe devices makes sense.  There really aren't 
very good Open Source ASN.1 compilers.  I also don't think the syntax is 
flexible enough to fully describe a device either.


Regards,

Anthony Liguori





* qtest: test cases wanted








Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Avi Kivity
On 04/04/2012 01:48 PM, Jan Kiszka wrote:
  
  I'm not so sure anymore.  Sorry about the U turn, but remind me why?  In
  the long term it will be slower.

 Likely not measurably slower. If you look at a message through the arch
 glasses, you can usually spot the destination directly, specifically if
 a message targets a single processor - no need for hashing and table
 lookups in the common case.

Not on x86.  The APIC ID is guest-provided.  In x2apic mode it can be
quite large.

 In contrast, the maintenance costs for the current explicit route based
 model are significant as we see now.


You mean in amount of code in userspace?  That doesn't get solved since
we need to keep compatibility.

-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.



Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Anthony Liguori

On 04/04/2012 05:53 AM, Dor Laor wrote:

On 04/04/2012 01:37 PM, Michael Roth wrote:


On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com
mailto:pbonz...@redhat.com wrote:

 Il 04/04/2012 03:18, Michael Roth ha scritto:
  Attacking the IDL/schema side first is the more rational approach. From
  there we can potentially generate ASN.1 BER/DER visitors for the protocol
  side, or potentially even just vmstate bindings as a start. I've recently
  started looking into the latter... it's completely feasible, the only
  downside is it complicates the IDL due to requiring support for a lot of
  what are very much vmstate-specific items, but it should be possible to
  do this in a manner where those annotations are self-contained and
  ignorable if we opted to replace vmstate-style declarations.

 We can also keep the current vmstate descriptions, but access fields
 from the automatically-generated visitors instead of struct fields.
 This keeps the IDL simple.

It may be worthwhile as an incremental step though, one nice thing about
automatically generated bindings is that with the QIDL Anthony
prototyped a while back we assume we serialize by default, so changes in
annotated structs automatically trigger changes in the generated
bindings unless you explicitly mark fields as immutable/derivable/etc,
which we can tie into the build or make check to automatically detect
and bring attention to changes in vmstate. This may be worth the effort
if we adopt the proposed 4 year migration support cycle for pc-1.0,
since that'll continue to rely on vmstate even after we move on to an
IDL and newer protocol.


Beyond ASL/IDL, I'd like to be sure that we're not just translating one format to
another representation but instead introduce some new functionality, like:
- Ability to negotiate the protocol version


Ack.


- Bi-direction data exchange, the sender will send data as a function
of the target release


The reason bi-direction data exchange doesn't exist is because it would add 
latency to the critical path.  I think we should avoid bi-directional data 
exchange unless there's an extremely compelling reason to do so.



- Include the machine type too
- Synchronize the entire qemu cmdline and don't rely on management to
set it up.
- Along the way, deal w/ hotplug events.


This will be addressed via QOM.  As we convert backends and machine types, we
should be able to dump out the full configuration and send it over the wire.


Regards,

Anthony Liguori






 Paolo









Re: [PATCH v4] KVM: Introduce direct MSI message injection for in-kernel irqchips

2012-04-04 Thread Jan Kiszka
On 2012-04-04 13:50, Avi Kivity wrote:
 On 04/04/2012 01:48 PM, Jan Kiszka wrote:

 I'm not so sure anymore.  Sorry about the U turn, but remind me why?  In
 the long term it will be slower.

 Likely not measurably slower. If you look at a message through the arch
 glasses, you can usually spot the destination directly, specifically if
 a message targets a single processor - no need for hashing and table
 lookups in the common case.
 
 Not on x86.  The APIC ID is guest-provided.

...but is still a rather stable mapping on the physical ID.

  In x2apic mode it can be
 quite large.

Yes, but then you can at least hash/search/cache inside that group only,
with a smaller scope.
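A sketch of what searching inside one group can look like for x2APIC logical (cluster) mode. It assumes the x2APIC logical ID layout from the Intel SDM: bits 31:16 are the cluster ID and bits 15:0 a bitmap selecting up to 16 CPUs in that cluster, with a CPU's logical ID derived from its x2APIC ID as ((id >> 4) << 16) | (1 << (id & 0xf)). The per-cluster table, the MAX_CLUSTERS bound, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CLUSTERS 16   /* hypothetical bound: 16 clusters x 16 CPUs */

/* For each cluster, map bit position (0..15) to a vCPU index, or -1. */
struct cluster_map {
	int vcpu[MAX_CLUSTERS][16];
};

void cluster_map_init(struct cluster_map *m)
{
	for (int c = 0; c < MAX_CLUSTERS; c++)
		for (int b = 0; b < 16; b++)
			m->vcpu[c][b] = -1;
}

/* Logical ID derived from a CPU's x2APIC ID. */
uint32_t x2apic_logical_id(uint32_t x2apic_id)
{
	return ((x2apic_id >> 4) << 16) | (1u << (x2apic_id & 0xf));
}

void cluster_map_add(struct cluster_map *m, uint32_t x2apic_id, int vcpu)
{
	uint32_t cluster = x2apic_id >> 4;

	if (cluster < MAX_CLUSTERS)
		m->vcpu[cluster][x2apic_id & 0xf] = vcpu;
}

/* Collect the vCPUs a logical destination names: one cluster, <= 16 CPUs. */
int cluster_map_lookup(const struct cluster_map *m, uint32_t dest, int out[16])
{
	uint32_t cluster = dest >> 16;
	uint16_t bitmap = dest & 0xffff;
	int n = 0;

	if (cluster >= MAX_CLUSTERS)
		return 0;
	for (int b = 0; b < 16; b++)
		if ((bitmap & (1u << b)) && m->vcpu[cluster][b] >= 0)
			out[n++] = m->vcpu[cluster][b];
	return n;
}
```

However large the guest-chosen IDs get, a lookup only scans the 16 slots of the addressed cluster.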

 
 In contrast, the maintenance costs for the current explicit route based
 model are significant as we see now.

 
 You mean in amount of code in userspace?  That doesn't get solved since
 we need to keep compatibility.

We do not need to track MSI origins to correlate them with routes (with
the exception of 3 special devices: vhost-based virtio, kvm device
assignment, and vfio device assignment). We emulate this centrally with
a handful of LOC in the kvm layer, and we bypass it with the advent of
a direct injection API. Compare this to my original series that
introduced MSIRoutingCaches to cope with the current kernel API.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Dor Laor

On 04/04/2012 02:52 PM, Anthony Liguori wrote:

On 04/04/2012 05:53 AM, Dor Laor wrote:

On 04/04/2012 01:37 PM, Michael Roth wrote:


On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com
mailto:pbonz...@redhat.com wrote:

 Il 04/04/2012 03:18, Michael Roth ha scritto:
  Attacking the IDL/schema side first is the more rational approach. From
  there we can potentially generate ASN.1 BER/DER visitors for the protocol
  side, or potentially even just vmstate bindings as a start. I've recently
  started looking into the latter... it's completely feasible, the only
  downside is it complicates the IDL due to requiring support for a lot of
  what are very much vmstate-specific items, but it should be possible to
  do this in a manner where those annotations are self-contained and
  ignorable if we opted to replace vmstate-style declarations.

 We can also keep the current vmstate descriptions, but access fields
 from the automatically-generated visitors instead of struct fields.
 This keeps the IDL simple.

It may be worthwhile as an incremental step though, one nice thing about
automatically generated bindings is that with the QIDL Anthony
prototyped a while back we assume we serialize by default, so changes in
annotated structs automatically trigger changes in the generated
bindings unless you explicitly mark fields as immutable/derivable/etc,
which we can tie into the build or make check to automatically detect
and bring attention to changes in vmstate. This may be worth the effort
if we adopt the proposed 4 year migration support cycle for pc-1.0,
since that'll continue to rely on vmstate even after we move on to an
IDL and newer protocol.


Beyond ASL/IDL, I'd like to be sure that we're not just translating one
format to another representation but instead introduce some new
functionality, like:
- Ability to negotiate the protocol version


Ack.


- Bi-direction data exchange, the sender will send data as a function
of the target release


The reason bi-direction data exchange doesn't exist is because it would
add latency to the critical path. I think we should avoid bi-directional


Not necessarily; there is no need to do the exchange during the downtime.
You can do it ahead of time, during the initial connection, and a few
additional msec or even a second won't change much.



data exchange unless there's an extremely compelling reason to do so.


The key advantage is that you'll be able to migrate to an old qemu that 
may not be compatible w/ the standard protocol and the source will be 
able to discover this and adjust.
At the moment I don't have anything more concrete than that but I think 
that's happen in the past and will continue to happen and we can add the 
required hook into the protocol.
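The idea of discovering an older target during the initial connection can be sketched as a range negotiation done before any state is sent. Each side advertises the oldest and newest protocol versions it understands, both pick the highest common one, and the sender then decides per field whether the target release understands it. `VersionRange`, `negotiate_version()` and `field_supported()` are hypothetical. The sketch assumes versions are numbered from 1, with 0 reserved for "no common version", and it does not reflect any existing QEMU migration API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical: the protocol versions one QEMU build can speak (>= 1). */
typedef struct {
    unsigned min;
    unsigned max;
} VersionRange;

/* Pick the highest version both ends support.
 * Returns 0 (never a valid version) if the ranges do not overlap. */
static unsigned negotiate_version(VersionRange src, VersionRange dst)
{
    assert(src.min >= 1 && src.min <= src.max);
    assert(dst.min >= 1 && dst.min <= dst.max);
    unsigned hi = src.max < dst.max ? src.max : dst.max;
    unsigned lo = src.min > dst.min ? src.min : dst.min;
    return hi >= lo ? hi : 0;
}

/* The sender includes a field only if the negotiated version is at
 * least the version that introduced the field. */
static bool field_supported(unsigned negotiated, unsigned introduced_in)
{
    return negotiated != 0 && negotiated >= introduced_in;
}
```

For example, a source speaking versions 1 to 4 and a target speaking 2 to 3 settle on version 3, so a field introduced in version 4 is simply not sent. Running this once at connection setup keeps the extra round trip out of the downtime window, which is the latency concern raised above.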

- Include the machine type too
- Synchronize the entire qemu cmdline and don't rely on management to
set it up.
- Along the way, deal w/ hotplug events.


This will be addressed via QOM. As we convert backends and machine types,
we should be able to dump out the full configuration and send it over
the wire.

Regards,

Anthony Liguori

 Paolo

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Michael Roth
On Wed, Apr 04, 2012 at 01:53:34PM +0300, Dor Laor wrote:
 On 04/04/2012 01:37 PM, Michael Roth wrote:
 
 On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com
 mailto:pbonz...@redhat.com wrote:
  
   On 04/04/2012 03:18, Michael Roth wrote:
Attacking the IDL/schema side first is the more rational approach. From
there we can potentially generate ASN.1 BER/DER visitors for the protocol
side, or potentially even just vmstate bindings as a start. I've recently
started looking into the latter... it's completely feasible; the only
downside is that it complicates the IDL due to requiring support for a lot of
what are very much vmstate-specific items, but it should be possible to
do this in a manner where those annotations are self-contained and
ignorable if we opted to replace vmstate-style declarations.
  
   We can also keep the current vmstate descriptions, but access fields
   from the automatically-generated visitors instead of struct fields.
   This keeps the IDL simple.
 
 It may be worthwhile as an incremental step, though. One nice thing about
 automatically generated bindings is that with the QIDL Anthony
 prototyped a while back we assume we serialize by default, so changes in
 annotated structs automatically trigger changes in the generated
 bindings unless you explicitly mark fields as immutable/derivable/etc,
 which we can tie into the build or make check to automatically detect
 and bring attention to changes in vmstate. This may be worth the effort
 if we adopt the proposed 4-year migration support cycle for pc-1.0,
 since that'll continue to rely on vmstate even after we move on to an
 IDL and newer protocol.
 
 Beyond ASL/IDL I'd like to be sure that we're not just translating one
 format to another representation but are instead introducing some new
 functionality, like:
  - Ability to negotiate the protocol version
  - Bidirectional data exchange: the sender will send data as a function
of the target release
- Include the machine type too

I've been toying with the notion of having the target start up a limited
QMP server that the source talks to in order to orchestrate negotiation. We
could potentially even send the device state by taking our QIDL-generated
visitors and serializing state via QmpOutputVisitor. QMP can be made
aware of the format of the device state input by taking the intermediate
step of generating QAPI schemas via QIDL, and using the QAPI code
generators to generate the visitors rather than QIDL directly. This
would also address the protocol side: just use QMP rather than ASN.1.

It's not as compact, but device state is such a small amount of data
compared to memory/disk that I don't think it's worth optimizing that
aspect, though we could use compression at the protocol layer if we
were inclined. Anything more suited to an out-of-band protocol, like
memory/disk, could be orchestrated via this interface... source can ask
target for a port to handle such things, negotiate stuff like XBZRLE,
etc.
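Negotiating optional features such as XBZRLE over that control channel could come down to intersecting capability sets. The source offers what it supports, the target answers with what it supports, and only features in both sets (and requested by management) are enabled. The `CAP_*` bits, `negotiate_caps()` and `needs_data_port()` are hypothetical. They are not QMP commands or QEMU's migration capability API; XBZRLE is the only feature name taken from the message.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical capability bits exchanged over the control channel. */
enum {
    CAP_XBZRLE     = 1u << 0, /* delta-encode re-sent memory pages */
    CAP_COMPRESS   = 1u << 1, /* compress the device-state stream */
    CAP_OOB_MEMORY = 1u << 2, /* move RAM over a separately negotiated port */
};

/* Enable only what both ends support and management asked for. */
static uint32_t negotiate_caps(uint32_t source, uint32_t target,
                               uint32_t requested)
{
    return source & target & requested;
}

/* The source would ask the target for a data port only when RAM is
 * to be moved out of band. */
static bool needs_data_port(uint32_t enabled)
{
    return (enabled & CAP_OOB_MEMORY) != 0;
}
```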

  - Synchronize the entire qemu cmdline and don't rely on management to
set it up.
- Along the way, deal w/ hotplug events.

My initial plan for the QIDL-generated visitors is to associate a QOM
property, state, with each device, and to serialize device state by
walking the QOM composition tree, the main rationale being that if we
extend that serialization to include other QOM properties, I believe we
have everything we need to recreate all the devices on the target:
parent-child relationships, types, properties set via cmdline,
device state...
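Serializing by walking the composition tree could look roughly like the depth-first walk below. It visits parents before children, so the receiving side could create each parent before its children, and it hands every node's path and type to a visitor. `Node`, `walk_tree()` and `print_node()` are hypothetical stand-ins. Real QOM objects and their properties, including the proposed `state` property, are not modeled.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for an object in the composition tree. */
typedef struct Node {
    const char *name;
    const char *type;
    struct Node **children; /* NULL-terminated array, or NULL */
} Node;

typedef void (*VisitFn)(const char *path, const Node *n, void *opaque);

/* Depth-first walk, parents before children. Returns the number of
 * nodes visited. */
static int walk_tree(const Node *n, const char *parent_path,
                     VisitFn visit, void *opaque)
{
    char path[256];
    int count = 1;

    assert(n != NULL);
    snprintf(path, sizeof(path), "%s/%s", parent_path, n->name);
    visit(path, n, opaque);
    for (Node **c = n->children; c && *c; c++)
        count += walk_tree(*c, path, visit, opaque);
    return count;
}

/* Example visitor: print one line per node. */
static void print_node(const char *path, const Node *n, void *opaque)
{
    (void)opaque;
    printf("%s (%s)\n", path, n->type);
}
```

Walking a machine node that contains a `pci.0` bus, which in turn contains a `disk0` device, from an empty parent path prints one line per node for `/machine`, `/machine/pci.0` and `/machine/pci.0/disk0`, and returns 3. A visitor that also emitted each node's properties would carry what the message lists as needed to recreate the devices on the target.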

A simpler alternative would be to just send the cmdline
options over to the target and assume that it results in the same underlying
machine, then just send off the device state. Much simpler actually...but
the above approach should work regardless of changes to the command-line
options on the source... having an internally stable cmdline scheme
might work as well...

I'm not sure what the right approach is here, but whatever we decide on, I think
being able to automatically generate visitors from annotations is a good
first step and should tie into any foreseeable approaches.

 
 
  
   Paolo
  
 
 


Re: kvm: RCU warning in async pf

2012-04-04 Thread Gleb Natapov
On Tue, Apr 03, 2012 at 01:52:26PM +0300, Gleb Natapov wrote:
 On Mon, Apr 02, 2012 at 08:54:32PM -0400, Sasha Levin wrote:
  Hi all,
  
  I got the spew at the bottom of the mail in a KVM guest using the KVM tools 
  and running trinity.
  
  I'm not quite sure how default_idle managed to trigger a pagefault, so that 
  part looks odd to me.
  
 This is not a regular page fault. This is an async page fault that tells the
 guest that a page, previously swapped out by the hypervisor, is now swapped
 back in, and it can happen while the vcpu is idle. The code does not leave
 the idle state properly, though. We probably need to call rcu_irq_enter()
 there. Will look into it.
 

The patch below solves it for me:

A page-ready async PF can kick the vcpu out of the idle state, much like an IRQ.
We need to tell RCU about this.

Signed-off-by: Gleb Natapov g...@redhat.com
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index f0c6fd6..380079f 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -38,6 +38,7 @@
 #include <asm/traps.h>
 #include <asm/desc.h>
 #include <asm/tlbflush.h>
+#include <asm/idle.h>
 
 static int kvmapf = 1;
 
@@ -253,7 +254,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
kvm_async_pf_task_wait((u32)read_cr2());
break;
case KVM_PV_REASON_PAGE_READY:
+   rcu_irq_enter();
+   exit_idle();
kvm_async_pf_task_wake((u32)read_cr2());
+   rcu_irq_exit();
break;
}
 }
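Why the wrapping matters can be shown with a toy model. While a CPU sits in the idle loop, RCU stops watching it, so an exception that arrives there and runs RCU-using code must first announce itself, the way interrupt entry does. The `toy_*` names are hypothetical. This is a simplified model of RCU's idle tracking, not the kernel's implementation of `rcu_irq_enter()`/`rcu_irq_exit()`, and `exit_idle()` is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Deliberately simplified model of one CPU's RCU idle tracking. */
struct toy_cpu {
    bool idle;       /* CPU is in the idle loop */
    int irq_nesting; /* entries announced via toy_rcu_irq_enter() */
};

/* RCU read-side use is only legal while RCU is watching the CPU. */
static bool toy_rcu_watching(const struct toy_cpu *c)
{
    return !c->idle || c->irq_nesting > 0;
}

/* Stand-ins for the enter/exit pair the patch adds around the wakeup. */
static void toy_rcu_irq_enter(struct toy_cpu *c)
{
    c->irq_nesting++;
}

static void toy_rcu_irq_exit(struct toy_cpu *c)
{
    assert(c->irq_nesting > 0);
    c->irq_nesting--;
}

/* Stand-in for a handler body that relies on RCU; returns false where
 * the real kernel could print an RCU warning like the one reported. */
static bool toy_handler_body(const struct toy_cpu *c)
{
    return toy_rcu_watching(c);
}
```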
--
Gleb.


Re: [PATCH 0/2] Improve iteration through sptes from rmap

2012-04-04 Thread Takuya Yoshikawa
On Wed, 21 Mar 2012 23:48:23 +0900
Takuya Yoshikawa takuya.yoshik...@gmail.com wrote:

 By removing sptep from rmap_iterator, I could achieve 15% performance
 improvement without inlining.
 
 Takuya Yoshikawa (3):
   KVM: MMU: Make pte_list_desc fit cache lines well
   KVM: MMU: Improve iteration through sptes from rmap
 

ping

Takuya


Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Igor Mammedov

On 04/04/2012 02:14 PM, Michael Roth wrote:

On Wed, Apr 04, 2012 at 01:53:34PM +0300, Dor Laor wrote:

On 04/04/2012 01:37 PM, Michael Roth wrote:


On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com
mailto:pbonz...@redhat.com  wrote:


On 04/04/2012 03:18, Michael Roth wrote:

Attacking the IDL/schema side first is the more rational approach. From
there we can potentially generate ASN.1 BER/DER visitors for the protocol
side, or potentially even just vmstate bindings as a start. I've recently
started looking into the latter... it's completely feasible; the only
downside is that it complicates the IDL due to requiring support for a lot of
what are very much vmstate-specific items, but it should be possible to
do this in a manner where those annotations are self-contained and
ignorable if we opted to replace vmstate-style declarations.


We can also keep the current vmstate descriptions, but access fields
from the automatically-generated visitors instead of struct fields.
This keeps the IDL simple.


It may be worthwhile as an incremental step, though. One nice thing about
automatically generated bindings is that with the QIDL Anthony
prototyped a while back we assume we serialize by default, so changes in
annotated structs automatically trigger changes in the generated
bindings unless you explicitly mark fields as immutable/derivable/etc,
which we can tie into the build or make check to automatically detect
and bring attention to changes in vmstate. This may be worth the effort
if we adopt the proposed 4-year migration support cycle for pc-1.0,
since that'll continue to rely on vmstate even after we move on to an
IDL and newer protocol.


Beyond ASL/IDL I'd like to be sure that we're not just translating one
format to another representation but are instead introducing some new
functionality, like:
  - Ability to negotiate the protocol version
  - Bidirectional data exchange: the sender will send data as a function
of the target release
- Include the machine type too


I've been toying with the notion of having the target start up a limited
QMP server that the source talks to in order to orchestrate negotiation. We
could potentially even send the device state by taking our QIDL-generated
visitors and serializing state via QmpOutputVisitor. QMP can be made
aware of the format of the device state input by taking the intermediate
step of generating QAPI schemas via QIDL, and using the QAPI code
generators to generate the visitors rather than QIDL directly. This
would also address the protocol side: just use QMP rather than ASN.1.

It's not as compact, but device state is such a small amount of data
compared to memory/disk that I don't think it's worth optimizing that
aspect, though we could use compression at the protocol layer if we
were inclined. Anything more suited to an out-of-band protocol, like
memory/disk, could be orchestrated via this interface... source can ask
target for a port to handle such things, negotiate stuff like XBZRLE,
etc.


  - Synchronize the entire qemu cmdline and don't rely on management to
set it up.
- Along the way, deal w/ hotplug events.


My initial plan for the QIDL-generated visitors is to associate a QOM
property, state, with each device, and to serialize device state by
walking the QOM composition tree, the main rationale being that if we
extend that serialization to include other QOM properties, I believe we
have everything we need to recreate all the devices on the target:
parent-child relationships, types, properties set via cmdline,
device state...

A simpler alternative would be to just send the cmdline
options over to the target and assume that it results in the same underlying
machine, then just send off the device state. Much simpler actually...but
the above approach should work regardless of changes to the command-line
options on the source... having an internally stable cmdline scheme
might work as well...

Will the command line take into account hot-plugged devices?



I'm not sure what the right approach is here, but whatever we decide on, I think
being able to automatically generate visitors from annotations is a good
first step and should tie into any foreseeable approaches.

Paolo

--
-
 Igor


Re: kvm: RCU warning in async pf

2012-04-04 Thread Paul E. McKenney
On Wed, Apr 04, 2012 at 03:30:33PM +0300, Gleb Natapov wrote:
 On Tue, Apr 03, 2012 at 01:52:26PM +0300, Gleb Natapov wrote:
  On Mon, Apr 02, 2012 at 08:54:32PM -0400, Sasha Levin wrote:
   Hi all,
   
   I got the spew at the bottom of the mail in a KVM guest using the KVM 
   tools and running trinity.
   
   I'm not quite sure how default_idle managed to trigger a pagefault, so 
   that part looks odd to me.
   
  This is not a regular page fault. This is an async page fault that tells the
  guest that a page, previously swapped out by the hypervisor, is now swapped
  back in, and it can happen while the vcpu is idle. The code does not leave
  the idle state properly, though. We probably need to call rcu_irq_enter()
  there. Will look into it.
  
 
 The patch below solves it for me:
 
 A page-ready async PF can kick the vcpu out of the idle state, much like an IRQ.
 We need to tell RCU about this.

This is invoked from an exception or interrupt handler, not from
process-level code?  If so:

Reviewed-by: Paul E. McKenney paul...@linux.vnet.ibm.com

 Signed-off-by: Gleb Natapov g...@redhat.com
 diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
 index f0c6fd6..380079f 100644
 --- a/arch/x86/kernel/kvm.c
 +++ b/arch/x86/kernel/kvm.c
 @@ -38,6 +38,7 @@
  #include <asm/traps.h>
  #include <asm/desc.h>
  #include <asm/tlbflush.h>
 +#include <asm/idle.h>
 
  static int kvmapf = 1;
 
 @@ -253,7 +254,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
   kvm_async_pf_task_wait((u32)read_cr2());
   break;
   case KVM_PV_REASON_PAGE_READY:
 + rcu_irq_enter();
 + exit_idle();
   kvm_async_pf_task_wake((u32)read_cr2());
 + rcu_irq_exit();
   break;
   }
  }
 --
   Gleb.
 


Re: kvm: RCU warning in async pf

2012-04-04 Thread Gleb Natapov
On Wed, Apr 04, 2012 at 07:04:16AM -0700, Paul E. McKenney wrote:
 On Wed, Apr 04, 2012 at 03:30:33PM +0300, Gleb Natapov wrote:
  On Tue, Apr 03, 2012 at 01:52:26PM +0300, Gleb Natapov wrote:
   On Mon, Apr 02, 2012 at 08:54:32PM -0400, Sasha Levin wrote:
Hi all,

I got the spew at the bottom of the mail in a KVM guest using the KVM 
tools and running trinity.

I'm not quite sure how default_idle managed to trigger a pagefault, so 
that part looks odd to me.

   This is not a regular page fault. This is an async page fault that tells the
   guest that a page, previously swapped out by the hypervisor, is now swapped
   back in, and it can happen while the vcpu is idle. The code does not leave
   the idle state properly, though. We probably need to call rcu_irq_enter()
   there. Will look into it.
   
  
  The patch below solves it for me:
  
  A page-ready async PF can kick the vcpu out of the idle state, much like an IRQ.
  We need to tell RCU about this.
 
 This is invoked from an exception or interrupt handler, not from
 process-level code?  If so:
 
From an exception.

 Reviewed-by: Paul E. McKenney paul...@linux.vnet.ibm.com
 
  Signed-off-by: Gleb Natapov g...@redhat.com
  diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
  index f0c6fd6..380079f 100644
  --- a/arch/x86/kernel/kvm.c
  +++ b/arch/x86/kernel/kvm.c
  @@ -38,6 +38,7 @@
   #include <asm/traps.h>
   #include <asm/desc.h>
   #include <asm/tlbflush.h>
  +#include <asm/idle.h>
  
   static int kvmapf = 1;
  
  @@ -253,7 +254,10 @@ do_async_page_fault(struct pt_regs *regs, unsigned long error_code)
  kvm_async_pf_task_wait((u32)read_cr2());
  break;
  case KVM_PV_REASON_PAGE_READY:
  +   rcu_irq_enter();
  +   exit_idle();
  kvm_async_pf_task_wake((u32)read_cr2());
  +   rcu_irq_exit();
  break;
  }
   }
  --
  Gleb.
  

--
Gleb.


Re: [Qemu-devel] KVM call minutes April 3

2012-04-04 Thread Michael Roth
On Wed, Apr 04, 2012 at 03:21:26PM +0200, Igor Mammedov wrote:
 On 04/04/2012 02:14 PM, Michael Roth wrote:
 On Wed, Apr 04, 2012 at 01:53:34PM +0300, Dor Laor wrote:
 On 04/04/2012 01:37 PM, Michael Roth wrote:
 
 On Apr 4, 2012 2:42 AM, Paolo Bonzini pbonz...@redhat.com
 mailto:pbonz...@redhat.com  wrote:
 
 On 04/04/2012 03:18, Michael Roth wrote:
 Attacking the IDL/schema side first is the more rational approach. From
 there we can potentially generate ASN.1 BER/DER visitors for the protocol
 side, or potentially even just vmstate bindings as a start. I've recently
 started looking into the latter... it's completely feasible; the only
 downside is that it complicates the IDL due to requiring support for a lot of
 what are very much vmstate-specific items, but it should be possible to
 do this in a manner where those annotations are self-contained and
 ignorable if we opted to replace vmstate-style declarations.
 
 We can also keep the current vmstate descriptions, but access fields
 from the automatically-generated visitors instead of struct fields.
 This keeps the IDL simple.
 
 It may be worthwhile as an incremental step, though. One nice thing about
 automatically generated bindings is that with the QIDL Anthony
 prototyped a while back we assume we serialize by default, so changes in
 annotated structs automatically trigger changes in the generated
 bindings unless you explicitly mark fields as immutable/derivable/etc,
 which we can tie into the build or make check to automatically detect
 and bring attention to changes in vmstate. This may be worth the effort
 if we adopt the proposed 4-year migration support cycle for pc-1.0,
 since that'll continue to rely on vmstate even after we move on to an
 IDL and newer protocol.
 
 Beyond ASL/IDL I'd like to be sure that we're not just translating one
 format to another representation but are instead introducing some new
 functionality, like:
   - Ability to negotiate the protocol version
   - Bidirectional data exchange: the sender will send data as a function
 of the target release
 - Include the machine type too
 
 I've been toying with the notion of having the target start up a limited
 QMP server that the source talks to in order to orchestrate negotiation. We
 could potentially even send the device state by taking our QIDL-generated
 visitors and serializing state via QmpOutputVisitor. QMP can be made
 aware of the format of the device state input by taking the intermediate
 step of generating QAPI schemas via QIDL, and using the QAPI code
 generators to generate the visitors rather than QIDL directly. This
 would also address the protocol side: just use QMP rather than ASN.1.
 
 It's not as compact, but device state is such a small amount of data
 compared to memory/disk that I don't think it's worth optimizing that
 aspect, though we could use compression at the protocol layer if we
 were inclined. Anything more suited to an out-of-band protocol, like
 memory/disk, could be orchestrated via this interface... source can ask
 target for a port to handle such things, negotiate stuff like XBZRLE,
 etc.
 
   - Synchronize the entire qemu cmdline and don't rely on management to
 set it up.
 - Along the way, deal w/ hotplug events.
 
 My initial plan for the QIDL-generated visitors is to associate a QOM
 property, state, with each device, and to serialize device state by
 walking the QOM composition tree, the main rationale being that if we
 extend that serialization to include other QOM properties, I believe we
 have everything we need to recreate all the devices on the target:
 parent-child relationships, types, properties set via cmdline,
 device state...
 
 A simpler alternative would be to just send the cmdline
 options over to the target and assume that it results in the same underlying
 machine, then just send off the device state. Much simpler actually...but
 the above approach should work regardless of changes to the command-line
 options on the source... having an internally stable cmdline scheme
 might work as well...
 Will the command line take into account hot-plugged devices?

No, that's a good point. We'd probably need to generate the options
required to ensure the devices are created on the target, and we'd only
be able to do that just before sending the device state. That means we
need a way to create machines after we've completed tasks like memory
migration, which probably has similar requirements to just being able to
instantiate a machine from a serialized QOM composition tree.
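Regenerating options at the end of migration, instead of reusing the original command line, might look like the sketch below. The device list is read when the state is about to be sent, so hot-plugged devices appear in it too. `DevInfo`, `format_device_opt()` and `count_hotplugged()` are hypothetical; only the `-device driver,id=...,bus=...` option syntax is QEMU's.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical record of a device present on the source at send time. */
typedef struct {
    const char *driver; /* e.g. "virtio-net-pci" */
    const char *id;
    const char *bus;    /* may be NULL */
    bool hotplugged;    /* added after startup, absent from the cmdline */
} DevInfo;

/* Format one -device option; returns snprintf's result. */
static int format_device_opt(const DevInfo *d, char *buf, size_t len)
{
    if (d->bus)
        return snprintf(buf, len, "-device %s,id=%s,bus=%s",
                        d->driver, d->id, d->bus);
    return snprintf(buf, len, "-device %s,id=%s", d->driver, d->id);
}

/* Count devices that the original command line would not recreate. */
static size_t count_hotplugged(const DevInfo *devs, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++)
        count += devs[i].hotplugged;
    return count;
}
```

Formatting a hot-plugged `virtio-net-pci` with id `net1` and no explicit bus yields `-device virtio-net-pci,id=net1`. `count_hotplugged()` shows how many devices a plain copy of the original command line would have missed.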

 
 
 I'm not sure what the right approach is here, but whatever we decide on, I
 think being able to automatically generate visitors from annotations is a
 good first step and should tie into any foreseeable approaches.
 
 
 
 
 Paolo
 
 
 
 
 
 -- 
 -
  Igor
 


Re: [PATCH] Fix warning

2012-04-04 Thread Tadeusz Struk
Hi Bjorn,
Did you have a chance to look at this one?
Regards,
Tadeusz

On 01/03/12 17:18, tadeusz.st...@intel.com wrote:
 From: Tadeusz Struk tadeusz.st...@intel.com
 Date: Mon, 14 Feb 2011 14:38:18 +
 Subject: [PATCH] Fixed warning
 
 This patch fixes the following warning.
 # virsh start fedora16-64
 kernel: [  133.324565] pci-stub :02:01.1: claimed by stub
 kernel: [  134.163769] pci-stub :02:01.1: enabling device ( - 0002)
 kernel: [  164.282679] [ cut here ]
 kernel: [  164.282685] WARNING: at drivers/pci/search.c:46 
 pci_find_upstream_pcie_bridge+0x87/0x9f()
 kernel: [  164.282687] Hardware name: SandyBridge Platform
 kernel: [  164.282689] Modules linked in: sha512_generic sha256_generic 
 icp_qa_al(O) nfs fscache auth_rpcgss nfs_acl mga drm ip6table_filter 
 ip6_tables ebtable_nat ebtables lockd ipt_MASQUERADE iptable_nat nf_nat 
 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM 
 iptable_mangle tun bridge stp llc sunrpc btrfs zlib_deflate libcrc32c 
 virtio_net kvm_intel kvm uinput matroxfb_base matroxfb_DAC1064 matroxfb_accel 
 matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc e1000e iTCO_wdt 
 iTCO_vendor_support igb microcode i2c_i801 shpchp serio_raw i2c_core pcspkr 
 dca [last unloaded: scsi_wait_scan]
 kernel: [  164.282724] Pid: 1233, comm: qemu-kvm Tainted: G   O 3.2.5 
 #10
 kernel: [  164.282726] Call Trace:
 kernel: [  164.282732]  [8104eac6] warn_slowpath_common+0x83/0x9b
 kernel: [  164.282735]  [8104eaf8] warn_slowpath_null+0x1a/0x1c
 kernel: [  164.282737]  [8123e028] 
 pci_find_upstream_pcie_bridge+0x87/0x9f
 kernel: [  164.282741]  [813bf5eb] domain_context_mapping+0x50/0xe6
 kernel: [  164.282744]  [813bf6c5] domain_add_dev_info+0x44/0xe3
 kernel: [  164.282747]  [813bfcda] 
 intel_iommu_attach_device+0x14f/0x15c
 kernel: [  164.282750]  [813bb48b] iommu_attach_device+0x1c/0x1e
 kernel: [  164.282764]  [a00f43aa] kvm_assign_device+0x4a/0x114 
 [kvm]
 kernel: [  164.282773]  [a00f3963] 
 kvm_vm_ioctl_assigned_device+0x434/0xb25 [kvm]
 kernel: [  164.282777]  [810f0fee] ? __do_fault+0x351/0x38b
 kernel: [  164.282781]  [8107c05b] ? arch_local_irq_save+0x15/0x1b
 kernel: [  164.282784]  [814b26e4] ? 
 _raw_spin_unlock_irqrestore+0x17/0x19
 kernel: [  164.282787]  [813c0f35] ? pci_conf1_read+0xe1/0xee
 kernel: [  164.282794]  [a00f07df] kvm_vm_ioctl+0x377/0x3ac [kvm]
 kernel: [  164.282797]  [8123eb7c] ? pci_read_config+0xa2/0x1bd
 kernel: [  164.282801]  [8110edc2] ? virt_to_head_page+0xe/0x31
 kernel: [  164.282804]  [8112f210] do_vfs_ioctl+0x45d/0x49e
 kernel: [  164.282808]  [811208da] ? fsnotify_access+0x5f/0x67
 kernel: [  164.282811]  [8112f2a7] sys_ioctl+0x56/0x7b
 kernel: [  164.282814]  [814b8e42] system_call_fastpath+0x16/0x1b
 kernel: [  164.282816] ---[ end trace 6a834ec5ac21cba8 ]---
 
 Signed-off-by: Tadeusz Struk tadeusz.st...@intel.com
 
 ---
  drivers/pci/search.c |7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/drivers/pci/search.c b/drivers/pci/search.c
 index 9d75dc8..7847c6b 100644
 --- a/drivers/pci/search.c
 +++ b/drivers/pci/search.c
 @@ -26,6 +26,7 @@ struct pci_dev *
  pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
  {
   struct pci_dev *tmp = NULL;
 + struct pci_dev *vf = pdev;
  
   if (pci_is_pcie(pdev))
   return NULL;
 @@ -40,8 +41,10 @@ pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
   }
   /* PCI device should connect to a PCIe bridge */
   if (pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE) {
 - /* Busted hardware? */
 - WARN_ON_ONCE(1);
 + if (!vf->is_virtfn) {
 + /* Busted hardware? */
 + WARN_ON_ONCE(1);
 + }
   return NULL;
   }
   return pdev;
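The control flow of the patched search can be modeled in user space with mock types, which makes the effect of the new check easy to test. The walk climbs from a conventional PCI device towards the root. Reaching a PCIe device that is not a PCIe-to-PCI bridge still returns NULL, but it only warns when the starting device is not a virtual function. Saving the starting device matters because `pdev` is reassigned during the walk, which is why the patch introduces `vf`. `struct mock_dev` and `find_upstream_bridge()` are hypothetical and mirror only the fields this function touches, not the real `struct pci_dev`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal mock of the fields the search uses. */
enum mock_pcie_type { MOCK_NOT_PCIE, MOCK_PCIE_PCI_BRIDGE, MOCK_PCIE_OTHER };

struct mock_dev {
    enum mock_pcie_type type;
    bool is_virtfn;
    struct mock_dev *upstream; /* bridge above this device; NULL on a root bus */
};

/* Walk upstream from a conventional PCI device looking for the PCIe-to-PCI
 * bridge it sits behind. *warn is set where the original code would call
 * WARN_ON_ONCE(1); with the patch, a VF as the starting device suppresses it. */
static struct mock_dev *find_upstream_bridge(struct mock_dev *pdev, bool *warn)
{
    struct mock_dev *start = pdev, *tmp = NULL;

    *warn = false;
    if (pdev->type != MOCK_NOT_PCIE)
        return NULL;             /* PCIe devices need no lookup */
    while (pdev->upstream) {
        pdev = pdev->upstream;
        if (pdev->type == MOCK_NOT_PCIE) {
            tmp = pdev;          /* conventional PCI-to-PCI bridge */
            continue;
        }
        if (pdev->type != MOCK_PCIE_PCI_BRIDGE) {
            *warn = !start->is_virtfn;
            return NULL;
        }
        return pdev;
    }
    return tmp;
}
```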



Re: kvm: RCU warning in async pf

2012-04-04 Thread Sasha Levin
On Wed, Apr 4, 2012 at 2:30 PM, Gleb Natapov g...@redhat.com wrote:
 On Tue, Apr 03, 2012 at 01:52:26PM +0300, Gleb Natapov wrote:
 On Mon, Apr 02, 2012 at 08:54:32PM -0400, Sasha Levin wrote:
  Hi all,
 
  I got the spew at the bottom of the mail in a KVM guest using the KVM 
  tools and running trinity.
 
  I'm not quite sure how default_idle managed to trigger a pagefault, so 
  that part looks odd to me.
 
 This is not a regular page fault. This is an async page fault that tells the
 guest that a page, previously swapped out by the hypervisor, is now swapped
 back in, and it can happen while the vcpu is idle. The code does not leave
 the idle state properly, though. We probably need to call rcu_irq_enter()
 there. Will look into it.


 The patch below solves it for me:

 A page-ready async PF can kick the vcpu out of the idle state, much like an IRQ.
 We need to tell RCU about this.

Looks good here.


Re: [PATCH] Fix warning

2012-04-04 Thread Bjorn Helgaas
On Wed, Apr 4, 2012 at 9:30 AM, Tadeusz Struk tadeusz.st...@intel.com wrote:
 Hi Bjorn,
 Did you have a chance to look at this one?

Yep.  It needs a changelog.  "Fixed warning" is inadequate.  It needs
an explanation of what the VF connection is.  It would also be nice if
you fixed the function comment, which is unintelligible, incorrect (a
root bus need not be bus 0), and doesn't match what the function does.
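The point that a root bus need not be bus 0 is easy to show with a mock. Systems with more than one host bridge can have a root bus with a nonzero number, so a root test has to check whether the bus has a parent. The structs below are hypothetical; the parent-pointer test follows the same idea as the kernel's `pci_is_root_bus()` helper.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock bus: only the fields needed to decide whether it is a root bus. */
struct mock_bus {
    unsigned char number;    /* bus number */
    struct mock_bus *parent; /* NULL for a bus directly below a host bridge */
};

/* Wrong in general: a second host bridge may have a root bus numbered,
 * for example, 0x80. */
static bool is_root_by_number(const struct mock_bus *b)
{
    return b->number == 0;
}

/* Parent-based test, independent of the bus number. */
static bool is_root_by_parent(const struct mock_bus *b)
{
    return b->parent == NULL;
}
```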

 On 01/03/12 17:18, tadeusz.st...@intel.com wrote:
 From: Tadeusz Struk tadeusz.st...@intel.com
 Date: Mon, 14 Feb 2011 14:38:18 +
 Subject: [PATCH] Fixed warning

 This patch fixes the following warning.
 # virsh start fedora16-64
 kernel: [  133.324565] pci-stub :02:01.1: claimed by stub
 kernel: [  134.163769] pci-stub :02:01.1: enabling device ( - 0002)
 kernel: [  164.282679] [ cut here ]
 kernel: [  164.282685] WARNING: at drivers/pci/search.c:46 
 pci_find_upstream_pcie_bridge+0x87/0x9f()
 kernel: [  164.282687] Hardware name: SandyBridge Platform
 kernel: [  164.282689] Modules linked in: sha512_generic sha256_generic 
 icp_qa_al(O) nfs fscache auth_rpcgss nfs_acl mga drm ip6table_filter 
 ip6_tables ebtable_nat ebtables lockd ipt_MASQUERADE iptable_nat nf_nat 
 nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack xt_CHECKSUM 
 iptable_mangle tun bridge stp llc sunrpc btrfs zlib_deflate libcrc32c 
 virtio_net kvm_intel kvm uinput matroxfb_base matroxfb_DAC1064 
 matroxfb_accel matroxfb_Ti3026 matroxfb_g450 g450_pll matroxfb_misc e1000e 
 iTCO_wdt iTCO_vendor_support igb microcode i2c_i801 shpchp serio_raw 
 i2c_core pcspkr dca [last unloaded: scsi_wait_scan]
 kernel: [  164.282724] Pid: 1233, comm: qemu-kvm Tainted: G           O 
 3.2.5 #10
 kernel: [  164.282726] Call Trace:
 kernel: [  164.282732]  [8104eac6] warn_slowpath_common+0x83/0x9b
 kernel: [  164.282735]  [8104eaf8] warn_slowpath_null+0x1a/0x1c
 kernel: [  164.282737]  [8123e028] 
 pci_find_upstream_pcie_bridge+0x87/0x9f
 kernel: [  164.282741]  [813bf5eb] domain_context_mapping+0x50/0xe6
 kernel: [  164.282744]  [813bf6c5] domain_add_dev_info+0x44/0xe3
 kernel: [  164.282747]  [813bfcda] 
 intel_iommu_attach_device+0x14f/0x15c
 kernel: [  164.282750]  [813bb48b] iommu_attach_device+0x1c/0x1e
 kernel: [  164.282764]  [a00f43aa] kvm_assign_device+0x4a/0x114 
 [kvm]
 kernel: [  164.282773]  [a00f3963] 
 kvm_vm_ioctl_assigned_device+0x434/0xb25 [kvm]
 kernel: [  164.282777]  [810f0fee] ? __do_fault+0x351/0x38b
 kernel: [  164.282781]  [8107c05b] ? arch_local_irq_save+0x15/0x1b
 kernel: [  164.282784]  [814b26e4] ? 
 _raw_spin_unlock_irqrestore+0x17/0x19
 kernel: [  164.282787]  [813c0f35] ? pci_conf1_read+0xe1/0xee
 kernel: [  164.282794]  [a00f07df] kvm_vm_ioctl+0x377/0x3ac [kvm]
 kernel: [  164.282797]  [8123eb7c] ? pci_read_config+0xa2/0x1bd
 kernel: [  164.282801]  [8110edc2] ? virt_to_head_page+0xe/0x31
 kernel: [  164.282804]  [8112f210] do_vfs_ioctl+0x45d/0x49e
 kernel: [  164.282808]  [811208da] ? fsnotify_access+0x5f/0x67
 kernel: [  164.282811]  [8112f2a7] sys_ioctl+0x56/0x7b
 kernel: [  164.282814]  [814b8e42] system_call_fastpath+0x16/0x1b
 kernel: [  164.282816] ---[ end trace 6a834ec5ac21cba8 ]---

 Signed-off-by: Tadeusz Struk tadeusz.st...@intel.com

 ---
  drivers/pci/search.c |    7 +--
  1 files changed, 5 insertions(+), 2 deletions(-)

 diff --git a/drivers/pci/search.c b/drivers/pci/search.c
 index 9d75dc8..7847c6b 100644
 --- a/drivers/pci/search.c
 +++ b/drivers/pci/search.c
 @@ -26,6 +26,7 @@ struct pci_dev *
  pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
  {
       struct pci_dev *tmp = NULL;
 +     struct pci_dev *vf = pdev;

       if (pci_is_pcie(pdev))
               return NULL;
 @@ -40,8 +41,10 @@ pci_find_upstream_pcie_bridge(struct pci_dev *pdev)
               }
               /* PCI device should connect to a PCIe bridge */
               if (pdev->pcie_type != PCI_EXP_TYPE_PCI_BRIDGE) {
 -                     /* Busted hardware? */
 -                     WARN_ON_ONCE(1);
 +                     if (!vf->is_virtfn) {
 +                             /* Busted hardware? */
 +                             WARN_ON_ONCE(1);
 +                     }
                       return NULL;
               }
               return pdev;



Re: Questing regarding KVM Guest PMU

2012-04-04 Thread shashank rachamalla
On Wed, Apr 4, 2012 at 3:59 PM, Gleb Natapov g...@redhat.com wrote:
 On Wed, Apr 04, 2012 at 03:49:42PM +0530, shashank rachamalla wrote:

 On Wed, Apr 4, 2012 at 12:34 PM, Gleb Natapov g...@redhat.com wrote:
  On Wed, Apr 04, 2012 at 12:24:17AM +0530, shashank rachamalla wrote:
  On Wed, Apr 4, 2012 at 12:13 AM, shashank rachamalla
  shashank.rachama...@gmail.com wrote:
   On Tue, Apr 3, 2012 at 10:28 PM, Gleb Natapov g...@redhat.com wrote:
   On Tue, Apr 03, 2012 at 07:20:04PM +0530, shashank rachamalla wrote:
   On Mon, Mar 19, 2012 at 12:37 PM, Gleb Natapov g...@redhat.com 
   wrote:
On Mon, Mar 19, 2012 at 12:20:30PM +0530, shashank rachamalla wrote:
On Sun, Mar 18, 2012 at 10:21 PM, Gleb Natapov g...@redhat.com 
wrote:
 On Sun, Mar 18, 2012 at 09:47:55PM +0530, shashank rachamalla 
 wrote:
  I guess things are working fine with perf. But why not with
  oprofile?
 
  Looks like it. I never tried oprofile. Will try to reproduce 
  your
  problem and see what oprofile is doing.

 I am using ubuntu 10.04 with the 2.6.32-21-generic kernel as guest and
 oprofile 0.9.6.
 Also, I have tried to capture kvm-events (perf patch) in the host while
 running oprofile and perf in the guest.
 Please see the attachment. I have run the tests in three cases for
 around 5 secs.

 There are more MSR reads and writes in the case of perf, which I
 think is normal. However, there are very few MSR reads and writes with
 oprofile. Also, the number of NMI exceptions is too high in the case of
 oprofile.

 Which host kernel are you using? Try latest kvm.git and check if 
 you see
 something unusual in dmesg.
   
Currenly running 3.3.0-rc5. will try with the latest source from 
kvm
git and let you know.
   
   
Thanks, there were some fixes that didn't make it into 3.3. rdpmc
instruction emulation fix is one of them. If oprofile uses it this 
can
explain the problem.
   
   I have tried with latest kvm source from git and also with 3.0 guest
   kernel but oprofile fails to collect any samples on guest. I am using
   a core2duo processor which is considered by oprofile as pentium pro
   model.
  
   core2duo on the host or the guest? What is your qemu command line?
  
   both. qemu command line below.
   sudo /usr/local/bin/qemu-system-x86_64 -drive file=vdisk1.img,if=virtio -cpu host -m 2000 -net nic,model=virtio -net user
  
 
  please find more info (/proc/cpuinfo and uname of both host and guest) in attached files.
 
  oprofile does not work for me even on the host. After trying to use it I can see why perf was written in the first place.
 
 ok. seems to be. will move over to perf as it's working fine inside guest.

 Good riddance IMO. I managed to run it on a guest (but not on my
 host!). The thing is buggy. It does not use the global ctrl MSR to enable
 counters, and kvm has all of them disabled by default. I didn't find what
 value this MSR should have after reset, so this may be either a kvm bug or
 real BIOSes enable all counters in the global ctrl MSR for PMUv1
 compatibility. Doing wrmsr 0x38f 0x70000000f solves this problem. The
 second problem is that oprofile reprograms PMU counters without
 disabling them first, and this is explicitly prohibited by the Intel SDM.
 The patch below solves that, but oprofile is the one that should be fixed.

 diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
 index a73f0c1..be05028 100644
 --- a/arch/x86/kvm/pmu.c
 +++ b/arch/x86/kvm/pmu.c
 @@ -396,6 +396,7 @@ int kvm_pmu_set_msr(struct kvm_vcpu *vcpu, u32 index, u64 data)
                                (pmc = get_fixed_pmc(pmu, index))) {
                        data = (s64)(s32)data;
                        pmc->counter += data - read_pmc(pmc);
 +                       reprogram_gp_counter(pmc, pmc->eventsel);
                        return 0;
                } else if ((pmc = get_gp_pmc(pmu, index, MSR_P6_EVNTSEL0))) {
                        if (data == pmc->eventsel)

 --
                        Gleb.

thanks for the patch. will check it out.
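The two oprofile problems Gleb describes can be modelled in a few lines of C. This is a sketch under stated assumptions: IA32_PERF_GLOBAL_CTRL (MSR 0x38f) uses bit i to enable general-purpose counter i and bit 32 + j to enable fixed counter j, and bit 22 of an IA32_PERFEVTSELx register is its enable (EN) bit, both as documented in the Intel SDM. `struct model_pmc`, `global_ctrl_all()`, `model_write_counter()` and `reprogram_counter()` are hypothetical names; nothing here touches real MSRs or KVM code.

```c
#include <stdint.h>

/* Bit 22 of an IA32_PERFEVTSELx register is the counter enable (EN) bit. */
#define EVENTSEL_EN (1ULL << 22)

/* IA32_PERF_GLOBAL_CTRL value that enables every counter: bit i for
 * general-purpose counter i, bit 32 + j for fixed counter j.
 * Valid for num_gp < 32 and num_fixed < 32. */
static uint64_t global_ctrl_all(unsigned num_gp, unsigned num_fixed)
{
	uint64_t gp_bits = (1ULL << num_gp) - 1;
	uint64_t fixed_bits = (1ULL << num_fixed) - 1;

	return gp_bits | (fixed_bits << 32);
}

/* Hypothetical model of one general-purpose counter; not a KVM type. */
struct model_pmc {
	uint64_t eventsel;   /* image of IA32_PERFEVTSELx */
	uint64_t counter;    /* image of IA32_PMCx */
	int enabled_writes;  /* counter writes made while EN was set */
};

/* A counter write; the 32-bit value is sign-extended, as in the
 * kvm_pmu_set_msr() hunk quoted above. */
static void model_write_counter(struct model_pmc *pmc, uint32_t data)
{
	if (pmc->eventsel & EVENTSEL_EN)
		pmc->enabled_writes++;   /* the pattern the SDM rule forbids */
	pmc->counter = (uint64_t)(int64_t)(int32_t)data;
}

/* Reprogram in the permitted order: clear EN, write the counter, restore EN. */
static void reprogram_counter(struct model_pmc *pmc, uint32_t data)
{
	uint64_t saved = pmc->eventsel;

	pmc->eventsel = saved & ~EVENTSEL_EN;
	model_write_counter(pmc, data);
	pmc->eventsel = saved;
}
```

With 4 general-purpose and 3 fixed counters, `global_ctrl_all(4, 3)` evaluates to 0x70000000f. A direct `model_write_counter()` on an enabled counter is exactly what the `enabled_writes` field flags, while `reprogram_counter()` never trips it.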


[PATCH v2] kvm: Disable MSI/MSI-X in assigned device reset path

2012-04-04 Thread Alex Williamson
We've hit a host kernel panic when issuing a 'system_reset' with an
82576 NIC assigned and a Windows guest. Host system is a PowerEdge R815.

[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 32993
[Hardware Error]: APEI generic hardware error status
[Hardware Error]: severity: 1, fatal
[Hardware Error]: section: 0, severity: 1, fatal
[Hardware Error]: flags: 0x01
[Hardware Error]: primary
[Hardware Error]: section_type: PCIe error
[Hardware Error]: port_type: 0, PCIe end point
[Hardware Error]: version: 1.0
[Hardware Error]: command: 0x0000, status: 0x0010
[Hardware Error]: device_id: 0000:08:00.0
[Hardware Error]: slot: 1
[Hardware Error]: secondary_bus: 0x00
[Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9
[Hardware Error]: class_code: 02
[Hardware Error]: aer_status: 0x0010, aer_mask: 0x00018000
[Hardware Error]: Unsupported Request
[Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID
[Hardware Error]: aer_uncor_severity: 0x00067011
[Hardware Error]: aer_tlp_header: 40001001 002f edbf800c 0100
[Hardware Error]: section: 1, severity: 1, fatal
[Hardware Error]: flags: 0x01
[Hardware Error]: primary
[Hardware Error]: section_type: PCIe error
[Hardware Error]: port_type: 0, PCIe end point
[Hardware Error]: version: 1.0
[Hardware Error]: command: 0x0000, status: 0x0010
[Hardware Error]: device_id: 0000:08:00.0
[Hardware Error]: slot: 1
[Hardware Error]: secondary_bus: 0x00
[Hardware Error]: vendor_id: 0x8086, device_id: 0x10c9
[Hardware Error]: class_code: 02
[Hardware Error]: aer_status: 0x0010, aer_mask: 0x00018000
[Hardware Error]: Unsupported Request
[Hardware Error]: aer_layer=Transaction Layer, aer_agent=Requester ID
[Hardware Error]: aer_uncor_severity: 0x00067011
[Hardware Error]: aer_tlp_header: 40001001 002f edbf800c 0100
Kernel panic - not syncing: Fatal hardware error!
Pid: 0, comm: swapper Not tainted 2.6.32-242.el6.x86_64 #1
Call Trace:
 <NMI>  [814f2fe5] ? panic+0xa0/0x168
 [812f919c] ? ghes_notify_nmi+0x17c/0x180
 [814f91d5] ? notifier_call_chain+0x55/0x80
 [814f923a] ? atomic_notifier_call_chain+0x1a/0x20
 [8109667e] ? notify_die+0x2e/0x30
 [814f6e81] ? do_nmi+0x1a1/0x2b0
 [814f6760] ? nmi+0x20/0x30
 [8103762b] ? native_safe_halt+0xb/0x10
 <EOE>  [8101495d] ? default_idle+0x4d/0xb0
 [81009e06] ? cpu_idle+0xb6/0x110
 [814da63a] ? rest_init+0x7a/0x80
 [81c1ff7b] ? start_kernel+0x424/0x430
 [81c1f33a] ? x86_64_start_reservations+0x125/0x129
 [81c1f438] ? x86_64_start_kernel+0xfa/0x109

The root cause of the problem is that the 'reset_assigned_device()' code
first writes a 0 to the command register. Then, when qemu subsequently does
a kvm_deassign_irq() (called by assign_irq(), in the system_reset path),
the kernel ends up calling '__msix_mask_irq()', which performs a write to
the memory-mapped MSI vector space. Since we've explicitly told the device
to disallow MMIO access (via the 0 write to the command register), we end
up with the above 'Unsupported Request'.

The fix here is to first disable MSI-X before doing the reset.  We also
disable MSI, leaving the device in INTx mode.  In this way, the device is
in a known state after reset, and we avoid touching the MSI memory-mapped
space on any subsequent 'kvm_deassign_irq()'.
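A sketch of what "disable MSI-X, then MSI" amounts to at the config-space level, assuming the standard PCI capability layout: the Message Control word sits 2 bytes into both the MSI and the MSI-X capability, MSI-X Enable is bit 15 and MSI Enable is bit 0. `cfg_get_word()`, `cfg_set_word()`, `clear_ctrl_bit()` and `disable_msi_msix()` are hypothetical stand-ins; according to Alex's notes on the patch, the actual change calls qemu's existing emulated-config update functions rather than open-coding these bit flips.

```c
#include <stdint.h>

/* Standard PCI layout: Message Control is 2 bytes into the MSI and MSI-X
 * capabilities (Linux: PCI_MSI_FLAGS / PCI_MSIX_FLAGS). */
#define MSG_CTRL_OFFSET    2
#define MSI_CTRL_ENABLE    0x0001  /* PCI_MSI_FLAGS_ENABLE */
#define MSIX_CTRL_ENABLE   0x8000  /* PCI_MSIX_FLAGS_ENABLE */

/* Config space is little-endian, like qemu's pci_get_word()/pci_set_word(). */
static uint16_t cfg_get_word(const uint8_t *cfg, unsigned off)
{
	return (uint16_t)(cfg[off] | (cfg[off + 1] << 8));
}

static void cfg_set_word(uint8_t *cfg, unsigned off, uint16_t val)
{
	cfg[off] = (uint8_t)(val & 0xff);
	cfg[off + 1] = (uint8_t)(val >> 8);
}

/* Clear one bit in a capability's Message Control word. */
static void clear_ctrl_bit(uint8_t *cfg, unsigned cap, uint16_t bit)
{
	uint16_t ctrl = cfg_get_word(cfg, cap + MSG_CTRL_OFFSET);

	cfg_set_word(cfg, cap + MSG_CTRL_OFFSET, (uint16_t)(ctrl & ~bit));
}

/*
 * Disable MSI-X first, then MSI, so the device is left in INTx mode.
 * A capability offset of 0 means the capability is absent.
 */
static void disable_msi_msix(uint8_t *cfg, unsigned msi_cap, unsigned msix_cap)
{
	if (msix_cap)
		clear_ctrl_bit(cfg, msix_cap, MSIX_CTRL_ENABLE);
	if (msi_cap)
		clear_ctrl_bit(cfg, msi_cap, MSI_CTRL_ENABLE);
}
```

Only the enable bits are cleared; the other Message Control fields (vector counts, table size) are written back unchanged.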

Thanks to Michael S. Tsirkin for help in understanding what was going on
here and Jason Baron, the original debugger of this problem.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---

Jason is out of the office for a couple weeks, so I'll try to resolve
this while he's away.  Somehow the emulated config updates were lost
in Jason's original posting, so I've fixed that and taken Jan's suggestion
to simply call into the update functions instead of open coding the
interrupt disable.  I think there still may be some disagreements about
how to handle guest generated errors in the host, but that's a large
project whereas this is something we should be doing at reset anyway,
and even if only a workaround, resolves the problem above.

 hw/device-assignment.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/hw/device-assignment.c b/hw/device-assignment.c
index 89823f1..2e6b93e 100644
--- a/hw/device-assignment.c
+++ b/hw/device-assignment.c
@@ -1613,6 +1613,29 @@ static void reset_assigned_device(DeviceState *dev)
 const char reset[] = "1";
 int fd, ret;
 
+/*
+ * If a guest is reset without being shutdown, MSI/MSI-X can still
+ * be running.  We want to return the device to a known state on
+ * reset, so disable those here.  We especially do not want MSI-X
+ * enabled since it lives in MMIO space, which is about to get
+ * disabled.
+ */
+if (adev->irq_requested_type & KVM_DEV_IRQ_GUEST_MSIX) {
+uint16_t ctrl = pci_get_word(pci_dev->config +
+ pci_dev->msix_cap + 

CPU softlockup due to smp_call_function()

2012-04-04 Thread Milton Miller

On Wed, 4 Apr 2012 about 22:12:36 +0200, Sasha Levin wrote:
 I've started seeing soft lockups resulting from smp_call_function()
 calls. I've attached two different backtraces of this happening with
 different code paths.
 
 This is running inside a KVM guest with the trinity fuzzer, using
 today's linux-next kernel.

Hi Sasha.

You have two different call sites (arch/x86/mm/pageattr.c
cpa_flush_range and net/core/dev.c netdev_run_todo), and both
call on_each_cpu with wait=1.

I tried a few options but can't get close enough to your compiled
length of 2a0 to know if the code is spinning on the first
csd_lock_wait in csd_lock or in the second csd_lock_wait after the
call to arch_send_call_function_ipi_mask (aka smp_ops + 0x44 in my
x86_64 compile).  Please check your disassembly and report.

If it's the first lock, then the current stack is an innocent victim.
In either case, we need to find out what the CPU(s) holding up the reporting
CPU's call function data (cfd_data per_cpu var) is (are) doing.

Since interrupts are on, we could read the time at entry (even jiffies)
and report both the function and the mask of CPUs that have not processed
the CPU's entry if the elapsed time has exceeded some threshold.
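The timeout report suggested here can be sketched in user-space C. Assumptions: a 64-bit mask stands in for a cpumask, the caller supplies the current time (jiffies, for instance) on each pass of its wait loop, and `struct wait_state` and `csd_wait_check()` are hypothetical names, not kernel interfaces. In the kernel, such a check would presumably live in the csd_lock_wait() spin loop.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical per-wait bookkeeping; not a kernel structure. */
struct wait_state {
	uint64_t start;      /* time at entry, e.g. jiffies */
	uint64_t threshold;  /* elapsed time after which to report */
	bool reported;       /* report at most once per wait */
};

/*
 * Called from each iteration of the wait loop. pending_mask has a bit set for
 * every CPU that has not yet processed this CPU's entry. Returns true (and
 * prints a report) the first time the threshold is exceeded with CPUs pending.
 */
static bool csd_wait_check(struct wait_state *ws, uint64_t now,
			   uint64_t pending_mask, const char *func)
{
	uint64_t elapsed = now - ws->start;  /* unsigned: tolerates counter wrap */

	if (ws->reported || pending_mask == 0 || elapsed <= ws->threshold)
		return false;

	ws->reported = true;
	fprintf(stderr, "%s: waited %llu ticks, cpus still pending: %#llx\n",
		func, (unsigned long long)elapsed,
		(unsigned long long)pending_mask);
	return true;
}
```

Reporting only once per wait keeps a genuinely stuck CPU from flooding the log while the loop keeps spinning.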

I described the call flow of smp_call_function_many and outlined some
debug sanity checks that could be added at [1] if you suspect the function
list is getting corrupted.

Let me know if you need help creating this debug code.

[1] https://lkml.org/lkml/2012/1/13/308

milton


[PATCH] Use clockevent multiplier and shifter for decrementer

2012-04-04 Thread Bharat Bhushan
The time for which the hrtimer is started for decrementer emulation is
calculated using tb_ticks_per_usec, while the hrtimer uses the clockevent
for DEC reprogramming (if needed), which calculates timebase ticks using
the multiplier and shifter mechanism implemented within the clockevent
layer. It was observed that this round-trip conversion
(timebase->time->timebase) is not correct because the two mechanisms are
not consistent. In our setup it adds 2% jitter.

With this patch, the clockevent multiplier and shifter mechanism is used
when starting the hrtimer for decrementer emulation. Now the jitter
is < 0.5%.
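A minimal sketch of the two conversions, assuming the clockevent convention ticks = (ns * mult) >> shift. The 41666666 Hz timebase, the fixed shift of 32 and the helper names (`calc_mult`, `ns_to_ticks`, `ticks_to_ns_mult`, `ticks_to_ns_usec`) are illustrative assumptions, not the kernel's values or API; the kernel picks mult/shift itself. Tick counts are assumed small enough that `ticks << shift` fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical: mult for a fixed shift, so that ticks = (ns * mult) >> shift. */
static uint32_t calc_mult(uint64_t freq_hz, uint32_t shift)
{
	return (uint32_t)((freq_hz << shift) / NSEC_PER_SEC);
}

/* ns -> timebase ticks, the direction used when the clockevent reprograms DEC. */
static uint64_t ns_to_ticks(uint64_t ns, uint32_t mult, uint32_t shift)
{
	return (ns * mult) >> shift;
}

/* ticks -> ns with the same mult/shift, like the patched (dec_time << shift) / mult. */
static uint64_t ticks_to_ns_mult(uint64_t ticks, uint32_t mult, uint32_t shift)
{
	assert(mult != 0);
	return (ticks << shift) / mult;
}

/* ticks -> ns the old way, via a truncated integer ticks-per-microsecond value. */
static uint64_t ticks_to_ns_usec(uint64_t ticks, uint64_t ticks_per_usec)
{
	assert(ticks_per_usec != 0);
	return ticks * 1000 / ticks_per_usec;
}
```

With these illustrative numbers, ticks-per-microsecond truncates from 41.67 to 41, so 41666 ticks converted with `ticks_to_ns_usec()` and back with `ns_to_ticks()` return about 1.6% long (42343 ticks), while the mult/shift round trip comes back within one tick, because both directions share the same scale factor.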

Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
---
 arch/powerpc/include/asm/time.h |2 ++
 arch/powerpc/kernel/time.c  |6 ++
 arch/powerpc/kvm/emulate.c  |5 +++--
 3 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/time.h b/arch/powerpc/include/asm/time.h
index 7eb10fb..6d631b2 100644
--- a/arch/powerpc/include/asm/time.h
+++ b/arch/powerpc/include/asm/time.h
@@ -202,6 +202,8 @@ extern u64 mulhdu(u64, u64);
 extern void div128_by_32(u64 dividend_high, u64 dividend_low,
 unsigned divisor, struct div_result *dr);
 
+extern void get_clockevent_mult(u64 *multi, u64 *shift);
+
 /* Used to store Processor Utilization register (purr) values */
 
 struct cpu_usage {
diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c
index 567dd7c..d229edd 100644
--- a/arch/powerpc/kernel/time.c
+++ b/arch/powerpc/kernel/time.c
@@ -910,6 +910,12 @@ static void __init init_decrementer_clockevent(void)
register_decrementer_clockevent(cpu);
 }
 
+void get_clockevent_mult(u64 *multi, u64 *shift)
+{
+   *multi = decrementer_clockevent.mult;
+   *shift = decrementer_clockevent.shift;
+}
+
 void secondary_cpu_time_init(void)
 {
/* Start the decrementer on CPUs that have manual control
diff --git a/arch/powerpc/kvm/emulate.c b/arch/powerpc/kvm/emulate.c
index afc9154..4bfcaa1 100644
--- a/arch/powerpc/kvm/emulate.c
+++ b/arch/powerpc/kvm/emulate.c
@@ -76,6 +76,7 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 {
unsigned long dec_nsec;
unsigned long long dec_time;
+   u64 mult, shift;
 
pr_debug("mtDEC: %x\n", vcpu->arch.dec);
hrtimer_try_to_cancel(&vcpu->arch.dec_timer);
@@ -103,9 +104,9 @@ void kvmppc_emulate_dec(struct kvm_vcpu *vcpu)
 * host ticks.
 */
 
+   get_clockevent_mult(&mult, &shift);
dec_time = vcpu->arch.dec;
-   dec_time *= 1000;
-   do_div(dec_time, tb_ticks_per_usec);
+   dec_time = (dec_time << shift) / mult;
dec_nsec = do_div(dec_time, NSEC_PER_SEC);
hrtimer_start(&vcpu->arch.dec_timer,
ktime_set(dec_time, dec_nsec), HRTIMER_MODE_REL);
-- 
1.7.0.4


--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: PPC: bookehv: Fix save/restore of guest accessible SPRGs.

2012-04-04 Thread b16395
From: Varun Sethi varun.se...@freescale.com

For guest-accessible SPRGs 4-7, save/restore must be handled differently
for the 64-bit and non-64-bit cases. KVM maintains these registers as
64-bit copies. When saving/restoring in the non-64-bit case, we should
always use the lower 4 bytes.
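The `+ 4` offsets in the 32-bit paths depend on byte order. A sketch, assuming a big-endian layout (as on the Book E cores these patches target): in a 64-bit field the lower 4 bytes start at byte offset 4, so a 32-bit `lwz`/`stw` at `VCPU_SHARED_SPRGn + 4` accesses exactly the low word. `store_be64()`, `load_be32()` and `store_be32()` are hypothetical helpers that model that memory layout portably on any host.

```c
#include <stdint.h>

/* Lay out a 64-bit value most-significant byte first, as a big-endian CPU stores it. */
static void store_be64(uint8_t *p, uint64_t v)
{
	for (int i = 7; i >= 0; i--) {
		p[i] = (uint8_t)(v & 0xff);
		v >>= 8;
	}
}

/* Read 4 bytes as a big-endian 32-bit word, as lwz would. */
static uint32_t load_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Write a 32-bit word big-endian, as stw would. */
static void store_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}
```

Writing the 32-bit value at offset 0 instead would land it in the upper word, which a 64-bit reader of the same field would then see shifted left by 32 bits.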

Signed-off-by: Varun Sethi varun.se...@freescale.com
---
 arch/powerpc/kvm/bookehv_interrupts.S |   48 +++-
 1 files changed, 40 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kvm/bookehv_interrupts.S 
b/arch/powerpc/kvm/bookehv_interrupts.S
index 909e96e..c1c0bae 100644
--- a/arch/powerpc/kvm/bookehv_interrupts.S
+++ b/arch/powerpc/kvm/bookehv_interrupts.S
@@ -320,13 +320,29 @@ _GLOBAL(kvmppc_resume_host)
PPC_STL r5, VCPU_LR(r4)
mfspr   r7, SPRN_SPRG5
PPC_STL r3, VCPU_VRSAVE(r4)
-   PPC_STL r6, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+   std r6, VCPU_SHARED_SPRG4(r11)
+#else
+   stw r6, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
mfspr   r8, SPRN_SPRG6
-   PPC_STL r7, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+   std r7, VCPU_SHARED_SPRG5(r11)
+#else
+   stw r7, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
mfspr   r9, SPRN_SPRG7
-   PPC_STL r8, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+   std r8, VCPU_SHARED_SPRG6(r11)
+#else
+   stw r8, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
mfxer   r3
-   PPC_STL r9, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+   std r9, VCPU_SHARED_SPRG7(r11)
+#else
+   stw r9, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif
 
/* save guest MAS registers and restore host mas4 & mas6 */
mfspr   r5, SPRN_MAS0
@@ -549,13 +565,29 @@ lightweight_exit:
 * SPRGs, so we need to reload them here with the guest's values.
 */
lwz r3, VCPU_VRSAVE(r4)
-   lwz r5, VCPU_SHARED_SPRG4(r11)
+#ifdef CONFIG_64BIT
+   ld  r5, VCPU_SHARED_SPRG4(r11)
+#else
+   lwz r5, (VCPU_SHARED_SPRG4 + 4)(r11)
+#endif
mtspr   SPRN_VRSAVE, r3
-   lwz r6, VCPU_SHARED_SPRG5(r11)
+#ifdef CONFIG_64BIT
+   ld  r6, VCPU_SHARED_SPRG5(r11)
+#else
+   lwz r6, (VCPU_SHARED_SPRG5 + 4)(r11)
+#endif
mtspr   SPRN_SPRG4W, r5
-   lwz r7, VCPU_SHARED_SPRG6(r11)
+#ifdef CONFIG_64BIT
+   ld  r7, VCPU_SHARED_SPRG6(r11)
+#else
+   lwz r7, (VCPU_SHARED_SPRG6 + 4)(r11)
+#endif
mtspr   SPRN_SPRG5W, r6
-   lwz r8, VCPU_SHARED_SPRG7(r11)
+#ifdef CONFIG_64BIT
+   ld  r8, VCPU_SHARED_SPRG7(r11)
+#else
+   lwz r8, (VCPU_SHARED_SPRG7 + 4)(r11)
+#endif
mtspr   SPRN_SPRG6W, r7
mtspr   SPRN_SPRG7W, r8
 
-- 
1.7.2.2




[PATCH] KVM: PPC: booke: Use the lower four bytes while restoring guest readable SPRGs.

2012-04-04 Thread b16395
From: Varun Sethi varun.se...@freescale.com

While restoring the hardware copies of guest SPRG4-7 registers, we must use
the lower 4 bytes of the 64-bit software copies maintained by KVM.

Signed-off-by: Varun Sethi varun.se...@freescale.com
---
 arch/powerpc/kvm/booke_interrupts.S |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kvm/booke_interrupts.S 
b/arch/powerpc/kvm/booke_interrupts.S
index c8c4b87..feda1bb 100644
--- a/arch/powerpc/kvm/booke_interrupts.S
+++ b/arch/powerpc/kvm/booke_interrupts.S
@@ -419,13 +419,13 @@ lightweight_exit:
 * written directly to the shared area, so we
 * need to reload them here with the guest's values.
 */
-   lwz r3, VCPU_SHARED_SPRG4(r5)
+   lwz r3, (VCPU_SHARED_SPRG4 + 4)(r5)
mtspr   SPRN_SPRG4W, r3
-   lwz r3, VCPU_SHARED_SPRG5(r5)
+   lwz r3, (VCPU_SHARED_SPRG5 + 4)(r5)
mtspr   SPRN_SPRG5W, r3
-   lwz r3, VCPU_SHARED_SPRG6(r5)
+   lwz r3, (VCPU_SHARED_SPRG6 + 4)(r5)
mtspr   SPRN_SPRG6W, r3
-   lwz r3, VCPU_SHARED_SPRG7(r5)
+   lwz r3, (VCPU_SHARED_SPRG7 + 4)(r5)
mtspr   SPRN_SPRG7W, r3
 
 #ifdef CONFIG_KVM_EXIT_TIMING
-- 
1.7.2.2

