[PATCH RESEND] Recognize PCID feature
This patch makes Qemu recognize the PCID feature specified from configuration or command line options.

Signed-off-by: Junjie Mao junjie@intel.com
---
 target-i386/cpu.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-i386/cpu.c b/target-i386/cpu.c
index 5521709..efc6ece 100644
--- a/target-i386/cpu.c
+++ b/target-i386/cpu.c
@@ -50,7 +50,7 @@ static const char *ext_feature_name[] = {
     "ds_cpl", "vmx", "smx", "est",
     "tm2", "ssse3", "cid", NULL,
     "fma", "cx16", "xtpr", "pdcm",
-    NULL, NULL, "dca", "sse4.1|sse4_1",
+    NULL, "pcid", "dca", "sse4.1|sse4_1",
     "sse4.2|sse4_2", "x2apic", "movbe", "popcnt",
     "tsc-deadline", "aes", "xsave", "osxsave",
     "avx", NULL, NULL, "hypervisor",
--
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
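For reference, a minimal C sketch (illustrative only, not the QEMU source) of what this table does: ext_feature_name[] maps CPUID leaf 1 ECX bit positions to flag names, and the patch changes slot 17 from NULL to "pcid". Only the entries near the patched slot are filled in here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative sketch of the feature-name table around the patched slot;
 * names follow the ext_feature_name[] array in target-i386/cpu.c. */
static const char *ext_feature_name_sketch[32] = {
    [12] = "fma",  [13] = "cx16", [14] = "xtpr", [15] = "pdcm",
    [16] = NULL,                /* still reserved */
    [17] = "pcid",              /* the slot this patch fills */
    [18] = "dca",  [19] = "sse4.1|sse4_1",
    [20] = "sse4.2|sse4_2", [21] = "x2apic",
};

/* Look up the name for a leaf-1 ECX feature bit; NULL if reserved/unknown. */
static const char *ecx_feature_name(int bit)
{
    return (bit >= 0 && bit < 32) ? ext_feature_name_sketch[bit] : NULL;
}
```

With this table in place, "-cpu qemu64,+pcid" style flag parsing can resolve the name to bit 17 of the leaf-1 ECX word, which is where the hardware reports PCID.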
Re: [RFC PATCH v2 03/21][SeaBIOS] acpi-dsdt: Implement functions for memory hotplug
On Tue, Jul 17, 2012 at 03:23:00PM +0800, Wen Congyang wrote:

+    Method(MESC, 0) {
+        // Local5 = active memdevice bitmap
+        Store (MES, Local5)
+        // Local2 = last read byte from bitmap
+        Store (Zero, Local2)
+        // Local0 = memory device iterator
+        Store (Zero, Local0)
+        While (LLess(Local0, SizeOf(MEON))) {
+            // Local1 = MEON flag for this memory device
+            Store(DerefOf(Index(MEON, Local0)), Local1)
+            If (And(Local0, 0x07)) {
+                // Shift down previously read bitmap byte
+                ShiftRight(Local2, 1, Local2)
+            } Else {
+                // Read next byte from memdevice bitmap
+                Store(DerefOf(Index(Local5, ShiftRight(Local0, 3))), Local2)
+            }
+            // Local3 = active state for this memory device
+            Store(And(Local2, 1), Local3)
+
+            If (LNotEqual(Local1, Local3)) {

There are two ways to hot remove a memory device:
1. dimm_del
2. echo 1 > /sys/bus/acpi/devices/PNP0C80:XX/eject

In the 2nd case, we cannot hotplug this memory device again, because both Local1 and Local3 are 1. So, I think the MEON flag for this memory device should be set to 0 in method _EJ0, or method _PS3 should be implemented for the memory device.

good catch. Both internal seabios state (MEON) and the machine qemu bitmap (mems_sts in hw/acpi_piix4.c) have to be updated when the ejection comes from OSPM action. I will implement a _PS3 method that updates the MEON flag and also signals qemu to change the mems_sts bitmap.

thanks,
- Vasilis
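The AML loop above fetches a fresh bitmap byte every eighth device and shifts it right otherwise, so device i's active bit ends up being bit (i & 7) of byte (i >> 3). As a sanity check, here is a hedged C model of that indexing (a hypothetical helper, not SeaBIOS code):

```c
#include <assert.h>
#include <stdint.h>

/* C model of MESC's bitmap walk: Local0 is the device index; every 8th
 * device reads bitmap[Local0 >> 3] into Local2, otherwise Local2 is
 * shifted right one bit. Net effect shown below. */
static int memdev_active(const uint8_t *bitmap, unsigned dev)
{
    /* DerefOf(Index(bitmap, dev >> 3)), shifted down (dev & 7) times */
    return (bitmap[dev >> 3] >> (dev & 7)) & 1;
}
```

This makes the bug in the thread concrete: after an OSPM-initiated eject, the bitmap bit (Local3) stays 1 while MEON (Local1) also stays 1, so the LNotEqual branch never fires for a re-plug.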
Re: [PATCH v5 0/4] kvm: level irqfd and new eoifd
On Thu, Jul 19, 2012 at 12:48:07PM -0600, Alex Williamson wrote: On Thu, 2012-07-19 at 20:45 +0300, Michael S. Tsirkin wrote: On Thu, Jul 19, 2012 at 11:29:38AM -0600, Alex Williamson wrote: On Thu, 2012-07-19 at 19:59 +0300, Michael S. Tsirkin wrote: On Mon, Jul 16, 2012 at 02:33:38PM -0600, Alex Williamson wrote: v5: - irqfds now have a one-to-one mapping with eoifds to prevent users from consuming all of kernel memory by repeatedly creating eoifds from a single irqfd. - implement a kvm_clear_irq() which does a test_and_clear_bit of the irq_state, only updating the pic/ioapic if changes and allowing the caller to know if anything was done. I added this onto the end as it's essentially an optimization on the previous design. It's hard to tell if there's an actual performance benefit to this. - dropped eoifd gsi support patch as it was only an FYI. Thanks, Alex So 3/4, 4/4 are racy and I think you convinced me it's best to drop it for now. I hope that fact that we already scan all vcpus under spinlock for level interrupts is enough to justify adding a lock here. To summarize issues still outstanding with 1/2, 2/2: (a) - source id lingering after irqfd was destroyed/deassigned prevents assigning a new irqfd (b) - if same irqfd is deassigned and re-assigned, this seems to succeed but does not give any more EOIs (c) - document that user needs to re-inject interrupts injected by level IRQFD after migration as they are cleared Hope this helps! Thanks, I'm refining and testing a re-write. One thing I also noticed is that we don't do anything when the eoifd is closed. We'll cleanup when kvm is closed, but that can leave a lot of stray eoifds, and therefore used irq_source_ids tied up. So, I think I need to pull in a lot of the irqfd code just to be able to catch the POLLHUP and do cleanup. I don't think it's worth it. With ioeventfd we have the same issue and we don't care: userspace should just DEASSIGN before close. 
With irqfd we committed to support cleanup by close but it happens kind of naturally since we poll irqfd anyway. It's there for irqfd for historical reasons. You're not dealing with such a limited resource for ioeventfds though. It's pretty easily conceivable we could run out of irq source IDs. Running out of fds is also very conceivable. Not deassigning before close is a userspace bug anyway. Fixing (a) is a simple flush, so I already added that. To solve (b), I think that saving the irqfd eventfd ctx was a bad idea. I actually think we should just fix it. Scan eoifds when closing/opening irqfds and bind/unbind source id. Hmm, IMHO we had no business holding onto an eventfd ctx. That was an ugly implementation detail forced by the desire to allow the same eventfd to be used in multiple eoifds. The fallout from that leaves a lasting tie between the eoifd and the future use of that eventfd. I can imagine the scenario you present is just one of the glitches and I really don't want to have one interface disable another. Looks like this disabling is inherent in what we want eoifd to do. You bind irqfd and eoifd. If irqfd is deassigned, eoifd will not get any more events, it is disabled. Whether it keeps the pointer to source id internally or not does not matter to the user. The new api I will propose to solve it is that kvm_irqfd returns a token (or key) when used as a level irqfd (actually the irq source id, but the user shouldn't care what it is). We pass that into eoifd instead of the irqfd. That means that if the irqfd is closed and re-configured, the user will get a new key and should have no expectation that it's tied to the previous eoifd. I'll add a comment for (c). Thanks, Alex Hmm, another API rewrite, when I felt it is finally stabilizing. Maybe it's the right thing to do but it does feel like we change userspace ABI just because we have run into an implementation difficulty. Pls note I'm offline next week so won't have time to review soon. 
We could return the key in the struct kvm_irqfd if it adds anything, but I felt returning the key was preferable and is compatible with the existing ABI. Thanks,

Alex

You say it is preferable but I wonder what it buys users compared to using the fd directly - it is certainly more work for userspace to keep track of it.

-- MST
Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support
On Thu, Jul 19, 2012 at 2:09 PM, Michael S. Tsirkin m...@redhat.com wrote: On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote: Of course, the million dollar question is why would using AIO in the kernel be faster than using AIO in userspace? Actually for me a more important question is how does it compare with virtio-blk dataplane? Hi Khoa, I think you have results of data-plane and vhost-blk? Is the vhost-blk version identical to Asias' recent patches? Stefan
Re: [PATCH] KVM: PIC: call ack notifiers for irqs that are dropped from irr
On Tue, Jul 17, 2012 at 02:59:11PM +0300, Gleb Natapov wrote:

After commit 242ec97c358256 PIT interrupts are no longer delivered after PIC reset. It happens because PIT injects an interrupt only if the previous one was acked, but since on PIC reset it is dropped from irr it will never be delivered and hence acknowledged. Fix that by calling the ack notifier on PIC reset.

Signed-off-by: Gleb Natapov g...@redhat.com

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 81cf4fa..f09e790 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -305,6 +305,7 @@ static void pic_ioport_write(void *opaque, u32 addr, u32 val)
 	addr &= 1;
 	if (addr == 0) {
 		if (val & 0x10) {
+			u8 edge_irr = s->irr & ~s->elcr;
 			s->init4 = val & 1;
 			s->last_irr = 0;
 			s->irr &= s->elcr;
@@ -322,6 +323,9 @@ static void pic_ioport_write(void *opaque, u32 addr, u32 val)
 			if (val & 0x08)
 				pr_pic_unimpl("level sensitive irq not supported");
+			for (irq = 0; irq < PIC_NUM_PINS/2; irq++)
+				if (edge_irr & (1 << irq))
+					pic_clear_isr(s, irq);
 		} else if (val & 0x08) {
 			if (val & 0x04)
 				s->poll = 1;

--
			Gleb.

Can you modify kvm_pic_reset (currently unused BTW) to conform to 9ed049c3b6230b6898 ? It checks for APIC handling interrupts before acking.
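In effect the fix computes, at ICW1 time, which pending edge-triggered lines are about to be lost and runs pic_clear_isr() (and thus the ack notifiers) for each of them. A hedged model of just that mask computation (illustrative, not the kernel code):

```c
#include <assert.h>
#include <stdint.h>

/* Model of the fix: IRQs that are pending (set in irr) and edge-triggered
 * (clear in elcr) are dropped by "irr &= elcr" on init, so their ack
 * notifiers must be called. Returns the mask of such IRQs. */
static uint8_t edge_irr_dropped_on_init(uint8_t irr, uint8_t elcr)
{
    return irr & (uint8_t)~elcr;
}
```

For example, with the PIT pending on edge-triggered IRQ0 (irr = 0x01, elcr bit 0 clear), the mask contains IRQ0, so the PIT's ack notifier fires and it re-injects after reset.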
Re: [PATCH 5/9] KVM: MMU: fast check write-protect for direct mmu
On Fri, Jul 20, 2012 at 11:45:59AM +0800, Xiao Guangrong wrote:

BTW, there are some bug fix patches on the -master branch that do not exist on the -next branch:

commit: f411930442e01f9cf1bf4df41ff7e89476575c4d
commit: 85b7059169e128c57a3a8a3e588fb89cb2031da1

This causes code conflicts if we do the development on -next.

See the auto-next branch. http://www.linux-kvm.org/page/Kvm-Git-Workflow
Re: [PATCH 2/9] KVM: x86: simplify read_emulated
On Fri, Jul 20, 2012 at 10:17:36AM +0800, Xiao Guangrong wrote:
On 07/20/2012 07:58 AM, Marcelo Tosatti wrote:

-	}
+	rc = ctxt->ops->read_emulated(ctxt, addr, mc->data + mc->end, size,
+				      &ctxt->exception);
+	if (rc != X86EMUL_CONTINUE)
+		return rc;
+
+	mc->end += size;
+
+read_cached:
+	memcpy(dest, mc->data + mc->pos, size);

What prevents a read_emulated(size > 8) call, with mc->pos == (mc->end - 8) now?

Marcelo,

The splitting has been done in emulator_read_write_onepage:

	while (bytes) {
		unsigned now = min(bytes, 8U);

		frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++];
		frag->gpa = gpa;
		frag->data = val;
		frag->len = now;
		frag->write_readonly_mem = (ret == -EPERM);

		gpa += now;
		val += now;
		bytes -= now;
	}

So i think it is safe to remove the splitting in read_emulated.

Yes, it is fine to remove it. But the splitting in emulate.c prevented the case of _cache read_ with size > 8 beyond the end of mc->data. That case must be handled in read_emulated.

What prevents a read_emulated(size > 8) call, with mc->pos == (mc->end - 8) now?
Re: [PATCH 5/9] KVM: MMU: fast check write-protect for direct mmu
On Fri, Jul 20, 2012 at 10:34:28AM +0800, Xiao Guangrong wrote:
On 07/20/2012 08:39 AM, Marcelo Tosatti wrote:
On Tue, Jul 17, 2012 at 09:53:29PM +0800, Xiao Guangrong wrote:

If there are no indirect shadow pages we need not protect any gfn; this is always true for direct mmu without nested.

Signed-off-by: Xiao Guangrong xiaoguangr...@linux.vnet.ibm.com

Xiao,

What is the motivation? Numbers please.

mmu_need_write_protect is the common path for both soft-mmu and hard-mmu; checking indirect_shadow_pages can skip the hash-table walk for the case where tdp is enabled without a nested guest.

I mean motivation as in an observation that it is a bottleneck.

I will post the numbers after I do the performance test.

In fact, what case was the original indirect_shadow_pages conditional in kvm_mmu_pte_write optimizing again?

They are different paths: mmu_need_write_protect is the real page fault path, and kvm_mmu_pte_write is caused by mmio emulation.

Sure. What i am asking is, what use case is the indirect_shadow_pages check optimizing? What scenario, what workload?

See the "When to optimize" section of http://en.wikipedia.org/wiki/Program_optimization.

Can't remember why indirect_shadow_pages was introduced in kvm_mmu_pte_write.
Re: [RFC-v3 4/4] tcm_vhost: Initial merge for vhost level target fabric driver
On Wed, Jul 18, 2012 at 02:20:58PM -0700, Nicholas A. Bellinger wrote:
On Wed, 2012-07-18 at 19:09 +0300, Michael S. Tsirkin wrote:
On Wed, Jul 18, 2012 at 12:59:32AM +0000, Nicholas A. Bellinger wrote:

<SNIP>

Changelog v2 -> v3:

  Unlock on error in tcm_vhost_drop_nexus() (DanC)
  Fix strlen() doesn't count the terminator (DanC)
  Call kfree() on an error path (DanC)
  Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
  Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
  Add option under drivers/staging/Kconfig, and move to drivers/vhost/tcm/
  as requested by MST (nab)

---
 drivers/staging/Kconfig       |    2 +
 drivers/vhost/Makefile        |    2 +
 drivers/vhost/tcm/Kconfig     |    6 +
 drivers/vhost/tcm/Makefile    |    1 +
 drivers/vhost/tcm/tcm_vhost.c | 1611 +++++++++++++++++++++++++++++++++++++++++
 drivers/vhost/tcm/tcm_vhost.h |   74 ++
 6 files changed, 1696 insertions(+), 0 deletions(-)
 create mode 100644 drivers/vhost/tcm/Kconfig
 create mode 100644 drivers/vhost/tcm/Makefile
 create mode 100644 drivers/vhost/tcm/tcm_vhost.c
 create mode 100644 drivers/vhost/tcm/tcm_vhost.h

Really sorry about making you run around like that, I did not mean moving all of tcm to a directory, just adding tcm/Kconfig or adding drivers/vhost/Kconfig.tcm because eventually it's easier to keep it all together in one place.

Er, apologies for the slight mis-understanding here.. Moving back now + fixing up the Kbuild bits.

I'm going offline in several hours and am on vacation for a week starting tomorrow. So to make 3.6, and if you intend to merge through my tree, the best bet is if you can send the final version real soon now.

-- MST
Biweekly upstream qemu-kvm test report (using autotest + manual) - Week 28
Folks,
Please find the result of upstream testing. This time we got a kernel panic error while compiling the mainline kernel (3.5-rc7). Hence we could verify only mainline qemu-kvm. We are analysing the failures and will raise the bugs with the appropriate community.

Host Kernel: 3.1.0-7.fc16.x86_64
KVM Version: 1.1.50 (qemu-kvm-devel)
Date: Thu Jul 19 17:51:29 2012
Stat: 59 tests executed - 40 passed, 19 failed
Number of Bugs raised: 2
  https://bugzilla.kernel.org/show_bug.cgi?id=44901
  https://github.com/autotest/autotest/issues/467

Tests Failed:
..................................................................
Test Name                                                             Result    Run time
..................................................................
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.ffsb      FAIL      29
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.disktest  FAIL      24
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.hackbench FAIL      22
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.cpu_hotplug FAIL    57
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.block_stream       FAIL      159
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.linux_s3           FAIL      303
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cpuflags.boot_guest.qemu_boot_cpu_model FAIL 2280
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cpuflags.boot_guest.qemu_boot_cpu_model_and_flags FAIL 2483
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cpuflags.stress_guest.qemu_test_boot_guest_and_try_flags_under_load FAIL 2859
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cpuflags.stress_guest.qemu_test_online_offline_guest_CPUs FAIL 2619
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cpuflags.stress_guest.qemu_test_migration_with_additional_flags FAIL 2665
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.blkio_throttle FAIL   2
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.blkio_throttle_multi FAIL 344
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.cpu_share   FAIL      1
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.cpuset_cpus FAIL      1
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.freezer     FAIL      2
-----------------------------------------------------------------

Tests Passed:
..................................................................
Test Name                                                             Result    Run time
..................................................................
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.dbench    PASS      131
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.ebizzy    PASS      22
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.stress    PASS      88
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.sleeptest PASS      55
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.autotest.iozone    PASS      540
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.jumbo              PASS      537
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.blkio_bandwidth PASS  28
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.cpuset_cpus_switching PASS 95
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.devices_access PASS   15
kvm.qed.virtio_blk.smp2.virtio_net.RHEL.6.2.x86_64.cgroup.memory_limit PASS
Re: [net-next RFC V5 5/5] virtio_net: support negotiating the number of queues through ctrl vq
On Thu, Jul 05, 2012 at 06:29:54PM +0800, Jason Wang wrote: This patch let the virtio_net driver can negotiate the number of queues it wishes to use through control virtqueue and export an ethtool interface to let use tweak it. As current multiqueue virtio-net implementation has optimizations on per-cpu virtuqueues, so only two modes were support: - single queue pair mode - multiple queue paris mode, the number of queues matches the number of vcpus The single queue mode were used by default currently due to regression of multiqueue mode in some test (especially in stream test). Since virtio core does not support paritially deleting virtqueues, so during mode switching the whole virtqueue were deleted and the driver would re-create the virtqueues it would used. btw. The queue number negotiating were defered to .ndo_open(), this is because only after feature negotitaion could we send the command to control virtqueue (as it may also use event index). Signed-off-by: Jason Wang jasow...@redhat.com --- drivers/net/virtio_net.c | 171 ++- include/linux/virtio_net.h |7 ++ 2 files changed, 142 insertions(+), 36 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 7410187..3339eeb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -88,6 +88,7 @@ struct receive_queue { struct virtnet_info { u16 num_queue_pairs;/* # of RX/TX vq pairs */ + u16 total_queue_pairs; struct send_queue *sq[MAX_QUEUES] cacheline_aligned_in_smp; struct receive_queue *rq[MAX_QUEUES] cacheline_aligned_in_smp; @@ -137,6 +138,8 @@ struct padded_vnet_hdr { char padding[6]; }; +static const struct ethtool_ops virtnet_ethtool_ops; + static inline int txq_get_qnum(struct virtnet_info *vi, struct virtqueue *vq) { int ret = virtqueue_get_queue_index(vq); @@ -802,22 +805,6 @@ static void virtnet_netpoll(struct net_device *dev) } #endif -static int virtnet_open(struct net_device *dev) -{ - struct virtnet_info *vi = netdev_priv(dev); - int i; - - for (i = 0; i 
vi-num_queue_pairs; i++) { - /* Make sure we have some buffers: if oom use wq. */ - if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) - queue_delayed_work(system_nrt_wq, -vi-rq[i]-refill, 0); - virtnet_napi_enable(vi-rq[i]); - } - - return 0; -} - /* * Send command via the control virtqueue and check status. Commands * supported by the hypervisor, as indicated by feature bits, should @@ -873,6 +860,43 @@ static void virtnet_ack_link_announce(struct virtnet_info *vi) rtnl_unlock(); } +static int virtnet_set_queues(struct virtnet_info *vi) +{ + struct scatterlist sg; + struct net_device *dev = vi-dev; + sg_init_one(sg, vi-num_queue_pairs, sizeof(vi-num_queue_pairs)); + + if (!vi-has_cvq) + return -EINVAL; + + if (!virtnet_send_command(vi, VIRTIO_NET_CTRL_MULTIQUEUE, + VIRTIO_NET_CTRL_MULTIQUEUE_QNUM, sg, 1, 0)){ + dev_warn(dev-dev, Fail to set the number of queue pairs to + %d\n, vi-num_queue_pairs); + return -EINVAL; + } + + return 0; +} + +static int virtnet_open(struct net_device *dev) +{ + struct virtnet_info *vi = netdev_priv(dev); + int i; + + for (i = 0; i vi-num_queue_pairs; i++) { + /* Make sure we have some buffers: if oom use wq. 
*/ + if (!try_fill_recv(vi-rq[i], GFP_KERNEL)) + queue_delayed_work(system_nrt_wq, +vi-rq[i]-refill, 0); + virtnet_napi_enable(vi-rq[i]); + } + + virtnet_set_queues(vi); + + return 0; +} + static int virtnet_close(struct net_device *dev) { struct virtnet_info *vi = netdev_priv(dev); @@ -1013,12 +1037,6 @@ static void virtnet_get_drvinfo(struct net_device *dev, } -static const struct ethtool_ops virtnet_ethtool_ops = { - .get_drvinfo = virtnet_get_drvinfo, - .get_link = ethtool_op_get_link, - .get_ringparam = virtnet_get_ringparam, -}; - #define MIN_MTU 68 #define MAX_MTU 65535 @@ -1235,7 +1253,7 @@ static int virtnet_find_vqs(struct virtnet_info *vi) err: if (ret names) - for (i = 0; i vi-num_queue_pairs * 2; i++) + for (i = 0; i total_vqs * 2; i++) kfree(names[i]); kfree(names); @@ -1373,7 +1391,6 @@ static int virtnet_probe(struct virtio_device *vdev) mutex_init(vi-config_lock); vi-config_enable = true; INIT_WORK(vi-config_work, virtnet_config_changed_work); - vi-num_queue_pairs = num_queue_pairs; /* If we can receive ANY GSO packets, we must allocate large ones. */
Re: [PATCH 2/9] KVM: x86: simplify read_emulated
On 07/20/2012 06:58 PM, Marcelo Tosatti wrote:
On Fri, Jul 20, 2012 at 10:17:36AM +0800, Xiao Guangrong wrote:
On 07/20/2012 07:58 AM, Marcelo Tosatti wrote:

[quoted read_emulated hunk and emulator_read_write_onepage splitting loop snipped - same as earlier in the thread]

What prevents a read_emulated(size > 8) call, with mc->pos == (mc->end - 8) now?

You mean the mmio region is partly cached? I think it can not happen. Now, we pass the whole size to emulator_read_write_onepage(); after it is finished, it saves the whole data into mc->data[], so the cache-read can always get the whole data from mc->data[].
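The splitting loop Xiao quotes caps each mmio fragment at 8 bytes while the cache in mc->data[] grows by the full request size, which is why the cache can never end up only partly filled. A hedged sketch of just the chunking arithmetic (simplified types, not the kernel code):

```c
#include <assert.h>

/* Simplified model of the emulator_read_write_onepage() loop: a request
 * of `bytes` becomes ceil(bytes/8) fragments of at most 8 bytes each.
 * The real code also records per-fragment gpa/data pointers. */
static int mmio_fragment_count(unsigned bytes)
{
    int n = 0;
    while (bytes) {
        unsigned now = bytes < 8u ? bytes : 8u;  /* min(bytes, 8U) */
        bytes -= now;
        n++;
    }
    return n;
}
```

So a 20-byte access produces fragments of 8, 8, and 4 bytes, but all 20 bytes land contiguously in the cache before any cached read is served.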
Re: [PATCH] KVM: PIC: call ack notifiers for irqs that are dropped from irr
On Fri, Jul 20, 2012 at 08:58:56AM -0300, Marcelo Tosatti wrote:
On Tue, Jul 17, 2012 at 02:59:11PM +0300, Gleb Natapov wrote:

[quoted i8259.c patch snipped - same hunk as earlier in the thread]

Can you modify kvm_pic_reset (currently unused BTW) to conform to 9ed049c3b6230b6898 ? It checks for APIC handling interrupts before acking.

Since it is not used we can make it do anything. But preferably in a separate patch, otherwise the bug fix will be obfuscated by the code move.

--
			Gleb.
Re: [PATCH 5/9] KVM: MMU: fast check write-protect for direct mmu
On 07/20/2012 07:09 PM, Marcelo Tosatti wrote:
On Fri, Jul 20, 2012 at 10:34:28AM +0800, Xiao Guangrong wrote:
On 07/20/2012 08:39 AM, Marcelo Tosatti wrote:
On Tue, Jul 17, 2012 at 09:53:29PM +0800, Xiao Guangrong wrote:

[earlier exchange snipped]

Sure. What i am asking is, what use case is the indirect_shadow_pages check optimizing? What scenario, what workload?

Sorry, Marcelo, i do not know why i completely misunderstood your mail. :(

I am not sure whether this is a bottleneck; i just got it from code review. I will measure it to see if we can get benefit from it. :p

See the "When to optimize" section of http://en.wikipedia.org/wiki/Program_optimization.

Can't remember why indirect_shadow_pages was introduced in kvm_mmu_pte_write.

Please refer to: https://lkml.org/lkml/2011/5/18/174
Re: [net-next RFC V5 4/5] virtio_net: multiqueue support
On Thu, Jul 05, 2012 at 06:29:53PM +0800, Jason Wang wrote: This patch converts virtio_net to a multi queue device. After negotiated VIRTIO_NET_F_MULTIQUEUE feature, the virtio device has many tx/rx queue pairs, and driver could read the number from config space. The driver expects the number of rx/tx queue paris is equal to the number of vcpus. To maximize the performance under this per-cpu rx/tx queue pairs, some optimization were introduced: - Txq selection is based on the processor id in order to avoid contending a lock whose owner may exits to host. - Since the txq/txq were per-cpu, affinity hint were set to the cpu that owns the queue pairs. Signed-off-by: Krishna Kumar krkum...@in.ibm.com Signed-off-by: Jason Wang jasow...@redhat.com Overall fine. I think it is best to smash the following patch into this one, so that default behavior does not jump to mq then back. some comments below: mostly nits, and a minor bug. If you are worried the patch is too big, it can be split differently - rework to use send_queue/receive_queue structures, no functional changes. - add multiqueue but this is not a must. --- drivers/net/virtio_net.c | 645 ++- include/linux/virtio_net.h |2 + 2 files changed, 452 insertions(+), 195 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 1db445b..7410187 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -26,6 +26,7 @@ #include linux/scatterlist.h #include linux/if_vlan.h #include linux/slab.h +#include linux/interrupt.h static int napi_weight = 128; module_param(napi_weight, int, 0444); @@ -41,6 +42,8 @@ module_param(gso, bool, 0444); #define VIRTNET_SEND_COMMAND_SG_MAX2 #define VIRTNET_DRIVER_VERSION 1.0.0 +#define MAX_QUEUES 256 + struct virtnet_stats { struct u64_stats_sync tx_syncp; struct u64_stats_sync rx_syncp; Would be a bit better not to have artificial limits like that. Maybe allocate arrays at probe time, then we can take whatever the device gives us? 
@@ -51,43 +54,69 @@ struct virtnet_stats { u64 rx_packets; }; -struct virtnet_info { - struct virtio_device *vdev; - struct virtqueue *rvq, *svq, *cvq; - struct net_device *dev; +/* Internal representation of a send virtqueue */ +struct send_queue { + /* Virtqueue associated with this send _queue */ + struct virtqueue *vq; + + /* TX: fragments + linear part + virtio header */ + struct scatterlist sg[MAX_SKB_FRAGS + 2]; +}; + +/* Internal representation of a receive virtqueue */ +struct receive_queue { + /* Virtqueue associated with this receive_queue */ + struct virtqueue *vq; + + /* Back pointer to the virtnet_info */ + struct virtnet_info *vi; + struct napi_struct napi; - unsigned int status; /* Number of input buffers, and max we've ever had. */ unsigned int num, max; + /* Work struct for refilling if we run low on memory. */ + struct delayed_work refill; + + /* Chain pages by the private ptr. */ + struct page *pages; + + /* RX: fragments + linear part + virtio header */ + struct scatterlist sg[MAX_SKB_FRAGS + 2]; +}; + +struct virtnet_info { + u16 num_queue_pairs;/* # of RX/TX vq pairs */ + + struct send_queue *sq[MAX_QUEUES] cacheline_aligned_in_smp; + struct receive_queue *rq[MAX_QUEUES] cacheline_aligned_in_smp; The assumption is a tx/rx pair is handled on the same cpu, yes? If yes maybe make it a single array to improve cache locality a bit? struct queue_pair { struct send_queue sq; struct receive_queue rq; }; + struct virtqueue *cvq; + + struct virtio_device *vdev; + struct net_device *dev; + unsigned int status; + /* I like... big packets and I cannot lie! */ bool big_packets; /* Host will merge rx buffers for big packets (shake it! shake it!) */ bool mergeable_rx_bufs; + /* Has control virtqueue */ + bool has_cvq; + won't checking (cvq != NULL) be enough? /* enable config space updates */ bool config_enable; /* Active statistics */ struct virtnet_stats __percpu *stats; - /* Work struct for refilling if we run low on memory. 
*/ - struct delayed_work refill; - /* Work struct for config space updates */ struct work_struct config_work; /* Lock for config space updates */ struct mutex config_lock; - - /* Chain pages by the private ptr. */ - struct page *pages; - - /* fragments + linear part + virtio header */ - struct scatterlist rx_sg[MAX_SKB_FRAGS + 2]; - struct scatterlist tx_sg[MAX_SKB_FRAGS + 2]; }; struct skb_vnet_hdr { @@ -108,6 +137,22 @@ struct padded_vnet_hdr { char padding[6]; };
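The commit message's "txq selection is based on the processor id" policy amounts to a cpu-to-queue mapping. A hedged sketch of that policy (the real driver reads smp_processor_id(); here the cpu is a parameter so the mapping is explicit, and this is not the driver's actual function):

```c
#include <assert.h>

/* Per-cpu txq selection as described in the commit message: with one
 * queue pair per vcpu the mapping is identity, so two cpus never
 * contend on the same tx queue's lock; a modulo keeps it safe if there
 * are more cpus than queue pairs. */
static unsigned txq_for_cpu(unsigned cpu, unsigned num_queue_pairs)
{
    return cpu % num_queue_pairs;
}
```

This is the design point being reviewed: avoiding lock contention with a lock owner that may have exited to the host, at the cost of assuming the rx/tx pair is processed on the cpu that owns it (hence the affinity hints).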
Re: Unexpected host I/O load
After some additional troubleshooting under the guidance of a friend, this appears to be a libvirt issue. I have opened the following bug for it: https://bugzilla.redhat.com/show_bug.cgi?id=841918. Thanks to any who spent time looking at this. Brian
Re: [Autotest] Biweekly upstream qemu-kvm test report (using autotest + manual) - Week 28
Hi,

It seems there is no virtio_console testing, which I have never seen pass. Also, why are some of the cases below manual, and why are they not in the default tests.cfg?

Lei

On Fri, Jul 20, 2012 at 8:20 PM, Prem Karat prem.ka...@linux.vnet.ibm.com wrote:

[full test report quoted in the reply snipped - identical to the report above]
Re: [PATCH v9 01/16] ARM: add mem_type prot_pte accessor
Am 03.07.2012 10:59, schrieb Christoffer Dall:
> From: Marc Zyngier marc.zyng...@arm.com
>
> The KVM hypervisor mmu code requires requires access to the

"requires requires access" -> "requires access"?

Andreas

> mem_type prot_pte field when setting up page tables pointing to a
> device. Unfortunately, the mem_type structure is opaque. Add an
> accessor (get_mem_type_prot_pte()) to retrieve the prot_pte value.
>
> Signed-off-by: Marc Zyngier marc.zyng...@arm.com
> Signed-off-by: Christoffer Dall c.d...@virtualopensystems.com

--
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [PATCH] KVM: MMU: Fix mmu_shrink() so that it can free mmu pages as intended
On Fri, Jul 20, 2012 at 10:04:34AM +0900, Takuya Yoshikawa wrote: On Wed, 18 Jul 2012 17:52:46 -0300 Marcelo Tosatti mtosa...@redhat.com wrote:

Can't understand, can you please expand more clearly?

I think mmu pages are not worth freeing under usual memory pressure, especially when we have EPT/NPT on. What's happening: shrink_slab() vainly calls mmu_shrink() with the default batch size 128, and mmu_shrink() takes a long time to zap mmu pages far fewer than the requested number, usually just freeing one. Sadly, KVM may recreate the page soon after that. Since we set the seeks 10 times greater than the default, total_scan is very small and shrink_slab() just wastes time freeing such a small amount of may-be-reallocated-soon memory: I want it to use the time for scanning other objects instead. Actually the total amount of memory used for mmu pages is not huge in the case of EPT/NPT on: maybe smaller than that of rmap?

rmap size is a function of mmu pages, so mmu_shrink indirectly releases rmap also.

So, it's clear that no one wants mmu pages to be freed as other objects. Sure, our seeks size prevents shrink_slab() from calling mmu_shrink() usually. But what if administrators want to drop clean caches on the host? Documentation/sysctl/vm.txt says: Writing to this will cause the kernel to drop clean caches, dentries and inodes from memory, causing that memory to become free. To free pagecache: echo 1 > /proc/sys/vm/drop_caches. To free dentries and inodes: echo 2 > /proc/sys/vm/drop_caches. To free pagecache, dentries and inodes: echo 3 > /proc/sys/vm/drop_caches. I don't want mmu pages to be freed in such cases.

drop_caches should be used on special occasions. I would not worry about it.

So, how about stopping reporting/returning the total number of used mmu pages to shrink_slab()? If we do so, it will think that there are not enough objects to get memory back from KVM.

No, it's important to be able to release memory quickly in low memory conditions.
I bet the reasoning behind the current seeks value (10*default) is close to arbitrary. mmu_shrink can be smarter, by freeing pages which are less likely to be used. IIRC Avi had some nice ideas for LRU-like schemes (search the archives). You can also consider the fact that freeing a higher level pagetable frees all of its children (that is quite dumb actually, sequential shrink passes should free only pages with no children).

In the case of shadow paging, guests can do bad things and allocate an enormous number of mmu pages, so we should report such exceeded numbers to shrink_slab() as freeable objects, not the total. A guest idle for 2 months should not have its mmu pages in memory.

|--- needed ---|--- freeable under memory pressure ---|

We may be able to use n_max_mmu_pages for this: the shrinker tries to free mmu pages unless the number reaches the goal.

Thanks,
Takuya
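Takuya's "needed vs. freeable" split above could be sketched as follows. This is only an illustration of the proposal, not kernel code; the function and parameter names are mine, and n_max_mmu_pages is the threshold he suggests, not an existing shrinker hook.

```c
#include <assert.h>

/* Sketch of the idea: report to shrink_slab() only the pages that
 * exceed the guest's working-set goal (n_max_mmu_pages), rather than
 * the total number of allocated mmu pages. Anything at or below the
 * goal is "needed" and never offered to the shrinker. */
static unsigned long mmu_freeable_count(unsigned long n_used_mmu_pages,
                                        unsigned long n_max_mmu_pages)
{
    if (n_used_mmu_pages <= n_max_mmu_pages)
        return 0;   /* nothing worth freeing under normal pressure */
    return n_used_mmu_pages - n_max_mmu_pages;
}
```

With this, a guest at or under its goal reports zero freeable objects, so shrink_slab() spends its scan budget on other caches instead.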
Re: [Autotest] Biweekly upstream qemu-kvm test report (using autotest + manual) - Week 28
* On 2012-07-20 22:52:21 +0800, lei yang (yanglei.f...@gmail.com) wrote:

> Hi, it seems there is no virtio_console testing, which I have never
> seen pass. And why are some of the below cases manual, and why are
> they not in the

Because we have no guest agent test case in autotest now, though I'm working on it. lol

> default tests.cfg?
>
> Lei

[quoted upstream test report snipped; see the full report earlier in this thread]
Re: [Autotest] Biweekly upstream qemu-kvm test report (using autotest + manual) - Week 28
On Fri, Jul 20, 2012 at 11:18 PM, Qingtang Zhou qz...@redhat.com wrote:

> Because we have no guest agent test case in autotest now, though I'm
> working on it. lol

The guest agent test case you mentioned, is it the manual cases in this thread? It also seems only some of the cases, not all, are selected for the biweekly upstream test. Is there a strategy for selecting which cases to run?

[quoted upstream test report snipped; see the full report earlier in this thread]
[PATCH v6 0/2] kvm: level irqfd and new eoifd
v6: So we're back to just the first two patches; unfortunately the diffstat got bigger though. The reason for that is that I discovered we don't do anything on release of an eoifd. We clean up if the kvm vm is released, but we're dealing with a constrained resource of irq source IDs, so I think it's best that we clean up to make sure those are returned. To do that we need nearly the same infrastructure as irqfd, we just only watch for POLLHUP. So while there's more code here, the structure and function names line up identically to irqfd.

The other big change here is that KVM_IRQFD returns a token when setting up a level irqfd. We use this token to associate the eoifd with the right source. This means we have to put the struct _source_ids on a list so we can find them. This removes the weird interaction we were headed towards, where the eoifd is associated with the eventfd of the irqfd. There's potentially more flexibility for the future here too, as we might come up with other interfaces that can return a source ID key. Perhaps some future KVM_IRQFD will allow specifying a key for re-attaching. Anyway, the sequence Michael pointed out, where an irqfd is de-assigned then re-assigned, now results in a new key instead of leaving the user wondering if it re-associates back to the eoifd.

Also added workqueue flushes on assign, since releasing either object now results in a lazy release via workqueue. This ensures we re-claim any source IDs we can.

Thanks,
Alex

---

Alex Williamson (2):
 kvm: KVM_EOIFD, an eventfd for EOIs
 kvm: Extend irqfd to support level interrupts

Documentation/virtual/kvm/api.txt | 32 ++-
arch/x86/kvm/x86.c | 3
include/linux/kvm.h | 18 +
include/linux/kvm_host.h | 17 +
virt/kvm/eventfd.c | 463 -
virt/kvm/kvm_main.c | 11 +
6 files changed, 536 insertions(+), 8 deletions(-)
[PATCH v6 1/2] kvm: Extend irqfd to support level interrupts
In order to inject a level interrupt from an external source using an irqfd, we need to allocate a new irq_source_id. This allows us to assert and (later) de-assert an interrupt line independently from users of KVM_IRQ_LINE and avoid lost interrupts.

We also add what may appear like a bit of excessive infrastructure around an object for storing this irq_source_id. However, notice that we only provide a way to assert the interrupt here. A follow-on interface will make use of the same irq_source_id to allow de-assert.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
Documentation/virtual/kvm/api.txt | 11 +++
arch/x86/kvm/x86.c | 1
include/linux/kvm.h | 3 +
include/linux/kvm_host.h | 4 +
virt/kvm/eventfd.c | 128 +++-
5 files changed, 139 insertions(+), 8 deletions(-)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index bf33aaa..3911e62 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1936,7 +1936,7 @@
 Capability: KVM_CAP_IRQFD
 Architectures: x86
 Type: vm ioctl
 Parameters: struct kvm_irqfd (in)
-Returns: 0 on success, -1 on error
+Returns: 0 (or >= 0) on success, -1 on error

 Allows setting an eventfd to directly trigger a guest interrupt.
 kvm_irqfd.fd specifies the file descriptor to use as the eventfd and
@@ -1946,6 +1946,15 @@
 the guest using the specified gsi pin. The irqfd is removed using
 the KVM_IRQFD_FLAG_DEASSIGN flag, specifying both kvm_irqfd.fd and
 kvm_irqfd.gsi.

+The KVM_IRQFD_FLAG_LEVEL flag indicates the gsi input is for a level
+triggered interrupt. In this case a new irqchip input is allocated
+which is logically OR'd with other inputs, allowing multiple sources
+to independently assert level interrupts. The KVM_IRQFD_FLAG_LEVEL
+is only necessary on setup, teardown is identical to that above. The
+return value when called with this flag is a key (>= 0) which may be
+used to associate this irqfd with other ioctls.
+KVM_IRQFD_FLAG_LEVEL support is indicated by KVM_CAP_IRQFD_LEVEL.
+
 4.76 KVM_PPC_ALLOCATE_HTAB

 Capability: KVM_CAP_PPC_ALLOC_HTAB

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 59b5950..9ded39d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2170,6 +2170,7 @@ int kvm_dev_ioctl_check_extension(long ext)
	case KVM_CAP_GET_TSC_KHZ:
	case KVM_CAP_PCI_2_3:
	case KVM_CAP_KVMCLOCK_CTRL:
+	case KVM_CAP_IRQFD_LEVEL:
		r = 1;
		break;
	case KVM_CAP_COALESCED_MMIO:

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index 2ce09aa..b2e6e4f 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -618,6 +618,7 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_PPC_GET_SMMU_INFO 78
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
+#define KVM_CAP_IRQFD_LEVEL 81

 #ifdef KVM_CAP_IRQ_ROUTING
@@ -683,6 +684,8 @@ struct kvm_xen_hvm_config {
 #endif

 #define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_IRQFD_LEVEL */
+#define KVM_IRQFD_FLAG_LEVEL (1 << 1)

 struct kvm_irqfd {
	__u32 fd;

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b70b48b..c73f071 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -285,6 +285,10 @@ struct kvm {
		struct list_head items;
	} irqfds;
	struct list_head ioeventfds;
+	struct {
+		struct mutex lock;
+		struct list_head items;
+	} irqsources;
 #endif
	struct kvm_vm_stat stat;
	struct kvm_arch arch;

diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index 7d7e2aa..878cb52 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -36,6 +36,66 @@
 #include "iodev.h"

 /*
+ * An irq_source_id can be created from KVM_IRQFD for level interrupt
+ * injections and shared with other interfaces for EOI or de-assert.
+ * Create an object with reference counting to make it easy to use.
+ */
+struct _irq_source {
+	int id; /* the IRQ source ID */
+	int gsi;
+	struct kvm *kvm;
+	struct list_head list;
+	struct kref kref;
+};
+
+static void _irq_source_release(struct kref *kref)
+{
+	struct _irq_source *source =
+		container_of(kref, struct _irq_source, kref);
+
+	/* This also de-asserts */
+	kvm_free_irq_source_id(source->kvm, source->id);
+	list_del(&source->list);
+	kfree(source);
+}
+
+static void _irq_source_put(struct _irq_source *source)
+{
+	if (source) {
+		mutex_lock(&source->kvm->irqsources.lock);
+		kref_put(&source->kref, _irq_source_release);
+		mutex_unlock(&source->kvm->irqsources.lock);
+	}
+}
+
+static struct _irq_source *_irq_source_alloc(struct kvm *kvm, int gsi)
+{
+	struct _irq_source *source;
+	int id;
+
+	source = kzalloc(sizeof(*source), GFP_KERNEL);
+	if
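For context, userspace would drive the new flag roughly as sketched below. The struct mirrors the kvm_irqfd layout from this patch (normally it comes from <linux/kvm.h>); the helper name make_level_irqfd is illustrative, and the actual ioctl call is shown only in a comment because it needs a live KVM VM file descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the proposed kvm_irqfd layout (see the kvm.h hunk
 * above): 3 x __u32 plus 20 pad bytes, 32 bytes total. */
struct kvm_irqfd {
    uint32_t fd;      /* eventfd to trigger the interrupt */
    uint32_t gsi;     /* irqchip pin */
    uint32_t flags;
    uint8_t  pad[20];
};

#define KVM_IRQFD_FLAG_DEASSIGN (1 << 0)
#define KVM_IRQFD_FLAG_LEVEL    (1 << 1)  /* proposed in this patch */

/* Fill in the argument for a level-triggered irqfd. The real call
 * would then be:
 *     int key = ioctl(vm_fd, KVM_IRQFD, &irqfd);
 * where a return value >= 0 is the source-ID key usable with
 * follow-on ioctls such as KVM_EOIFD. */
static struct kvm_irqfd make_level_irqfd(int efd, uint32_t gsi)
{
    struct kvm_irqfd irqfd = {
        .fd    = (uint32_t)efd,
        .gsi   = gsi,
        .flags = KVM_IRQFD_FLAG_LEVEL,
    };
    return irqfd;
}
```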
[PATCH v6 2/2] kvm: KVM_EOIFD, an eventfd for EOIs
This new ioctl enables an eventfd to be triggered when an EOI is written for a specified irqchip pin. The first user of this will be external device assignment through VFIO, using a level irqfd for asserting a PCI INTx interrupt and this interface for de-assert and notification once the interrupt is serviced. Here we make use of the reference counting of the _irq_source object, allowing us to share it with an irqfd and clean up regardless of the release order.

Signed-off-by: Alex Williamson alex.william...@redhat.com
---
Documentation/virtual/kvm/api.txt | 21 ++
arch/x86/kvm/x86.c | 2
include/linux/kvm.h | 15 ++
include/linux/kvm_host.h | 13 +
virt/kvm/eventfd.c | 335 +
virt/kvm/kvm_main.c | 11 +
6 files changed, 397 insertions(+)

diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
index 3911e62..8cd6b36 100644
--- a/Documentation/virtual/kvm/api.txt
+++ b/Documentation/virtual/kvm/api.txt
@@ -1989,6 +1989,27 @@
+4.77 KVM_EOIFD
+
+Capability: KVM_CAP_EOIFD
+Architectures: x86
+Type: vm ioctl
+Parameters: struct kvm_eoifd (in)
+Returns: 0 on success, < 0 on error
+
+KVM_EOIFD allows userspace to receive interrupt EOI notification
+through an eventfd. kvm_eoifd.fd specifies the eventfd used for
+notification. KVM_EOIFD_FLAG_DEASSIGN is used to de-assign an eoifd
+once assigned. KVM_EOIFD also requires additional bits set in
+kvm_eoifd.flags to bind to the proper interrupt line. The
+KVM_EOIFD_FLAG_LEVEL_IRQFD flag indicates that kvm_eoifd.key is provided
+and is a key from a level triggered interrupt (configured from
+KVM_IRQFD using KVM_IRQFD_FLAG_LEVEL). The EOI notification is bound
+to the same GSI and irqchip input as the irqfd. Both kvm_eoifd.key
+and KVM_EOIFD_FLAG_LEVEL_IRQFD must be specified on assignment and
+de-assignment of KVM_EOIFD.
+A level irqfd may only be bound to a single eoifd.
+KVM_CAP_EOIFD_LEVEL_IRQFD indicates support of
+KVM_EOIFD_FLAG_LEVEL_IRQFD.

 5. The kvm_run structure

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 9ded39d..8f3164e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2171,6 +2171,8 @@ int kvm_dev_ioctl_check_extension(long ext)
	case KVM_CAP_PCI_2_3:
	case KVM_CAP_KVMCLOCK_CTRL:
	case KVM_CAP_IRQFD_LEVEL:
+	case KVM_CAP_EOIFD:
+	case KVM_CAP_EOIFD_LEVEL_IRQFD:
		r = 1;
		break;
	case KVM_CAP_COALESCED_MMIO:

diff --git a/include/linux/kvm.h b/include/linux/kvm.h
index b2e6e4f..effb916 100644
--- a/include/linux/kvm.h
+++ b/include/linux/kvm.h
@@ -619,6 +619,8 @@ struct kvm_ppc_smmu_info {
 #define KVM_CAP_S390_COW 79
 #define KVM_CAP_PPC_ALLOC_HTAB 80
 #define KVM_CAP_IRQFD_LEVEL 81
+#define KVM_CAP_EOIFD 82
+#define KVM_CAP_EOIFD_LEVEL_IRQFD 83

 #ifdef KVM_CAP_IRQ_ROUTING
@@ -694,6 +696,17 @@ struct kvm_irqfd {
	__u8 pad[20];
 };

+#define KVM_EOIFD_FLAG_DEASSIGN (1 << 0)
+/* Available with KVM_CAP_EOIFD_LEVEL_IRQFD */
+#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)
+
+struct kvm_eoifd {
+	__u32 fd;
+	__u32 flags;
+	__u32 key;
+	__u8 pad[20];
+};
+
 struct kvm_clock_data {
	__u64 clock;
	__u32 flags;
@@ -834,6 +847,8 @@ struct kvm_s390_ucas_mapping {
 #define KVM_PPC_GET_SMMU_INFO _IOR(KVMIO, 0xa6, struct kvm_ppc_smmu_info)
 /* Available with KVM_CAP_PPC_ALLOC_HTAB */
 #define KVM_PPC_ALLOCATE_HTAB _IOWR(KVMIO, 0xa7, __u32)
+/* Available with KVM_CAP_EOIFD */
+#define KVM_EOIFD _IOW(KVMIO, 0xa8, struct kvm_eoifd)

 /*
  * ioctls for vcpu fds

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c73f071..01e72a6 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -289,6 +289,10 @@ struct kvm {
		struct mutex lock;
		struct list_head items;
	} irqsources;
+	struct {
+		spinlock_t lock;
+		struct list_head items;
+	} eoifds;
 #endif
	struct kvm_vm_stat stat;
	struct kvm_arch arch;
@@ -832,6 +836,8 @@ int kvm_irqfd(struct kvm *kvm, struct kvm_irqfd *args);
 void kvm_irqfd_release(struct kvm *kvm);
 void kvm_irq_routing_update(struct kvm *, struct kvm_irq_routing_table *);
 int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args);
+int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args);
+void kvm_eoifd_release(struct kvm *kvm);

 #else
@@ -857,6 +863,13 @@
 static inline int kvm_ioeventfd(struct kvm *kvm, struct kvm_ioeventfd *args)
 {
	return -ENOSYS;
 }

+static inline int kvm_eoifd(struct kvm *kvm, struct kvm_eoifd *args)
+{
+	return -ENOSYS;
+}
+
+static inline void
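Userspace binding of an eoifd to a level irqfd would look roughly like the sketch below. The struct mirrors the proposed kvm_eoifd layout; the helper names are illustrative, and the real calls (commented) need a live KVM VM file descriptor.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirror of the proposed kvm_eoifd layout (see the kvm.h hunk
 * above): 3 x __u32 plus 20 pad bytes, 32 bytes total. */
struct kvm_eoifd {
    uint32_t fd;      /* eventfd signalled on EOI */
    uint32_t flags;
    uint32_t key;     /* key returned by KVM_IRQFD with FLAG_LEVEL */
    uint8_t  pad[20];
};

#define KVM_EOIFD_FLAG_DEASSIGN    (1 << 0)
#define KVM_EOIFD_FLAG_LEVEL_IRQFD (1 << 1)

/* Bind an eventfd to EOIs for the level irqfd identified by `key`.
 * The real call would be: ioctl(vm_fd, KVM_EOIFD, &eoifd); */
static struct kvm_eoifd make_eoifd(int efd, uint32_t key)
{
    struct kvm_eoifd eoifd = {
        .fd    = (uint32_t)efd,
        .flags = KVM_EOIFD_FLAG_LEVEL_IRQFD,
        .key   = key,
    };
    return eoifd;
}

/* Per the documentation above, de-assignment must repeat both the
 * key and KVM_EOIFD_FLAG_LEVEL_IRQFD alongside the DEASSIGN flag. */
static struct kvm_eoifd make_eoifd_deassign(int efd, uint32_t key)
{
    struct kvm_eoifd eoifd = make_eoifd(efd, key);
    eoifd.flags |= KVM_EOIFD_FLAG_DEASSIGN;
    return eoifd;
}
```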
[PATCH 2/2] kvm: kvmclock: eliminate kvmclock offset when time page count goes to zero
When a guest is migrated, a time offset is generated in order to maintain the correct kvmclock based time for the guest. Detect when all kvmclock time pages are deleted so that the kvmclock offset can be safely reset to zero.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@gmail.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 5 -
2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..112415c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -524,6 +524,7 @@ struct kvm_arch {
	unsigned long irq_sources_bitmap;
	s64 kvmclock_offset;
+	unsigned int n_time_pages;
	raw_spinlock_t tsc_write_lock;
	u64 last_tsc_nsec;
	u64 last_tsc_write;

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14c290d..350c51b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1511,6 +1511,8 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
	if (vcpu->arch.time_page) {
		kvm_release_page_dirty(vcpu->arch.time_page);
		vcpu->arch.time_page = NULL;
+		if (--vcpu->kvm->arch.n_time_pages == 0)
+			vcpu->kvm->arch.kvmclock_offset = 0;
	}
 }
@@ -1624,7 +1626,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
		if (is_error_page(vcpu->arch.time_page)) {
			kvm_release_page_clean(vcpu->arch.time_page);
			vcpu->arch.time_page = NULL;
-		}
+		} else
+			vcpu->kvm->arch.n_time_pages++;
		break;
	}
	case MSR_KVM_ASYNC_PF_EN:
--
1.7.7
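The counting logic of this patch can be sketched in isolation as below. Names are illustrative (the real code lives in kvmclock_reset() and kvm_set_msr_common() as shown in the diff): the offset is cleared only when the last registered time page goes away.

```c
#include <assert.h>

/* Sketch of the patch's state: count active kvmclock time pages and
 * clear the migration-generated offset once the last one is gone. */
struct kvm_clock_state {
    unsigned int n_time_pages;
    long long    kvmclock_offset;   /* ns offset carried over migration */
};

/* Mirrors the "else vcpu->kvm->arch.n_time_pages++" path. */
static void time_page_added(struct kvm_clock_state *s)
{
    s->n_time_pages++;
}

/* Mirrors the kvmclock_reset() path: when no vcpu still has a time
 * page, the offset is stale migration state and can be zeroed. */
static void time_page_removed(struct kvm_clock_state *s)
{
    if (--s->n_time_pages == 0)
        s->kvmclock_offset = 0;
}
```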
[PATCH 0/2] kvm: kvmclock: fix kvmclock reboot after migrate issues
When a linux guest live migrates to a new host and subsequently reboots, the guest no longer has the correct time. This is due to a failure to apply the kvmclock offset to the wall clock time. The first patch addresses this failure directly, while the second patch detects when the offset is no longer needed, and zeroes the offset as a matter of cleaning up migration state which is no longer relevant. Both patches address the issue, but in different ways.

Bruce Rogers (2):
 kvm: kvmclock: apply kvmclock offset to guest wall clock time
 kvm: kvmclock: eliminate kvmclock offset when time page count goes to zero

arch/x86/include/asm/kvm_host.h | 1 +
arch/x86/kvm/x86.c | 9 -
2 files changed, 9 insertions(+), 1 deletions(-)
--
1.7.7
[PATCH 1/2] kvm: kvmclock: apply kvmclock offset to guest wall clock time
When a guest migrates to a new host, the system time difference from the previous host is used in the updates to the kvmclock system time visible to the guest, resulting in a continuation of correct kvmclock based guest timekeeping. The wall clock component of the kvmclock provided time is currently not updated with this same time offset. Since the Linux guest caches the wall clock based time, this discrepancy is not noticed until the guest is rebooted. After reboot the guest's time calculations are off. This patch adjusts the wall clock by the kvmclock_offset, resulting in correct guest time after a reboot.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@gmail.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
arch/x86/kvm/x86.c | 4
1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..14c290d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -907,6 +907,10 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
	 */
	getboottime(&boot);
+	if (kvm->arch.kvmclock_offset) {
+		struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
+		boot = timespec_sub(boot, ts);
+	}
	wc.sec = boot.tv_sec;
	wc.nsec = boot.tv_nsec;
	wc.version = version;
--
1.7.7
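The adjustment above is plain timespec arithmetic: boot time minus the kvmclock offset converted from nanoseconds. A standalone sketch (local analogues of the kernel's ns_to_timespec()/timespec_sub(); struct and function names are mine):

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000LL

struct ts { int64_t tv_sec; int64_t tv_nsec; };

/* Analogue of ns_to_timespec(): normalize so 0 <= tv_nsec < 1e9. */
static struct ts ns_to_ts(int64_t ns)
{
    struct ts t = { ns / NSEC_PER_SEC, ns % NSEC_PER_SEC };
    if (t.tv_nsec < 0) { t.tv_sec -= 1; t.tv_nsec += NSEC_PER_SEC; }
    return t;
}

/* Analogue of timespec_sub() with the same normalization. */
static struct ts ts_sub(struct ts a, struct ts b)
{
    struct ts r = { a.tv_sec - b.tv_sec, a.tv_nsec - b.tv_nsec };
    if (r.tv_nsec < 0) { r.tv_sec -= 1; r.tv_nsec += NSEC_PER_SEC; }
    return r;
}

/* The patch's adjustment: boot -= ns_to_timespec(kvmclock_offset),
 * applied only when an offset exists. */
static struct ts adjust_boot(struct ts boot, int64_t kvmclock_offset_ns)
{
    if (kvmclock_offset_ns)
        boot = ts_sub(boot, ns_to_ts(kvmclock_offset_ns));
    return boot;
}
```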
Re: [PATCH RFC V5 0/3] kvm: Improving directed yield in PLE handler
On Wed, Jul 18, 2012 at 07:07:17PM +0530, Raghavendra K T wrote:

Currently the Pause Loop Exit (PLE) handler does a directed yield to a random vcpu on PL-exit. We already have filtering while choosing the candidate to yield_to. This change adds more checks while choosing a candidate to yield_to. On large vcpu guests, there is a high probability of yielding to the same vcpu that had recently done a pause-loop exit. Such a yield can lead to the vcpu spinning again. The patchset keeps track of the pause loop exit and gives a chance to a vcpu which has: (a) not done a pause loop exit at all (it is probably a preempted lock-holder); (b) been skipped in the last iteration because it did a pause loop exit, and has probably become eligible now (the next eligible lock holder). This concept also helps in cpu relax interception cases which use the same handler.

Changes since V4:
- Naming change (Avi): struct ple == struct spin_loop; cpu_relax_intercepted == in_spin_loop; vcpu_check_and_update_eligible == vcpu_eligible_for_directed_yield
- Mark vcpu in spinloop as not eligible to avoid influence of the previous exit

Changes since V3:
- Arch specific fixes/changes (Christian)

Changes since v2:
- Move ple structure to common code (Avi)
- Rename pause_loop_exited to cpu_relax_intercepted (Avi)
- Add config HAVE_KVM_CPU_RELAX_INTERCEPT (Avi)
- Drop superfluous curly braces (Ingo)

Changes since v1:
- Add more documentation for structure and algorithm, and rename plo == ple (Rik)
- Change dy_eligible initial value to false (otherwise the very first directed yield will not be skipped) (Nikunj)
- Fix up signoff/from issue

Future enhancements:
(1) Currently we have a boolean to decide on the eligibility of a vcpu. It would be nice to get feedback on a guest (32 vcpu) on whether we can improve further with an integer counter (with counter = say f(log n)).
(2) We have not considered system load during iteration of vcpus. With that information we can limit the scan and also decide whether schedule() is better.
[I am able to use #kicked vcpus to decide on this, but maybe there are better ideas, like information from the global loadavg.]
(3) We can exploit this further with PV patches, since they also know about the next eligible lock-holder.

Summary: There is a very good improvement for kvm based guests on PLE machines. V5 has a huge improvement for kernbench.

kernbench (time in sec, lower is better):
      base_rik   stdev    patched   stdev    %improve
1x    49.2300    1.0171   22.6842   0.3073   117.0233 %
2x    91.9358    1.7768   53.9608   1.0154   70.37516 %

ebizzy (records/sec, higher is better):
      base_rik    stdev     patched     stdev      %improve
1x    1129.2500   28.6793   2125.6250   32.8239    88.23334 %
2x    1892.3750   75.1112   2377.1250   181.6822   25.61596 %

Note: The patches are tested on x86.

Links:
V4: https://lkml.org/lkml/2012/7/16/80
V3: https://lkml.org/lkml/2012/7/12/437
V2: https://lkml.org/lkml/2012/7/10/392
V1: https://lkml.org/lkml/2012/7/9/32

Raghavendra K T (3):
 config: Add config to support ple or cpu relax optimzation
 kvm : Note down when cpu relax intercepted or pause loop exited
 kvm : Choose a better candidate for directed yield
---
arch/s390/kvm/Kconfig | 1 +
arch/x86/kvm/Kconfig | 1 +
include/linux/kvm_host.h | 39 +++
virt/kvm/Kconfig | 3 +++
virt/kvm/kvm_main.c | 41 +
5 files changed, 85 insertions(+), 0 deletions(-)

Reviewed-by: Marcelo Tosatti mtosa...@redhat.com
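The eligibility rule described in the cover letter, using the field names from its changelog (in_spin_loop, dy_eligible), can be sketched as below. This is an illustration of the stated algorithm, not the patchset's exact code.

```c
#include <assert.h>
#include <stdbool.h>

/* Per-vcpu spin-loop state, per the cover letter's naming. */
struct vcpu_spin_loop {
    bool in_spin_loop;  /* cpu_relax intercepted / pause-loop exited */
    bool dy_eligible;   /* was skipped last round, give it a chance now */
};

/* A vcpu is a good directed-yield target if it (a) never pause-loop
 * exited (probably a preempted lock-holder), or (b) did, but was
 * already skipped once and may now be the next eligible lock holder.
 * A spinning vcpu's eligibility is toggled each round so it is not
 * skipped forever. */
static bool eligible_for_directed_yield(struct vcpu_spin_loop *v)
{
    bool eligible = !v->in_spin_loop ||
                    (v->in_spin_loop && v->dy_eligible);

    if (v->in_spin_loop)
        v->dy_eligible = !v->dy_eligible;

    return eligible;
}
```

A vcpu that just pause-loop exited is skipped on the first pass and becomes eligible on the next, matching cases (a) and (b) above.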
Re: [RFC-v3 4/4] tcm_vhost: Initial merge for vhost level target fabric driver
On Fri, 2012-07-20 at 15:03 +0300, Michael S. Tsirkin wrote:
> On Wed, Jul 18, 2012 at 02:20:58PM -0700, Nicholas A. Bellinger wrote:
> > On Wed, 2012-07-18 at 19:09 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Jul 18, 2012 at 12:59:32AM +, Nicholas A. Bellinger wrote:
> > > >
> > > > <SNIP>
> > > >
> > > > Changelog v2 -> v3:
> > > >   Unlock on error in tcm_vhost_drop_nexus() (DanC)
> > > >   Fix strlen() doesn't count the terminator (DanC)
> > > >   Call kfree() on an error path (DanC)
> > > >   Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
> > > >   Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
> > > >   Add option under drivers/staging/Kconfig, and move to
> > > >   drivers/vhost/tcm/ as requested by MST (nab)
> > > >
> > > >  drivers/staging/Kconfig       |    2 +
> > > >  drivers/vhost/Makefile        |    2 +
> > > >  drivers/vhost/tcm/Kconfig     |    6 +
> > > >  drivers/vhost/tcm/Makefile    |    1 +
> > > >  drivers/vhost/tcm/tcm_vhost.c | 1611 +
> > > >  drivers/vhost/tcm/tcm_vhost.h |   74 ++
> > > >  6 files changed, 1696 insertions(+), 0 deletions(-)
> > > >  create mode 100644 drivers/vhost/tcm/Kconfig
> > > >  create mode 100644 drivers/vhost/tcm/Makefile
> > > >  create mode 100644 drivers/vhost/tcm/tcm_vhost.c
> > > >  create mode 100644 drivers/vhost/tcm/tcm_vhost.h
> > >
> > > Really sorry about making you run around like that, I did not mean
> > > moving all of tcm to a directory, just adding tcm/Kconfig or adding
> > > drivers/vhost/Kconfig.tcm because eventually it's easier to keep it
> > > all together in one place.
> >
> > Er, apologies for the slight mis-understanding here..  Moving back now
> > + fixing up the Kbuild bits.
>
> I'm going offline in several hours and am on vacation for a week
> starting tomorrow. So to make 3.6, and if you intend to merge through
> my tree, the best bet is if you can send the final version real soon
> now.

Ok, thanks for the heads up here..

So aside from Greg-KH's feedback to avoid the drivers/staging/ Kconfig
include usage, and one more bugfix from DanC from this morning, those
are the only pending changes for RFC-v4.

If it's OK I'd prefer to take these via target-pending with the
necessary Acked-By's, especially if you'll be AFK next week..

Would you like to see a RFC-v4 with these changes included..?

Thank you,

--nab
Re: [RFC-v3 4/4] tcm_vhost: Initial merge for vhost level target fabric driver
On Fri, 2012-07-20 at 11:00 -0700, Nicholas A. Bellinger wrote:
> On Fri, 2012-07-20 at 15:03 +0300, Michael S. Tsirkin wrote:
> > On Wed, Jul 18, 2012 at 02:20:58PM -0700, Nicholas A. Bellinger wrote:
> > > On Wed, 2012-07-18 at 19:09 +0300, Michael S. Tsirkin wrote:
> > > > On Wed, Jul 18, 2012 at 12:59:32AM +, Nicholas A. Bellinger wrote:
> > > > >
> > > > > <SNIP>
> > > > >
> > > > > Changelog v2 -> v3:
> > > > >   Unlock on error in tcm_vhost_drop_nexus() (DanC)
> > > > >   Fix strlen() doesn't count the terminator (DanC)
> > > > >   Call kfree() on an error path (DanC)
> > > > >   Convert tcm_vhost_write_pending to use target_execute_cmd (hch + nab)
> > > > >   Fix another strlen() off by one in tcm_vhost_make_tport (DanC)
> > > > >   Add option under drivers/staging/Kconfig, and move to
> > > > >   drivers/vhost/tcm/ as requested by MST (nab)
> > > > >
> > > > >  drivers/staging/Kconfig       |    2 +
> > > > >  drivers/vhost/Makefile        |    2 +
> > > > >  drivers/vhost/tcm/Kconfig     |    6 +
> > > > >  drivers/vhost/tcm/Makefile    |    1 +
> > > > >  drivers/vhost/tcm/tcm_vhost.c | 1611 +
> > > > >  drivers/vhost/tcm/tcm_vhost.h |   74 ++
> > > > >  6 files changed, 1696 insertions(+), 0 deletions(-)
> > > > >  create mode 100644 drivers/vhost/tcm/Kconfig
> > > > >  create mode 100644 drivers/vhost/tcm/Makefile
> > > > >  create mode 100644 drivers/vhost/tcm/tcm_vhost.c
> > > > >  create mode 100644 drivers/vhost/tcm/tcm_vhost.h
> > > >
> > > > Really sorry about making you run around like that, I did not mean
> > > > moving all of tcm to a directory, just adding tcm/Kconfig or adding
> > > > drivers/vhost/Kconfig.tcm because eventually it's easier to keep it
> > > > all together in one place.
> > >
> > > Er, apologies for the slight mis-understanding here..  Moving back
> > > now + fixing up the Kbuild bits.
> >
> > I'm going offline in several hours and am on vacation for a week
> > starting tomorrow. So to make 3.6, and if you intend to merge through
> > my tree, the best bet is if you can send the final version real soon
> > now.
>
> Ok, thanks for the heads up here..
>
> So aside from Greg-KH's feedback to avoid the drivers/staging/ Kconfig
> include usage, and one more bugfix from DanC from this morning, those
> are the only pending changes for RFC-v4.
>
> If it's OK I'd prefer to take these via target-pending with the
> necessary Acked-By's, especially if you'll be AFK next week..
>
> Would you like to see a RFC-v4 with these changes included..?
>
> Thank you,

Actually sorry, the patch from DanC is for target core, and not a
tcm_vhost specific change..

So really the only thing left to resolve for an initial merge is
Greg-KH's comments wrt to drivers/staging Kconfig usage..

Are you OK with just adding CONFIG_STAGING following Greg-KH's
feedback..?
Re: [PATCHv2] kvm: fix race with level interrupts
On Thu, Jul 19, 2012 at 01:45:20PM +0300, Michael S. Tsirkin wrote:
> When more than 1 source id is in use for the same GSI, we have the
> following race related to handling of irq_states:
>
> CPU 0 clears bit 0. CPU 0 reads irq_state as 0.
> CPU 1 sets level to 1. CPU 1 calls kvm_ioapic_set_irq(1).
> CPU 0 calls kvm_ioapic_set_irq(0).
> Now ioapic thinks the level is 0 but irq_state is not 0.
>
> Fix by performing all irq_states bitmap handling under the pic/ioapic
> lock. This also removes the need for atomics with irq_states handling.
>
> Reported-by: Gleb Natapov g...@redhat.com
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---

Applied, thanks.

> Changes from v1:
> 	Address comments by Gleb and Alex:
> 	renamed some variables for clarity
> 	renamed kvm_irq_line_state -> __kvm_irq_line_state
>
> Any chance we can put this in 3.5? I know level IRQs are not widely
> used, which is likely why this went unnoticed for so long, but still ...

http://yarchive.net/comp/linux/merge_window.html

From: Linus Torvalds torva...@linux-foundation.org

The thing is, I don't take bug fixes late in the -rc just because they
are bug fixes. And I really shouldn't. If it's an old bug, and doesn't
cause an oops or a security issue, it had damn well better wait for the
next merge window.

There is absolutely _no_ reason to just blindly fix bugs at the end of
the rc stage, because quite frankly, the risks coming from fixing a bug
is often bigger than the advantage.
Re: [PATCH] kvm: drop parameter validation
On Thu, Jul 19, 2012 at 02:13:13PM +0300, Michael S. Tsirkin wrote:
> We validate irq pin number when routing is setup, so code handling
> illegal irq # in pic and ioapic on each injection is never called.
> Drop it.
>
> Signed-off-by: Michael S. Tsirkin m...@redhat.com
> ---
>
> Note: this is on top of [PATCHv2] kvm: fix race with level interrupts
> as these patches touch the same code.

"kvm: fix race with level interrupts" has been applied to next (the
branch which contains the next merge window group), which is in freeze
mode (only critical fixes are accepted). This patch does not fall into
that category, please resend once Linus pulls the dependency.
Re: [PATCH RESEND 0/5] Add vhost-blk support
On Fri, Jul 13, 2012 at 04:55:06PM +0800, Asias He wrote:
> Hi folks,
>
> [I am resending to fix the broken thread in the previous one.]
>
> This patchset adds vhost-blk support. vhost-blk is an in-kernel
> virtio-blk device accelerator. Compared to the userspace virtio-blk
> implementation, vhost-blk gives about 5% to 15% performance
> improvement.
>
> Asias He (5):
>   aio: Export symbols and struct kiocb_batch for in kernel aio usage
>   eventfd: Export symbol eventfd_file_create()
>   vhost: Make vhost a separate module
>   vhost-net: Use VHOST_NET_FEATURES for vhost-net
>   vhost-blk: Add vhost-blk support

OK so given the state it's in, and assuming you think it is helpful to
let it mature in tree and not out of tree, I think it's reasonable to
try to do it like tcm_vhost is going to do it:
- send me changes to vhost core ASAP (and keep it minimal, e.g. use
  your own header file to export to userspace)
- for other stuff - put in drivers/staging, and ask Greg to merge

-- 
MST
Re: [PATCH 2/9] KVM: x86: simplify read_emulated
On Fri, Jul 20, 2012 at 09:15:44PM +0800, Xiao Guangrong wrote:
> On 07/20/2012 06:58 PM, Marcelo Tosatti wrote:
> > On Fri, Jul 20, 2012 at 10:17:36AM +0800, Xiao Guangrong wrote:
> > > On 07/20/2012 07:58 AM, Marcelo Tosatti wrote:
> > > > > -}
> > > > > +	rc = ctxt->ops->read_emulated(ctxt, addr, mc->data + mc->end, size,
> > > > > +				      &ctxt->exception);
> > > > > +	if (rc != X86EMUL_CONTINUE)
> > > > > +		return rc;
> > > > > +
> > > > > +	mc->end += size;
> > > > > +
> > > > > +read_cached:
> > > > > +	memcpy(dest, mc->data + mc->pos, size);
> > > >
> > > > What prevents read_emulated(size > 8) call, with
> > > > mc->pos == (mc->end - 8) now?
> > >
> > > Marcelo,
> > >
> > > The splitting has been done in emulator_read_write_onepage:
> > >
> > > 	while (bytes) {
> > > 		unsigned now = min(bytes, 8U);
> > >
> > > 		frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++];
> > > 		frag->gpa = gpa;
> > > 		frag->data = val;
> > > 		frag->len = now;
> > > 		frag->write_readonly_mem = (ret == -EPERM);
> > >
> > > 		gpa += now;
> > > 		val += now;
> > > 		bytes -= now;
> > > 	}
> > >
> > > So i think it is safe to remove the splitting in read_emulated.
> >
> > Yes, it is fine to remove it. But splitting in emulate.c prevented
> > the case of _cache read_ with size > 8 beyond end of mc->data. Must
> > handle that case in read_emulated.
> >
> > What prevents read_emulated(size > 8) call, with
> > mc->pos == (mc->end - 8) now?
>
> You mean the mmio region is partly cached? I think it can not happen.
>
> Now, we pass the whole size to emulator_read_write_onepage(), after it
> is finished, it saves the whole data into mc->data[], so, the
> cache-read can always get the whole data from mc->data[].

I mean that nothing prevents a caller from reading beyond the end of
the mc->data array (but then again, this was the previous behavior).

ACK
[PATCH 1/2] kvm: kvmclock: apply kvmclock offset to guest wall clock time
When a guest migrates to a new host, the system time difference from the
previous host is used in the updates to the kvmclock system time visible
to the guest, resulting in a continuation of correct kvmclock based
guest timekeeping.

The wall clock component of the kvmclock provided time is currently not
updated with this same time offset. Since the Linux guest caches the
wall clock based time, this discrepancy is not noticed until the guest
is rebooted. After reboot the guest's time calculations are off.

This patch adjusts the wall clock by the kvmclock_offset, resulting in
correct guest time after a reboot.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@redhat.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/kvm/x86.c |    4 ++++
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index be6d549..14c290d 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -907,6 +907,10 @@ static void kvm_write_wall_clock(struct kvm *kvm, gpa_t wall_clock)
 	 */
 	getboottime(&boot);
 
+	if (kvm->arch.kvmclock_offset) {
+		struct timespec ts = ns_to_timespec(kvm->arch.kvmclock_offset);
+		boot = timespec_sub(boot, ts);
+	}
 	wc.sec = boot.tv_sec;
 	wc.nsec = boot.tv_nsec;
 	wc.version = version;
-- 
1.7.7
[PATCH 2/2] kvm: kvmclock: eliminate kvmclock offset when time page count goes to zero
When a guest is migrated, a time offset is generated in order to
maintain the correct kvmclock based time for the guest. Detect when all
kvmclock time pages are deleted so that the kvmclock offset can be
safely reset to zero.

Cc: Glauber Costa glom...@redhat.com
Cc: Zachary Amsden zams...@redhat.com
Signed-off-by: Bruce Rogers brog...@suse.com
---
 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/x86.c              |    5 ++++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index db7c1f2..112415c 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -524,6 +524,7 @@ struct kvm_arch {
 	unsigned long irq_sources_bitmap;
 	s64 kvmclock_offset;
+	unsigned int n_time_pages;
 	raw_spinlock_t tsc_write_lock;
 	u64 last_tsc_nsec;
 	u64 last_tsc_write;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 14c290d..350c51b 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1511,6 +1511,8 @@ static void kvmclock_reset(struct kvm_vcpu *vcpu)
 	if (vcpu->arch.time_page) {
 		kvm_release_page_dirty(vcpu->arch.time_page);
 		vcpu->arch.time_page = NULL;
+		if (--vcpu->kvm->arch.n_time_pages == 0)
+			vcpu->kvm->arch.kvmclock_offset = 0;
 	}
 }
 
@@ -1624,7 +1626,8 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 		if (is_error_page(vcpu->arch.time_page)) {
 			kvm_release_page_clean(vcpu->arch.time_page);
 			vcpu->arch.time_page = NULL;
-		}
+		} else
+			vcpu->kvm->arch.n_time_pages++;
 		break;
 	}
 	case MSR_KVM_ASYNC_PF_EN:
-- 
1.7.7
[PATCH 0/2] kvm: kvmclock: fix kvmclock reboot after migrate issues
When a linux guest live migrates to a new host and subsequently reboots,
the guest no longer has the correct time. This is due to a failure to
apply the kvmclock offset to the wall clock time.

The first patch addresses this failure directly, while the second patch
detects when the offset is no longer needed, and zeroes the offset as a
matter of cleaning up migration state which is no longer relevant. Both
patches address the issue, but in different ways.

Bruce Rogers (2):
  kvm: kvmclock: apply kvmclock offset to guest wall clock time
  kvm: kvmclock: eliminate kvmclock offset when time page count goes to
    zero

 arch/x86/include/asm/kvm_host.h |    1 +
 arch/x86/kvm/x86.c              |    9 ++++++++-
 2 files changed, 9 insertions(+), 1 deletions(-)

-- 
1.7.7
Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support
Michael S. Tsirkin m...@redhat.com writes:
> On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote:
> > Of course, the million dollar question is why would using AIO in the
> > kernel be faster than using AIO in userspace?
>
> Actually for me a more important question is how does it compare with
> virtio-blk dataplane?
>
> -- 
> MST

I'm not even asking for a benchmark comparison.

It's the same API being called from a kernel thread vs. a userspace
thread. Why would there be a 60% performance difference between the
two? That doesn't make any sense.

There's got to be a better justification for putting this in the kernel
than just that we can.

I completely understand why Christoph's suggestion of submitting BIOs
directly would be faster. There's no way to do that in userspace.

Regards,

Anthony Liguori
Re: [PATCH 2/2 v5] KVM: PPC: booke: Add watchdog emulation
On 07/20/2012 12:00 AM, Bharat Bhushan wrote:
> This patch adds the watchdog emulation in KVM. The watchdog emulation
> is enabled by the KVM_ENABLE_CAP(KVM_CAP_PPC_WDT) ioctl. A kernel
> timer is used for watchdog emulation and emulates the h/w watchdog
> state machine. On watchdog timer expiry, it exits to QEMU if TCR.WRC
> is non-zero. QEMU can reset/shutdown etc. depending upon how it is
> configured.
>
> Signed-off-by: Liu Yu yu@freescale.com
> Signed-off-by: Scott Wood scottw...@freescale.com
> Signed-off-by: Bharat Bhushan bharat.bhus...@freescale.com
> [bharat.bhus...@freescale.com: reworked patch]

Typically the [] note goes immediately before your signoff (but after
the others).

> +static void arm_next_watchdog(struct kvm_vcpu *vcpu)
> +{
> +	unsigned long nr_jiffies;
> +
> +	spin_lock(&vcpu->arch.wdt_lock);
> +	nr_jiffies = watchdog_next_timeout(vcpu);
> +	/*
> +	 * If the number of jiffies of watchdog timer >= NEXT_TIMER_MAX_DELTA
> +	 * then do not run the watchdog timer as this can break timer APIs.
> +	 */
> +	if (nr_jiffies < NEXT_TIMER_MAX_DELTA)
> +		mod_timer(&vcpu->arch.wdt_timer, jiffies + nr_jiffies);
> +	else
> +		del_timer(&vcpu->arch.wdt_timer);
> +	spin_unlock(&vcpu->arch.wdt_lock);
> +}

This needs to be an irqsave lock.

> @@ -386,13 +387,23 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
>  #ifdef CONFIG_KVM_EXIT_TIMING
>  	mutex_init(&vcpu->arch.exit_timing_lock);
>  #endif
> -
> +#ifdef CONFIG_BOOKE
> +	spin_lock_init(&vcpu->arch.wdt_lock);
> +	/* setup watchdog timer once */
> +	setup_timer(&vcpu->arch.wdt_timer, kvmppc_watchdog_func,
> +		    (unsigned long)vcpu);
> +#endif
>  	return 0;
>  }

Can you do this in kvmppc_booke_init()?

>  void kvm_arch_vcpu_uninit(struct kvm_vcpu *vcpu)
>  {
>  	kvmppc_mmu_destroy(vcpu);
> +#ifdef CONFIG_BOOKE
> +	spin_lock(&vcpu->arch.wdt_lock);
> +	del_timer(&vcpu->arch.wdt_timer);
> +	spin_unlock(&vcpu->arch.wdt_lock);
> +#endif
>  }

Don't acquire the lock here, but use del_timer_sync().

-Scott
Re: [PATCH RESEND 5/5] vhost-blk: Add vhost-blk support
On 07/21/2012 04:56 AM, Anthony Liguori wrote:
> Michael S. Tsirkin m...@redhat.com writes:
> > On Thu, Jul 19, 2012 at 08:05:42AM -0500, Anthony Liguori wrote:
> > > Of course, the million dollar question is why would using AIO in
> > > the kernel be faster than using AIO in userspace?
> >
> > Actually for me a more important question is how does it compare
> > with virtio-blk dataplane?
>
> I'm not even asking for a benchmark comparison.
>
> It's the same API being called from a kernel thread vs. a userspace
> thread. Why would there be a 60% performance difference between the
> two? That doesn't make any sense.

Please read the commit log again. I am not saying vhost-blk vs. the
userspace implementation gives 60% improvement. I am saying this
vhost-blk vs. the original vhost-blk gives 60% improvement:

   This patch is based on Liu Yuan's implementation with various
   improvements and bug fixes. Notably, this patch makes guest notify
   and host completion processing in parallel, which gives about 60%
   performance improvement compared to Liu Yuan's implementation.

> There's got to be a better justification for putting this in the
> kernel than just that we can.
>
> I completely understand why Christoph's suggestion of submitting BIOs
> directly would be faster. There's no way to do that in userspace.

Well. With Zach and Dave's new in-kernel aio API, the aio usage in
kernel is much simpler than in userspace. This is a potential reason
the in-kernel one is better than the userspace one. I am working on it
right now. And for block based images, as suggested by Christoph, we
can submit bios directly. This is another potential reason.

Why can't we just go further to see if we can improve the IO stack from
the guest kernel side all the way down to the host kernel side? We can
not do that if we stick to doing everything in userspace (qemu).

-- 
Asias