Re: [Question]About KVM network zero-copy feature!

2012-08-13 Thread Michael S. Tsirkin
Last patchset version was this:
[PATCH v16 00/17] Provide a zero-copy method on KVM virtio-net

IIRC the main issue was the need to integrate the patchset with
macvtap (as opposed to using a separate device).

On Mon, Aug 13, 2012 at 09:24:54AM +0800, Peter Huang(Peng) wrote:
> Hi, Michael
> 
> IIt will be usefull if we can implement rx zero-copy, could you give
>  me some help if you know some technical details or some
> references on the internet?
> 
> I am wondering may be I can take a deep look on it first then decide
> if I can take it over or not.
> 
> Thanks a lot.
> 
> On 2012/8/12 17:37, Michael S. Tsirkin wrote:
> > On Sat, Aug 11, 2012 at 08:42:44PM -0400, Robert Vineyard wrote:
> >> (adding Xin Xiaohui to the conversation for comment)
> >>
> >> According to the NetworkingTodo page on the KVM wiki, zero-copy RX
> >> for macvtap is in fact on the roadmap, assigned to Xin:
> >>
> >> http://www.linux-kvm.org/page/NetworkingTodo
> > AFAIK Xin left Intel and is not working on it.
> > Contributions are welcome.
> >
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Add initial virtio-scsi support

2012-08-13 Thread Pekka Enberg
On Thu, Aug 9, 2012 at 3:51 AM, Asias He  wrote:
> This patch brings virito-scsi support to kvm tool.
>
> With the introduce of tcm_vhost (vhost-scsi)
>
>tcm_vhost: Initial merge for vhost level target fabric driver
>
> we can implement virito-scsi by simply having vhost-scsi to handle the
> SCSI command.
>
> Howto use:
> 1) Setup the tcm_vhost target through /sys/kernel/config
>
>[Stefan Hajnoczi, Thanks for the script to setup tcm_vhost]
>
>** Setup wwpn and tpgt
>$ wwpn="naa.0"
>$ tpgt=/sys/kernel/config/target/vhost/$wwpn/tpgt_0
>$ nexus=$tpgt/nexus
>$ mkdir -p $tpgt
>$ echo -n $wwpn > $nexus
>
>** Setup lun using /dev/ram
>$ n=0
>$ lun=$tpgt/lun/lun_${n}
>$ data=/sys/kernel/config/target/core/iblock_0/data_${n}
>$ ram=/dev/ram${n}
>$ mkdir -p $lun
>$ mkdir -p $data
>$ echo -n udev_path=${ram} > $data/control
>$ echo -n 1 > $data/enable
>$ ln -s $data $lun
>
> 2) Run kvm tool with the new disk option '-d scsi:$wwpn:$tpgt', e.g
>$ lkvm run -k /boot/bzImage -d ~/img/sid.img -d scsi:naa.0:0
>
> Signed-off-by: Asias He 
> Cc: Nicholas A. Bellinger 
> Cc: Stefan Hajnoczi 
> Cc: Paolo Bonzini 

I've included some comments below but overall, looks good to me. Sasha?

> ---
> diff --git a/tools/kvm/include/kvm/disk-image.h 
> b/tools/kvm/include/kvm/disk-image.h
> index 7ae17f8..54e4047 100644
> --- a/tools/kvm/include/kvm/disk-image.h
> +++ b/tools/kvm/include/kvm/disk-image.h
> @@ -41,6 +41,8 @@ struct disk_image_operations {
>
>  struct disk_image_params {
> const char *filename;
> +   const char *wwpn;
> +   const char *tpgt;

Maybe it's just me but "wwpn" and "tpgt" really could use a comment
explaining what they are...

> bool readonly;
> bool direct;
>  };
> @@ -57,6 +59,8 @@ struct disk_image {
>  #ifdef CONFIG_HAS_AIO
> io_context_tctx;
>  #endif
> +   const char  *wwpn;
> +   const char  *tpgt;
>  };
>
>  struct disk_image *disk_image__open(const char *filename, bool readonly, 
> bool direct);
> diff --git a/tools/kvm/virtio/blk.c b/tools/kvm/virtio/blk.c
> index 1fb969f..740442a 100644
> --- a/tools/kvm/virtio/blk.c
> +++ b/tools/kvm/virtio/blk.c
> @@ -290,6 +290,8 @@ int virtio_blk__init(struct kvm *kvm)
> int i, r = 0;
>
> for (i = 0; i < kvm->nr_disks; i++) {
> +   if (kvm->disks[i]->wwpn)
> +   continue;
> r = virtio_blk__init_one(kvm, kvm->disks[i]);
> if (r < 0)
> goto cleanup;
> diff --git a/tools/kvm/virtio/scsi.c b/tools/kvm/virtio/scsi.c
> new file mode 100644
> index 000..5bcb00c
> --- /dev/null
> +++ b/tools/kvm/virtio/scsi.c
> @@ -0,0 +1,332 @@
> +#include "kvm/virtio-scsi.h"
> +#include "kvm/virtio-pci-dev.h"
> +#include "kvm/disk-image.h"
> +#include "kvm/kvm.h"
> +#include "kvm/pci.h"
> +#include "kvm/ioeventfd.h"
> +#include "kvm/guest_compat.h"
> +#include "kvm/virtio-pci.h"
> +#include "kvm/virtio.h"
> +
> +#include 
> +#include 
> +#include 
> +
> +/**/
> +/* TODO: Remove this when tcm_vhost goes upstream */
> +#define TRANSPORT_IQN_LEN  224
> +#define VHOST_SCSI_ABI_VERSION 0
> +struct vhost_scsi_target {
> +   int abi_version;
> +   unsigned char vhost_wwpn[TRANSPORT_IQN_LEN];
> +   unsigned short vhost_tpgt;
> +};
> +/* VHOST_SCSI specific defines */
> +#define VHOST_SCSI_SET_ENDPOINT _IOW(VHOST_VIRTIO, 0x40, struct 
> vhost_scsi_target)
> +#define VHOST_SCSI_CLEAR_ENDPOINT _IOW(VHOST_VIRTIO, 0x41, struct 
> vhost_scsi_target)
> +#define VHOST_SCSI_GET_ABI_VERSION _IOW(VHOST_VIRTIO, 0x42, struct 
> vhost_scsi_target)
> +/**/

You might as well move the above block into a small helper header file.

> +
> +
> +#define VIRTIO_SCSI_QUEUE_SIZE 128
> +#define NUM_VIRT_QUEUES3
> +
> +static LIST_HEAD(sdevs);
> +static int compat_id = -1;
> +
> +struct scsi_dev {
> +   struct virt_queue   vqs[NUM_VIRT_QUEUES];
> +   struct virtio_scsi_config   scsi_config;
> +   struct vhost_scsi_targettarget;
> +   u32 features;
> +   int vhost_fd;
> +   struct virtio_devicevdev;
> +   struct list_headlist;
> +   struct kvm  *kvm;
> +};
> +
> +static void set_config(struct kvm *kvm, void *dev, u8 data, u32 offset)
> +{
> +   struct scsi_dev *sdev = dev;
> +
> +   ((u8 *)(&sdev->scsi_config))[offset] = data;

Can you introduce a helper function for this, please?

> +}
> +
> +static u8 get_config(struct kvm *kvm, void *dev, u32 offset)
> +{
> +   struct scsi_dev *sdev = dev;
> +
> +   return ((u8 *)(&sdev->scsi_config))[offset];

Ditto.

> +}
> +
--
To unsubscribe from this list

Re: [PATCH 2/2] kvm tools: Check for unknown parameters in network option handling

2012-08-13 Thread Pekka Enberg
On Mon, Aug 13, 2012 at 6:31 AM, Michael Ellerman
 wrote:
>> > diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
>> > index a36bd00..9e5c1d4 100644
>> > --- a/tools/kvm/builtin-run.c
>> > +++ b/tools/kvm/builtin-run.c
>> > @@ -257,7 +257,8 @@ static int set_net_param(struct virtio_net_params *p, 
>> > const char *param,
>> > p->vhost = atoi(val);
>> > } else if (strcmp(param, "fd") == 0) {
>> > p->fd = atoi(val);
>> > -   }
>> > +   } else
>> > +   die("Unknown network parameter %s", param);
>>
>> we need braces here:
>
> We don't _need_ braces, but I assume you mean you'd prefer braces?

This is Linux coding style so we don't prefer them either.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] kvm tools: Fix crash when /etc/resolv.conf doesn't exist

2012-08-13 Thread Pekka Enberg
I've applied all Michal's patches. Thanks guys!
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] KVM: vmx: restore MSR_IA32_DEBUGCTLMSR after VMEXIT

2012-08-13 Thread Avi Kivity
On 08/12/2012 08:28 PM, Gleb Natapov wrote:
> On Sun, Aug 12, 2012 at 04:40:48PM +0300, Avi Kivity wrote:
>> On 08/12/2012 04:25 PM, Gleb Natapov wrote:
>> 
>> >> How expensive is this?  We may want a follow-on patch to cache it in a
>> >> per-cpu variable.
>> >> 
>> > I have patches ready. I couldn't measure any overhead of the
>> > rdmsr(MSR_IA32_DEBUGCTLMSR).
>> > 
>> 
>> Do you mean while running kvm?  How about just running it in a loop?
>> 
> No, I mean running it in a loop. Loop with or without
> rdmsr(MSR_IA32_DEBUGCTLMSR) takes the same time as measured by rdtsc.

Ok, good.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v2 0/6] vhost-scsi: Add support for host virtualized target

2012-08-13 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

Hi Paolo, Stefan, & QEMU folks,

The following is the second RFC series for vhost-scsi patches against mainline
QEMU v1.1.0.  The series is available from the following working branch:

  git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git vhost-scsi-merge

Apologies for the delayed follow-up on this series.  The changes detailed below
addresses Paolo's original comments on vhost-scsi code from the last weeks.

As of this evening the tcm_vhost driver has now been merged into the mainline
kernel for 3.6-rc2 here:

  tcm_vhost: Initial merge for vhost level target fabric driver
  
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057cbf49a1f08297877

Also, after following up on the qemu-kvm IRQ injection changes (from Jan) that
caused a performance regerssion with QEMU 1.1.0 code originally reported here:

  vhost-scsi port to v1.1.0 + MSI-X performance regression
  http://comments.gmane.org/gmane.linux.scsi.target.devel/2277

It turns out that setting explict virtio-queue IRQ affinity within guest appears
to bring small block random IOPs performance back up to the pre IRQ injection
conversion levels.  I'm not sure why this ended up making so much of a 
difference
post IRQ injection conversion, but setting virtio-queue affinity is now getting
us back to pre IQN injection conversion levels.

Changes from v1 -> v2:

 - Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as
   starting point for v3.6-rc code (Stefan + ALiguori + nab)
 - Fix upstream qemu conflict in hw/qdev-properties.c
 - Make GET_ABI_VERSION use int (nab + mst)
 - Drop unnecessary event-notifier changes (nab)
 - Fix vhost-scsi case lables in configure (reported by paolo)
 - Convert qdev_prop_vhost_scsi to use ->get() + ->set() following
   qdev_prop_netdev (reported by paolo)
 - Fix typo in qemu-options.hx definition of vhost-scsi (reported by paolo)
 - Squash virtio-scsi: use the vhost-scsi host device from stefan (nab)
 - Fix up virtio_scsi_properties[] conflict w/ upstream qemu (nab)
 - Drop usage of to_virtio_scsi() in virtio_scsi_set_status()
  (reported by paolo)
 - Use modern VirtIOSCSIConf define in virtio-scsi.h (reported by paolo)
 - Use s->conf->vhost_scsi instead of proxyconf->vhost_scsi in
   virtio_scsi_init() (reported by paolo)
 - Only register QEMU SCSI bus is vhost-scsi is not active (reported by paolo)
 - Fix incorrect VirtIOSCSI->cmd_vqs[0] definition (nab)

Please have another look, and let me know if anything else needs to be
addressed.

Thanks!

--nab

Nicholas Bellinger (3):
  msix: Work-around for vhost-scsi with KVM in-kernel MSI injection
  virtio-scsi: Set max_target=0 during vhost-scsi operation
  virtio-scsi: Fix incorrect VirtIOSCSI->cmd_vqs[0] definition

Stefan Hajnoczi (3):
  vhost: Pass device path to vhost_dev_init()
  vhost-scsi: add -vhost-scsi host device for use with tcm-vhost
  virtio-scsi: Add start/stop functionality for vhost-scsi

 configure|   10 +++
 hw/Makefile.objs |1 +
 hw/msix.c|3 +
 hw/qdev-properties.c |   40 
 hw/qdev.h|3 +
 hw/vhost-scsi.c  |  170 ++
 hw/vhost-scsi.h  |   50 +++
 hw/vhost.c   |5 +-
 hw/vhost.h   |3 +-
 hw/vhost_net.c   |2 +-
 hw/virtio-pci.c  |1 +
 hw/virtio-scsi.c |   56 -
 hw/virtio-scsi.h |1 +
 qemu-common.h|1 +
 qemu-config.c|   16 +
 qemu-options.hx  |4 +
 vl.c |   18 +
 17 files changed, 378 insertions(+), 6 deletions(-)
 create mode 100644 hw/vhost-scsi.c
 create mode 100644 hw/vhost-scsi.h

-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v2 2/6] vhost: Pass device path to vhost_dev_init()

2012-08-13 Thread Nicholas A. Bellinger
From: Stefan Hajnoczi 

The path to /dev/vhost-net is currently hardcoded in vhost_dev_init().
This needs to be changed so that /dev/vhost-scsi can be used.  Pass in
the device path instead of hardcoding it.

Signed-off-by: Stefan Hajnoczi 
Cc: Paolo Bonzini 
Cc: Michael S. Tsirkin 
Signed-off-by: Nicholas Bellinger 
---
 hw/vhost.c |5 +++--
 hw/vhost.h |3 ++-
 hw/vhost_net.c |2 +-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/hw/vhost.c b/hw/vhost.c
index 0fd8da8..d0ce5aa 100644
--- a/hw/vhost.c
+++ b/hw/vhost.c
@@ -747,14 +747,15 @@ static void vhost_eventfd_del(MemoryListener *listener,
 {
 }
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd, bool force)
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath,
+   bool force)
 {
 uint64_t features;
 int r;
 if (devfd >= 0) {
 hdev->control = devfd;
 } else {
-hdev->control = open("/dev/vhost-net", O_RDWR);
+hdev->control = open(devpath, O_RDWR);
 if (hdev->control < 0) {
 return -errno;
 }
diff --git a/hw/vhost.h b/hw/vhost.h
index 80e64df..0c47229 100644
--- a/hw/vhost.h
+++ b/hw/vhost.h
@@ -44,7 +44,8 @@ struct vhost_dev {
 bool force;
 };
 
-int vhost_dev_init(struct vhost_dev *hdev, int devfd, bool force);
+int vhost_dev_init(struct vhost_dev *hdev, int devfd, const char *devpath,
+   bool force);
 void vhost_dev_cleanup(struct vhost_dev *hdev);
 bool vhost_dev_query(struct vhost_dev *hdev, VirtIODevice *vdev);
 int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev);
diff --git a/hw/vhost_net.c b/hw/vhost_net.c
index ecaa22d..df2c4a3 100644
--- a/hw/vhost_net.c
+++ b/hw/vhost_net.c
@@ -109,7 +109,7 @@ struct vhost_net *vhost_net_init(NetClientState *backend, 
int devfd,
 (1 << VHOST_NET_F_VIRTIO_NET_HDR);
 net->backend = r;
 
-r = vhost_dev_init(&net->dev, devfd, force);
+r = vhost_dev_init(&net->dev, devfd, "/dev/vhost-net", force);
 if (r < 0) {
 goto fail;
 }
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This is required to get past the following assert with:

commit 1523ed9e1d46b0b54540049d491475ccac7e6421
Author: Jan Kiszka 
Date:   Thu May 17 10:32:39 2012 -0300

virtio/vhost: Add support for KVM in-kernel MSI injection

Cc: Stefan Hajnoczi 
Cc: Jan Kiszka 
Cc: Paolo Bonzini 
Cc: Anthony Liguori 
Signed-off-by: Nicholas Bellinger 
---
 hw/msix.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/msix.c b/hw/msix.c
index 800fc32..c1e6dc3 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
 {
 int vector;
 
+if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
+return;
+
 assert(dev->msix_vector_use_notifier &&
dev->msix_vector_release_notifier);
 
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v2 3/6] vhost-scsi: add -vhost-scsi host device for use with tcm-vhost

2012-08-13 Thread Nicholas A. Bellinger
From: Stefan Hajnoczi 

This patch adds a new type of host device that drives the vhost_scsi
device.  The syntax to add vhost-scsi is:

  qemu -vhost-scsi id=vhost-scsi0,wwpn=...,tpgt=123

The virtio-scsi emulated device will make use of vhost-scsi to process
virtio-scsi requests inside the kernel and hand them to the in-kernel
SCSI target stack using the tcm_vhost fabric driver.

The tcm_vhost driver was merged into the upstream linux kernel for 3.6-rc2,
and the commit can be found here:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057cbf49a1f08297

Changelog v1 -> v2:

- Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as
  starting point for v3.6-rc code (Stefan + ALiguori + nab)
- Fix upstream qemu conflict in hw/qdev-properties.c
- Make GET_ABI_VERSION use int (nab + mst)
- Fix vhost-scsi case lables in configure (reported by paolo)
- Convert qdev_prop_vhost_scsi to use ->get() + ->set() following
  qdev_prop_netdev (reported by paolo)
- Fix typo in qemu-options.hx definition of vhost-scsi (reported by paolo)

Changelog v0 -> v1:

- Add VHOST_SCSI_SET_ENDPOINT call (stefan)
- Enable vhost notifiers for multiple queues (Zhi)
- clear vhost-scsi endpoint on stopped (Zhi)
- Add CONFIG_VHOST_SCSI for QEMU build configure (nab)
- Rename vhost_vring_target -> vhost_scsi_target (mst + nab)
- Add support for VHOST_SCSI_GET_ABI_VERSION ioctl (aliguori + nab)

Cc: Stefan Hajnoczi 
Cc: Zhi Yong Wu 
Cc: Anthony Liguori 
Cc: Paolo Bonzini 
Cc: Michael S. Tsirkin 
Signed-off-by: Nicholas Bellinger 
---
 configure|   10 +++
 hw/Makefile.objs |1 +
 hw/qdev-properties.c |   40 
 hw/qdev.h|3 +
 hw/vhost-scsi.c  |  170 ++
 hw/vhost-scsi.h  |   50 +++
 qemu-common.h|1 +
 qemu-config.c|   16 +
 qemu-options.hx  |4 +
 vl.c |   18 +
 10 files changed, 313 insertions(+), 0 deletions(-)
 create mode 100644 hw/vhost-scsi.c
 create mode 100644 hw/vhost-scsi.h

diff --git a/configure b/configure
index f0dbc03..1f03202 100755
--- a/configure
+++ b/configure
@@ -168,6 +168,7 @@ libattr=""
 xfs=""
 
 vhost_net="no"
+vhost_scsi="no"
 kvm="no"
 gprof="no"
 debug_tcg="no"
@@ -513,6 +514,7 @@ Haiku)
   usb="linux"
   kvm="yes"
   vhost_net="yes"
+  vhost_scsi="yes"
   if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
 audio_possible_drivers="$audio_possible_drivers fmod"
   fi
@@ -818,6 +820,10 @@ for opt do
   ;;
   --enable-vhost-net) vhost_net="yes"
   ;;
+  --disable-vhost-scsi) vhost_scsi="no"
+  ;;
+  --enable-vhost-scsi) vhost_scsi="yes"
+  ;;
   --disable-opengl) opengl="no"
   ;;
   --enable-opengl) opengl="yes"
@@ -3116,6 +3122,7 @@ echo "posix_madvise $posix_madvise"
 echo "uuid support  $uuid"
 echo "libcap-ng support $cap_ng"
 echo "vhost-net support $vhost_net"
+echo "vhost-scsi support $vhost_scsi"
 echo "Trace backend $trace_backend"
 echo "Trace output file $trace_file-"
 echo "spice support $spice"
@@ -3828,6 +3835,9 @@ case "$target_arch2" in
   if test "$vhost_net" = "yes" ; then
 echo "CONFIG_VHOST_NET=y" >> $config_target_mak
   fi
+  if test "$vhost_scsi" = "yes" ; then
+echo "CONFIG_VHOST_SCSI=y" >> $config_target_mak
+  fi
 fi
 esac
 case "$target_arch2" in
diff --git a/hw/Makefile.objs b/hw/Makefile.objs
index 3ba5dd0..6ab75ec 100644
--- a/hw/Makefile.objs
+++ b/hw/Makefile.objs
@@ -169,6 +169,7 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o 
virtio-balloon.o virtio-net.o
 obj-$(CONFIG_VIRTIO) += virtio-serial-bus.o virtio-scsi.o
 obj-$(CONFIG_SOFTMMU) += vhost_net.o
 obj-$(CONFIG_VHOST_NET) += vhost.o
+obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o
 obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/
 obj-$(CONFIG_NO_PCI) += pci-stub.o
 obj-$(CONFIG_VGA) += vga.o
diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
index 8aca0d4..0266266 100644
--- a/hw/qdev-properties.c
+++ b/hw/qdev-properties.c
@@ -4,6 +4,7 @@
 #include "blockdev.h"
 #include "hw/block-common.h"
 #include "net/hub.h"
+#include "vhost-scsi.h"
 
 void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
 {
@@ -696,6 +697,45 @@ PropertyInfo qdev_prop_vlan = {
 .set   = set_vlan,
 };
 
+/* --- vhost-scsi --- */
+
+static int parse_vhost_scsi_dev(DeviceState *dev, const char *str, void **ptr)
+{
+   VHostSCSI *p;
+
+   p = find_vhost_scsi(str);
+   if (p == NULL)
+   return -ENOENT;
+
+   *ptr = p;
+   return 0;
+}
+
+static const char *print_vhost_scsi_dev(void *ptr)
+{
+VHostSCSI *p = ptr;
+
+return (p) ? vhost_scsi_get_id(p) : "";
+}
+
+static void get_vhost_scsi_dev(Object *obj, Visitor *v, void *opaque,
+   const char *name, Error **errp)
+{
+get_pointer(obj, v, opaque, print_vhost_scsi_dev, name, errp);
+}
+
+static void set_vhost_scsi_dev(Object *obj, Visitor *v, void *opaque,
+   const char *name, Error **errp)
+{
+s

[RFC-v2 6/6] virtio-scsi: Fix incorrect VirtIOSCSI->cmd_vqs[0] definition

2012-08-13 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This patch fixes bug in the definition of VirtIOSCSI->cmd_vqs[0],
where the return of virtio_add_queue() in virtio_scsi_init() ends up
overwriting past the end of ->cmd_vqs[0].

Since virtio_scsi currently assumes a single vqs for data, this patch
simply changes ->cmd_vqs[1] to handle the single VirtQueue.

Cc: Paolo Bonzini 
Cc: Stefan Hajnoczi 
Cc: Michael S. Tsirkin 
Signed-off-by: Nicholas Bellinger 
---
 hw/virtio-scsi.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 5e2ff6b..2c70f89 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -150,7 +150,7 @@ typedef struct {
 bool events_dropped;
 VirtQueue *ctrl_vq;
 VirtQueue *event_vq;
-VirtQueue *cmd_vqs[0];
+VirtQueue *cmd_vqs[1];
 
 bool vhost_started;
 VHostSCSI *vhost_scsi;
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v2 4/6] virtio-scsi: Add start/stop functionality for vhost-scsi

2012-08-13 Thread Nicholas A. Bellinger
From: Stefan Hajnoczi 

This patch starts and stops vhost as the virtio device transitions
through its status phases.  Vhost can only be started once the guest
reports its driver has successfully initialized, which means the
virtqueues have been set up by the guest.

v2: - Squash virtio-scsi: use the vhost-scsi host device from stefan (nab)
- Fix up virtio_scsi_properties[] conflict w/ upstream qemu (nab)
- Drop usage of to_virtio_scsi() in virtio_scsi_set_status()
  (reported by paolo)
- Use modern VirtIOSCSIConf define in virtio-scsi.h (reported by paolo)
- Use s->conf->vhost_scsi instead of proxyconf->vhost_scsi in
  virtio_scsi_init() (reported by paolo)
- Only register QEMU SCSI bus is vhost-scsi is not active (reported
  by paolo)

Cc: Stefan Hajnoczi 
Cc: Zhi Yong Wu 
Cc: Michael S. Tsirkin 
Cc: Paolo Bonzini 
Signed-off-by: Nicholas Bellinger 
---
 hw/virtio-pci.c  |1 +
 hw/virtio-scsi.c |   48 
 hw/virtio-scsi.h |1 +
 3 files changed, 50 insertions(+), 0 deletions(-)

diff --git a/hw/virtio-pci.c b/hw/virtio-pci.c
index 125eded..b29fc3b 100644
--- a/hw/virtio-pci.c
+++ b/hw/virtio-pci.c
@@ -1036,6 +1036,7 @@ static void virtio_scsi_exit_pci(PCIDevice *pci_dev)
 }
 
 static Property virtio_scsi_properties[] = {
+DEFINE_PROP_VHOST_SCSI("vhost-scsi", VirtIOPCIProxy, scsi.vhost_scsi),
 DEFINE_PROP_BIT("ioeventfd", VirtIOPCIProxy, flags, 
VIRTIO_PCI_FLAG_USE_IOEVENTFD_BIT, true),
 DEFINE_PROP_UINT32("vectors", VirtIOPCIProxy, nvectors, 
DEV_NVECTORS_UNSPECIFIED),
 DEFINE_VIRTIO_SCSI_PROPERTIES(VirtIOPCIProxy, host_features, scsi),
diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 5f737ac..8130956 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -13,9 +13,13 @@
  *
  */
 
+#include "qemu-common.h"
+#include "qemu-error.h"
+#include "vhost-scsi.h"
 #include "virtio-scsi.h"
 #include 
 #include 
+#include "vhost.h"
 
 #define VIRTIO_SCSI_VQ_SIZE 128
 #define VIRTIO_SCSI_CDB_SIZE32
@@ -147,6 +151,9 @@ typedef struct {
 VirtQueue *ctrl_vq;
 VirtQueue *event_vq;
 VirtQueue *cmd_vqs[0];
+
+bool vhost_started;
+VHostSCSI *vhost_scsi;
 } VirtIOSCSI;
 
 typedef struct VirtIOSCSIReq {
@@ -699,6 +706,38 @@ static struct SCSIBusInfo virtio_scsi_scsi_info = {
 .load_request = virtio_scsi_load_request,
 };
 
+static bool virtio_scsi_started(VirtIOSCSI *s, uint8_t val)
+{
+return (val & VIRTIO_CONFIG_S_DRIVER_OK) && s->vdev.vm_running;
+}
+
+static void virtio_scsi_set_status(VirtIODevice *vdev, uint8_t val)
+{
+VirtIOSCSI *s = (VirtIOSCSI *)vdev;
+bool start = virtio_scsi_started(s, val);
+
+if (s->vhost_started == start) {
+return;
+}
+
+if (start) {
+int ret;
+
+ret = vhost_scsi_start(s->vhost_scsi, vdev);
+if (ret < 0) {
+error_report("virtio-scsi: unable to start vhost: %s\n",
+ strerror(-ret));
+
+/* There is no userspace virtio-scsi fallback so exit */
+exit(1);
+}
+} else {
+vhost_scsi_stop(s->vhost_scsi, vdev);
+}
+
+s->vhost_started = start;
+}
+
 VirtIODevice *virtio_scsi_init(DeviceState *dev, VirtIOSCSIConf *proxyconf)
 {
 VirtIOSCSI *s;
@@ -712,12 +751,17 @@ VirtIODevice *virtio_scsi_init(DeviceState *dev, 
VirtIOSCSIConf *proxyconf)
 
 s->qdev = dev;
 s->conf = proxyconf;
+s->vhost_started = false;
+s->vhost_scsi = s->conf->vhost_scsi;
 
 /* TODO set up vdev function pointers */
 s->vdev.get_config = virtio_scsi_get_config;
 s->vdev.set_config = virtio_scsi_set_config;
 s->vdev.get_features = virtio_scsi_get_features;
 s->vdev.reset = virtio_scsi_reset;
+if (s->vhost_scsi) {
+s->vdev.set_status = virtio_scsi_set_status;
+}
 
 s->ctrl_vq = virtio_add_queue(&s->vdev, VIRTIO_SCSI_VQ_SIZE,
virtio_scsi_handle_ctrl);
@@ -743,5 +787,9 @@ void virtio_scsi_exit(VirtIODevice *vdev)
 {
 VirtIOSCSI *s = (VirtIOSCSI *)vdev;
 unregister_savevm(s->qdev, "virtio-scsi", s);
+
+/* This will stop vhost backend if appropriate. */
+virtio_scsi_set_status(vdev, 0);
+
 virtio_cleanup(vdev);
 }
diff --git a/hw/virtio-scsi.h b/hw/virtio-scsi.h
index 4bc889d..74e9422 100644
--- a/hw/virtio-scsi.h
+++ b/hw/virtio-scsi.h
@@ -22,6 +22,7 @@
 #define VIRTIO_ID_SCSI  8
 
 struct VirtIOSCSIConf {
+VHostSCSI *vhost_scsi;
 uint32_t num_queues;
 uint32_t max_sectors;
 uint32_t cmd_per_lun;
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC-v2 5/6] virtio-scsi: Set max_target=0 during vhost-scsi operation

2012-08-13 Thread Nicholas A. Bellinger
From: Nicholas Bellinger 

This QEMU patch sets VirtIOSCSIConfig->max_target=0 for vhost-scsi operation
to restrict virtio-scsi LLD guest scanning to max_id=0 (a single target ID
instance) when connected to individual tcm_vhost endpoints.

This ensures that virtio-scsi LLD only attempts to scan target IDs up to
VIRTIO_SCSI_MAX_TARGET when connected via virtio-scsi-raw.

Cc: Stefan Hajnoczi 
Cc: Zhi Yong Wu 
Cc: Paolo Bonzini 
Signed-off-by: Nicholas Bellinger 
---
 hw/virtio-scsi.c |6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
index 8130956..5e2ff6b 100644
--- a/hw/virtio-scsi.c
+++ b/hw/virtio-scsi.c
@@ -545,7 +545,11 @@ static void virtio_scsi_get_config(VirtIODevice *vdev,
 stl_raw(&scsiconf->sense_size, s->sense_size);
 stl_raw(&scsiconf->cdb_size, s->cdb_size);
 stl_raw(&scsiconf->max_channel, VIRTIO_SCSI_MAX_CHANNEL);
-stl_raw(&scsiconf->max_target, VIRTIO_SCSI_MAX_TARGET);
+if (s->vhost_scsi) {
+stl_raw(&scsiconf->max_target, 0);
+} else {
+stl_raw(&scsiconf->max_target, VIRTIO_SCSI_MAX_TARGET);
+}
 stl_raw(&scsiconf->max_lun, VIRTIO_SCSI_MAX_LUN);
 }
 
-- 
1.7.2.5

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/2] QEMU: extract file_load() function from rom_add_file() for reusing

2012-08-13 Thread Alexander Graf


On 13.08.2012, at 07:21, Olivia Yin  wrote:

> Sanity check in rom_add_file() could be reused by other image loaders.
> 
> Signed-off-by: Olivia Yin 

These are QEMU patches and don't have anything to do with KVM :).

Please send them again to qemu-devel.


Alex
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 08:35:12AM +, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger 
> 
> This is required to get past the following assert with:
> 
> commit 1523ed9e1d46b0b54540049d491475ccac7e6421
> Author: Jan Kiszka 
> Date:   Thu May 17 10:32:39 2012 -0300
> 
> virtio/vhost: Add support for KVM in-kernel MSI injection
> 
> Cc: Stefan Hajnoczi 
> Cc: Jan Kiszka 
> Cc: Paolo Bonzini 
> Cc: Anthony Liguori 
> Signed-off-by: Nicholas Bellinger 

Could you please add a bit more explanation why
this happens with virtio scsi and why this is valid?

> ---
>  hw/msix.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index 800fc32..c1e6dc3 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
>  {
>  int vector;
>  
> +if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
> +return;
> +
>  assert(dev->msix_vector_use_notifier &&
> dev->msix_vector_release_notifier);
>  
> -- 
> 1.7.2.5
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 3/6] vhost-scsi: add -vhost-scsi host device for use with tcm-vhost

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 08:35:14AM +, Nicholas A. Bellinger wrote:
> From: Stefan Hajnoczi 
> 
> This patch adds a new type of host device that drives the vhost_scsi
> device.  The syntax to add vhost-scsi is:
> 
>   qemu -vhost-scsi id=vhost-scsi0,wwpn=...,tpgt=123
> 
> The virtio-scsi emulated device will make use of vhost-scsi to process
> virtio-scsi requests inside the kernel and hand them to the in-kernel
> SCSI target stack using the tcm_vhost fabric driver.
> 
> The tcm_vhost driver was merged into the upstream linux kernel for 3.6-rc2,
> and the commit can be found here:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057cbf49a1f08297
> 
> Changelog v1 -> v2:
> 
> - Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as
>   starting point for v3.6-rc code (Stefan + ALiguori + nab)
> - Fix upstream qemu conflict in hw/qdev-properties.c
> - Make GET_ABI_VERSION use int (nab + mst)
> - Fix vhost-scsi case lables in configure (reported by paolo)
> - Convert qdev_prop_vhost_scsi to use ->get() + ->set() following
>   qdev_prop_netdev (reported by paolo)
> - Fix typo in qemu-options.hx definition of vhost-scsi (reported by paolo)
> 
> Changelog v0 -> v1:
> 
> - Add VHOST_SCSI_SET_ENDPOINT call (stefan)
> - Enable vhost notifiers for multiple queues (Zhi)
> - clear vhost-scsi endpoint on stopped (Zhi)
> - Add CONFIG_VHOST_SCSI for QEMU build configure (nab)
> - Rename vhost_vring_target -> vhost_scsi_target (mst + nab)
> - Add support for VHOST_SCSI_GET_ABI_VERSION ioctl (aliguori + nab)
> 
> Cc: Stefan Hajnoczi 
> Cc: Zhi Yong Wu 
> Cc: Anthony Liguori 
> Cc: Paolo Bonzini 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Nicholas Bellinger 
> ---
>  configure|   10 +++
>  hw/Makefile.objs |1 +
>  hw/qdev-properties.c |   40 
>  hw/qdev.h|3 +
>  hw/vhost-scsi.c  |  170 
> ++
>  hw/vhost-scsi.h  |   50 +++
>  qemu-common.h|1 +
>  qemu-config.c|   16 +
>  qemu-options.hx  |4 +
>  vl.c |   18 +
>  10 files changed, 313 insertions(+), 0 deletions(-)
>  create mode 100644 hw/vhost-scsi.c
>  create mode 100644 hw/vhost-scsi.h
> 
> diff --git a/configure b/configure
> index f0dbc03..1f03202 100755
> --- a/configure
> +++ b/configure
> @@ -168,6 +168,7 @@ libattr=""
>  xfs=""
>  
>  vhost_net="no"
> +vhost_scsi="no"
>  kvm="no"
>  gprof="no"
>  debug_tcg="no"
> @@ -513,6 +514,7 @@ Haiku)
>usb="linux"
>kvm="yes"
>vhost_net="yes"
> +  vhost_scsi="yes"
>if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
>  audio_possible_drivers="$audio_possible_drivers fmod"
>fi
> @@ -818,6 +820,10 @@ for opt do
>;;
>--enable-vhost-net) vhost_net="yes"
>;;
> +  --disable-vhost-scsi) vhost_scsi="no"
> +  ;;
> +  --enable-vhost-scsi) vhost_scsi="yes"
> +  ;;
>--disable-opengl) opengl="no"
>;;
>--enable-opengl) opengl="yes"
> @@ -3116,6 +3122,7 @@ echo "posix_madvise $posix_madvise"
>  echo "uuid support  $uuid"
>  echo "libcap-ng support $cap_ng"
>  echo "vhost-net support $vhost_net"
> +echo "vhost-scsi support $vhost_scsi"
>  echo "Trace backend $trace_backend"
>  echo "Trace output file $trace_file-"
>  echo "spice support $spice"
> @@ -3828,6 +3835,9 @@ case "$target_arch2" in
>if test "$vhost_net" = "yes" ; then
>  echo "CONFIG_VHOST_NET=y" >> $config_target_mak
>fi
> +  if test "$vhost_scsi" = "yes" ; then
> +echo "CONFIG_VHOST_SCSI=y" >> $config_target_mak
> +  fi
>  fi
>  esac
>  case "$target_arch2" in
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 3ba5dd0..6ab75ec 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -169,6 +169,7 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o 
> virtio-balloon.o virtio-net.o
>  obj-$(CONFIG_VIRTIO) += virtio-serial-bus.o virtio-scsi.o
>  obj-$(CONFIG_SOFTMMU) += vhost_net.o
>  obj-$(CONFIG_VHOST_NET) += vhost.o
> +obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o
>  obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/
>  obj-$(CONFIG_NO_PCI) += pci-stub.o
>  obj-$(CONFIG_VGA) += vga.o
> diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
> index 8aca0d4..0266266 100644
> --- a/hw/qdev-properties.c
> +++ b/hw/qdev-properties.c
> @@ -4,6 +4,7 @@
>  #include "blockdev.h"
>  #include "hw/block-common.h"
>  #include "net/hub.h"
> +#include "vhost-scsi.h"
>  
>  void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
>  {
> @@ -696,6 +697,45 @@ PropertyInfo qdev_prop_vlan = {
>  .set   = set_vlan,
>  };
>  
> +/* --- vhost-scsi --- */
> +
> +static int parse_vhost_scsi_dev(DeviceState *dev, const char *str, void 
> **ptr)
> +{
> +   VHostSCSI *p;
> +
> +   p = find_vhost_scsi(str);
> +   if (p == NULL)
> +   return -ENOENT;
> +
> +   *ptr = p;
> +   return 0;
> +}
> +
> +static const char *print_vhost_scsi_dev(void *ptr)
> +{
> +VHostSCSI *p = ptr;
> +
> +return 

Re: [PATCH 3/4] s390/kvm: Add a channel I/O based virtio transport driver.

2012-08-13 Thread Cornelia Huck
On Wed, 08 Aug 2012 13:52:57 +0930
Rusty Russell  wrote:

> On Tue,  7 Aug 2012 16:52:47 +0200, Cornelia Huck  
> wrote:
> > Add a driver for kvm guests that matches virtual ccw devices provided
> > by the host as virtio bridge devices.
> 
> Hi Cornelia,
> 
> OK, this is a good opportunity to fix some limitations, just as
> we did for virtio_mmio (drivers/virtio/virtio_mmio.c).
> 
> 1) Please don't limit yourself to 32 feature bits!  If you look at how
>virtio_mmio does it, they use a selector to index into a
>theoretically-infinite array of feature bits:
> 
>  * 0x010  R  HostFeatures Features supported by the host
>  * 0x014  W  HostFeaturesSel  Set of host features to access via HostFeatures
>  *
>  * 0x020  W  GuestFeaturesFeatures activated by the guest
>  * 0x024  W  GuestFeaturesSel Set of activated features to set via 
> GuestFeatures

It should be easy to extend the data processed by the feature ccws to a
feature/index combination. Would it be practical to limit the index to
an 8 bit value?

> 
> 2) Please also allow the guest to set the alignment for virtio ring
>layout (it controls the spacing between the rings), eg:
> 
>  * 0x03c  W  QueueAlign   Used Ring alignment for the current queue

I think the set_vq ccw could be extended with that info.

> 
> 3) Finally, make sure the guest can set the size of the queue, in case
>it can't allocate the size the host suggests, eg:
> 
>  * 0x034  R  QueueNumMax  Maximum size of the currently selected queue
>  * 0x038  W  QueueNum Queue size for the currently selected queue
> 
>This means the host can suggest huge queues, knowing the guest won't
>simply fail if it does so.

Makes sense, I didn't like just failing to allocate either. The actual
size could probably go into the set_vq ccw as well.

> 
> Note that we're also speculating a move to a new vring format, which
> will probably be little-endian.  But you probably want a completely new
> ccw code for that anyway.

Do you have a pointer to that discussion handy?

If the host may support different vring formats, I'll probably want to
add some kind of discovery mechanism for that as well (what discovery
mechanism depends on whether this would be per-device or per-machine).

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 3/6] vhost-scsi: add -vhost-scsi host device for use with tcm-vhost

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 08:35:14AM +, Nicholas A. Bellinger wrote:
> From: Stefan Hajnoczi 
> 
> This patch adds a new type of host device that drives the vhost_scsi
> device.  The syntax to add vhost-scsi is:
> 
>   qemu -vhost-scsi id=vhost-scsi0,wwpn=...,tpgt=123
> 
> The virtio-scsi emulated device will make use of vhost-scsi to process
> virtio-scsi requests inside the kernel and hand them to the in-kernel
> SCSI target stack using the tcm_vhost fabric driver.
> 
> The tcm_vhost driver was merged into the upstream linux kernel for 3.6-rc2,
> and the commit can be found here:
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057cbf49a1f08297
> 
> Changelog v1 -> v2:
> 
> - Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as
>   starting point for v3.6-rc code (Stefan + ALiguori + nab)
> - Fix upstream qemu conflict in hw/qdev-properties.c
> - Make GET_ABI_VERSION use int (nab + mst)
> - Fix vhost-scsi case lables in configure (reported by paolo)
> - Convert qdev_prop_vhost_scsi to use ->get() + ->set() following
>   qdev_prop_netdev (reported by paolo)
> - Fix typo in qemu-options.hx definition of vhost-scsi (reported by paolo)
> 
> Changelog v0 -> v1:
> 
> - Add VHOST_SCSI_SET_ENDPOINT call (stefan)
> - Enable vhost notifiers for multiple queues (Zhi)
> - clear vhost-scsi endpoint on stopped (Zhi)
> - Add CONFIG_VHOST_SCSI for QEMU build configure (nab)
> - Rename vhost_vring_target -> vhost_scsi_target (mst + nab)
> - Add support for VHOST_SCSI_GET_ABI_VERSION ioctl (aliguori + nab)
> 
> Cc: Stefan Hajnoczi 
> Cc: Zhi Yong Wu 
> Cc: Anthony Liguori 
> Cc: Paolo Bonzini 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Nicholas Bellinger 


Sent mail too fast, sorry. More comments below.

> ---
>  configure|   10 +++
>  hw/Makefile.objs |1 +
>  hw/qdev-properties.c |   40 
>  hw/qdev.h|3 +
>  hw/vhost-scsi.c  |  170 
> ++
>  hw/vhost-scsi.h  |   50 +++
>  qemu-common.h|1 +
>  qemu-config.c|   16 +
>  qemu-options.hx  |4 +
>  vl.c |   18 +
>  10 files changed, 313 insertions(+), 0 deletions(-)
>  create mode 100644 hw/vhost-scsi.c
>  create mode 100644 hw/vhost-scsi.h
> 
> diff --git a/configure b/configure
> index f0dbc03..1f03202 100755
> --- a/configure
> +++ b/configure
> @@ -168,6 +168,7 @@ libattr=""
>  xfs=""
>  
>  vhost_net="no"
> +vhost_scsi="no"
>  kvm="no"
>  gprof="no"
>  debug_tcg="no"
> @@ -513,6 +514,7 @@ Haiku)
>usb="linux"
>kvm="yes"
>vhost_net="yes"
> +  vhost_scsi="yes"
>if [ "$cpu" = "i386" -o "$cpu" = "x86_64" ] ; then
>  audio_possible_drivers="$audio_possible_drivers fmod"
>fi
> @@ -818,6 +820,10 @@ for opt do
>;;
>--enable-vhost-net) vhost_net="yes"
>;;
> +  --disable-vhost-scsi) vhost_scsi="no"
> +  ;;
> +  --enable-vhost-scsi) vhost_scsi="yes"
> +  ;;
>--disable-opengl) opengl="no"
>;;
>--enable-opengl) opengl="yes"
> @@ -3116,6 +3122,7 @@ echo "posix_madvise $posix_madvise"
>  echo "uuid support  $uuid"
>  echo "libcap-ng support $cap_ng"
>  echo "vhost-net support $vhost_net"
> +echo "vhost-scsi support $vhost_scsi"
>  echo "Trace backend $trace_backend"
>  echo "Trace output file $trace_file-"
>  echo "spice support $spice"
> @@ -3828,6 +3835,9 @@ case "$target_arch2" in
>if test "$vhost_net" = "yes" ; then
>  echo "CONFIG_VHOST_NET=y" >> $config_target_mak
>fi
> +  if test "$vhost_scsi" = "yes" ; then
> +echo "CONFIG_VHOST_SCSI=y" >> $config_target_mak
> +  fi
>  fi
>  esac
>  case "$target_arch2" in
> diff --git a/hw/Makefile.objs b/hw/Makefile.objs
> index 3ba5dd0..6ab75ec 100644
> --- a/hw/Makefile.objs
> +++ b/hw/Makefile.objs
> @@ -169,6 +169,7 @@ obj-$(CONFIG_VIRTIO) += virtio.o virtio-blk.o 
> virtio-balloon.o virtio-net.o
>  obj-$(CONFIG_VIRTIO) += virtio-serial-bus.o virtio-scsi.o
>  obj-$(CONFIG_SOFTMMU) += vhost_net.o
>  obj-$(CONFIG_VHOST_NET) += vhost.o
> +obj-$(CONFIG_VHOST_SCSI) += vhost-scsi.o
>  obj-$(CONFIG_REALLY_VIRTFS) += 9pfs/
>  obj-$(CONFIG_NO_PCI) += pci-stub.o
>  obj-$(CONFIG_VGA) += vga.o
> diff --git a/hw/qdev-properties.c b/hw/qdev-properties.c
> index 8aca0d4..0266266 100644
> --- a/hw/qdev-properties.c
> +++ b/hw/qdev-properties.c
> @@ -4,6 +4,7 @@
>  #include "blockdev.h"
>  #include "hw/block-common.h"
>  #include "net/hub.h"
> +#include "vhost-scsi.h"
>  
>  void *qdev_get_prop_ptr(DeviceState *dev, Property *prop)
>  {
> @@ -696,6 +697,45 @@ PropertyInfo qdev_prop_vlan = {
>  .set   = set_vlan,
>  };
>  
> +/* --- vhost-scsi --- */
> +
> +static int parse_vhost_scsi_dev(DeviceState *dev, const char *str, void 
> **ptr)
> +{
> +   VHostSCSI *p;
> +
> +   p = find_vhost_scsi(str);
> +   if (p == NULL)
> +   return -ENOENT;
> +
> +   *ptr = p;
> +   return 0;
> +}
> +
> +static const char *print_vhost_scsi_dev(void *ptr)

Re: [RFC-v2 6/6] virtio-scsi: Fix incorrect VirtIOSCSI->cmd_vqs[0] definition

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 08:35:17AM +, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger 
> 
> This patch fixes bug in the definition of VirtIOSCSI->cmd_vqs[0],
> where the return of virtio_add_queue() in virtio_scsi_init() ends up
> overwriting past the end of ->cmd_vqs[0].
> 
> Since virtio_scsi currently assumes a single vqs for data, this patch
> simply changes ->cmd_vqs[1] to handle the single VirtQueue.
> 
> Cc: Paolo Bonzini 
> Cc: Stefan Hajnoczi 
> Cc: Michael S. Tsirkin 
> Signed-off-by: Nicholas Bellinger 

This is a bugfix we need even without vhost, right?

> ---
>  hw/virtio-scsi.c |2 +-
>  1 files changed, 1 insertions(+), 1 deletions(-)
> 
> diff --git a/hw/virtio-scsi.c b/hw/virtio-scsi.c
> index 5e2ff6b..2c70f89 100644
> --- a/hw/virtio-scsi.c
> +++ b/hw/virtio-scsi.c
> @@ -150,7 +150,7 @@ typedef struct {
>  bool events_dropped;
>  VirtQueue *ctrl_vq;
>  VirtQueue *event_vq;
> -VirtQueue *cmd_vqs[0];
> +VirtQueue *cmd_vqs[1];
>  
>  bool vhost_started;
>  VHostSCSI *vhost_scsi;
> -- 
> 1.7.2.5
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 0/6] vhost-scsi: Add support for host virtualized target

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 08:35:11AM +, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger 
> 
> Hi Paolo, Stefan, & QEMU folks,
> 
> The following is the second RFC series for vhost-scsi patches against mainline
> QEMU v1.1.0.  The series is available from the following working branch:
> 
>   git://git.kernel.org/pub/scm/virt/kvm/nab/qemu-kvm.git vhost-scsi-merge
> 
> Apologies for the delayed follow-up on this series.  The changes detailed 
> below
> addresses Paolo's original comments on vhost-scsi code from the last weeks.

Looks good overall. I sent some comments on list.
Thanks!

> As of this evening the tcm_vhost driver has now been merged into the mainline
> kernel for 3.6-rc2 here:
> 
>   tcm_vhost: Initial merge for vhost level target fabric driver
>   
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux.git;a=commitdiff;h=057cbf49a1f08297877
> 
> Also, after following up on the qemu-kvm IRQ injection changes (from Jan) that
> caused a performance regerssion with QEMU 1.1.0 code originally reported here:
> 
>   vhost-scsi port to v1.1.0 + MSI-X performance regression
>   http://comments.gmane.org/gmane.linux.scsi.target.devel/2277
> 
> It turns out that setting explict virtio-queue IRQ affinity within guest 
> appears
> to bring small block random IOPs performance back up to the pre IRQ injection
> conversion levels.  I'm not sure why this ended up making so much of a 
> difference
> post IRQ injection conversion, but setting virtio-queue affinity is now 
> getting
> us back to pre IQN injection conversion levels.
> 
> Changes from v1 -> v2:
> 
>  - Expose ABI version via VHOST_SCSI_GET_ABI_VERSION + use Rev 0 as
>starting point for v3.6-rc code (Stefan + ALiguori + nab)
>  - Fix upstream qemu conflict in hw/qdev-properties.c
>  - Make GET_ABI_VERSION use int (nab + mst)
>  - Drop unnecessary event-notifier changes (nab)
>  - Fix vhost-scsi case lables in configure (reported by paolo)
>  - Convert qdev_prop_vhost_scsi to use ->get() + ->set() following
>qdev_prop_netdev (reported by paolo)
>  - Fix typo in qemu-options.hx definition of vhost-scsi (reported by paolo)
>  - Squash virtio-scsi: use the vhost-scsi host device from stefan (nab)
>  - Fix up virtio_scsi_properties[] conflict w/ upstream qemu (nab)
>  - Drop usage of to_virtio_scsi() in virtio_scsi_set_status()
>   (reported by paolo)
>  - Use modern VirtIOSCSIConf define in virtio-scsi.h (reported by paolo)
>  - Use s->conf->vhost_scsi instead of proxyconf->vhost_scsi in
>virtio_scsi_init() (reported by paolo)
>  - Only register QEMU SCSI bus is vhost-scsi is not active (reported by paolo)
>  - Fix incorrect VirtIOSCSI->cmd_vqs[0] definition (nab)
> 
> Please have another look, and let me know if anything else needs to be
> addressed.
> 
> Thanks!
> 
> --nab
> 
> Nicholas Bellinger (3):
>   msix: Work-around for vhost-scsi with KVM in-kernel MSI injection
>   virtio-scsi: Set max_target=0 during vhost-scsi operation
>   virtio-scsi: Fix incorrect VirtIOSCSI->cmd_vqs[0] definition
> 
> Stefan Hajnoczi (3):
>   vhost: Pass device path to vhost_dev_init()
>   vhost-scsi: add -vhost-scsi host device for use with tcm-vhost
>   virtio-scsi: Add start/stop functionality for vhost-scsi
> 
>  configure|   10 +++
>  hw/Makefile.objs |1 +
>  hw/msix.c|3 +
>  hw/qdev-properties.c |   40 
>  hw/qdev.h|3 +
>  hw/vhost-scsi.c  |  170 
> ++
>  hw/vhost-scsi.h  |   50 +++
>  hw/vhost.c   |5 +-
>  hw/vhost.h   |3 +-
>  hw/vhost_net.c   |2 +-
>  hw/virtio-pci.c  |1 +
>  hw/virtio-scsi.c |   56 -
>  hw/virtio-scsi.h |1 +
>  qemu-common.h|1 +
>  qemu-config.c|   16 +
>  qemu-options.hx  |4 +
>  vl.c |   18 +
>  17 files changed, 378 insertions(+), 6 deletions(-)
>  create mode 100644 hw/vhost-scsi.c
>  create mode 100644 hw/vhost-scsi.h
> 
> -- 
> 1.7.2.5
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
Here is a quick prototype of what we discussed yesterday. This one
caches only MSI interrupts for now. The obvious problem is that not
all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
routing table, so they cannot be cached.

Gleb Natapov (2):
  Call irq_rt callback under rcu_read_lock()
  Cache msi irq destination.

 arch/x86/kvm/lapic.c |2 +-
 include/linux/kvm_host.h |1 +
 virt/kvm/ioapic.c|2 +-
 virt/kvm/ioapic.h|3 ++-
 virt/kvm/irq_comm.c  |   37 -
 5 files changed, 25 insertions(+), 20 deletions(-)

-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 1/2] Call irq_rt callback under rcu_read_lock()

2012-08-13 Thread Gleb Natapov
Callbacks are no longer sleep.

Signed-off-by: Gleb Natapov 
---
 virt/kvm/irq_comm.c |   21 -
 1 file changed, 8 insertions(+), 13 deletions(-)

diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index 7118be0..aad58e7 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -143,8 +143,8 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi 
*msi)
  */
 int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 irq, int level)
 {
-   struct kvm_kernel_irq_routing_entry *e, irq_set[KVM_NR_IRQCHIPS];
-   int ret = -1, i = 0;
+   struct kvm_kernel_irq_routing_entry *e;
+   int ret = -1;
struct kvm_irq_routing_table *irq_rt;
struct hlist_node *n;
 
@@ -157,19 +157,14 @@ int kvm_set_irq(struct kvm *kvm, int irq_source_id, u32 
irq, int level)
rcu_read_lock();
irq_rt = rcu_dereference(kvm->irq_routing);
if (irq < irq_rt->nr_rt_entries)
-   hlist_for_each_entry(e, n, &irq_rt->map[irq], link)
-   irq_set[i++] = *e;
+   hlist_for_each_entry(e, n, &irq_rt->map[irq], link) {
+   int r = e->set(e, kvm, irq_source_id, level);
+   if (r < 0)
+   continue;
+   ret = r + ((ret < 0) ? 0 : ret);
+   }
rcu_read_unlock();
 
-   while(i--) {
-   int r;
-   r = irq_set[i].set(&irq_set[i], kvm, irq_source_id, level);
-   if (r < 0)
-   continue;
-
-   ret = r + ((ret < 0) ? 0 : ret);
-   }
-
return ret;
 }
 
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC PATCH 2/2] Cache msi irq destination.

2012-08-13 Thread Gleb Natapov

Signed-off-by: Gleb Natapov 
---
 arch/x86/kvm/lapic.c |2 +-
 include/linux/kvm_host.h |1 +
 virt/kvm/ioapic.c|2 +-
 virt/kvm/ioapic.h|3 ++-
 virt/kvm/irq_comm.c  |   16 
 5 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index 18d149d..367a514 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -629,7 +629,7 @@ static void apic_send_ipi(struct kvm_lapic *apic)
   irq.trig_mode, irq.level, irq.dest_mode, irq.delivery_mode,
   irq.vector);
 
-   kvm_irq_delivery_to_apic(apic->vcpu->kvm, apic, &irq);
+   kvm_irq_delivery_to_apic(NULL, apic->vcpu->kvm, apic, &irq);
 }
 
 static u32 apic_get_tmcct(struct kvm_lapic *apic)
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index d2b897e..bcd3dc7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -271,6 +271,7 @@ struct kvm_kernel_irq_routing_entry {
struct msi_msg msi;
};
struct hlist_node link;
+   struct kvm_vcpu *vcpu;
 };
 
 #ifdef __KVM_HAVE_IOAPIC
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ef61d52..e6c8717 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -188,7 +188,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int 
irq)
irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
}
 #endif
-   return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
+   return kvm_irq_delivery_to_apic(NULL, ioapic->kvm, NULL, &irqe);
 }
 
 int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, int irq_source_id,
diff --git a/virt/kvm/ioapic.h b/virt/kvm/ioapic.h
index a30abfe..2a715bd 100644
--- a/virt/kvm/ioapic.h
+++ b/virt/kvm/ioapic.h
@@ -78,7 +78,8 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int irq, 
int irq_source_id,
   int level);
 void kvm_ioapic_clear_all(struct kvm_ioapic *ioapic, int irq_source_id);
 void kvm_ioapic_reset(struct kvm_ioapic *ioapic);
-int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
+int kvm_irq_delivery_to_apic(struct kvm_kernel_irq_routing_entry *e,
+   struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq);
 int kvm_get_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
 int kvm_set_ioapic(struct kvm *kvm, struct kvm_ioapic_state *state);
diff --git a/virt/kvm/irq_comm.c b/virt/kvm/irq_comm.c
index aad58e7..b556c2c 100644
--- a/virt/kvm/irq_comm.c
+++ b/virt/kvm/irq_comm.c
@@ -61,11 +61,12 @@ inline static bool kvm_is_dm_lowest_prio(struct 
kvm_lapic_irq *irq)
 #endif
 }
 
-int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
+int kvm_irq_delivery_to_apic(struct kvm_kernel_irq_routing_entry *e,
+   struct kvm *kvm, struct kvm_lapic *src,
struct kvm_lapic_irq *irq)
 {
-   int i, r = -1;
-   struct kvm_vcpu *vcpu, *lowest = NULL;
+   int i, r = -1, c = 0;
+   struct kvm_vcpu *vcpu, *cache = NULL, *lowest = NULL;
 
if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
kvm_is_dm_lowest_prio(irq))
@@ -82,6 +83,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
if (!kvm_is_dm_lowest_prio(irq)) {
if (r < 0)
r = 0;
+   c++;
+   cache = vcpu;
r += kvm_apic_set_irq(vcpu, irq);
} else if (kvm_lapic_enabled(vcpu)) {
if (!lowest)
@@ -93,6 +96,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
kvm_lapic *src,
 
if (lowest)
r = kvm_apic_set_irq(lowest, irq);
+   else if (e && c == 1)
+   e->vcpu = cache; /* cache it */
 
return r;
 }
@@ -118,7 +123,9 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
irq.shorthand = 0;
 
/* TODO Deal with RH bit of MSI message address */
-   return kvm_irq_delivery_to_apic(kvm, NULL, &irq);
+   if (e->vcpu)
+   return kvm_apic_set_irq(e->vcpu, &irq);
+   return kvm_irq_delivery_to_apic(e, kvm, NULL, &irq);
 }
 
 int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
@@ -131,6 +138,7 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi 
*msi)
route.msi.address_lo = msi->address_lo;
route.msi.address_hi = msi->address_hi;
route.msi.data = msi->data;
+   route.vcpu = NULL;
 
return kvm_set_msi(&route, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
 }
-- 
1.7.10.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/2] Cache msi irq destination.

2012-08-13 Thread Avi Kivity
On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> Signed-off-by: Gleb Natapov 
>  
> -int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> +int kvm_irq_delivery_to_apic(struct kvm_kernel_irq_routing_entry *e,
> + struct kvm *kvm, struct kvm_lapic *src,
>   struct kvm_lapic_irq *irq)

Would be nicer to put e at the end, and explain that it is optional.

>  {
> - int i, r = -1;
> - struct kvm_vcpu *vcpu, *lowest = NULL;
> + int i, r = -1, c = 0;
> + struct kvm_vcpu *vcpu, *cache = NULL, *lowest = NULL;
>  
>   if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
>   kvm_is_dm_lowest_prio(irq))
> @@ -82,6 +83,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
> kvm_lapic *src,
>   if (!kvm_is_dm_lowest_prio(irq)) {
>   if (r < 0)
>   r = 0;
> + c++;
> + cache = vcpu;
>   r += kvm_apic_set_irq(vcpu, irq);
>   } else if (kvm_lapic_enabled(vcpu)) {
>   if (!lowest)
> @@ -93,6 +96,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
> kvm_lapic *src,
>  
>   if (lowest)
>   r = kvm_apic_set_irq(lowest, irq);
> + else if (e && c == 1)
> + e->vcpu = cache; /* cache it */
>  
>   return r;
>  }
> @@ -118,7 +123,9 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
>   irq.shorthand = 0;
>  
>   /* TODO Deal with RH bit of MSI message address */
> - return kvm_irq_delivery_to_apic(kvm, NULL, &irq);
> + if (e->vcpu)
> + return kvm_apic_set_irq(e->vcpu, &irq);
> + return kvm_irq_delivery_to_apic(e, kvm, NULL, &irq);
>  }
>  
>  int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
> @@ -131,6 +138,7 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct 
> kvm_msi *msi)
>   route.msi.address_lo = msi->address_lo;
>   route.msi.address_hi = msi->address_hi;
>   route.msi.data = msi->data;
> + route.vcpu = NULL;
>  
>   return kvm_set_msi(&route, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
>  }
> 

Missing cache invalidate on apicid write?

Otherwise nice and simple.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 12:16:46PM +0300, Gleb Natapov wrote:
> Here is a quick prototype of what we discussed yesterday. This one
> caches only MSI interrupts for now. The obvious problem is that not
> all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> routing table, so they cannot be cached.

I thought this means ID changes need to replace the routing table, no?

> 
> Gleb Natapov (2):
>   Call irq_rt callback under rcu_read_lock()
>   Cache msi irq destination.
> 
>  arch/x86/kvm/lapic.c |2 +-
>  include/linux/kvm_host.h |1 +
>  virt/kvm/ioapic.c|2 +-
>  virt/kvm/ioapic.h|3 ++-
>  virt/kvm/irq_comm.c  |   37 -
>  5 files changed, 25 insertions(+), 20 deletions(-)
> 
> -- 
> 1.7.10.4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 2/2] Cache msi irq destination.

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 12:32:44PM +0300, Avi Kivity wrote:
> On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> > Signed-off-by: Gleb Natapov 
> >  
> > -int kvm_irq_delivery_to_apic(struct kvm *kvm, struct kvm_lapic *src,
> > +int kvm_irq_delivery_to_apic(struct kvm_kernel_irq_routing_entry *e,
> > +   struct kvm *kvm, struct kvm_lapic *src,
> > struct kvm_lapic_irq *irq)
> 
> Would be nicer to put e at the end, and explain that it is optional.
> 
Just a prototype to see how it goes :)

> >  {
> > -   int i, r = -1;
> > -   struct kvm_vcpu *vcpu, *lowest = NULL;
> > +   int i, r = -1, c = 0;
> > +   struct kvm_vcpu *vcpu, *cache = NULL, *lowest = NULL;
> >  
> > if (irq->dest_mode == 0 && irq->dest_id == 0xff &&
> > kvm_is_dm_lowest_prio(irq))
> > @@ -82,6 +83,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
> > kvm_lapic *src,
> > if (!kvm_is_dm_lowest_prio(irq)) {
> > if (r < 0)
> > r = 0;
> > +   c++;
> > +   cache = vcpu;
> > r += kvm_apic_set_irq(vcpu, irq);
> > } else if (kvm_lapic_enabled(vcpu)) {
> > if (!lowest)
> > @@ -93,6 +96,8 @@ int kvm_irq_delivery_to_apic(struct kvm *kvm, struct 
> > kvm_lapic *src,
> >  
> > if (lowest)
> > r = kvm_apic_set_irq(lowest, irq);
> > +   else if (e && c == 1)
> > +   e->vcpu = cache; /* cache it */
> >  
> > return r;
> >  }
> > @@ -118,7 +123,9 @@ int kvm_set_msi(struct kvm_kernel_irq_routing_entry *e,
> > irq.shorthand = 0;
> >  
> > /* TODO Deal with RH bit of MSI message address */
> > -   return kvm_irq_delivery_to_apic(kvm, NULL, &irq);
> > +   if (e->vcpu)
> > +   return kvm_apic_set_irq(e->vcpu, &irq);
> > +   return kvm_irq_delivery_to_apic(e, kvm, NULL, &irq);
> >  }
> >  
> >  int kvm_send_userspace_msi(struct kvm *kvm, struct kvm_msi *msi)
> > @@ -131,6 +138,7 @@ int kvm_send_userspace_msi(struct kvm *kvm, struct 
> > kvm_msi *msi)
> > route.msi.address_lo = msi->address_lo;
> > route.msi.address_hi = msi->address_hi;
> > route.msi.data = msi->data;
> > +   route.vcpu = NULL;
> >  
> > return kvm_set_msi(&route, kvm, KVM_USERSPACE_IRQ_SOURCE_ID, 1);
> >  }
> > 
> 
> Missing cache invalidate on apicid write?
> 
Yes. Need to call to kvm_set_irq_routing() in strategic places. Same for
ioapic.

> Otherwise nice and simple.
> 
> 
> -- 
> error compiling committee.c: too many arguments to function

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 12:34:32PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 12:16:46PM +0300, Gleb Natapov wrote:
> > Here is a quick prototype of what we discussed yesterday. This one
> > caches only MSI interrupts for now. The obvious problem is that not
> > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > routing table, so they cannot be cached.
> 
> I thought this means ID changes need to replace the routing table, no?
> 
Correct. This is missing from the patches, but as I said this is just
prototype to see if it is feasible. What this prototype shows is that we
have problems with IPIs and MSIs from userspace.

> > 
> > Gleb Natapov (2):
> >   Call irq_rt callback under rcu_read_lock()
> >   Cache msi irq destination.
> > 
> >  arch/x86/kvm/lapic.c |2 +-
> >  include/linux/kvm_host.h |1 +
> >  virt/kvm/ioapic.c|2 +-
> >  virt/kvm/ioapic.h|3 ++-
> >  virt/kvm/irq_comm.c  |   37 -
> >  5 files changed, 25 insertions(+), 20 deletions(-)
> > 
> > -- 
> > 1.7.10.4

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> Here is a quick prototype of what we discussed yesterday. This one
> caches only MSI interrupts for now. The obvious problem is that not
> all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> routing table, so they cannot be cached.

We can have a small rcu-managed hash table to look those up.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> Here is a quick prototype of what we discussed yesterday. This one
> caches only MSI interrupts for now. The obvious problem is that not
> all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> routing table, so they cannot be cached.

Missing: switch the uncached path to a work queue, so we don't have to
iterate over all vcpus in interrupt context.

That isn't trivial; for edge-triggered interrupts we need to ignore
zeros (if polarity=0) but for level-triggered interrupts we need them to
override the previous setting.  But we don't know the trigger mode and
polarity at this point.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 12:36:27PM +0300, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 12:34:32PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Aug 13, 2012 at 12:16:46PM +0300, Gleb Natapov wrote:
> > > Here is a quick prototype of what we discussed yesterday. This one
> > > caches only MSI interrupts for now. The obvious problem is that not
> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > > routing table, so they cannot be cached.
> > 
> > I thought this means ID changes need to replace the routing table, no?
> > 
> Correct. This is missing from the patches, but as I said this is just
> prototype to see if it is feasible.
> What this prototype shows is that we
> have problems with IPIs and MSIs from userspace.


I think it's a worthwhile optimization all the same.  When you feel it's
ready, I'm willing to test to see if it helps vhost.


> > > 
> > > Gleb Natapov (2):
> > >   Call irq_rt callback under rcu_read_lock()
> > >   Cache msi irq destination.
> > > 
> > >  arch/x86/kvm/lapic.c |2 +-
> > >  include/linux/kvm_host.h |1 +
> > >  virt/kvm/ioapic.c|2 +-
> > >  virt/kvm/ioapic.h|3 ++-
> > >  virt/kvm/irq_comm.c  |   37 -
> > >  5 files changed, 25 insertions(+), 20 deletions(-)
> > > 
> > > -- 
> > > 1.7.10.4
> 
> --
>   Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 12:46:09PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 12:36:27PM +0300, Gleb Natapov wrote:
> > On Mon, Aug 13, 2012 at 12:34:32PM +0300, Michael S. Tsirkin wrote:
> > > On Mon, Aug 13, 2012 at 12:16:46PM +0300, Gleb Natapov wrote:
> > > > Here is a quick prototype of what we discussed yesterday. This one
> > > > caches only MSI interrupts for now. The obvious problem is that not
> > > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > > > routing table, so they cannot be cached.
> > > 
> > > I thought this means ID changes need to replace the routing table, no?
> > > 
> > Correct. This is missing from the patches, but as I said this is just
> > prototype to see if it is feasible.
> > What this prototype shows is that we
> > have problems with IPIs and MSIs from userspace.
> 
> 
> I think it's a worthwhile optimization all the same.  When you feel it's
> ready, I'm willing to test to see if it helps vhost.
> 
>
You can test it now. It passes my very simple testing.  Guests usually do
not change apic ids after HW initialization. Maximum it will fail and
you'll tell me how bad it failed :)

> > > > 
> > > > Gleb Natapov (2):
> > > >   Call irq_rt callback under rcu_read_lock()
> > > >   Cache msi irq destination.
> > > > 
> > > >  arch/x86/kvm/lapic.c |2 +-
> > > >  include/linux/kvm_host.h |1 +
> > > >  virt/kvm/ioapic.c|2 +-
> > > >  virt/kvm/ioapic.h|3 ++-
> > > >  virt/kvm/irq_comm.c  |   37 -
> > > >  5 files changed, 25 insertions(+), 20 deletions(-)
> > > > 
> > > > -- 
> > > > 1.7.10.4
> > 
> > --
> > Gleb.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


KVM call agenda for Tuesday, August 14

2012-08-13 Thread Juan Quintela

Hi

Please send in any agenda items you are interested in covering.

Thanks, Juan.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 12:43:50PM +0300, Avi Kivity wrote:
> On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> > Here is a quick prototype of what we discussed yesterday. This one
> > caches only MSI interrupts for now. The obvious problem is that not
> > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > routing table, so they cannot be cached.
> 
> Missing: switch the uncached path to a work queue, so we don't have to
> iterate over all vcpus in interrupt context.
>
> That isn't trivial; for edge-triggered interrupts we need to ignore
> zeros (if polarity=0) but for level-triggered interrupts we need them to
> override the previous setting.  But we don't know the trigger mode and
> polarity at this point.

Instead of doing it like this, can we simply require
callers to use a workqueue?
Add kvm_set_msi_inatomic that returns WOULDBLOCK if cache is NULL.

> 
> 
> -- 
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 12:51:49PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 12:43:50PM +0300, Avi Kivity wrote:
> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> > > Here is a quick prototype of what we discussed yesterday. This one
> > > caches only MSI interrupts for now. The obvious problem is that not
> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > > routing table, so they cannot be cached.
> > 
> > Missing: switch the uncached path to a work queue, so we don't have to
> > iterate over all vcpus in interrupt context.
> >
> > That isn't trivial; for edge-triggered interrupts we need to ignore
> > zeros (if polarity=0) but for level-triggered interrupts we need them to
> > override the previous setting.  But we don't know the trigger mode and
> > polarity at this point.
> 
> Instead of doing it like this, can we simply require
> callers to use a workqueue?
> Add kvm_set_msi_inatomic that returns WOULDBLOCK if cache is NULL.
> 
kvm_set_msi is simple since it is always edge and bails out very early
if level is 0. I do not yet understand where is the problem with ioapic
though.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
> On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> > Here is a quick prototype of what we discussed yesterday. This one
> > caches only MSI interrupts for now. The obvious problem is that not
> > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > routing table, so they cannot be cached.
> 
> We can have a small rcu-managed hash table to look those up.

Yes but how small? We probably need at least one entry
per vcpu, no?

> -- 
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> > > Here is a quick prototype of what we discussed yesterday. This one
> > > caches only MSI interrupts for now. The obvious problem is that not
> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > > routing table, so they cannot be cached.
> > 
> > We can have a small rcu-managed hash table to look those up.
> 
> Yes but how small? We probably need at least one entry
> per vcpu, no?
> 
One entry? We will spend more time managing it than injecting interrupts
:) ideally we need entry for each IPI sent and for each potential MSI
from userspace. What happens when hash table is full? We stop caching or
invalidate old entries? If later then cache can go valid->invalid which
may complicate the code.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 01:16 PM, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
>> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
>> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
>> > > Here is a quick prototype of what we discussed yesterday. This one
>> > > caches only MSI interrupts for now. The obvious problem is that not
>> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
>> > > routing table, so they cannot be cached.
>> > 
>> > We can have a small rcu-managed hash table to look those up.
>> 
>> Yes but how small? We probably need at least one entry
>> per vcpu, no?
>> 
> One entry? We will spend more time managing it than injecting interrupts
> :) ideally we need entry for each IPI sent and for each potential MSI
> from userspace. What happens when hash table is full? 

Drop the entire cache.

> We stop caching or
> invalidate old entries? If later then cache can go valid->invalid which
> may complicate the code.
> 

We can drop the entire cache via rcu freeing.  In fact we can have a
closed hash allocated as a single blob, easy to manage.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 01:21:33PM +0300, Avi Kivity wrote:
> On 08/13/2012 01:16 PM, Gleb Natapov wrote:
> > On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
> >> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
> >> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> >> > > Here is a quick prototype of what we discussed yesterday. This one
> >> > > caches only MSI interrupts for now. The obvious problem is that not
> >> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> >> > > routing table, so they cannot be cached.
> >> > 
> >> > We can have a small rcu-managed hash table to look those up.
> >> 
> >> Yes but how small? We probably need at least one entry
> >> per vcpu, no?
> >> 
> > One entry? We will spend more time managing it than injecting interrupts
> > :) ideally we need entry for each IPI sent and for each potential MSI
> > from userspace. What happens when hash table is full? 
> 
> Drop the entire cache.
> 
OK. Then it should be big enough to not do it frequently.

> > We stop caching or
> > invalidate old entries? If later then cache can go valid->invalid which
> > may complicate the code.
> > 
> 
> We can drop the entire cache via rcu freeing.  In fact we can have a
> closed hash allocated as a single blob, easy to manage.
> 
That's what I am locking at doing, yes.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 01:24 PM, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 01:21:33PM +0300, Avi Kivity wrote:
>> On 08/13/2012 01:16 PM, Gleb Natapov wrote:
>> > On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
>> >> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
>> >> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
>> >> > > Here is a quick prototype of what we discussed yesterday. This one
>> >> > > caches only MSI interrupts for now. The obvious problem is that not
>> >> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
>> >> > > routing table, so they cannot be cached.
>> >> > 
>> >> > We can have a small rcu-managed hash table to look those up.
>> >> 
>> >> Yes but how small? We probably need at least one entry
>> >> per vcpu, no?
>> >> 
>> > One entry? We will spend more time managing it than injecting interrupts
>> > :) ideally we need entry for each IPI sent and for each potential MSI
>> > from userspace. What happens when hash table is full? 
>> 
>> Drop the entire cache.
>> 
> OK. Then it should be big enough to not do it frequently.

Should be sized N * vcpus, where N is several dozen (generous amount of
non-device vectors, though multicast will break it since it's
essentially random).



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 12:43:50PM +0300, Avi Kivity wrote:
> On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> > Here is a quick prototype of what we discussed yesterday. This one
> > caches only MSI interrupts for now. The obvious problem is that not
> > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use irq
> > routing table, so they cannot be cached.
> 
> Missing: switch the uncached path to a work queue, so we don't have to
> iterate over all vcpus in interrupt context.
> 
> That isn't trivial; for edge-triggered interrupts we need to ignore
> zeros (if polarity=0) but for level-triggered interrupts we need them to
> override the previous setting.  But we don't know the trigger mode and
> polarity at this point.
> 
Looked at it and I think we have enough info about trigger mode and
polarity at the point where cache is checked, but we can't switch to
a work queue there because some callers want to get injection state and
this requires injection to be synchronous. Only high level caller knows
if work queue is OK or not.
 
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 01:31:36PM +0300, Avi Kivity wrote:
> On 08/13/2012 01:24 PM, Gleb Natapov wrote:
> > On Mon, Aug 13, 2012 at 01:21:33PM +0300, Avi Kivity wrote:
> >> On 08/13/2012 01:16 PM, Gleb Natapov wrote:
> >> > On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
> >> >> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
> >> >> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> >> >> > > Here is a quick prototype of what we discussed yesterday. This one
> >> >> > > caches only MSI interrupts for now. The obvious problem is that not
> >> >> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use 
> >> >> > > irq
> >> >> > > routing table, so they cannot be cached.
> >> >> > 
> >> >> > We can have a small rcu-managed hash table to look those up.
> >> >> 
> >> >> Yes but how small? We probably need at least one entry
> >> >> per vcpu, no?
> >> >> 
> >> > One entry? We will spend more time managing it than injecting interrupts
> >> > :) ideally we need entry for each IPI sent and for each potential MSI
> >> > from userspace. What happens when hash table is full? 
> >> 
> >> Drop the entire cache.
> >> 
> > OK. Then it should be big enough to not do it frequently.
> 
> Should be sized N * vcpus, where N is several dozen (generous amount of
> non-device vectors, though multicast will break it since it's
> essentially random).
> 
We can even grow it at runtime if it fills out frequently.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tsc: use kvmclock for calibration

2012-08-13 Thread Gerd Hoffmann
  Hi,

> Isnt pmtimer ioport usable? 14MHz.

Can give it a try.  14 MHz looks wrong though, apci.h says:

/* PM Timer ticks per second (HZ) */
#define PM_TIMER_FREQUENCY  3579545

Is this fixed?  Or hardware specific?

cheers,
  Gerd

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 01:31:36PM +0300, Avi Kivity wrote:
> On 08/13/2012 01:24 PM, Gleb Natapov wrote:
> > On Mon, Aug 13, 2012 at 01:21:33PM +0300, Avi Kivity wrote:
> >> On 08/13/2012 01:16 PM, Gleb Natapov wrote:
> >> > On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
> >> >> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
> >> >> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> >> >> > > Here is a quick prototype of what we discussed yesterday. This one
> >> >> > > caches only MSI interrupts for now. The obvious problem is that not
> >> >> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use 
> >> >> > > irq
> >> >> > > routing table, so they cannot be cached.
> >> >> > 
> >> >> > We can have a small rcu-managed hash table to look those up.
> >> >> 
> >> >> Yes but how small? We probably need at least one entry
> >> >> per vcpu, no?
> >> >> 
> >> > One entry? We will spend more time managing it than injecting interrupts
> >> > :) ideally we need entry for each IPI sent and for each potential MSI
> >> > from userspace. What happens when hash table is full? 
> >> 
> >> Drop the entire cache.
> >> 
> > OK. Then it should be big enough to not do it frequently.
> 
> Should be sized N * vcpus, where N is several dozen (generous amount of
> non-device vectors, though multicast will break it since it's
> essentially random).

KVM_MAX_VCPUS is 256 multiply by what? 50? this is 10K already.
You can not allocate that much in a single chunk, right?

> 
> 
> -- 
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH repost] kvm: drop parameter validation

2012-08-13 Thread Michael S. Tsirkin
We validate irq pin number when routing is setup, so
code handling illegal irq # in pic and ioapic on each injection
is never called.
Drop it.

Signed-off-by: Michael S. Tsirkin 
---

Reposting, applies without changes to kvm/next.

 arch/x86/kvm/i8259.c | 16 +++-
 virt/kvm/ioapic.c| 35 +--
 2 files changed, 24 insertions(+), 27 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 1df8fb9..277ec0d 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -190,17 +190,15 @@ void kvm_pic_update_irq(struct kvm_pic *s)
 
 int kvm_pic_set_irq(struct kvm_pic *s, int irq, int irq_source_id, int level)
 {
-   int ret = -1;
+   int ret, irq_level;
 
pic_lock(s);
-   if (irq >= 0 && irq < PIC_NUM_PINS) {
-   int irq_level = __kvm_irq_line_state(&s->irq_states[irq],
-irq_source_id, level);
-   ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, irq_level);
-   pic_update_irq(s);
-   trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr,
- s->pics[irq >> 3].imr, ret == 0);
-   }
+   irq_level = __kvm_irq_line_state(&s->irq_states[irq],
+irq_source_id, level);
+   ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, irq_level);
+   pic_update_irq(s);
+   trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr,
+ s->pics[irq >> 3].imr, ret == 0);
pic_unlock(s);
 
return ret;
diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
index ef61d52..4d824c7 100644
--- a/virt/kvm/ioapic.c
+++ b/virt/kvm/ioapic.c
@@ -197,28 +197,27 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
irq, int irq_source_id,
u32 old_irr;
u32 mask = 1 << irq;
union kvm_ioapic_redirect_entry entry;
-   int ret = 1;
+   int ret, irq_level;
 
spin_lock(&ioapic->lock);
old_irr = ioapic->irr;
-   if (irq >= 0 && irq < IOAPIC_NUM_PINS) {
-   int irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
-irq_source_id, level);
-   entry = ioapic->redirtbl[irq];
-   irq_level ^= entry.fields.polarity;
-   if (!irq_level)
-   ioapic->irr &= ~mask;
-   else {
-   int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
-   ioapic->irr |= mask;
-   if ((edge && old_irr != ioapic->irr) ||
-   (!edge && !entry.fields.remote_irr))
-   ret = ioapic_service(ioapic, irq);
-   else
-   ret = 0; /* report coalesced interrupt */
-   }
-   trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
+   irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
+irq_source_id, level);
+   entry = ioapic->redirtbl[irq];
+   irq_level ^= entry.fields.polarity;
+   if (!irq_level) {
+   ioapic->irr &= ~mask;
+   ret = 1;
+   } else {
+   int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
+   ioapic->irr |= mask;
+   if ((edge && old_irr != ioapic->irr) ||
+   (!edge && !entry.fields.remote_irr))
+   ret = ioapic_service(ioapic, irq);
+   else
+   ret = 0; /* report coalesced interrupt */
}
+   trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
spin_unlock(&ioapic->lock);
 
return ret;
-- 
MST
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tsc: use kvmclock for calibration

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 12:37:11PM +0200, Gerd Hoffmann wrote:
>   Hi,
> 
> > Isnt pmtimer ioport usable? 14MHz.
> 
> Can give it a try.  14 MHz looks wrong though, apci.h says:
> 
> /* PM Timer ticks per second (HZ) */
> #define PM_TIMER_FREQUENCY  3579545
> 
> Is this fixed?  Or hardware specific?
> 
3.579545 MHz clock required by ACPI spec.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 01:38 PM, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 01:31:36PM +0300, Avi Kivity wrote:
>> On 08/13/2012 01:24 PM, Gleb Natapov wrote:
>> > On Mon, Aug 13, 2012 at 01:21:33PM +0300, Avi Kivity wrote:
>> >> On 08/13/2012 01:16 PM, Gleb Natapov wrote:
>> >> > On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
>> >> >> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
>> >> >> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
>> >> >> > > Here is a quick prototype of what we discussed yesterday. This one
>> >> >> > > caches only MSI interrupts for now. The obvious problem is that not
>> >> >> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) use 
>> >> >> > > irq
>> >> >> > > routing table, so they cannot be cached.
>> >> >> > 
>> >> >> > We can have a small rcu-managed hash table to look those up.
>> >> >> 
>> >> >> Yes but how small? We probably need at least one entry
>> >> >> per vcpu, no?
>> >> >> 
>> >> > One entry? We will spend more time managing it than injecting interrupts
>> >> > :) ideally we need entry for each IPI sent and for each potential MSI
>> >> > from userspace. What happens when hash table is full? 
>> >> 
>> >> Drop the entire cache.
>> >> 
>> > OK. Then it should be big enough to not do it frequently.
>> 
>> Should be sized N * vcpus, where N is several dozen (generous amount of
>> non-device vectors, though multicast will break it since it's
>> essentially random).
> 
> KVM_MAX_VCPUS is 256 multiply by what? 50? this is 10K already.
> You can not allocate that much in a single chunk, right?

Actually this is overkill.  Suppose we do an apicid->vcpu translation
cache?  Then we retain O(1) behaviour, no need for a huge cache.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 01:58:21PM +0300, Avi Kivity wrote:
> On 08/13/2012 01:38 PM, Michael S. Tsirkin wrote:
> > On Mon, Aug 13, 2012 at 01:31:36PM +0300, Avi Kivity wrote:
> >> On 08/13/2012 01:24 PM, Gleb Natapov wrote:
> >> > On Mon, Aug 13, 2012 at 01:21:33PM +0300, Avi Kivity wrote:
> >> >> On 08/13/2012 01:16 PM, Gleb Natapov wrote:
> >> >> > On Mon, Aug 13, 2012 at 01:12:46PM +0300, Michael S. Tsirkin wrote:
> >> >> >> On Mon, Aug 13, 2012 at 12:36:41PM +0300, Avi Kivity wrote:
> >> >> >> > On 08/13/2012 12:16 PM, Gleb Natapov wrote:
> >> >> >> > > Here is a quick prototype of what we discussed yesterday. This 
> >> >> >> > > one
> >> >> >> > > caches only MSI interrupts for now. The obvious problem is that 
> >> >> >> > > not
> >> >> >> > > all interrupts (namely IPIs and MSIs using KVM_CAP_SIGNAL_MSI) 
> >> >> >> > > use irq
> >> >> >> > > routing table, so they cannot be cached.
> >> >> >> > 
> >> >> >> > We can have a small rcu-managed hash table to look those up.
> >> >> >> 
> >> >> >> Yes but how small? We probably need at least one entry
> >> >> >> per vcpu, no?
> >> >> >> 
> >> >> > One entry? We will spend more time managing it than injecting 
> >> >> > interrupts
> >> >> > :) ideally we need entry for each IPI sent and for each potential MSI
> >> >> > from userspace. What happens when hash table is full? 
> >> >> 
> >> >> Drop the entire cache.
> >> >> 
> >> > OK. Then it should be big enough to not do it frequently.
> >> 
> >> Should be sized N * vcpus, where N is several dozen (generous amount of
> >> non-device vectors, though multicast will break it since it's
> >> essentially random).
> > 
> > KVM_MAX_VCPUS is 256 multiply by what? 50? this is 10K already.
> > You can not allocate that much in a single chunk, right?
> 
> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> 
Not sure I follow.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: PPC: Add return value in prepare_to_enter

2012-08-13 Thread Alexander Graf
Our prepare_to_enter helper wants to be able to return in more circumstances
to the host than only when an interrupt is pending. Broaden the interface a
bit and move even more generic code to the generic helper.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/book3s_pr.c |   12 ++--
 arch/powerpc/kvm/booke.c |   13 ++---
 arch/powerpc/kvm/powerpc.c   |   11 ---
 3 files changed, 20 insertions(+), 16 deletions(-)

diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index a19799d..ec4849b 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -591,6 +591,7 @@ int kvmppc_handle_exit(struct kvm_run *run, struct kvm_vcpu 
*vcpu,
unsigned int exit_nr)
 {
int r = RESUME_HOST;
+   int s;
 
vcpu->stat.sum_exits++;
 
@@ -864,10 +865,10 @@ program_interrupt:
 * again due to a host external interrupt.
 */
local_irq_disable();
-   if (kvmppc_prepare_to_enter(vcpu)) {
+   s = kvmppc_prepare_to_enter(vcpu);
+   if (s <= 0) {
local_irq_enable();
-   run->exit_reason = KVM_EXIT_INTR;
-   r = -EINTR;
+   r = s;
} else {
kvmppc_lazy_ee_enable();
}
@@ -1076,10 +1077,9 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
 * a host external interrupt.
 */
local_irq_disable();
-   if (kvmppc_prepare_to_enter(vcpu)) {
+   ret = kvmppc_prepare_to_enter(vcpu);
+   if (ret <= 0) {
local_irq_enable();
-   kvm_run->exit_reason = KVM_EXIT_INTR;
-   ret = -EINTR;
goto out;
}
 
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 5e8dc19..3627c18 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -480,10 +480,10 @@ int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct 
kvm_vcpu *vcpu)
}
 
local_irq_disable();
-   if (kvmppc_prepare_to_enter(vcpu)) {
+   s = kvmppc_prepare_to_enter(vcpu);
+   if (s <= 0) {
local_irq_enable();
-   kvm_run->exit_reason = KVM_EXIT_INTR;
-   ret = -EINTR;
+   r = s;
goto out;
}
kvmppc_lazy_ee_enable();
@@ -948,11 +948,10 @@ int kvmppc_handle_exit(struct kvm_run *run, struct 
kvm_vcpu *vcpu,
 */
if (!(r & RESUME_HOST)) {
local_irq_disable();
-   if (kvmppc_prepare_to_enter(vcpu)) {
+   s = kvmppc_prepare_to_enter(vcpu);
+   if (s <= 0) {
local_irq_enable();
-   run->exit_reason = KVM_EXIT_INTR;
-   r = (-EINTR << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
-   kvmppc_account_exit(vcpu, SIGNAL_EXITS);
+   r = (s << 2) | RESUME_HOST | (r & RESUME_FLAG_NV);
} else {
kvmppc_lazy_ee_enable();
}
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index c8cec45..51d3e8c 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -53,11 +53,14 @@ int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu)
  * Common checks before entering the guest world.  Call with interrupts
  * disabled.
  *
- * returns !0 if a signal is pending and check_signal is true
+ * returns:
+ *
+ * == 1 if we're ready to go into guest state
+ * <= 0 if we need to go back to the host with return value
  */
 int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
 {
-   int r = 0;
+   int r = 1;
 
WARN_ON_ONCE(!irqs_disabled());
while (true) {
@@ -69,7 +72,9 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
}
 
if (signal_pending(current)) {
-   r = 1;
+   kvmppc_account_exit(vcpu, SIGNAL_EXITS);
+   run->exit_reason = KVM_EXIT_INTR;
+   r = -EINTR;
break;
}
 
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] Make request checks return value aware

2012-08-13 Thread Alexander Graf
These 2 patches allow us to return for arbitrary reasons in
the prepare_to_enter and check_reqest functions. They are a
prerequisite for Bharat's watchdog work.

Alex

Alexander Graf (2):
  KVM: PPC: Add return value in prepare_to_enter
  KVM: PPC: Add return value to core_check_requests

 arch/powerpc/include/asm/kvm_ppc.h |2 +-
 arch/powerpc/kvm/book3s_pr.c   |   18 +++---
 arch/powerpc/kvm/booke.c   |   19 +++
 arch/powerpc/kvm/powerpc.c |   17 -
 4 files changed, 35 insertions(+), 21 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: PPC: Add return value to core_check_requests

2012-08-13 Thread Alexander Graf
Requests may want to tell us that we need to go back into host state,
so add a return value for the checks.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/include/asm/kvm_ppc.h |2 +-
 arch/powerpc/kvm/book3s_pr.c   |6 +-
 arch/powerpc/kvm/booke.c   |6 +-
 arch/powerpc/kvm/powerpc.c |6 --
 4 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/include/asm/kvm_ppc.h 
b/arch/powerpc/include/asm/kvm_ppc.h
index 5459364..3dfc437 100644
--- a/arch/powerpc/include/asm/kvm_ppc.h
+++ b/arch/powerpc/include/asm/kvm_ppc.h
@@ -112,7 +112,7 @@ extern int kvmppc_core_emulate_mtspr(struct kvm_vcpu *vcpu, 
int sprn,
 ulong val);
 extern int kvmppc_core_emulate_mfspr(struct kvm_vcpu *vcpu, int sprn,
 ulong *val);
-extern void kvmppc_core_check_requests(struct kvm_vcpu *vcpu);
+extern int kvmppc_core_check_requests(struct kvm_vcpu *vcpu);
 
 extern int kvmppc_booke_init(void);
 extern void kvmppc_booke_exit(void);
diff --git a/arch/powerpc/kvm/book3s_pr.c b/arch/powerpc/kvm/book3s_pr.c
index ec4849b..ee13d58 100644
--- a/arch/powerpc/kvm/book3s_pr.c
+++ b/arch/powerpc/kvm/book3s_pr.c
@@ -86,12 +86,16 @@ void kvmppc_core_vcpu_put(struct kvm_vcpu *vcpu)
kvmppc_giveup_ext(vcpu, MSR_VSX);
 }
 
-void kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
+int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 {
+   int r = 1; /* Indicate we want to get back into the guest */
+
/* We misuse TLB_FLUSH to indicate that we want to clear
   all shadow cache entries */
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
kvmppc_mmu_pte_flush(vcpu, 0, 0);
+
+   return r;
 }
 
 /* MMU Notifiers */
diff --git a/arch/powerpc/kvm/booke.c b/arch/powerpc/kvm/booke.c
index 3627c18..25dff80 100644
--- a/arch/powerpc/kvm/booke.c
+++ b/arch/powerpc/kvm/booke.c
@@ -455,14 +455,18 @@ int kvmppc_core_prepare_to_enter(struct kvm_vcpu *vcpu)
return r;
 }
 
-void kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
+int kvmppc_core_check_requests(struct kvm_vcpu *vcpu)
 {
+   int r = 1; /* Indicate we want to get back into the guest */
+
if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu))
update_timer_ints(vcpu);
 #if defined(CONFIG_KVM_E500V2) || defined(CONFIG_KVM_E500MC)
if (kvm_check_request(KVM_REQ_TLB_FLUSH, vcpu))
kvmppc_core_flush_tlb(vcpu);
 #endif
+
+   return r;
 }
 
 int kvmppc_vcpu_run(struct kvm_run *kvm_run, struct kvm_vcpu *vcpu)
diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c
index 51d3e8c..7443e20 100644
--- a/arch/powerpc/kvm/powerpc.c
+++ b/arch/powerpc/kvm/powerpc.c
@@ -83,9 +83,11 @@ int kvmppc_prepare_to_enter(struct kvm_vcpu *vcpu)
/* Make sure we process requests preemptable */
local_irq_enable();
trace_kvm_check_requests(vcpu);
-   kvmppc_core_check_requests(vcpu);
+   r = kvmppc_core_check_requests(vcpu);
local_irq_disable();
-   continue;
+   if (r > 0)
+   continue;
+   break;
}
 
if (kvmppc_core_prepare_to_enter(vcpu)) {
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/4] Introduce a common kvm requests handler

2012-08-13 Thread Alexander Graf

On 09.08.2012, at 08:38, Bharat Bhushan wrote:

> Added a common requests handler which is called before returning to guest.
> This returns non zero value when some request demands exit to userspace.
> 
> Signed-off-by: Bharat Bhushan 

Thanks, applied patches 2 and 3 on top of the 2 patches I just sent out.


Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 02:01 PM, Gleb Natapov wrote:
>> 
>> Actually this is overkill.  Suppose we do an apicid->vcpu translation
>> cache?  Then we retain O(1) behaviour, no need for a huge cache.
>> 
> Not sure I follow.

Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
apic id, using a static lookup table (only changed when the guest
updates apicid or a vcpu is inserted).



-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
> On 08/13/2012 02:01 PM, Gleb Natapov wrote:
> >> 
> >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> >> 
> > Not sure I follow.
> 
> Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
> apic id, using a static lookup table (only changed when the guest
> updates apicid or a vcpu is inserted).
> 
To check that MSI/IPI is unicast you need to check a lot of things: delivery
mode, shorthand, dest mode, vector. In short everything but level. This
is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
is not enough, caching (delivery mode, shorthand, dest mode,
vector)->vcpu is enough and this is exactly what the patch does for irq
routing entries.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
> On 08/13/2012 02:01 PM, Gleb Natapov wrote:
> >> 
> >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> >> 
> > Not sure I follow.
> 
> Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
> apic id, using a static lookup table (only changed when the guest
> updates apicid or a vcpu is inserted).

Looks like kvm_apic_id is always 8 bit, so it's just a
table with 256 entries?


> 
> 
> -- 
> error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 02:12:41PM +0300, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
> > On 08/13/2012 02:01 PM, Gleb Natapov wrote:
> > >> 
> > >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> > >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> > >> 
> > > Not sure I follow.
> > 
> > Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
> > apic id, using a static lookup table (only changed when the guest
> > updates apicid or a vcpu is inserted).
> > 
> To check that MSI/IPI is unicast you need to check a lot of things: delivery
> mode, shorthand, dest mode, vector. In short everything but level. This
> is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
> is not enough, caching (delivery mode, shorthand, dest mode,
> vector)->vcpu is enough and this is exactly what the patch does for irq
> routing entries.

At least for MSI I think it is simple. Here's the relevant code from
my old patch:

+static bool kvm_msi_is_multicast(unsigned dest, int dest_mode)
+{
+   if (dest_mode == 0)
+   /* Physical mode. */
+   return dest == 0xff;
+   else
+   /* Logical mode. */
+   return dest & (dest - 1);
+}


> --
>   Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 02:22:14PM +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 02:12:41PM +0300, Gleb Natapov wrote:
> > On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
> > > On 08/13/2012 02:01 PM, Gleb Natapov wrote:
> > > >> 
> > > >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> > > >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> > > >> 
> > > > Not sure I follow.
> > > 
> > > Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
> > > apic id, using a static lookup table (only changed when the guest
> > > updates apicid or a vcpu is inserted).
> > > 
> > To check that MSI/IPI is unicast you need to check a lot of things: delivery
> > mode, shorthand, dest mode, vector. In short everything but level. This
> > is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
> > is not enough, caching (delivery mode, shorthand, dest mode,
> > vector)->vcpu is enough and this is exactly what the patch does for irq
> > routing entries.
> 
> At least for MSI I think it is simple. Here's the relevant code from
> my old patch:
> 
> +static bool kvm_msi_is_multicast(unsigned dest, int dest_mode)
> +{
> +   if (dest_mode == 0)
> +   /* Physical mode. */
> +   return dest == 0xff;
> +   else
> +   /* Logical mode. */
> +   return dest & (dest - 1);
> +}
> 
MSI does not have shorthand, so it is simpler but the code above does
work for APIC_DFR_CLUSTER as far as I can tell and it does not check
lowest prio, which is not multicast, but should bot be cached.
 
--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 02:12 PM, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
>> On 08/13/2012 02:01 PM, Gleb Natapov wrote:
>> >> 
>> >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
>> >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
>> >> 
>> > Not sure I follow.
>> 
>> Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
>> apic id, using a static lookup table (only changed when the guest
>> updates apicid or a vcpu is inserted).
>> 
> To check that MSI/IPI is unicast you need to check a lot of things: delivery
> mode, shorthand, dest mode, vector. In short everything but level. This
> is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
> is not enough, caching (delivery mode, shorthand, dest mode,
> vector)->vcpu is enough and this is exactly what the patch does for irq
> routing entries.


apicid is checked in a loop, the others aren't.  apicid is
unpredicatable; the others are.

I think we should use apicid loopup exclusively.  It doesn't accelerate
everything, but most things, and is common to all unicast interrupts
except PIC (and we can also precompute the target vcpu for PIC, too).

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 02:30:49PM +0300, Avi Kivity wrote:
> On 08/13/2012 02:12 PM, Gleb Natapov wrote:
> > On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
> >> On 08/13/2012 02:01 PM, Gleb Natapov wrote:
> >> >> 
> >> >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> >> >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> >> >> 
> >> > Not sure I follow.
> >> 
> >> Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
> >> apic id, using a static lookup table (only changed when the guest
> >> updates apicid or a vcpu is inserted).
> >> 
> > To check that MSI/IPI is unicast you need to check a lot of things: delivery
> > mode, shorthand, dest mode, vector. In short everything but level. This
> > is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
> > is not enough, caching (delivery mode, shorthand, dest mode,
> > vector)->vcpu is enough and this is exactly what the patch does for irq
> > routing entries.
> 
> 
> apicid is checked in a loop, the others aren't. 
Along with dest_id.

>apicid is
> unpredicatable; the others are.
What do you mean "unpredicatable"?

> 
> I think we should use apicid loopup exclusively.  It doesn't accelerate
> everything, but most things, and is common to all unicast interrupts
> except PIC (and we can also precompute the target vcpu for PIC, too).
> 
We can change kvm_irq_delivery_to_apic() to avoid the loop if interrupt
is physical, non broadcast, non low prio. Do whatever it does now
otherwise. You think we do not need cache in such case?

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 02:29:31PM +0300, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 02:22:14PM +0300, Michael S. Tsirkin wrote:
> > On Mon, Aug 13, 2012 at 02:12:41PM +0300, Gleb Natapov wrote:
> > > On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
> > > > On 08/13/2012 02:01 PM, Gleb Natapov wrote:
> > > > >> 
> > > > >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> > > > >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> > > > >> 
> > > > > Not sure I follow.
> > > > 
> > > > Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
> > > > apic id, using a static lookup table (only changed when the guest
> > > > updates apicid or a vcpu is inserted).
> > > > 
> > > To check that MSI/IPI is unicast you need to check a lot of things: 
> > > delivery
> > > mode, shorthand, dest mode, vector. In short everything but level. This
> > > is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
> > > is not enough, caching (delivery mode, shorthand, dest mode,
> > > vector)->vcpu is enough and this is exactly what the patch does for irq
> > > routing entries.
> > 
> > At least for MSI I think it is simple. Here's the relevant code from
> > my old patch:
> > 
> > +static bool kvm_msi_is_multicast(unsigned dest, int dest_mode)
> > +{
> > +   if (dest_mode == 0)
> > +   /* Physical mode. */
> > +   return dest == 0xff;
> > +   else
> > +   /* Logical mode. */
> > +   return dest & (dest - 1);
> > +}
> > 
> MSI does not have shorthand, so it is simpler but the code above does
> work for APIC_DFR_CLUSTER as far as I can tell and it does not check
> lowest prio, which is not multicast, but should bot be cached.
>  
It also a little bit pessimistic for logical mode. Dest may have more
than one bit set, but be delivered to only one cpu.

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Jan Kiszka
On 2012-08-13 10:35, Nicholas A. Bellinger wrote:
> From: Nicholas Bellinger 
> 
> This is required to get past the following assert with:
> 
> commit 1523ed9e1d46b0b54540049d491475ccac7e6421
> Author: Jan Kiszka 
> Date:   Thu May 17 10:32:39 2012 -0300
> 
> virtio/vhost: Add support for KVM in-kernel MSI injection
> 
> Cc: Stefan Hajnoczi 
> Cc: Jan Kiszka 
> Cc: Paolo Bonzini 
> Cc: Anthony Liguori 
> Signed-off-by: Nicholas Bellinger 
> ---
>  hw/msix.c |3 +++
>  1 files changed, 3 insertions(+), 0 deletions(-)
> 
> diff --git a/hw/msix.c b/hw/msix.c
> index 800fc32..c1e6dc3 100644
> --- a/hw/msix.c
> +++ b/hw/msix.c
> @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
>  {
>  int vector;
>  
> +if (!dev->msix_vector_use_notifier && !dev->msix_vector_release_notifier)
> +return;
> +
>  assert(dev->msix_vector_use_notifier &&
> dev->msix_vector_release_notifier);
>  
> 

I think to remember pointing out that there is a bug somewhere in the
reset code which deactivates a non-active vhost instance, no?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 02:41 PM, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 02:30:49PM +0300, Avi Kivity wrote:
>> On 08/13/2012 02:12 PM, Gleb Natapov wrote:
>> > On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
>> >> On 08/13/2012 02:01 PM, Gleb Natapov wrote:
>> >> >> 
>> >> >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
>> >> >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
>> >> >> 
>> >> > Not sure I follow.
>> >> 
>> >> Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
>> >> apic id, using a static lookup table (only changed when the guest
>> >> updates apicid or a vcpu is inserted).
>> >> 
>> > To check that MSI/IPI is unicast you need to check a lot of things: 
>> > delivery
>> > mode, shorthand, dest mode, vector. In short everything but level. This
>> > is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
>> > is not enough, caching (delivery mode, shorthand, dest mode,
>> > vector)->vcpu is enough and this is exactly what the patch does for irq
>> > routing entries.
>> 
>> 
>> apicid is checked in a loop, the others aren't. 
> Along with dest_id.

Right, that is converted to a lookup.

> 
>>apicid is
>> unpredicatable; the others are.
> What do you mean "unpredicatable"?

In terms of branch prediction.  We can't tell when the loop will
terminate.  On the other hand most IPIs are likely to have the same
delivery mode/shorthand/dest mode.

(not entirely true, we can expect a mix of broadcast/unicast/multicast)

> 
>> 
>> I think we should use apicid loopup exclusively.  It doesn't accelerate
>> everything, but most things, and is common to all unicast interrupts
>> except PIC (and we can also precompute the target vcpu for PIC, too).
>> 
> We can change kvm_irq_delivery_to_apic() to avoid the loop if interrupt
> is physical, non broadcast, non low prio. Do whatever it does now
> otherwise. You think we do not need cache in such case?

We can also loop in logical, since the loop is limited to 16 (in x2apic
mode); it doesn't scale with the number of vcpus.  We need a lookup
table of cluster id -> array of 16 vcpus.

Broadcast obviously must loop, no cache can help.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Avi Kivity
On 08/13/2012 02:43 PM, Gleb Natapov wrote:

>> MSI does not have shorthand, so it is simpler but the code above does
>> work for APIC_DFR_CLUSTER as far as I can tell and it does not check
>> lowest prio, which is not multicast, but should bot be cached.
>>  
> It also a little bit pessimistic for logical mode. Dest may have more
> than one bit set, but be delivered to only one cpu.

We can still loop, for_each_set_bit().  Even if all are set, it's
limited to 16.


-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] KVM: VMX: Advertize RDTSC exiting to nested guests

2012-08-13 Thread Avi Kivity
All processors that support VMX have that feature, and guests (Xen) depend on
it.  As we already implement it, advertize it to the guest.

Signed-off-by: Avi Kivity 
---
 arch/x86/kvm/vmx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index cc8ad98..8092f25 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1990,7 +1990,7 @@ static __init void nested_vmx_setup_ctls_msrs(void)
 #endif
CPU_BASED_MOV_DR_EXITING | CPU_BASED_UNCOND_IO_EXITING |
CPU_BASED_USE_IO_BITMAPS | CPU_BASED_MONITOR_EXITING |
-   CPU_BASED_RDPMC_EXITING |
+   CPU_BASED_RDPMC_EXITING | CPU_BASED_RDTSC_EXITING |
CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
/*
 * We can allow some features even when not supported by the
-- 
1.7.11.3

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [SeaBIOS] [PATCH] tsc: use kvmclock for calibration

2012-08-13 Thread Fred .
Add a comment about it in the source code.

-#define PM_TIMER_FREQUENCY  3579545
+#define PM_TIMER_FREQUENCY  3579545 // 3.579545 MHz clock required by
ACPI spec.

On Mon, Aug 13, 2012 at 12:46 PM, Gleb Natapov  wrote:
> On Mon, Aug 13, 2012 at 12:37:11PM +0200, Gerd Hoffmann wrote:
>>   Hi,
>>
>> > Isnt pmtimer ioport usable? 14MHz.
>>
>> Can give it a try.  14 MHz looks wrong though, apci.h says:
>>
>> /* PM Timer ticks per second (HZ) */
>> #define PM_TIMER_FREQUENCY  3579545
>>
>> Is this fixed?  Or hardware specific?
>>
> 3.579545 MHz clock required by ACPI spec.
>
> --
> Gleb.
>
> ___
> SeaBIOS mailing list
> seab...@seabios.org
> http://www.seabios.org/mailman/listinfo/seabios
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH 0/2] irq destination caching prototype

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 02:41:47PM +0300, Gleb Natapov wrote:
> On Mon, Aug 13, 2012 at 02:30:49PM +0300, Avi Kivity wrote:
> > On 08/13/2012 02:12 PM, Gleb Natapov wrote:
> > > On Mon, Aug 13, 2012 at 02:03:51PM +0300, Avi Kivity wrote:
> > >> On 08/13/2012 02:01 PM, Gleb Natapov wrote:
> > >> >> 
> > >> >> Actually this is overkill.  Suppose we do an apicid->vcpu translation
> > >> >> cache?  Then we retain O(1) behaviour, no need for a huge cache.
> > >> >> 
> > >> > Not sure I follow.
> > >> 
> > >> Unicast MSIs and IPIs can be speeded up by looking up the vcpu using the
> > >> apic id, using a static lookup table (only changed when the guest
> > >> updates apicid or a vcpu is inserted).
> > >> 
> > > To check that MSI/IPI is unicast you need to check a lot of things: 
> > > delivery
> > > mode, shorthand, dest mode, vector. In short everything but level. This
> > > is exactly what kvm_irq_delivery_to_apic() is doing. Caching apicid->vcpu
> > > is not enough, caching (delivery mode, shorthand, dest mode,
> > > vector)->vcpu is enough and this is exactly what the patch does for irq
> > > routing entries.
> > 
> > 
> > apicid is checked in a loop, the others aren't. 
> Along with dest_id.
> 
> >apicid is
> > unpredicatable; the others are.
> What do you mean "unpredicatable"?
> 
> > 
> > I think we should use apicid loopup exclusively.  It doesn't accelerate
> > everything, but most things, and is common to all unicast interrupts
> > except PIC (and we can also precompute the target vcpu for PIC, too).
> > 
> We can change kvm_irq_delivery_to_apic() to avoid the loop if interrupt
> is physical,

logical is also not too hard. need two extra tables for
cluster/non cluster.

> non broadcast, non low prio. Do whatever it does now
> otherwise. You think we do not need cache in such case?
> 
> --
>   Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] add acpi pmtimer support

2012-08-13 Thread Gerd Hoffmann
This patch makes seabios use the acpi pmtimer instead of tsc for
timekeeping.  The pmtimer has a fixed frequency and doesn't need
calibration, thus it doesn't suffer from calibration errors due to a
loaded host machine.

Signed-off-by: Gerd Hoffmann 
---
 src/clock.c   |   29 +
 src/pciinit.c |5 +
 src/util.h|1 +
 3 files changed, 35 insertions(+), 0 deletions(-)

diff --git a/src/clock.c b/src/clock.c
index 69e9f17..59f269b 100644
--- a/src/clock.c
+++ b/src/clock.c
@@ -129,11 +129,40 @@ emulate_tsc(void)
 return ret;
 }
 
+u16 pmtimer_ioport VAR16VISIBLE;
+u32 pmtimer_wraps VARLOW;
+u32 pmtimer_last VARLOW;
+
+void pmtimer_init(u16 ioport, u32 khz)
+{
+dprintf(1, "Using pmtimer, ioport 0x%x, freq %d kHz\n", ioport, khz);
+SET_GLOBAL(pmtimer_ioport, ioport);
+SET_GLOBAL(cpu_khz, khz);
+}
+
+static u64 pmtimer_get(void)
+{
+u16 ioport = GET_GLOBAL(pmtimer_ioport);
+u32 wraps = GET_LOW(pmtimer_wraps);
+u32 pmtimer = inl(ioport);
+
+if (pmtimer < GET_LOW(pmtimer_last)) {
+wraps++;
+SET_LOW(pmtimer_wraps, wraps);
+}
+SET_LOW(pmtimer_last, pmtimer);
+
+dprintf(9, "pmtimer: %u:%u\n", wraps, pmtimer);
+return (u64)wraps << 24 | pmtimer;
+}
+
 static u64
 get_tsc(void)
 {
 if (unlikely(GET_GLOBAL(no_tsc)))
 return emulate_tsc();
+if (GET_GLOBAL(pmtimer_ioport))
+return pmtimer_get();
 return rdtscll();
 }
 
diff --git a/src/pciinit.c b/src/pciinit.c
index 68f302a..31115ee 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -180,6 +180,9 @@ static const struct pci_device_id pci_class_tbl[] = {
 PCI_DEVICE_END,
 };
 
+/* PM Timer ticks per second (HZ) */
+#define PM_TIMER_FREQUENCY  3579545
+
 /* PIIX4 Power Management device (for ACPI) */
 static void piix4_pm_init(struct pci_device *pci, void *arg)
 {
@@ -191,6 +194,8 @@ static void piix4_pm_init(struct pci_device *pci, void *arg)
 pci_config_writeb(bdf, 0x80, 0x01); /* enable PM io space */
 pci_config_writel(bdf, 0x90, PORT_SMB_BASE | 1);
 pci_config_writeb(bdf, 0xd2, 0x09); /* enable SMBus io space */
+
+pmtimer_init(PORT_ACPI_PM_BASE + 0x08, PM_TIMER_FREQUENCY / 1000);
 }
 
 static const struct pci_device_id pci_device_tbl[] = {
diff --git a/src/util.h b/src/util.h
index 89e928c..1603a57 100644
--- a/src/util.h
+++ b/src/util.h
@@ -312,6 +312,7 @@ void lpt_setup(void);
 // clock.c
 #define PIT_TICK_RATE 1193180   // Underlying HZ of PIT
 #define PIT_TICK_INTERVAL 65536 // Default interval for 18.2Hz timer
+void pmtimer_init(u16 ioport, u32 khz);
 int check_tsc(u64 end);
 void timer_setup(void);
 void ndelay(u32 count);
-- 
1.7.1

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] TSC in qem[-kvm] 1.1+ and in-kernel irqchip

2012-08-13 Thread Jan Kiszka
On 2012-08-12 11:24, Michael Tokarev wrote:
> On 12.08.2012 12:10, Gleb Natapov wrote:
> []
>> Any chance to bisect it?
> 
> The bisecion leads to this commit:
> 
> commit 17ee47418e65b1593defb30edbab33ccd47fc1f8
> Merge: 13b0496 5d17c0d
> Author: Jan Kiszka 
> Date:   Tue Apr 10 16:26:23 2012 +0200
> 
> Merge commit '5d17c0d2df4998598e6002b27b8e47e792899a0f' into 
> queues/qemu-merge
> 
> Conflicts:
> hw/pc.c
> 
> diff --cc Makefile.target
> index 33a7255,1bd25a8..32c8e42
> --- a/Makefile.target
> +++ b/Makefile.target
> @@@ -245,13 -244,8 +245,13 @@@ obj-i386-y += pci-hotplug.o smbios.o wd
>   obj-i386-y += debugcon.o multiboot.o
>   obj-i386-y += pc_piix.o
>   obj-i386-y += pc_sysfw.o
> - obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o
> + obj-i386-$(CONFIG_KVM) += kvm/clock.o kvm/apic.o kvm/i8259.o kvm/ioapic.o 
> kvm/i8254.o
>   obj-i386-$(CONFIG_SPICE) += qxl.o qxl-logger.o qxl-render.o
>  +obj-i386-y += testdev.o
>  +obj-i386-y += acpi.o acpi_piix4.o
>  +
>  +obj-i386-y += i8254_common.o i8254.o
>  +obj-i386-$(CONFIG_KVM_DEVICE_ASSIGNMENT) += device-assignment.o
> 
>   # shared objects
>   obj-ppc-y = ppc.o ppc_booke.o
> diff --cc hw/pc.c
> index 74c19b9,bb9867b..feb6ef3
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@@ -1116,8 -1118,12 +1122,12 @@@ void pc_basic_device_init(ISABus *isa_b
> 
>   qemu_register_boot_set(pc_boot_set, *rtc_state);
> 
> - pit = pit_init(isa_bus, 0x40, pit_isa_irq, pit_alt_irq);
> + if (kvm_irqchip_in_kernel()) {
> + pit = kvm_pit_init(isa_bus, 0x40);
> + } else {
> + pit = pit_init(isa_bus, 0x40, pit_isa_irq, pit_alt_irq);
> + }
>  -if (hpet) {
>  +if (hpet && !(kvm_enabled() && kvm_irqchip_in_kernel())) {
>   /* connect PIT to output control line of the HPET */
>   qdev_connect_gpio_out(hpet, 0, qdev_get_gpio_in(&pit->qdev, 0));
>   }
> 
> 
> 
> Note this commit itself talks about pit and irqchip.
> But I don't know what does it mean.
> 
> Cc'ing Jan for help.  The short story: tsc timer calibration
> broke in 1.1+ with in-kernel irqchip (only) for several
> apps (seabios and grub are two examples), the time is ticking
> about 100 times faster.  In grub the timer is calibrated
> using pit.  The above commit is the result of bisection.

Did the versions you tested include the commit 0cdd3d1444 (kvm: i8254:
Fix conversion of in-kernel to userspace state)?

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] TSC in qem[-kvm] 1.1+ and in-kernel irqchip

2012-08-13 Thread Michael Tokarev
On 13.08.2012 17:07, Jan Kiszka wrote:
[]
>> The bisecion leads to this commit:
>>
>> commit 17ee47418e65b1593defb30edbab33ccd47fc1f8
>> Merge: 13b0496 5d17c0d
>> Author: Jan Kiszka 
>> Date:   Tue Apr 10 16:26:23 2012 +0200
>>
>> Merge commit '5d17c0d2df4998598e6002b27b8e47e792899a0f' into 
>> queues/qemu-merge
[]
>> Cc'ing Jan for help.  The short story: tsc timer calibration
>> broke in 1.1+ with in-kernel irqchip (only) for several
>> apps (seabios and grub are two examples), the time is ticking
>> about 100 times faster.  In grub the timer is calibrated
>> using pit.  The above commit is the result of bisection.
> 
> Did the versions you tested include the commit 0cdd3d1444 (kvm: i8254:
> Fix conversion of in-kernel to userspace state)?

While bisecting I didn't have this commit applied, since it were
applied past (qemu)-1.1.  It is included into qemu-kvm 1.1.0
(as 960d355dc60d9), and that version behaves _exactly_ the same -
the time in grub is ticking 100 times faster.  I mentioned in this
thread that the problem persists in current qemu (and qemu-kvm)
git too.

I can repeat the bisection with this commit applied after the
above "bad" commit.  Should I?

Thanks,

/mjt
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] KVM: PPC: Assorted 440 fixes

2012-08-13 Thread Alexander Graf
With these patches, I get at least shortly into a KVM guest on 440. It
fails for me later on, but I'm quite sure that's an orthogonal issue.

Alexander Graf (2):
  KVM: PPC: 44x: Initialize PVR
  KVM: PPC: BookE: Add MCSR SPR support

 arch/powerpc/kvm/44x.c   |1 +
 arch/powerpc/kvm/booke_emulate.c |6 ++
 2 files changed, 7 insertions(+), 0 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] KVM: PPC: 44x: Initialize PVR

2012-08-13 Thread Alexander Graf
We need to make sure that vcpu->arch.pvr is initialized to a sane value,
so let's just take the host PVR.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/44x.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/44x.c b/arch/powerpc/kvm/44x.c
index 50e7dbc..3d7fd21 100644
--- a/arch/powerpc/kvm/44x.c
+++ b/arch/powerpc/kvm/44x.c
@@ -83,6 +83,7 @@ int kvmppc_core_vcpu_setup(struct kvm_vcpu *vcpu)
vcpu_44x->shadow_refs[i].gtlb_index = -1;
 
vcpu->arch.cpu_type = KVM_CPU_440;
+   vcpu->arch.pvr = mfspr(SPRN_PVR);
 
return 0;
 }
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] KVM: PPC: BookE: Add MCSR SPR support

2012-08-13 Thread Alexander Graf
Add support for the MCSR SPR. This only implements the SPR storage
bits, not actual machine checks.

Signed-off-by: Alexander Graf 
---
 arch/powerpc/kvm/booke_emulate.c |6 ++
 1 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/booke_emulate.c b/arch/powerpc/kvm/booke_emulate.c
index cc99a0b..514790f 100644
--- a/arch/powerpc/kvm/booke_emulate.c
+++ b/arch/powerpc/kvm/booke_emulate.c
@@ -237,6 +237,9 @@ int kvmppc_booke_emulate_mtspr(struct kvm_vcpu *vcpu, int 
sprn, ulong spr_val)
case SPRN_IVOR15:
vcpu->arch.ivor[BOOKE_IRQPRIO_DEBUG] = spr_val;
break;
+   case SPRN_MCSR:
+   vcpu->arch.mcsr &= ~spr_val;
+   break;
 
default:
emulated = EMULATE_FAIL;
@@ -329,6 +332,9 @@ int kvmppc_booke_emulate_mfspr(struct kvm_vcpu *vcpu, int 
sprn, ulong *spr_val)
case SPRN_IVOR15:
*spr_val = vcpu->arch.ivor[BOOKE_IRQPRIO_DEBUG];
break;
+   case SPRN_MCSR:
+   *spr_val = vcpu->arch.mcsr;
+   break;
 
default:
emulated = EMULATE_FAIL;
-- 
1.6.0.2

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Anthony Liguori
Alex Williamson  writes:

> VFIO kernel support was just merged into Linux, so I'd like to
> formally propose inclusion of the QEMU vfio-pci driver for
> QEMU 1.2.  Included here is support for x86 PCI device assignment.
> PCI INTx is not yet enabled, but devices making use of either MSI
> or MSI-X work.  The level irqfd and eoifd support I've proposed
> for KVM enable an accelerated patch for this through KVM.  I'd
> like to get this base driver in first and enable the remaining
> support in-tree.
>
> I've split this version up a little from the RFC to make it a bit
> easier to review.  Review comments from Blue Swirl and Avi are
> already incorporated, including Avi's requests to simplify both
> the PCI BAR mapping and unmapping paths.

Hi Alex,

Thanks for pushing this forward!  Hopefully this will finally kill off
qemu-kvm.git for good.

I think this series is going to have to wait for 1.3 to open up.  We
have a very short release window for this release and I'd feel a lot
more comfortable having such a significant feature spend some time in
the development cycle getting testing/review.

I'd like to see a few Reviewed-by's too for this series before it goes
in.  I expect they won't be hard to get but I also expect it will take a
few more revisions of this series to get there.

Regards,

Anthony Liguori

>
> This series is also available at:
>
> git://github.com/awilliam/qemu-vfio.git tags/vfio-pci-for-qemu-1.2
>
> Thanks,
>
> Alex
>
> ---
>
> Alex Williamson (3):
>   vfio: Enable vfio-pci and mark supported
>   vfio: vfio-pci device assignment driver
>   vfio: Import vfio kernel header
>
>
>  MAINTAINERS|5 
>  configure  |   12 
>  hw/i386/Makefile.objs  |1 
>  hw/vfio_pci.c  | 1853 
> 
>  hw/vfio_pci.h  |  101 ++
>  linux-headers/linux/vfio.h |  368 +
>  6 files changed, 2340 insertions(+)
>  create mode 100644 hw/vfio_pci.c
>  create mode 100644 hw/vfio_pci.h
>  create mode 100644 linux-headers/linux/vfio.h
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [Qemu-devel] TSC in qem[-kvm] 1.1+ and in-kernel irqchip

2012-08-13 Thread Jan Kiszka
On 2012-08-13 15:16, Michael Tokarev wrote:
> On 13.08.2012 17:07, Jan Kiszka wrote:
> []
>>> The bisecion leads to this commit:
>>>
>>> commit 17ee47418e65b1593defb30edbab33ccd47fc1f8
>>> Merge: 13b0496 5d17c0d
>>> Author: Jan Kiszka 
>>> Date:   Tue Apr 10 16:26:23 2012 +0200
>>>
>>> Merge commit '5d17c0d2df4998598e6002b27b8e47e792899a0f' into 
>>> queues/qemu-merge
> []
>>> Cc'ing Jan for help.  The short story: tsc timer calibration
>>> broke in 1.1+ with in-kernel irqchip (only) for several
>>> apps (seabios and grub are two examples), the time is ticking
>>> about 100 times faster.  In grub the timer is calibrated
>>> using pit.  The above commit is the result of bisection.
>>
>> Did the versions you tested include the commit 0cdd3d1444 (kvm: i8254:
>> Fix conversion of in-kernel to userspace state)?
> 
> While bisecting I didn't have this commit applied, since it were
> applied past (qemu)-1.1.  It is included into qemu-kvm 1.1.0
> (as 960d355dc60d9), and that version behaves _exactly_ the same -
> the time in grub is ticking 100 times faster.  I mentioned in this
> thread that the problem persists in current qemu (and qemu-kvm)
> git too.
> 
> I can repeat the bisection with this commit applied after the
> above "bad" commit.  Should I?

Don't think this will make a difference. There is some other issue. I
reproduced the bug, will see if I can analyze it.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Avi Kivity
On 08/13/2012 04:27 PM, Anthony Liguori wrote:

> Thanks for pushing this forward!  Hopefully this will finally kill off
> qemu-kvm.git for good.

No, it won't.  vfio requires a 3.6 kernel, which we cannot assume anyone
has.  We'll need the original device assignment code side-by-side.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Jan Kiszka
On 2012-08-13 15:58, Avi Kivity wrote:
> On 08/13/2012 04:27 PM, Anthony Liguori wrote:
> 
>> Thanks for pushing this forward!  Hopefully this will finally kill off
>> qemu-kvm.git for good.
> 
> No, it won't.  vfio requires a 3.6 kernel, which we cannot assume anyone
> has.  We'll need the original device assignment code side-by-side.

...which is on my to-do list for 1.3.

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Alex Williamson
On Mon, 2012-08-13 at 08:27 -0500, Anthony Liguori wrote:
> Alex Williamson  writes:
> 
> > VFIO kernel support was just merged into Linux, so I'd like to
> > formally propose inclusion of the QEMU vfio-pci driver for
> > QEMU 1.2.  Included here is support for x86 PCI device assignment.
> > PCI INTx is not yet enabled, but devices making use of either MSI
> > or MSI-X work.  The level irqfd and eoifd support I've proposed
> > for KVM enable an accelerated patch for this through KVM.  I'd
> > like to get this base driver in first and enable the remaining
> > support in-tree.
> >
> > I've split this version up a little from the RFC to make it a bit
> > easier to review.  Review comments from Blue Swirl and Avi are
> > already incorporated, including Avi's requests to simplify both
> > the PCI BAR mapping and unmapping paths.
> 
> Hi Alex,
> 
> Thanks for pushing this forward!  Hopefully this will finally kill off
> qemu-kvm.git for good.
> 
> I think this series is going to have to wait for 1.3 to open up.  We
> have a very short release window for this release and I'd feel a lot
> more comfortable having such a significant feature spend some time in
> the development cycle getting testing/review.
> 
> I'd like to see a few Reviewed-by's too for this series before it goes
> in.  I expect they won't be hard to get but I also expect it will take a
> few more revisions of this series to get there.

That's disappointing, but I can understand your reluctance.  Blue Swirl
reviewed the RFC and could perhaps add a Reviewed-by.  Alexey has been
working on the POWER port and I'm sure could provide a Reviewed-by.  We
also have a few early adopters that are already making use of this code.
Towards accepting it, the driver is entirely self contained, there's
really no risk to the rest of qemu.  The only missing functionality is
legacy interrupt support.  Perhaps there's a compromise where this
driver could be considered a tech preview in 1.2 (x-vfio-pci?).  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH repost] kvm: drop parameter validation

2012-08-13 Thread Gleb Natapov
On Mon, Aug 13, 2012 at 01:43:58PM +0300, Michael S. Tsirkin wrote:
> We validate irq pin number when routing is setup, so
> code handling illegal irq # in pic and ioapic on each injection
> is never called.
> Drop it.
> 
I would leave BUG_ON there for a while.

> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> Reposting, applies without changes to kvm/next.
> 
>  arch/x86/kvm/i8259.c | 16 +++-
>  virt/kvm/ioapic.c| 35 +--
>  2 files changed, 24 insertions(+), 27 deletions(-)
> 
> diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
> index 1df8fb9..277ec0d 100644
> --- a/arch/x86/kvm/i8259.c
> +++ b/arch/x86/kvm/i8259.c
> @@ -190,17 +190,15 @@ void kvm_pic_update_irq(struct kvm_pic *s)
>  
>  int kvm_pic_set_irq(struct kvm_pic *s, int irq, int irq_source_id, int level)
>  {
> - int ret = -1;
> + int ret, irq_level;
>  
>   pic_lock(s);
> - if (irq >= 0 && irq < PIC_NUM_PINS) {
> - int irq_level = __kvm_irq_line_state(&s->irq_states[irq],
> -  irq_source_id, level);
> - ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, irq_level);
> - pic_update_irq(s);
> - trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr,
> -   s->pics[irq >> 3].imr, ret == 0);
> - }
> + irq_level = __kvm_irq_line_state(&s->irq_states[irq],
> +  irq_source_id, level);
> + ret = pic_set_irq1(&s->pics[irq >> 3], irq & 7, irq_level);
> + pic_update_irq(s);
> + trace_kvm_pic_set_irq(irq >> 3, irq & 7, s->pics[irq >> 3].elcr,
> +   s->pics[irq >> 3].imr, ret == 0);
>   pic_unlock(s);
>  
>   return ret;
> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
> index ef61d52..4d824c7 100644
> --- a/virt/kvm/ioapic.c
> +++ b/virt/kvm/ioapic.c
> @@ -197,28 +197,27 @@ int kvm_ioapic_set_irq(struct kvm_ioapic *ioapic, int 
> irq, int irq_source_id,
>   u32 old_irr;
>   u32 mask = 1 << irq;
>   union kvm_ioapic_redirect_entry entry;
> - int ret = 1;
> + int ret, irq_level;
>  
>   spin_lock(&ioapic->lock);
>   old_irr = ioapic->irr;
> - if (irq >= 0 && irq < IOAPIC_NUM_PINS) {
> - int irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
> -  irq_source_id, level);
> - entry = ioapic->redirtbl[irq];
> - irq_level ^= entry.fields.polarity;
> - if (!irq_level)
> - ioapic->irr &= ~mask;
> - else {
> - int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
> - ioapic->irr |= mask;
> - if ((edge && old_irr != ioapic->irr) ||
> - (!edge && !entry.fields.remote_irr))
> - ret = ioapic_service(ioapic, irq);
> - else
> - ret = 0; /* report coalesced interrupt */
> - }
> - trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
> + irq_level = __kvm_irq_line_state(&ioapic->irq_states[irq],
> +  irq_source_id, level);
> + entry = ioapic->redirtbl[irq];
> + irq_level ^= entry.fields.polarity;
> + if (!irq_level) {
> + ioapic->irr &= ~mask;
> + ret = 1;
> + } else {
> + int edge = (entry.fields.trig_mode == IOAPIC_EDGE_TRIG);
> + ioapic->irr |= mask;
> + if ((edge && old_irr != ioapic->irr) ||
> + (!edge && !entry.fields.remote_irr))
> + ret = ioapic_service(ioapic, irq);
> + else
> + ret = 0; /* report coalesced interrupt */
>   }
> + trace_kvm_ioapic_set_irq(entry.bits, irq, ret == 0);
>   spin_unlock(&ioapic->lock);
>  
>   return ret;
> -- 
> MST

--
Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv5 0/4] improve speed of "rep ins" emulation

2012-08-13 Thread Richard W.M. Jones
On Mon, Jul 30, 2012 at 05:38:17PM +0300, Gleb Natapov wrote:
> And now for something completely different.
> 
> So this series (or rather the last patch of it) takes different approach
> to "rep ins" optimization. Instead of writing separate fast path for
> it do the fast path inside emulator itself. This way nobody can say the
> code is not reused!
> 
> Patch 1,2 are now, strictly speaking, not needed, but I think this is still
> nice cleanup and, in case of patch 1, eliminates some ifs at each KVM_RUN 
> ioctl.
>  
> Gleb Natapov (4):
>   Provide userspace IO exit completion callback.
>   KVM: emulator: make x86 emulation modes enum instead of defines
>   KVM: emulator: string_addr_inc() cleanup
>   KVM: emulator: optimize "rep ins" handling.
> 
>  arch/x86/include/asm/kvm_emulate.h |   26 +-
>  arch/x86/include/asm/kvm_host.h|1 +
>  arch/x86/kvm/emulate.c |   48 ++-
>  arch/x86/kvm/x86.c |   92 
> +---
>  4 files changed, 104 insertions(+), 63 deletions(-)

I tested this patch series and it really helped to improve the
performance of loading the libguestfs -kernel and -initrd:

  https://bugzilla.redhat.com/show_bug.cgi?id=847546

Rich.

-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-p2v converts physical machines to virtual machines.  Boot with a
live CD or over the network (PXE) and turn machines into Xen guests.
http://et.redhat.com/~rjones/virt-p2v
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 16630] New: Intermittently fails to start VM.

2012-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=16630

   Summary: Intermittently fails to start VM.
   Product: Virtualization
   Version: unspecified
Kernel Version: 2.6.18-164.el5
  Platform: All
OS/Version: Linux
  Tree: Mainline
Status: RESOLVED
  Severity: high
  Priority: P1
 Component: kvm
AssignedTo: virtualization_...@kernel-bugs.osdl.org
ReportedBy: lokesh.dube...@gmail.com
CC: a...@lxorguk.ukuu.org.uk
Regression: No



Alan  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||a...@lxorguk.ukuu.org.uk
 Resolution||INVALID


command used to start VM:
virsh start vhost0088

std out: 

error: Failed to start domain vhost0088 
error: internal error unable to start guest: char device redirected to
/dev/pts/10.

Description:

Actually I have automated the provisioning and when it fails to start the vm it
removes the vm.img and vm.xml file. So there's no chance of trying a reboot
again.
Grepping logs is also not easy as the error is too intermittent.

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 16631] Intermittently fails to start VM.

2012-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=16631


Alan  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||a...@lxorguk.ukuu.org.uk
 Resolution||INVALID




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 17591] kvm hangs when resuming guest (deadlock?)

2012-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=17591


Alan  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||a...@lxorguk.ukuu.org.uk
 Resolution||OBSOLETE




--- Comment #8 from Alan   2012-08-13 16:08:51 ---
Closing as obsolete, please re-open if seen versus a modern kernel

-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Andreas Hartmann
Alex Williamson wrote:
> On Mon, 2012-08-13 at 08:27 -0500, Anthony Liguori wrote:
>> Alex Williamson  writes:
>>
>>> VFIO kernel support was just merged into Linux, so I'd like to
>>> formally propose inclusion of the QEMU vfio-pci driver for
>>> QEMU 1.2.  Included here is support for x86 PCI device assignment.
>>> PCI INTx is not yet enabled, but devices making use of either MSI
>>> or MSI-X work.  The level irqfd and eoifd support I've proposed
>>> for KVM enable an accelerated patch for this through KVM.  I'd
>>> like to get this base driver in first and enable the remaining
>>> support in-tree.
>>>
>>> I've split this version up a little from the RFC to make it a bit
>>> easier to review.  Review comments from Blue Swirl and Avi are
>>> already incorporated, including Avi's requests to simplify both
>>> the PCI BAR mapping and unmapping paths.
>>
>> Hi Alex,
>>
>> Thanks for pushing this forward!  Hopefully this will finally kill
>> off qemu-kvm.git for good.
>>
>> I think this series is going to have to wait for 1.3 to open up.  We
>> have a very short release window for this release and I'd feel a lot
>> more comfortable having such a significant feature spend some time in
>> the development cycle getting testing/review.
>>
>> I'd like to see a few Reviewed-by's too for this series before it
>> goes in.  I expect they won't be hard to get but I also expect it
>> will take a few more revisions of this series to get there.
> 
> That's disappointing, but I can understand your reluctance.  Blue
> Swirl reviewed the RFC and could perhaps add a Reviewed-by.  Alexey
> has been working on the POWER port and I'm sure could provide a
> Reviewed-by.  We also have a few early adopters that are already
> making use of this code.

I'm running qemu with vfio patch since Jun 05, 2012
(awilliam-qemu-vfio-v0.14.0-rc0-6402-g323cf9f.tar.gz). I didn't
encounter any problem so far.

If you like, I could compile a more actual version, too (if there have
been any changes).

To see more about my use case:
http://permalink.gmane.org/gmane.linux.drivers.rt2x00.user/1051

You may add a Tested-by Andreas Hartmann 
if you like.

Unfortunately, I'm only running the vfio VM (kvm) with this version of
qemu, but I'm running parallel 4 other VM's with the unchanged version
of qemu (kvm-0.15.0-123.2.x86_64), too.
One of these 4 VM's uses PCIe passthrough.

I now tried to run all VMs with the new version of qemu. At this point,
I unfortunately run into a problem with the VM which passes through a
PCIe device. The error message is (during start of VM):

virsh start VM
error: Failed to start domain VM
error: internal error process exited while connecting to monitor: 
qemu-system-x86_64: -device 
pci-assign,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x6: Parameter 'driver' 
expects device type

The xml file for libvirt looks like this: 


  VM
  ----
  1048576
  262144
  1
  
hvm

  
  


  
  
  destroy
  restart
  destroy
  
/usr/local/bin/qemu-system-x86_64


  

  



  
  
  





  

  



Maybe, this is fixed in a newer version of qemu for vfio?


Thanks,
kind regards,
Andreas Hartmann
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Alex Williamson
On Mon, 2012-08-13 at 17:48 +0200, Andreas Hartmann wrote:
> Alex Williamson wrote:
> > On Mon, 2012-08-13 at 08:27 -0500, Anthony Liguori wrote:
> >> Alex Williamson  writes:
> >>
> >>> VFIO kernel support was just merged into Linux, so I'd like to
> >>> formally propose inclusion of the QEMU vfio-pci driver for
> >>> QEMU 1.2.  Included here is support for x86 PCI device assignment.
> >>> PCI INTx is not yet enabled, but devices making use of either MSI
> >>> or MSI-X work.  The level irqfd and eoifd support I've proposed
> >>> for KVM enable an accelerated patch for this through KVM.  I'd
> >>> like to get this base driver in first and enable the remaining
> >>> support in-tree.
> >>>
> >>> I've split this version up a little from the RFC to make it a bit
> >>> easier to review.  Review comments from Blue Swirl and Avi are
> >>> already incorporated, including Avi's requests to simplify both
> >>> the PCI BAR mapping and unmapping paths.
> >>
> >> Hi Alex,
> >>
> >> Thanks for pushing this forward!  Hopefully this will finally kill
> >> off qemu-kvm.git for good.
> >>
> >> I think this series is going to have to wait for 1.3 to open up.  We
> >> have a very short release window for this release and I'd feel a lot
> >> more comfortable having such a significant feature spend some time in
> >> the development cycle getting testing/review.
> >>
> >> I'd like to see a few Reviewed-by's too for this series before it
> >> goes in.  I expect they won't be hard to get but I also expect it
> >> will take a few more revisions of this series to get there.
> > 
> > That's disappointing, but I can understand your reluctance.  Blue
> > Swirl reviewed the RFC and could perhaps add a Reviewed-by.  Alexey
> > has been working on the POWER port and I'm sure could provide a
> > Reviewed-by.  We also have a few early adopters that are already
> > making use of this code.
> 
> I'm running qemu with vfio patch since Jun 05, 2012
> (awilliam-qemu-vfio-v0.14.0-rc0-6402-g323cf9f.tar.gz). I didn't
> encounter any problem so far.
> 
> If you like, I could compile a more actual version, too (if there have
> been any changes).

The only change in the version proposed for qemu is that legacy
interrupt support has been removed until we can agree on interfaces in
kvm and plumb an EOI path through qemu.  IIRC, the devices you're using
require legacy interrupt support.

> To see more about my use case:
> http://permalink.gmane.org/gmane.linux.drivers.rt2x00.user/1051
> 
> You may add a Tested-by Andreas Hartmann 
> if you like.
> 
> Unfortunately, I'm only running the vfio VM (kvm) with this version of
> qemu, but I'm running parallel 4 other VM's with the unchanged version
> of qemu (kvm-0.15.0-123.2.x86_64), too.
> One of these 4 VM's uses PCIe passthrough.
> 
> I now tried to run all VMs with the new version of qemu. At this point,
> I unfortunately run into a problem with the VM which passes through a
> PCIe device. The error message is (during start of VM):
> 
> virsh start VM
> error: Failed to start domain VM
> error: internal error process exited while connecting to monitor: 
> qemu-system-x86_64: -device 
> pci-assign,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x6: Parameter 'driver' 
> expects device type
> 
> The xml file for libvirt looks like this: 

libvirt doesn't yet support vfio-pci and current qemu.git doesn't yet
support pci-assign.  To use libvirt for device assignment, your only
option right now is to use a qemu-kvm.git based version of qemu.
Thanks,

Alex

> 
>   VM
>   ----
>   1048576
>   262144
>   1
>   
> hvm
> 
>   
>   
> 
> 
>   
>   
>   destroy
>   restart
>   destroy
>   
> /usr/local/bin/qemu-system-x86_64
> 
> 
>   
> 
>   
> 
> 
> 
>   
>   
>   
> 
> 
> 
> 
> 
>   
> 
>   
> 
> 
> 
> Maybe, this is fixed in a newer version of qemu for vfio?
> 
> 
> Thanks,
> kind regards,
> Andreas Hartmann



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Bug 17902] qemu/kvm seem to cause warning or panic

2012-08-13 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=17902


Alan  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 CC||a...@lxorguk.ukuu.org.uk
 Resolution||CODE_FIX




-- 
Configure bugmail: https://bugzilla.kernel.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are watching the assignee of the bug.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/5] KVM: PPC: Book3S HV: Handle memory slot deletion and modification correctly

2012-08-13 Thread Marcelo Tosatti
On Sat, Aug 11, 2012 at 10:37:54AM +1000, Paul Mackerras wrote:
> On Fri, Aug 10, 2012 at 03:35:53PM -0300, Marcelo Tosatti wrote:
> > There's no plan. I just wanted to confirm this before converting 
> > to per-memslot flush.
> > 
> > 1) __kvm_set_memory_region line 807:
> > 
> >  *  - gfn_to_hva (kvm_read_guest, gfn_to_pfn)
> >  *  - kvm_is_visible_gfn (mmu_check_roots)
> >  */
> > kvm_arch_flush_shadow(kvm);
> > kfree(old_memslots);
> > }
> > 
> > This can be converted to per-memslot flush.
> 
> Yes, that would be good.
> 
> > 2) __kvm_set_memory_region line 846:
> > 
> > /*
> >  * If the new memory slot is created, we need to clear all
> >  * mmio sptes.
> >  */
> > if (npages && old.base_gfn != mem->guest_phys_addr >> PAGE_SHIFT)
> > kvm_arch_flush_shadow(kvm);
> > 
> > This must flush all translations in the new and old GPA ranges:
> > a) to remove any mmio sptes that existed in the new GPA range
> >(see ce88decffd17bf9f373cc233c for a description).
> 
> I assume that mmio sptes are ones that are created for accesses to
> guest physical addresses where there is no memslot.  On Book3S HV we
> trap those accesses (on POWER7 at least) but we don't create any
> hardware PTEs for them.  So I don't have anything to do here.
> 
> > b) to remove any translations in the old range (this is
> >necessary because change of GPA base is allowed).
> 
> In fact I need to remove any translations in the old range *before*
> the new memslot gets committed, whereas this happens after that.
> This is why I was doing the flush in kvm_arch_prepare_memory_region.

I see. The problem with that is, given that the fault path (reader 
of memslots) is protected only by SRCU, the following window is open:

A) kvm_arch_prepare_memory_region
B) rcu_assign_pointer(kvm->memslots, slots)
C) kvm_arch_commit_memory_region

The window between A and B where a guest pagefault can instantiate a new
translation using the old memslot information (while you already removed
any translations in the old range).


Avi, Gleb, Alex, do you know why it is necessary to support change of
GPA base again? 

Without taking into consideration backwards compatibility, userspace 
can first delete the slot and later create a new one.

> If the new memslot got committed while there were still some
> translations left in the hardware page table, it's possible that the
> guest could ask to remove one of those hardware PTEs.  As part of that
> process, I get the GPA that the HPTE referred to, look up which
> memslot it belongs to, and use the offset from the base_gfn of the
> memslot to index into the memslot's rmap array.  If the base_gfn has
> been changed then I will hit the wrong rmap entry, or possibly not
> find the memslot at all, and bad things will happen.

So, that notification before-commit-of-new-memslot is only necessary for
base change, correct? That is, you don't need that pre notification for
entirely new memslots in a previously unpopulated range.

> Basically, the rmap array has to be empty before we commit the new
> memslot.
> 
> > Alex/Paul, slot creation should be rare (and modification of GPA base
> > should not be used, even though it is supported). But if you really need
> > a new callback, the two points above are what the second call needs (x86
> > will flush everything).
> > 
> > The other two kvm_arch_flush_shadow in kvm_mmu_notifier_release and
> > kvm_destroy_vm must remove all sptes obviously.
> 
> Will these be the only remaining call sites for kvm_arch_flush_shadow?

Yes, should be. OK, lets first get the callbacks in
kvm_set_memory_region right and later you can restrict
kvm_arch_flush_shadow as necessary.

> I can see an optimization here if no vcpus are running (which should
> be the case) -- I can set a flag to say that the hardware page table
> is completely invalid, and if any vcpu ever does try to run again,
> clear out the hardware page table and flush all TLBs at that point.
> 
> Paul.
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majord...@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Andreas Hartmann
Alex Williamson schrieb:
> On Mon, 2012-08-13 at 17:48 +0200, Andreas Hartmann wrote:
>> Alex Williamson wrote:
>>> On Mon, 2012-08-13 at 08:27 -0500, Anthony Liguori wrote:
 Alex Williamson  writes:

> VFIO kernel support was just merged into Linux, so I'd like to
> formally propose inclusion of the QEMU vfio-pci driver for
> QEMU 1.2.  Included here is support for x86 PCI device assignment.
> PCI INTx is not yet enabled, but devices making use of either MSI
> or MSI-X work.  The level irqfd and eoifd support I've proposed
> for KVM enable an accelerated patch for this through KVM.  I'd
> like to get this base driver in first and enable the remaining
> support in-tree.
>
> I've split this version up a little from the RFC to make it a bit
> easier to review.  Review comments from Blue Swirl and Avi are
> already incorporated, including Avi's requests to simplify both
> the PCI BAR mapping and unmapping paths.

 Hi Alex,

 Thanks for pushing this forward!  Hopefully this will finally kill
 off qemu-kvm.git for good.

 I think this series is going to have to wait for 1.3 to open up.  We
 have a very short release window for this release and I'd feel a lot
 more comfortable having such a significant feature spend some time in
 the development cycle getting testing/review.

 I'd like to see a few Reviewed-by's too for this series before it
 goes in.  I expect they won't be hard to get but I also expect it
 will take a few more revisions of this series to get there.
>>>
>>> That's disappointing, but I can understand your reluctance.  Blue
>>> Swirl reviewed the RFC and could perhaps add a Reviewed-by.  Alexey
>>> has been working on the POWER port and I'm sure could provide a
>>> Reviewed-by.  We also have a few early adopters that are already
>>> making use of this code.
>>
>> I'm running qemu with vfio patch since Jun 05, 2012
>> (awilliam-qemu-vfio-v0.14.0-rc0-6402-g323cf9f.tar.gz). I didn't
>> encounter any problem so far.
>>
>> If you like, I could compile a more actual version, too (if there have
>> been any changes).
> 
> The only change in the version proposed for qemu is that legacy
> interrupt support has been removed until we can agree on interfaces in
> kvm and plumb an EOI path through qemu.  IIRC, the devices you're using
> require legacy interrupt support.
> 
>> To see more about my use case:
>> http://permalink.gmane.org/gmane.linux.drivers.rt2x00.user/1051
>>
>> You may add a Tested-by Andreas Hartmann 
>> if you like.
>>
>> Unfortunately, I'm only running the vfio VM (kvm) with this version of
>> qemu, but I'm running parallel 4 other VM's with the unchanged version
>> of qemu (kvm-0.15.0-123.2.x86_64), too.
>> One of these 4 VM's uses PCIe passthrough.
>>
>> I now tried to run all VMs with the new version of qemu. At this point,
>> I unfortunately run into a problem with the VM which passes through a
>> PCIe device. The error message is (during start of VM):
>>
>> virsh start VM
>> error: Failed to start domain VM
>> error: internal error process exited while connecting to monitor: 
>> qemu-system-x86_64: -device 
>> pci-assign,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x6: Parameter 'driver' 
>> expects device type
>>
>> The xml file for libvirt looks like this: 
> 
> libvirt doesn't yet support vfio-pci and current qemu.git doesn't yet
> support pci-assign.  To use libvirt for device assignment, your only
> option right now is to use a qemu-kvm.git based version of qemu.

If I'm using your qemu instead of qemu from kvm-0.15 (opensuse package),
this error comes up when passing through a PCIe device, which works
absolutely fine with kvm 0.15. I would have expected, that your qemu
works with the legacy way of handling pcie passthrough, too (with
pci-stub module).

This would mean, that all users get errors if they use the traditional
way. IOW: there are changes needed (which?) to move from kvm 0.15 to
your qemu version.

> 
>> 
>>   VM
>>   ----
>>   1048576
>>   262144
>>   1
>>   
>> hvm
>> 
>>   
>>   
>> 
>> 
>>   
>>   
>>   destroy
>>   restart
>>   destroy
>>   
>> /usr/local/bin/qemu-system-x86_64
>>
>> 
>>   
>> 
>>   
>> 
>>
>> 
>>   
>>   
>>   
>> 
>>
>> 
>> 
>> 
>>   
>> 
>>   
>> 
>>
>>
>> Maybe, this is fixed in a newer version of qemu for vfio?

Kind regards,
Andreas


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-08-13 Thread Alex Williamson
On Sun, 2012-08-12 at 10:49 +0300, Michael S. Tsirkin wrote:
> On Wed, Aug 01, 2012 at 01:06:42PM -0600, Alex Williamson wrote:
> > On Mon, 2012-07-30 at 19:12 -0600, Alex Williamson wrote:
> > > On Tue, 2012-07-31 at 03:36 +0300, Michael S. Tsirkin wrote:
> > > > On Mon, Jul 30, 2012 at 06:26:31PM -0600, Alex Williamson wrote:
> > > > > On Tue, 2012-07-31 at 03:01 +0300, Michael S. Tsirkin wrote:
> > > > > > You keep saying this but it is still true: once irqfd
> > > > > > is closed eoifd does not get any more interrupts.
> > > > > 
> > > > > How does that matter?
> > > > 
> > > > Well if it does not get events it is disabled.
> > > > so you have one ifc disabling another, anyway.
> > > 
> > > And a level irqfd without an eoifd can never be de-asserted.  Either we
> > > make modular components, assemble them to do useful work, and
> > > disassemble them independently so they can be used by future interfaces
> > > or we bundle eoifd as just an option of irqfd.  Which is it gonna be?
> > 
> > I don't think I've been successful at explaining my reasoning for making
> > EOI notification a separate interface, so let me try again...
> > 
> > When kvm is not enabled, the qemu vfio driver still needs to know about
> > EOIs to re-enable the physical interrupt.  Since the ioapic is emulated
> > in qemu, we can setup a notifier for this and create abstraction to make
> > it non-x86 specific, etc.  We just need to come up with a design and
> > implement it.  But what happens when kvm is then enabled?  ioapic
> > emulation moves to the kernel (assume kvm includes irqchip for this
> > argument even though it doesn't for POWER), qemu no longer knows about
> > EOIs, and the interface we just created to handle the non-kvm case stops
> > working.  Is anyone going to accept adding a qemu EOI notification
> > interface that only works when kvm is not enabled?
> 
> Yes, it's only a question of abstracting it at the right level.
> 
> For example, if as you suggest below kvm gives you an eventfd that
> asserts an irq, laters automatically deasserts it and notifies another
> eventfd, we can do exactly this in both tcg and kvm:
> 
> setup_level_irq(int gsi, int assert_eventfd, int deassert_eventfd)
> 
> Not advocating this interface but pointing out that to make
> same abstraction to work in tcg and kvm, see what it does in
> each of them first.

The tcg model I was thinking of is that we continue to use qemu_set_irq
to assert and de-assert the interrupt and add an eoi/ack notification
mechanism, much like the ack notifier that already exists in kvm.  There
doesn't seem to be much advantage to creating a new interrupt
infrastructure in tcg that can trigger interrupts by eventfds, so I
assume VFIO is always going to be responsible for the translation of an
eventfd to an irq assertion, get some kind of notification through qemu,
de-assert the interrupt and unmask the device.  With that model in mind,
perhaps it makes more sense why I've been keeping the eoi/ack separate
from irqfd.

> > I suspect we therefore need a notification mechanism between kvm and
> > qemu to make it possible for that interface to continue working.
> 
> Even though no one is actually using it. IMHO, this is a maintainance
> problem.

That's why I'm designing it the way I am.  VFIO will make use of it.  It
will just be using the de-assert and notify mode vs a notify-only mode
that tcg would use.  It would also be easy to add an option to vfio so
that we could fully test both modes.

> > An
> > eventfd also seems like the right mechanism there.  A simple
> > modification to the proposed KVM_EOIFD here would allow it to trigger an
> > eventfd when an EOI is written to a specific gsi on
> > KVM_USERSPACE_IRQ_SOURCE_ID (define a flag and pass gsi in place of
> > key).
> > 
> > The split proposed here does require some assembly, but KVM_EOIFD is
> > re-usable as either a de-assert and notify mechanism tied to an irqfd or
> > a notify-only mechanism allowing users of a qemu EOI notification
> > infrastructure to continue working.  vfio doesn't necessarily need this
> > middle ground, but can easily be used to test it.
> > 
> > The alternative is that we pull eoifd into KVM_IRQFD and invent some
> > other new EOI interface for qemu.  That means we get EOIs tied to an
> > irqfd via one path and other EOIs via another ioctl.  Personally that
> > seems less desirable, but I'm willing to explore that route if
> > necessary.  Thanks,
> > 
> > Alex
> 
> Maybe we should focus on the fact that we notify userspace that we
> deasserted interrupt instead of EOI.

But will a tcg user want the de-assert?  I assume not.  The de-assert is
an optimization to allow us to bypass evaluation in userspace.  In tcg
we're already there.  Thanks,

Alex

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Alex Williamson
On Mon, 2012-08-13 at 18:36 +0200, Andreas Hartmann wrote:
> Alex Williamson schrieb:
> > On Mon, 2012-08-13 at 17:48 +0200, Andreas Hartmann wrote:
> >> Alex Williamson wrote:
> >>> On Mon, 2012-08-13 at 08:27 -0500, Anthony Liguori wrote:
>  Alex Williamson  writes:
> 
> > VFIO kernel support was just merged into Linux, so I'd like to
> > formally propose inclusion of the QEMU vfio-pci driver for
> > QEMU 1.2.  Included here is support for x86 PCI device assignment.
> > PCI INTx is not yet enabled, but devices making use of either MSI
> > or MSI-X work.  The level irqfd and eoifd support I've proposed
> > for KVM enable an accelerated patch for this through KVM.  I'd
> > like to get this base driver in first and enable the remaining
> > support in-tree.
> >
> > I've split this version up a little from the RFC to make it a bit
> > easier to review.  Review comments from Blue Swirl and Avi are
> > already incorporated, including Avi's requests to simplify both
> > the PCI BAR mapping and unmapping paths.
> 
>  Hi Alex,
> 
>  Thanks for pushing this forward!  Hopefully this will finally kill
>  off qemu-kvm.git for good.
> 
>  I think this series is going to have to wait for 1.3 to open up.  We
>  have a very short release window for this release and I'd feel a lot
>  more comfortable having such a significant feature spend some time in
>  the development cycle getting testing/review.
> 
>  I'd like to see a few Reviewed-by's too for this series before it
>  goes in.  I expect they won't be hard to get but I also expect it
>  will take a few more revisions of this series to get there.
> >>>
> >>> That's disappointing, but I can understand your reluctance.  Blue
> >>> Swirl reviewed the RFC and could perhaps add a Reviewed-by.  Alexey
> >>> has been working on the POWER port and I'm sure could provide a
> >>> Reviewed-by.  We also have a few early adopters that are already
> >>> making use of this code.
> >>
> >> I'm running qemu with vfio patch since Jun 05, 2012
> >> (awilliam-qemu-vfio-v0.14.0-rc0-6402-g323cf9f.tar.gz). I didn't
> >> encounter any problem so far.
> >>
> >> If you like, I could compile a more actual version, too (if there have
> >> been any changes).
> > 
> > The only change in the version proposed for qemu is that legacy
> > interrupt support has been removed until we can agree on interfaces in
> > kvm and plumb an EOI path through qemu.  IIRC, the devices you're using
> > require legacy interrupt support.
> > 
> >> To see more about my use case:
> >> http://permalink.gmane.org/gmane.linux.drivers.rt2x00.user/1051
> >>
> >> You may add a Tested-by Andreas Hartmann 
> >> if you like.
> >>
> >> Unfortunately, I'm only running the vfio VM (kvm) with this version of
> >> qemu, but I'm running parallel 4 other VM's with the unchanged version
> >> of qemu (kvm-0.15.0-123.2.x86_64), too.
> >> One of these 4 VM's uses PCIe passthrough.
> >>
> >> I now tried to run all VMs with the new version of qemu. At this point,
> >> I unfortunately run into a problem with the VM which passes through a
> >> PCIe device. The error message is (during start of VM):
> >>
> >> virsh start VM
> >> error: Failed to start domain VM
> >> error: internal error process exited while connecting to monitor: 
> >> qemu-system-x86_64: -device 
> >> pci-assign,host=04:00.0,id=hostdev0,bus=pci.0,addr=0x6: Parameter 'driver' 
> >> expects device type
> >>
> >> The xml file for libvirt looks like this: 
> > 
> > libvirt doesn't yet support vfio-pci and current qemu.git doesn't yet
> > support pci-assign.  To use libvirt for device assignment, your only
> > option right now is to use a qemu-kvm.git based version of qemu.
> 
> If I'm using your qemu instead of qemu from kvm-0.15 (opensuse package),
> this error comes up when passing through a PCIe device, which works
> absolutely fine with kvm 0.15. I would have expected, that your qemu
> works with the legacy way of handling pcie passthrough, too (with
> pci-stub module).

VFIO cannot work with pci-stub, the backends are fundamentally
different.  KVM making use pci-stub to hold onto a device is actually
one of the design problems that VFIO is meant to correct.  The other
significant interface change is use of IOMMU groups, which is actually
why VFIO works for some of your uses while pci-assign does not.

> This would mean, that all users get errors if they use the traditional
> way. IOW: there are changes needed (which?) to move from kvm 0.15 to
> your qemu version.

But the way we solve this is to make libvirt understand how to do both.
Then it can probe the qemu/kvm binary and host system to figure out
which is supported and use the correct device options based on what it
finds.  Trying to do both via the same qemu command line doesn't make
sense to me, especially when the device setup is so different.  Thanks,

Alex

--
To unsubscribe from this list: send the lin

Re: [PATCH v7 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 10:48:15AM -0600, Alex Williamson wrote:
> On Sun, 2012-08-12 at 10:49 +0300, Michael S. Tsirkin wrote:
> > On Wed, Aug 01, 2012 at 01:06:42PM -0600, Alex Williamson wrote:
> > > On Mon, 2012-07-30 at 19:12 -0600, Alex Williamson wrote:
> > > > On Tue, 2012-07-31 at 03:36 +0300, Michael S. Tsirkin wrote:
> > > > > On Mon, Jul 30, 2012 at 06:26:31PM -0600, Alex Williamson wrote:
> > > > > > On Tue, 2012-07-31 at 03:01 +0300, Michael S. Tsirkin wrote:
> > > > > > > You keep saying this but it is still true: once irqfd
> > > > > > > is closed eoifd does not get any more interrupts.
> > > > > > 
> > > > > > How does that matter?
> > > > > 
> > > > > Well if it does not get events it is disabled.
> > > > > so you have one ifc disabling another, anyway.
> > > > 
> > > > And a level irqfd without an eoifd can never be de-asserted.  Either we
> > > > make modular components, assemble them to do useful work, and
> > > > disassemble them independently so they can be used by future interfaces
> > > > or we bundle eoifd as just an option of irqfd.  Which is it gonna be?
> > > 
> > > I don't think I've been successful at explaining my reasoning for making
> > > EOI notification a separate interface, so let me try again...
> > > 
> > > When kvm is not enabled, the qemu vfio driver still needs to know about
> > > EOIs to re-enable the physical interrupt.  Since the ioapic is emulated
> > > in qemu, we can setup a notifier for this and create abstraction to make
> > > it non-x86 specific, etc.  We just need to come up with a design and
> > > implement it.  But what happens when kvm is then enabled?  ioapic
> > > emulation moves to the kernel (assume kvm includes irqchip for this
> > > argument even though it doesn't for POWER), qemu no longer knows about
> > > EOIs, and the interface we just created to handle the non-kvm case stops
> > > working.  Is anyone going to accept adding a qemu EOI notification
> > > interface that only works when kvm is not enabled?
> > 
> > Yes, it's only a question of abstracting it at the right level.
> > 
> > For example, if as you suggest below kvm gives you an eventfd that
> > asserts an irq, laters automatically deasserts it and notifies another
> > eventfd, we can do exactly this in both tcg and kvm:
> > 
> > setup_level_irq(int gsi, int assert_eventfd, int deassert_eventfd)
> > 
> > Not advocating this interface but pointing out that to make
> > same abstraction to work in tcg and kvm, see what it does in
> > each of them first.
> 
> The tcg model I was thinking of is that we continue to use qemu_set_irq
> to assert and de-assert the interrupt and add an eoi/ack notification
> mechanism, much like the ack notifier that already exists in kvm.  There
> doesn't seem to be much advantage to creating a new interrupt
> infrastructure in tcg that can trigger interrupts by eventfds, so I
> assume VFIO is always going to be responsible for the translation of an
> eventfd to an irq assertion, get some kind of notification through qemu,
> de-assert the interrupt and unmask the device.  With that model in mind,
> perhaps it makes more sense why I've been keeping the eoi/ack separate
> from irqfd.
> 
> > > I suspect we therefore need a notification mechanism between kvm and
> > > qemu to make it possible for that interface to continue working.
> > 
> > Even though no one is actually using it. IMHO, this is a maintainance
> > problem.
> 
> That's why I'm designing it the way I am.  VFIO will make use of it.  It
> will just be using the de-assert and notify mode vs a notify-only mode
> that tcg would use.  It would also be easy to add an option to vfio so
> that we could fully test both modes.
> 
> > > An
> > > eventfd also seems like the right mechanism there.  A simple
> > > modification to the proposed KVM_EOIFD here would allow it to trigger an
> > > eventfd when an EOI is written to a specific gsi on
> > > KVM_USERSPACE_IRQ_SOURCE_ID (define a flag and pass gsi in place of
> > > key).
> > > 
> > > The split proposed here does require some assembly, but KVM_EOIFD is
> > > re-usable as either a de-assert and notify mechanism tied to an irqfd or
> > > a notify-only mechanism allowing users of a qemu EOI notification
> > > infrastructure to continue working.  vfio doesn't necessarily need this
> > > middle ground, but can easily be used to test it.
> > > 
> > > The alternative is that we pull eoifd into KVM_IRQFD and invent some
> > > other new EOI interface for qemu.  That means we get EOIs tied to an
> > > irqfd via one path and other EOIs via another ioctl.  Personally that
> > > seems less desirable, but I'm willing to explore that route if
> > > necessary.  Thanks,
> > > 
> > > Alex
> > 
> > Maybe we should focus on the fact that we notify userspace that we
> > deasserted interrupt instead of EOI.
> 
> But will a tcg user want the de-assert?  I assume not.  The de-assert is
> an optimization to allow us to bypass evaluation in userspace.  In tcg
> we're already there.  Th

Re: [PATCH 2/4] s390: Add a mechanism to get the subchannel id.

2012-08-13 Thread Sebastian Ott
On Tue, 7 Aug 2012, Cornelia Huck wrote:
> This will be needed by the new virtio-ccw transport.

We already have ccw_device_get_subchannel_id which is currently used by
qdio only and thus buried in an internal header file. So it would be
better to just clean up this one and make it available to virtio-ccw.
The function looks a little different to what you suggested by it should
do the trick.

---
 arch/s390/include/asm/ccwdev.h |2 ++
 drivers/s390/cio/device.c  |   11 ---
 drivers/s390/cio/device.h  |2 --
 drivers/s390/cio/device_ops.c  |   13 +
 4 files changed, 15 insertions(+), 13 deletions(-)

--- a/arch/s390/include/asm/ccwdev.h
+++ b/arch/s390/include/asm/ccwdev.h
@@ -10,6 +10,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 
@@ -226,5 +227,6 @@ int ccw_device_siosl(struct ccw_device *
 // FIXME: these have to go
 extern int _ccw_device_get_subchannel_number(struct ccw_device *);
 
+extern struct subchannel_id ccw_device_get_subchannel_id(struct ccw_device *);
 extern void *ccw_device_get_chp_desc(struct ccw_device *, int);
 #endif /* _S390_CCWDEV_H_ */
--- a/drivers/s390/cio/device.c
+++ b/drivers/s390/cio/device.c
@@ -2037,16 +2037,6 @@ void ccw_driver_unregister(struct ccw_dr
driver_unregister(&cdriver->driver);
 }
 
-/* Helper func for qdio. */
-struct subchannel_id
-ccw_device_get_subchannel_id(struct ccw_device *cdev)
-{
-   struct subchannel *sch;
-
-   sch = to_subchannel(cdev->dev.parent);
-   return sch->schid;
-}
-
 static void ccw_device_todo(struct work_struct *work)
 {
struct ccw_device_private *priv;
@@ -2139,4 +2129,3 @@ EXPORT_SYMBOL(ccw_device_set_offline);
 EXPORT_SYMBOL(ccw_driver_register);
 EXPORT_SYMBOL(ccw_driver_unregister);
 EXPORT_SYMBOL(get_ccwdev_by_busid);
-EXPORT_SYMBOL_GPL(ccw_device_get_subchannel_id);
--- a/drivers/s390/cio/device.h
+++ b/drivers/s390/cio/device.h
@@ -142,9 +142,7 @@ int ccw_device_notify(struct ccw_device 
 void ccw_device_set_disconnected(struct ccw_device *cdev);
 void ccw_device_set_notoper(struct ccw_device *cdev);
 
-/* qdio needs this. */
 void ccw_device_set_timeout(struct ccw_device *, int);
-extern struct subchannel_id ccw_device_get_subchannel_id(struct ccw_device *);
 
 /* Channel measurement facility related */
 void retry_set_schib(struct ccw_device *cdev);
--- a/drivers/s390/cio/device_ops.c
+++ b/drivers/s390/cio/device_ops.c
@@ -763,6 +763,19 @@ _ccw_device_get_subchannel_number(struct
return cdev->private->schid.sch_no;
 }
 
+/**
+ * ccw_device_get_subchannel_id - obtain a subchannel id
+ * @cdev: device to obtain the id for
+ * Return struct subchannel_id of the parent subchannel.
+ */
+struct subchannel_id ccw_device_get_subchannel_id(struct ccw_device *cdev)
+{
+   struct subchannel *sch;
+
+   sch = to_subchannel(cdev->dev.parent);
+   return sch->schid;
+}
+EXPORT_SYMBOL_GPL(ccw_device_get_subchannel_id);
 
 MODULE_LICENSE("GPL");
 EXPORT_SYMBOL(ccw_device_set_options_mask);

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] kvm tools: Fix segfault on "lkvm run"

2012-08-13 Thread Paul Neumann
The errors from kvm_cmd_run_init() are not handled properly as they are
returned as positive values.

Signed-off-by: Paul Neumann 
---
 tools/kvm/builtin-run.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c
index a36bd00..47c6634 100644
--- a/tools/kvm/builtin-run.c
+++ b/tools/kvm/builtin-run.c
@@ -952,7 +952,7 @@ static int kvm_cmd_run_init(int argc, const char **argv)
fprintf(stderr, "Cannot handle parameter: "
"%s\n", argv[0]);
usage_with_options(run_usage, options);
-   return EINVAL;
+   return -EINVAL;
}
if (kvm_run_wrapper == KVM_RUN_SANDBOX) {
/*
@@ -979,7 +979,7 @@ static int kvm_cmd_run_init(int argc, const char **argv)
 
if (!kernel_filename) {
kernel_usage_with_options();
-   return EINVAL;
+   return -EINVAL;
}
 
vmlinux_filename = find_vmlinux();
-- 
1.7.11.4

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] powerpc/epapr: export epapr_hypercall_start

2012-08-13 Thread Scott Wood
On 08/11/2012 08:07 AM, Tabi Timur-B04825 wrote:
> On Sat, Aug 11, 2012 at 2:01 AM, Geert Uytterhoeven
>  wrote:
>> On Sat, Aug 11, 2012 at 12:21 AM, Scott Wood  wrote:
>>> +EXPORT_SYMBOL(epapr_hypercall_start);
>>
>> EXPORT_SYMBOL_GPL?
> 
> We prefer EXPORT_SYMBOL.  We don't want to restrict our customers from
> having to use GPL code.
> 

More specifically in this case, I don't see how use of this symbol in
any way suggests that the code would be GPL-derivative.  The API is
documented in the ePAPR standard or other external sources -- not a
kernel internal implementation issue.  The contents of
epapr_hypercall_start itself are not copyrightable (by default it's
basically "return -1", and is patched at runtime with a few instructions
provided by the hypervisor).  The header file with the inline accessors
is dual licensed (and also unlikely to be copyrightable, in terms of
what actually makes it into the output binary).

-Scott


--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 02:06:10PM +0200, Jan Kiszka wrote:
> On 2012-08-13 10:35, Nicholas A. Bellinger wrote:
> > From: Nicholas Bellinger 
> > 
> > This is required to get past the following assert with:
> > 
> > commit 1523ed9e1d46b0b54540049d491475ccac7e6421
> > Author: Jan Kiszka 
> > Date:   Thu May 17 10:32:39 2012 -0300
> > 
> > virtio/vhost: Add support for KVM in-kernel MSI injection
> > 
> > Cc: Stefan Hajnoczi 
> > Cc: Jan Kiszka 
> > Cc: Paolo Bonzini 
> > Cc: Anthony Liguori 
> > Signed-off-by: Nicholas Bellinger 
> > ---
> >  hw/msix.c |3 +++
> >  1 files changed, 3 insertions(+), 0 deletions(-)
> > 
> > diff --git a/hw/msix.c b/hw/msix.c
> > index 800fc32..c1e6dc3 100644
> > --- a/hw/msix.c
> > +++ b/hw/msix.c
> > @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
> >  {
> >  int vector;
> >  
> > +if (!dev->msix_vector_use_notifier && 
> > !dev->msix_vector_release_notifier)
> > +return;
> > +
> >  assert(dev->msix_vector_use_notifier &&
> > dev->msix_vector_release_notifier);
> >  
> > 
> 
> I think to remember pointing out that there is a bug somewhere in the
> reset code which deactivates a non-active vhost instance, no?
> 
> Jan

Could not find it. Could you dig it up pls?

> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Jan Kiszka
On 2012-08-13 20:03, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 02:06:10PM +0200, Jan Kiszka wrote:
>> On 2012-08-13 10:35, Nicholas A. Bellinger wrote:
>>> From: Nicholas Bellinger 
>>>
>>> This is required to get past the following assert with:
>>>
>>> commit 1523ed9e1d46b0b54540049d491475ccac7e6421
>>> Author: Jan Kiszka 
>>> Date:   Thu May 17 10:32:39 2012 -0300
>>>
>>> virtio/vhost: Add support for KVM in-kernel MSI injection
>>>
>>> Cc: Stefan Hajnoczi 
>>> Cc: Jan Kiszka 
>>> Cc: Paolo Bonzini 
>>> Cc: Anthony Liguori 
>>> Signed-off-by: Nicholas Bellinger 
>>> ---
>>>  hw/msix.c |3 +++
>>>  1 files changed, 3 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/hw/msix.c b/hw/msix.c
>>> index 800fc32..c1e6dc3 100644
>>> --- a/hw/msix.c
>>> +++ b/hw/msix.c
>>> @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
>>>  {
>>>  int vector;
>>>  
>>> +if (!dev->msix_vector_use_notifier && 
>>> !dev->msix_vector_release_notifier)
>>> +return;
>>> +
>>>  assert(dev->msix_vector_use_notifier &&
>>> dev->msix_vector_release_notifier);
>>>  
>>>
>>
>> I think to remember pointing out that there is a bug somewhere in the
>> reset code which deactivates a non-active vhost instance, no?
>>
>> Jan
> 
> Could not find it. Could you dig it up pls?

http://thread.gmane.org/gmane.linux.scsi.target.devel/2277/focus=2309

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: vhost-scsi port to v1.1.0 + MSI-X performance regression

2012-08-13 Thread Michael S. Tsirkin
On Tue, Jul 24, 2012 at 01:20:56PM -0700, Nicholas A. Bellinger wrote:
> On Tue, 2012-07-24 at 09:57 +0200, Jan Kiszka wrote:
> > On 2012-07-24 09:42, Nicholas A. Bellinger wrote:
> > > Hi Anthony, Stefan & QEMU folks,
> > > 
> 
> 
> 
> > > However, thus far I've not been able to get virtio-scsi <-> tcm_vhost
> > > I/O to actually work against the latest qemu.git/master..  
> > > 
> > > So while doing a (manual) bisection w/ this series to track down the
> > > issue with qemu/master, I managed to run across something else..  With
> > > the vhost-scsi series applied, everything is working as expected up
> > > until the following commit:
> > > 
> > > commit 1523ed9e1d46b0b54540049d491475ccac7e6421
> > > Author: Jan Kiszka 
> > > Date:   Thu May 17 10:32:39 2012 -0300
> > > 
> > > virtio/vhost: Add support for KVM in-kernel MSI injection
> > > 
> > > 
> > > This commit ends up triggering the following assert immediately after
> > > starting qemu with virtio-scsi <-> tcm_vhost:
> > > 
> > > qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515:
> > >msix_unset_vector_notifiers: Assertion 
> > > `dev->msix_vector_use_notifier &&
> > > dev->msix_vector_release_notifier' 
> > > failed.
> > > 
> > > OK, so adding the following hack allows me to boot:
> > > 
> > > diff --git a/hw/msix.c b/hw/msix.c
> > > index 59c7a83..6036909 100644
> > > --- a/hw/msix.c
> > > +++ b/hw/msix.c
> > > @@ -511,6 +511,11 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
> > >  {
> > >  int vector;
> > >  
> > > +if (!dev->msix_vector_use_notifier && 
> > > !dev->msix_vector_release_notifier) {
> > > +printf("Hit NULL msix_unset_vector_notifiers for: %s\n", 
> > > dev->name);
> > > +return;
> > > +}
> > > +
> > >  assert(dev->msix_vector_use_notifier &&
> > > dev->msix_vector_release_notifier);
> > >  
> > > --
> > 
> > Can you post a backtrace from gdb?
> > 
> 
> Sure, w/o the above patch the backtrace with commit 1523ed9e1d looks
> like the following:
> 
> (gdb) run
> Starting program: /usr/src/qemu.git/x86_64-softmmu/qemu-system-x86_64 
> -enable-kvm -smp 2 -m 2048 -serial file:/tmp/vhost-serial.txt -hda 
> /usr/src/qemu-vhost.git/debian_squeeze_amd64_standard-old.qcow2 -vhost-scsi 
> id=vhost-scsi0,wwpn=naa.600140579ad21088,tpgt=1 -device 
> virtio-scsi-pci,vhost-scsi=vhost-scsi0,event_idx=off
> [Thread debugging using libthread_db enabled]
> wwpn = "vhost-scsi0" tpgt = "1"
> [New Thread 0x745f8700 (LWP 26508)]
> [New Thread 0x73bf6700 (LWP 26509)]
> [New Thread 0x733f5700 (LWP 26510)]
> vhost_scsi_stop
> Failed to clear endpoint
> qemu-system-x86_64: /usr/src/qemu.git/hw/msix.c:515: 
> msix_unset_vector_notifiers: Assertion `dev->msix_vector_use_notifier && 
> dev->msix_vector_release_notifier' failed.
> 
> Program received signal SIGABRT, Aborted.
> 0x75e8b165 in raise () from /lib/libc.so.6
> (gdb) bt
> #0  0x75e8b165 in raise () from /lib/libc.so.6
> #1  0x75e8df70 in abort () from /lib/libc.so.6
> #2  0x75e842b1 in __assert_fail () from /lib/libc.so.6
> #3  0x004a84a1 in msix_unset_vector_notifiers (dev=0x1463a70) at 
> /usr/src/qemu.git/hw/msix.c:514
> #4  0x004d2865 in virtio_pci_set_guest_notifiers (opaque=0x6788, 
> assign=136)
> at /usr/src/qemu.git/hw/virtio-pci.c:703
> #5  0x0062955f in vhost_dev_stop (hdev=0x126c8a8, vdev=0x1465220) at 
> /usr/src/qemu.git/hw/vhost.c:954
> #6  0x00628989 in vhost_scsi_stop (vs=0x126c890, vdev=0x1465220) at 
> /usr/src/qemu.git/hw/vhost-scsi.c:115
> #7  0x0062f5c9 in virtio_scsi_set_status (vdev=0x1465220, val= optimized out>)
> at /usr/src/qemu.git/hw/virtio-scsi.c:631
> #8  0x00632082 in virtio_set_status (vdev=0x1465220, val=136 '\210') 
> at /usr/src/qemu.git/hw/virtio.c:507
> #9  0x00633410 in virtio_reset (opaque=0x6788) at 
> /usr/src/qemu.git/hw/virtio.c:517


This is strange, code shows virtio_reset calls virtio_set_status
with value 0. Why is it 136 here?
Stack corruption?

We actually did have a memory corruptor with virtio scsi:

You sent patch
virtio-scsi: Fix incorrect VirtIOSCSI->cmd_vqs[0] definition

maybe the bug goes away if you apply that?

> #10 0x004d30a9 in virtio_pci_reset (d=0x1463a70) at 
> /usr/src/qemu.git/hw/virtio-pci.c:280
> #11 0x004fc909 in qdev_reset_one (dev=0x6788, opaque=0x6788) at 
> /usr/src/qemu.git/hw/qdev.c:207
> #12 0x004fc670 in qdev_walk_children (dev=0x1463a70, devfn=0x4fc8f0 
> , 
> busfn=0x4fc510 , opaque=0x0) at 
> /usr/src/qemu.git/hw/qdev.c:372
> #13 0x004ae43d in pci_device_reset (dev=0x6788) at 
> /usr/src/qemu.git/hw/pci.c:163
> #14 0x004ae64f in pci_bus_reset (bus=0x1415bd0) at 
> /usr/src/qemu.git/hw/pci.c:206
> #15 0x004ae699 in pcibus_reset (qbus=0x6788) at 
> /usr/src/qemu.git/hw/pci.c:213
> #16 0x004fc710 in qbus_walk_children (bus=0x1415bd0, devfn=0x4fc8f0 
> , busfn=0x6, 

Re: [RFC-v2 1/6] msix: Work-around for vhost-scsi with KVM in-kernel MSI injection

2012-08-13 Thread Michael S. Tsirkin
On Mon, Aug 13, 2012 at 08:06:17PM +0200, Jan Kiszka wrote:
> On 2012-08-13 20:03, Michael S. Tsirkin wrote:
> > On Mon, Aug 13, 2012 at 02:06:10PM +0200, Jan Kiszka wrote:
> >> On 2012-08-13 10:35, Nicholas A. Bellinger wrote:
> >>> From: Nicholas Bellinger 
> >>>
> >>> This is required to get past the following assert with:
> >>>
> >>> commit 1523ed9e1d46b0b54540049d491475ccac7e6421
> >>> Author: Jan Kiszka 
> >>> Date:   Thu May 17 10:32:39 2012 -0300
> >>>
> >>> virtio/vhost: Add support for KVM in-kernel MSI injection
> >>>
> >>> Cc: Stefan Hajnoczi 
> >>> Cc: Jan Kiszka 
> >>> Cc: Paolo Bonzini 
> >>> Cc: Anthony Liguori 
> >>> Signed-off-by: Nicholas Bellinger 
> >>> ---
> >>>  hw/msix.c |3 +++
> >>>  1 files changed, 3 insertions(+), 0 deletions(-)
> >>>
> >>> diff --git a/hw/msix.c b/hw/msix.c
> >>> index 800fc32..c1e6dc3 100644
> >>> --- a/hw/msix.c
> >>> +++ b/hw/msix.c
> >>> @@ -544,6 +544,9 @@ void msix_unset_vector_notifiers(PCIDevice *dev)
> >>>  {
> >>>  int vector;
> >>>  
> >>> +if (!dev->msix_vector_use_notifier && 
> >>> !dev->msix_vector_release_notifier)
> >>> +return;
> >>> +
> >>>  assert(dev->msix_vector_use_notifier &&
> >>> dev->msix_vector_release_notifier);
> >>>  
> >>>
> >>
> >> I think to remember pointing out that there is a bug somewhere in the
> >> reset code which deactivates a non-active vhost instance, no?
> >>
> >> Jan
> > 
> > Could not find it. Could you dig it up pls?
> 
> http://thread.gmane.org/gmane.linux.scsi.target.devel/2277/focus=2309
> 
> Jan

Ah yes. So let's not work around, need to get to the bottom of that.

> -- 
> Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
> Corporate Competence Center Embedded Linux
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v7 2/2] kvm: KVM_EOIFD, an eventfd for EOIs

2012-08-13 Thread Alex Williamson
On Mon, 2012-08-13 at 19:59 +0300, Michael S. Tsirkin wrote:
> On Mon, Aug 13, 2012 at 10:48:15AM -0600, Alex Williamson wrote:
> > On Sun, 2012-08-12 at 10:49 +0300, Michael S. Tsirkin wrote:
> > > On Wed, Aug 01, 2012 at 01:06:42PM -0600, Alex Williamson wrote:
> > > > On Mon, 2012-07-30 at 19:12 -0600, Alex Williamson wrote:
> > > > > On Tue, 2012-07-31 at 03:36 +0300, Michael S. Tsirkin wrote:
> > > > > > On Mon, Jul 30, 2012 at 06:26:31PM -0600, Alex Williamson wrote:
> > > > > > > On Tue, 2012-07-31 at 03:01 +0300, Michael S. Tsirkin wrote:
> > > > > > > > You keep saying this but it is still true: once irqfd
> > > > > > > > is closed eoifd does not get any more interrupts.
> > > > > > > 
> > > > > > > How does that matter?
> > > > > > 
> > > > > > Well if it does not get events it is disabled.
> > > > > > so you have one ifc disabling another, anyway.
> > > > > 
> > > > > And a level irqfd without an eoifd can never be de-asserted.  Either 
> > > > > we
> > > > > make modular components, assemble them to do useful work, and
> > > > > disassemble them independently so they can be used by future 
> > > > > interfaces
> > > > > or we bundle eoifd as just an option of irqfd.  Which is it gonna be?
> > > > 
> > > > I don't think I've been successful at explaining my reasoning for making
> > > > EOI notification a separate interface, so let me try again...
> > > > 
> > > > When kvm is not enabled, the qemu vfio driver still needs to know about
> > > > EOIs to re-enable the physical interrupt.  Since the ioapic is emulated
> > > > in qemu, we can setup a notifier for this and create abstraction to make
> > > > it non-x86 specific, etc.  We just need to come up with a design and
> > > > implement it.  But what happens when kvm is then enabled?  ioapic
> > > > emulation moves to the kernel (assume kvm includes irqchip for this
> > > > argument even though it doesn't for POWER), qemu no longer knows about
> > > > EOIs, and the interface we just created to handle the non-kvm case stops
> > > > working.  Is anyone going to accept adding a qemu EOI notification
> > > > interface that only works when kvm is not enabled?
> > > 
> > > Yes, it's only a question of abstracting it at the right level.
> > > 
> > > For example, if as you suggest below kvm gives you an eventfd that
> > > asserts an irq, laters automatically deasserts it and notifies another
> > > eventfd, we can do exactly this in both tcg and kvm:
> > > 
> > > setup_level_irq(int gsi, int assert_eventfd, int deassert_eventfd)
> > > 
> > > Not advocating this interface but pointing out that to make
> > > same abstraction to work in tcg and kvm, see what it does in
> > > each of them first.
> > 
> > The tcg model I was thinking of is that we continue to use qemu_set_irq
> > to assert and de-assert the interrupt and add an eoi/ack notification
> > mechanism, much like the ack notifier that already exists in kvm.  There
> > doesn't seem to be much advantage to creating a new interrupt
> > infrastructure in tcg that can trigger interrupts by eventfds, so I
> > assume VFIO is always going to be responsible for the translation of an
> > eventfd to an irq assertion, get some kind of notification through qemu,
> > de-assert the interrupt and unmask the device.  With that model in mind,
> > perhaps it makes more sense why I've been keeping the eoi/ack separate
> > from irqfd.
> > 
> > > > I suspect we therefore need a notification mechanism between kvm and
> > > > qemu to make it possible for that interface to continue working.
> > > 
> > > Even though no one is actually using it. IMHO, this is a maintainance
> > > problem.
> > 
> > That's why I'm designing it the way I am.  VFIO will make use of it.  It
> > will just be using the de-assert and notify mode vs a notify-only mode
> > that tcg would use.  It would also be easy to add an option to vfio so
> > that we could fully test both modes.
> > 
> > > > An
> > > > eventfd also seems like the right mechanism there.  A simple
> > > > modification to the proposed KVM_EOIFD here would allow it to trigger an
> > > > eventfd when an EOI is written to a specific gsi on
> > > > KVM_USERSPACE_IRQ_SOURCE_ID (define a flag and pass gsi in place of
> > > > key).
> > > > 
> > > > The split proposed here does require some assembly, but KVM_EOIFD is
> > > > re-usable as either a de-assert and notify mechanism tied to an irqfd or
> > > > a notify-only mechanism allowing users of a qemu EOI notification
> > > > infrastructure to continue working.  vfio doesn't necessarily need this
> > > > middle ground, but can easily be used to test it.
> > > > 
> > > > The alternative is that we pull eoifd into KVM_IRQFD and invent some
> > > > other new EOI interface for qemu.  That means we get EOIs tied to an
> > > > irqfd via one path and other EOIs via another ioctl.  Personally that
> > > > seems less desirable, but I'm willing to explore that route if
> > > > necessary.  Thanks,
> > > > 
> > > > Alex
> > > 
> > > Maybe we should focus 

[PATCH uq/master] kvm: i8254: Finish time conversion fix

2012-08-13 Thread Jan Kiszka
0cdd3d1444 fixed reading back the counter load time from the kernel
while assuming the kernel would always update its load time on writing
the state. That is only true for channel 1, and so pit_get_channel_info
returned wrong output pin states for high counter values.

Fix this by applying the offset also on kvm_pit_put. For this purpose,
we cache the clock offset in KVMPITState, only updating it on VM state
changes or when we write the state while the VM is stopped.

Signed-off-by: Jan Kiszka 
---
 hw/kvm/i8254.c |   52 ++--
 1 files changed, 34 insertions(+), 18 deletions(-)

diff --git a/hw/kvm/i8254.c b/hw/kvm/i8254.c
index c5d3711..53d13e3 100644
--- a/hw/kvm/i8254.c
+++ b/hw/kvm/i8254.c
@@ -35,7 +35,8 @@
 typedef struct KVMPITState {
 PITCommonState pit;
 LostTickPolicy lost_tick_policy;
-bool state_valid;
+bool vm_stopped;
+int64_t kernel_clock_offset;
 } KVMPITState;
 
 static int64_t abs64(int64_t v)
@@ -43,19 +44,11 @@ static int64_t abs64(int64_t v)
 return v < 0 ? -v : v;
 }
 
-static void kvm_pit_get(PITCommonState *pit)
+static void kvm_pit_update_clock_offset(KVMPITState *s)
 {
-KVMPITState *s = DO_UPCAST(KVMPITState, pit, pit);
-struct kvm_pit_state2 kpit;
-struct kvm_pit_channel_state *kchan;
-struct PITChannelState *sc;
 int64_t offset, clock_offset;
 struct timespec ts;
-int i, ret;
-
-if (s->state_valid) {
-return;
-}
+int i;
 
 /*
  * Measure the delta between CLOCK_MONOTONIC, the base used for
@@ -72,6 +65,21 @@ static void kvm_pit_get(PITCommonState *pit)
 clock_offset = offset;
 }
 }
+s->kernel_clock_offset = clock_offset;
+}
+
+static void kvm_pit_get(PITCommonState *pit)
+{
+KVMPITState *s = DO_UPCAST(KVMPITState, pit, pit);
+struct kvm_pit_state2 kpit;
+struct kvm_pit_channel_state *kchan;
+struct PITChannelState *sc;
+int i, ret;
+
+/* No need to re-read the state if VM is stopped. */
+if (s->vm_stopped) {
+return;
+}
 
 if (kvm_has_pit_state2()) {
 ret = kvm_vm_ioctl(kvm_state, KVM_GET_PIT2, &kpit);
@@ -106,7 +114,7 @@ static void kvm_pit_get(PITCommonState *pit)
 sc->mode = kchan->mode;
 sc->bcd = kchan->bcd;
 sc->gate = kchan->gate;
-sc->count_load_time = kchan->count_load_time + clock_offset;
+sc->count_load_time = kchan->count_load_time + s->kernel_clock_offset;
 }
 
 sc = &pit->channels[0];
@@ -114,17 +122,23 @@ static void kvm_pit_get(PITCommonState *pit)
 pit_get_next_transition_time(sc, sc->count_load_time);
 }
 
-static void kvm_pit_put(PITCommonState *s)
+static void kvm_pit_put(PITCommonState *pit)
 {
+KVMPITState *s = DO_UPCAST(KVMPITState, pit, pit);
 struct kvm_pit_state2 kpit;
 struct kvm_pit_channel_state *kchan;
 struct PITChannelState *sc;
 int i, ret;
 
-kpit.flags = s->channels[0].irq_disabled ? KVM_PIT_FLAGS_HPET_LEGACY : 0;
+/* The offset keeps changing as long as the VM is stopped. */
+if (s->vm_stopped) {
+kvm_pit_update_clock_offset(s);
+}
+
+kpit.flags = pit->channels[0].irq_disabled ? KVM_PIT_FLAGS_HPET_LEGACY : 0;
 for (i = 0; i < 3; i++) {
 kchan = &kpit.channels[i];
-sc = &s->channels[i];
+sc = &pit->channels[i];
 kchan->count = sc->count;
 kchan->latched_count = sc->latched_count;
 kchan->count_latched = sc->count_latched;
@@ -137,7 +151,7 @@ static void kvm_pit_put(PITCommonState *s)
 kchan->mode = sc->mode;
 kchan->bcd = sc->bcd;
 kchan->gate = sc->gate;
-kchan->count_load_time = sc->count_load_time;
+kchan->count_load_time = sc->count_load_time - s->kernel_clock_offset;
 }
 
 ret = kvm_vm_ioctl(kvm_state,
@@ -211,10 +225,12 @@ static void kvm_pit_vm_state_change(void *opaque, int 
running,
 KVMPITState *s = opaque;
 
 if (running) {
-s->state_valid = false;
+kvm_pit_update_clock_offset(s);
+s->vm_stopped = false;
 } else {
+kvm_pit_update_clock_offset(s);
 kvm_pit_get(&s->pit);
-s->state_valid = true;
+s->vm_stopped = true;
 }
 }
 
-- 
1.7.3.4
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v5 00/12] KVM: introduce readonly memslot

2012-08-13 Thread Marcelo Tosatti
On Sat, Aug 11, 2012 at 11:36:20AM +0800, Xiao Guangrong wrote:
> On 08/11/2012 02:14 AM, Marcelo Tosatti wrote:
> > On Tue, Aug 07, 2012 at 05:47:15PM +0800, Xiao Guangrong wrote:
> >> Changelog:
> >> - introduce KVM_PFN_ERR_RO_FAULT instead of dummy page
> >> - introduce KVM_HVA_ERR_BAD and optimize error hva indicators
> >>
> >> The test case can be found at:
> >> http://lkml.indiana.edu/hypermail/linux/kernel/1207.2/00819/migrate-perf.tar.bz2
> >>
> >> In current code, if we map a readonly memory space from host to guest
> >> and the page is not currently mapped in the host, we will get a fault-pfn
> >> and async is not allowed, then the vm will crash.
> >>
> >> As Avi's suggestion, We introduce readonly memory region to map ROM/ROMD
> >> to the guest, read access is happy for readonly memslot, write access on
> >> readonly memslot will cause KVM_EXIT_MMIO exit.
> > 
> > Memory slots whose QEMU mapping is write protected is supported
> > today, as long as there are no write faults.
> > 
> > What prevents the use of mmap(!MAP_WRITE) to handle read-only memslots
> > again?
> > 
> 
> It is happy to map !write host memory space to the readonly memslot,
> and they can coexist as well.
> 
> readonly memslot checks the write-permission by seeing slot->flags and
> !write memory checks the write-permission in hva_to_pfn() function
> which checks vma->flags. It is no conflict.

Yes, there is no conflict. The point is, if you can use the
mmap(PROT_READ) interface (supporting read faults on read-only slots)
for this behavior, what is the advantage of a new memslot flag?

I'm not saying mmap(PROT_READ) is the best interface, i am just asking
why it is not.

> > The initial objective was to fix a vm crash, can you explain that
> > initial problem?
> >
> 
> The issue was trigged by this code:
> 
> } else {
> if (async && (vma->vm_flags & VM_WRITE))
> *async = true;
> pfn = KVM_PFN_ERR_FAULT;
> }
> 
> If the host memory region is readonly (!vma->vm_flags & VM_WRITE) and
> its physical page is swapped out (or the file data does not be read in),
> get_user_page_nowait will fail, above code reject to set async,
> then we will get a fault pfn and async=false.
> 
> I guess this issue also exists in "QEMU write protected mapping" as
> you mentioned above.

Yes, it does. As far as i understand, what that check does from a high
level pov is:

- Did get_user_pages_nowait() fail due to a swapped out page (in which 
case we should try to swappin the page asynchronously), or due to 
another reason (for which case an error should be returned).

Using vma->vm_flags VM_WRITE for that is trying to guess why
get_user_pages_nowait() failed, because it (gup_nowait return values) 
does not provide sufficient information by itself.

Can't that be fixed separately? 

Another issue which is also present with the mmap(PROT_READ) scheme is
interaction with reexecute_instruction. That is, unless i am mistaken,
reexecute_instruction can succeed (return true) on a region that is
write protected. This breaks the "write faults on read-only slots exit
to userspace via EXIT_MMIO" behaviour.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] VFIO-based PCI device assignment for QEMU 1.2

2012-08-13 Thread Andreas Hartmann
Alex Williamson wrote:
> On Mon, 2012-08-13 at 18:36 +0200, Andreas Hartmann wrote:
[...]
>> If I'm using your qemu instead of qemu from kvm-0.15 (opensuse package),
>> this error comes up when passing through a PCIe device, which works
>> absolutely fine with kvm 0.15. I would have expected, that your qemu
>> works with the legacy way of handling pcie passthrough, too (with
>> pci-stub module).
> 
> VFIO cannot work with pci-stub, the backends are fundamentally
> different.  KVM making use pci-stub to hold onto a device is actually
> one of the design problems that VFIO is meant to correct.  The other
> significant interface change is use of IOMMU groups, which is actually
> why VFIO works for some of your uses while pci-assign does not.
> 
>> This would mean, that all users get errors if they use the traditional
>> way. IOW: there are changes needed (which?) to move from kvm 0.15 to
>> your qemu version.
> 
> But the way we solve this is to make libvirt understand how to do both.
> Then it can probe the qemu/kvm binary and host system to figure out
> which is supported and use the correct device options based on what it
> finds.  Trying to do both via the same qemu command line doesn't make
> sense to me, especially when the device setup is so different.

Ok! I thought your qemu version still would understand the old way via
pci-stub. But this obviously was a misunderstanding of mine :-(.


Thanks,
Andreas
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH uq/master] kvm: i8254: Finish time conversion fix

2012-08-13 Thread Michael Tokarev
On 13.08.2012 22:18, Jan Kiszka wrote:
> 0cdd3d1444 fixed reading back the counter load time from the kernel
> while assuming the kernel would always update its load time on writing
> the state. That is only true for channel 1, and so pit_get_channel_info
> returned wrong output pin states for high counter values.
> 
> Fix this by applying the offset also on kvm_pit_put. For this purpose,
> we cache the clock offset in KVMPITState, only updating it on VM state
> changes or when we write the state while the VM is stopped.

Wug.  The fix (consisting of two halves) appears to be quite messy.
Is it a (temporary) workaround or a real solution?

And yes, this second half fixes the reported issue with grub timekeeping,
and should fix the seabios problem as well (so it shouldn't be necessary
to mess with timekeeping in seabios anymore).

Thank you Jan!

/mjt
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


  1   2   >