[PATCH] vhost-blk: Add vhost-blk support v3

2012-10-10 Thread Asias He
vhost-blk is an in-kernel virtio-blk device accelerator.

Due to the lack of a proper in-kernel AIO interface, this version converts
the guest's I/O requests to bios and uses submit_bio() to submit I/O directly.
So this version only supports raw block devices as the guest's disk image,
e.g. /dev/sda, /dev/ram0. We can add file-based image support to
vhost-blk once we have an in-kernel AIO interface. There is work in
progress on an in-kernel AIO interface from Dave Kleikamp and Zach Brown:

   http://marc.info/?l=linux-fsdevel&m=133312234313122

Performance evaluation:
-----------------------
1) LKVM
Fio with libaio ioengine on Fusion IO device using kvm tool
IOPS   Before   After   Improvement
seq-read   107  121 +13.0%
seq-write  130  179 +37.6%
rnd-read   102  122 +19.6%
rnd-write  125  159 +27.0%

2) QEMU
Fio with libaio ioengine on Fusion IO device using QEMU
IOPS   Before   After   Improvement
seq-read   76   123 +61.8%
seq-write  139  173 +24.4%
rnd-read   73   120 +64.3%
rnd-write  75   156 +108.0%

Userspace bits:
---------------
1) LKVM
The latest vhost-blk userspace bits for kvm tool can be found here:
g...@github.com:asias/linux-kvm.git blk.vhost-blk

2) QEMU
The latest vhost-blk userspace prototype for QEMU can be found here:
g...@github.com:asias/qemu.git blk.vhost-blk

Changes in V3:
- Sending REQ_FLUSH bio instead of vfs_fsync, thanks Christoph!
- Check file passed by user is a raw block device file

Signed-off-by: Asias He as...@redhat.com
---
 drivers/vhost/Kconfig |   1 +
 drivers/vhost/Kconfig.blk |  10 +
 drivers/vhost/Makefile|   2 +
 drivers/vhost/blk.c   | 669 ++
 drivers/vhost/blk.h   |   8 +
 5 files changed, 690 insertions(+)
 create mode 100644 drivers/vhost/Kconfig.blk
 create mode 100644 drivers/vhost/blk.c
 create mode 100644 drivers/vhost/blk.h

diff --git a/drivers/vhost/Kconfig b/drivers/vhost/Kconfig
index 202bba6..acd8038 100644
--- a/drivers/vhost/Kconfig
+++ b/drivers/vhost/Kconfig
@@ -11,4 +11,5 @@ config VHOST_NET
 
 if STAGING
 source drivers/vhost/Kconfig.tcm
+source drivers/vhost/Kconfig.blk
 endif
diff --git a/drivers/vhost/Kconfig.blk b/drivers/vhost/Kconfig.blk
new file mode 100644
index 000..ff8ab76
--- /dev/null
+++ b/drivers/vhost/Kconfig.blk
@@ -0,0 +1,10 @@
+config VHOST_BLK
+   tristate "Host kernel accelerator for virtio blk (EXPERIMENTAL)"
+   depends on BLOCK && EXPERIMENTAL && m
+   ---help---
+ This kernel module can be loaded in host kernel to accelerate
+ guest block with virtio_blk. Not to be confused with virtio_blk
+ module itself which needs to be loaded in guest kernel.
+
+ To compile this driver as a module, choose M here: the module will
+ be called vhost_blk.
diff --git a/drivers/vhost/Makefile b/drivers/vhost/Makefile
index a27b053..1a8a4a5 100644
--- a/drivers/vhost/Makefile
+++ b/drivers/vhost/Makefile
@@ -2,3 +2,5 @@ obj-$(CONFIG_VHOST_NET) += vhost_net.o
 vhost_net-y := vhost.o net.o
 
 obj-$(CONFIG_TCM_VHOST) += tcm_vhost.o
+obj-$(CONFIG_VHOST_BLK) += vhost_blk.o
+vhost_blk-y := blk.o
diff --git a/drivers/vhost/blk.c b/drivers/vhost/blk.c
new file mode 100644
index 000..4a4e793
--- /dev/null
+++ b/drivers/vhost/blk.c
@@ -0,0 +1,669 @@
+/*
+ * Copyright (C) 2011 Taobao, Inc.
+ * Author: Liu Yuan tailai...@taobao.com
+ *
+ * Copyright (C) 2012 Red Hat, Inc.
+ * Author: Asias He as...@redhat.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.
+ *
+ * virtio-blk server in host kernel.
+ */
+
+#include <linux/miscdevice.h>
+#include <linux/module.h>
+#include <linux/vhost.h>
+#include <linux/virtio_blk.h>
+#include <linux/mutex.h>
+#include <linux/file.h>
+#include <linux/kthread.h>
+#include <linux/blkdev.h>
+
+#include "vhost.c"
+#include "vhost.h"
+#include "blk.h"
+
+#define BLK_HDR 0
+
+static DEFINE_IDA(vhost_blk_index_ida);
+
+enum {
+   VHOST_BLK_VQ_REQ = 0,
+   VHOST_BLK_VQ_MAX = 1,
+};
+
+struct req_page_list {
+   struct page **pages;
+   int pages_nr;
+};
+
+struct vhost_blk_req {
+   struct llist_node llnode;
+   struct req_page_list *pl;
+   struct vhost_blk *blk;
+
+   struct iovec *iov;
+   int iov_nr;
+
+   struct bio **bio;
+   atomic_t bio_nr;
+
+   sector_t sector;
+   int write;
+   u16 head;
+   long len;
+
+   u8 *status;
+};
+
+struct vhost_blk {
+   struct task_struct *host_kick;
+   struct vhost_blk_req *reqs;
+   struct vhost_virtqueue vq;
+   struct llist_head llhead;
+   struct vhost_dev dev;
+   u16 reqs_nr;
+   int index;
+};
+
+static inline int iov_num_pages(struct iovec *iov)
+{
+   return (PAGE_ALIGN((unsigned long)iov->iov_base + iov->iov_len) -
+          ((unsigned long)iov->iov_base & PAGE_MASK)) >> PAGE_SHIFT;
+}
+
+static int vhost_blk_setup(struct vhost_blk *blk)
+{
+  

Re: Using PCI config space to indicate config location

2012-10-10 Thread Michael S. Tsirkin
On Mon, Oct 08, 2012 at 12:51:25PM +1030, Rusty Russell wrote:
 (Topic updated, cc's trimmed).
 
 Anthony Liguori aligu...@us.ibm.com writes:
  Rusty Russell ru...@rustcorp.com.au writes:
  4) The only significant change to the spec is that we use PCI
 capabilities, so we can have infinite feature bits.
 (see 
  http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)
 
  We discussed this on IRC last night.  I don't think PCI capabilites are
  a good mechanism to use...
 
  PCI capabilities are there to organize how the PCI config space is
  allocated to allow vendor extensions to co-exist with future PCI
  extensions.
 
  But we've never used the PCI config space within virtio-pci.  We do
  everything in BAR0.  I don't think there's any real advantage of using
  the config space vs. a BAR for virtio-pci.
 
 Note before anyone gets confused; we were talking about using the PCI
 config space to indicate what BAR(s) the virtio stuff is in.  An
 alternative would be to simply specify a new layout format in BAR1.

One problem we are still left with is this: device specific
config accesses are still non atomic.
This is a problem for multibyte fields such as MAC address
where MAC could change while we are accessing it.

I was thinking about some backwards compatible way to solve this, but if
we are willing to break compatibility or use some mode switch, how about
we give up on virtio config space completely, and do everything besides
IO and ISR through guest memory?

-- 
MST
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Rusty Russell
Jamie Lokier ja...@shareable.org writes:

 Rusty Russell wrote:
 I don't think it'll be that bad; reset clears the device to unknown,
 bar0 moves it from unknown-legacy mode, bar1/2/3 changes it from
 unknown-modern mode, and anything else is bad (I prefer being strict so
 we catch bad implementations from the beginning).

 Will that work, if the guest with kernel that uses modern mode, kexecs
 to an older (but presumed reliable) kernel that only knows about legacy mode?

 I.e. will the replacement kernel, or (ideally) replacement driver on
 the rare occasion that is needed on a running kernel, be able to reset
 the device hard enough?

Well, you need to reset the device, so yes.

Cheers,
Rusty.


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Rusty Russell
Anthony Liguori aligu...@us.ibm.com writes:
 Rusty Russell ru...@rustcorp.com.au writes:

 Anthony Liguori aligu...@us.ibm.com writes:
 We'll never remove legacy so we shouldn't plan on it.  There are
 literally hundreds of thousands of VMs out there with the current virtio
 drivers installed in them.  We'll be supporting them for a very, very
 long time :-)

 You will be supporting this for qemu on x86, sure.

 And PPC.

Yeah, Ben was a bit early fixing the virtio bugs I guess.  Oh well.

 As I think we're
 still in the growth phase for virtio, I prioritize future spec
 cleanliness pretty high.

 But I think you'll be surprised how fast this is deprecated:
 1) Bigger queues for block devices (guest-specified ringsize)
 2) Smaller rings for openbios (guest-specified alignment)
 3) All-mmio mode (powerpc)
 4) Whatever network features get numbers > 31.

 We can do all of these things with incremental change to the existing
 layout.  That's the only way what I'm suggesting is different.

Now extend your proposal to add all those features we want.  We add
if()s to the driver that we need to keep forever.  Let's just look at
what your proposal get us:

bool has_feature(struct virtio_pci_dev *dev, u32 num)
{
	/* Do we need to set selector? */
	if (dev->extended_features) {
		unsigned long selector_off = VIRTIO_PCI_HOST_FEATURES_SELECT;
		if (dev->using_bar0 && !dev->msi_active)
			selector_off -= 32;
		writel(dev->config + selector_off, num / 32);
		num &= 31;
	} else if (num > 31)
		return false;
	return readl(dev->config + VIRTIO_PCI_HOST_FEATURES) & (1 << num);
}

vs. two cases, which are independent, with methods set at detection time:

virtio_pci_legacy.c:
bool has_feature(struct virtio_pci_dev *dev, u32 num)
{
	if (num > 31)
		return false;
	return readl(dev->config + VIRTIO_PCI_LEGACY_HOST_FEATURES) & (1 << num);
}

virtio_pci_new.c:
bool has_feature(struct virtio_pci_dev *dev, u32 num)
{
	writel(dev->config + VIRTIO_PCI_HOST_FEATURES_SELECT, num / 32);
	num &= 31;
	return readl(dev->config + VIRTIO_PCI_HOST_FEATURES) & (1 << num);
}

 You want to reorder all of the fields and have a driver flag day.  But I
 strongly suspect we'll decide we need to do the same exercise again in 4
 years when we now need to figure out how to take advantage of
 transactional memory or some other whiz-bang hardware feature.

It's not a flag day, as it's backwards compatible.

Sure, we might have a clean cut again.  I'm not afraid to rewrite this
code to make it cleaner; perhaps this is somewhere qemu could benefit
from a similar approach.

 What I'm suggesting is:

 struct {
 __le32 host_features;   /* read-only */
 __le32 guest_features;  /* read/write */
 __le32 queue_pfn;   /* read/write */
 __le16 queue_size;  /* read-only */
 __le16 queue_sel;   /* read/write */
 __le16 queue_notify;/* read/write */
 u8 status;  /* read/write */
 u8 isr; /* read-only, clear on read */
 __le16 msi_config_vector;   /* read/write */
 __le16 msi_queue_vector;/* read/write */
 __le32 host_feature_select; /* read/write */
 __le32 guest_feature_select;/* read/write */
 __le32 queue_pfn_hi;/* read/write */
 };

 With the additional semantic that the virtio-config space is overlayed
 on top of the register set in BAR0 unless the
 VIRTIO_PCI_F_SEPARATE_CONFIG feature is acknowledged.  This feature
 acts as a latch and when set, removes the config space overlay.

 If the config space overlays the registers, the offset in BAR0 of the
 overlay depends on whether MSI is enabled or not in the PCI device.

 BAR1 is an MMIO mirror of BAR0 except that the config space is never
 overlayed in BAR1 regardless of VIRTIO_PCI_F_SEPARATE_CONFIG.

Creating a horrible mess for new code in order to support old code.
But I have to maintain the new code, and this is horrible.

 You mean, accesses to bar1/2/3 change it to modern mode.  That's a
 pretty big side effect of a read.

They don't need to, but I'd prefer they did to keep implementations
from mixing old and new.

 See above.  A guest could happily just use BAR1/BAR2 and completely
 ignore BAR0 provided that BAR1/BAR2 are present.

But x86 guests want BAR0 because IO space is faster than MMIO.  Right?
So, not happily.

 It also means the guest drivers don't need separate code paths for
 legacy vs. modern.  They can switch between the two by just changing
 a single pointer.

This is wrong.  Of course they have separate code paths, but now you've
got lots of crossover, rather than two distinct ones.

 The implementation ends up being pretty trivial too.  We can use the
 same MemoryRegion in QEMU for both PIO and MMIO.  The kernel code stays
 the same.  It just takes a couple if()s to handle 

Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Rusty Russell
Gerd Hoffmann kra...@redhat.com writes:
 So how about this:

 (1) Add a vendor specific pci capability for new-style virtio.
 Specifies the pci bar used for new-style virtio registers.
 Guests can use it to figure whenever new-style virtio is
 supported and to map the correct bar (which will probably
 be bar 1 in most cases).

This was closer to the original proposal[1], which I really liked (you
can layout bars however you want).  Anthony thought that vendor
capabilities were a PCI-e feature, but it seems they're blessed in PCI
2.3.

So let's return to that proposal, giving something like this:

/* IDs for different capabilities.  Must all exist. */
/* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
/* Common configuration */
#define VIRTIO_PCI_CAP_COMMON_CFG   1
/* Notifications */
#define VIRTIO_PCI_CAP_NOTIFY_CFG   2
/* ISR access */
#define VIRTIO_PCI_CAP_ISR_CFG  3
/* Device specific configuration */
#define VIRTIO_PCI_CAP_DEVICE_CFG   4

/* This is the PCI capability header: */
struct virtio_pci_cap {
u8 cap_vndr;/* Generic PCI field: PCI_CAP_ID_VNDR */
u8 cap_next;/* Generic PCI field: next ptr. */
u8 cap_len; /* Generic PCI field: sizeof(struct virtio_pci_cap). */
u8 cfg_type;/* One of the VIRTIO_PCI_CAP_*_CFG. */
u8 bar; /* Where to find it. */
u8 unused;
__le16 offset;  /* Offset within bar. */
__le32 length;  /* Length. */
};

This means qemu can point the isr_cfg into the legacy area if it wants.
In fact, it can put everything in BAR0 if it wants.

Thoughts?
Rusty.


Re: Proposal for virtio standardization.

2012-10-10 Thread Rusty Russell
Cornelia Huck cornelia.h...@de.ibm.com writes:
 On Thu, 27 Sep 2012 09:59:33 +0930
 Rusty Russell ru...@rustcorp.com.au wrote:
 3) Various clarifications, formalizations and cleanups to the spec text,
and possibly elimination of old deprecated features.
 
 4) The only significant change to the spec is that we use PCI
capabilities, so we can have infinite feature bits.
(see
 http://lists.linuxfoundation.org/pipermail/virtualization/2011-December/019198.html)

 Infinite only applies to virtio-pci, no?

Yes, you already have infinite feature bits for ccw, as does every bus
we did since PCI.

 Sounds like a good idea. I'll be happy to review the spec with an eye
 to virtio-ccw.

Thanks!
Rusty.


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Michael S. Tsirkin
On Wed, Oct 10, 2012 at 02:14:11PM +1030, Rusty Russell wrote:
  See above.  A guest could happily just use BAR1/BAR2 and completely
  ignore BAR0 provided that BAR1/BAR2 are present.
 
 But x86 guests want BAR0 because IO space is faster than MMIO.  Right?

Or to be a bit more precise, ATM on x86 emulating IO instructions is
faster than MMIO. IIUC this is since you get the address/data in
registers and don't need to decode them.

-- 
MST


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Anthony Liguori
Rusty Russell ru...@rustcorp.com.au writes:

 Gerd Hoffmann kra...@redhat.com writes:
 So how about this:

 (1) Add a vendor specific pci capability for new-style virtio.
 Specifies the pci bar used for new-style virtio registers.
 Guests can use it to figure whenever new-style virtio is
 supported and to map the correct bar (which will probably
 be bar 1 in most cases).

 This was closer to the original proposal[1], which I really liked (you
 can layout bars however you want).  Anthony thought that vendor
 capabilities were a PCI-e feature, but it seems they're blessed in PCI
 2.3.

2.3 was standardized in 2002.  Are we confident that vendor extensions
play nice with pre-2.3 OSes like Win2k, WinXP, etc?

I still think it's a bad idea to rely on something so new in something
as fundamental as virtio-pci unless we have to.

Regards,

Anthony Liguori


 So let's return to that proposal, giving something like this:

 /* IDs for different capabilities.  Must all exist. */
 /* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
 /* Common configuration */
 #define VIRTIO_PCI_CAP_COMMON_CFG 1
 /* Notifications */
 #define VIRTIO_PCI_CAP_NOTIFY_CFG 2
 /* ISR access */
 #define VIRTIO_PCI_CAP_ISR_CFG3
 /* Device specific configuration */
 #define VIRTIO_PCI_CAP_DEVICE_CFG 4

 /* This is the PCI capability header: */
 struct virtio_pci_cap {
   u8 cap_vndr;/* Generic PCI field: PCI_CAP_ID_VNDR */
   u8 cap_next;/* Generic PCI field: next ptr. */
   u8 cap_len; /* Generic PCI field: sizeof(struct virtio_pci_cap). */
   u8 cfg_type;/* One of the VIRTIO_PCI_CAP_*_CFG. */
   u8 bar; /* Where to find it. */
   u8 unused;
   __le16 offset;  /* Offset within bar. */
   __le32 length;  /* Length. */
 };

 This means qemu can point the isr_cfg into the legacy area if it wants.
 In fact, it can put everything in BAR0 if it wants.

 Thoughts?
 Rusty.



Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Michael S. Tsirkin
On Wed, Oct 10, 2012 at 08:36:12AM -0500, Anthony Liguori wrote:
 Rusty Russell ru...@rustcorp.com.au writes:
 
  Gerd Hoffmann kra...@redhat.com writes:
  So how about this:
 
  (1) Add a vendor specific pci capability for new-style virtio.
  Specifies the pci bar used for new-style virtio registers.
  Guests can use it to figure whenever new-style virtio is
  supported and to map the correct bar (which will probably
  be bar 1 in most cases).
 
  This was closer to the original proposal[1], which I really liked (you
  can layout bars however you want).  Anthony thought that vendor
  capabilities were a PCI-e feature, but it seems they're blessed in PCI
  2.3.
 
 2.3 was standardized in 2002.  Are we confident that vendor extensions
 play nice with pre-2.3 OSes like Win2k, WinXP, etc?
 
 I still think it's a bad idea to rely on something so new in something
 as fundamental as virtio-pci unless we have to.
 
 Regards,
 
 Anthony Liguori

Pre 2.3 it was the case that *all* space outside
the capability linked list, and any capability not
recognized by the spec, was considered vendor specific.
So there's no problem really.

 
  So let's return to that proposal, giving something like this:
 
  /* IDs for different capabilities.  Must all exist. */
  /* FIXME: Do we win from separating ISR, NOTIFY and COMMON? */
  /* Common configuration */
  #define VIRTIO_PCI_CAP_COMMON_CFG   1
  /* Notifications */
  #define VIRTIO_PCI_CAP_NOTIFY_CFG   2
  /* ISR access */
  #define VIRTIO_PCI_CAP_ISR_CFG  3
   /* Device specific configuration */
  #define VIRTIO_PCI_CAP_DEVICE_CFG   4
 
  /* This is the PCI capability header: */
  struct virtio_pci_cap {
  u8 cap_vndr;/* Generic PCI field: PCI_CAP_ID_VNDR */
  u8 cap_next;/* Generic PCI field: next ptr. */
  u8 cap_len; /* Generic PCI field: sizeof(struct virtio_pci_cap). */
  u8 cfg_type;/* One of the VIRTIO_PCI_CAP_*_CFG. */
  u8 bar; /* Where to find it. */
  u8 unused;
  __le16 offset;  /* Offset within bar. */
  __le32 length;  /* Length. */
  };
 
  This means qemu can point the isr_cfg into the legacy area if it wants.
  In fact, it can put everything in BAR0 if it wants.
 
  Thoughts?
  Rusty.


Re: [PATCH 0/3] virtio-net: inline header support

2012-10-10 Thread Rusty Russell
Paolo Bonzini pbonz...@redhat.com writes:
 Il 09/10/2012 06:59, Rusty Russell ha scritto:
 Paolo Bonzini pbonz...@redhat.com writes:
 Il 05/10/2012 07:43, Rusty Russell ha scritto:
 That's good.  But virtio_blk's scsi command is insoluble AFAICT.  As I
 said to Anthony, the best rules are always and never, so I'd really
 rather not have to grandfather that in.

 It is, but we can add a rule that if the (transport) flag
 VIRTIO_RING_F_ANY_HEADER_SG is set, the cdb field is always 32 bytes in
 virtio-blk.
 
 Could we do that?  It's the cmd length I'm concerned about; is it always
 32 in practice for some reason?

 It is always 32 or less except in very obscure cases that are pretty
 much confined to iSCSI.  We don't care about the obscure cases, and the
 extra bytes don't hurt.

 BTW, 32 is the default cdb_size used by virtio-scsi.

 Currently qemu does:
 
  struct sg_io_hdr hdr;
  memset(&hdr, 0, sizeof(struct sg_io_hdr));
  hdr.interface_id = 'S';
  hdr.cmd_len = req->elem.out_sg[1].iov_len;
  hdr.cmdp = req->elem.out_sg[1].iov_base;
  hdr.dxfer_len = 0;
 
 If it's a command which expects more output data, there's no way to
 guess where the boundary is between that command and the data.

 Yep, so I understood the problem right.

OK.  Well, Anthony wants qemu to be robust in this regard, so I am
tempted to rework all the qemu drivers to handle arbitrary layouts.
They could use a good audit anyway.

This would become a glaring exception, but I'm tempted to fix it to 32
bytes at the same time as we get the new pci layout (ie. for the virtio
1.0 spec).  The Linux driver would carefully be backwards compatible, of
course, and the spec would document why.

Cheers,
Rusty.



Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Wed, Oct 10, 2012 at 08:36:12AM -0500, Anthony Liguori wrote:
 Rusty Russell ru...@rustcorp.com.au writes:
 
  Gerd Hoffmann kra...@redhat.com writes:
  So how about this:
 
  (1) Add a vendor specific pci capability for new-style virtio.
  Specifies the pci bar used for new-style virtio registers.
  Guests can use it to figure whenever new-style virtio is
  supported and to map the correct bar (which will probably
  be bar 1 in most cases).
 
  This was closer to the original proposal[1], which I really liked (you
  can layout bars however you want).  Anthony thought that vendor
  capabilities were a PCI-e feature, but it seems they're blessed in PCI
  2.3.
 
 2.3 was standardized in 2002.  Are we confident that vendor extensions
 play nice with pre-2.3 OSes like Win2k, WinXP, etc?

2.2 (1998) had the capability IDs linked list, indicated by bit 4 in the
status register, but reserved ids 7 and above.  ID 9 (vendor specific)
was added in 2.3; it should be ignored, but will require testing of
course, like any change.

2.1 didn't have the capability ID linked list at all; bit 4 in the
status register was reserved.

QEMU's pci.c has capability support, and sets the capability status bit
and list pointer when the driver requests it (eg. the eepro100).

 Pre 2.3 it was the case that *all* space outside
 the capability linked list, and any capability not
 recognized by space, was considered vendor specific.
 So there's no problem really.

Oh in theory, sure.  In practice, almost certainly.  But this needs to
be proved with actual testing.

Cheers,
Rusty.


Re: [Qemu-devel] Using PCI config space to indicate config location

2012-10-10 Thread Rusty Russell
Rusty Russell ru...@rustcorp.com.au writes:
 Jamie Lokier ja...@shareable.org writes:

 Rusty Russell wrote:
 I don't think it'll be that bad; reset clears the device to unknown,
 bar0 moves it from unknown-legacy mode, bar1/2/3 changes it from
 unknown-modern mode, and anything else is bad (I prefer being strict so
 we catch bad implementations from the beginning).

 Will that work, if the guest with kernel that uses modern mode, kexecs
 to an older (but presumed reliable) kernel that only knows about legacy mode?

 I.e. will the replacement kernel, or (ideally) replacement driver on
 the rare occasion that is needed on a running kernel, be able to reset
 the device hard enough?

 Well, you need to reset the device, so yes.

MST said something which made me think harder about this case.

Either there needs to be a way to tell what mode the device is in, or
legacy reset has to work, even in modern mode.  The latter is
preferable, since it allows an older kernel to do the reset.

Now, since qemu would almost certainly have to add code to stop that
working, it'll probably be fine.  But I'd really like to add a strict
mode to qemu virtio which does extra sanity checking for driver
authors, and that might break this.  That's OK.

Thanks,
Rusty.


[PATCH] virtio-pci: use module_pci_driver to simplify the code

2012-10-10 Thread Wei Yongjun
From: Wei Yongjun yongjun_...@trendmicro.com.cn

Use the module_pci_driver() macro to make the code simpler
by eliminating module_init and module_exit calls.

dpatch engine is used to auto generate this patch.
(https://github.com/weiyj/dpatch)

Signed-off-by: Wei Yongjun yongjun_...@trendmicro.com.cn
---
 drivers/virtio/virtio_pci.c | 14 +-
 1 file changed, 1 insertion(+), 13 deletions(-)

diff --git a/drivers/virtio/virtio_pci.c b/drivers/virtio/virtio_pci.c
index c33aea3..b59237c 100644
--- a/drivers/virtio/virtio_pci.c
+++ b/drivers/virtio/virtio_pci.c
@@ -830,16 +830,4 @@ static struct pci_driver virtio_pci_driver = {
 #endif
 };
 
-static int __init virtio_pci_init(void)
-{
-   return pci_register_driver(&virtio_pci_driver);
-}
-
-module_init(virtio_pci_init);
-
-static void __exit virtio_pci_exit(void)
-{
-   pci_unregister_driver(&virtio_pci_driver);
-}
-
-module_exit(virtio_pci_exit);
+module_pci_driver(virtio_pci_driver);




Re: Using PCI config space to indicate config location

2012-10-10 Thread Rusty Russell
Michael S. Tsirkin m...@redhat.com writes:
 On Mon, Oct 08, 2012 at 12:51:25PM +1030, Rusty Russell wrote:
 Note before anyone gets confused; we were talking about using the PCI
 config space to indicate what BAR(s) the virtio stuff is in.  An
 alternative would be to simply specify a new layout format in BAR1.

 One problem we are still left with is this: device specific
 config accesses are still non atomic.
 This is a problem for multibyte fields such as MAC address
 where MAC could change while we are accessing it.

It's also a problem for related fields, eg. console width and height, or
disk geometry.

 I was thinking about some backwards compatible way to solve this, but if
 we are willing to break compatiblity or use some mode switch, how about
 we give up on virtio config space completely, and do everything besides
 IO and ISR through guest memory?

I think there's still a benefit in the simple publishing of information:
I don't really want to add a control queue for the console.  But
inevitably, once-static information can change in later versions, and
it's horrible to have config information plus a bit that says don't use
this, use the control queue.

Here's a table from a quick audit:

Driver    Config    Device changes    Driver writes    ...after init?
net       Y         Y                 N                N
block     Y         Y                 Y                Y
console   Y         Y                 N                N
rng       N         N                 N                N
balloon   Y         Y                 Y                Y
scsi      Y         N                 Y                N
9p        Y         N                 N                N

For config space reads, I suggest the driver publish a generation count.
For writes, the standard seems to be a commit latch.  We could abuse the
generation count for this: the driver writes to it to commit config
changes.

ie:
/* Fields in VIRTIO_PCI_CAP_COMMON_CFG: */
struct virtio_pci_common_cfg {
/* About the whole device. */
__le32 device_feature_select;   /* read-write */
__le32 device_feature;  /* read-only */
__le32 guest_feature_select;/* read-write */
__le32 guest_feature;   /* read-only */
__le32 config_gen_and_latch;/* read-write */
__le16 msix_config; /* read-write */
__u8 device_status; /* read-write */
__u8 unused;

/* About a specific virtqueue. */
__le16 queue_select;/* read-write */
__le16 queue_align; /* read-write, power of 2. */
__le16 queue_size;  /* read-write, power of 2. */
__le16 queue_msix_vector;/* read-write */
__le64 queue_address;   /* read-write: 0x == DNE. */
};

Thoughts?
Rusty.
PS.  Let's make all the virtio-device config LE, too...