Re: dynamic virtio page size

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 14:02 -0600, Anthony Liguori wrote:
 Hollis Blanchard wrote:
  I wanted to make sure people on non-x86 architectures couldn't run into
  vring-size related problems that didn't also appear on x86.

 
 Having a VRING_SHIFT and a VRING_PAGE_SIZE where VRING_PAGE_SIZE != (1 
  VRING_SHIFT) is almost certainly going to break things in unexpected 
 ways because it's going against common understanding of the relationship 
 between shift and page_size.

In the patch I sent, they are named VIRTIO_PFN_SHIFT and
VRING_PAGE_SIZE. In fact, the first could really be named
VIRTIO_PCI_PFN_SHIFT or even more specific, which hopefully would
alleviate the confusion.

Anyways, I've been trying to implement your other idea, which was to
advertise a virtio page size from the host to the guest. Since the
virtio IO space is only 20 bytes long, and it's full, we need to offer a
feature bit to notify the guest that the IO space has been expanded. 

   virtio's PCI IO resource

|20 bytes| |

   virtio-pci   virtio-device
specific  specific
  data  data

If guests don't acknowledge the feature, the host pretends the
VIRTIO_PAGE_SIZE register (which would be the new IO offset 20) doesn't
exist, and reads to offset 20 instead return the device-specific config
data.

Problem: the guest has not yet acknowledged the advertised features
before accessing device-specific data. So in the virtio-blk/virtio-pci
case, virtio_dev_probe() calls virtblk_probe() *before*
vp_finalize_features(). virtblk_probe() uses virtio_config_val(), which
returns the wrong data because the new yet unknown feature flag is still
set. (Of course, we can't just fix the ordering guest-side, because that
breaks backwards compatibility.)

It seems to me that virtio-pci configuration is not very flexible. I
think I'm going to continue revising the hardcoded page size patches I
sent earlier...

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line unsubscribe kvm in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: dynamic virtio page size

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 17:27 -0600, Hollis Blanchard wrote:
 
 In the patch I sent, they are named VIRTIO_PFN_SHIFT and
 VRING_PAGE_SIZE. In fact, the first could really be named
 VIRTIO_PCI_PFN_SHIFT or even more specific, which hopefully would
 alleviate the confusion.
 
 Anyways, I've been trying to implement your other idea, which was to
 advertise a virtio page size from the host to the guest. Since the
 virtio IO space is only 20 bytes long, and it's full, we need to offer a
 feature bit to notify the guest that the IO space has been expanded. 
 
virtio's PCI IO resource
 
 |20 bytes| |
 
virtio-pci   virtio-device
 specific  specific
   data  data
 
 If guests don't acknowledge the feature, the host pretends the
 VIRTIO_PAGE_SIZE register (which would be the new IO offset 20) doesn't
 exist, and reads to offset 20 instead return the device-specific config
 data.

FWIW, here's my current (broken) patch. Aside from being broken, it also
doesn't seem like this approach will scale well should we need to extend
the IO space again in the future.


diff --git a/qemu/hw/virtio.c b/qemu/hw/virtio.c
--- a/qemu/hw/virtio.c
+++ b/qemu/hw/virtio.c
@@ -45,7 +45,11 @@
  * a read-and-acknowledge. */
 #define VIRTIO_PCI_ISR 19
 
-#define VIRTIO_PCI_CONFIG  20
+/* A 32-bit r/o register specifying the size of a page, in bytes, in the
+ * virtio interface. */
+#define VIRTIO_PCI_PAGESIZE20
+
+#define VIRTIO_PCI_CONFIG  24
 
 /* Virtio ABI version, if we increment this, we break the guest driver. */
 #define VIRTIO_PCI_ABI_VERSION 0
@@ -55,6 +59,12 @@
  * KVM or if kqemu gets SMP support.
  */
 #define wmb() do { } while (0)
+
+#define VIRTIO_PAGE_SHIFT  12
+#define VIRTIO_PAGE_SIZE   (1VIRTIO_PAGE_SHIFT)
+#define VIRTIO_PAGE_MASK   (~(VIRTIO_PAGE_SIZE - 1))
+#define VIRTIO_PAGE_ALIGN(x)   (((x) + VIRTIO_PAGE_SIZE - 1) \
+ VIRTIO_PAGE_MASK)
 
 /* virt queue functions */
 
@@ -95,7 +105,7 @@ static void *virtio_map_gpa(target_phys_
 
 static size_t virtqueue_size(int num)
 {
-return TARGET_PAGE_ALIGN((sizeof(VRingDesc) * num) +
+return VIRTIO_PAGE_ALIGN((sizeof(VRingDesc) * num) +
 (sizeof(VRingAvail) + sizeof(uint16_t) * num)) +
(sizeof(VRingUsed) + sizeof(VRingUsedElem) * num);
 }
@@ -104,7 +114,7 @@ static void virtqueue_init(VirtQueue *vq
 {
 vq-vring.desc = p;
 vq-vring.avail = p + vq-vring.num * sizeof(VRingDesc);
-vq-vring.used = (void *)TARGET_PAGE_ALIGN((unsigned 
long)vq-vring.avail-ring[vq-vring.num]);
+vq-vring.used = (void *)VIRTIO_PAGE_ALIGN((unsigned 
long)vq-vring.avail-ring[vq-vring.num]);
 }
 
 static unsigned virtqueue_next_desc(VirtQueue *vq, unsigned int i)
@@ -234,6 +244,9 @@ static uint32_t virtio_config_readb(void
 
 vdev-get_config(vdev, vdev-config);
 
+if (!(vdev-features  VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
+
 addr -= vdev-addr + VIRTIO_PCI_CONFIG;
 if (addr  (vdev-config_len - sizeof(val)))
return (uint32_t)-1;
@@ -248,6 +261,9 @@ static uint32_t virtio_config_readw(void
 uint16_t val;
 
 vdev-get_config(vdev, vdev-config);
+
+if (!(vdev-features  VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
 
 addr -= vdev-addr + VIRTIO_PCI_CONFIG;
 if (addr  (vdev-config_len - sizeof(val)))
@@ -264,6 +280,9 @@ static uint32_t virtio_config_readl(void
 
 vdev-get_config(vdev, vdev-config);
 
+if (!(vdev-features  VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
+
 addr -= vdev-addr + VIRTIO_PCI_CONFIG;
 if (addr  (vdev-config_len - sizeof(val)))
return (uint32_t)-1;
@@ -276,6 +295,9 @@ static void virtio_config_writeb(void *o
 {
 VirtIODevice *vdev = opaque;
 uint8_t val = data;
+
+if (!(vdev-features  VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
 
 addr -= vdev-addr + VIRTIO_PCI_CONFIG;
 if (addr  (vdev-config_len - sizeof(val)))
@@ -292,6 +314,9 @@ static void virtio_config_writew(void *o
 VirtIODevice *vdev = opaque;
 uint16_t val = data;
 
+if (!(vdev-features  VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
+
 addr -= vdev-addr + VIRTIO_PCI_CONFIG;
 if (addr  (vdev-config_len - sizeof(val)))
return;
@@ -306,6 +331,9 @@ static void virtio_config_writel(void *o
 {
 VirtIODevice *vdev = opaque;
 uint32_t val = data;
+
+if (!(vdev-features  VIRTIO_F_EXPLICIT_PAGE_SIZE))
+   addr += 4;
 
 addr -= vdev-addr + VIRTIO_PCI_CONFIG;
 if (addr  (vdev-config_len - sizeof(val)))
@@ -329,9 +357,10 @@ static void virtio_ioport_write(void *op
if (vdev-set_features)
vdev-set_features(vdev, val);
vdev-features = val;
+   printf(%s: features 0x%x\n, __func__, val);
break;
 case 

Re: dynamic virtio page size

2008-11-06 Thread Hollis Blanchard
On Thu, 2008-11-06 at 14:02 -0600, Anthony Liguori wrote:
 Hollis Blanchard wrote:
  I wanted to make sure people on non-x86 architectures couldn't run into
  vring-size related problems that didn't also appear on x86.

 
 Having a VRING_SHIFT and a VRING_PAGE_SIZE where VRING_PAGE_SIZE != (1 
  VRING_SHIFT) is almost certainly going to break things in unexpected 
 ways because it's going against common understanding of the relationship 
 between shift and page_size.

In the patch I sent, they are named VIRTIO_PFN_SHIFT and
VRING_PAGE_SIZE. In fact, the first could really be named
VIRTIO_PCI_PFN_SHIFT or even more specific, which hopefully would
alleviate the confusion.

Anyways, I've been trying to implement your other idea, which was to
advertise a virtio page size from the host to the guest. Since the
virtio IO space is only 20 bytes long, and it's full, we need to offer a
feature bit to notify the guest that the IO space has been expanded. 

   virtio's PCI IO resource

|20 bytes| |

   virtio-pci   virtio-device
specific  specific
  data  data

If guests don't acknowledge the feature, the host pretends the
VIRTIO_PAGE_SIZE register (which would be the new IO offset 20) doesn't
exist, and reads to offset 20 instead return the device-specific config
data.

Problem: the guest has not yet acknowledged the advertised features
before accessing device-specific data. So in the virtio-blk/virtio-pci
case, virtio_dev_probe() calls virtblk_probe() *before*
vp_finalize_features(). virtblk_probe() uses virtio_config_val(), which
returns the wrong data because the new yet unknown feature flag is still
set. (Of course, we can't just fix the ordering guest-side, because that
breaks backwards compatibility.)

It seems to me that virtio-pci configuration is not very flexible. I
think I'm going to continue revising the hardcoded page size patches I
sent earlier...

-- 
Hollis Blanchard
IBM Linux Technology Center

--
To unsubscribe from this list: send the line unsubscribe kvm-ppc in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html