Re: gfx card passthrough broken with latest head

2011-08-21 Thread André Weidemann

Hi,

On 13.08.2011 15:35, André Weidemann wrote:

Hi,
I pulled the latest git yesterday and noticed that graphics card passthrough
does not work anymore. The VM hangs (probably while initializing the gfx
card) when a graphics card is passed through to it.

I started to do a git bisect, but device assignment does not work at all
while doing it.

I started like this:

cd /usr/local/src
git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git
qemu-kvm-2011-08-12
cd qemu-kvm-2011-08-12
git bisect start
git bisect good fda19064e889d4419dd3dc69ca8e6e7a1535fdf5
git bisect bad 44755ea36fee3f0b1093ef27404def8857602274

Bisecting: 229 revisions left to test after this
[55c0975c5b358e948b9ae7bd7b07eff92508e756] Do constant folding for shift
operations.

./configure --audio-drv-list=alsa --target-list=x86_64-softmmu
--enable-kvm-device-assignment

ERROR: unknown option --enable-kvm-device-assignment

What am I doing wrong?


I continued with git bisect bad from this point on, but all the way back to
where I had started, no revision supported device assignment.


I then did a git bisect reset and started over again. This time I continued
with a few git bisect good steps until a revision came up that supported
device assignment, but after compiling and running qemu-system-x86_64, it
exited with a segmentation fault. I continued with git bisect good until the
end, but every revision that supported device assignment exited with a
segmentation fault.


How come so many revisions do not support device assignment? Is there a
trick to enable it?


Any help in this matter is greatly appreciated.

Regards
 André


Re: [PATCH 1/4] [NEW] cgroup test * general smoke_test + module-dependent subtests (memory test included) * library for future use in other tests (kvm)

2011-08-21 Thread Lukáš Doktor

#SNIP


+    pwd = item.mk_cgroup()
+    if pwd == None:
+        logging.error("test_memory: Can't create cgroup")
+        return -1
+
+    logging.debug("test_memory: Memory filling test")
+
+    f = open('/proc/meminfo', 'r')


This is not a clean way to do it; it would be better to use a regular
expression. But this is absolutely not important.



OK; anyway, you are trying to get a get_mem_usage() function into utils.
I'll use it then.

+    mem = f.readline()
+    while not mem.startswith("MemFree"):
+        mem = f.readline()
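
For reference, a minimal sketch of the regex approach suggested above (the
helper name is illustrative only, not the utils function being discussed):

    import re

    def get_memfree_kb():
        # Return the MemFree value from /proc/meminfo in kB.
        with open('/proc/meminfo') as meminfo:
            match = re.search(r'^MemFree:\s+(\d+)\s+kB', meminfo.read(),
                              re.MULTILINE)
        if match is None:
            raise RuntimeError("MemFree not found in /proc/meminfo")
        return int(match.group(1))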


#SNIP


+        logging.error("cg.smoke_test[%d]: Can't remove cgroup directory",
+                      part)
+        return -1
+
+    # Finish the process
+    part += 1
+    ps.stdin.write('\n')
+    time.sleep(2)


There should be a bigger timeout. This sometimes causes problems: the
process ends the correct way, but not within the timeout.



OK, Lucas, can you please change it in the patchset (if you intend to accept
it)? A 10 second deadline seems safer, thanks.
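
A minimal sketch of a more forgiving wait, assuming a 10 second deadline and
polling instead of a fixed sleep (names are illustrative only):

    import time

    def wait_for_exit(ps, timeout=10, step=0.5):
        # Poll the subprocess until it exits or the deadline passes.
        deadline = time.time() + timeout
        while time.time() < deadline:
            if ps.poll() is not None:
                return True      # process finished within the deadline
            time.sleep(step)
        return False             # still running after the deadline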



+    if (ps.poll() == None):
+        logging.error("cg.smoke_test[%d]: Process is not finished", part)
+        return -1
+
+    return 0
+
+


#SNIP

Thank you, Jiří.

kind regards,
Lukáš


Host where KSM appears to save a negative amount of memory

2011-08-21 Thread Chris Webb
We're running KSM on kernel 2.6.39.2 with hosts running a number of qemu-kvm
virtual machines, and it has consistently been saving us a useful amount of
RAM.

To monitor the effective amount of memory saved, I've been looking at the
difference between /sys/kernel/mm/ksm/pages_sharing and pages_shared. On a
typical 32GB host, this has been coming out as at least a hundred thousand
or so, which is presumably half to one gigabyte worth of 4k pages.

However, this morning we've spotted something odd - a host where
pages_sharing is smaller than pages_shared, giving a negative saving by the
above calculation:

  # cat /sys/kernel/mm/ksm/pages_sharing
  104
  # cat /sys/kernel/mm/ksm/pages_shared
  1761313

I think this means my interpretation of these values must be wrong, as I
presumably can't have more pages being shared than instances of their use!
Can anyone shed any light on what might be going on here for me? Am I
misinterpreting these values, or does this look like it might be an
accounting bug? (If the latter, what useful debug info can I extract from
the system to help identify it?)

Best wishes,

Chris.


[PATCH 1/2] kvm tools: Add helper to retrieve the field used in virtio config space

2011-08-21 Thread Sasha Levin
This patch adds a helper used to retrieve the type of field being accessed
when the guest is writing to or reading from the virtio config space.

Since the config space is dynamic, it may change during runtime - so we must
calculate it before every read/write.
Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/include/kvm/virtio.h |    6 ++
 tools/kvm/virtio/core.c        |   23 ++-
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index b962705..3442338 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -12,6 +12,10 @@
 #define VIRTIO_IRQ_LOW          0
 #define VIRTIO_IRQ_HIGH         1
 
+#define VIRTIO_PCI_O_CONFIG     0
+#define VIRTIO_PCI_O_MSIX       1
+#define VIRTIO_PCI_O_FEATURES   2
+
 struct virt_queue {
        struct vring    vring;
        u32             pfn;
@@ -56,4 +60,6 @@ u16 virt_queue__get_inout_iov(struct kvm *kvm, struct virt_queue *queue,
 
 void virt_queue__trigger_irq(struct virt_queue *vq, int irq, u8 *isr, struct kvm *kvm);
 
+int virtio__get_dev_specific_field(int offset, bool msix, bool features_hi, u32 *config_off);
+
 #endif /* KVM__VIRTIO_H */
diff --git a/tools/kvm/virtio/core.c b/tools/kvm/virtio/core.c
index d28cfc6..1398447 100644
--- a/tools/kvm/virtio/core.c
+++ b/tools/kvm/virtio/core.c
@@ -100,3 +100,24 @@ void virt_queue__trigger_irq(struct virt_queue *vq, int irq, u8 *isr, struct kvm
                kvm__irq_line(kvm, irq, VIRTIO_IRQ_HIGH);
        }
 }
+
+int virtio__get_dev_specific_field(int offset, bool msix, bool features_hi, u32 *config_off)
+{
+       if (msix) {
+               if (offset < 4)
+                       return VIRTIO_PCI_O_MSIX;
+               else
+                       offset -= 4;
+       }
+
+       if (features_hi) {
+               if (offset < 4)
+                       return VIRTIO_PCI_O_FEATURES;
+               else
+                       offset -= 4;
+       }
+
+       *config_off = offset;
+
+       return VIRTIO_PCI_O_CONFIG;
+}
-- 
1.7.6



[PATCH 2/2] kvm tools: Fix offset calculation for config space and MSI-X

2011-08-21 Thread Sasha Levin
This patch makes offsets for virtio config space and MSI-X dynamic.

The change should fix the wrong usage of MSI-X space as virtio
config space.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/virtio/net.c |   78 +++-
 1 files changed, 57 insertions(+), 21 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 96858b7..aa4536b 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -65,6 +65,7 @@ struct net_dev {
        u32             gsis[VIRTIO_NET_NUM_QUEUES];
        u32             msix_io_block;
        int             compat_id;
+       bool            msix_enabled;
 
        pthread_t       io_rx_thread;
        pthread_mutex_t io_rx_lock;
@@ -176,17 +177,67 @@ static void *virtio_net_tx_thread(void *p)
 
 }
 
+static bool virtio_net_pci_io_device_specific_out(struct kvm *kvm, void *data,
+                                                  unsigned long offset, int size)
+{
+       u8 *config_space = (u8 *)&ndev.config;
+       int type;
+       u32 config_offset;
+
+       type = virtio__get_dev_specific_field(offset - 20, ndev.msix_enabled, 0, &config_offset);
+       if (type == VIRTIO_PCI_O_MSIX) {
+               if (offset == VIRTIO_MSI_CONFIG_VECTOR) {
+                       ndev.config_vector = ioport__read16(data);
+               } else {
+                       u32 gsi;
+                       u32 vec;
+
+                       vec = ndev.vq_vector[ndev.queue_selector] = ioport__read16(data);
+
+                       gsi = irq__add_msix_route(kvm,
+                                                 pci_header.msix.table[vec].low,
+                                                 pci_header.msix.table[vec].high,
+                                                 pci_header.msix.table[vec].data);
+
+                       ndev.gsis[ndev.queue_selector] = gsi;
+               }
+               return true;
+       }
+
+       if (size != 1)
+               return false;
+
+       if ((config_offset) > sizeof(struct virtio_net_config))
+               pr_error("config offset is too big: %u", config_offset);
+
+       config_space[config_offset] = *(u8 *)data;
+
+       return true;
+}
+
 static bool virtio_net_pci_io_device_specific_in(void *data, unsigned long offset, int size)
 {
        u8 *config_space = (u8 *)&ndev.config;
+       int type;
+       u32 config_offset;
+
+       type = virtio__get_dev_specific_field(offset - 20, ndev.msix_enabled, 0, &config_offset);
+       if (type == VIRTIO_PCI_O_MSIX) {
+               if (offset == VIRTIO_MSI_CONFIG_VECTOR)
+                       ioport__write16(data, ndev.config_vector);
+               else
+                       ioport__write16(data, ndev.vq_vector[ndev.queue_selector]);
+
+               return true;
+       }
 
        if (size != 1)
                return false;
 
-       if ((offset - VIRTIO_MSI_CONFIG_VECTOR) > sizeof(struct virtio_net_config))
-               pr_error("config offset is too big: %li", offset - VIRTIO_MSI_CONFIG_VECTOR);
+       if ((config_offset) > sizeof(struct virtio_net_config))
+               pr_error("config offset is too big: %u", config_offset);
 
-       ioport__write8(data, config_space[offset - VIRTIO_MSI_CONFIG_VECTOR]);
+       ioport__write8(data, config_space[config_offset]);
 
        return true;
 }
@@ -290,25 +341,8 @@ static bool virtio_net_pci_io_out(struct ioport *ioport, struct kvm *kvm, u16 po
        case VIRTIO_PCI_STATUS:
                ndev.status = ioport__read8(data);
                break;
-       case VIRTIO_MSI_CONFIG_VECTOR:
-               ndev.config_vector = ioport__read16(data);
-               break;
-       case VIRTIO_MSI_QUEUE_VECTOR: {
-               u32 gsi;
-               u32 vec;
-
-               vec = ndev.vq_vector[ndev.queue_selector] = ioport__read16(data);
-
-               gsi = irq__add_msix_route(kvm,
-                                         pci_header.msix.table[vec].low,
-                                         pci_header.msix.table[vec].high,
-                                         pci_header.msix.table[vec].data);
-
-               ndev.gsis[ndev.queue_selector] = gsi;
-               break;
-       }
        default:
-               ret = false;
+               ret = virtio_net_pci_io_device_specific_out(kvm, data, offset, size);
        };
 
        mutex_unlock(&ndev.mutex);
@@ -333,6 +367,8 @@ static void callback_mmio(u64 addr, u8 *data, u32 len, u8 is_write, void *ptr)
                memcpy(table + addr - ndev.msix_io_block, data, len);
        else
                memcpy(data, table + addr - ndev.msix_io_block, len);
+
+       ndev.msix_enabled = 1;
 }
 
 static bool virtio_net__tap_init(const struct virtio_net_parameters *params)
-- 
1.7.6


Re: Host where KSM appears to save a negative amount of memory

2011-08-21 Thread Hugh Dickins
On Sun, 21 Aug 2011, Chris Webb wrote:

 We're running KSM on kernel 2.6.39.2 with hosts running a number qemu-kvm
 virtual machines, and it has consistently been saving us a useful amount of
 RAM.
 
 To monitor the effective amount of memory saved, I've been looking at the
 difference between /sys/kernel/mm/ksm/pages_sharing and pages_shared. On a
 typical 32GB host, this has been coming out as at least a hundred thousand
 or so, which is presumably half to one gigabyte worth of 4k pages.
 
 However, this morning we've spotted something odd - a host where
 pages_sharing is smaller than pages_shared, giving a negative saving by the
 above calculation:
 
   # cat /sys/kernel/mm/ksm/pages_sharing
   104
   # cat /sys/kernel/mm/ksm/pages_shared
   1761313
 
 I think this means my interpretation of these values must be wrong, as I
 presumably can't have more pages being shared than instances of their use!
 Can anyone shed any light on what might be going on here for me? Am I
 misinterpreting these values, or does this look like it might be an
 accounting bug? (If the latter, what useful debug info can I extract from
 the system to help identify it?)

Your interpretation happens to be wrong; this is expected behaviour,
but I agree it's a little odd.

KSM chooses to show the numbers pages_shared and pages_sharing as
exclusive counts: pages_sharing indicates the saving being made.  So it
would be perfectly reasonable to add those two numbers together to get
the total number of pages sharing, the number you expected it to show;
but it doesn't make sense to subtract shared from sharing.

(I think Documentation/vm/ksm.txt does make that clear.)
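
For illustration, a minimal sketch (assuming the exclusive counts described
above and 4 KiB pages) of turning the two counters into a saving estimate:

    PAGE_SIZE = 4096  # assuming 4 KiB pages

    def read_ksm(name):
        with open('/sys/kernel/mm/ksm/' + name) as f:
            return int(f.read())

    pages_shared = read_ksm('pages_shared')    # KSM pages kept in the shared tree
    pages_sharing = read_ksm('pages_sharing')  # additional mappings deduplicated into them

    total_users = pages_shared + pages_sharing  # total number of pages sharing
    saved_bytes = pages_sharing * PAGE_SIZE     # the saving being made

    print("saving %.1f MiB across %d page users" % (saved_bytes / 1048576.0,
                                                    total_users))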

But you'd be right to question further, how come pages_sharing is less
than pages_shared: what is a shared page if it's not being shared with
anything else?  (And, at the extreme, it might be that all those 104
pages_sharing are actually sharing the same one of the pages_shared.)

It's a page that was shared with (at least one) others before, but all
but one of these instances have got freed since, and we've left this
page in the shared tree, so that it can be more quickly matched up
with duplicates in future when they appear, as seems quite likely.

We don't actively do anything to move them out of the shared state:
some effort was needed to get them there, and no disadvantage in leaving
them like that; but yes, it is misleading to describe them as shared.

Hugh


Re: [PATCH] virtio-blk: Add stats VQ to collect information about devices

2011-08-21 Thread Anthony Liguori

On 08/18/2011 11:29 AM, Sasha Levin wrote:

On Thu, 2011-08-18 at 08:10 -0700, Avi Kivity wrote:

On 08/17/2011 09:38 PM, Sasha Levin wrote:

On Wed, 2011-08-17 at 16:00 -0700, Avi Kivity wrote:

  On 08/16/2011 12:47 PM, Sasha Levin wrote:
 This patch adds support for an optional stats vq that works similarly to the
 stats vq provided by virtio-balloon.
  
 The purpose of this change is to allow collection of statistics about 
working
 virtio-blk devices to easily analyze performance without having to tap 
into
 the guest.
  
  

  Why can't you get the same info from the host?  i.e. read sectors?


Some of the stats you can collect from the host, but some you can't.

The ones you can't include all the timing statistics and the internal
queue statistics (read/write merges).


Surely you can time the actual amount of time the I/O takes?  It doesn't
account for the virtio round-trip, but does it matter?

Why is the merge count important for the host?



I assumed that the time the request spends in the virtio layer is
(somewhat) significant, especially since this is something that adds up
over time.

Merge count can be useful for several testing scenarios (I'll describe
the reasoning behind this patch below).



The idea behind providing all of the stats on the stats vq (which is
basically what you see in '/sys/block/[device]/stat') is to give a
consistent snapshot of the state of the device.
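
For context, a minimal sketch of the counters being discussed, read from the
guest's per-device stat file (assuming the usual eleven-field layout of
/sys/block/<dev>/stat):

    FIELDS = ('read_ios', 'read_merges', 'read_sectors', 'read_ticks',
              'write_ios', 'write_merges', 'write_sectors', 'write_ticks',
              'in_flight', 'io_ticks', 'time_in_queue')

    def read_block_stats(dev):
        # Parse /sys/block/<dev>/stat into a dict (illustrative only).
        with open('/sys/block/%s/stat' % dev) as f:
            values = [int(v) for v in f.read().split()]
        return dict(zip(FIELDS, values[:len(FIELDS)]))

    # e.g. read_block_stats('vda')['write_merges']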




What can you do with it?



I was actually planning on submitting another patch that would add
something similar into virtio-net. My plan was to enable collecting
statistics regarding memory, network and disk usage in a simple manner
without accessing guests.


Why not just add an interface that lets you read files from a guest,
either via a guest agent (like qemu-ga) or a purpose-built PV device?


That would let you access the guest's full sysfs, which seems quite a lot
more useful in the long term than adding a bunch of specific interfaces.


Regards,

Anthony Liguori


Re: [PATCH] virtio-net: Read MAC only after initializing MSI-X

2011-08-21 Thread Rusty Russell
On Fri, 19 Aug 2011 18:23:35 +0300, Michael S. Tsirkin m...@redhat.com 
wrote:
 On Sat, Aug 13, 2011 at 11:51:01AM +0300, Sasha Levin wrote:
  The MAC of a virtio-net device is located at the first field of the device
  specific header. This header is located at offset 20 if the device doesn't
  support MSI-X or offset 24 if it does.
  
  Current code in virtnet_probe() used to probe the MAC before checking for
  MSI-X, which means that the read was always made from offset 20 regardless
  of whether MSI-X in enabled or not.
  
  This patch moves the MAC probe to after the detection of whether MSI-X is
  enabled. This way the MAC will be read from offset 24 if the device indeed
  supports MSI-X.
  
  Cc: Rusty Russell ru...@rustcorp.com.au
  Cc: Michael S. Tsirkin m...@redhat.com
  Cc: virtualizat...@lists.linux-foundation.org
  Cc: net...@vger.kernel.org
  Cc: kvm@vger.kernel.org
  Signed-off-by: Sasha Levin levinsasha...@gmail.com
 
 I am not sure I see a bug in virtio: the config space layout simply
 changes as msix is enabled and disabled (and if you look at the latest
 draft, also on whether 64 bit features are enabled).
 It doesn't depend on the msix capability being present in the device.
 
 The spec seems to be explicit enough:
   If MSI-X is enabled for the device, two additional fields immediately
   follow this header.
 
 So I'm guessing the bug is in kvm tools which assume
 same layout for when msix is enabled and disabled.
 qemu-kvm seems to do the right thing so the device
 seems to get the correct mac.

So, the config space moves once MSI-X is enabled?  In which case, it
should say ONCE MSI-X is enabled...
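
A minimal sketch of the layout dependency being discussed, assuming the
20-byte legacy virtio PCI header and the two 16-bit MSI-X vector fields:

    VIRTIO_PCI_CONFIG_BASE = 20  # features, queue PFN/size/sel/notify, status, ISR
    MSIX_VECTOR_FIELDS = 4       # config vector + queue vector, 16 bits each

    def device_config_offset(msix_enabled):
        # Offset of the device-specific config (the MAC comes first for virtio-net).
        return VIRTIO_PCI_CONFIG_BASE + (MSIX_VECTOR_FIELDS if msix_enabled else 0)

    assert device_config_offset(False) == 20
    assert device_config_offset(True) == 24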

Thanks,
Rusty.


DMI BIOS String

2011-08-21 Thread Derek
Hi Folks,

I could not track down any solid info on modifying the DMI BIOS string.

For example, in VirtualBox you can use 'VBoxManage setextradata' to set the
BIOS product and vendor strings per VM.

Any ideas if this is possible with KVM?

Thanks,
Derek


Re: kvm PCI assignment & VFIO ramblings

2011-08-21 Thread David Gibson
On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote:
 We had an extremely productive VFIO BoF on Monday.  Here's my attempt to
 capture the plan that I think we agreed to:
 
 We need to address both the description and enforcement of device
 groups.  Groups are formed any time the iommu does not have resolution
 between a set of devices.  On x86, this typically happens when a
 PCI-to-PCI bridge exists between the set of devices and the iommu.  For
 Power, partitionable endpoints define a group.  Grouping information
 needs to be exposed for both userspace and kernel internal usage.  This
 will be a sysfs attribute setup by the iommu drivers.  Perhaps:
 
 # cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
 42
 
 (I use a PCI example here, but attribute should not be PCI specific)
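
For illustration, a minimal sketch (assuming only the proposed per-device
iommu_group attribute, and using PCI as in the example above) of how
userspace could enumerate the members of a group from sysfs:

    import os

    def devices_in_group(group_id, bus_path='/sys/bus/pci/devices'):
        # Collect devices whose iommu_group attribute matches group_id.
        members = []
        for dev in os.listdir(bus_path):
            attr = os.path.join(bus_path, dev, 'iommu_group')
            if not os.path.exists(attr):
                continue
            with open(attr) as f:
                if f.read().strip() == str(group_id):
                    members.append(dev)
        return members

    # e.g. devices_in_group(42) -> ['0000:00:19.0', ...]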

Ok.  Am I correct in thinking these group IDs are representing the
minimum granularity, and are therefore always static, defined only by
the connected hardware, not by configuration?

 From there we have a few options.  In the BoF we discussed a model where
 binding a device to vfio creates a /dev/vfio$GROUP character device
 file.  This group fd provides provides dma mapping ioctls as well as
 ioctls to enumerate and return a device fd for each attached member of
 the group (similar to KVM_CREATE_VCPU).  We enforce grouping by
 returning an error on open() of the group fd if there are members of the
 group not bound to the vfio driver.  Each device fd would then support a
 similar set of ioctls and mapping (mmio/pio/config) interface as current
 vfio, except for the obvious domain and dma ioctls superseded by the
 group fd.

It seems a slightly strange distinction that the group device appears
when any device in the group is bound to vfio, but only becomes usable
when all devices are bound.

 Another valid model might be that /dev/vfio/$GROUP is created for all
 groups when the vfio module is loaded.  The group fd would allow open()
 and some set of iommu querying and device enumeration ioctls, but would
 error on dma mapping and retrieving device fds until all of the group
 devices are bound to the vfio driver.

Which is why I marginally prefer this model, although it's not a big
deal.

 In either case, the uiommu interface is removed entirely since dma
 mapping is done via the group fd.  As necessary in the future, we can
 define a more high performance dma mapping interface for streaming dma
 via the group fd.  I expect we'll also include architecture specific
 group ioctls to describe features and capabilities of the iommu.  The
 group fd will need to prevent concurrent open()s to maintain a 1:1 group
 to userspace process ownership model.

A 1:1 group-process correspondence seems wrong to me. But there are
many ways you could legitimately write the userspace side of the code,
many of them involving some sort of concurrency.  Implementing that
concurrency as multiple processes (using explicit shared memory and/or
other IPC mechanisms to co-ordinate) seems a valid choice that we
shouldn't arbitrarily prohibit.

Obviously, only one UID may be permitted to have the group open at a
time, and I think that's enough to prevent them doing any worse than
shooting themselves in the foot.

 Also on the table is supporting non-PCI devices with vfio.  To do this,
 we need to generalize the read/write/mmap and irq eventfd interfaces.
 We could keep the same model of segmenting the device fd address space,
 perhaps adding ioctls to define the segment offset bit position or we
 could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0),
 VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already
 suffering some degree of fd bloat (group fd, device fd(s), interrupt
 event fd(s), per resource fd, etc).  For interrupts we can overload
 VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq 

Sounds reasonable.

 (do non-PCI
 devices support MSI?).

They can.  Obviously they might not have exactly the same semantics as
PCI MSIs, but I know we have SoC systems with (non-PCI) on-die devices
whose interrupts are treated by the (also on-die) root interrupt
controller in the same way as PCI MSIs.

 For qemu, these changes imply we'd only support a model where we have a
 1:1 group to iommu domain.  The current vfio driver could probably
 become vfio-pci as we might end up with more target specific vfio
 drivers for non-pci.  PCI should be able to maintain a simple -device
 vfio-pci,host=bb:dd.f to enable hotplug of individual devices.  We'll
 need to come up with extra options when we need to expose groups to
 guest for pvdma.

Are you saying that you'd no longer support the current x86 usage of
putting all of one guest's devices into a single domain?  If that's
not what you're saying, how would the domains - now made up of a
user's selection of groups, rather than individual devices - be
configured?

 Hope that captures it, feel free to jump in with corrections and
 suggestions.  Thanks,

-- 
David Gibson