Re: gfx card passthrough broken with latest head
Hi,

On 13.08.2011 15:35, André Weidemann wrote:

Hi, I pulled the latest git yesterday and noticed that graphics card passthrough does not work anymore. The VM hangs (probably while initializing the gfx card) when a graphics card is passed through to it. I started a git bisect, but device assignment does not work at all while bisecting. I started like this:

cd /usr/local/src
git clone git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git qemu-kvm-2011-08-12
cd qemu-kvm-2011-08-12
git bisect start
git bisect good fda19064e889d4419dd3dc69ca8e6e7a1535fdf5
git bisect bad 44755ea36fee3f0b1093ef27404def8857602274
Bisecting: 229 revisions left to test after this
[55c0975c5b358e948b9ae7bd7b07eff92508e756] Do constant folding for shift operations.

./configure --audio-drv-list=alsa --target-list=x86_64-softmmu --enable-kvm-device-assignment
ERROR: unknown option --enable-kvm-device-assignment

What am I doing wrong? I continued with "git bisect bad" from this point on, but up to where I started, no revision supported device assignment. I then did a "git bisect reset" and started over. This time I continued with a few "git bisect good" until a revision came up that supported device assignment, but after compiling it, qemu-system-x86_64 exited with a segmentation fault. I continued with "git bisect good" until the end, but every revision that supported device assignment exited with a segmentation fault.

How come so many revisions do not support device assignment? Is there a trick to enable it? Any help in this matter is greatly appreciated.

Regards
André
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
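For reference (not from the thread): "git bisect run" can automate this kind of bisection, and a test script that exits with code 125 tells git to skip a revision that cannot be tested at all (e.g. one where the configure option doesn't exist), instead of marking it good or bad. A self-contained sketch using a throwaway repository and a synthetic test.sh in place of a real build-and-boot test:

```shell
#!/bin/sh
# Demonstrate 'git bisect run' with exit code 125 to skip untestable
# revisions. The repo and test are stand-ins for a real qemu-kvm build.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email bisect@example.com
git config user.name bisect

for i in 1 2 3 4 5 6; do
    echo "$i" > ver
    git add ver
    git commit -qm "commit $i"
done

first=$(git rev-list --reverse HEAD | head -n 1)

cat > test.sh <<'EOF'
#!/bin/sh
v=$(cat ver)
[ "$v" -eq 2 ] && exit 125   # pretend this revision does not build: skip it
[ "$v" -ge 4 ] && exit 1     # the "bug" (introduced in commit 4) is present
exit 0
EOF
chmod +x test.sh

git bisect start HEAD "$first" > /dev/null
git bisect run ./test.sh > /dev/null 2>&1 || true

# After the run, refs/bisect/bad points at the first bad commit.
first_bad=$(git log -1 --format=%s refs/bisect/bad)
echo "first bad: $first_bad"
```

With the synthetic history above, the run identifies "commit 4" as the first bad commit even though commit 2 could not be tested.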
Re: [PATCH 1/4] [NEW] cgroup test * general smoke_test + module-dependent subtests (memory test included) * library for future use in other tests (kvm)
#SNIP

+        pwd = item.mk_cgroup()
+        if pwd == None:
+            logging.error("test_memory: Can't create cgroup")
+            return -1
+
+        logging.debug("test_memory: Memory filling test")
+
+        f = open('/proc/meminfo','r')

This is not a clean way to do it; it would be better to use a regular expression. But that's really not important.

OK; anyway, you were trying to get a get_mem_usage() function into utils. I'll use that then.

+        mem = f.readline()
+        while not mem.startswith("MemFree"):
+            mem = f.readline()

#SNIP

+            logging.error("cg.smoke_test[%d]: Can't remove cgroup directory",
+                          part)
+            return -1
+
+        # Finish the process
+        part += 1
+        ps.stdin.write('\n')
+        time.sleep(2)

There should be a bigger timeout here; this sometimes causes problems: the process ends the correct way, but not within the timeout.

OK. Lucas, can you please change it in the patchset (if you intend to accept it)? 10 seconds seems safer. Thanks.

+        if (ps.poll() == None):
+            logging.error("cg.smoke_test[%d]: Process is not finished", part)
+            return -1
+
+        return 0
+
#SNIP

Thank you, Jiří.

kind regards,
Lukáš
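The reviewer's suggestion (matching the MemFree line directly instead of looping over readline()) can be sketched like this; the sample text stands in for the real /proc/meminfo so the snippet is self-contained:

```shell
# Sample /proc/meminfo content, so this runs anywhere.
meminfo='MemTotal:       32817748 kB
MemFree:         1234567 kB
Buffers:          345678 kB'

# Match the MemFree line in one pass rather than reading line by line.
memfree_kb=$(printf '%s\n' "$meminfo" | awk '/^MemFree:/ {print $2}')
echo "MemFree: ${memfree_kb} kB"
```

The same one-pass-match idea applies to the Python patch under review, whether done with a regular expression or a shared get_mem_usage() helper.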
Host where KSM appears to save a negative amount of memory
We're running KSM on kernel 2.6.39.2 with hosts running a number of qemu-kvm virtual machines, and it has consistently been saving us a useful amount of RAM. To monitor the effective amount of memory saved, I've been looking at the difference between /sys/kernel/mm/ksm/pages_sharing and pages_shared. On a typical 32GB host, this has been coming out as at least a hundred thousand or so, which is presumably half to one gigabyte worth of 4k pages.

However, this morning we've spotted something odd - a host where pages_sharing is smaller than pages_shared, giving a negative saving by the above calculation:

# cat /sys/kernel/mm/ksm/pages_sharing
104
# cat /sys/kernel/mm/ksm/pages_shared
1761313

I think this means my interpretation of these values must be wrong, as I presumably can't have more pages being shared than instances of their use! Can anyone shed any light on what might be going on here for me? Am I misinterpreting these values, or does this look like it might be an accounting bug? (If the latter, what useful debug info can I extract from the system to help identify it?)

Best wishes,

Chris.
[PATCH 1/2] kvm tools: Add helper to retrieve the field used in virtio config space
This patch adds a helper used to retrieve the type of field used when the guest is writing to or reading from the virtio config space. Since the config space is dynamic, it may change during runtime - so we must calculate it before every read/write.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/include/kvm/virtio.h |  6 ++
 tools/kvm/virtio/core.c        | 23 ++-
 2 files changed, 28 insertions(+), 1 deletions(-)

diff --git a/tools/kvm/include/kvm/virtio.h b/tools/kvm/include/kvm/virtio.h
index b962705..3442338 100644
--- a/tools/kvm/include/kvm/virtio.h
+++ b/tools/kvm/include/kvm/virtio.h
@@ -12,6 +12,10 @@
 #define VIRTIO_IRQ_LOW		0
 #define VIRTIO_IRQ_HIGH		1
 
+#define VIRTIO_PCI_O_CONFIG	0
+#define VIRTIO_PCI_O_MSIX	1
+#define VIRTIO_PCI_O_FEATURES	2
+
 struct virt_queue {
 	struct vring	vring;
 	u32		pfn;
@@ -56,4 +60,6 @@ u16 virt_queue__get_inout_iov(struct kvm *kvm, struct virt_queue *queue,
 void virt_queue__trigger_irq(struct virt_queue *vq, int irq, u8 *isr, struct kvm *kvm);
 
+int virtio__get_dev_specific_field(int offset, bool msix, bool features_hi, u32 *config_off);
+
 #endif /* KVM__VIRTIO_H */
diff --git a/tools/kvm/virtio/core.c b/tools/kvm/virtio/core.c
index d28cfc6..1398447 100644
--- a/tools/kvm/virtio/core.c
+++ b/tools/kvm/virtio/core.c
@@ -100,3 +100,24 @@ void virt_queue__trigger_irq(struct virt_queue *vq, int irq, u8 *isr, struct kvm
 		kvm__irq_line(kvm, irq, VIRTIO_IRQ_HIGH);
 	}
 }
+
+int virtio__get_dev_specific_field(int offset, bool msix, bool features_hi, u32 *config_off)
+{
+	if (msix) {
+		if (offset < 4)
+			return VIRTIO_PCI_O_MSIX;
+		else
+			offset -= 4;
+	}
+
+	if (features_hi) {
+		if (offset < 4)
+			return VIRTIO_PCI_O_FEATURES;
+		else
+			offset -= 4;
+	}
+
+	*config_off = offset;
+
+	return VIRTIO_PCI_O_CONFIG;
+}
-- 
1.7.6
[PATCH 2/2] kvm tools: Fix offset calculation for config space and MSI-X
This patch makes the offsets for the virtio config space and MSI-X dynamic. The change should fix the wrong usage of the MSI-X space as virtio config space.

Signed-off-by: Sasha Levin levinsasha...@gmail.com
---
 tools/kvm/virtio/net.c | 78 +++-
 1 files changed, 57 insertions(+), 21 deletions(-)

diff --git a/tools/kvm/virtio/net.c b/tools/kvm/virtio/net.c
index 96858b7..aa4536b 100644
--- a/tools/kvm/virtio/net.c
+++ b/tools/kvm/virtio/net.c
@@ -65,6 +65,7 @@ struct net_dev {
 	u32 gsis[VIRTIO_NET_NUM_QUEUES];
 	u32 msix_io_block;
 	int compat_id;
+	bool msix_enabled;
 
 	pthread_t io_rx_thread;
 	pthread_mutex_t io_rx_lock;
@@ -176,17 +177,67 @@ static void *virtio_net_tx_thread(void *p)
 }
 
+static bool virtio_net_pci_io_device_specific_out(struct kvm *kvm, void *data,
+						unsigned long offset, int size)
+{
+	u8 *config_space = (u8 *)&ndev.config;
+	int type;
+	u32 config_offset;
+
+	type = virtio__get_dev_specific_field(offset - 20, ndev.msix_enabled, 0, &config_offset);
+	if (type == VIRTIO_PCI_O_MSIX) {
+		if (offset == VIRTIO_MSI_CONFIG_VECTOR) {
+			ndev.config_vector = ioport__read16(data);
+		} else {
+			u32 gsi;
+			u32 vec;
+
+			vec = ndev.vq_vector[ndev.queue_selector] = ioport__read16(data);
+
+			gsi = irq__add_msix_route(kvm,
+						pci_header.msix.table[vec].low,
+						pci_header.msix.table[vec].high,
+						pci_header.msix.table[vec].data);
+
+			ndev.gsis[ndev.queue_selector] = gsi;
+		}
+		return true;
+	}
+
+	if (size != 1)
+		return false;
+
+	if (config_offset > sizeof(struct virtio_net_config))
+		pr_error("config offset is too big: %u", config_offset);
+
+	config_space[config_offset] = *(u8 *)data;
+
+	return true;
+}
+
 static bool virtio_net_pci_io_device_specific_in(void *data, unsigned long offset, int size)
 {
 	u8 *config_space = (u8 *)&ndev.config;
+	int type;
+	u32 config_offset;
+
+	type = virtio__get_dev_specific_field(offset - 20, ndev.msix_enabled, 0, &config_offset);
+	if (type == VIRTIO_PCI_O_MSIX) {
+		if (offset == VIRTIO_MSI_CONFIG_VECTOR)
+			ioport__write16(data, ndev.config_vector);
+		else
+			ioport__write16(data, ndev.vq_vector[ndev.queue_selector]);
+
+		return true;
+	}
 
 	if (size != 1)
 		return false;
 
-	if ((offset - VIRTIO_MSI_CONFIG_VECTOR) > sizeof(struct virtio_net_config))
-		pr_error("config offset is too big: %li", offset - VIRTIO_MSI_CONFIG_VECTOR);
+	if (config_offset > sizeof(struct virtio_net_config))
+		pr_error("config offset is too big: %u", config_offset);
 
-	ioport__write8(data, config_space[offset - VIRTIO_MSI_CONFIG_VECTOR]);
+	ioport__write8(data, config_space[config_offset]);
 
 	return true;
 }
@@ -290,25 +341,8 @@ static bool virtio_net_pci_io_out(struct ioport *ioport, struct kvm *kvm, u16 po
 	case VIRTIO_PCI_STATUS:
 		ndev.status = ioport__read8(data);
 		break;
-	case VIRTIO_MSI_CONFIG_VECTOR:
-		ndev.config_vector = ioport__read16(data);
-		break;
-	case VIRTIO_MSI_QUEUE_VECTOR: {
-		u32 gsi;
-		u32 vec;
-
-		vec = ndev.vq_vector[ndev.queue_selector] = ioport__read16(data);
-
-		gsi = irq__add_msix_route(kvm,
-					pci_header.msix.table[vec].low,
-					pci_header.msix.table[vec].high,
-					pci_header.msix.table[vec].data);
-
-		ndev.gsis[ndev.queue_selector] = gsi;
-		break;
-	}
 	default:
-		ret = false;
+		ret = virtio_net_pci_io_device_specific_out(kvm, data, offset, size);
 	};
 
 	mutex_unlock(&ndev.mutex);
@@ -333,6 +367,8 @@ static void callback_mmio(u64 addr, u8 *data, u32 len, u8 is_write, void *ptr)
 		memcpy(table + addr - ndev.msix_io_block, data, len);
 	else
 		memcpy(data, table + addr - ndev.msix_io_block, len);
+
+	ndev.msix_enabled = 1;
 }
 
 static bool virtio_net__tap_init(const struct virtio_net_parameters *params)
-- 
1.7.6
Re: Host where KSM appears to save a negative amount of memory
On Sun, 21 Aug 2011, Chris Webb wrote:

We're running KSM on kernel 2.6.39.2 with hosts running a number of qemu-kvm virtual machines, and it has consistently been saving us a useful amount of RAM. To monitor the effective amount of memory saved, I've been looking at the difference between /sys/kernel/mm/ksm/pages_sharing and pages_shared. On a typical 32GB host, this has been coming out as at least a hundred thousand or so, which is presumably half to one gigabyte worth of 4k pages. However, this morning we've spotted something odd - a host where pages_sharing is smaller than pages_shared, giving a negative saving by the above calculation:

# cat /sys/kernel/mm/ksm/pages_sharing
104
# cat /sys/kernel/mm/ksm/pages_shared
1761313

I think this means my interpretation of these values must be wrong, as I presumably can't have more pages being shared than instances of their use! Can anyone shed any light on what might be going on here for me? Am I misinterpreting these values, or does this look like it might be an accounting bug? (If the latter, what useful debug info can I extract from the system to help identify it?)

Your interpretation happens to be wrong; it is expected behaviour, but I agree it's a little odd. KSM chooses to show the numbers pages_shared and pages_sharing as exclusive counts: pages_sharing indicates the saving being made. So it would be perfectly reasonable to add those two numbers together to get the total number of pages sharing - the number you expected it to show - but it doesn't make sense to subtract shared from sharing. (I think Documentation/vm/ksm.txt does make that clear.)

But you'd be right to question further how pages_sharing can be less than pages_shared: what is a shared page if it's not being shared with anything else? (And, at the extreme, it might be that all those 104 pages_sharing are actually sharing the same one of the pages_shared.)
It's a page that was shared with (at least one) other before, but all but one of those instances have been freed since, and we've left the page in the shared tree so that it can be more quickly matched up with duplicates when they appear in future, as seems quite likely. We don't actively do anything to move them out of the shared state: some effort was needed to get them there, and there's no disadvantage in leaving them like that; but yes, it is misleading to describe them as shared.

Hugh
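To make the arithmetic from this thread concrete (literal values stand in for the sysfs reads so the sketch is self-contained): pages_sharing alone is the saving, and shared + sharing is the total number of pages participating in sharing.

```shell
# Values as read from /sys/kernel/mm/ksm/ on the host in question.
pages_shared=1761313    # KSM pages kept in the shared (stable) tree
pages_sharing=104       # extra references to those pages, i.e. the saving

page_kb=4               # assuming 4k pages

saved_kb=$((pages_sharing * page_kb))
total_sharing=$((pages_shared + pages_sharing))

echo "pages saved:            $pages_sharing (${saved_kb} kB)"
echo "total pages in sharing: $total_sharing"
```

So on this host the saving is tiny (about 416 kB), even though the stable tree holds many candidate pages; subtracting the two numbers is never meaningful.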
Re: [PATCH] virtio-blk: Add stats VQ to collect information about devices
On 08/18/2011 11:29 AM, Sasha Levin wrote:
On Thu, 2011-08-18 at 08:10 -0700, Avi Kivity wrote:
On 08/17/2011 09:38 PM, Sasha Levin wrote:
On Wed, 2011-08-17 at 16:00 -0700, Avi Kivity wrote:
On 08/16/2011 12:47 PM, Sasha Levin wrote:

This patch adds support for an optional stats vq that works similarly to the stats vq provided by virtio-balloon. The purpose of this change is to allow collection of statistics about working virtio-blk devices, to easily analyze performance without having to tap into the guest.

Why can't you get the same info from the host? i.e. read sectors?

Some of the stats you can collect from the host, but some you can't. The ones you can't include all the timing statistics and the internal queue statistics (read/write merges).

Surely you can time the actual amount of time the I/O takes? It doesn't account for the virtio round-trip, but does it matter? Why is the merge count important for the host?

I assumed that the time the request spends in the virtio layer is (somewhat) significant, especially since this is something that adds up over time. The merge count can be useful for several testing scenarios (I'll describe the reasoning behind this patch below). The idea behind providing all of the stats on the stats vq (which is basically what you see in '/sys/block/[device]/stat') is to give a consistent snapshot of the state of the device.

What can you do with it?

I was actually planning on submitting another patch that would add something similar into virtio-net. My plan was to enable collecting statistics regarding memory, network and disk usage in a simple manner without accessing guests.

Why not just add an interface that lets you read files from a guest, either via a guest agent (like qemu-ga) or a purpose-built PV device? That would let you access the guest's full sysfs, which seems to be quite a lot more useful long term than adding a bunch of specific interfaces.
Regards,

Anthony Liguori
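For context (not from the thread): the per-device counters the guest sees in /sys/block/<dev>/stat already include the merge counts discussed above; the debate is about exposing the guest's view to the host. A self-contained sketch of decoding that line, using a sample in place of a real device (field order as documented in the kernel's Documentation/block/stat.txt):

```shell
# Sample /sys/block/<dev>/stat line: 11 whitespace-separated counters.
stat_line='8320 1427 244314 5899 1374 3752 47440 3712 0 6450 9611'

# Fields 1-4 are reads (I/Os, merges, sectors, ticks);
# fields 5-8 are writes; 9-11 are in_flight, io_ticks, time_in_queue.
set -- $stat_line
reads=$1;  read_merges=$2;  read_sectors=$3
writes=$5; write_merges=$6; write_sectors=$7

echo "reads:  $reads I/Os, $read_merges merges, $read_sectors sectors"
echo "writes: $writes I/Os, $write_merges merges, $write_sectors sectors"
```

Read from the guest, this is exactly the snapshot the proposed stats vq would carry; read on the host for the backing device, the merge and queueing numbers differ, which is the gap Sasha is pointing at.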
Re: [PATCH] virtio-net: Read MAC only after initializing MSI-X
On Fri, 19 Aug 2011 18:23:35 +0300, Michael S. Tsirkin m...@redhat.com wrote:
On Sat, Aug 13, 2011 at 11:51:01AM +0300, Sasha Levin wrote:

The MAC of a virtio-net device is located in the first field of the device-specific header. This header is located at offset 20 if the device doesn't support MSI-X, or offset 24 if it does. Current code in virtnet_probe() used to probe the MAC before checking for MSI-X, which means that the read was always made from offset 20 regardless of whether MSI-X is enabled or not. This patch moves the MAC probe to after the detection of whether MSI-X is enabled. This way the MAC will be read from offset 24 if the device indeed supports MSI-X.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc: virtualizat...@lists.linux-foundation.org
Cc: net...@vger.kernel.org
Cc: kvm@vger.kernel.org
Signed-off-by: Sasha Levin levinsasha...@gmail.com

I am not sure I see a bug in virtio: the config space layout simply changes as MSI-X is enabled and disabled (and, if you look at the latest draft, also on whether 64-bit features are enabled). It doesn't depend on the MSI-X capability being present in the device. The spec seems to be explicit enough: "If MSI-X is enabled for the device, two additional fields immediately follow this header." So I'm guessing the bug is in kvm tools, which assumes the same layout whether MSI-X is enabled or disabled. qemu-kvm seems to do the right thing, so the device gets the correct MAC.

So, the config space moves once MSI-X is enabled? In which case, it should say "ONCE MSI-X is enabled"...

Thanks,
Rusty.
DMI BIOS String
Hi Folks,

I could not track down any solid info on modifying the DMI BIOS string. For example, in VirtualBox you can use 'vboxmanage setextradata' to set the BIOS product and vendor string per VM. Any ideas if this is possible with KVM?

Thanks,
Derek
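For what it's worth (an answer sketch, not from the thread): with KVM via qemu, the SMBIOS/DMI strings can typically be set per VM on the command line with qemu's -smbios option. The field names below are best-effort from memory, so check the qemu-system-x86_64 man page of your version for the exact syntax:

```sh
# Hypothetical example: set type 0 (BIOS) and type 1 (System) DMI strings.
qemu-system-x86_64 \
    -smbios type=0,vendor="Acme BIOS Inc",version="1.2.3" \
    -smbios type=1,manufacturer="Acme",product="SuperServer 9000",serial="ABC123" \
    ... # remaining VM options unchanged
```

Inside the guest, `dmidecode -t 1` should then report the overridden strings.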
Re: kvm PCI assignment VFIO ramblings
On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote:

We had an extremely productive VFIO BoF on Monday. Here's my attempt to capture the plan that I think we agreed to:

We need to address both the description and enforcement of device groups. Groups are formed any time the iommu does not have resolution between a set of devices. On x86, this typically happens when a PCI-to-PCI bridge exists between the set of devices and the iommu. For Power, partitionable endpoints define a group. Grouping information needs to be exposed for both userspace and kernel internal usage. This will be a sysfs attribute setup by the iommu drivers. Perhaps:

# cat /sys/devices/pci0000:00/0000:00:19.0/iommu_group
42

(I use a PCI example here, but the attribute should not be PCI specific.)

Ok. Am I correct in thinking these group IDs represent the minimum granularity, and are therefore always static, defined only by the connected hardware, not by configuration?

From there we have a few options. In the BoF we discussed a model where binding a device to vfio creates a /dev/vfio$GROUP character device file. This group fd provides dma mapping ioctls as well as ioctls to enumerate and return a device fd for each attached member of the group (similar to KVM_CREATE_VCPU). We enforce grouping by returning an error on open() of the group fd if there are members of the group not bound to the vfio driver. Each device fd would then support a similar set of ioctls and mapping (mmio/pio/config) interface as current vfio, except for the obvious domain and dma ioctls superseded by the group fd.

It seems a slightly strange distinction that the group device appears when any device in the group is bound to vfio, but only becomes usable when all devices are bound. Another valid model might be that /dev/vfio/$GROUP is created for all groups when the vfio module is loaded.
The group fd would allow open() and some set of iommu querying and device enumeration ioctls, but would error on dma mapping and retrieving device fds until all of the group devices are bound to the vfio driver.

Which is why I marginally prefer this model, although it's not a big deal.

In either case, the uiommu interface is removed entirely since dma mapping is done via the group fd. As necessary in the future, we can define a higher-performance dma mapping interface for streaming dma via the group fd. I expect we'll also include architecture-specific group ioctls to describe features and capabilities of the iommu. The group fd will need to prevent concurrent open()s to maintain a 1:1 group to userspace process ownership model.

A 1:1 group-process correspondence seems wrong to me. There are many ways you could legitimately write the userspace side of the code, many of them involving some sort of concurrency. Implementing that concurrency as multiple processes (using explicit shared memory and/or other IPC mechanisms to co-ordinate) seems a valid choice that we shouldn't arbitrarily prohibit. Obviously, only one UID may be permitted to have the group open at a time, and I think that's enough to prevent them doing any worse than shooting themselves in the foot.

Also on the table is supporting non-PCI devices with vfio. To do this, we need to generalize the read/write/mmap and irq eventfd interfaces. We could keep the same model of segmenting the device fd address space, perhaps adding ioctls to define the segment offset bit position, or we could split each region into its own fd (VFIO_GET_PCI_BAR_FD(0), VFIO_GET_PCI_CONFIG_FD(), VFIO_GET_MMIO_FD(3)), though we're already suffering some degree of fd bloat (group fd, device fd(s), interrupt event fd(s), per-resource fd, etc). For interrupts we can overload VFIO_SET_IRQ_EVENTFD to be either PCI INTx or non-PCI irq

Sounds reasonable.

(do non-PCI devices support MSI?)

They can.
Obviously they might not have exactly the same semantics as PCI MSIs, but I know we have SoC systems with (non-PCI) on-die devices whose interrupts are treated by the (also on-die) root interrupt controller in the same way as PCI MSIs.

For qemu, these changes imply we'd only support a model where we have a 1:1 group to iommu domain. The current vfio driver could probably become vfio-pci, as we might end up with more target-specific vfio drivers for non-PCI. PCI should be able to maintain a simple -device vfio-pci,host=bb:dd.f to enable hotplug of individual devices. We'll need to come up with extra options when we need to expose groups to the guest for pvdma.

Are you saying that you'd no longer support the current x86 usage of putting all of one guest's devices into a single domain? If that's not what you're saying, how would the domains - now made up of a user's selection of groups, rather than individual devices - be configured?

Hope that captures it; feel free to jump in with corrections and suggestions.

Thanks,

-- 
David Gibson
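The proposed per-device iommu_group attribute lends itself to simple userspace discovery. A self-contained sketch of collecting the members of a group, run against a mock directory tree since the attribute described above is only a proposal at this point:

```shell
# Build a mock device tree carrying the proposed per-device iommu_group
# attribute, then list which devices share a group (and so must be
# assigned to vfio together).
root=$(mktemp -d)
for dev in 0000:00:19.0 0000:01:00.0 0000:01:00.1; do
    mkdir -p "$root/$dev"
done
echo 42 > "$root/0000:00:19.0/iommu_group"
echo 7  > "$root/0000:01:00.0/iommu_group"
echo 7  > "$root/0000:01:00.1/iommu_group"

# Collect all devices whose iommu_group attribute reads 7.
group7=$(for d in "$root"/*; do
    [ "$(cat "$d/iommu_group")" = 7 ] && basename "$d"
done | tr '\n' ' ')
echo "group 7 members: $group7"
```

Against real hardware, $root would be the sysfs devices tree, and userspace (or libvirt) would use exactly this kind of scan to verify that every member of a group is bound to vfio before opening /dev/vfio/$GROUP.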