Re: [PATCH V3 3/3] virtio-blk: Add bio-based IO path for virtio-blk
On Fri, 13 Jul 2012 16:38:51 +0800, Asias He as...@redhat.com wrote: This patch introduces bio-based IO path for virtio-blk. Acked-by: Rusty Russell ru...@rustcorp.com.au I just hope we can do better than a module option in future. Thanks, Rusty. ___ Virtualization mailing list Virtualization@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/virtualization
Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
Hi Amit, Thank you for commenting on our work. (2012/07/26 20:35), Amit Shah wrote: On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote: [...] Therefore, we propose a new system, virtio-trace, which uses an enhanced virtio-serial and the existing ftrace ring-buffer to collect guest kernel tracing data. In this system, there are 5 main components: (1) Ring-buffer of ftrace in a guest - When the trace agent reads the ring-buffer, a page is removed from the ring-buffer. (2) Trace agent in the guest - Splice the page of the ring-buffer to read_pipe using splice() without memory copying. Then, the page is spliced from write_pipe to virtio without memory copying. I really like the splicing idea. Thanks. We will improve this patch set. (3) Virtio-console driver in the guest - Pass the page to virtio-ring (4) Virtio-serial bus in QEMU - Copy the page to a kernel pipe (5) Reader in the host - Read guest tracing data via a FIFO (named pipe) So will this be useful only if guest and host run the same kernel? I'd like to see the host kernel not being used at all -- collect all relevant info from the guest and send it out to qemu, where it can be consumed directly by apps driving the tracing. No, this patch set is used only for guest kernels, so guest and host don't need to run the same kernel. ***Evaluation*** When a host collects tracing data of a guest, the performance of using virtio-trace is compared with that of using native (just running ftrace), IVRing, and virtio-serial (normal method of read/write). Why is tracing performance-sensitive? i.e. why try to optimise this at all? To minimize the effect on applications in guests when a host collects tracing data of guests. For example, assume that guests A and B are running on a host and share an I/O device. An I/O delay problem occurs in guest A, while guest B still meets its requirements.
In this case, we need to collect tracing data of guests A and B, but the usual network-based method places a high load on applications in guest B even though guest B is running normally. Therefore, we try to decrease the load on guests. We also use this feature for performance analysis on production virtualization systems. [...] ***Just enhancement ideas*** - Support for trace-cmd - Support for 9pfs protocol - Support for non-blocking mode in QEMU There were patches long back (by me) to make chardevs non-blocking but they didn't make it upstream. Fedora carries them, if you want to try them out. Though we want to converge on a reasonable solution that's acceptable upstream as well. Just that no one's working on it currently. Any help here will be appreciated. Thanks! In this case, since a guest stops running while the host reads its trace data, a non-blocking mode needs to be added to the char device. I'll read your patch series. Is the latest version 8? http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html - Make vhost-serial I need to understand a) why it's perf-critical, and b) why should the host be involved at all, to comment on these. a) To decrease the collection overhead for applications on a guest (see above). b) Trace data of the host kernel is not involved even if we introduce this patch set. Thank you, -- Yoshihiro YUNOMAE Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: yoshihiro.yunomae...@hitachi.com
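The data path in component (2), a ring-buffer page spliced into a pipe and then from the pipe to the virtio-serial port, can be illustrated in user space with splice(2). This is a minimal sketch, not the trace agent from the patch set: forward_via_pipe is a hypothetical helper, and generic file descriptors stand in for the ftrace per-CPU trace file and the virtio-serial port.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: forward everything readable from in_fd to out_fd
 * through an intermediate pipe with splice(2), so data moves inside the
 * kernel instead of being copied through a user-space buffer.
 * Returns the number of bytes forwarded, or -1 on error.
 */
static ssize_t forward_via_pipe(int in_fd, int out_fd, size_t chunk)
{
	int p[2];		/* p[1]: write end, p[0]: read end */
	ssize_t total = 0;

	assert(chunk > 0);
	if (pipe(p) < 0)
		return -1;

	for (;;) {
		/* Step 1: source (e.g. a per-CPU trace file) -> pipe. */
		ssize_t queued = splice(in_fd, NULL, p[1], NULL, chunk,
					SPLICE_F_MOVE);
		if (queued < 0 && errno == EINTR)
			continue;
		if (queued <= 0) {
			if (queued < 0)
				total = -1;
			break;		/* 0 means end of input */
		}

		/* Step 2: drain what step 1 queued, pipe -> sink (e.g. a port). */
		while (queued > 0) {
			ssize_t moved = splice(p[0], NULL, out_fd, NULL,
					       (size_t)queued, SPLICE_F_MOVE);
			if (moved < 0 && errno == EINTR)
				continue;
			if (moved <= 0) {
				total = -1;
				goto out;
			}
			queued -= moved;
			total += moved;
		}
	}
out:
	close(p[0]);
	close(p[1]);
	return total;
}
```

SPLICE_F_MOVE is only a hint to the kernel; whether pages are really moved rather than copied depends on the file types at each end, so any zero-copy claim should be checked for the specific setup.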
Re: Re: [RFC PATCH 0/6] virtio-trace: Support virtio-trace
On (Fri) 27 Jul 2012 [17:55:11], Yoshihiro YUNOMAE wrote: Hi Amit, Thank you for commenting on our work. (2012/07/26 20:35), Amit Shah wrote: On (Tue) 24 Jul 2012 [11:36:57], Yoshihiro YUNOMAE wrote: [...] Therefore, we propose a new system, virtio-trace, which uses an enhanced virtio-serial and the existing ftrace ring-buffer to collect guest kernel tracing data. In this system, there are 5 main components: (1) Ring-buffer of ftrace in a guest - When the trace agent reads the ring-buffer, a page is removed from the ring-buffer. (2) Trace agent in the guest - Splice the page of the ring-buffer to read_pipe using splice() without memory copying. Then, the page is spliced from write_pipe to virtio without memory copying. I really like the splicing idea. Thanks. We will improve this patch set. (3) Virtio-console driver in the guest - Pass the page to virtio-ring (4) Virtio-serial bus in QEMU - Copy the page to a kernel pipe (5) Reader in the host - Read guest tracing data via a FIFO (named pipe) So will this be useful only if guest and host run the same kernel? I'd like to see the host kernel not being used at all -- collect all relevant info from the guest and send it out to qemu, where it can be consumed directly by apps driving the tracing. No, this patch set is used only for guest kernels, so guest and host don't need to run the same kernel. OK - that's good to know. ***Evaluation*** When a host collects tracing data of a guest, the performance of using virtio-trace is compared with that of using native (just running ftrace), IVRing, and virtio-serial (normal method of read/write). Why is tracing performance-sensitive? i.e. why try to optimise this at all? To minimize the effect on applications in guests when a host collects tracing data of guests. For example, assume that guests A and B are running on a host and share an I/O device. An I/O delay problem occurs in guest A, while guest B still meets its requirements.
In this case, we need to collect tracing data of guests A and B, but the usual network-based method places a high load on applications in guest B even though guest B is running normally. Therefore, we try to decrease the load on guests. We also use this feature for performance analysis on production virtualization systems. OK, got it. [...] ***Just enhancement ideas*** - Support for trace-cmd - Support for 9pfs protocol - Support for non-blocking mode in QEMU There were patches long back (by me) to make chardevs non-blocking but they didn't make it upstream. Fedora carries them, if you want to try them out. Though we want to converge on a reasonable solution that's acceptable upstream as well. Just that no one's working on it currently. Any help here will be appreciated. Thanks! In this case, since a guest stops running while the host reads its trace data, a non-blocking mode needs to be added to the char device. I'll read your patch series. Is the latest version 8? http://lists.gnu.org/archive/html/qemu-devel/2010-12/msg00035.html I suppose the latest version on-list is what you quote above. The objections to the patch series are mentioned in Anthony's mails. Hans maintains a rebased version of the patches in his tree at http://cgit.freedesktop.org/~jwrdegoede/qemu/ and those patches are included in Fedora's qemu-kvm, so you can try that out if it improves performance for you. - Make vhost-serial I need to understand a) why it's perf-critical, and b) why should the host be involved at all, to comment on these. a) To decrease the collection overhead for applications on a guest (see above). b) Trace data of the host kernel is not involved even if we introduce this patch set. I see, so you suggested vhost-serial only because you saw the guest stopping problem due to the absence of non-blocking code? If so, it now makes sense. I don't think we need vhost-serial in any way yet. BTW where do you parse the trace data obtained from guests? On a remote host?
Thanks, Amit
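The guest stall discussed in this thread comes from a writer that blocks while the host side of the channel is full. The generic POSIX sketch below shows the non-blocking alternative on a pipe. It is not QEMU's chardev code or the patches mentioned above; set_nonblocking and try_send are hypothetical helpers, and a pipe stands in for the host-side channel.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Switch an already-open descriptor to non-blocking mode. */
static int set_nonblocking(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags < 0)
		return -1;
	return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical helper: try to hand one buffer to the channel.
 * Returns the bytes written, 0 if the channel is currently full
 * (the caller can drop or retry later instead of stalling), or -1.
 */
static ssize_t try_send(int fd, const void *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL || len == 0);
	n = write(fd, buf, len);
	if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
		return 0;
	return n;
}
```

Whether a full channel should drop trace data, buffer it, or retry later is a policy decision the sketch leaves to the caller.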
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
+enum { + VMCI_SUCCESS_QUEUEPAIR_ATTACH = 5, + VMCI_SUCCESS_QUEUEPAIR_CREATE = 4, + VMCI_SUCCESS_LAST_DETACH = 3, + VMCI_SUCCESS_ACCESS_GRANTED = 2, + VMCI_SUCCESS_ENTRY_DEAD = 1, We've got a nice collection of Linux error codes, thank you, and it would make the driver enormously more readable on the Linux side if, at as low a level as possible, it started using Linux error codes. + VMCI_SUCCESS = 0, + VMCI_ERROR_INVALID_RESOURCE = (-1), + VMCI_ERROR_INVALID_ARGS = (-2), + VMCI_ERROR_NO_MEM = (-3), + VMCI_ERROR_DATAGRAM_FAILED = (-4), + VMCI_ERROR_MORE_DATA = (-5), + VMCI_ERROR_NO_MORE_DATAGRAMS = (-6), + VMCI_ERROR_NO_ACCESS = (-7), + VMCI_ERROR_NO_HANDLE = (-8), + VMCI_ERROR_DUPLICATE_ENTRY = (-9), + VMCI_ERROR_DST_UNREACHABLE = (-10), + VMCI_ERROR_PAYLOAD_TOO_LARGE = (-11), + VMCI_ERROR_INVALID_PRIV = (-12),
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
Hi Andrew. A few things noted in the following. diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 2661f6e..fe38c7a 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -517,4 +517,5 @@ source "drivers/misc/lis3lv02d/Kconfig" source "drivers/misc/carma/Kconfig" source "drivers/misc/altera-stapl/Kconfig" source "drivers/misc/mei/Kconfig" +source "drivers/misc/vmw_vmci/Kconfig" endmenu diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 456972f..af9e413 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -51,3 +51,4 @@ obj-y += carma/ obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/ obj-$(CONFIG_INTEL_MEI) += mei/ +obj-y += vmw_vmci/ Please use obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/ like we do in the other cases. This prevents us from visiting the directory when this feature is not enabled. +++ b/drivers/misc/vmw_vmci/Makefile @@ -0,0 +1,43 @@ + +# +# Linux driver for VMware's VMCI device. +# +# Copyright (C) 2007-2012, VMware, Inc. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation; version 2 of the License and no later version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or +# NON INFRINGEMENT. See the GNU General Public License for more +# details. +# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. +# +# The full GNU General Public License is included in this distribution in +# the file called COPYING. +# +# Maintained by: Andrew Stiegmann pv-driv...@vmware.com +# + Lots of boilerplate noise for such a simple file...
+ +# +# Makefile for the VMware VMCI +# + +obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o + +vmw_vmci-objs += vmci_context.o +vmw_vmci-objs += vmci_datagram.o +vmw_vmci-objs += vmci_doorbell.o +vmw_vmci-objs += vmci_driver.o +vmw_vmci-objs += vmci_event.o +vmw_vmci-objs += vmci_handle_array.o +vmw_vmci-objs += vmci_hash_table.o +vmw_vmci-objs += vmci_queue_pair.o +vmw_vmci-objs += vmci_resource.o +vmw_vmci-objs += vmci_route.o please use: vmw_vmci-y += vmci_context.o vmw_vmci-y += vmci_datagram.o vmw_vmci-y += vmci_doorbell.o This is recommended these days and allows you to enable/disable single files later using a config option. diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h b/drivers/misc/vmw_vmci/vmci_common_int.h + +#ifndef _VMCI_COMMONINT_H_ +#define _VMCI_COMMONINT_H_ + +#include <linux/printk.h> +#include <linux/vmw_vmci_defs.h> Use inverse Christmas tree here. Longer include lines first, and sort alphabetically when lines are of the same length. This likely applies in many cases. +#include "vmci_handle_array.h" + +#define ASSERT(cond) BUG_ON(!(cond)) + +#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK)) +#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED) Looks like poor obfuscation. Use a static inline function if you need a helper for this. + +/* + * Utility function that checks whether two entities are allowed + * to interact. If one of them is restricted, the other one must + * be trusted. + */ +static inline bool vmci_deny_interaction(uint32_t partOne, + uint32_t partTwo) The kernel types are u32, not uint32_t - these types belong in user space.
+++ b/include/linux/vmw_vmci_api.h + +#ifndef __VMW_VMCI_API_H__ +#define __VMW_VMCI_API_H__ + +#include <linux/vmw_vmci_defs.h> + +#undef VMCI_KERNEL_API_VERSION +#define VMCI_KERNEL_API_VERSION_2 2 +#define VMCI_KERNEL_API_VERSION VMCI_KERNEL_API_VERSION_2 + +typedef void (VMCI_DeviceShutdownFn) (void *deviceRegistration, void *userData); + +bool VMCI_DeviceGet(uint32_t *apiVersion, + VMCI_DeviceShutdownFn *deviceShutdownCB, + void *userData, void **deviceRegistration); The kernel style is to use lower_case for everything. So this would become: vmci_device_get() This is obviously a very general comment and applies everywhere. Sam
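Applying Sam's two style points (kernel u32 instead of uint32_t, lower_case names) to the vmci_deny_interaction helper quoted above might look like this. It is a user-space sketch: the u32 typedef stands in for <linux/types.h>, the VMCI_PRIVILEGE_FLAG_* names and values are assumptions, and the body only encodes the rule stated in the patch comment (a restricted party may only interact with a trusted one).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's u32 from <linux/types.h> in this sketch. */
typedef uint32_t u32;

/* Assumed flag names and values, for illustration only. */
#define VMCI_PRIVILEGE_FLAG_RESTRICTED 0x01u
#define VMCI_PRIVILEGE_FLAG_TRUSTED    0x02u

/*
 * Checks whether two entities are allowed to interact: if one of them
 * is restricted, the other one must be trusted.
 * Returns true when the interaction must be denied.
 */
static inline bool vmci_deny_interaction(u32 part_one, u32 part_two)
{
	return ((part_one & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
		!(part_two & VMCI_PRIVILEGE_FLAG_TRUSTED)) ||
	       ((part_two & VMCI_PRIVILEGE_FLAG_RESTRICTED) &&
		!(part_one & VMCI_PRIVILEGE_FLAG_TRUSTED));
}
```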
Re: [net-next RFC V5 3/5] virtio: intorduce an API to set affinity for a virtqueue
On 05/07/2012 12:29, Jason Wang wrote: Sometimes, virtio devices need to configure an irq affinity hint to maximize performance. Instead of just exposing the irq of a virtqueue, this patch introduces an API to set the affinity for a virtqueue. The API is best-effort; the affinity hint may not be set as expected due to platform support, irq sharing or irq type. Currently, only the PCI method is implemented, and we set the affinity according to: - if the device uses INTx, we just ignore the request - if the device has a per-vq vector, we force the affinity hint - if the virtqueues share MSI, make the affinity the OR over all affinities requested Signed-off-by: Jason Wang jasow...@redhat.com Hmm, I don't see any benefit from this patch; I need to use irq_set_affinity (which however is not exported) to actually bind IRQs to CPUs. Example: with irq_set_affinity_hint: 43: 89 107 100 97 PCI-MSI-edge virtio0-request 44: 178 195 268 199 PCI-MSI-edge virtio0-request 45: 97 100 97 155 PCI-MSI-edge virtio0-request 46: 234 261 213 218 PCI-MSI-edge virtio0-request with irq_set_affinity: 43: 721001 PCI-MSI-edge virtio0-request 44:0 74601 PCI-MSI-edge virtio0-request 45:00 6580 PCI-MSI-edge virtio0-request 46:001 547 PCI-MSI-edge virtio0-request I gathered these quickly after boot, but real benchmarks show the same behavior, and performance actually gets worse with virtio-scsi multiqueue+irq_set_affinity_hint than with irq_set_affinity. I also tried adding IRQ_NO_BALANCING, but the only effect is that I cannot set the affinity. The queue steering algorithm I use in virtio-scsi is extremely simple and based on your tx code.
See how my nice pinning is destroyed: # taskset -c 0 dd if=/dev/sda bs=1M count=1000 of=/dev/null iflag=direct # cat /proc/interrupts 43: 2690 2709 2691 2696 PCI-MSI-edge virtio0-request 44: 109 122 199 124 PCI-MSI-edge virtio0-request 45: 170 183 170 237 PCI-MSI-edge virtio0-request 46: 143 166 125 125 PCI-MSI-edge virtio0-request All my requests come from CPU#0 and thus go to the first virtqueue, but the interrupts are serviced all over the place. Did you set the affinity manually in your experiments, or perhaps there is a difference between scsi and networking... (interrupt mitigation?) Paolo
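The three rules in Jason's patch description (ignore INTx, force the hint for a per-virtqueue vector, OR the requests when virtqueues share a vector) can be modelled in plain C. This is a hypothetical user-space model, not the virtio-pci implementation: compute_affinity_hints and its arrays are illustrative, and a 64-bit mask stands in for a kernel cpumask. Note that the per-vector OR already reduces to "force" when each virtqueue has its own vector.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of the best-effort policy:
 *   requested_cpu[i]  CPU requested for virtqueue i
 *   vector_of[i]      interrupt vector used by virtqueue i
 *   hint[v]           resulting CPU mask for vector v (bit n = CPU n)
 * With INTx nothing is set. Otherwise each vector's hint is the OR of
 * the CPUs requested by the virtqueues that use it.
 */
static void compute_affinity_hints(bool uses_intx, const int *requested_cpu,
				   const int *vector_of, int nvqs,
				   uint64_t *hint, int nvectors)
{
	int i;

	assert(nvqs >= 0 && nvectors >= 0);
	for (i = 0; i < nvectors; i++)
		hint[i] = 0;
	if (uses_intx)
		return;		/* the request is simply ignored */

	for (i = 0; i < nvqs; i++) {
		int v = vector_of[i], cpu = requested_cpu[i];

		if (v < 0 || v >= nvectors || cpu < 0 || cpu > 63)
			continue;	/* best effort: skip what cannot be expressed */
		hint[v] |= UINT64_C(1) << cpu;
	}
}
```

As Paolo's /proc/interrupts numbers suggest, a mask computed this way is only a hint exposed to user space (for example to irqbalance); it does not by itself bind the interrupt the way irq_set_affinity does.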
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
Hi Sam, - Original Message - From: Sam Ravnborg s...@ravnborg.org To: Andrew Stiegmann (stieg) astiegm...@vmware.com Cc: linux-ker...@vger.kernel.org, virtualization@lists.linux-foundation.org, pv-driv...@vmware.com, vm-crosst...@vmware.com, csch...@vmware.com, gre...@linuxfoundation.org Sent: Friday, July 27, 2012 3:34:55 AM Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build Hi Andrew. A few things noted in the following. diff --git a/drivers/misc/Kconfig b/drivers/misc/Kconfig index 2661f6e..fe38c7a 100644 --- a/drivers/misc/Kconfig +++ b/drivers/misc/Kconfig @@ -517,4 +517,5 @@ source "drivers/misc/lis3lv02d/Kconfig" source "drivers/misc/carma/Kconfig" source "drivers/misc/altera-stapl/Kconfig" source "drivers/misc/mei/Kconfig" +source "drivers/misc/vmw_vmci/Kconfig" endmenu diff --git a/drivers/misc/Makefile b/drivers/misc/Makefile index 456972f..af9e413 100644 --- a/drivers/misc/Makefile +++ b/drivers/misc/Makefile @@ -51,3 +51,4 @@ obj-y += carma/ obj-$(CONFIG_USB_SWITCH_FSA9480) += fsa9480.o obj-$(CONFIG_ALTERA_STAPL) +=altera-stapl/ obj-$(CONFIG_INTEL_MEI) += mei/ +obj-y += vmw_vmci/ Please use obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci/ like we do in the other cases. This prevents us from visiting the directory when this feature is not enabled. Ok. +++ b/drivers/misc/vmw_vmci/Makefile @@ -0,0 +1,43 @@ + +# +# Linux driver for VMware's VMCI device. +# +# Copyright (C) 2007-2012, VMware, Inc. All Rights Reserved. +# +# This program is free software; you can redistribute it and/or modify it +# under the terms of the GNU General Public License as published by the +# Free Software Foundation; version 2 of the License and no later version. +# +# This program is distributed in the hope that it will be useful, but +# WITHOUT ANY WARRANTY; without even the implied warranty of +# MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE, GOOD TITLE or +# NON INFRINGEMENT. See the GNU General Public License for more +# details.
+# +# You should have received a copy of the GNU General Public License +# along with this program; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA. +# +# The full GNU General Public License is included in this distribution in +# the file called COPYING. +# +# Maintained by: Andrew Stiegmann pv-driv...@vmware.com +# + Lots of boilerplate noise for such a simple file... I removed the section containing the FSF address, and the section below it as well, per Greg KH's request. + +# +# Makefile for the VMware VMCI +# + +obj-$(CONFIG_VMWARE_VMCI) += vmw_vmci.o + +vmw_vmci-objs += vmci_context.o +vmw_vmci-objs += vmci_datagram.o +vmw_vmci-objs += vmci_doorbell.o +vmw_vmci-objs += vmci_driver.o +vmw_vmci-objs += vmci_event.o +vmw_vmci-objs += vmci_handle_array.o +vmw_vmci-objs += vmci_hash_table.o +vmw_vmci-objs += vmci_queue_pair.o +vmw_vmci-objs += vmci_resource.o +vmw_vmci-objs += vmci_route.o please use: vmw_vmci-y += vmci_context.o vmw_vmci-y += vmci_datagram.o vmw_vmci-y += vmci_doorbell.o This is recommended these days and allows you to enable/disable single files later using a config option. Ok. diff --git a/drivers/misc/vmw_vmci/vmci_common_int.h b/drivers/misc/vmw_vmci/vmci_common_int.h + +#ifndef _VMCI_COMMONINT_H_ +#define _VMCI_COMMONINT_H_ + +#include <linux/printk.h> +#include <linux/vmw_vmci_defs.h> Use inverse Christmas tree here. Longer include lines first, and sort alphabetically when lines are of the same length. This likely applies in many cases. +#include "vmci_handle_array.h" + +#define ASSERT(cond) BUG_ON(!(cond)) + +#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK)) +#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED) Looks like poor obfuscation. Use a static inline function if you need a helper for this. These definitions are intended more as a helper to make reading the code easier. IMHO it's a lot easier to read if (CAN_BLOCK(flags)) compared to if (!(flags & VMCI_QPFLAG_NONBLOCK)). Wouldn't you agree?
I'm not sure something this simple warrants a static inline function, but I don't see any harm in converting it over to that. + +/* + * Utility function that checks whether two entities are allowed + * to interact. If one of them is restricted, the other one must + * be trusted. + */ +static inline bool vmci_deny_interaction(uint32_t partOne, +uint32_t partTwo) The kernel types are u32, not uint32_t - these types belong in user space. Ok. +++ b/include/linux/vmw_vmci_api.h +
Re: [Pv-drivers] [vmw_vmci 11/11] Apply the header code to make VMCI build
Hi Alan, On Fri, Jul 27, 2012 at 10:53:57AM +0100, Alan Cox wrote: +enum { + VMCI_SUCCESS_QUEUEPAIR_ATTACH = 5, + VMCI_SUCCESS_QUEUEPAIR_CREATE = 4, + VMCI_SUCCESS_LAST_DETACH = 3, + VMCI_SUCCESS_ACCESS_GRANTED = 2, + VMCI_SUCCESS_ENTRY_DEAD = 1, We've got a nice collection of Linux error codes, thank you, and it would make the driver enormously more readable on the Linux side if, at as low a level as possible, it started using Linux error codes. If VMCI were only used on Linux we'd definitely do that; however, the VMCI core is shared among several operating systems (much like ACPI is) and we'd like to limit divergences between them while conforming to the kernel coding style as much as possible. We'll make sure that we do not leak VMCI-specific errors to the standard kernel APIs. Thanks, Dmitry
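One way to keep VMCI-specific errors from leaking into standard kernel APIs while leaving the shared core unchanged is a single translation point at the boundary. A minimal sketch under that assumption: vmci_to_errno is hypothetical, and the errno chosen for each VMCI code is an illustrative guess, not a mapping taken from the driver. The enum values are the ones quoted in the patch.

```c
#include <assert.h>
#include <errno.h>

/* VMCI result codes as quoted in the patch (success codes are >= 0). */
enum {
	VMCI_SUCCESS_QUEUEPAIR_ATTACH = 5,
	VMCI_SUCCESS_QUEUEPAIR_CREATE = 4,
	VMCI_SUCCESS_LAST_DETACH = 3,
	VMCI_SUCCESS_ACCESS_GRANTED = 2,
	VMCI_SUCCESS_ENTRY_DEAD = 1,
	VMCI_SUCCESS = 0,
	VMCI_ERROR_INVALID_RESOURCE = -1,
	VMCI_ERROR_INVALID_ARGS = -2,
	VMCI_ERROR_NO_MEM = -3,
	VMCI_ERROR_DATAGRAM_FAILED = -4,
	VMCI_ERROR_MORE_DATA = -5,
	VMCI_ERROR_NO_MORE_DATAGRAMS = -6,
	VMCI_ERROR_NO_ACCESS = -7,
	VMCI_ERROR_NO_HANDLE = -8,
	VMCI_ERROR_DUPLICATE_ENTRY = -9,
	VMCI_ERROR_DST_UNREACHABLE = -10,
	VMCI_ERROR_PAYLOAD_TOO_LARGE = -11,
	VMCI_ERROR_INVALID_PRIV = -12,
};

/*
 * Hypothetical boundary helper: translate a VMCI result into the usual
 * 0 / -errno convention before it leaves the VMCI core. The errno chosen
 * for each code is an illustrative assumption.
 */
static int vmci_to_errno(int result)
{
	if (result >= VMCI_SUCCESS)
		return 0;	/* every VMCI success variant collapses to 0 */

	switch (result) {
	case VMCI_ERROR_INVALID_RESOURCE:	return -ENXIO;
	case VMCI_ERROR_INVALID_ARGS:		return -EINVAL;
	case VMCI_ERROR_NO_MEM:			return -ENOMEM;
	case VMCI_ERROR_NO_MORE_DATAGRAMS:	return -EAGAIN;
	case VMCI_ERROR_NO_ACCESS:		return -EACCES;
	case VMCI_ERROR_DUPLICATE_ENTRY:	return -EEXIST;
	case VMCI_ERROR_DST_UNREACHABLE:	return -EHOSTUNREACH;
	case VMCI_ERROR_PAYLOAD_TOO_LARGE:	return -EMSGSIZE;
	case VMCI_ERROR_INVALID_PRIV:		return -EPERM;
	default:				return -EIO;
	}
}
```

Collapsing every success variant to 0 discards information, so internal callers that care about, say, VMCI_SUCCESS_QUEUEPAIR_CREATE versus VMCI_SUCCESS_QUEUEPAIR_ATTACH would keep using the raw result and translate only at the exported API.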
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote: The kernel style is to use lower_case for everything. So this would become: vmci_device_get() This is obviously a very general comment and applies everywhere. I wish I could lower case these symbols but VMCI has already existed outside the mainline Linux tree for some time now and changing these exported symbols would mean that other drivers that depend on VMCI (vSock, vmhgfs) would need to change as well. One thought that did come to mind was exporting both VMCI_Device_Get and vmci_device_get but that would likely just confuse people. So in short I have made function names lower case where possible, but exported symbols could not be changed. Not true at all. You want those drivers to be merged as well, right? So they will need to have their functions changed, and their code as well. Just wait until we get to the "change your functionality around" requests; those will require those drivers to change. Right now we are at the "silly and obvious things you did wrong" stage of the review process :) So please fix these, and also, post these drivers as well, so we can see how they interact with the core code. Actually, if you are going to need lots of refactoring for these drivers, and the core, I would recommend putting this all in the staging tree, to allow that to happen over time. That would ensure that your users keep having working systems, and let you modify the interfaces better and easier, than having to keep it all out-of-tree. What do you think? greg k-h
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
- Original Message - From: Greg KH gre...@linuxfoundation.org To: Andrew Stiegmann astiegm...@vmware.com Cc: Sam Ravnborg s...@ravnborg.org, linux-ker...@vger.kernel.org, virtualization@lists.linux-foundation.org, pv-driv...@vmware.com, vm-crosst...@vmware.com, csch...@vmware.com Sent: Friday, July 27, 2012 11:16:39 AM Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote: The kernel style is to use lower_case for everything. So this would become: vmci_device_get() This is obviously a very general comment and applies everywhere. I wish I could lower case these symbols but VMCI has already existed outside the mainline Linux tree for some time now and changing these exported symbols would mean that other drivers that depend on VMCI (vSock, vmhgfs) would need to change as well. One thought that did come to mind was exporting both VMCI_Device_Get and vmci_device_get but that would likely just confuse people. So in short I have made function names lower case where possible, but exported symbols could not be changed. Not true at all. You want those drivers to be merged as well, right? So they will need to have their functions changed, and their code as well. As previously mentioned, VMware is working on upstreaming our vSock driver (one of a few drivers that use vmw_vmci). However, there are no plans to upstream the other drivers that depend on vmw_vmci. Because of this, these symbols cannot change. Just wait until we get to the "change your functionality around" requests; those will require those drivers to change. Right now we are at the "silly and obvious things you did wrong" stage of the review process :) So please fix these, and also, post these drivers as well, so we can see how they interact with the core code. Actually, if you are going to need lots of refactoring for these drivers, and the core, I would recommend putting this all in the staging tree, to allow that to happen over time.
That would ensure that your users keep having working systems, and let you modify the interfaces better and easier, than having to keep it all out-of-tree. What do you think? We will discuss this internally and let you know. greg k-h
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
On Fri, Jul 27, 2012 at 11:39:23AM -0700, Andrew Stiegmann wrote: - Original Message - From: Greg KH gre...@linuxfoundation.org To: Andrew Stiegmann astiegm...@vmware.com Cc: Sam Ravnborg s...@ravnborg.org, linux-ker...@vger.kernel.org, virtualization@lists.linux-foundation.org, pv-driv...@vmware.com, vm-crosst...@vmware.com, csch...@vmware.com Sent: Friday, July 27, 2012 11:16:39 AM Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote: The kernel style is to use lower_case for everything. So this would become: vmci_device_get() This is obviously a very general comment and applies everywhere. I wish I could lower case these symbols but VMCI has already existed outside the mainline Linux tree for some time now and changing these exported symbols would mean that other drivers that depend on VMCI (vSock, vmhgfs) would need to change as well. One thought that did come to mind was exporting both VMCI_Device_Get and vmci_device_get but that would likely just confuse people. So in short I have made function names lower case where possible, but exported symbols could not be changed. Not true at all. You want those drivers to be merged as well, right? So they will need to have their functions changed, and their code as well. As previously mentioned, VMware is working on upstreaming our vSock driver (one of a few drivers that use vmw_vmci). Great. However, there are no plans to upstream the other drivers that depend on vmw_vmci. Why not? That seems quite short-sighted. Because of this, these symbols cannot change. Then I would argue that we cannot accept this code at all, because it will change over time, both symbol names, and functionality (see my previous comment about how that is going to have to change). sorry, greg k-h
Re: Re: [Qemu-devel] [RFC PATCH 0/6] virtio-trace: Support virtio-trace
On Wed, Jul 25, 2012 at 8:15 AM, Masami Hiramatsu masami.hiramatsu...@hitachi.com wrote: (2012/07/25 5:26), Blue Swirl wrote: The following patch set provides a low-overhead system for collecting kernel tracing data of guests by a host in a virtualization environment. A guest OS generally shares some devices with other guests or a host, so the cause of a problem occurring in a guest may lie in other guests or in the host. Therefore, when problems occur in a virtualization environment, tracing data needs to be collected from a number of guests and the host. One method to realize that is to collect the tracing data of guests in a host. To do this, a network is generally used. However, network I/O places a high load on applications in the guests because there are many network stack layers. Therefore, a communication method for collecting the data without using the network is needed. I implemented something similar earlier by passing trace data from OpenBIOS to QEMU using the firmware configuration device. The data format was the same simpletrace event structure that QEMU uses, instead of ftrace. I didn't commit it because of a few problems. Sounds interesting :) I guess you traced BIOS events, right? Yes, I converted a few DPRINTFs to tracepoints as a proof of concept. I'm not familiar with ftrace; is it possible to trace two guest applications (BIOS and kernel) at the same time? Since ftrace itself is a tracing feature in the Linux kernel, it can trace two or more applications (processes) if they run on the Linux kernel. However, I think OpenBIOS runs *under* the guest kernel. If so, ftrace currently can't trace OpenBIOS from the guest side. No, OpenBIOS boots the machine and then passes control to the boot loader and that to the kernel. The kernel will make a few calls to OpenBIOS at start but not later. OpenBIOS is used by QEMU as the Sparc and PowerPC BIOS. I think it may need another enhancement on both OpenBIOS and the Linux kernel to trace BIOS events from the Linux kernel.
Ideally both OpenBIOS and Linux should be able to feed trace events back to QEMU independently. Or could this be handled by opening two different virtio-serial pipes, one for BIOS and the other for the kernel? Of course, virtio-serial itself can open multiple channels, so if OpenBIOS can handle virtio, it can pass trace data via another channel. Currently OpenBIOS probes the PCI bus and identifies virtio devices but ignores them; adding virtio-serial support shouldn't be too hard. There's a time window between CPU boot and PCI probe when the device will not be available, though. In my version, the tracepoint ID would have been used to demultiplex QEMU tracepoints from BIOS tracepoints, but something like separate ID spaces would have been better. I guess your feature notifies QEMU of events and QEMU records them in its own buffer. Therefore it must have different tracepoint IDs. On the other hand, with this feature, QEMU just passes trace data to a host-side pipe. Since an external tracing tool collects the trace data separately, we don't need to demultiplex the data. Perhaps in the analysis phase (after tracing) we will have to mix events again. At that time, we'll add a guest ID to each event ID, but that can be done offline. Yes, the multiplexing/demultiplexing is only needed in my version because the feeds are not independent. Best Regards, -- Masami HIRAMATSU Software Platform Research Dept. Linux Technology Center Hitachi, Ltd., Yokohama Research Laboratory E-mail: masami.hiramatsu...@hitachi.com
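Masami's point that independent feeds only need to be mixed again offline, with a source ID added per event, amounts to a two-way merge. This sketch assumes both feeds carry timestamps from a common clock (the thread does not establish this) and uses hypothetical record layouts; each event keeps its original ID and gains a source tag, so the BIOS and kernel ID spaces never collide.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record formats for this sketch. */
struct trace_event {
	uint64_t ts;		/* timestamp from a clock both sources share */
	uint32_t event_id;	/* ID in the source's own ID space */
};

struct tagged_event {
	uint64_t ts;
	uint16_t source;	/* which feed the event came from */
	uint32_t event_id;
};

/*
 * Merge two timestamp-sorted feeds into one sorted stream, tagging each
 * event with its source. out must have room for na + nb entries.
 * Returns the number of events written.
 */
static size_t merge_feeds(const struct trace_event *a, size_t na, uint16_t src_a,
			  const struct trace_event *b, size_t nb, uint16_t src_b,
			  struct tagged_event *out)
{
	size_t i = 0, j = 0, k = 0;

	assert(out != NULL);
	while (i < na || j < nb) {
		/* Take from feed a on ties so equal timestamps keep a stable order. */
		int take_a = j >= nb || (i < na && a[i].ts <= b[j].ts);
		const struct trace_event *e = take_a ? &a[i++] : &b[j++];

		out[k].ts = e->ts;
		out[k].source = take_a ? src_a : src_b;
		out[k].event_id = e->event_id;
		k++;
	}
	return k;
}
```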
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
+ +#define CAN_BLOCK(_f) (!((_f) & VMCI_QPFLAG_NONBLOCK)) +#define QP_PINNED(_f) ((_f) & VMCI_QPFLAG_PINNED) Looks like poor obfuscation. Use a static inline function if you need a helper for this. These definitions are intended more as a helper to make reading the code easier. IMHO it's a lot easier to read if (CAN_BLOCK(flags)) compared to if (!(flags & VMCI_QPFLAG_NONBLOCK)). Wouldn't you agree? I'm not sure something this simple warrants a static inline function, but I don't see any harm in converting it over to that. I would put it the other way around. I cannot see that such simple stuff warrants a #define. A static inline is (almost) always preferable to hiding code in a macro. For one, you get better type checks. And the semantics are also much simpler. With a macro you can do so many silly things. Sam
Re: [vmw_vmci 11/11] Apply the header code to make VMCI build
----- Original Message -----
From: Sam Ravnborg s...@ravnborg.org
To: Andrew Stiegmann astiegm...@vmware.com
Cc: linux-ker...@vger.kernel.org, virtualization@lists.linux-foundation.org, pv-driv...@vmware.com, vm-crosst...@vmware.com, csch...@vmware.com, gre...@linuxfoundation.org
Sent: Friday, July 27, 2012 12:53:20 PM
Subject: Re: [vmw_vmci 11/11] Apply the header code to make VMCI build

I would put it the other way around. I cannot see that such simple stuff warrants a #define. A static inline is (almost) always preferable to hiding code in a macro. For one, you get better type checking. And the semantics are also much simpler. With a macro you can do so many silly things.

Fair enough. I'll make them into static inline functions.
Re: [Pv-drivers] [vmw_vmci 11/11] Apply the header code to make VMCI build
On Fri, Jul 27, 2012 at 11:16:39AM -0700, Greg KH wrote: On Fri, Jul 27, 2012 at 10:20:43AM -0700, Andrew Stiegmann wrote: The kernel style is to use lower_case for everything. So this would become: vmci_device_get(). This is obviously a very general comment and applies everywhere. I wish I could lower-case these symbols, but VMCI has already existed outside the mainline Linux tree for some time now, and changing these exported symbols would mean that other drivers that depend on VMCI (vSock, vmhgfs) would need to change as well. One thought that did come to mind was exporting both VMCI_Device_Get and vmci_device_get, but that would likely just confuse people. So in short, I have made function names lower case where possible, but exported symbols could not be changed. Not true at all. You want those drivers to be merged as well, right? So they will need to have their functions changed, and their code as well. Just wait until we get to the "change your functionality around" requests; those will require those drivers to change. Right now we are at the "silly and obvious things you did wrong" stage of the review process :) So please fix these, and also post these drivers, so we can see how they interact with the core code. Actually, if you are going to need lots of refactoring for these drivers and the core, I would recommend putting this all in the staging tree, to allow that to happen over time. That would ensure that your users keep having working systems, and let you modify the interfaces better and more easily than having to keep it all out-of-tree. What do you think? Actually, I think that we'd prefer to keep this in patch-based form, at least for now, because the majority of our users get these drivers with VMware Tools and will continue doing so until distributions start enabling VMCI in their kernels, which they probably won't do until VMCI moves out of staging.
We'd also have to constantly adjust drivers that we are not working on getting upstream at this time to work with the rapidly changing version of VMCI in staging, which will just add work for us. So we'd like to get more feedback and have a chance to address issues and then decide whether staying in staging makes sense or not. Thanks. -- Dmitry
KVM Forum 2012 Call For Participation
=================================================================
KVM Forum 2012: Call For Participation
November 7-9, 2012 - Hotel Fira Palace - Barcelona, Spain

(All submissions must be received before midnight Aug 31st, 2012)
=================================================================

KVM is an industry leading open source hypervisor that provides an ideal platform for datacenter virtualization, virtual desktop infrastructure, and cloud computing. Once again, it's time to bring together the community of developers and users that define the KVM ecosystem for our annual technical conference. We will discuss the current state of affairs and plan for the future of KVM, its surrounding infrastructure, and management tools.

We are also excited to announce that the oVirt Workshop will run in parallel with the KVM Forum, bringing in a community focused on enterprise datacenter virtualization management built on KVM. For topics which overlap we will have shared sessions. So mark your calendar and join us in advancing KVM.

http://events.linuxfoundation.org/events/kvm-forum/

Once again we are colocated with The Linux Foundation's LinuxCon. Based on feedback from last year, this time it's LinuxCon Europe! KVM Forum attendees will be able to attend oVirt Workshop sessions and are eligible to attend LinuxCon Europe for a discounted rate.

http://events.linuxfoundation.org/events/kvm-forum/register

We invite you to lead part of the discussion by submitting a speaking proposal for KVM Forum 2012.
http://events.linuxfoundation.org/cfp

Suggested topics:

KVM
- Scaling and performance
- Nested virtualization
- I/O improvements
- PCI device assignment
- Driver domains
- Time keeping
- Resource management (cpu, memory, i/o)
- Memory management (page sharing, swapping, huge pages, etc)
- VEPA, VN-Link, vswitch
- Security
- Architecture ports

QEMU
- Device model improvements
- New devices and chipsets
- Scaling and performance
- Desktop virtualization
- Spice
- Increasing robustness and hardening
- Security model
- Management interfaces
- QMP protocol and implementation
- Image formats
- Firmware (SeaBIOS, OVMF, UEFI, etc)
- Live migration
- Live snapshots and merging
- Fault tolerance, high availability, continuous backup
- Real-time guest support

Virtio
- Speeding up existing devices
- Alternatives
- Virtio on non-Linux or non-virtualized

Management infrastructure
- oVirt (shared track w/ oVirt Workshop)
- Libvirt
- KVM autotest
- OpenStack
- Network virtualization management
- Enterprise storage management

Cloud computing
- Scalable storage
- Virtual networking
- Security
- Provisioning

SUBMISSION REQUIREMENTS

Abstracts due: Aug 31st, 2012
Notification: Sep 14th, 2012

Please submit a short abstract (~150 words) describing your presentation proposal. In your submission please note how long your talk will take. Slots vary in length up to 45 minutes. Also include in your proposal the proposal type -- one of:
- technical talk
- end-user talk
- birds of a feather (BOF) session

Submit your proposal here:
http://events.linuxfoundation.org/cfp

You will receive a notification whether or not your presentation proposal was accepted by Sep 14th.

END-USER COLLABORATION

One of the big challenges as developers is to know what, where and how people actually use our software. We will reserve a few slots for end users talking about their deployment challenges and achievements. If you are using KVM in production you are encouraged to submit a speaking proposal.
Simply mark it as an end-user collaboration proposal. As an end user, this is a unique opportunity to get your input to developers.

BOF SESSION

We will reserve some slots in the evening after the main conference tracks for birds of a feather (BOF) sessions. These sessions will be less formal than presentation tracks and targeted at people who would like to discuss specific issues with other developers and/or users. If you are interested in getting developers and/or users together to discuss a specific problem, please submit a BOF proposal.

LIGHTNING TALKS

In addition to submitted talks we will also have some room for lightning talks. These are short (5 minute) discussions to highlight new work or ideas that aren't complete enough to warrant a full presentation slot. Lightning talk submissions and scheduling will be handled on-site at KVM Forum.

HOTEL / TRAVEL

The KVM Forum 2012 will be held in Barcelona, Spain at the Hotel Fira Palace.
http://events.linuxfoundation.org/events/kvm-forum/hotel

Thank you for your interest in KVM. We're looking forward to your submissions and seeing you at the KVM Forum 2012 in November!

Thanks,
your KVM Forum 2012 Program Committee

Please contact us with any questions or comments.
kvm-forum-2012...@redhat.com
[PATCH V4 0/3] Improve virtio-blk performance
Hi, Jens & Rusty

This version is rebased against linux-next, which resolves the conflict with Paolo Bonzini's 'virtio-blk: allow toggling host cache between writeback and writethrough' patch.

Patches 1/3 and 2/3 apply on Linus's master as well. Since Rusty will pick up patch 3/3, the changes to the block core (adding blk_bio_map_sg()) will have a user. Jens, could you please consider picking up the dependencies 1/3 and 2/3 in your tree. Thanks!

This patchset implements a bio-based IO path for virtio-blk to improve performance.

Fio tests show the bio-based IO path gives the following performance improvement:

1) Ramdisk device
   With bio-based IO path, sequential read/write, random read/write
   IOPS boost         : 28%, 24%, 21%, 16%
   Latency improvement: 32%, 17%, 21%, 16%

2) Fusion IO device
   With bio-based IO path, sequential read/write, random read/write
   IOPS boost         : 11%, 11%, 13%, 10%
   Latency improvement: 10%, 10%, 12%, 10%

Asias He (3):
  block: Introduce __blk_segment_map_sg() helper
  block: Add blk_bio_map_sg() helper
  virtio-blk: Add bio-based IO path for virtio-blk

 block/blk-merge.c          | 117 +
 drivers/block/virtio_blk.c | 203 +++-
 include/linux/blkdev.h     |   2 +
 3 files changed, 247 insertions(+), 75 deletions(-)

-- 
1.7.10.4
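The block-layer helpers named in this series map the segments of an IO onto a scatterlist for the device. As a rough userspace illustration of the general idea (not the code in patches 1/3 and 2/3), the sketch below coalesces physically contiguous segments into a single scatter-gather entry as long as the merged entry stays within a maximum segment size. The real kernel code works on `struct bio_vec` and `struct scatterlist` and also honours further queue limits such as segment boundary masks; `struct seg` and `map_segments()` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one IO segment (and for one scatterlist entry). */
struct seg {
    uint64_t phys;  /* physical start address */
    uint32_t len;   /* length in bytes */
};

/*
 * Copy the n input segments into sg[], merging a segment into the previous
 * entry when it starts exactly where that entry ends and the merged length
 * does not exceed max_seg. sg[] must have room for n entries.
 * Returns the number of scatter-gather entries produced.
 */
static size_t map_segments(const struct seg *in, size_t n, uint32_t max_seg,
                           struct seg *sg)
{
    size_t nsg = 0;

    for (size_t i = 0; i < n; i++) {
        if (nsg > 0) {
            struct seg *last = &sg[nsg - 1];
            int contiguous = (last->phys + last->len) == in[i].phys;
            int fits = ((uint64_t)last->len + in[i].len) <= max_seg;

            if (contiguous && fits) {
                last->len += in[i].len;  /* extend the current entry */
                continue;
            }
        }
        sg[nsg++] = in[i];  /* start a new entry */
    }
    return nsg;
}
```

Fewer scatter-gather entries mean fewer descriptors to place on the virtqueue per IO, which is why the merge step matters even when the IO scheduler is bypassed.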
[PATCH V4 3/3] virtio-blk: Add bio-based IO path for virtio-blk
This patch introduces a bio-based IO path for virtio-blk.

Compared to the request-based IO path, the bio-based IO path uses the driver-provided ->make_request_fn() method to bypass the IO scheduler. It hands the bio to the device directly, without allocating a request in the block layer. This shortens the IO path in the guest kernel, achieving higher IOPS and lower latency. The downside is that the guest cannot use the IO scheduler to merge and sort requests. However, this is not a big problem if the backend disk on the host side is a fast device.

When the bio-based IO path is not enabled, virtio-blk still uses the original request-based IO path; no performance difference is observed.

Performance evaluation:
-----------------------
1) Fio test is performed in an 8-vcpu guest with a ramdisk-based guest disk using kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 28%, 24%, 21%, 16%
 Latency improvement: 32%, 17%, 21%, 16%

Long version:
 With bio-based IO path:
  seq-read  : io=2048.0MB, bw=116996KB/s, iops=233991 , runt= 17925msec
  seq-write : io=2048.0MB, bw=100829KB/s, iops=201658 , runt= 20799msec
  rand-read : io=3095.7MB, bw=112134KB/s, iops=224268 , runt= 28269msec
  rand-write: io=3095.7MB, bw=96198KB/s, iops=192396 , runt= 32952msec
    clat (usec): min=0 , max=2631.6K, avg=58716.99, stdev=191377.30
    clat (usec): min=0 , max=1753.2K, avg=66423.25, stdev=81774.35
    clat (usec): min=0 , max=2915.5K, avg=61685.70, stdev=120598.39
    clat (usec): min=0 , max=1933.4K, avg=76935.12, stdev=96603.45
    cpu : usr=74.08%, sys=703.84%, ctx=29661403, majf=21354, minf=22460954
    cpu : usr=70.92%, sys=702.81%, ctx=77219828, majf=13980, minf=27713137
    cpu : usr=72.23%, sys=695.37%, ctx=88081059, majf=18475, minf=28177648
    cpu : usr=69.69%, sys=654.13%, ctx=145476035, majf=15867, minf=26176375
 With request-based IO path:
  seq-read  : io=2048.0MB, bw=91074KB/s, iops=182147 , runt= 23027msec
  seq-write : io=2048.0MB, bw=80725KB/s, iops=161449 , runt= 25979msec
  rand-read : io=3095.7MB, bw=92106KB/s, iops=184211 , runt= 34416msec
  rand-write: io=3095.7MB, bw=82815KB/s, iops=165630 , runt= 38277msec
    clat (usec): min=0 , max=1932.4K, avg=77824.17, stdev=170339.49
    clat (usec): min=0 , max=2510.2K, avg=78023.96, stdev=146949.15
    clat (usec): min=0 , max=3037.2K, avg=74746.53, stdev=128498.27
    clat (usec): min=0 , max=1363.4K, avg=89830.75, stdev=114279.68
    cpu : usr=53.28%, sys=724.19%, ctx=37988895, majf=17531, minf=23577622
    cpu : usr=49.03%, sys=633.20%, ctx=205935380, majf=18197, minf=27288959
    cpu : usr=55.78%, sys=722.40%, ctx=101525058, majf=19273, minf=28067082
    cpu : usr=56.55%, sys=690.83%, ctx=228205022, majf=18039, minf=26551985

2) Fio test is performed in an 8-vcpu guest with a Fusion-IO-based guest disk using kvm tool.

Short version:
 With bio-based IO path, sequential read/write, random read/write
 IOPS boost         : 11%, 11%, 13%, 10%
 Latency improvement: 10%, 10%, 12%, 10%

Long version:
 With bio-based IO path:
  read : io=2048.0MB, bw=58920KB/s, iops=117840 , runt= 35593msec
  write: io=2048.0MB, bw=64308KB/s, iops=128616 , runt= 32611msec
  read : io=3095.7MB, bw=59633KB/s, iops=119266 , runt= 53157msec
  write: io=3095.7MB, bw=62993KB/s, iops=125985 , runt= 50322msec
    clat (usec): min=0 , max=1284.3K, avg=128109.01, stdev=71513.29
    clat (usec): min=94 , max=962339 , avg=116832.95, stdev=65836.80
    clat (usec): min=0 , max=1846.6K, avg=128509.99, stdev=89575.07
    clat (usec): min=0 , max=2256.4K, avg=121361.84, stdev=82747.25
    cpu : usr=56.79%, sys=421.70%, ctx=147335118, majf=21080, minf=19852517
    cpu : usr=61.81%, sys=455.53%, ctx=143269950, majf=16027, minf=24800604
    cpu : usr=63.10%, sys=455.38%, ctx=178373538, majf=16958, minf=24822612
    cpu : usr=62.04%, sys=453.58%, ctx=226902362, majf=16089, minf=23278105
 With request-based IO path:
  read : io=2048.0MB, bw=52896KB/s, iops=105791 , runt= 39647msec
  write: io=2048.0MB, bw=57856KB/s, iops=115711 , runt= 36248msec
  read : io=3095.7MB, bw=52387KB/s, iops=104773 , runt= 60510msec
  write: io=3095.7MB, bw=57310KB/s, iops=114619 , runt= 55312msec
    clat (usec): min=0 , max=1532.6K, avg=142085.62, stdev=109196.84
    clat (usec): min=0 , max=1487.4K, avg=129110.71, stdev=114973.64
    clat (usec): min=0 , max=1388.6K, avg=145049.22, stdev=107232.55
    clat (usec): min=0 , max=1465.9K, avg=133585.67, stdev=110322.95
    cpu : usr=44.08%, sys=590.71%, ctx=451812322, majf=14841, minf=17648641
    cpu : usr=48.73%, sys=610.78%, ctx=418953997, majf=22164, minf=26850689
    cpu : usr=45.58%, sys=581.16%, ctx=714079216, majf=21497, minf=22558223
    cpu : usr=48.40%, sys=599.65%, ctx=656089423, majf=16393, minf=23824409

How to use:
-----------
Add 'virtio_blk.use_bio=1' to the kernel cmdline or 'modprobe virtio_blk use_bio=1' to enable the ->make_request_fn() based I/O path.

Cc: Rusty Russell ru...@rustcorp.com.au
Cc: Michael S. Tsirkin m...@redhat.com
Cc:
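The percentages in the short version can be re-derived from the long version with simple ratios. The sketch below assumes that the four fio jobs and the four `clat` lines appear in the same order (seq-read, seq-write, rand-read, rand-write). It computes the IOPS boost as bio/request - 1 and the latency improvement as the request-path average `clat` divided by the bio-path average `clat`, minus 1; that second formula is the one that reproduces the quoted 32% for ramdisk seq-read (1 - bio/request would give about 25%). The results land within one percentage point of the quoted summary, and the Fusion-IO rows can be checked the same way.

```c
#include <assert.h>
#include <stdio.h>

/* One fio job from the ramdisk run: bio-based vs request-based numbers. */
struct fio_pair {
    const char *job;
    double bio_iops, req_iops;  /* "iops=" fields */
    double bio_clat, req_clat;  /* "clat ... avg=" fields, usec */
};

/* Job order (and clat line order) assumed: seq-read, seq-write, rand-read, rand-write. */
static const struct fio_pair ramdisk[] = {
    { "seq-read",   233991, 182147, 58716.99, 77824.17 },
    { "seq-write",  201658, 161449, 66423.25, 78023.96 },
    { "rand-read",  224268, 184211, 61685.70, 74746.53 },
    { "rand-write", 192396, 165630, 76935.12, 89830.75 },
};

/* Percentage by which a exceeds b: (a / b - 1) * 100. */
static double gain_pct(double a, double b)
{
    return (a / b - 1.0) * 100.0;
}

static void print_summary(const struct fio_pair *p, size_t n)
{
    for (size_t i = 0; i < n; i++)
        printf("%-10s IOPS boost %5.1f%%, latency improvement %5.1f%%\n",
               p[i].job,
               gain_pct(p[i].bio_iops, p[i].req_iops),   /* higher IOPS is better */
               gain_pct(p[i].req_clat, p[i].bio_clat));  /* lower latency is better */
}
```

Calling `print_summary(ramdisk, sizeof ramdisk / sizeof ramdisk[0])` prints one line per job.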
Re: [PATCH V3 3/3] virtio-blk: Add bio-based IO path for virtio-blk
On 07/27/2012 08:33 AM, Rusty Russell wrote: On Fri, 13 Jul 2012 16:38:51 +0800, Asias He as...@redhat.com wrote: Add 'virtio_blk.use_bio=1' to kernel cmdline or 'modprobe virtio_blk use_bio=1' to enable ->make_request_fn() based I/O path. This patch conflicts with Paolo Bonzini's 'virtio-blk: allow toggling host cache between writeback and writethrough' which is also queued (see linux-next). Rebased against Paolo's patch in V4. I'm not sure what the correct behavior for bio cacheflush is, if any. REQ_FLUSH is not supported in the bio path. But as to the patch itself: it's a hack.
1) Leaving the guest's admin to turn on the switch is a terrible choice.
2) The block layer should stop merging and sorting when a device is fast, not the driver.
3) I pointed out that slow disks have low IOPS, so why is this conditional? Sure, more guest exits, but it's still a small number for a slow device.
4) The only case where we want merging is on a slow device when the host isn't doing it.
Now, despite this, I'm prepared to commit it. But in my mind it's a hack: we should aim for use_bio to be based on a feature bit fed from the host, and use the module parameter only if we want to override it. OK. A feature bit from the host sounds like an option, but a switch is also needed on the host side. And for other OSes, e.g. Windows, the bio thing does not apply at all. Anyway, I have to admit that adding a module parameter here is not the best choice. Let's think more.
-- 
Asias
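Rusty's proposal (host feature bit by default, module parameter only as an override) amounts to a small decision rule. In the driver the host's offer would presumably be queried with `virtio_has_feature()`, but no such feature bit exists in this patch, so the sketch below takes it as a plain `bool`. The tri-state parameter encoding and `choose_bio_path()` are likewise hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding for a tri-state use_bio module parameter. */
enum {
    USE_BIO_AUTO = -1,  /* default: follow what the host offers */
    USE_BIO_OFF  = 0,   /* admin forces the request-based path */
    USE_BIO_ON   = 1,   /* admin forces the bio-based path */
};

/*
 * Decide which IO path to use. host_offers_bio stands for a hypothetical
 * virtio feature bit; in a driver it would come from virtio_has_feature().
 */
static bool choose_bio_path(int use_bio_param, bool host_offers_bio)
{
    if (use_bio_param == USE_BIO_AUTO)
        return host_offers_bio;          /* no override: the host decides */
    return use_bio_param == USE_BIO_ON;  /* explicit override wins */
}
```

With the default `USE_BIO_AUTO`, a guest admin never has to touch the switch (Rusty's first objection), `use_bio=0`/`1` can still force either path, and Asias's point that the host also needs a switch maps to whether the host offers the bit at all.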