Re: [PATCH v1 1/1] Submit the codes for QEMU disk I/O limits.
On Mon, Jul 25, 2011 at 1:40 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Jul 25, 2011 at 5:25 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote: On Fri, Jul 22, 2011 at 6:54 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Jul 22, 2011 at 10:20 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com wrote: +static void bdrv_block_timer(void *opaque) +{ + BlockDriverState *bs = opaque; + BlockQueue *queue = bs-block_queue; + uint64_t intval = 1; + + while (!QTAILQ_EMPTY(queue-requests)) { + BlockIORequest *request; + int ret; + + request = QTAILQ_FIRST(queue-requests); + QTAILQ_REMOVE(queue-requests, request, entry); + + ret = queue-handler(request); Please remove the function pointer and call qemu_block_queue_handler() directly. This indirection is not needed and makes it harder to follow the code. Should it keep the same style with other queue implemetations such as network queue? As you have known, network queue has one queue handler. You mean net/queue.c:queue-deliver? There are two deliver functions, yeah qemu_deliver_packet() and qemu_vlan_deliver_packet(), which is why a function pointer is necessary. OK, The handler has been removed and invoked directly. In this case there is only one handler function so the indirection is not necessary. + if (ret == 0) { + QTAILQ_INSERT_HEAD(queue-requests, request, entry); + break; + } + + qemu_free(request); + } + + qemu_mod_timer(bs-block_timer, (intval * 10) + qemu_get_clock_ns(vm_clock)); intval is always 1. The timer should be set to the next earliest deadline. pls see bdrv_aio_readv/writev: + /* throttling disk read I/O */ + if (bs-io_limits != NULL) { + if (bdrv_exceed_io_limits(bs, nb_sectors, false, wait_time)) { + ret = qemu_block_queue_enqueue(bs-block_queue, bs, bdrv_aio_readv, + sector_num, qiov, nb_sectors, cb, opaque); + qemu_mod_timer(bs-block_timer, wait_time + qemu_get_clock_ns(vm_clock)); The timer has been modified when the blk request is enqueued. The current algorithm seems to be: 1. Queue requests that exceed the limit and set the timer. 2. Dequeue all requests when the timer fires. Yeah, but for each dequeued requests, it will be still determined if current I/O rate exceed that limits, if yes, it will still be enqueued into block queue again. 3. Set 1s periodic timer. Why is the timer set up as a periodic 1 second timer in bdrv_block_timer()? I was aslo considering if we need to set up this type of timer before. Since the timer has been modified after qemu_block_queue_enqueue() function, this timer should be not modified here. I will remove this line of line. thanks. + bs-slice_start[is_write] = real_time; + + bs-io_disps-bytes[is_write] = 0; + if (bs-io_limits-bps[2]) { + bs-io_disps-bytes[!is_write] = 0; + } All previous data should be discarded when a new time slice starts: bs-io_disps-bytes[IO_LIMIT_READ] = 0; bs-io_disps-bytes[IO_LIMIT_WRITE] = 0; If only bps_rd is specified, bs-io_disps-bytes[IO_LIMIT_WRITE] will never be used; i think that it should not be cleared here. right? I think there is no advantage in keeping slices separate for read/write. The code would be simpler and work the same if there is only one slice and past history is cleared when a new slice starts. Done Stefan -- Regards, Zhi Yong Wu -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 1/1] Submit the codes for QEMU disk I/O limits.
On Fri, Jul 22, 2011 at 6:54 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Jul 22, 2011 at 10:20 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com wrote: +static void bdrv_block_timer(void *opaque) +{ + BlockDriverState *bs = opaque; + BlockQueue *queue = bs-block_queue; + uint64_t intval = 1; + + while (!QTAILQ_EMPTY(queue-requests)) { + BlockIORequest *request; + int ret; + + request = QTAILQ_FIRST(queue-requests); + QTAILQ_REMOVE(queue-requests, request, entry); + + ret = queue-handler(request); Please remove the function pointer and call qemu_block_queue_handler() directly. This indirection is not needed and makes it harder to follow the code. + if (ret == 0) { + QTAILQ_INSERT_HEAD(queue-requests, request, entry); + break; + } + + qemu_free(request); + } + + qemu_mod_timer(bs-block_timer, (intval * 10) + qemu_get_clock_ns(vm_clock)); intval is always 1. The timer should be set to the next earliest deadline. +} + void bdrv_register(BlockDriver *bdrv) { if (!bdrv-bdrv_aio_readv) { @@ -476,6 +508,11 @@ static int bdrv_open_common(BlockDriverState *bs, const char *filename, goto free_and_fail; } + /* throttling disk I/O limits */ + if (bs-io_limits != NULL) { + bs-block_queue = qemu_new_block_queue(qemu_block_queue_handler); + } + #ifndef _WIN32 if (bs-is_temporary) { unlink(filename); @@ -642,6 +679,16 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, bs-change_cb(bs-change_opaque, CHANGE_MEDIA); } + /* throttling disk I/O limits */ + if (bs-io_limits != NULL) { block_queue is allocated in bdrv_open_common() but these variables are initialized in bdrv_open(). Can you move them together, I don't see a reason to keep them apart? + bs-io_disps = qemu_mallocz(sizeof(BlockIODisp)); + bs-block_timer = qemu_new_timer_ns(vm_clock, bdrv_block_timer, bs); + qemu_mod_timer(bs-block_timer, qemu_get_clock_ns(vm_clock)); Why is the timer being scheduled immediately? There are no queued requests. + + bs-slice_start[0] = qemu_get_clock_ns(vm_clock); + bs-slice_start[1] = qemu_get_clock_ns(vm_clock); + } + return 0; unlink_and_fail: @@ -680,6 +727,15 @@ void bdrv_close(BlockDriverState *bs) if (bs-change_cb) bs-change_cb(bs-change_opaque, CHANGE_MEDIA); } + + /* throttling disk I/O limits */ + if (bs-block_queue) { + qemu_del_block_queue(bs-block_queue); 3 space indent, should be 4 spaces. + } + + if (bs-block_timer) { qemu_free_timer() will only free() the timer memory but it will not cancel the timer. Use qemu_del_timer() first to ensure that the timer is not pending: qemu_del_timer(bs-block_timer); + qemu_free_timer(bs-block_timer); + } } void bdrv_close_all(void) @@ -1312,6 +1368,13 @@ void bdrv_get_geometry_hint(BlockDriverState *bs, *psecs = bs-secs; } +/* throttling disk io limits */ +void bdrv_set_io_limits(BlockDriverState *bs, + BlockIOLimit *io_limits) +{ + bs-io_limits = io_limits; This function takes ownership of io_limits but never frees it. I suggest not taking ownership and just copying io_limits into bs-io_limits so that the caller does not need to malloc() and the lifecycle of bs-io_limits is completely under our control. Easiest would be to turn BlockDriverState.io_limits into: BlockIOLimit io_limits; and just copy in bdrv_set_io_limits(): bs-io_limits = *io_limits; bdrv_exceed_io_limits() returns quite quickly if no limits are set, so I wouldn't worry about optimizing it out yet. +} + /* Recognize floppy formats */ typedef struct FDFormat { FDriveType drive; @@ -2111,6 +2174,154 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn) return buf; } +bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, + bool is_write, uint64_t *wait) { + int64_t real_time; + uint64_t bps_limit = 0; + double bytes_limit, bytes_disp, bytes_res, elapsed_time; + double slice_time = 0.1, wait_time; + + if (bs-io_limits-bps[2]) { + bps_limit = bs-io_limits-bps[2]; Please define a constant (IO_LIMIT_TOTAL?) instead of using magic number 2. + } else if (bs-io_limits-bps[is_write]) { + bps_limit = bs-io_limits-bps[is_write]; + } else { + if (wait) { + *wait = 0; + } + + return false; + } + + real_time = qemu_get_clock_ns(vm_clock); + if (bs-slice_start[is_write] + 1 = real_time) { Please define a constant for the 100 ms time slice instead of using 1 directly. + bs-slice_start[is_write] = real_time; + + bs-io_disps-bytes[is_write] = 0; +
[PATCH] qemu-kvm: Remove qemu_system_is_ready side-channel
From: Jan Kiszka jan.kis...@siemens.com kvm_add/remove_ioport_region need to know if the change is done during init or while the system is running. We added qemu_system_is_ready for this purpose which exported qemu_system_ready. The latter will be gone soon, but we can also obtain the information hot-plug or not from the callers of those services. They can retrieve it from the associated qdev device state. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- cpus.c |5 - cpus.h |1 - hw/device-assignment.c |6 -- qemu-kvm.c | 12 +++- qemu-kvm.h |6 -- 5 files changed, 15 insertions(+), 15 deletions(-) diff --git a/cpus.c b/cpus.c index 007345a..3035314 100644 --- a/cpus.c +++ b/cpus.c @@ -687,11 +687,6 @@ void qemu_main_loop_start(void) qemu_cond_broadcast(qemu_system_cond); } -bool qemu_system_is_ready(void) -{ -return qemu_system_ready; -} - void run_on_cpu(CPUState *env, void (*func)(void *data), void *data) { struct qemu_work_item wi; diff --git a/cpus.h b/cpus.h index f6d37af..f42b54e 100644 --- a/cpus.h +++ b/cpus.h @@ -7,7 +7,6 @@ void qemu_main_loop_start(void); void resume_all_vcpus(void); void pause_all_vcpus(void); void cpu_stop_current(void); -bool qemu_system_is_ready(void); void cpu_synchronize_all_states(void); void cpu_synchronize_all_post_reset(void); diff --git a/hw/device-assignment.c b/hw/device-assignment.c index 5c24c78..4a3e9b5 100644 --- a/hw/device-assignment.c +++ b/hw/device-assignment.c @@ -277,7 +277,8 @@ static void assigned_dev_ioport_map(PCIDevice *pci_dev, int region_num, addr, region-u.r_baseport, type, size, region_num); if (first_map region-region-resource_fd 0) { -r = kvm_add_ioport_region(region-u.r_baseport, region-r_size); +r = kvm_add_ioport_region(region-u.r_baseport, region-r_size, + pci_dev-qdev.hotplugged); if (r 0) { fprintf(stderr, %s: failed to enable ioport access (%m)\n, __func__); @@ -676,7 +677,8 @@ static void free_assigned_device(AssignedDevice *dev) } if (pci_region-type IORESOURCE_IO) { if (pci_region-resource_fd 0) { -kvm_remove_ioport_region(region-u.r_baseport, region-r_size); +kvm_remove_ioport_region(region-u.r_baseport, region-r_size, + dev-dev.qdev.hotplugged); } } else if (pci_region-type IORESOURCE_MEM) { if (region-u.r_virtbase) { diff --git a/qemu-kvm.c b/qemu-kvm.c index 80cc077..4a10616 100644 --- a/qemu-kvm.c +++ b/qemu-kvm.c @@ -589,7 +589,8 @@ static void do_set_ioport_access(void *data) } } -int kvm_add_ioport_region(unsigned long start, unsigned long size) +int kvm_add_ioport_region(unsigned long start, unsigned long size, + bool is_hot_plug) { KVMIOPortRegion *region = qemu_mallocz(sizeof(KVMIOPortRegion)); CPUState *env; @@ -600,12 +601,12 @@ int kvm_add_ioport_region(unsigned long start, unsigned long size) region-status = 1; QLIST_INSERT_HEAD(ioport_regions, region, entry); -if (qemu_system_is_ready()) { +if (is_hot_plug) { for (env = first_cpu; env != NULL; env = env-next_cpu) { run_on_cpu(env, do_set_ioport_access, region); if (region-status 0) { r = region-status; -kvm_remove_ioport_region(start, size); +kvm_remove_ioport_region(start, size, is_hot_plug); break; } } @@ -613,7 +614,8 @@ int kvm_add_ioport_region(unsigned long start, unsigned long size) return r; } -int kvm_remove_ioport_region(unsigned long start, unsigned long size) +int kvm_remove_ioport_region(unsigned long start, unsigned long size, + bool is_hot_unplug) { KVMIOPortRegion *region, *tmp; CPUState *env; @@ -623,7 +625,7 @@ int kvm_remove_ioport_region(unsigned long start, unsigned long size) if (region-start == start region-size == size) { region-status = 0; } -if (qemu_system_is_ready()) { +if (is_hot_unplug) { for (env = first_cpu; env != NULL; env = env-next_cpu) { run_on_cpu(env, do_set_ioport_access, region); } diff --git a/qemu-kvm.h b/qemu-kvm.h index 845880e..90bcca9 100644 --- a/qemu-kvm.h +++ b/qemu-kvm.h @@ -252,8 +252,10 @@ void kvm_tpr_access_report(CPUState *env, uint64_t rip, int is_write); int kvm_arch_init_irq_routing(void); -int kvm_add_ioport_region(unsigned long start, unsigned long size); -int kvm_remove_ioport_region(unsigned long start, unsigned long size); +int kvm_add_ioport_region(unsigned long start, unsigned long size, + bool is_hot_plug); +int kvm_remove_ioport_region(unsigned long start, unsigned long
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Anthony, On Mon, Jul 25, 2011 at 4:19 AM, Anthony Liguori anth...@codemonkey.ws wrote: lguest already does this and lives in the kernel. Does Lguest have SMP, usermode networking, and GUI support? On Mon, Jul 25, 2011 at 4:19 AM, Anthony Liguori anth...@codemonkey.ws wrote: So purely from a kernel perspective, why have two tools in the tree that do the same thing? Shouldn't you at least unify the userspace with the lguest userspace? Are you talking about Documentation/lguest/lguest.c? How would you suggest we unify our code with that? Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 07/25/2011 10:27 AM, Pekka Enberg wrote: Hi Anthony, On Mon, Jul 25, 2011 at 4:19 AM, Anthony Liguorianth...@codemonkey.ws wrote: lguest already does this and lives in the kernel. Does Lguest have SMP, usermode networking, and GUI support? IIRC, yes, no, and no. On Mon, Jul 25, 2011 at 4:19 AM, Anthony Liguorianth...@codemonkey.ws wrote: So purely from a kernel perspective, why have two tools in the tree that do the same thing? Shouldn't you at least unify the userspace with the lguest userspace? Are you talking about Documentation/lguest/lguest.c? How would you suggest we unify our code with that? It should be easy to have tools/kvm drive lguest - they're both virtio based. All you need to do is provide yet another ops structure to drive the two ABIs. I guess lguest.c has to remain, as point of lguest was a simple teaching aid for virtualization (which doesn't work very well, as the techniques it uses are obsolete). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Jan, On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: I've read several times now that developing in a single tree leads to better results. Can you provide some example from the QEMU/KVM projects where the split is preventing innovation, optimizations, or some other kind of progress? I really don't follow the Qemu project well enough to comment on what your biggest pain points are there. As for tools/kvm, it's pretty obvious by now that we want tighter integration with perf and the tracing facilities (and share code!), for example so for us merging to mainline is important. On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: Let's assume this gets merged. What will change for people working on generic or x86 KVM features? Will they sooner or later be asked to provide corresponding bits also for the tools folder? Having worked on two userlands over the past years (QEMU and qemu-kvm), still contributing to finally unify them, I'm wondering if this will end up repeating the painful history. I'm not a KVM maintainer so I really don't know how tools/kvm will change any of the above things. I'm certainly not in a position to *require* anyone to update tools/kvm when they add new KVM features but it'd sure be great. On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: That said, I definitely appreciate the bug fixes as well as code and documentation improvements for KVM that originate from this effort! I'm just not convinced that writing a new userland and merging it into the kernel is the most efficient way to achieve that. Just to make this crystal clear for everyone: if it weren't for tools/kvm, I wouldn't be hacking on KVM at all. I've looked at Qemu in the past (and a lot recently!) and I simply don't see myself contributing to it, sorry. So 'most efficient' or not, I think tools/kvm is a net win for Linux and KVM in general. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Avi, On Mon, Jul 25, 2011 at 10:36 AM, Avi Kivity a...@redhat.com wrote: Are you talking about Documentation/lguest/lguest.c? How would you suggest we unify our code with that? It should be easy to have tools/kvm drive lguest - they're both virtio based. All you need to do is provide yet another ops structure to drive the two ABIs. Right, we could do that. But for what purpose? I thought Lguest ABI was mostly for educational and testing purposes. On Mon, Jul 25, 2011 at 10:36 AM, Avi Kivity a...@redhat.com wrote: I guess lguest.c has to remain, as point of lguest was a simple teaching aid for virtualization (which doesn't work very well, as the techniques it uses are obsolete). Yup, I don't see point in merging lguest.c with tools/kvm. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, 2011-07-25 at 01:12 +0200, Jan Kiszka wrote: On 2011-07-24 22:37, Pekka Enberg wrote: Hi Linus, Please consider pulling from ssh://master.kernel.org/pub/scm/linux/kernel/git/penberg/linux.git kvm-tool-for-linus to merge the Native Linux KVM tool to Linux 3.1. [ The changes to 9p headers were already merged but show up in the pull request. ] The goal of this tool is to provide a clean, from-scratch, lightweight KVM host tool implementation that can boot Linux guest images with no BIOS dependencies and with only the minimal amount of legacy device emulation. The primary focus of the tool is to Linux but there are already people on working on supporting GRUB and other operating systems. We want the tool to be part of Linux kernel source tree because we believe that ‘perf’ clearly showed the benefits of a single repository for both kernel and userspace components. See Ingo Molnar’s email that started the project for details: http://thread.gmane.org/gmane.linux.kernel/962051/focus=962620 I've read several times now that developing in a single tree leads to better results. Can you provide some example from the QEMU/KVM projects where the split is preventing innovation, optimizations, or some other kind of progress? Anthony had a talk on last years KVM forum regarding the QEMU threading model (slide: http://www.linux-kvm.org/wiki/images/7/70/2010-forum-threading-qemu.pdf) . It was suggested that the KVM part of QEMU is having a hard time achieving the ideal threading model due to its need to support TCG - something which has nothing to do with KVM itself. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Pekka Enberg penb...@kernel.org wrote: On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: That said, I definitely appreciate the bug fixes as well as code and documentation improvements for KVM that originate from this effort! I'm just not convinced that writing a new userland and merging it into the kernel is the most efficient way to achieve that. Just to make this crystal clear for everyone: if it weren't for tools/kvm, I wouldn't be hacking on KVM at all. I've looked at Qemu in the past (and a lot recently!) and I simply don't see myself contributing to it, sorry. So 'most efficient' or not, I think tools/kvm is a net win for Linux and KVM in general. Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I can do that without downloading any (inevitably outdated) virtualization images or maintaining my own ones. Maintaining host userspace is more than enough for me. So, since we already have the lguest tool in the kernel tree, why cannot we have the much more capable tools/kvm/ in the tree? So while it is the Qemu folks' right to oppose tools/qemu/, i don't see why they are opposing tools/kvm/ ... Wrt. integration with lguest - this is a new argument that was not brought up before (i wish people would not come up with new requirements on the day of the pull request) - i don't see how it's relevant really: lguest was designed for legacy CPUs and tools/kvm/ is precisely about being simple and not doing legacy stuff. If then Qemu should be the project that integrates lguest. Is anyone on the Qemu side looking at lguest integration? Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, 2011-07-25 at 10:36 +0300, Avi Kivity wrote: On 07/25/2011 10:27 AM, Pekka Enberg wrote: Hi Anthony, On Mon, Jul 25, 2011 at 4:19 AM, Anthony Liguorianth...@codemonkey.ws wrote: lguest already does this and lives in the kernel. Does Lguest have SMP, usermode networking, and GUI support? IIRC, yes, no, and no. Theres no SMP support in lguest. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] skbuff: clear tx zero-copy flag
On Mon, Jul 25, 2011 at 08:42:00AM +0800, Herbert Xu wrote: Shirley Ma mashi...@us.ibm.com wrote: This patch clears tx zero-copy flag as needed. Sign-off-by: Shirley Ma x...@us.ibm.com I think we also need to copy and clear this flag on the splice read path as that takes a direct page reference. I hope there isn't any other path that does this. Cheers, When there's a way for an skb to get into the host networking stack, (e.g. when tap gains zero copy support) we'll need to handle that. However macvtap passes an skb directly to the lower device, so as long as macvtap is the only user of that interface, we are fine I think - there's no way for an skb to get from macvtap to splice read path I think. Right? -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] kvm tools, qcow: Add support for writing to zero refcount clusters
Am 24.07.2011 21:45, schrieb Pekka Enberg: This patch adds support for writing to zero refcount clusters. Refcount blocks are cached in like L2 tables and flushed upon VIRTIO_BLK_T_FLUSH and when evicted from the LRU cache. With this patch applied, 'qemu-img check' no longer complains about referenced clusters with zero reference count after dd if=/dev/zero of=/mnt/tmp where '/mnt' is freshly generated QCOW2 image. Cc: Asias He asias.he...@gmail.com Cc: Cyrill Gorcunov gorcu...@gmail.com Cc: Ingo Molnar mi...@elte.hu Cc: Kevin Wolf kw...@redhat.com Cc: Prasad Joshi prasadjoshi...@gmail.com Cc: Sasha Levin levinsasha...@gmail.com Signed-off-by: Pekka Enberg penb...@kernel.org Looks okay, except that in case of a crash you'll most likely corrupt the image because the order in which refcounts and mapping are written out is completely undefined. For a reliable implementation you need to make sure that for cluster allocation you first write out the refcount update, then fsync, then write out the mapping. Otherwise if the mapping is written out first and then a crash happens, you'll have clusters that are used, but marked free, so that in the next run a second cluster can be mapped to the same location. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Asias He asias.he...@gmail.com wrote: On Mon, Jul 25, 2011 at 3:36 PM, Avi Kivity a...@redhat.com wrote: On 07/25/2011 10:27 AM, Pekka Enberg wrote: Hi Anthony, On Mon, Jul 25, 2011 at 4:19 AM, Anthony Liguorianth...@codemonkey.ws wrote: lguest already does this and lives in the kernel. Does Lguest have SMP, usermode networking, and GUI support? IIRC, yes, no, and no. And Lguest only supports 32-bit kernel. The answer to SMP is 'no' as well, as lguest supports only a single CPU: [Documentation/virtual/lguest/lguest.c] /* We're CPU 0. In fact, that's the only CPU possible right now. */ cpu_id = 0; Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 09:53, Ingo Molnar wrote: * Pekka Enberg penb...@kernel.org wrote: On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: That said, I definitely appreciate the bug fixes as well as code and documentation improvements for KVM that originate from this effort! I'm just not convinced that writing a new userland and merging it into the kernel is the most efficient way to achieve that. Just to make this crystal clear for everyone: if it weren't for tools/kvm, I wouldn't be hacking on KVM at all. I've looked at Qemu in the past (and a lot recently!) and I simply don't see myself contributing to it, sorry. So 'most efficient' or not, I think tools/kvm is a net win for Linux and KVM in general. Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I do that all the time. $ qemu-kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0 does the exact same thing. If that's too much typing for you, make it a bash alias. I can do that without downloading any (inevitably outdated) virtualization images or maintaining my own ones. Maintaining host userspace is more than enough for me. Who would need images? I usually only run -kernel and -initrd directly to test out things. Or if I really want to boot into a real system I do -snapshot /dev/sda. So, since we already have the lguest tool in the kernel tree, why cannot we have the much more capable tools/kvm/ in the tree? Lguest is in Documentation/ for a reason. It's not meant as a user space tool that you take as-is and use. It's meant for documenting how lguest works in general. I admit though, that that's also the reason people don't use it :). So while it is the Qemu folks' right to oppose tools/qemu/, i don't see why they are opposing tools/kvm/ ... In your logic, you would put all of the GNU utils into tools/. This is just plain insane. There's a reason we have the split between kernel and user space. What does putting them into the same tree bring you? Fame? Glorious stats on kernel commits? Seriously, it's a separate project. It's not the kernel. Wrt. integration with lguest - this is a new argument that was not brought up before (i wish people would not come up with new requirements on the day of the pull request) - i don't see how it's relevant really: lguest was designed for legacy CPUs and tools/kvm/ is precisely about being simple and not doing legacy stuff. If then Qemu should be the project that integrates lguest. Is anyone on the Qemu side looking at lguest integration? I don't think lguest integration makes sense for anyone or anything. Lguest is a toy, so let it be the toy it wants to be. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 4:19 AM, Anthony Liguorianth...@codemonkey.ws wrote: lguest already does this and lives in the kernel. On 07/25/2011 10:27 AM, Pekka Enberg wrote: Does Lguest have SMP, usermode networking, and GUI support? On Mon, Jul 25, 2011 at 3:36 PM, Avi Kivity a...@redhat.com wrote: IIRC, yes, no, and no. * Asias He asias.he...@gmail.com wrote: And Lguest only supports 32-bit kernel. On Mon, Jul 25, 2011 at 11:11 AM, Ingo Molnar mi...@elte.hu wrote: The answer to SMP is 'no' as well, as lguest supports only a single CPU: [Documentation/virtual/lguest/lguest.c] /* We're CPU 0. In fact, that's the only CPU possible right now. */ cpu_id = 0; Uhm - so why bring up Lguest at all in this pull request? Anthony, what am I missing here? Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Alexander, On Mon, Jul 25, 2011 at 11:14 AM, Alexander Graf ag...@suse.de wrote: Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. Damn you Ingo Molnar, I knew you'd somehow get all the credit for our hard work! ;-) More seriously, though, I fail to see what's bothering you Alexander. I and Ingo already mentioned we wouldn't be hacking on Qemu even if there wasn't no tools/kvm. It's not as if we're putting *your* user space code into the kernel tree - we wrote our own! What's wrong with that? Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Alexander, On Mon, Jul 25, 2011 at 11:14 AM, Alexander Graf ag...@suse.de wrote: So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I do that all the time. $ qemu-kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0 does the exact same thing. If that's too much typing for you, make it a bash alias. You know, they said the same thing about oprofile. All you needed to do was to write few simple shell scripts to make it work. One of the key features of tools/kvm is 'as little configuration as possible' and I can assure you that bash alias is really not a solution for that. [ Yes, yes, I know people apparently use virtio-manager for lauching Qemu so they don't need to figure out the setup themselves. But now you have *three* separate components (KVM, Qemu, virtio-manager) which is a completely different direction we're taking. Hell, we even went ahead and wrote our own mini-BIOS just to keep things in one unified tree. ] Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 10:23, Pekka Enberg wrote: Hi Alexander, On Mon, Jul 25, 2011 at 11:14 AM, Alexander Graf ag...@suse.de wrote: Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. Damn you Ingo Molnar, I knew you'd somehow get all the credit for our hard work! ;-) Well, IIUC he's the one initiating the whole thing, no? More seriously, though, I fail to see what's bothering you Alexander. I and Ingo already mentioned we wouldn't be hacking on Qemu even if there wasn't no tools/kvm. It's not as if we're putting *your* user space code into the kernel tree - we wrote our own! What's wrong with that? Nothing. I like competition. But why push it into the kernel? It's not a kernel, it's not a library the kernels needs for internal stuff. So why would it have to be in there? In Ingo's reasoning, the next step would be to rewrite glibc and put it into the kernel tree, because we end up adding syscalls so adding them to the in-kernel libc with the same commit would be a lot easier and cleaner. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 09:50, Sasha Levin wrote: On Mon, 2011-07-25 at 01:12 +0200, Jan Kiszka wrote: On 2011-07-24 22:37, Pekka Enberg wrote: Hi Linus, Please consider pulling from ssh://master.kernel.org/pub/scm/linux/kernel/git/penberg/linux.git kvm-tool-for-linus to merge the Native Linux KVM tool to Linux 3.1. [ The changes to 9p headers were already merged but show up in the pull request. ] The goal of this tool is to provide a clean, from-scratch, lightweight KVM host tool implementation that can boot Linux guest images with no BIOS dependencies and with only the minimal amount of legacy device emulation. The primary focus of the tool is to Linux but there are already people on working on supporting GRUB and other operating systems. We want the tool to be part of Linux kernel source tree because we believe that ‘perf’ clearly showed the benefits of a single repository for both kernel and userspace components. See Ingo Molnar’s email that started the project for details: http://thread.gmane.org/gmane.linux.kernel/962051/focus=962620 I've read several times now that developing in a single tree leads to better results. Can you provide some example from the QEMU/KVM projects where the split is preventing innovation, optimizations, or some other kind of progress? Anthony had a talk on last years KVM forum regarding the QEMU threading model (slide: http://www.linux-kvm.org/wiki/images/7/70/2010-forum-threading-qemu.pdf) . It was suggested that the KVM part of QEMU is having a hard time achieving the ideal threading model due to its need to support TCG - something which has nothing to do with KVM itself. Sure, but TCG is a really great part of Qemu. And I'm actually one of the few people who believe that in the KVM community ;). Recently, I sat down for a couple of months and implemented s390x emulation with TCG in Qemu. You know why? We had KVM support (device emulation, KVM backend) for s390x for more than a year at that point. People constantly broke stuff in the s390x code paths, because they couldn't test it - people seemed to not have too many mainframes lying around at home. With the emulation target, I could make sure that everyone's able to at least compile and run the s390x specific parts and make sure he can do regression testing. So yes, while TCG is keeping us back in certain areas, it's tremendously useful in others. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 07/25/2011 11:31 AM, Alexander Graf wrote: In Ingo's reasoning, the next step would be to rewrite glibc and put it into the kernel tree, because we end up adding syscalls so adding them to the in-kernel libc with the same commit would be a lot easier and cleaner. That actually makes a ton of sense. One immediate win would be that klibc can be tuned to the kernel it ships with (the dynamic loader will pick the correct object), so less #ifdef trees. Another would be to make klibc the formal kernel interface, which allows us to reimplement an older interface in terms of the one that supercedes it. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Alexander, On Mon, Jul 25, 2011 at 11:31 AM, Alexander Graf ag...@suse.de wrote: Damn you Ingo Molnar, I knew you'd somehow get all the credit for our hard work! ;-) Well, IIUC he's the one initiating the whole thing, no? As much as I appreciate Ingo's help and support with the project, no, I don't consider him to have initiated the whole thing. Yes, it's his idea and that's what got me into hacking on the thing in the first place. But calling this Ingo's crusade is somewhat missing the point. It's really people like Asias, Sasha, Cyrill, and Prashad who have made all the heavy lifting. On Mon, Jul 25, 2011 at 11:31 AM, Alexander Graf ag...@suse.de wrote: More seriously, though, I fail to see what's bothering you Alexander. I and Ingo already mentioned we wouldn't be hacking on Qemu even if there wasn't no tools/kvm. It's not as if we're putting *your* user space code into the kernel tree - we wrote our own! What's wrong with that? Nothing. I like competition. But why push it into the kernel? It's not a kernel, it's not a library the kernels needs for internal stuff. So why would it have to be in there? For the same reasons we want tools/perf to be there. On Mon, Jul 25, 2011 at 11:31 AM, Alexander Graf ag...@suse.de wrote: In Ingo's reasoning, the next step would be to rewrite glibc and put it into the kernel tree, because we end up adding syscalls so adding them to the in-kernel libc with the same commit would be a lot easier and cleaner. It's called klibc, btw. :-) Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 10:30, Pekka Enberg wrote: Hi Alexander, On Mon, Jul 25, 2011 at 11:14 AM, Alexander Graf ag...@suse.de wrote: So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I do that all the time. $ qemu-kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0 does the exact same thing. If that's too much typing for you, make it a bash alias. You know, they said the same thing about oprofile. All you needed to do was to write few simple shell scripts to make it work. One of the key features of tools/kvm is 'as little configuration as possible' and I can assure you that bash alias is really not a solution for that. I like perf. Really. But I still don't see why it wouldn't be able to live in its own tree either. [ Yes, yes, I know people apparently use virtio-manager for lauching Qemu so they don't need to figure out the setup themselves. But now you have *three* separate components (KVM, Qemu, virtio-manager) which is a completely Having the split between KVM and user space is necessary. One lives in kernel space, the other one in user space. The split between qemu and virt-manager is something I don't like either. But then again I don't like virt-manager :). But what does this have to do with pulling something into the kernel? And reimplementing something that's already been there? different direction we're taking. Hell, we even went ahead and wrote our own mini-BIOS just to keep things in one unified tree. ] Yes, making sure that you have even more non-working non-Linux OSs. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] skbuff: clear tx zero-copy flag
On Mon, Jul 25, 2011 at 11:07:43AM +0300, Michael S. Tsirkin wrote: However macvtap passes an skb directly to the lower device, so as long as macvtap is the only user of that interface, we are fine I think - there's no way for an skb to get from macvtap to splice read path I think. Right? Yes, as long as you can guarantee that the skb never loops back then you should be fine. However, does macvtap really bypass everything, including the qdisc layer? The qdisc layer is certainly capable of looping the skb back with the redirect action. Cheers, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 10:37, Pekka Enberg wrote: Hi Alexander, On Mon, Jul 25, 2011 at 11:31 AM, Alexander Graf ag...@suse.de wrote: Damn you Ingo Molnar, I knew you'd somehow get all the credit for our hard work! ;-) Well, IIUC he's the one initiating the whole thing, no? As much as I appreciate Ingo's help and support with the project, no, I don't consider him to have initiated the whole thing. Yes, it's his idea and that's what got me into hacking on the thing in the first place. But calling this Ingo's crusade is somewhat missing the point. It's really people like Asias, Sasha, Cyrill, and Prashad who have made all the heavy lifting. On Mon, Jul 25, 2011 at 11:31 AM, Alexander Graf ag...@suse.de wrote: More seriously, though, I fail to see what's bothering you Alexander. I and Ingo already mentioned we wouldn't be hacking on Qemu even if there wasn't no tools/kvm. It's not as if we're putting *your* user space code into the kernel tree - we wrote our own! What's wrong with that? Nothing. I like competition. But why push it into the kernel? It's not a kernel, it's not a library the kernels needs for internal stuff. So why would it have to be in there? For the same reasons we want tools/perf to be there. Yeah, I want a pony too. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Alexander, On Mon, Jul 25, 2011 at 11:37 AM, Alexander Graf ag...@suse.de wrote: different direction we're taking. Hell, we even went ahead and wrote our own mini-BIOS just to keep things in one unified tree. ] Yes, making sure that you have even more non-working non-Linux OSs. You know, I've been a Linux kernel hacker for more than five years now and I've spent way too much of my spare time to improve it. So yes, I care about Linux. I care about it a lot, actually. It's fair to say I care about Linux more than I care about it more than any other operating system out there. [ I thought the 'native Linux' part in 'native Linux KVM tool' was a dead giveaway, really. ] Now if people want to support other operating systems, that's cool and I'm happy to help out where I can. But I don't understand why people keep bringing non-Linux OSs as an argument for not merging tools/kvm into the Linux kernel tree. I mean really, did someone actually expect that a Linux kernel developer spends his weekends improving the state of Windows virtualization? And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Introduce QEMU_NEW()
qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. Signed-off-by: Avi Kivity a...@redhat.com --- This is part of my memory API patchset, but doesn't really belong there. qemu-common.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/qemu-common.h b/qemu-common.h index ba55719..66effa3 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -186,6 +186,9 @@ void qemu_free(void *ptr); char *qemu_strdup(const char *str); char *qemu_strndup(const char *str, size_t size); +#define QEMU_NEW(type) ((type *)(qemu_malloc(sizeof(type +#define QEMU_NEWZ(type) ((type *)(qemu_mallocz(sizeof(type + void qemu_mutex_lock_iothread(void); void qemu_mutex_unlock_iothread(void); -- 1.7.5.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 11:44 AM, Alexander Graf ag...@suse.de wrote: For the same reasons we want tools/perf to be there. Yeah, I want a pony too. I can contact the Linux Foundation to see if we can arrange that. Seriously, though, I don't understand your point. What is it? Do you not agree that perf benefited from being part of the Linux kernel tree? Or do you think it does not apply here? Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Alexander Graf ag...@suse.de wrote: On 25.07.2011, at 09:53, Ingo Molnar wrote: * Pekka Enberg penb...@kernel.org wrote: On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: That said, I definitely appreciate the bug fixes as well as code and documentation improvements for KVM that originate from this effort! I'm just not convinced that writing a new userland and merging it into the kernel is the most efficient way to achieve that. Just to make this crystal clear for everyone: if it weren't for tools/kvm, I wouldn't be hacking on KVM at all. I've looked at Qemu in the past (and a lot recently!) and I simply don't see myself contributing to it, sorry. So 'most efficient' or not, I think tools/kvm is a net win for Linux and KVM in general. Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. Not all their user space but the ones that are tightly integrated with the kernel to begin with. How hard is that concept to understand? Virtualization is very tightly bound to the kernel, like it or not. So is profiling, power management and a few other things. And when you do 'ls tools/' you'll see exactly those topics populated: earth5:~/tip ls tools/ firewire kvm perf power slub testing usb virtio [ In fact tools/virtio/ was merged upstream yesterday and putting that code there was a good call IMO. ] So just as there are good reasons to keep some user-space projects out of the kernel tree (while i'd love to see a usable mail client for Linux i dont think it belongs into tools/) there are similarly good reasons to keep some of them in the kernel tree. tools/kvm/ is an excellent example for that, just look at the code - it shares code with the kernel, uses the kernel coding style, is in significant part developed by kernel developers and it is being used by kernel developers. I wish there was a hackable tools/qemu/ base but there isnt and won't be any. The thing i *can* do is to help create a hackable virtualization tool. So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I do that all the time. $ qemu-kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0 does the exact same thing. If that's too much typing for you, make it a bash alias. This only boots the bzImage itself and panics when it would like to hit user-space. I can do that without downloading any (inevitably outdated) virtualization images or maintaining my own ones. Maintaining host userspace is more than enough for me. Who would need images? I usually only run -kernel and -initrd directly to test out things. Or if I really want to boot into a real system I do -snapshot /dev/sda. that too panics here, while with tools/kvm/ i get to a prompt: [root@aldebaran ~]# uptime 08:42:13 up 0 min, 0 users, load average: 0.00, 0.00, 0.00 But i agree with you that obviously Qemu (or my usage of parameters, whichever is the problem) could be fixed to do this correctly. So, since we already have the lguest tool in the kernel tree, why cannot we have the much more capable tools/kvm/ in the tree? Lguest is in Documentation/ for a reason. It's not meant as a user space tool that you take as-is and use. It's meant for documenting how lguest works in general. I admit though, that that's also the reason people don't use it :). There's an obvious contrast with the request to merge tools/kvm/ to lguest (not brought up by me) and your description of lguest only being for documentation purposes, not to be used really, etc. I wanted to highlight this contrast, so if you thought we disagree much about lguest i don't think we do. So while it is the Qemu folks' right to oppose tools/qemu/, i don't see why they are opposing tools/kvm/ ... In your logic, you would put all of the GNU utils into tools/. This is just plain insane. There's a reason we have the split between kernel and user space. What does putting them into the same tree bring you? Fame? Glorious stats on kernel commits? Seriously, it's a separate project. It's not the kernel. Where did i claim that *all* user-space projects need to move into the kernel tree? It's all case by case. Before you argue against a position and ridicule it you need to understand it. Thanks, Ingo -- To unsubscribe from this list:
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 10:51, Pekka Enberg wrote: On Mon, Jul 25, 2011 at 11:44 AM, Alexander Graf ag...@suse.de wrote: For the same reasons we want tools/perf to be there. Yeah, I want a pony too. I can contact the Linux Foundation to see if we can arrange that. Seriously, though, I don't understand your point. What is it? Do you not agree that perf benefited from being part of the Linux kernel tree? Or do you think it does not apply here? I disagree with the assumption that putting something into the kernel tree benefits the project (except for maybe bumping its visibility, so it potentially kills competition again). If however it's true, then we should analyze why exactly projects benefit from being in the kernel tree, so we can maybe find a good solution to achieve the same differently. The approach let's move everyting into tools/ simply doesn't scale. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Ingo Molnar mi...@elte.hu wrote: Virtualization is very tightly bound to the kernel, like it or not. So is profiling, power management and a few other things. And when you do 'ls tools/' you'll see exactly those topics populated: earth5:~/tip ls tools/ firewire kvm perf power slub testing usb virtio [ In fact tools/virtio/ was merged upstream yesterday and putting that code there was a good call IMO. ] Correction: tools/virtio/ was merged much earlier than that - in v2.6.37. Still the point remains: you cannot ridicule us moving user-space code to tools/ while virtio itself is moving code there ... Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 10:54, Ingo Molnar wrote: * Alexander Graf ag...@suse.de wrote: On 25.07.2011, at 09:53, Ingo Molnar wrote: * Pekka Enberg penb...@kernel.org wrote: On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: That said, I definitely appreciate the bug fixes as well as code and documentation improvements for KVM that originate from this effort! I'm just not convinced that writing a new userland and merging it into the kernel is the most efficient way to achieve that. Just to make this crystal clear for everyone: if it weren't for tools/kvm, I wouldn't be hacking on KVM at all. I've looked at Qemu in the past (and a lot recently!) and I simply don't see myself contributing to it, sorry. So 'most efficient' or not, I think tools/kvm is a net win for Linux and KVM in general. Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. Not all their user space but the ones that are tightly integrated with the kernel to begin with. How hard is that concept to understand? It's very hard to understand. It's similar to religion - I could easily apply your point to every reasonably low-level user space project out there. X for example. X needs to interact with KMS and DRI and whatdoiknow. So it'd be a perfect fit to pull into tools/, no? Virtualization is very tightly bound to the kernel, like it or not. I don't see why. The whole point of virtualization is to model simple, with KVM preferably even very-close-to-real-hardware interfaces that allow you to keep things separate. So is profiling, power management and a few other things. I'm also failing to see the reasoning here TBH. And when you do 'ls tools/' you'll see exactly those topics populated: earth5:~/tip ls tools/ firewire kvm perf power slub testing usb virtio [ In fact tools/virtio/ was merged upstream yesterday and putting that code there was a good call IMO. ] So just as there are good reasons to keep some user-space projects out of the kernel tree (while i'd love to see a usable mail client for Linux i dont think it belongs into tools/) there are similarly good reasons to keep some of them in the kernel tree. tools/kvm/ is an excellent example for that, just look at the code - it shares code with the kernel, uses the kernel coding style, is in significant part developed by kernel developers and it is being used by kernel developers. Then make a separate tree, add the kernel as git submodule and simply pull in your library/header dependencies from there. Shouldn't be too hard, no? I wish there was a hackable tools/qemu/ base but there isnt and won't be any. The thing i *can* do is to help create a hackable virtualization tool. What's the problem with a code base that's hackable, but not in tools/? So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I do that all the time. $ qemu-kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0 does the exact same thing. If that's too much typing for you, make it a bash alias. This only boots the bzImage itself and panics when it would like to hit user-space. Well, sure. I can do that without downloading any (inevitably outdated) virtualization images or maintaining my own ones. Maintaining host userspace is more than enough for me. Who would need images? I usually only run -kernel and -initrd directly to test out things. Or if I really want to boot into a real system I do -snapshot /dev/sda. that too panics here, while with tools/kvm/ i get to a prompt: [root@aldebaran ~]# uptime 08:42:13 up 0 min, 0 users, load average: 0.00, 0.00, 0.00 But i agree with you that obviously Qemu (or my usage of parameters, whichever is the problem) could be fixed to do this correctly. Well, you need to make sure to pass the right partition into -append. No idea what ./kvm run does, but it's certainly something easily scriptable with any other virtualization user land. So, since we already have the lguest tool in the kernel tree, why cannot we have the much more capable tools/kvm/ in the tree? Lguest is in Documentation/ for a reason. It's not meant as a user space tool that you take as-is and use. It's meant for documenting how lguest works in general. I admit though, that that's also the reason people don't use it :).
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 10:47, Pekka Enberg wrote: Hi Alexander, On Mon, Jul 25, 2011 at 11:37 AM, Alexander Graf ag...@suse.de wrote: different direction we're taking. Hell, we even went ahead and wrote our own mini-BIOS just to keep things in one unified tree. ] Yes, making sure that you have even more non-working non-Linux OSs. You know, I've been a Linux kernel hacker for more than five years now and I've spent way too much of my spare time to improve it. So yes, I care about Linux. I care about it a lot, actually. It's fair to say I care about Linux more than I care about it more than any other operating system out there. [ I thought the 'native Linux' part in 'native Linux KVM tool' was a dead giveaway, really. ] Now if people want to support other operating systems, that's cool and I'm happy to help out where I can. But I don't understand why people keep bringing non-Linux OSs as an argument for not merging tools/kvm into the Linux kernel tree. I mean really, did someone actually expect that a Linux kernel developer spends his weekends improving the state of Windows virtualization? And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. Then kvm-tool is about as useful as Mac-on-Linux. Why don't we have MoL user land in the kernel? I even added support for KVM to it about a year ago. So all I need to do is change it to the kernel coding style, add some dependencies on kernel headers and I'm good for a pull request? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Pekka Enberg penb...@kernel.org wrote: Hi Alexander, On Mon, Jul 25, 2011 at 11:14 AM, Alexander Graf ag...@suse.de wrote: Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. Damn you Ingo Molnar, I knew you'd somehow get all the credit for our hard work! ;-) Yeah, i'm really just a user and code reviewer of tools/kvm/, in terms of actual commits i'm barely on the radar compared to the massive contributions of Pekka, Sasha, Asias, Cyrill and others. More seriously, though, I fail to see what's bothering you Alexander. I and Ingo already mentioned we wouldn't be hacking on Qemu even if there wasn't no tools/kvm. It's not as if we're putting *your* user space code into the kernel tree - we wrote our own! What's wrong with that? That puzzles me as well. If tools/kvm/ is really such a bad approach then it will just wither and die, like one of the thousand stale drivers we have in the kernel - or like lguest it ends up as a cool hack that didnt quite make it. Judging by the many contributions tools/kvm/ has already attracted in its limited form an out of tree project i'd not bet on that outcome though. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Pekka Enberg penb...@kernel.org wrote: Hi Alexander, On Mon, Jul 25, 2011 at 11:31 AM, Alexander Graf ag...@suse.de wrote: Damn you Ingo Molnar, I knew you'd somehow get all the credit for our hard work! ;-) Well, IIUC he's the one initiating the whole thing, no? As much as I appreciate Ingo's help and support with the project, no, I don't consider him to have initiated the whole thing. [...] No, i havent - and if Alexander pulls the URI and looks at the changelogs he'll realize that fact too. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Pekka Enberg penb...@kernel.org wrote: [ I thought the 'native Linux' part in 'native Linux KVM tool' was a dead giveaway, really. ] Now if people want to support other operating systems, that's cool and I'm happy to help out where I can. But I don't understand why people keep bringing non-Linux OSs as an argument for not merging tools/kvm into the Linux kernel tree. I mean really, did someone actually expect that a Linux kernel developer spends his weekends improving the state of Windows virtualization? And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. In fact one of the problems i see with Qemu is that Qemu had to make many compromises to support Windows and other weird platforms that i'm (and i'd claim most other Linux kernel developers) are personally not interested in. So here's a 14 KLOC tools/kvm/ that does everything that i need from virtualization (easy testing of a bzImage before rebooting the host system into it), is clean, readable and hackable and is 1.5% of the nearly 1 MLOC Qemu code base. The whole tools/kvm/ code base is (much) smaller than Qemu's IA64 support code for example. The size difference is startling. tools/kvm/ does less and in my experience does it better - is that such a surprising thing? So it was a no brainer for me to pull it into -tip. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Introduce QEMU_NEW()
On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Introduce QEMU_NEW()
On Mon, 2011-07-25 at 11:32 +0200, Alexander Graf wrote: On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. It'll warn when you do silly things such as: struct some_struct *k; k = qemu_malloc(sizeof(k)); -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 11:26, Ingo Molnar wrote: * Pekka Enberg penb...@kernel.org wrote: [ I thought the 'native Linux' part in 'native Linux KVM tool' was a dead giveaway, really. ] Now if people want to support other operating systems, that's cool and I'm happy to help out where I can. But I don't understand why people keep bringing non-Linux OSs as an argument for not merging tools/kvm into the Linux kernel tree. I mean really, did someone actually expect that a Linux kernel developer spends his weekends improving the state of Windows virtualization? And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. In fact one of the problems i see with Qemu is that Qemu had to make many compromises to support Windows and other weird platforms that i'm (and i'd claim most other Linux kernel developers) are personally not interested in. It's what makes it so powerful. Adding a new architecture for KVM for example is as easy as only implementing the CPU. All device emulation is already there. If you want something Linux only, lguest would've been enough, no? The only real problem I see with Qemu is that it's been born in a time of uniprocessor systems. Making it fully threaded as an after-thought is insanely hard. So here's a 14 KLOC tools/kvm/ that does everything that i need from virtualization (easy testing of a bzImage before rebooting the host system into it), is clean, readable and hackable and is 1.5% of the nearly 1 MLOC Qemu code base. Sure, we'll talk again when you can run Android for ARM on kvm-tool. The whole tools/kvm/ code base is (much) smaller than Qemu's IA64 support code for example. The size difference is startling. That's the host code compiler. Qemu doesn't have IA64 guest support. tools/kvm/ does less and in my experience does it better - is that such a surprising thing? I already stated in a few mails before and also in the last discussion that I'm in general in favor of having multiple user space targets for KVM. In fact, I wrote 2 backends (both for PPC) myself. Though adding those 2 backends to their respective projects probably provided a lot more value-add (features that qemu didn't already have) than kvm-tool does atm. Kvm-tool really just smells a lot like NIH. Which is fine - that's why we have KDE and Gnome, right? ;) So it was a no brainer for me to pull it into -tip. The thing I don't agree with is that it should live in the kernel tree. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Alexander Graf ag...@suse.de wrote: On 25.07.2011, at 10:54, Ingo Molnar wrote: * Alexander Graf ag...@suse.de wrote: On 25.07.2011, at 09:53, Ingo Molnar wrote: * Pekka Enberg penb...@kernel.org wrote: On Mon, Jul 25, 2011 at 2:12 AM, Jan Kiszka jan.kis...@web.de wrote: That said, I definitely appreciate the bug fixes as well as code and documentation improvements for KVM that originate from this effort! I'm just not convinced that writing a new userland and merging it into the kernel is the most efficient way to achieve that. Just to make this crystal clear for everyone: if it weren't for tools/kvm, I wouldn't be hacking on KVM at all. I've looked at Qemu in the past (and a lot recently!) and I simply don't see myself contributing to it, sorry. So 'most efficient' or not, I think tools/kvm is a net win for Linux and KVM in general. Same here - in fact i first asked Qemu to be put into tools/qemu/ so that it all becomes more hackable and more usable - that suggestion was rebuked very strongly. So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. Not all their user space but the ones that are tightly integrated with the kernel to begin with. How hard is that concept to understand? It's very hard to understand. It's similar to religion - I could easily apply your point to every reasonably low-level user space project out there. X for example. X needs to interact with KMS and DRI and whatdoiknow. So it'd be a perfect fit to pull into tools/, no? If the graphics folks feel comfortable with that approach then yes, i think graphics would be a good example as well - in my experience a lot of the user-space pain related to graphics development comes from ugly version friction between various graphics components, and all the APIs are pretty fluid and heavily developed - which is easier to do within a single code repository. But if the Xorg/graphics folks want it separate it's their decision and they've said it in the past that they like the current modularization. If someone comes with tools/X11/ that actually *works* and provides a usable X11 replacement i sure would try it out and would likely contribute to it. Basic infrastructure and tooling - many projects could apply but the most obvious choices are ones that relate to and depend on the kernel. Browsers, email clients, GIMP, games, LibreOffice, not so much. There's no clear line but that's not a problem - there are seldom any clear lines in life. It's a case by case issue and very much depends on the attributes of the project and the preferences of the developers who are driving it. I can tell you my first hand experience: for tools/perf/ and tools/kvm/ it was highly beneficial to be developed in the kernel repo. I don't see how you can validly bring religion into this discussion. Virtualization is very tightly bound to the kernel, like it or not. I don't see why. The whole point of virtualization is to model simple, with KVM preferably even very-close-to-real-hardware interfaces that allow you to keep things separate. Look at tools/kvm/ - i cannot do anything useful without a Linux kernel under it. It's as deeply bound to the Linux kernel as it gets. Then look at the actual drivers and interfaces within tools/kvm/. It's using the same symbols and conventions for 'guest' and 'host' side. Check out tools/kvm/hw/i8042.c and match it up with include/linux/serio.h and drivers/input/serio/i8042.c - you can literally walk from one side to the other and understand how guest and host are tightly related not just functionality but also implementation wise. This is how Qemu should be doing it as well btw., to ease the debugging of host/guest interaction bugs and to ease development. So is profiling, power management and a few other things. I'm also failing to see the reasoning here TBH. You need to quote the full argument to see the reason in it: Virtualization is very tightly bound to the kernel, like it or not. So is profiling, power management and a few other things. It's a very simple point and observation: tools which integrate to the kernel so that they wouldnt even run on another kernel obviously are very natural to develop in tools/. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Introduce QEMU_NEW()
On 25.07.2011, at 11:37, Sasha Levin wrote: On Mon, 2011-07-25 at 11:32 +0200, Alexander Graf wrote: On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. It'll warn when you do silly things such as: struct some_struct *k; k = qemu_malloc(sizeof(k)); Hm - is there any way to get this without adding upper case C++'ish macros? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] skbuff: clear tx zero-copy flag
On Mon, Jul 25, 2011 at 04:40:57PM +0800, Herbert Xu wrote: On Mon, Jul 25, 2011 at 11:07:43AM +0300, Michael S. Tsirkin wrote: However macvtap passes an skb directly to the lower device, so as long as macvtap is the only user of that interface, we are fine I think - there's no way for an skb to get from macvtap to splice read path I think. Right? Yes, as long as you can guarantee that the skb never loops back then you should be fine. However, does macvtap really bypass everything, including the qdisc layer? The qdisc layer is certainly capable of looping the skb back with the redirect action. Cheers, No, I don't think macvtap bypasses the qdisc. Is the action in question here? static int tcf_mirred(struct sk_buff *skb, const struct tc_action *a, struct tcf_result *res) if yes that seems to always clone an skb, which in turn does the copy so we are fine? -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 11:41, Ingo Molnar wrote: Virtualization is very tightly bound to the kernel, like it or not. So is profiling, power management and a few other things. It's a very simple point and observation: tools which integrate to the kernel so that they wouldnt even run on another kernel obviously are very natural to develop in tools/. Ah, very good. So all we need to do to prove the point that kvm-tool doesn't belong in tools/ is port KVM to another OS and make kvm-tool compile there too? Shouldn't be too hard. People already have working ports of (old) KVM versions on FreeBSD and Windows. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 25 July 2011 10:32, Alexander Graf ag...@suse.de wrote: On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. Yeah, while we're writing in C we should just stick to the C-like APIs, it's less confusing IMHO than wrapping it up in something else. I assume Anthony's new object model stuff will have a create me a new foo object API anyway, so QEMU_NEW() is possibly a bit of a namespace grab. -- PMM -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 07/25/2011 12:41 PM, Ingo Molnar wrote: Look at tools/kvm/ - i cannot do anything useful without a Linux kernel under it. It's as deeply bound to the Linux kernel as it gets. The actual code that interacts with the kernel is pretty small, and will grow smaller (as a fraction) in time. Then look at the actual drivers and interfaces within tools/kvm/. It's using the same symbols and conventions for 'guest' and 'host' side. Check out tools/kvm/hw/i8042.c and match it up with include/linux/serio.h and drivers/input/serio/i8042.c - you can literally walk from one side to the other and understand how guest and host are tightly related not just functionality but also implementation wise. This is how Qemu should be doing it as well btw., to ease the debugging of host/guest interaction bugs and to ease development. No. That ties the guest and host interfaces together. If you change the guest to use a feature you didn't implement previously in the host, it's suddenly broken when virtualized. The right thing is to implement the hardware interface exactly per spec, and ignore the guest(s) completely except for testing and performance. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Introduce QEMU_NEW()
On 07/25/2011 12:43 PM, Alexander Graf wrote: Hm - is there any way to get this without adding upper case C++'ish macros? Switch to C++. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 12:48 PM, Peter Maydell wrote: On 25 July 2011 10:32, Alexander Grafag...@suse.de wrote: On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. Yeah, while we're writing in C we should just stick to the C-like APIs, it's less confusing IMHO than wrapping it up in something else. That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I assume Anthony's new object model stuff will have a create me a new foo object API anyway, so QEMU_NEW() is possibly a bit of a namespace grab. Anthony's stuff is at a much higher level, hopefully he'll come back to the ground one day. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 25.07.2011, at 11:52, Avi Kivity wrote: On 07/25/2011 12:48 PM, Peter Maydell wrote: On 25 July 2011 10:32, Alexander Grafag...@suse.de wrote: On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. Yeah, while we're writing in C we should just stick to the C-like APIs, it's less confusing IMHO than wrapping it up in something else. That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? I sympathize with Peter on the rationale that keeping interfaces aligned with how C APIs usually look like helps making the code more readable. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] skbuff: clear tx zero-copy flag
On Mon, Jul 25, 2011 at 12:44:14PM +0300, Michael S. Tsirkin wrote: if yes that seems to always clone an skb, which in turn does the copy so we are fine? Yes you're right, it should be safe. However, I think we should add a WARN_ON to the splice skb path so that should a packet find its way through a path that we haven't thought of then at least we'll know about it. Thanks, -- Email: Herbert Xu herb...@gondor.apana.org.au Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] skbuff: clear tx zero-copy flag
From: Herbert Xu herb...@gondor.hengli.com.au Date: Mon, 25 Jul 2011 17:57:11 +0800 However, I think we should add a WARN_ON to the splice skb path so that should a packet find its way through a path that we haven't thought of then at least we'll know about it. Good idea. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 12:56 PM, Alexander Graf wrote: That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? Better APIs trump better patch review. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Avi Kivity a...@redhat.com wrote: On 07/25/2011 12:41 PM, Ingo Molnar wrote: Look at tools/kvm/ - i cannot do anything useful without a Linux kernel under it. It's as deeply bound to the Linux kernel as it gets. The actual code that interacts with the kernel is pretty small, and will grow smaller (as a fraction) in time. Look at the example i have given in my other mail, conceptual interaction goes beyond the strict KVM (and hardware) ABIs themselves - for every guest side driver there's a host side piece of functionality. What we found is that there's real benefits from tools/kvm/hw/i8042.c using the same conventions and constants as drivers/input/serio/i8042.c. The two pieces of code are obviously independent ABI-wise - but to developers working on both sides of the fence it's very useful to have the code look similar and familiar. Some other pieces of code share library functionality between the kernel and tools/kvm/. Then look at the actual drivers and interfaces within tools/kvm/. It's using the same symbols and conventions for 'guest' and 'host' side. Check out tools/kvm/hw/i8042.c and match it up with include/linux/serio.h and drivers/input/serio/i8042.c - you can literally walk from one side to the other and understand how guest and host are tightly related not just functionality but also implementation wise. This is how Qemu should be doing it as well btw., to ease the debugging of host/guest interaction bugs and to ease development. No. That ties the guest and host interfaces together. [...] Why would it tie the interfaces together? For a guest and host driver to be written in the same familiar conventions is useful for debugging and development easier but does not prevent both of those pieces to implement a precise interface. (Especially since in the i8042 case the host driver is used with real hardware as well.) [...] If you change the guest to use a feature you didn't implement previously in the host, it's suddenly broken when virtualized. The right thing is to implement the hardware interface exactly per spec, and ignore the guest(s) completely except for testing and performance. You can certainly do that too - and nothing prevents from the guest and host driver to look and feel similar. But note that for many of these things there no such thing as a 'hardware spec' that describes all the details and quirks - what we have are host PC drivers that have been implemented via many cycles of trial and error. That's especially true of i8042, the example i have given. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 25.07.2011, at 12:02, Avi Kivity wrote: On 07/25/2011 12:56 PM, Alexander Graf wrote: That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? Better APIs trump better patch review. Only if you enforce them. The only sensible thing for QEMU_NEW (despite the general rule of upper case macros, I'd actually prefer this one to be lower case though since it's so often used) would be to remove qemu_malloc, declare malloc() as unusable and convert all users of qemu_malloc() to qemu_new(). Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On Mon, Jul 25, 2011 at 9:51 AM, Avi Kivity a...@redhat.com wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. Signed-off-by: Avi Kivity a...@redhat.com --- This is part of my memory API patchset, but doesn't really belong there. qemu-common.h | 3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/qemu-common.h b/qemu-common.h index ba55719..66effa3 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -186,6 +186,9 @@ void qemu_free(void *ptr); char *qemu_strdup(const char *str); char *qemu_strndup(const char *str, size_t size); +#define QEMU_NEW(type) ((type *)(qemu_malloc(sizeof(type +#define QEMU_NEWZ(type) ((type *)(qemu_mallocz(sizeof(type Does this mean we need to duplicate the type name for each allocation? struct foo *f; ... f = qemu_malloc(sizeof(*f)); Becomes: struct foo *f; ... f = QEMU_NEW(struct foo); If you ever change the name of the type you have to search-replace these instances. The idomatic C way works well, I don't see a reason to use QEMU_NEW(). Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 01:04 PM, Alexander Graf wrote: On 25.07.2011, at 12:02, Avi Kivity wrote: On 07/25/2011 12:56 PM, Alexander Graf wrote: That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? Better APIs trump better patch review. Only if you enforce them. The only sensible thing for QEMU_NEW (despite the general rule of upper case macros, I'd actually prefer this one to be lower case though since it's so often used) would be to remove qemu_malloc, declare malloc() as unusable and convert all users of qemu_malloc() to qemu_new(). Some qemu_mallocs() will remain (allocating a byte array or something variable sized). I agree qemu_new() will be nicer, but that will have to wait until Blue is several light-days away from Earth. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 01:06 PM, Stefan Hajnoczi wrote: char *qemu_strndup(const char *str, size_t size); +#define QEMU_NEW(type) ((type *)(qemu_malloc(sizeof(type +#define QEMU_NEWZ(type) ((type *)(qemu_mallocz(sizeof(type Does this mean we need to duplicate the type name for each allocation? struct foo *f; ... f = qemu_malloc(sizeof(*f)); Becomes: struct foo *f; ... f = QEMU_NEW(struct foo); If you ever change the name of the type you have to search-replace these instances. The idomatic C way works well, I don't see a reason to use QEMU_NEW(). It works as long as you don't make any mistakes: f = qemu_malloc(sizeof(*g)); f = qemu_malloc(sizeof(f)); qemu_malloc() returns a void pointer, these are poisonous. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Alexander Graf ag...@suse.de wrote: In fact one of the problems i see with Qemu is that Qemu had to make many compromises to support Windows and other weird platforms that i'm (and i'd claim most other Linux kernel developers) are personally not interested in. It's what makes it so powerful. [...] To me and Pekka that is what made Qemu unhackable. Really, i'm not sure why you are arguing here. We are not trying to merge tools/qemu/ upstream. We are trying to merge a Linux-only utility that lives in the kernel tree today and which we are actively using and developing. [...] Adding a new architecture for KVM for example is as easy as only implementing the CPU. All device emulation is already there. If you want something Linux only, lguest would've been enough, no? That's a rather bizarre argument, we were pretty happy with the design of the KVM host side, what we wanted to improve was user-space tooling. With lguest we'd have to write a new host implementation in essence ... [...] tools/kvm/ does less and in my experience does it better - is that such a surprising thing? [...] So it was a no brainer for me to pull it into -tip. The thing I don't agree with is that it should live in the kernel tree. FYI, tools/kvm/ *already* lives in the kernel tree - that is how it's developed and used and it also shares code with the kernel. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 07/25/2011 01:03 PM, Ingo Molnar wrote: Then look at the actual drivers and interfaces within tools/kvm/. It's using the same symbols and conventions for 'guest' and 'host' side. Check out tools/kvm/hw/i8042.c and match it up with include/linux/serio.h and drivers/input/serio/i8042.c - you can literally walk from one side to the other and understand how guest and host are tightly related not just functionality but also implementation wise. This is how Qemu should be doing it as well btw., to ease the debugging of host/guest interaction bugs and to ease development. No. That ties the guest and host interfaces together. [...] Why would it tie the interfaces together? For a guest and host driver to be written in the same familiar conventions is useful for debugging and development easier but does not prevent both of those pieces to implement a precise interface. (Especially since in the i8042 case the host driver is used with real hardware as well.) The driver is under no requirement to use all of the functionality of the device. On the other hand, the device emulation has to be complete, or it risks not working with some past or future version of the driver, or with another OS (if you support that). [...] If you change the guest to use a feature you didn't implement previously in the host, it's suddenly broken when virtualized. The right thing is to implement the hardware interface exactly per spec, and ignore the guest(s) completely except for testing and performance. You can certainly do that too - and nothing prevents from the guest and host driver to look and feel similar. But note that for many of these things there no such thing as a 'hardware spec' that describes all the details and quirks - what we have are host PC drivers that have been implemented via many cycles of trial and error. That's especially true of i8042, the example i have given. The device need not implement any quirks - it can implement the spec exactly. As long as the driver can drive a standard device, all is fine. Specs for the 8042 certainly exist. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 25.07.2011, at 12:09, Avi Kivity wrote: On 07/25/2011 01:04 PM, Alexander Graf wrote: On 25.07.2011, at 12:02, Avi Kivity wrote: On 07/25/2011 12:56 PM, Alexander Graf wrote: That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? Better APIs trump better patch review. Only if you enforce them. The only sensible thing for QEMU_NEW (despite the general rule of upper case macros, I'd actually prefer this one to be lower case though since it's so often used) would be to remove qemu_malloc, declare malloc() as unusable and convert all users of qemu_malloc() to qemu_new(). Some qemu_mallocs() will remain (allocating a byte array or something variable sized). Right. But then we really do need a check in checkpatch.pl to see if someone's using qemu_new for simple structs. I agree qemu_new() will be nicer, but that will have to wait until Blue is several light-days away from Earth. Blue, any disagreement on adding qemu_new() as a macro? Something used that often in upper case would seriously disturb the reading flow :) Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 12:16, Ingo Molnar wrote: So it was a no brainer for me to pull it into -tip. The thing I don't agree with is that it should live in the kernel tree. FYI, tools/kvm/ *already* lives in the kernel tree - that is how it's developed and used and it also shares code with the kernel. So you're just trying to make this a self-fulfilling prophecy now? I already stated before that I could easily add those 'requirements' to any project I liked. That still wouldn't mean it's a good idea to clutter the _kernel_ repository with its code. It's not a kernel. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Alexander Graf ag...@suse.de wrote: On 25.07.2011, at 11:41, Ingo Molnar wrote: Virtualization is very tightly bound to the kernel, like it or not. So is profiling, power management and a few other things. It's a very simple point and observation: tools which integrate to the kernel so that they wouldnt even run on another kernel obviously are very natural to develop in tools/. Ah, very good. So all we need to do to prove the point that kvm-tool doesn't belong in tools/ is port KVM to another OS and make kvm-tool compile there too? Shouldn't be too hard. People already have working ports of (old) KVM versions on FreeBSD and Windows. Firstly, tools/kvm/ itself only works on Linux and it's developed in the kernel repo and we see many benefits of that model and are happy with it. Secondly, even your argument is rather inconsistent: by your argument what keeps KVM itself within the Linux kernel, if it's so portable to FreeBSD and Windows? By your argument all the virtio drivers should be moved out of the Linux kernel tree, to support both the FreeBSD, Windows and Linux KVM implementations. By your argument the arch/x86/ KVM disassembler should move out of the kernel tree, etc. etc. You cannot have it both ways really. So yes, i disagree with you rigid views about this, in reality what matters is the quality of the end result and the preferences of the developers. Just like i cannot (and should not) force you to develop in tools/qemu/, should you not try to force me to not develop in tools/kvm/. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Avi Kivity a...@redhat.com wrote: On 07/25/2011 01:03 PM, Ingo Molnar wrote: Then look at the actual drivers and interfaces within tools/kvm/. It's using the same symbols and conventions for 'guest' and 'host' side. Check out tools/kvm/hw/i8042.c and match it up with include/linux/serio.h and drivers/input/serio/i8042.c - you can literally walk from one side to the other and understand how guest and host are tightly related not just functionality but also implementation wise. This is how Qemu should be doing it as well btw., to ease the debugging of host/guest interaction bugs and to ease development. No. That ties the guest and host interfaces together. [...] Why would it tie the interfaces together? For a guest and host driver to be written in the same familiar conventions is useful for debugging and development easier but does not prevent both of those pieces to implement a precise interface. (Especially since in the i8042 case the host driver is used with real hardware as well.) The driver is under no requirement to use all of the functionality of the device. On the other hand, the device emulation has to be complete, or it risks not working with some past or future version of the driver, or with another OS (if you support that). Sure, and nothing in tools/kvm/hw/i8042.c contradicts that and my points still remain. You can have clean, harmonic code *and* have it work to spec, agreed? Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Alexander Graf ag...@suse.de wrote: On 25.07.2011, at 12:16, Ingo Molnar wrote: So it was a no brainer for me to pull it into -tip. The thing I don't agree with is that it should live in the kernel tree. FYI, tools/kvm/ *already* lives in the kernel tree - that is how it's developed and used and it also shares code with the kernel. So you're just trying to make this a self-fulfilling prophecy now? No, i'm just trying to point out to you that it's already a fact that the tool already lives in a kernel tree and was developed there to the tune of 700+ commits within a few short months. IMO you should not be arguing: 'You should do X because that's my preference.' IMO you should be arguing: 'You should do X because technical advantage Y which is important to the project.' We have listed numerous technical advantages of why we have chosen tools/kvm/ - and you have not countered that with any convincing technical disadvantage so far. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 10:14:13AM +0200, Alexander Graf wrote: So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. It's not Linux in general yet, it's mostly a crusade of a few with a political agenda. So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I do that all the time. $ qemu-kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0 Same here. I can't be bothered with all the stuid distro booting crap. I can do that without downloading any (inevitably outdated) virtualization images or maintaining my own ones. Maintaining host userspace is more than enough for me. Who would need images? I usually only run -kernel and -initrd directly to test out things. Or if I really want to boot into a real system I do -snapshot /dev/sda. Indeed. So, since we already have the lguest tool in the kernel tree, why cannot we have the much more capable tools/kvm/ in the tree? Lguest is in Documentation/ for a reason. It's not meant as a user space tool that you take as-is and use. It's meant for documenting how lguest works in general. I admit though, that that's also the reason people don't use it :). I'd also say it's rather misplaced there, and at least in the storage area that I know most it didn't help it from totally misunderstanding kernel concepts and taking them into protocols (e.g. virtio barrier support). That for me is a reason why you don't want to couple thing too tightly, at least you'll have to document and/or explain the protocol to someone. tight And another argument, calling toyvisor2 kvm is a really bad idea. The kvm binary has been used for the kvm-patched qemu binary for quite a while in various distros, so re-using that name will cause utter confusion. I'm happy that you guys do another independent userspace for kvm, but please: a) give it a useful name b) just develop it where it belongs, your own little git repository somewhere -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On Mon, 25 Jul 2011, Alexander Graf wrote: On 25.07.2011, at 12:09, Avi Kivity wrote: On 07/25/2011 01:04 PM, Alexander Graf wrote: On 25.07.2011, at 12:02, Avi Kivity wrote: On 07/25/2011 12:56 PM, Alexander Graf wrote: That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? Better APIs trump better patch review. Only if you enforce them. The only sensible thing for QEMU_NEW (despite the general rule of upper case macros, I'd actually prefer this one to be lower case though since it's so often used) would be to remove qemu_malloc, declare malloc() as unusable and convert all users of qemu_malloc() to qemu_new(). Some qemu_mallocs() will remain (allocating a byte array or something variable sized). Right. But then we really do need a check in checkpatch.pl to see if someone's using qemu_new for simple structs. I agree qemu_new() will be nicer, but that will have to wait until Blue is several light-days away from Earth. Blue, any disagreement on adding qemu_new() as a macro? Something used that often in upper case would seriously disturb the reading flow :) So do not use it then, macros should be uppercase. -- mailto:av1...@comtv.ru -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Alexander Graf ag...@suse.de wrote: You know, they said the same thing about oprofile. All you needed to do was to write few simple shell scripts to make it work. One of the key features of tools/kvm is 'as little configuration as possible' and I can assure you that bash alias is really not a solution for that. I like perf. Really. But I still don't see why it wouldn't be able to live in its own tree either. Is the reason that the people who develop it prefer integration with the kernel tree not enough for you? perf could possibly be ported to other OSs. Maybe some day someone will try that. But unless that project actually replaces the perf project, or perf developers move out of the kernel en masse due to difficulties with the development model i don't see why the project would want to move out of the kernel tree. In fact my observations as a perf maintainer show the exact opposite: most perf developers are 100% happy to get their stuff merged and upstream ASAP. They do not buffer big patch-queues just to not have to deal with an integrated kernel tree. The integrated tree is a natural model of development to them and often perf tooling patches come mixed with kernel side patches such as new tracepoints or cleanups/fixes to related kernel code, so it's all very convenient. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH net-next] skbuff: clear tx zero-copy flag
On Mon, Jul 25, 2011 at 03:02:29AM -0700, David Miller wrote: From: Herbert Xu herb...@gondor.hengli.com.au Date: Mon, 25 Jul 2011 17:57:11 +0800 However, I think we should add a WARN_ON to the splice skb path so that should a packet find its way through a path that we haven't thought of then at least we'll know about it. Good idea. Another place like this is skb_split, I think. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
Avi Kivity a...@redhat.com writes: On 07/25/2011 01:04 PM, Alexander Graf wrote: On 25.07.2011, at 12:02, Avi Kivity wrote: On 07/25/2011 12:56 PM, Alexander Graf wrote: That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? Better APIs trump better patch review. Only if you enforce them. The only sensible thing for QEMU_NEW (despite the general rule of upper case macros, I'd actually prefer this one to be lower case though since it's so often used) would be to remove qemu_malloc, declare malloc() as unusable and convert all users of qemu_malloc() to qemu_new(). Some qemu_mallocs() will remain (allocating a byte array or something variable sized). Byte array: add the obvious type-safe allocator for a variable-sized array T[N], then use it with unsigned char for T. In fact, I find QEMU_NEW() pretty pointless without a buddy for arrays. Still not covered: allocating a struct with a variable-size array as final member. I guess a solution for that can be found if we care enough. [...] -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
Kevin Wolf kw...@redhat.com writes: Am 25.07.2011 12:06, schrieb Stefan Hajnoczi: On Mon, Jul 25, 2011 at 9:51 AM, Avi Kivity a...@redhat.com wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. Signed-off-by: Avi Kivity a...@redhat.com --- This is part of my memory API patchset, but doesn't really belong there. qemu-common.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/qemu-common.h b/qemu-common.h index ba55719..66effa3 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -186,6 +186,9 @@ void qemu_free(void *ptr); char *qemu_strdup(const char *str); char *qemu_strndup(const char *str, size_t size); +#define QEMU_NEW(type) ((type *)(qemu_malloc(sizeof(type +#define QEMU_NEWZ(type) ((type *)(qemu_mallocz(sizeof(type Does this mean we need to duplicate the type name for each allocation? struct foo *f; ... f = qemu_malloc(sizeof(*f)); Becomes: struct foo *f; ... f = QEMU_NEW(struct foo); Maybe we should allow this and make it the usual pattern: f = qemu_new(typeof(*f)); It's gcc specific, but we already don't care about portability to other compilers in more places. On the other hand, how many bugs did we have recently that were caused by a wrong sizeof for qemu_malloc? As far as I can say, there's no real reason to do it. I think it's the same kind of discussion as with forbidding qemu_malloc(0) (except that this time it just won't improve things much instead of being really stupid). Side-stepping the stupid OMG malloc(0) is weird, therefore we must make qemu_malloc(0) differently weird controversy would be useful all by itself. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Christoph Hellwig h...@infradead.org wrote: So, since we already have the lguest tool in the kernel tree, why cannot we have the much more capable tools/kvm/ in the tree? Lguest is in Documentation/ for a reason. It's not meant as a user space tool that you take as-is and use. It's meant for documenting how lguest works in general. I admit though, that that's also the reason people don't use it :). I'd also say it's rather misplaced there, and at least in the storage area that I know most it didn't help it from totally misunderstanding kernel concepts and taking them into protocols (e.g. virtio barrier support). That for me is a reason why you don't want to couple thing too tightly, at least you'll have to document and/or explain the protocol to someone. It's funny that you bring up ABIs and virtio, as in my experience it's usually this pattern that creates bad ABIs in Linux: 1) there's an out of tree user-space project, written by a group of developers, proposing some new feature for which they'd need kernel help 2) they talk to another set of Linux kernel developers, who come up with something that works - still out of tree 3) they then talk to upstream Linux and the whole review process starts. The ABI changes in some minor ways but often survives. 4) nobody is risking any ABI changes because the distance from upstream to the actual user-space project is 3 boundaries. Whatever ad-hoc ABI was proposed initially is hammered through if possible, unless it's 100% unworkable. 5) upstream often bends for pragmatic reasons, and it's at most someone at the top of the maintenance hierarchy who says 'hell NO!' and forces an (expensive!) reiteration of all 5 steps. I've also seen a couple of good ABIs, which had a very natural development cycle: 1) developers working both upstream and in a user-space project propose an ABI and quickly iterate through several versions. Both the kernel side and the user-space side changes frequently and iteratively to perfect the ABI and the whole process is highly visible on lkml and gets review at every stage of this work. Do you recognize what i'm trying to point out? While moving a user-space tool project to tools/ certainly does not *guarantee* good ABIs, it *does* guarantee friction-less iterations of ABIs - which are much harder to achieve with out of tree projects. We've done this numerous times for the perf ABI and while the ABI is far from perfect it worked very well for us - far better than it would have worked as a separate project. In such an integrated tool/kernel tree bugs are also much easier to debug: as we can cleanly bisect across interim versions of the ABI, as the tools and the kernel side ABI is developed in lock-step. We do not have to create cross-compatibility between every interim version of the ABI and any future (or past) version of the tool ... So yes, my first hand experience shows that you are 100% wrong: - 'slow-play', out of tree, modularized, to-the-specification ABIs tend to suck - 'quickly iterated', in tree, unified, specified-on-lkml ABIs tend to work out much better There's exceptions, but that's the general trend. Fact is that developing ABIs within an integrated project is *amazingly* powerful. You should try it one day, instead of criticizing it :-) Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 25.07.2011, at 12:59, Markus Armbruster wrote: Avi Kivity a...@redhat.com writes: On 07/25/2011 01:04 PM, Alexander Graf wrote: On 25.07.2011, at 12:02, Avi Kivity wrote: On 07/25/2011 12:56 PM, Alexander Graf wrote: That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I agree that it keeps you from accidently malloc'ing a struct of pointer size. But couldn't we also just add this to checkpatch.pl? Better APIs trump better patch review. Only if you enforce them. The only sensible thing for QEMU_NEW (despite the general rule of upper case macros, I'd actually prefer this one to be lower case though since it's so often used) would be to remove qemu_malloc, declare malloc() as unusable and convert all users of qemu_malloc() to qemu_new(). Some qemu_mallocs() will remain (allocating a byte array or something variable sized). Byte array: add the obvious type-safe allocator for a variable-sized array T[N], then use it with unsigned char for T. In fact, I find QEMU_NEW() pretty pointless without a buddy for arrays. #define QEMU_NEW_MULTI(type, len) ((type *)(qemu_mallocz(sizeof(type) * len))) char *arr = QEMU_NEW_MULTI(char, 1024); Still not covered: allocating a struct with a variable-size array as final member. I guess a solution for that can be found if we care enough. Yeah, but at the end of the day I'd assume most of us know C and can just open code this all, no? Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 01:08:10PM +0200, Ingo Molnar wrote: Fact is that developing ABIs within an integrated project is *amazingly* powerful. You should try it one day, instead of criticizing it :-) I've been doing this long before you declare it the rosetta stone. Some of the worst ABIs I know come from that kind of development, e.g. all the ioctls we have in xfs, inherited from the tightly integrated IRIX development cycle. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Christoph Hellwig h...@infradead.org wrote: On Mon, Jul 25, 2011 at 01:08:10PM +0200, Ingo Molnar wrote: Fact is that developing ABIs within an integrated project is *amazingly* powerful. You should try it one day, instead of criticizing it :-) I've been doing this long before you declare it the rosetta stone. Some of the worst ABIs I know come from that kind of development, e.g. all the ioctls we have in xfs, inherited from the tightly integrated IRIX development cycle. As i said it's not a guarantee, you can mess up even under the best of circumstances. Also, arguably a filesystem is constrainted wrt. ABIs: it's just one filesystem which pretty much pushes it towards ioctls which in turn is often an unstructured breeding ground for mess. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 07:24:12AM -0400, Christoph Hellwig wrote: On Mon, Jul 25, 2011 at 01:08:10PM +0200, Ingo Molnar wrote: Fact is that developing ABIs within an integrated project is *amazingly* powerful. You should try it one day, instead of criticizing it :-) I've been doing this long before you declare it the rosetta stone. Some of the worst ABIs I know come from that kind of development, e.g. all the ioctls we have in xfs, inherited from the tightly integrated IRIX development cycle. You need someone with taste in the loop. But if you do, evolved is always better than designed before you actually know what you need. As I'm sure you perfectly know, for the matter. OG. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Introduce QEMU_NEW()
On 07/25/2011 12:32 PM, Alexander Graf wrote: On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. I'm using it as MemoryRegion *phys_flash = QEMU_NEW(MemoryRegion); instead of MemoryRegion *phys_flash = qemu_malloc(sizeof(*phys_flash)); I find it shorter, and if I make a mistake, the compiler shouts at me instead of a runtime crash. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 11:03:32AM +0200, Alexander Graf wrote: It's very hard to understand. It's similar to religion - I could easily apply your point to every reasonably low-level user space project out there. X for example. X needs to interact with KMS and DRI and whatdoiknow. So it'd be a perfect fit to pull into tools/, no? The current development methodology of X11/mesa/v*api is catastrophic, as the state of the linux graphics attest. Having a functional equivalent in tools would be an immense step forward, if simply by having all the damn related code in the same place, but also because of the interactions you cite. OG. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 01:34:25PM +0200, Olivier Galibert wrote: You need someone with taste in the loop. But if you do, evolved is always better than designed before you actually know what you need. As I'm sure you perfectly know, for the matter. Neither is actually helpful. You reall want reference implementation on both sides of an ABI, _and_ documentation. And yes, usually you'll need a few iterations over it. In theory you can do that in a tightly integrated enviroment as well, but practice shows simply boilds down to commiting the bloody thing. I'm also not sure why we even bother to focus with this side-line discussion. It's not like the kvm (as in the kernel kvm module) developers have written the kvm tools. It's just another userspace for the kvm userspace (the fifth if I count correctly), and so far the reference and often only implementation of any new kvm module feature is for qemu-kvm. So no matter where kvm tools lives, if you guys one day actually start doing major kvm core features it will still evolve discussing them with the main consumer of those interfaces. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 02:02 PM, Markus Armbruster wrote: Side-stepping the stupid OMG malloc(0) is weird, therefore we must make qemu_malloc(0) differently weird controversy would be useful all by itself. If we all work together, we can make this thread even bigger than the tools/kvm pull request. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
* Christoph Hellwig h...@infradead.org wrote: On Mon, Jul 25, 2011 at 10:14:13AM +0200, Alexander Graf wrote: So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. It's not Linux in general yet, it's mostly a crusade of a few with a political agenda. I find it weird (and insulting) that you are calling our patches that implement better tolling a 'crusade' done under a 'political agenda'. I'd rather say that the extremist position is yours: to keep everything separated always, all the time and ban tools from the kernel repo. You have declared it a failure from the very start and if it were up to you it would never get a fair chance. Thanks, Ingo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] ioeventfd: Introduce KVM_IOEVENTFD_FLAG_SOCKET
On Wed, 2011-07-13 at 10:07 +0300, Avi Kivity wrote: Second, introducing a new type of exit doesn't mean we see more exits. Third, the new type of exit is probably not needed - we can use the existing mmio/pio exits, and document that in certain cases the kernel will fall back to these instead of delivering the I/O via the sockets (userspace can then try writing itself). Just waking this up since I want to send a new version and just want to cover some things before that. The problem with the original implementation was that if we receive a signal while we wait for the host to provide a value to be read, we must abort the operation and exit to do the signal. What this caused was that read operations with side effects would break (for example, when reading a byte would change the value in that byte). The original plan was to notify the host that we ignored his answer via the socket, and it should provide the response again via regular MMIO exit, but I couldn't find a good way to pass it through the MMIO exit. Also, This would complicate this operation on the host quite a bit. What I did instead was to assume that if the socket write notifying the host of a read operation went through ok, we can block on the socket read request. Does it sound ok? I know it's not what was originally planned, but to me it looked like the most efficient approach. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 03:51 AM, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. Just use g_new() and g_new0() Regards, Anthony Liguori Signed-off-by: Avi Kivitya...@redhat.com --- This is part of my memory API patchset, but doesn't really belong there. qemu-common.h |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/qemu-common.h b/qemu-common.h index ba55719..66effa3 100644 --- a/qemu-common.h +++ b/qemu-common.h @@ -186,6 +186,9 @@ void qemu_free(void *ptr); char *qemu_strdup(const char *str); char *qemu_strndup(const char *str, size_t size); +#define QEMU_NEW(type) ((type *)(qemu_malloc(sizeof(type +#define QEMU_NEWZ(type) ((type *)(qemu_mallocz(sizeof(type + void qemu_mutex_lock_iothread(void); void qemu_mutex_unlock_iothread(void); -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] ioeventfd: Introduce KVM_IOEVENTFD_FLAG_SOCKET
On 07/25/2011 03:10 PM, Sasha Levin wrote: On Wed, 2011-07-13 at 10:07 +0300, Avi Kivity wrote: Second, introducing a new type of exit doesn't mean we see more exits. Third, the new type of exit is probably not needed - we can use the existing mmio/pio exits, and document that in certain cases the kernel will fall back to these instead of delivering the I/O via the sockets (userspace can then try writing itself). Just waking this up since I want to send a new version and just want to cover some things before that. The problem with the original implementation was that if we receive a signal while we wait for the host to provide a value to be read, we must abort the operation and exit to do the signal. What this caused was that read operations with side effects would break (for example, when reading a byte would change the value in that byte). The original plan was to notify the host that we ignored his answer via the socket, and it should provide the response again via regular MMIO exit, but I couldn't find a good way to pass it through the MMIO exit. Also, This would complicate this operation on the host quite a bit. What I did instead was to assume that if the socket write notifying the host of a read operation went through ok, we can block on the socket read request. Does it sound ok? I know it's not what was originally planned, but to me it looked like the most efficient approach. You can't block when a signal is pending. You can block, however, after you've exited with -EINTR and re-entered. We need to document that if a vcpu exited with -EINTR, then any socket memory transactions need to be flushed before the vcpu's state can be considered stable (for live migration). In fact it's true for any kind of exit. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 03:11 PM, Anthony Liguori wrote: On 07/25/2011 03:51 AM, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. Just use g_new() and g_new0() These bypass qemu_malloc(). Are we okay with that? I suppose so, since many library functions can allocate memory and bypass qemu_malloc()? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 06:11 AM, Alexander Graf wrote: #define QEMU_NEW_MULTI(type, len) ((type *)(qemu_mallocz(sizeof(type) * len))) char *arr = QEMU_NEW_MULTI(char, 1024); Still not covered: allocating a struct with a variable-size array as final member. I guess a solution for that can be found if we care enough. Yeah, but at the end of the day I'd assume most of us know C and can just open code this all, no? While it's always fun to reinvent things, glib has already solved all of this and we're already dependent on it in the build: http://developer.gnome.org/glib/stable/glib-Memory-Allocation.html It also has fancy ways to hook memory allocation for debugging. Regards, Anthony Liguori Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 07:18 AM, Avi Kivity wrote: On 07/25/2011 03:11 PM, Anthony Liguori wrote: On 07/25/2011 03:51 AM, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. Just use g_new() and g_new0() These bypass qemu_malloc(). Are we okay with that? Yes. We can just make qemu_malloc use g_malloc. I suppose so, since many library functions can allocate memory and bypass qemu_malloc()? Right. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Am 25.07.2011 10:30, schrieb Pekka Enberg: Hi Alexander, On Mon, Jul 25, 2011 at 11:14 AM, Alexander Graf ag...@suse.de wrote: So i wanted to have a lightweight tool that allows me to test KVM and tools/kvm/ does that very nicely: i type './kvm run' and i can test a native bzImage (which has some virtualization options enabled as well) on the _host_ distro i am running, booting to a text shell prompt. I do that all the time. $ qemu-kvm -nographic -kernel arch/x86/boot/bzImage -append console=ttyS0 does the exact same thing. If that's too much typing for you, make it a bash alias. You know, they said the same thing about oprofile. All you needed to do was to write few simple shell scripts to make it work. One of the key features of tools/kvm is 'as little configuration as possible' and I can assure you that bash alias is really not a solution for that. You've just chosen a different default. I'd argue that most users (i.e. not developers of the tool or the kernel) actually want to run with a disk image and graphics. You can type qemu-kvm harddisk.img and that's it. This is clearly superior to something as tedious as ./kvm run -d harddisk.img --sdl and I can assure you that a bash alias is really not a solution for that. So, as always, which set of command line switches works better for you depends entirely on your use case. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v1 1/1] Submit the codes for QEMU disk I/O limits.
On Mon, Jul 25, 2011 at 8:08 AM, Zhi Yong Wu zwu.ker...@gmail.com wrote: On Fri, Jul 22, 2011 at 6:54 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Fri, Jul 22, 2011 at 10:20 AM, Zhi Yong Wu wu...@linux.vnet.ibm.com wrote: + elapsed_time = (real_time - bs-slice_start[is_write]) / 10.0; + fprintf(stderr, real_time = %ld, slice_start = %ld, elapsed_time = %g\n, real_time, bs-slice_start[is_write], elapsed_time); + + bytes_limit = bps_limit * slice_time; + bytes_disp = bs-io_disps-bytes[is_write]; + if (bs-io_limits-bps[2]) { + bytes_disp += bs-io_disps-bytes[!is_write]; + } + + bytes_res = (unsigned) nb_sectors * BDRV_SECTOR_SIZE; + fprintf(stderr, bytes_limit = %g bytes_disp = %g, bytes_res = %g, elapsed_time = %g\n, bytes_limit, bytes_disp, bytes_res, elapsed_time); + + if (bytes_disp + bytes_res = bytes_limit) { + if (wait) { + *wait = 0; + } + + fprintf(stderr, bytes_disp + bytes_res = bytes_limit\n); + return false; + } + + /* Calc approx time to dispatch */ + wait_time = (bytes_disp + bytes_res - bytes_limit) / bps_limit; + if (!wait_time) { + wait_time = 1; + } + + wait_time = wait_time + (slice_time - elapsed_time); + if (wait) { + *wait = wait_time * 10 + 1; + } + + return true; +} After a slice expires all bytes/iops dispatched data is forgotten, even if there are still requests queued. This means that requests issued by the guest immediately after a new 100 ms period will be issued but existing queued requests will still be waiting. And since the queued requests don't get their next chance until later, it's possible that they will be requeued because the requests that the guest snuck in have brought us to the limit again. In order to solve this problem, we need to extend the current slice if Extend the current slice? like in-kernel block throttling algorithm? Our algorithm seems not to adopt it currently. I'm not sure if extending the slice is necessary as long as new requests are queued while previous requests are still queued. But extending slices is one way to deal with requests that span across multiple slices. See below. there are still requests pending. To prevent extensions from growing the slice forever (and keeping too much old data around), it should be alright to cull data from 2 slices ago. The simplest way of doing that is to subtract the bps/iops limits from the bytes/iops dispatched. You mean that the largest value of current_slice_time is not more than 2 slice_time? Yes. If no single request is larger than the I/O limit, then the timer value for a queued request should always be within the next slice. Therefore everything before last slice should be completed already and we don't need to keep that history around. @@ -2129,6 +2341,19 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num, if (bdrv_check_request(bs, sector_num, nb_sectors)) return NULL; + /* throttling disk read I/O */ + if (bs-io_limits != NULL) { + if (bdrv_exceed_io_limits(bs, nb_sectors, false, wait_time)) { + ret = qemu_block_queue_enqueue(bs-block_queue, bs, bdrv_aio_readv, + sector_num, qiov, nb_sectors, cb, opaque); 5 space indent, should be 4. + fprintf(stderr, bdrv_aio_readv: wait_time = %ld, timer value = %ld\n, wait_time, wait_time + qemu_get_clock_ns(vm_clock)); + qemu_mod_timer(bs-block_timer, wait_time + qemu_get_clock_ns(vm_clock)); Imagine 3 requests that need to be queued: A, B, and C. Since the timer is unconditionally set each time a request is queued, the timer will be set to C's wait_time. However A or B's wait_time might be earlier and we will miss that deadline! Yeah, exactly there is this issue. We really need a priority queue here. QEMU's timers solve the same problem with a sorted list, which might actually be faster for short lists where a fancy data structure has too much overhead. You mean the block requests should be handled in FIFO way in order? If the block queue is not empty, should this coming request be enqueued at first? right? Yes. If the limit was previously exceeded, enqueue new requests immediately: /* If a limit was exceeded, immediately queue this request */ if (!QTAILQ_EMPTY(queue-requests)) { if (limits-bps[IO_LIMIT_TOTAL]) { /* queue any rd/wr request */ } else if (limits-bps[is_write] another_request_is_queued[is_write]) { /* queue if the rd/wr-specific limit was previously exceeded */ } ...same for iops... } This way new requests cannot skip ahead of queued requests due to the lost history when a new slice starts. +BlockDriverAIOCB *qemu_block_queue_enqueue(BlockQueue *queue, + BlockDriverState *bs, + BlockRequestHandler
Re: [PATCH 5/5] ioeventfd: Introduce KVM_IOEVENTFD_FLAG_SOCKET
On Mon, 2011-07-25 at 15:16 +0300, Avi Kivity wrote: On 07/25/2011 03:10 PM, Sasha Levin wrote: On Wed, 2011-07-13 at 10:07 +0300, Avi Kivity wrote: Second, introducing a new type of exit doesn't mean we see more exits. Third, the new type of exit is probably not needed - we can use the existing mmio/pio exits, and document that in certain cases the kernel will fall back to these instead of delivering the I/O via the sockets (userspace can then try writing itself). Just waking this up since I want to send a new version and just want to cover some things before that. The problem with the original implementation was that if we receive a signal while we wait for the host to provide a value to be read, we must abort the operation and exit to do the signal. What this caused was that read operations with side effects would break (for example, when reading a byte would change the value in that byte). The original plan was to notify the host that we ignored his answer via the socket, and it should provide the response again via regular MMIO exit, but I couldn't find a good way to pass it through the MMIO exit. Also, This would complicate this operation on the host quite a bit. What I did instead was to assume that if the socket write notifying the host of a read operation went through ok, we can block on the socket read request. Does it sound ok? I know it's not what was originally planned, but to me it looked like the most efficient approach. You can't block when a signal is pending. You can block, however, after you've exited with -EINTR and re-entered. What would happen with the MMIO then? I need to provide an answer before I leave the read/write functions to exit with -EINTR, no? We need to document that if a vcpu exited with -EINTR, then any socket memory transactions need to be flushed before the vcpu's state can be considered stable (for live migration). In fact it's true for any kind of exit. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 04:52 AM, Avi Kivity wrote: On 07/25/2011 12:48 PM, Peter Maydell wrote: On 25 July 2011 10:32, Alexander Grafag...@suse.de wrote: On 25.07.2011, at 10:51, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. What does this buy you over type *x = qemu_malloc(sizeof(type)); ? I find the non-C++ version easier to read even. Yeah, while we're writing in C we should just stick to the C-like APIs, it's less confusing IMHO than wrapping it up in something else. That argument can be used to block any change. You'll get used to it in time. The question is, is the new interface better or not. I assume Anthony's new object model stuff will have a create me a new foo object API anyway, so QEMU_NEW() is possibly a bit of a namespace grab. Anthony's stuff is at a much higher level, hopefully he'll come back to the ground one day. The point of introducing glib was to address things like this. We need to start making heavier use of what it provides. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Christoph, On Mon, 2011-07-25 at 06:38 -0400, Christoph Hellwig wrote: On Mon, Jul 25, 2011 at 10:14:13AM +0200, Alexander Graf wrote: So instead of thinking a bit and trying to realize that there might be a reason people don't want all their user space in the kernel tree you go ahead and start your own crusade of creating a new user space. Great. That's how I always hoped Linux would be :(. It's not Linux in general yet, it's mostly a crusade of a few with a political agenda. Would you mind sharing with the rest of us who exactly those few are and what is their political agenda? I certainly don't recognize myself or any of the development team from your description above. And to be completely honest, I think that's one of the single most stupid comments I've read in this thread. On Mon, 2011-07-25 at 06:38 -0400, Christoph Hellwig wrote: And another argument, calling toyvisor2 kvm is a really bad idea. The kvm binary has been used for the kvm-patched qemu binary for quite a while in various distros, so re-using that name will cause utter confusion. I don't foresee 'utter confusion'. The code definitely wants to live in tools/kvm so it's logical that the executable name is called 'kvm'. If distros want to package it with a different name, they're free to do so. I'm happy that you guys do another independent userspace for kvm, but please: a) give it a useful name I don't think toyvisor2 will do, really. I originally wanted to call the hypervisor 'pvm' for selfish reasons because quite frankly 'penix' just doesn't have the prestige 'linux' has. However, Ingo convinced me that 'kvm' is a good name and I agree with that. I'm open to alternative suggestions, though. b) just develop it where it belongs, your own little git repository somewhere I already explained why we want it in the kernel tree and what kind of benefits we think there are. I also explained that we want to reuse and share code with perf, for example. If you really want to argue against merging, you need to cough up some actual disadvantages or refute the suggested advantages. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH] Introduce QEMU_NEW()
On 07/25/2011 03:21 PM, Anthony Liguori wrote: On 07/25/2011 07:18 AM, Avi Kivity wrote: On 07/25/2011 03:11 PM, Anthony Liguori wrote: On 07/25/2011 03:51 AM, Avi Kivity wrote: qemu_malloc() is type-unsafe as it returns a void pointer. Introduce QEMU_NEW() (and QEMU_NEWZ()), which return the correct type. Just use g_new() and g_new0() These bypass qemu_malloc(). Are we okay with that? Yes. We can just make qemu_malloc use g_malloc. Excellent. Patch withdrawn. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
Hi Kevin, On Mon, 2011-07-25 at 14:24 +0200, Kevin Wolf wrote: You've just chosen a different default. I'd argue that most users (i.e. not developers of the tool or the kernel) actually want to run with a disk image and graphics. You can type qemu-kvm harddisk.img and that's it. This is clearly superior to something as tedious as ./kvm run -d harddisk.img --sdl and I can assure you that a bash alias is really not a solution for that. I actually agree with you and we eventually want kvm run harddisk.img to do the right thing. On Mon, 2011-07-25 at 14:24 +0200, Kevin Wolf wrote: So, as always, which set of command line switches works better for you depends entirely on your use case. I actually don't agree. I think Qemu requires way too much configuration from the user and doesn't try hard enough to provide best possible defaults. Dunno how much virt-manager changes all that, though. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, Jul 25, 2011 at 12:06 PM, Alexander Graf ag...@suse.de wrote: And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. Then kvm-tool is about as useful as Mac-on-Linux. Why don't we have MoL user land in the kernel? I even added support for KVM to it about a year ago. So all I need to do is change it to the kernel coding style, add some dependencies on kernel headers and I'm good for a pull request? Oh, I dunno - have you tried that? If you're interested in sending a patch against tools/kvm that adds KVM-based support to Mac, I'm happy to review it and consider for inclusion. ;-) Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 07/25/2011 03:41 PM, Pekka Enberg wrote: On Mon, 2011-07-25 at 14:24 +0200, Kevin Wolf wrote: So, as always, which set of command line switches works better for you depends entirely on your use case. I actually don't agree. I think Qemu requires way too much configuration from the user and doesn't try hard enough to provide best possible defaults. qemu has a huge wealth of options so the command line is incredibly complicated. Changing defaults is hard due to backward compatibility constraints. Dunno how much virt-manager changes all that, though. virt-manager is a GUI interface, no command line at all. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 07/25/2011 03:45 PM, Pekka Enberg wrote: On Mon, Jul 25, 2011 at 12:06 PM, Alexander Grafag...@suse.de wrote: And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. Then kvm-tool is about as useful as Mac-on-Linux. Why don't we have MoL user land in the kernel? I even added support for KVM to it about a year ago. So all I need to do is change it to the kernel coding style, add some dependencies on kernel headers and I'm good for a pull request? Oh, I dunno - have you tried that? If you're interested in sending a patch against tools/kvm that adds KVM-based support to Mac, I'm happy to review it and consider for inclusion. ;-) The patch will be a lot bigger than tools/kvm itself. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Fix svm unit test
Since the -enable-nesting option is no longer available remove the option from the config file. Signed-off-by: Conny Seidel conny.sei...@amd.com --- x86/unittests.cfg |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/x86/unittests.cfg b/x86/unittests.cfg index 228ac1d..065020a 100644 --- a/x86/unittests.cfg +++ b/x86/unittests.cfg @@ -57,7 +57,7 @@ file = rmap_chain.flat [svm] file = svm.flat smp = 2 -extra_params = -enable-nesting -cpu qemu64,+svm +extra_params = -cpu qemu64,+svm [svm-disabled] file = svm.flat -- 1.7.2.3 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 14:47, Avi Kivity wrote: On 07/25/2011 03:45 PM, Pekka Enberg wrote: On Mon, Jul 25, 2011 at 12:06 PM, Alexander Grafag...@suse.de wrote: And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. Then kvm-tool is about as useful as Mac-on-Linux. Why don't we have MoL user land in the kernel? I even added support for KVM to it about a year ago. So all I need to do is change it to the kernel coding style, add some dependencies on kernel headers and I'm good for a pull request? Oh, I dunno - have you tried that? If you're interested in sending a patch against tools/kvm that adds KVM-based support to Mac, I'm happy to review it and consider for inclusion. ;-) The patch will be a lot bigger than tools/kvm itself. And it will be full of ugly hacks and code that needs to be compiled on a Mac. I was actually trying to pick an example of a project that you really don't want to have in the kernel tree. One point stays true however with MoL. I added support for KVM a year ago and haven't had to touch it since. It just works. Because we have a pretty stable kernel/user space interface. So the whole thing about having user space and kernel code live together in the same tree is moot there. And if you need to share code with perf, then that just proves the point that perf doesn't belong inside the kernel tree either. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On Mon, 2011-07-25 at 15:47 +0300, Avi Kivity wrote: On 07/25/2011 03:45 PM, Pekka Enberg wrote: On Mon, Jul 25, 2011 at 12:06 PM, Alexander Grafag...@suse.de wrote: And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. Then kvm-tool is about as useful as Mac-on-Linux. Why don't we have MoL user land in the kernel? I even added support for KVM to it about a year ago. So all I need to do is change it to the kernel coding style, add some dependencies on kernel headers and I'm good for a pull request? Oh, I dunno - have you tried that? If you're interested in sending a patch against tools/kvm that adds KVM-based support to Mac, I'm happy to review it and consider for inclusion. ;-) The patch will be a lot bigger than tools/kvm itself. But it would be as a single patch series which includes kvm and tools/kvm :) -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 25.07.2011, at 14:51, Sasha Levin wrote: On Mon, 2011-07-25 at 15:47 +0300, Avi Kivity wrote: On 07/25/2011 03:45 PM, Pekka Enberg wrote: On Mon, Jul 25, 2011 at 12:06 PM, Alexander Grafag...@suse.de wrote: And don't get this the wrong way either, I'm not hostile against other operating systems, but I simply am not interested enough in them to spend my time improving them. Then kvm-tool is about as useful as Mac-on-Linux. Why don't we have MoL user land in the kernel? I even added support for KVM to it about a year ago. So all I need to do is change it to the kernel coding style, add some dependencies on kernel headers and I'm good for a pull request? Oh, I dunno - have you tried that? If you're interested in sending a patch against tools/kvm that adds KVM-based support to Mac, I'm happy to review it and consider for inclusion. ;-) The patch will be a lot bigger than tools/kvm itself. But it would be as a single patch series which includes kvm and tools/kvm :) Not sure I get what you mean here. We already have KVM support for Macs ;). And pSeries. And the PS3. It even works on the Wii. Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [GIT PULL] Native Linux KVM tool for 3.1
On 07/25/2011 09:50 AM, Sasha Levin wrote: Anthony had a talk on last years KVM forum regarding the QEMU threading model (slide: http://www.linux-kvm.org/wiki/images/7/70/2010-forum-threading-qemu.pdf) . It was suggested that the KVM part of QEMU is having a hard time achieving the ideal threading model due to its need to support TCG - something which has nothing to do with KVM itself. No, it is not having a hard time. The foot in the door that Anthony mentions has been part of QEMU and qemu-kvm for a long time (multi-threading is necessary to support SMP!) and works quite well. Historically, there were three main loops: 1) QEMU single-threaded; 2) QEMU multi-threaded; clean, but buggy, untested and bitrotting; 3) qemu-kvm multi-threaded, forked from (1), ugly but robust and widely deployed. In 0.15 the two multi-threaded versions have been unified by Jan Kiszka. I have even ported (2) to Windows with little or no pain; porting to Mac OS X interestingly is harder than Windows, because non-portable Linux assumptions about signal handling have crept in the code (to preempt the objections: they weren't just non-portabilities, they were latent bugs). Windows just does not have signals. :) So, right now, the only difference is that QEMU is still defaulting to the single-threaded main loop, while qemu-kvm enables multi-threading by default. In some time even QEMU will switch. Yes, this is of course worse than getting it right in the first place; Nobody is saying the opposite. Paolo -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] ioeventfd: Introduce KVM_IOEVENTFD_FLAG_SOCKET
On 07/25/2011 03:26 PM, Sasha Levin wrote: You can't block when a signal is pending. You can block, however, after you've exited with -EINTR and re-entered. What would happen with the MMIO then? I need to provide an answer before I leave the read/write functions to exit with -EINTR, no? If you exit, no result is needed until you reenter. You just have to remember that on the next KVM_RUN, instead of running the vcpu, you have to talk to the socket (either write or read, depending on where the signal got you). -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html