Re: Interrupt Assignment on host
On 11/05/2009 09:09 PM, Erik Rull wrote: Can you describe how performance suffers? Please provide vmstat (for host int/sec) and kvm_stat -l -f 'exits|irq_exits' output. I'm sorry, but my target system has no python installed. Any ideas how to do that manually? Try the attached bash script. You'll want debugfs mounted on /sys/kernel/debug. -- error compiling committee.c: too many arguments to function

#!/bin/bash
# Print per-second deltas of total exits and irq exits from kvm debugfs.
p=/sys/kernel/debug/kvm
old_exits=0
old_irq_exits=0
while :; do
    exits=$(cat $p/exits)
    irq_exits=$(cat $p/irq_exits)
    d_exits=$((exits - old_exits))
    d_irq_exits=$((irq_exits - old_irq_exits))
    printf '%10d %10d\n' "$d_exits" "$d_irq_exits"
    sleep 1
    old_exits=$exits
    old_irq_exits=$irq_exits
done
Re: CPU change causes hanging of .NET apps
On 11/05/2009 09:03 PM, Erik Rull wrote: qemu -cpu host,-flag I tried that value; the result is an error: Unable to find x86 CPU definition. Please use qemu-kvm.git for this test. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: soft lockup after live migration
Marcelo Tosatti wrote: Tomasz, The screenshots seem to indicate a paravirt mmu problem. Try to patch the x86.c file from the kvm kernel module with:

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
-	case KVM_CAP_PV_MMU:
-		r = !tdp_enabled;
+	case KVM_CAP_PV_MMU:	/* obsolete */
+		r = 0;

You'll probably have to do it manually (this disables pvmmu). With this, some guests fail to start with a kernel panic; some have soft lockups all the time. Some don't start at all. And generally, everything is dead slow. -- Tomasz Chmielewski http://wpkg.org
Re: [PATCH 02/11] Add handle page fault PV helper.
* Gleb Natapov g...@redhat.com wrote: On Mon, Nov 02, 2009 at 05:29:41PM +0100, Ingo Molnar wrote: * Gleb Natapov g...@redhat.com wrote: On Mon, Nov 02, 2009 at 05:12:48PM +0100, Ingo Molnar wrote: * Gleb Natapov g...@redhat.com wrote: On Mon, Nov 02, 2009 at 10:22:14AM +0100, Ingo Molnar wrote: * Gleb Natapov g...@redhat.com wrote:

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index f4cee90..14707dc 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -952,6 +952,9 @@ do_page_fault(struct pt_regs *regs, unsigned long error_code)
 	int write;
 	int fault;
 
+	if (arch_handle_page_fault(regs, error_code))
+		return;
+

This patch is not acceptable unless it's done cleaner. Currently we already have 3 callbacks in do_page_fault() (kmemcheck, mmiotrace, notifier), and this adds a fourth one. Please consolidate them into a single callback site; this is a hotpath on x86. This call is patched out by the paravirt patching mechanism, so overhead should be zero for non-paravirt cases. [...] arch_handle_page_fault() isn't upstream yet - precisely what is the instruction sequence injected into do_page_fault() in the patched-out case? It is introduced by the same patch. The instruction inserted is: xor %rax, %rax ok. My observations still stand: [...] What do you want to achieve by consolidating them into a single callback? [...] Less bloat in a hotpath and a shared callback infrastructure. [...] I mean the code will still exist and will have to be executed on every #PF. Is the goal to move them out of line? The goal is to have a single callback site for all the users - which call-site is ideally patched out - on non-paravirt too if needed. Most of these callbacks/notifier-chains are inactive most of the time. I.e. a very low overhead 'conditional callback' facility, and a single one - not just lots of them sprinkled around the code. Looks like a golden opportunity to get this right. Three existing callbacks are: kmemcheck, mmiotrace, notifier.
Two of them, kmemcheck and mmiotrace, are enabled only for debugging, so they should not be a performance concern. And the notifier call sites (two of them) are deliberately, as explained by a comment, not at the function entry, so they can't be unified with the others. (And kmemcheck also has two different call sites, BTW.) We want mmiotrace to be generic-distro capable, so the overhead when the hook is not used is of concern. Ingo
Re: [PATCHv8 3/3] vhost_net: a kernel-level virtio server
On Fri, Nov 06, 2009 at 03:29:17PM +1030, Rusty Russell wrote: On Thu, 5 Nov 2009 02:27:24 am Michael S. Tsirkin wrote: What it is: vhost net is a character device that can be used to reduce the number of system calls involved in virtio networking. Hi Michael, Now that everyone else has finally kicked all the tires and it seems to pass, I've done a fairly complete review. Generally, it's really nice; just one bug and a few minor suggestions for polishing. Thanks for the review! I'll add more polishing and repost. Answers to some questions below.

+/* Caller must have TX VQ lock */
+static void tx_poll_stop(struct vhost_net *net)
+{
+	if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED))
+		return;

likely? Really? Hmm ... yes. tx_poll_stop is called on each packet (as long as we do not fill up 1/2 of the backend queue); the first call stops polling, and the rest check the state and do nothing. This is because we normally do not care when the message has left the queue in the backend device: we tell the backend to send it and forget. We only start polling when the backend tx queue fills up. Makes sense?

+	for (;;) {
+		head = vhost_get_vq_desc(&net->dev, vq, vq->iov, &out, &in,
+					 NULL, NULL);

Danger! You need an arg to vhost_get_vq_desc to tell it the max desc size you can handle. Otherwise, it's only limited by ring size, and a malicious guest can overflow you here, and below: In fact, I think this is not a bug. This happens to work correctly (even with malicious guests) because vhost_get_vq_desc is hard-coded to check VHOST_NET_MAX_SG, so in fact no overflow is possible. I agree that it's much nicer to pass the iovec size to vhost_get_vq_desc.

+	/* Skip header. TODO: support TSO. */
+	s = move_iovec_hdr(vq->iov, vq->hdr, hdr_size, out);
...
+	use_mm(net->dev.mm);
+	mutex_lock(&vq->mutex);
+	vhost_no_notify(vq);

I prefer a name like vhost_disable_notify(). Good idea.

+	/* OK, now we need to know about added descriptors.
+	 */
+	if (head == vq->num && vhost_notify(vq))
+		/* They could have slipped one in as we were doing that:
+		 * check again. */
+		continue;
+	/* Nothing new? Wait for eventfd to tell us they refilled. */
+	if (head == vq->num)
+		break;
+	/* We don't need to be notified again. */
+	vhost_no_notify(vq);

Similarly, vhost_enable_notify. This one is particularly misleading since it doesn't actually notify anything! Good idea. In particular, this code would be neater as:

	if (head == vq->num) {
		if (vhost_enable_notify(vq)) {
			/* Try again, they could have slipped one in. */
			continue;
		}
		/* Nothing more to do. */
		break;
	}
	vhost_disable_notify(vq);

Now, AFAICT vhost_notify()/enable_notify() would be better rewritten to return true only when there's more pending. Saves a loop around here most of the time. Okay, I'll look into this. It kind of annoys me that we would do get_user for the same value twice: once in vhost_enable_notify and once in vhost_get_vq_desc. OTOH, all the loop does is call vhost_get_vq_desc again. Also, the vhost_no_notify/vhost_disable_notify() can be moved out of the loop entirely. I don't think it can: if we enabled notification and then see more descriptors in the queue, we want to disable notification again. But it can be (it could be under an if (unlikely(enabled)); not sure if it's worth it):

	if (unlikely(vhost_enable_notify(vq))) {
		/* Try again, they have slipped one in. */
		vhost_disable_notify(vq);
		continue;
	}

+	len = err;
+	err = memcpy_toiovec(vq->hdr, (unsigned char *)&hdr, hdr_size);

That unsigned char * arg to memcpy_toiovec is annoying. A patch might be nice, separate from this effort. Sounds good.

+static int vhost_net_open(struct inode *inode, struct file *f)
+{
+	struct vhost_net *n = kzalloc(sizeof *n, GFP_KERNEL);
+	int r;
+	if (!n)
+		return -ENOMEM;
+	f->private_data = n;
+	n->vqs[VHOST_NET_VQ_TX].handle_kick = handle_tx_kick;
+	n->vqs[VHOST_NET_VQ_RX].handle_kick = handle_rx_kick;

I have a personal dislike of calloc for structures. You mean zalloc?
In userspace, it's because valgrind can't spot uninitialized fields. These days a similar argument applies in the kernel, because we have KMEMCHECK now. If someone adds a field to the struct and forgets to initialize it, we can spot it. OK.

+static void vhost_net_enable_vq(struct vhost_net *n, int index)
+{
+	struct socket *sock = n->vqs[index].private_data;

OK, I can't help but think that presenting the vqs as an
Re: xp guest, blue screen c0000221 on boot
On 11/04/2009 07:06 PM, Andrew Olney wrote: Host is ext3 Linux monkamu 2.6.31-14-generic #48-Ubuntu SMP Fri Oct 16 14:04:26 UTC 2009 i686 GNU/Linux QEMU PC emulator version 0.11.0 (qemu-kvm-0.11.0), Copyright (c) 2003-2008 Fabrice Bellard If you convert a qcow image back to raw on ext3, does it fail? Similarly, converting it into an lvm volume, works or not? -- error compiling committee.c: too many arguments to function
Re: [PATCH 02/11] Add handle page fault PV helper.
On 11/08/2009 01:36 PM, Ingo Molnar wrote: Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two of them, kmemcheck and mmiotrace, are enabled only for debugging, so they should not be a performance concern. And the notifier call sites (two of them) are deliberately, as explained by a comment, not at the function entry, so they can't be unified with the others. (And kmemcheck also has two different call sites, BTW.) We want mmiotrace to be generic-distro capable, so the overhead when the hook is not used is of concern. Maybe we should generalize paravirt-ops patching in case if (x) f() is deemed too expensive. -- error compiling committee.c: too many arguments to function
Re: Recording interrupted instruction on ioapic_set_irq
On 11/07/2009 10:13 PM, Kurt Kiefer wrote: Hi guys, I'm trying to record the instruction pointer at the exact time a guest was delivered an IOAPIC interrupt. Take for example a PS2 keyboard press. Clearly, when I read IP during the subsequent exit for IO_INSTRUCTION I'm just recording the IP of io_read in the handler, and not the IP at actual interrupt delivery. Maybe I'm missing something fundamental. It doesn't look like exits for EXTERNAL_INTERRUPT (shouldn't it?) or INTERRUPT_WINDOW correspond one-to-one with delivery of these PS2 interrupts. Just setting request_interrupt_window for these IRQs didn't give me an INTERRUPT_WINDOW for each key. I guess since the guest doesn't usually have interrupts masked when I press a key, delivery won't wait for the window. Could I record during delivery? I figure I could look at the stack during the IO_INSTRUCTION exit and figure out what instruction was actually interrupted, but this would be a Linux-specific solution. Any other ideas? I think even a simple description of how these interrupts are being delivered to the guest would help me out a lot. This is all available now in 2.6.32 or later with the new trace infrastructure. If you enable ftrace and echo kvm > /sys/kernel/tracing/set_event, you should get a trace of all interrupt injections. Of course, you need to figure out which vector is associated with your irq; you can even have the trace infrastructure filter this for you. Note you can only get the ip at the time the interrupt is delivered, rather than the time the irq is asserted on the ioapic pin. -- error compiling committee.c: too many arguments to function
Re: [PATCH 02/11] Add handle page fault PV helper.
* Avi Kivity a...@redhat.com wrote: On 11/08/2009 01:36 PM, Ingo Molnar wrote: Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two of them, kmemcheck and mmiotrace, are enabled only for debugging, so they should not be a performance concern. And the notifier call sites (two of them) are deliberately, as explained by a comment, not at the function entry, so they can't be unified with the others. (And kmemcheck also has two different call sites, BTW.) We want mmiotrace to be generic-distro capable, so the overhead when the hook is not used is of concern. Maybe we should generalize paravirt-ops patching in case if (x) f() is deemed too expensive. Yes, that's a nice idea. We have quite a number of 'conditional callbacks' in various critical paths that could be made lighter via such a technique. It would also free new callbacks from the 'it increases overhead even if unused' criticism and make it easier to add them. Ingo
Re: [PATCH 02/11] Add handle page fault PV helper.
On 11/08/2009 02:51 PM, Ingo Molnar wrote: Maybe we should generalize paravirt-ops patching in case if (x) f() is deemed too expensive. Yes, that's a nice idea. We have quite a number of 'conditional callbacks' in various critical paths that could be made lighter via such a technique. It would also free new callbacks from the 'it increases overhead even if unused' criticism and make it easier to add them. We can take the immediate values infrastructure as a first step. Has that been merged? -- error compiling committee.c: too many arguments to function
Re: [PATCH 02/11] Add handle page fault PV helper.
* Avi Kivity a...@redhat.com wrote: On 11/08/2009 02:51 PM, Ingo Molnar wrote: Maybe we should generalize paravirt-ops patching in case if (x) f() is deemed too expensive. Yes, that's a nice idea. We have quite a number of 'conditional callbacks' in various critical paths that could be made lighter via such a technique. It would also free new callbacks from the 'it increases overhead even if unused' criticism and make it easier to add them. We can take the immediate values infrastructure as a first step. Has that been merged? No, there were doubts about whether patching in live instructions like that is safe on all CPU types. Ingo
Re: [PATCH] qemu-kvm: clear only essential parts of VirtIOBlockReq on object allocation
On 11/04/2009 07:41 PM, Saul Tamari wrote: This patch reduces the size of memory being cleared on every virtio-blk IO. On every virtio-blk IO passed to QEMU, virtio_blk_alloc_request() allocates and clears (with qemu_mallocz()) a VirtIOBlockReq object. The sizeof(VirtIOBlockReq) equals 41040 bytes on my x86-64 machine. By moving the 'elem' variable to the end of VirtIOBlockReq and clearing only up to the address of the 'elem.in_addr' field, the memset() call now clears only 80 bytes. Signed-off-by: Saul Tamarist...@gmail.com Use the full email in signoffs.

@@ -79,12 +79,12 @@ static inline void virtio_identify_template(struct virtio_blk_config *bc)
 typedef struct VirtIOBlockReq
 {
     VirtIOBlock *dev;
-    VirtQueueElement elem;
     struct virtio_blk_inhdr *in;
     struct virtio_blk_outhdr *out;
     struct virtio_scsi_inhdr *scsi;
     QEMUIOVector qiov;
     struct VirtIOBlockReq *next;
+    VirtQueueElement elem;
 } VirtIOBlockReq;

Needs a comment to indicate new members must be before 'elem' if they need to be cleared.

 static void virtio_blk_req_complete(VirtIOBlockReq *req, int status)
@@ -139,7 +139,8 @@ static void virtio_blk_flush_complete(void *opaque, int ret)
 static VirtIOBlockReq *virtio_blk_alloc_request(VirtIOBlock *s)
 {
-    VirtIOBlockReq *req = qemu_mallocz(sizeof(*req));
+    VirtIOBlockReq *req = qemu_malloc(sizeof(*req));
+    memset(req, 0, (size_t)&((VirtIOBlockReq *)0)->elem.in_addr[0]);

Please use offsetof() instead of open-coding it. Submit to qemu-de...@nongnu.org since this came from upstream. -- error compiling committee.c: too many arguments to function
Re: [PATCH 02/11] Add handle page fault PV helper.
On 11/08/2009 03:05 PM, Ingo Molnar wrote: We can take the immediate values infrastructure as a first step. Has that been merged? No, there were doubts about whether patching in live instructions like that is safe on all CPU types. Ah, I remember. Doesn't the trick of going through a breakpoint instruction work? -- error compiling committee.c: too many arguments to function
Re: [PATCH 02/11] Add handle page fault PV helper.
On 11/08/2009 04:51 AM, Ingo Molnar wrote: * Avi Kivity a...@redhat.com wrote: On 11/08/2009 01:36 PM, Ingo Molnar wrote: Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two of them, kmemcheck and mmiotrace, are enabled only for debugging, so they should not be a performance concern. And the notifier call sites (two of them) are deliberately, as explained by a comment, not at the function entry, so they can't be unified with the others. (And kmemcheck also has two different call sites, BTW.) We want mmiotrace to be generic-distro capable, so the overhead when the hook is not used is of concern. Maybe we should generalize paravirt-ops patching in case if (x) f() is deemed too expensive. Yes, that's a nice idea. We have quite a number of 'conditional callbacks' in various critical paths that could be made lighter via such a technique. It would also free new callbacks from the 'it increases overhead even if unused' criticism and make it easier to add them. There are a number of other things where we permanently bind to a single instance of something, too. Optimizing those away would be nice. Consider memcpy(), where we may want to have different implementations for different processors. -hpa -- H. Peter Anvin, Intel Open Source Technology Center I work for Intel. I don't speak on their behalf.
Re: [PATCH 02/11] Add handle page fault PV helper.
* H. Peter Anvin h...@zytor.com wrote: On 11/08/2009 04:51 AM, Ingo Molnar wrote: * Avi Kivity a...@redhat.com wrote: On 11/08/2009 01:36 PM, Ingo Molnar wrote: Three existing callbacks are: kmemcheck, mmiotrace, notifier. Two of them, kmemcheck and mmiotrace, are enabled only for debugging, so they should not be a performance concern. And the notifier call sites (two of them) are deliberately, as explained by a comment, not at the function entry, so they can't be unified with the others. (And kmemcheck also has two different call sites, BTW.) We want mmiotrace to be generic-distro capable, so the overhead when the hook is not used is of concern. Maybe we should generalize paravirt-ops patching in case if (x) f() is deemed too expensive. Yes, that's a nice idea. We have quite a number of 'conditional callbacks' in various critical paths that could be made lighter via such a technique. It would also free new callbacks from the 'it increases overhead even if unused' criticism and make it easier to add them. There are a number of other things where we permanently bind to a single instance of something, too. Optimizing those away would be nice. Consider memcpy(), where we may want to have different implementations for different processors. yeah. Ingo
Doubt on KVM-88 vulnerabilities
Hi all! I'm using KVM-88, compiled by myself from the source code provided by the official site of the project. Is this version of KVM vulnerable to the issues described in DSA-1907-1 [1]? If so, is there a published patch that can be applied, or a new version that solves this? Thanks in advance for your reply. Regards, Daniel [1] http://lists.debian.org/debian-security-announce/2009/msg00229.html -- Fingerprint: BFB3 08D6 B4D1 31B2 72B9 29CE 6696 BF1B 14E6 1D37 Powered by Debian GNU/Linux Squeeze - Linux user #188.598
Re: [PATCHv7 3/3] vhost_net: a kernel-level virtio server
On Sun, Nov 08, 2009 at 02:39:59PM +1030, Rusty Russell wrote: On Sat, 7 Nov 2009 03:00:07 am Paul E. McKenney wrote: On Fri, Nov 06, 2009 at 03:31:20PM +1030, Rusty Russell wrote: But it's still nasty to use half an API. If it were a few places I would have open-coded it with a comment, or wrapped it. As it is, I don't think that would be a win. So would it help to have a rcu_read_lock_workqueue() and rcu_read_unlock_workqueue() that checked nesting and whether they were actually running in the context of a workqueue item? Or did you have something else in mind? Or am I misjudging the level of sarcasm in your reply? ;-) You read correctly. If we get a second user, creating an API makes sense. Makes sense to me as well. Which does provide some time to come up with a primitive designed to answer the question Am I currently executing in the context of a workqueue item?. ;-) Thanx, Paul
Reserve CPU cores for specific guests?
Hello: I don't think there is a way to do this with KVM, but I figured I would ask: I want to be able to offer virtual private servers (VPSs) to clients. I am going to use KVM for it. I would like to offer clients the option to buy either: 1. A VPS which allows CPUs to be overcommitted. 2. A VPS with a dedicated CPU core. So, for example, if I have a six-core Opteron, I might sell: 2 VPSs with a dedicated CPU core, 6 VPSs which allow overcommitted CPUs. Since I need one core for the hypervisor, there would need to be a way to say that it gets a dedicated core and that the other 2 VPSs each get a dedicated core. That leaves 3 pooled cores to serve the 6 VPSs that are allowed to overcommit. Is there a way to set up a pooled set of cores for a given list of VPSs? I think I may have to use a separate physical machine for the VPSs with dedicated cores and the ones with overcommitted ones. Thanks, Neil -- Neil Aggarwal, (281)846-8957, http://www.JAMMConsulting.com CentOS 5.4 KVM VPS $55/mo, no setup fee, no contract, dedicated 64bit CPU 1GB dedicated RAM, 40GB RAID storage, 500GB/mo premium BW, Zero downtime
Re: Reserve CPU cores for specific guests?
Neil Aggarwal wrote: Is there a way to set up a pooled set of cores for a given list of VPSs? I think I may have to use a separate physical machine for the VPSs with dedicated cores and the ones with overcommitted ones. I think you can achieve that on some simple level DIY with taskset from util-linux(-ng). HTH, Z.
[ kvm-Bugs-2894352 ] Ctrl+Alt+f
Bugs item #2894352, was opened at 2009-11-09 03:03 Message generated for change (Tracker Item Submitted) made by pawelwalaszek You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2894352group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: qemu Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: pjw (pawelwalaszek) Assigned to: Nobody/Anonymous (nobody) Summary: Ctrl+Alt+f Initial Comment: Ctrl+Alt+f doesn't work in qemu-kvm 0.11.0. I have tested it on Ubuntu 9.10. Provisionally there is a workaround: to leave full screen, press Ctrl+Alt+f followed by Ctrl+Alt+1; to enter it, press Ctrl+Alt+1 followed by Ctrl+Alt+f.
RE: Reserve CPU cores for specific guests?
I think you can achieve that on some simple level DIY with taskset from util-linux(-ng). That is a good utility to know. I did not know about that earlier. Thanks for the info. I am wondering one thing though: I will either need to call taskset when executing the process or run taskset on a PID after it starts up. Unless there is a way to tell KVM to call taskset when starting a guest, I think that is going to be hard to automate since the guests will get a different PID each time they are started. Any suggestions? Thanks, Neil
Re: Reserve CPU cores for specific guests?
On Sun November 8 2009, Neil Aggarwal wrote: Unless there is a way to tell KVM to call taskset when starting a guest, I think that is going to be hard to automate since the guests will get a different PID each time they are started. Any suggestions? None directly related, but libvirt's kvm support supports pinning a vm to a physical cpu. At least it has the option in virt-manager. -- Thomas Fjellstrom tfjellst...@shaw.ca
[RFC] KVM Fault Tolerance: Kemari for KVM
Hi all, It has been a while coming, but we have finally started work on Kemari's port to KVM. For those not familiar with it, Kemari provides the basic building block to create a virtualization-based fault tolerant machine: a virtual machine synchronization mechanism. Traditional high availability solutions can be classified in two groups: fault tolerant servers, and software clustering. Broadly speaking, fault tolerant servers protect us against hardware failures and, generally, rely on redundant hardware (often proprietary), and hardware failure detection to trigger fail-over. On the other hand, software clustering, as its name indicates, takes care of software failures and usually requires a standby server whose software configuration for the part we are trying to make fault tolerant must be identical to that of the active server. Both solutions may be applied to virtualized environments. Indeed, the current incarnation of Kemari (Xen-based) brings fault tolerant server-like capabilities to virtual machines and integration with existing HA stacks (Heartbeat, RHCS, etc) is under consideration. After some time in the drawing board we completed the basic design of Kemari for KVM, so we are sending an RFC at this point to get early feedback and, hopefully, get things right from the start. Those already familiar with Kemari and/or fault tolerance may want to skip the Background and go directly to the design and implementation bits. This is a pretty long write-up, but please bear with me. == Background == We started to play around with continuous virtual synchronization technology about 3 years ago. As development progressed and, most importantly, we got the first Xen-based working prototypes it became clear that we needed a proper name for our toy: Kemari. 
The goal of Kemari is to provide a fault tolerant platform for virtualization environments, so that in the event of a hardware failure the virtual machine fails over from compromised to properly operating hardware (a physical machine) in a way that is completely transparent to the guest operating system. Although hardware based fault tolerant servers and HA servers (software clustering) have been around for a (long) while, they typically require specifically designed hardware and/or modifications to applications. In contrast, by abstracting hardware using virtualization, Kemari can be used on off-the-shelf hardware and no application modifications are needed. After a period of in-house development the first version of Kemari for Xen was released in Nov 2008 as open source. However, by then it was already pretty clear that a KVM port would have several advantages. First, KVM is integrated into the Linux kernel, which means one gets support for a wide variety of hardware for free. Second, and in the same vein, KVM can also benefit from Linux' low latency networking capabilities including RDMA, which is of paramount importance for an extremely latency-sensitive functionality like Kemari. Last, but not least, KVM and its community is growing rapidly, and there is increasing demand for Kemari-like functionality for KVM. Although the basic design principles will remain the same, our plan is to write Kemari for KVM from scratch, since there does not seem to be much opportunity for sharing between Xen and KVM. == Design outline == The basic premise of fault tolerant servers is that when things go awry with the hardware the running system should transparently continue execution on an alternate physical host. For this to be possible the state of the fallback host has to be identical to that of the primary.
Kemari runs paired virtual machines in an active-passive configuration and achieves whole-system replication by continuously copying the state of the system (dirty pages and the state of the virtual devices) from the active node to the passive node. An interesting implication of this is that during normal operation only the active node is actually executing code.

Another possible approach is to run a pair of systems in lock-step (à la VMware FT). Since both the primary and fallback virtual machines are active, keeping them synchronized is a complex task, which usually involves carefully injecting external events into both virtual machines so that they result in identical states. The latter approach is extremely architecture specific and not SMP friendly. This spurred us to try the design that became Kemari, which we believe lends itself to further optimizations.

== Implementation ==

The first step is to encapsulate the machine to be protected within a virtual machine. Then the live migration functionality is leveraged to keep the virtual machines synchronized. Whereas during live migration dirty pages can be sent asynchronously from the primary to the fallback server until the ratio of dirty pages is low enough to guarantee very short downtimes, when it comes to fault tolerance solutions whenever a synchronization point is reached changes
RE: Reserve CPU cores for specific guests?
Not directly related, but libvirt's KVM support allows pinning a VM to a physical CPU. At least it has the option in virt-manager. That is exactly what I needed. My KVM host does not have a GUI so I have been using virsh. I did not notice that option before. Thank you, Neil
--
Neil Aggarwal, (281)846-8957, http://www.JAMMConsulting.com
CentOS 5.4 KVM VPS $55/mo, no setup fee, no contract, dedicated 64bit CPU
1GB dedicated RAM, 40GB RAID storage, 500GB/mo premium BW, Zero downtime
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2847560 ] Guest hang with exhausted IRQ sources error if 8 VFs assigne
Bugs item #2847560, was opened at 2009-08-30 22:58
Message generated for change (Settings changed) made by jiajun
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2847560&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: kernel
Group: None
Status: Closed
Resolution: Fixed
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: Guest hang with exhausted IRQ sources error if 8 VFs assigne

Initial Comment:
Environment:
Host OS (ia32/ia32e/IA64): ia32e
Guest OS (ia32/ia32e/IA64): ia32e
Guest OS Type (Linux/Windows): Linux
kvm.git Commit: 779cc54dbccaa3a00d70a9d61d090be5d9ccc903
qemu-kvm Commit: 9e3269181e9bc56feb43bcd4e8ce0b82cd543e65
Host Kernel Version: 2.6.31-rc5
Hardware: NHM-EP

Bug detailed description:
--
If booting a guest with 8 VFs assigned, the guest hangs at the "Starting udev:" stage, and dmesg shows "kvm: exhaust allocatable IRQ sources!". If booting a guest with 6 VFs assigned, the guest can boot up and the VFs work well in the guest.
Dmesg info:

pci-stub 0000:01:10.0: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:0
pci-stub 0000:01:10.1: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:1
pci-stub 0000:01:10.2: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:2
pci-stub 0000:01:10.3: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:3
pci-stub 0000:01:10.4: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:4
pci-stub 0000:01:10.5: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:5
pci-stub 0000:01:10.6: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:6
pci-stub 0000:01:10.7: enabling device (0000 -> 0002)
assign device: host bdf = 1:10:7
pci-stub 0000:01:10.0: irq 91 for MSI/MSI-X
pci-stub 0000:01:10.0: irq 92 for MSI/MSI-X
pci-stub 0000:01:10.0: irq 93 for MSI/MSI-X
pci-stub 0000:01:10.1: irq 97 for MSI/MSI-X
pci-stub 0000:01:10.1: irq 98 for MSI/MSI-X
pci-stub 0000:01:10.1: irq 99 for MSI/MSI-X
pci-stub 0000:01:10.2: irq 100 for MSI/MSI-X
pci-stub 0000:01:10.2: irq 101 for MSI/MSI-X
pci-stub 0000:01:10.2: irq 102 for MSI/MSI-X
pci-stub 0000:01:10.3: irq 103 for MSI/MSI-X
pci-stub 0000:01:10.3: irq 104 for MSI/MSI-X
pci-stub 0000:01:10.3: irq 105 for MSI/MSI-X
pci-stub 0000:01:10.4: irq 106 for MSI/MSI-X
pci-stub 0000:01:10.4: irq 107 for MSI/MSI-X
pci-stub 0000:01:10.4: irq 108 for MSI/MSI-X
pci-stub 0000:01:10.5: irq 112 for MSI/MSI-X
pci-stub 0000:01:10.5: irq 113 for MSI/MSI-X
pci-stub 0000:01:10.5: irq 114 for MSI/MSI-X
pci-stub 0000:01:10.6: irq 115 for MSI/MSI-X
pci-stub 0000:01:10.6: irq 116 for MSI/MSI-X
pci-stub 0000:01:10.6: irq 117 for MSI/MSI-X
kvm: exhaust allocatable IRQ sources!

--

Comment By: Jiajun Xu (jiajun)
Date: 2009-11-08 22:00

Message:
The bug is fixed by kvm.git commit 99c52eb29f6f013b7a9f5c25439091a2fad0fd80:

commit 99c52eb29f6f013b7a9f5c25439091a2fad0fd80
Author: Marcelo Tosatti mtosa...@redhat.com
Date: Sat Oct 17 22:47:23 2009 -0300

    KVM: fix irq_source_id size verification

    find_first_zero_bit works with bit numbers, not bytes.
    Fixes https://sourceforge.net/tracker/?func=detail&aid=2847560&group_id=180599&atid=893831

    Reported-by: Xu, Jiajun jiajun...@intel.com
    Signed-off-by: Marcelo Tosatti mtosa...@redhat.com

--

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2847560&group_id=180599
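The one-line summary "find_first_zero_bit works with bit numbers, not bytes" explains the symptom exactly: if the bitmap search is given its size in bytes instead of bits, only size/8 IRQ source ids are ever allocatable, so the 9th request fails. Here is a hedged userspace C sketch of that failure mode; the helper and names are simplified stand-ins, not the kernel implementation.

```c
#include <assert.h>
#include <limits.h>

/* Toy find_first_zero_bit: scans the first `nbits` bits of `bitmap`
 * and returns the index of the first clear bit, or `nbits` if none. */
static int find_first_zero(const unsigned long *bitmap, int nbits)
{
    int i, wbits = (int)(sizeof(unsigned long) * CHAR_BIT);

    for (i = 0; i < nbits; i++)
        if (!(bitmap[i / wbits] & (1UL << (i % wbits))))
            return i;
    return nbits;
}

static unsigned long irq_sources; /* one word of allocatable source ids */

/* With `buggy` set, pass the bitmap size in BYTES (the bug); otherwise
 * pass it in BITS (the fix). Returns the allocated id, or -1 when the
 * search reports "exhaust allocatable IRQ sources!". */
static int alloc_irq_source(int buggy)
{
    int nbits = buggy ? (int)sizeof(unsigned long)
                      : (int)(sizeof(unsigned long) * CHAR_BIT);
    int id = find_first_zero(&irq_sources, nbits);

    if (id == nbits)
        return -1;
    irq_sources |= 1UL << id;
    return id;
}
```

On a 64-bit host the buggy variant exhausts after sizeof(unsigned long) = 8 allocations, which matches the observed "8 VFs hang, 6 VFs work" behavior given each VF consumes a source id.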
Re: [PATCHv8 3/3] vhost_net: a kernel-level virtio server
On Sun, 8 Nov 2009 10:05:16 pm Michael S. Tsirkin wrote: On Fri, Nov 06, 2009 at 03:29:17PM +1030, Rusty Russell wrote: +/* Caller must have TX VQ lock */ +static void tx_poll_stop(struct vhost_net *net) +{ + if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED)) + return; likely? Really? Hmm ... yes. tx poll stop is called on each packet (as long as we do not fill up 1/2 of the backend queue); the first call will stop polling, the rest check state and do nothing. This is because we normally do not care when the message has left the queue in the backend device: we tell the backend to send it and forget. We only start polling when the backend tx queue fills up. OK, good. +static void vhost_net_set_features(struct vhost_net *n, u64 features) +{ + size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ? + sizeof(struct virtio_net_hdr) : 0; + int i; + mutex_lock(&n->dev.mutex); + n->dev.acked_features = features; Why is this called acked_features? Not just features? I expected to see code which exposed these back to userspace, and didn't. Not sure what you mean. Userspace sets them; why does it want to get them exposed back? There's something about the 'acked' which rubs me the wrong way. enabled_features is perhaps a better term than acked_features; acked seems more a user point-of-view, enabled seems more driver POV? set_features matches your ioctl names, but it sounds like a fn name :( It's marginal. And 'features' is shorter than both. + switch (ioctl) { + case VHOST_SET_VRING_NUM: I haven't looked at your userspace implementation, but does a generic VHOST_SET_VRING_STATE / VHOST_GET_VRING_STATE with a struct make more sense? It'd be simpler here, Not by much though, right? but not sure if it'd be simpler to use? The problem is with VHOST_SET_VRING_BASE as well. I want it to be separate because I want to make it possible to relocate e.g. the used ring to another address while the ring is running.
This would be a good debugging tool (you look at kernel's used ring, check descriptor, then update guest's used ring) and also possibly an extra way to do migration. And it's nicer to have vring size separate as well, because it is initialized by host and never changed, right? Actually, this looks wrong to me: + case VHOST_SET_VRING_BASE: ... + vq->avail_idx = vq->last_avail_idx = s.num; The last_avail_idx is part of the state of the driver. It needs to be saved and restored over susp/resume. The only reason it's not in the ring itself is because I figured the other side doesn't need to see it (which is true, but missed debugging opportunities as well as man-in-the-middle issues like this one). I had a patch which put this field at the end of the ring, I might resurrect it to avoid this problem. This is backwards compatible with all implementations. See patch at end. I would drop avail_idx altogether: get_user is basically free, and simplifies a lot. As most state is in the ring, all you need is an ioctl to save/restore the last_avail_idx. We could merge DESC, AVAIL, USED, and it will reduce the amount of code in userspace. With both base, size and fds separate, it seemed a bit more symmetrical to have desc/avail/used separate as well. What's your opinion? Well, DESC, AVAIL, and USED could easily be turned into SET/GET_LAYOUT. For future reference, this is *exactly* the kind of thing which would have been nice as a followup patch. Easy to separate, easy to review, not critical to the core. Yes. It's not too late to split it out though: should I do it yet? Only if you're feeling enthused. It's lightly reviewed now. Cheers, Rusty. virtio: put last_used and last_avail index into ring itself. Generally, the other end of the virtio ring doesn't need to see where you're up to in consuming the ring. However, to completely understand what's going on from the outside, this information must be exposed.
For example, if you want to save and restore a virtio_ring, but you're not the consumer because the kernel is using it directly. Fortunately, we have room to expand: the ring is always a whole number of pages and there's hundreds of bytes of padding after the avail ring and the used ring, whatever the number of descriptors (which must be a power of 2). We add a feature bit so the guest can tell the host that it's writing out the current value there, if it wants to use that.

Signed-off-by: Rusty Russell ru...@rustcorp.com.au
---
 drivers/virtio/virtio_ring.c | 23 +++
 include/linux/virtio_ring.h  | 12 +++-
 2 files changed, 26 insertions(+), 9 deletions(-)

diff --git a/drivers/virtio/virtio_ring.c b/drivers/virtio/virtio_ring.c
--- a/drivers/virtio/virtio_ring.c
+++ b/drivers/virtio/virtio_ring.c
@@ -71,9 +71,6 @@ struct vring_virtqueue
 	/* Number we've added since last sync. */
 	unsigned int num_added;
-	/* Last
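The trick Rusty's patch relies on is that the vring layout leaves spare bytes after the avail ring (before the aligned used ring), so a consumer's private index can be published in the first unused __u16 slot without growing the ring. Here is a hedged C sketch of that idea; the struct layout follows the virtio spec's vring_avail, but the helper name is invented for illustration and is not the kernel API.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Avail ring header as laid out by the virtio spec: flags, idx, then
 * `num` ring entries, followed by padding up to the used ring. */
struct vring_avail {
    uint16_t flags;
    uint16_t idx;
    uint16_t ring[];   /* num entries; padding follows */
};

/* Hypothetical helper: the first spare 16-bit slot right after
 * avail->ring[num], where a last_avail index could be published
 * without changing the ring size. */
static uint16_t *vring_last_avail(struct vring_avail *avail, unsigned num)
{
    return &avail->ring[num];
}
```

Because every existing implementation already allocates (and ignores) that padding, publishing an index there stays backwards compatible, which is the point made in the patch description above.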
Re: [PATCHv8 3/3] vhost_net: a kernel-level virtio server
On Mon, Nov 9, 2009 at 8:17 AM, Rusty Russell ru...@rustcorp.com.au wrote: +static void vhost_net_set_features(struct vhost_net *n, u64 features) +{ + size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ? + sizeof(struct virtio_net_hdr) : 0; + int i; + mutex_lock(&n->dev.mutex); + n->dev.acked_features = features; Why is this called acked_features? Not just features? I expected to see code which exposed these back to userspace, and didn't. Not sure what you mean. Userspace sets them; why does it want to get them exposed back? There's something about the 'acked' which rubs me the wrong way. enabled_features is perhaps a better term than acked_features; acked seems more a user point-of-view, enabled seems more driver POV? Hmm. Are you happy with the ioctl name? If yes I think being consistent with that is important. set_features matches your ioctl names, but it sounds like a fn name :( It's marginal. And 'features' is shorter than both. I started with this but I was always getting confused whether this includes all features or just acked features. I'll go with enabled_features. + switch (ioctl) { + case VHOST_SET_VRING_NUM: I haven't looked at your userspace implementation, but does a generic VHOST_SET_VRING_STATE / VHOST_GET_VRING_STATE with a struct make more sense? It'd be simpler here, Not by much though, right? but not sure if it'd be simpler to use? The problem is with VHOST_SET_VRING_BASE as well. I want it to be separate because I want to make it possible to relocate e.g. the used ring to another address while the ring is running. This would be a good debugging tool (you look at kernel's used ring, check descriptor, then update guest's used ring) and also possibly an extra way to do migration. And it's nicer to have vring size separate as well, because it is initialized by host and never changed, right? Actually, this looks wrong to me: + case VHOST_SET_VRING_BASE: ...
+ vq->avail_idx = vq->last_avail_idx = s.num; The last_avail_idx is part of the state of the driver. It needs to be saved and restored over susp/resume. Exactly. That's what VHOST_GET/SET_VRING_BASE does. avail_idx is just a cached value for notify on empty, so what this does is clear the value. What exactly do you refer to when you say this looks wrong? This could trigger an extra notification if I ever called trigger_irq without get first. As I don't, it in fact has no effect. The only reason it's not in the ring itself is because I figured the other side doesn't need to see it (which is true, but missed debugging opportunities as well as man-in-the-middle issues like this one). I had a patch which put this field at the end of the ring, I might resurrect it to avoid this problem. This is backwards compatible with all implementations. See patch at end. Yes, I remember that patch. There seems to be little point though, at this stage. I would drop avail_idx altogether: get_user is basically free, and simplifies a lot. As most state is in the ring, all you need is an ioctl to save/restore the last_avail_idx. avail_idx is there for notify on empty: I had this thought that it's better to leave the avail cache line alone when we are triggering an interrupt, to avoid bouncing it around if the guest is updating it meanwhile on another CPU, and I think my testing showed that it helped performance, but could be a mistake. You don't believe this can help? We could merge DESC, AVAIL, USED, and it will reduce the amount of code in userspace. With both base, size and fds separate, it seemed a bit more symmetrical to have desc/avail/used separate as well. What's your opinion? Well, DESC, AVAIL, and USED could easily be turned into SET/GET_LAYOUT. Will do. For future reference, this is *exactly* the kind of thing which would have been nice as a followup patch. Easy to separate, easy to review, not critical to the core. Yes. It's not too late to split it out though: should I do it yet?
Only if you're feeling enthused. It's lightly reviewed now. Not really :) I'll keep this in mind for the future. Thanks! Cheers, Rusty.
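The notify-on-empty logic debated in this thread can be sketched as follows: vhost compares a privately cached copy of the guest's avail index against its own consumption index, so deciding whether to interrupt the guest does not re-read the shared avail cache line the guest may be writing concurrently. This is an illustrative toy with invented names, not the vhost implementation.

```c
#include <assert.h>
#include <stdint.h>

/* Minimal stand-in for the relevant virtqueue bookkeeping. */
struct toy_vq {
    uint16_t avail_idx;       /* cached snapshot of guest's avail->idx */
    uint16_t last_avail_idx;  /* next entry the host will consume */
};

/* Notify-on-empty: interrupt only once we have consumed everything
 * the guest had made available as of our last cache refresh. Using
 * the cached snapshot keeps the shared avail cache line untouched
 * on the interrupt path. */
static int should_notify(const struct toy_vq *vq)
{
    return vq->avail_idx == vq->last_avail_idx;
}
```

The trade-off discussed above is exactly this: dropping the cache makes the code simpler (a get_user re-read), while keeping it may avoid cache-line bouncing between host and guest CPUs.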
Re: [PATCHv8 3/3] vhost_net: a kernel-level virtio server
On Mon, Nov 09, 2009 at 04:47:29PM +1030, Rusty Russell wrote: On Sun, 8 Nov 2009 10:05:16 pm Michael S. Tsirkin wrote: On Fri, Nov 06, 2009 at 03:29:17PM +1030, Rusty Russell wrote: +/* Caller must have TX VQ lock */ +static void tx_poll_stop(struct vhost_net *net) +{ + if (likely(net->tx_poll_state != VHOST_NET_POLL_STARTED)) + return; likely? Really? Hmm ... yes. tx poll stop is called on each packet (as long as we do not fill up 1/2 of the backend queue); the first call will stop polling, the rest check state and do nothing. This is because we normally do not care when the message has left the queue in the backend device: we tell the backend to send it and forget. We only start polling when the backend tx queue fills up. OK, good. +static void vhost_net_set_features(struct vhost_net *n, u64 features) +{ + size_t hdr_size = features & (1 << VHOST_NET_F_VIRTIO_NET_HDR) ? + sizeof(struct virtio_net_hdr) : 0; + int i; + mutex_lock(&n->dev.mutex); + n->dev.acked_features = features; Why is this called acked_features? Not just features? I expected to see code which exposed these back to userspace, and didn't. Not sure what you mean. Userspace sets them; why does it want to get them exposed back? There's something about the 'acked' which rubs me the wrong way. enabled_features is perhaps a better term than acked_features; acked seems more a user point-of-view, enabled seems more driver POV? set_features matches your ioctl names, but it sounds like a fn name :( Hmm. Are you happy with the ioctl name? If yes I think being consistent with that is important. It's marginal. And 'features' is shorter than both. I started with this but I was always getting confused whether this includes all features or just acked features. I'll go with enabled_features. + switch (ioctl) { + case VHOST_SET_VRING_NUM: I haven't looked at your userspace implementation, but does a generic VHOST_SET_VRING_STATE / VHOST_GET_VRING_STATE with a struct make more sense? It'd be simpler here, Not by much though, right?
but not sure if it'd be simpler to use? The problem is with VHOST_SET_VRING_BASE as well. I want it to be separate because I want to make it possible to relocate e.g. the used ring to another address while the ring is running. This would be a good debugging tool (you look at kernel's used ring, check descriptor, then update guest's used ring) and also possibly an extra way to do migration. And it's nicer to have vring size separate as well, because it is initialized by host and never changed, right? Actually, this looks wrong to me: + case VHOST_SET_VRING_BASE: ... + vq->avail_idx = vq->last_avail_idx = s.num; The last_avail_idx is part of the state of the driver. It needs to be saved and restored over susp/resume. Exactly. That's what VHOST_GET/SET_VRING_BASE does. avail_idx is just a cached value for notify on empty, so what this does is clear the value. What exactly do you refer to when you say this looks wrong? This could trigger an extra notification if I ever called trigger_irq without get first. As I don't, it in fact has no effect. The only reason it's not in the ring itself is because I figured the other side doesn't need to see it (which is true, but missed debugging opportunities as well as man-in-the-middle issues like this one). I had a patch which put this field at the end of the ring, I might resurrect it to avoid this problem. This is backwards compatible with all implementations. See patch at end. Yes, I remember that patch. There seems to be little point though, at this stage. I would drop avail_idx altogether: get_user is basically free, and simplifies a lot. As most state is in the ring, all you need is an ioctl to save/restore the last_avail_idx. avail_idx is there for notify on empty: I had this thought that it's better to leave the avail cache line alone when we are triggering an interrupt, to avoid bouncing it around if the guest is updating it meanwhile on another CPU, and I think my testing showed that it helped performance, but could be a mistake.
You don't believe this can help? We could merge DESC, AVAIL, USED, and it will reduce the amount of code in userspace. With both base, size and fds separate, it seemed a bit more symmetrical to have desc/avail/used separate as well. What's your opinion? Well, DESC, AVAIL, and USED could easily be turned into SET/GET_LAYOUT. OK, I'll do this. For future reference, this is *exactly* the kind of thing which would have been nice as a followup patch. Easy to separate, easy to review, not critical to the core. Yes. It's not too late to split it out though: should I do it yet? Only if you're feeling enthused. It's lightly reviewed now. Not really :) I'll keep this in mind for the