[ kvm-Bugs-2235570 ] 100% cpu usage with KVM-78
Bugs item #2235570, was opened at 2008-11-07 18:58 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2235570group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: James Bailey (dgym) Assigned to: Nobody/Anonymous (nobody) Summary: 100% cpu usage with KVM-78 Initial Comment: When I start a guest it consumes 100% CPU on the host, even after it has booted and is sitting idle at a login prompt. The odd thing is that if I then live migrate the guest to another (identical) machine, the problem goes away. The guest continues to run just fine on the new host, and the new host's CPU usage is normal. I have tried the obvious: starting on the other machine and migrating to the first, and even multiple migrations. It is always the same, the qemu-system-x86_64 process sits at 100% unless it was started with -incoming ... Migrating machines every time you start up is not a very convenient work around, so it would be nice to find out what is different between the normal start up and the -incoming start up and fix the former. Versions and settings: KVM: 78 Host Kernel: Vanilla 2.6.25.2 Compiled with: gcc version 4.1.2 CPU: AMD Phenom Guest OS: Linux (have tried a few distros) Guest Kernels: Debian etch, and an OpenVZ 2.6.18 Command line: qemu-system-x86_64 -m 128 -smp 1 -drive file=/dev/drbd0 -vnc :1 Things I have tried which have not worked: Using -nographics. Using SDL graphics. Using -snapshot, and doing a savevm and loadvm. Things I have tried which have worked: Using -no-kvm. I have attached gdb and found the busy thread, here is its backtrace: #0 0x7f06f017ea17 in ioctl () from /lib/libc.so.6 #1 0x0051b423 in kvm_run (kvm=0xa93040, vcpu=0) at libkvm.c:892 #2 0x004f1116 in kvm_cpu_exec (env=value optimized out) at /opt/setup/kvm-78/qemu/qemu-kvm.c:230 #3 0x004f13e4 in ap_main_loop (_env=value optimized out) at /opt/setup/kvm-78/qemu/qemu-kvm.c:432 #4 0x7f06f0565135 in start_thread () from /lib/libpthread.so.0 #5 0x7f06f01852ce in clone () from /lib/libc.so.6 #6 0x in ?? () Because this indicates business within the kernel module it is as far as I have got. I will attempt to identify the previous working version, I know I never had this problem with 68, but I haven't yet tried anything in between. -- Comment By: Jes Sorensen (jessorensen) Date: 2010-11-29 09:00 Message: Hi, Given that there has been no updates to this bug for a long time, I presume the problem has been resolved? Would you mind letting us know if you still see this problem? Thanks, Jes -- Comment By: James Bailey (dgym) Date: 2008-11-08 22:35 Message: Using the -no-kvm-pit option fixes the CPU problem. There were some changes to arch/x86/kvm/i8254.c so this definitely looks PIT related. -- Comment By: James Bailey (dgym) Date: 2008-11-08 21:28 Message: I have been able to narrow the problem down to a single commit. KVM-76 was fine, but I get this broken behaviour with KVM-77. I checked out the KVM userspace and went back to 76, and I also checked out the KVM kernel and tried different versions. Commit 666c4a43cba0cbaa30cd3c86b515dfdab2a6fa98 on git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm.git is the first version to show the 100% CPU behaviour. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2235570group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On Sun, Nov 28, 2010 at 05:13:41PM -0600, Anthony Liguori wrote: On 11/28/2010 04:28 PM, Michael S. Tsirkin wrote: But rather need to use ugly factory functions with all sorts of DO_UPCAST. This is really unfriendly especially for writing test cases. Yes, I agree. Just moving memory allocation out of there will fix most of the ugliness. So here's a short list of things I've been working on that I don't believe have nice implementations in C. 1) Factories with string based parameters with natural constructor arguments. This was the only item, right? So in fact, this is needed as part of configuration file/command line/monitor parser. IMHO, this really should be separate from the device model. The fact that qdev currently mixes the device model with argument parsing is bad IMO. So we ended up with saying .driver = PCI in hw/pc_piix.c instead of an instance of the structure. There's no compile-time check that the correct string is used and that is pretty bad IMO. Yes, this makes it easier to add new properties, but making it easy is exactly the wrong thing to do because we really have to support such properties forever. So how about a compromise: libqemu written in C, with APIs that should not deal with string parsing at all, and should above all else make sense: i8254_init_drift_mode i8254_init_catchup net_set_link_up net_set_link_down (and it really needs to be C for portability: so that management written in C can use it). This API should be properly versioned, with a backwards compatibility story, and we should be careful about adding interfaces there. On top of this you can have a management interface written in any other language, and have that deal with string parsing. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: fix typos in comment
On Mon, Nov 29, 2010 at 01:48:40PM +0800, Jason Wang wrote: Signed-off-by: Jason Wang jasow...@redhat.com Applied, thanks. --- drivers/vhost/net.c |2 +- drivers/vhost/vhost.h |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index d10da28..14fc189 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -452,7 +452,7 @@ static void handle_rx_mergeable(struct vhost_net *net) move_iovec_hdr(vq-iov, vq-hdr, vhost_hlen, in); else /* Copy the header for use in VIRTIO_NET_F_MRG_RXBUF: - * needed because sendmsg can modify msg_iov. */ + * needed because recvmsg can modify msg_iov. */ copy_iovec_hdr(vq-iov, vq-hdr, sock_hlen, in); msg.msg_iovlen = in; err = sock-ops-recvmsg(NULL, sock, msg, diff --git a/drivers/vhost/vhost.h b/drivers/vhost/vhost.h index 073d06a..2af44b7 100644 --- a/drivers/vhost/vhost.h +++ b/drivers/vhost/vhost.h @@ -102,7 +102,7 @@ struct vhost_virtqueue { * flush the vhost_work instead of synchronize_rcu. Therefore readers do * not need to call rcu_read_lock/rcu_read_unlock: the beginning of * vhost_work execution acts instead of rcu_read_lock() and the end of - * vhost_work execution acts instead of rcu_read_lock(). + * vhost_work execution acts instead of rcu_read_unlock(). * Writers use virtqueue mutex. */ void __rcu *private_data; /* Log write descriptors */ -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: correctly set bits of dirty pages
On Mon, Nov 29, 2010 at 01:48:20PM +0800, Jason Wang wrote: When counting pages we should increase it by 1 instead of VHOST_PAGE_SIZE, and also make log_write() can correctly process the request across pages with write_address not start at page boundary. Signed-off-by: Jason Wang jasow...@redhat.com Thanks, good catch! But let's to it in small steps: first, a small patch to fix the bug: I think this is equivalent, right? Subject: vhost: correctly set bits of dirty pages When counting pages we should increase address by 1 instead of VHOST_PAGE_SIZE, and also make log_write() can correctly process the request across pages with write_address not starting at page boundary. Reported-by: Jason Wang jasow...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 568eb70..d0a3552 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -887,6 +887,7 @@ static int log_write(void __user *log_base, int r; if (!write_length) return 0; + write_length += write_address % VHOST_PAGE_SIZE; write_address /= VHOST_PAGE_SIZE; for (;;) { u64 base = (u64)(unsigned long)log_base; @@ -900,7 +901,7 @@ static int log_write(void __user *log_base, if (write_length = VHOST_PAGE_SIZE) break; write_length -= VHOST_PAGE_SIZE; - write_address += VHOST_PAGE_SIZE; + write_address += 1; } return r; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: correctly set bits of dirty pages
On Mon, Nov 29, 2010 at 10:18:40AM +0200, Michael S. Tsirkin wrote: On Mon, Nov 29, 2010 at 01:48:20PM +0800, Jason Wang wrote: When counting pages we should increase it by 1 instead of VHOST_PAGE_SIZE, and also make log_write() can correctly process the request across pages with write_address not start at page boundary. Signed-off-by: Jason Wang jasow...@redhat.com Thanks, good catch! But let's to it in small steps: first, a small patch to fix the bug: I think this is equivalent, right? Subject: vhost: correctly set bits of dirty pages When counting pages we should increase address by 1 instead of VHOST_PAGE_SIZE, and also make log_write() can correctly process the request across pages with write_address not starting at page boundary. Reported-by: Jason Wang jasow...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com And then this on top: vhost: better variable name in logging We really store a page offset in write_address, so rename it write_page to avoid confusion. Signed-off-by: Jason Wang jasow...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com --- diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index d0a3552..1a3d3ed 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -884,15 +884,15 @@ static int set_bit_to_user(int nr, void __user *addr) static int log_write(void __user *log_base, u64 write_address, u64 write_length) { + u64 write_page = write_address / VHOST_PAGE_SIZE int r; if (!write_length) return 0; write_length += write_address % VHOST_PAGE_SIZE; - write_address /= VHOST_PAGE_SIZE; for (;;) { u64 base = (u64)(unsigned long)log_base; - u64 log = base + write_address / 8; - int bit = write_address % 8; + u64 log = base + write_page / 8; + int bit = write_page % 8; if ((u64)(unsigned long)log != log) return -EFAULT; r = set_bit_to_user(bit, (void __user *)(unsigned long)log); @@ -901,7 +901,7 @@ static int log_write(void __user *log_base, if (write_length = VHOST_PAGE_SIZE) break; write_length -= VHOST_PAGE_SIZE; - write_address += 1; + write_page += 1; } return r; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] ceph/rbd block driver for qemu-kvm (v8)
Am 27.11.2010 08:12, schrieb Stefan Hajnoczi: On Fri, Nov 26, 2010 at 9:59 PM, Christian Brunner c.m.brun...@gmail.com wrote: Thanks for the review. What am I supposed to do now? Kevin is the block maintainer. His review is the next step, I have CCed him. After that rbd would be ready to merge. If I don't find anything really obvious and it doesn't break the build, I'll merge it based on your review. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On 11/28/2010 06:57 PM, Michael S. Tsirkin wrote: sparce lets you solve C problems that C++ inherited as is. E.g. if you have a pointer you can always dereference it. It's the other way round. For example __user cannot be done in C. It has to be done as an add-on. In C++ it's simply: templateclass T class user_ptrT { public: explicit user_ptr(unsigned long addr); void copy_from(T to); // throws EFAULT void copy_to(const T from); // throws EFAULT private: unsigned long addr; }; This does not allow simple uses such as arithmetic, Add a raw_addr() method that returns addr. void, Fixable. builtin types, should work sizeof, sizeof(T)? arrays, should work NULL comparizon, Do we ever compare __user pointers against NULL? It's a valid address. inheritance, cast, What do these mean in the context of user pointers? memory management. And this? Examples to ponder: what is the appropriate value of T for void *? Probably a specialization user_ptr or a separate class user_void_ptr. After all you can't do anthing with such a pointer. What if you want a shared/auto ptr to manage this memory? What does it mean for user pointers? Some of these might be fixable, with a lot of code. Boost might haver some solutions, I haven't looked. Meanwhile sparse is already there. With sparse you have to implement every rule in a separate compiler. With C++ you introduce the rules into the code. No need for an additional toolchain. It's a feature :) This way you are not forced to rewrite all code each time you realize you need an extra check, and checks can be added gradually without breaking build. You can see that user_ptr is not just for the checks, it adds functionality (sizeof-less copy_from and copy_to). That's usually the case. If there's something you must not do because of some rule, there's also something you want to do, and those become member functions. In C++ you could also introduce user_ptr gradually, it won't break anything. Things like __user are easily done in C++. Some of what sparce does can be done if you create a separate type for all address spaces. This can be done in C too, and the result won't be like __user at all. That's quite a lot of work. Sparse: T __user *foo; C++: user_ptrT foo; Sparse has some advantages: it makes the contract obvious so you clearly see it's a pointer and know -, [], + will work, * and will not. I don't really see how you can tell this from __user. You have to look up the definition. For user_ptr, the definition is actually available. C : struct T_user_ptr { unsigned long addr } foo; + lots of accessors. Some kind of macro can be closer to user_ptr above. Those macros are called templates, and the compiler can check that they are used correctly. C++ support in gdb has some limitations if you use overloading, exceptions, templates. The example posted here uses two of these, so it would be harder to debug. I haven't seen issues with overloading or exceptions. Build your test with -g, fire up gdb from command line, try to put a breakpoint in the constructor of the fd object, maybe you will see what I mean :) (gdb) break 'kvm::fd::fd' Breakpoint 3 at 0x8049650: file api/kvmxx.cc, line 25. Breakpoint 4 at 0x8049628: file api/kvmxx.cc, line 31. Breakpoint 5 at 0x8049080: file api/kvmxx.cc, line 21 But it's hard to figure out that you need the kvm namespace. Your code only has one namespace, but with multiple namespaces, you don't even know in which namespace to look up the fd. With templates you might not even know the fd class. If you like, you can avoid namespaces and prefix everything with kvm_. I never found it necessary. An example of an issue with overloading is that gdb seems unable to resolve them properly depending on the current scope. So you see a call to foo() and want to put a breakpoint there, first problem is just to find one which namespace it is in. Once you did the best it can do it prompt you to select one of the overloaded options. How do you know which one do you want? You don't, so you guess. Sometimes gdb will guess, because of a complex set of name resolution rules, and sometimes it will this wrongly. Which is not what I want to spend mental cycles on when I am debugging a problem. Functions using exceptions can not be called from the gdb prompt (gdb is not smart enough to catch them). There are more issues. That's not restricted to gdb. C has just three scopes: block (may be nested), file static, and global. C++ has more. Stating everything leads to verbose code and potential conflicts. Having more scopes allows tighter code usually but more head-scratching if something goes wrong. In my experience conflicts are very rare. But it's true that when they happen
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On 11/28/2010 04:40 PM, Michael S. Tsirkin wrote: On Sun, Nov 28, 2010 at 03:14:17PM +0200, Avi Kivity wrote: On 11/28/2010 01:44 PM, Michael S. Tsirkin wrote: On Sun, Nov 28, 2010 at 11:54:26AM +0200, Avi Kivity wrote: On 11/28/2010 11:50 AM, Michael S. Tsirkin wrote: Another problem is that there seem to be two memory allocations and a copy here, apparently just to simplify error handling. It might be fine for this test but won't scale for when performance matters. When it matters, we can fix it. I don't see msr read/write becoming a hot path. It will be very painful to fix it. Why? Because the API returns a vector. Returning an object does not involve a copy (return value optimization). Yes, but assigning the value in the code that uses it will, unless again you do this in an initializer. So do that. This code is not reusable. Everywhere you use an fd, you have to repeat this code. But that's not a lot of code. And you can abstract it away at a higher level. For example kvm_init and kvm_cleanup would setup/cleanup state in a consistent way. That's not what we see in C code. The error handling gets everywhere, obscuring what's actually going on, and usually getting it wrong by leaking. My experience tells me C++ code has much more boilerplate code that you are forced to repeat over and over. This is especially true for unix system programming: by the time you are done wrapping all of unix you have created more LOC than you are ever likely to save. You don't need to wrap everything. Sometimes you can get away with if (ret == -1) throw. But things like file descriptors are essential. class kvm::fd is reusable, if you embed it in another object you don't have to worry about errors any more (as long as the object's methods are exception safe). To get exception safe code, you have to constantly worry about errors. And it's easier to spot an unhandled return code than exception-unsafe code: gcc actually has __attribute__((warn_unused_result)) which might help catch common errors. No such tool to catch exception-unsafe code AFAIK. We see exactly how easy it is by the constant stream of patches that fix error paths into Linux. Most user space programs don't even care about errors because they're so difficult and annoying to get right. Yes, all the correctness is more or less pointless here. Like I said, this is an experiment to see what things look like. I guess each side will it as proving its claims. This is exactly what seems to be happening. I did my best to try and be objective and point out real issues, but you probably guessed which side I am on already :). The lack of the C++ compiler eats babies and line noise comments is appreciated. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On 11/28/2010 04:49 PM, Michael S. Tsirkin wrote: On Sun, Nov 28, 2010 at 03:15:52PM +0200, Avi Kivity wrote: On 11/28/2010 01:49 PM, Michael S. Tsirkin wrote: +++ b/api/kvmxx.cc @@ -0,0 +1,168 @@ +#include kvmxx.h +#includefcntl.h +#includesys/ioctl.h +#includesys/mman.h I just realized this is wrong: I think you should wrap the headers in extern C. Same for other headers. I think system headers already do this (otherwise it won't link - int foo() is different than extern C { int foo(void); }) okay, but they aren't there for linux/kvm.h, are they? I don't think they are needed - they only declare structures and constants, not functions. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 13/21] dma-helpers: replace bdrv_aio_writev() with bdrv_aio_writev_proxy().
Am 28.11.2010 12:55, schrieb Yoshiaki Tamura: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:52PM +0900, Yoshiaki Tamura wrote: Replace bdrv_aio_writev() with bdrv_aio_writev_proxy() to let event-tap capture events from dma-helpers. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Same comment as -net here: it's not clear when should a device use bdrv_aio_writev_proxy and when bdrv_aio_writev. If all devices should just use _proxy, let's just make bdrv_aio_writev DTRT instead. Same as I replied to the net layer question. However, I had troubles with inserting event-tap functions into block.c before. block.c gets linked with utils like qemu-img, but they don't get linked with emulators code which event-tap uses in it. So I want to avoid linking block and event-tap for utils, but I guess we don't want to use ifdefs for this. I'm wondering how I can solve this problem cleanly. Kevin, do you have suggestions here? Michael's stubs (probably in qemu-tool.c) seem to be the right solution. Which requests do you actually want to intercept? I assume you're aware that for example qcow2 internally calls another bdrv_aio_readv/writev that accesses the image file. So if you only want to have the requests that come directly from devices, maybe you'll have to restrict it to BlockDriverStates that belongs to a drive. I think this is the case if it has a non-empty device name. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: limiting guest block i/o for qos
On Mon, Nov 29, 2010 at 2:00 AM, T Johnson tjohnso...@gmail.com wrote: Hello, On Thu, Nov 25, 2010 at 3:33 AM, Nikola Ciprich extmaill...@linuxbox.cz wrote: Hello Thomas, I t hink blkio-cgroup really can't help You here, but since NFS is network protocol, why not just consider some kind of network shaping? n. I thought about this, but it's rather imprecise I imagine if I try to limit the number of packets per second and hope that matches reads or writes per second. Secondly, I have many guests running to the same NFS server which makes limiting per kvm guest somewhat impossible when the network tools I know if would limit per NFS server. Perhaps iptables/tc can mark the stream based on the client process ID? Each VM has a qemu-kvm userspace process that will issue file I/O. Someone with more networking knowledge could confirm whether or not it is possible to mark based on the process ID using the in-kernel NFS client. You don't need to limit based on packets per second. You can do bandwidth-based traffic shaping with tc. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH] ceph/rbd block driver for qemu-kvm (v8)
Am 29.11.2010 09:59, schrieb Kevin Wolf: Am 27.11.2010 08:12, schrieb Stefan Hajnoczi: On Fri, Nov 26, 2010 at 9:59 PM, Christian Brunner c.m.brun...@gmail.com wrote: Thanks for the review. What am I supposed to do now? Kevin is the block maintainer. His review is the next step, I have CCed him. After that rbd would be ready to merge. If I don't find anything really obvious and it doesn't break the build, I'll merge it based on your review. Which librados version is this supposed to require? My F12 one seems to be too old, however configure still automatically enables it (so the build fails in the default configuration for me). I think you need to add some check there. $ rpm -q ceph-devel ceph-devel-0.20.2-1.fc12.x86_64 $ LANG=C make CCblock/rbd.o block/rbd.c: In function 'rbd_register_image': block/rbd.c:191: error: 'CEPH_OSD_TMAP_SET' undeclared (first use in this function) block/rbd.c:191: error: (Each undeclared identifier is reported only once block/rbd.c:191: error: for each function it appears in.) cc1: warnings being treated as errors block/rbd.c: In function 'rbd_set_snapc': block/rbd.c:468: error: implicit declaration of function 'rados_set_snap_context' block/rbd.c:468: error: nested extern declaration of 'rados_set_snap_context' block/rbd.c: In function 'rbd_snap_create': block/rbd.c:844: error: implicit declaration of function 'rados_selfmanaged_snap_create' block/rbd.c:844: error: nested extern declaration of 'rados_selfmanaged_snap_create' make: *** [block/rbd.o] Error 1 Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
On Sat, Nov 27, 2010 at 1:11 PM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/27 Stefan Hajnoczi stefa...@gmail.com: On Sat, Nov 27, 2010 at 8:53 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/27 Stefan Hajnoczi stefa...@gmail.com: On Sat, Nov 27, 2010 at 4:29 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/27 Blue Swirl blauwir...@gmail.com: On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: Somehow I find some similarities to instrumentation patches. Perhaps the instrumentation framework could be used (maybe with some changes) for Kemari as well? That could be beneficial to both. Yes. I had the same idea but I'm not sure how tracing works. I think Stefan Hajnoczi knows it better. Stefan, is it possible to call arbitrary functions from the trace points? Yes, if you add code to ./tracetool. I'm not sure I see the connection between Kemari and tracing though. The connection is that it may be possible to remove Kemari specific hook point like in ioport.c and exec.c, and let tracing notify Kemari instead. I actually think the other way. Tracing just instruments and stashes away values. It does not change inputs or outputs, it does not change control flow, it does not affect state. Going down the route of side-effects mixes two different things: hooking into a subsystem and instrumentation. For hooking into a subsystem we should define proper interfaces. That interface can explicitly support modifying inputs/outputs or changing control flow. Tracing is much more ad-hoc and not a clean interface. It's also based on a layer of indirection via the tracetool code generator. That's okay because it doesn't affect the code it is called from and you don't need to debug trace events (they are simple and have almost no behavior). Hooking via tracing is just taking advantage of the cheap layer of indirection in order to get at interesting events in a subsystem. It's easy to hook up and quick to develop, but it's not a proper interface and will be hard to understand for other developers. One question I have about Kemari is whether it adds new constraints to the QEMU codebase? Fault tolerance seems like a cross-cutting concern - everyone writing device emulation or core QEMU code may need to be aware of new constraints. For example, you are not allowed to release I/O operations to the outside world directly, instead you need to go through Kemari code which makes I/O transactional and communicates with the passive host. You have converted e1000, virtio-net, and virtio-blk. How do we make sure new devices that are merged into qemu.git don't break Kemari? How do we go about supporting the existing hw/* devices? Whether Kemari adds constraints such as you mentioned, yes. If the devices (including existing ones) don't call Kemari code, they would certainly break Kemari. Altough using proxies looks explicit, to make it unaware from people writing device emulation, it's possible to remove proxies and put changes only into the block/net layer as Blue suggested. Anything that makes it hard to violate the constraints is good. Otherwise Kemari might get broken in the future and no one will know until a failover behaves incorrectly. Blue and Paul prefer to put it into block/net layer, and you think it's better to provide API. Sorry, I wasn't clear. I agree that event tap behavior should be in generic block and net layer code. That way we're guaranteeing that all net and block I/O goes through event tap. Could you formulate the constraints so developers are aware of them in the future and can protect the codebase. How about expanding the Kemari wiki pages? If you like the idea above, I'm happy to make the list also on the wiki page. Here's a different question: what requirements must an emulated device meet in order to be added to the Kemari supported whitelist? That's what I want to know so that I don't break existing devices and can add new devices that work with Kemari :). Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv6 00/16] boot order specification
Hi, BTW are you actually aware of any option rom with multiple BCVs and, if yes, how those BCVs differ? Multiple BCVs - yes. A SCSI card will define a BCV for each attached drive. I don't have a scsi card myself, but the support was added by a user who ran into the problem first hand. FYI: You can test with a virtual card ;) lsi emulation in recent qemu versions is good enougth that the original boot rom runs. You can fetch it here: http://www.lsi.com/DistributionSystem/AssetDocument/files/support/ssp/sdms/Bios/lsi_bios.zip qemu -drive if=scsi,file=$image -option-rom 8xx_64.rom Gives you a list like this ... Press F12 for boot menu. Select boot device: 1. AHCI/0: QEMU HARDDISK ATA-7 Hard-Disk (8192 MiBytes) 2. #20 ID00 LUN0 QEMU QEMU HARD 3. #20 ID01 LUN0 QEMU QEMU HARD 4. DVD/CD [ata1-0: QEMU DVD-ROM ATAPI-4 DVD/CD] 5. DVD/CD [AHCI/2: QEMU DVD-ROM ATAPI-4 DVD/CD] 6. gPXE (PCI 00:03.0) ... where entries 2+3 are created by the scsi rom. HTH, Gerd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On Mon, Nov 29, 2010 at 11:22:44AM +0200, Avi Kivity wrote: No need for an additional toolchain. It's a feature :) This way you are not forced to rewrite all code each time you realize you need an extra check, and checks can be added gradually without breaking build. You can see that user_ptr is not just for the checks, it adds functionality (sizeof-less copy_from and copy_to). That's usually the case. If there's something you must not do because of some rule, there's also something you want to do, and those become member functions. In C++ you could also introduce user_ptr gradually, it won't break anything. Yes but in void foo(void *p) { bar(p); } both foo and bar must be converted. Things like __user are easily done in C++. Some of what sparce does can be done if you create a separate type for all address spaces. This can be done in C too, and the result won't be like __user at all. That's quite a lot of work. Sparse: T __user *foo; C++: user_ptrT foo; Sparse has some advantages: it makes the contract obvious so you clearly see it's a pointer and know -, [], + will work, * and will not. I don't really see how you can tell this from __user. I can tell this is a pointer from T *foo :), and I can tell it has some attribute. You have to look up the definition. For user_ptr, the definition is actually available. The definition for __user is also available: #ifdef __CHECKER__ # define __user__attribute__((noderef, address_space(1))) #else # define __user #endif Which is a very transparent way to say: this is just a checker attribute, it does not affect actual code. With a template we go 'I have overridden + but compiler should optimize it back to original'. Note the should :) Templates are indeed harder to debug, simply because names can become very long. That's not the only problem. A bigger one is when you type tab to complete function name and get a list of options to select from for each of the times a template was instantiated. Only one of them is relevant in a given scope. No hint is given which. Further when you step into the template, the source does not give you any hint about the types used. Some of this is true for macros as well of course. Except people know macros are bad and so make them thin wrappers around proper functions. Or they simply avoid it and duplicate the code. You can't always wrap functions with macros. Always is a strong word. What are the examples of such duplicated code in qemu? Let's see if they are easy to fix. qemu in fact uses macros extensively (glue()), they are hardly readable. Well, we don't have so many instances left anymore. We used to have this in pci and I got rid of it pretty easily, just passing in length at runtime. And it turned out the only reason for it was because we didn't pass in the transaction length. That technique (clean up APIs so we don't have to work around them with macros) leads IMO to better code than just sticking a tamplate around it. Do a 'git grep hash' for examples of duplication. I see. Don't know enough about tcg to fix unfortunately, but the logic is different: e.g. sparc opcode table is completely static, translation cache is dynamic, jump cache has static size but dynamic content, qdict could do with linear lookups just as well. So it could just be different optimization strategies. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv6 00/16] boot order specification
Hi, If scsi card has optionrom with only one bcv then Seabios can determine its boot order from device path, so why not provide user with this option today? It's unclear to me how SeaBIOS is supposed to do that. Try to keep track of which bcv/bev belongs to which pci device? It should surely work for devices supported by seabios natively. SeaBIOS should also know which device's rom registered which entry. It might become tricky though in case there are multiple identical devices are present, say two e1000 cards, where the first rom could register entries for both cards ... Maybe we can compromise here - if the user selects booting from a device, and qemu sees there is a rom for that device, then qemu can specify two boot options: /p...@i0cf8/ether...@4/ethernet-...@0 /p...@i0cf8/r...@4 SeaBIOS will ignore the first entry, and act on the second entry. SeaBIOS should be able to operate just fine with the first entry. ether...@4 means the nic at bus address 4. As this is a PCI bus 4 is the pci address. So SeaBIOS would just look what entries it has for 00:04.0, run the rom, and ignore the /ethernet-...@0 part as it can't handle it. In case of scsi seabios can look at the next path element to figure the scsi id. With native support it should be able to boot the correct disk directly. When booting via rom it can either just pick the first entry unconditionally (probably good enougth in 99% of the cases) or do some guesswork based on the order the entries are registered. BTW, how are PCI locations specified in these paths? They should have a (bus, dev, fn) - your examples only seem to show dev. How are the other parts specified? fn is optional for fn=0, IIRC the syntax is $cl...@$dev,$fn. Bus is specified via location in the tree, i.e. you'll see the bridge for the secondary pci bus in the path, like this: /p...@i0cf8/bri...@7/ether...@3/... (not sure it is actually named 'bridge' in the openfirmware specs though). cheers, Gerd -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On 11/29/2010 12:47 PM, Michael S. Tsirkin wrote: On Mon, Nov 29, 2010 at 11:22:44AM +0200, Avi Kivity wrote: No need for an additional toolchain. It's a feature :) This way you are not forced to rewrite all code each time you realize you need an extra check, and checks can be added gradually without breaking build. You can see that user_ptr is not just for the checks, it adds functionality (sizeof-less copy_from and copy_to). That's usually the case. If there's something you must not do because of some rule, there's also something you want to do, and those become member functions. In C++ you could also introduce user_ptr gradually, it won't break anything. Yes but in void foo(void *p) { bar(p); } both foo and bar must be converted. No. You can convert from user_ptr to __user * and back. Sparse has some advantages: it makes the contract obvious so you clearly see it's a pointer and know -, [], + will work, * and will not. I don't really see how you can tell this from __user. I can tell this is a pointer from T *foo :), and I can tell it has some attribute. You have to look up the definition. For user_ptr, the definition is actually available. The definition for __user is also available: #ifdef __CHECKER__ # define __user__attribute__((noderef, address_space(1))) #else # define __user #endif Which is a very transparent way to say: this is just a checker attribute, it does not affect actual code. Can you tell you must not dereference it? With a template we go 'I have overridden + but compiler should optimize it back to original'. Note the should :) We do this in C all the time with pte_t and lots of inlines. Do a 'git grep hash' for examples of duplication. I see. Don't know enough about tcg to fix unfortunately, but the logic is different: e.g. sparc opcode table is completely static, translation cache is dynamic, jump cache has static size but dynamic content, qdict could do with linear lookups just as well. So it could just be different optimization strategies. The default is duplication. With C++ your default is tr1::unordered_map and you can optimize it later if you like. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/21] Introduce event-tap.
On Thu, Nov 25, 2010 at 6:06 AM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp --- Makefile.target | 1 + block.h | 9 + event-tap.c | 794 +++ event-tap.h | 34 +++ net.h | 4 + net/queue.c | 1 + 6 files changed, 843 insertions(+), 0 deletions(-) create mode 100644 event-tap.c create mode 100644 event-tap.h event_tap_state is checked at the beginning of several functions. If there is an unexpected state the function silently returns. Should these checks really be assert() so there is an abort and backtrace if the program ever reaches this state? +typedef struct EventTapBlkReq { + char *device_name; + int num_reqs; + int num_cbs; + bool is_multiwrite; Is multiwrite logging necessary? If event tap is called from within the block layer then multiwrite is turned into one or more bdrv_aio_writev() calls. +static void event_tap_replay(void *opaque, int running, int reason) +{ + EventTapLog *log, *next; + + if (!running) { + return; + } + + if (event_tap_state != EVENT_TAP_LOAD) { + return; + } + + event_tap_state = EVENT_TAP_REPLAY; + + QTAILQ_FOREACH(log, event_list, node) { + EventTapBlkReq *blk_req; + + /* event resume */ + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_flush(log-net_req); + break; + case EVENT_TAP_BLK: + blk_req = log-blk_req; + if ((log-mode EVENT_TAP_TYPE_MASK) == EVENT_TAP_IOPORT) { + switch (log-ioport.index) { + case 0: + cpu_outb(log-ioport.address, log-ioport.data); + break; + case 1: + cpu_outw(log-ioport.address, log-ioport.data); + break; + case 2: + cpu_outl(log-ioport.address, log-ioport.data); + break; + } + } else { + /* EVENT_TAP_MMIO */ + cpu_physical_memory_rw(log-mmio.address, + log-mmio.buf, + log-mmio.len, 1); + } + break; Why are net tx packets replayed at the net level but blk requests are replayed at the pio/mmio level? I expected everything to replay either as pio/mmio or as net/block. +static void event_tap_blk_load(QEMUFile *f, EventTapBlkReq *blk_req) +{ + BlockRequest *req; + ram_addr_t page_addr; + int i, j, len; + + len = qemu_get_byte(f); + blk_req-device_name = qemu_malloc(len + 1); + qemu_get_buffer(f, (uint8_t *)blk_req-device_name, len); + blk_req-device_name[len] = '\0'; + blk_req-num_reqs = qemu_get_byte(f); + + for (i = 0; i blk_req-num_reqs; i++) { + req = blk_req-reqs[i]; + req-sector = qemu_get_be64(f); + req-nb_sectors = qemu_get_be32(f); + req-qiov = qemu_malloc(sizeof(QEMUIOVector)); It would make sense to have common QEMUIOVector load/save functions instead of inlining this code here. +static int event_tap_load(QEMUFile *f, void *opaque, int version_id) +{ + EventTapLog *log, *next; + int mode; + + event_tap_state = EVENT_TAP_LOAD; + + QTAILQ_FOREACH_SAFE(log, event_list, node, next) { + QTAILQ_REMOVE(event_list, log, node); + event_tap_free_log(log); + } + + /* loop until EOF */ + while ((mode = qemu_get_byte(f)) != 0) { + EventTapLog *log = event_tap_alloc_log(); + + log-mode = mode; + switch (log-mode EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_IOPORT: + event_tap_ioport_load(f, log-ioport); + break; + case EVENT_TAP_MMIO: + event_tap_mmio_load(f, log-mmio); + break; + case 0: + DPRINTF(No event\n); + break; + default: + fprintf(stderr, Unknown state %d\n, log-mode); + return -1; log is leaked here... + } + + switch (log-mode ~EVENT_TAP_TYPE_MASK) { + case EVENT_TAP_NET: + event_tap_net_load(f, log-net_req); + break; + case EVENT_TAP_BLK: + event_tap_blk_load(f, log-blk_req); + break; + default: + fprintf(stderr, Unknown state %d\n, log-mode); + return -1; ...and here. Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On Mon, Nov 29, 2010 at 12:52:29PM +0200, Avi Kivity wrote: The default is duplication. With C++ your default is tr1::unordered_map and you can optimize it later if you like. BTW not relevant to kvm, but for qemu, some people seem to care about building with an old migw compiler in Debian stable which is unlikely to have tr1. If we drop this we'll also have a working %llx in printf :) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: buildbot for kvm.git
On Thursday, November 25, 2010 10:22:56 pm Avi Kivity wrote: I'm fine giving (personal) user accounts away for people who just ask for it. Or do you want to make this only available for a small (trusted) group? I think you can give accounts to kvm contributors (present in kvm.git commit log). I'll start asking people to build-test and point them at you for accounts. Should be rare though, most commits are x86 specific. Ok. I'll start by asking for an account. Unfortunately we're running (in meanwhile) an old version of buildbot as buildbot-master (0.7.8, as shipped by Debian 5). And there seems to be an issue using a recent buildbot version to trigger a try-build (e.g. 0.8.1) . Everyone would have to downgrade to buildbot (client) 0.7.8, even if their distro ships a more recent version. Another (minor) issue with the old buildbot-master is that every time someone performs a try-build of a patch the buildbot-slaves will vanish the source- directory on the next testrun. And perform a complete git clone from scratch, which wastes some minutes of testing time and a lot of traffic. Need to check more recent version of buildbot, lots of stuff got changed/rewritten when it comes to VCS handling. Newer buildbot version also support now GitPoller to detect new changes. So we can avoid the rsync-git.kernel.org-delay issue. So there are lots of reason to look into new buildbot version, before giving away try-accounts. FYI, ppc44x is finally building (uboot-mkimage was missing on the slave). Continuous Testing for all requested architectures and the master and next branch should work now. Will add the kvm mailinglist tomorrow to the notification list for build-failures of kvm.git. Once I reviewed the newer buildbot version I start providing the try-accounts. Best Regards, Daniel -- Daniel Gollub Linux Consultant Developer Mail: gol...@b1-systems.de B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 signature.asc Description: This is a digitally signed message part.
Re: [PATCHv6 00/16] boot order specification
On Mon, Nov 29, 2010 at 11:19:29AM +0100, Gerd Hoffmann wrote: Hi, BTW are you actually aware of any option rom with multiple BCVs and, if yes, how those BCVs differ? Multiple BCVs - yes. A SCSI card will define a BCV for each attached drive. I don't have a scsi card myself, but the support was added by a user who ran into the problem first hand. FYI: You can test with a virtual card ;) lsi emulation in recent qemu versions is good enougth that the original boot rom runs. You can fetch it here: http://www.lsi.com/DistributionSystem/AssetDocument/files/support/ssp/sdms/Bios/lsi_bios.zip qemu -drive if=scsi,file=$image -option-rom 8xx_64.rom Gives you a list like this ... Press F12 for boot menu. Select boot device: 1. AHCI/0: QEMU HARDDISK ATA-7 Hard-Disk (8192 MiBytes) 2. #20 ID00 LUN0 QEMU QEMU HARD 3. #20 ID01 LUN0 QEMU QEMU HARD 4. DVD/CD [ata1-0: QEMU DVD-ROM ATAPI-4 DVD/CD] 5. DVD/CD [AHCI/2: QEMU DVD-ROM ATAPI-4 DVD/CD] 6. gPXE (PCI 00:03.0) ... where entries 2+3 are created by the scsi rom. Thanks! If BCVs created by optionrom are always sorted by target we can even determine what BCV correspond to which device path. -- Gleb. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: buildbot for kvm.git
Am 29.11.2010 12:36, Daniel Gollub wrote: On Thursday, November 25, 2010 10:22:56 pm Avi Kivity wrote: I'm fine giving (personal) user accounts away for people who just ask for it. Or do you want to make this only available for a small (trusted) group? I think you can give accounts to kvm contributors (present in kvm.git commit log). I'll start asking people to build-test and point them at you for accounts. Should be rare though, most commits are x86 specific. Ok. I'll start by asking for an account. Unfortunately we're running (in meanwhile) an old version of buildbot as buildbot-master (0.7.8, as shipped by Debian 5). Last time I checked (quite a few moons ago, though), that version contained an unfixed security issue. I installed a vanilla version on my server for that reason (and forgot to patch that afterward...). Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] Re: [PATCH 13/21] dma-helpers: replace bdrv_aio_writev() with bdrv_aio_writev_proxy().
2010/11/29 Kevin Wolf kw...@redhat.com: Am 28.11.2010 12:55, schrieb Yoshiaki Tamura: 2010/11/28 Michael S. Tsirkin m...@redhat.com: On Thu, Nov 25, 2010 at 03:06:52PM +0900, Yoshiaki Tamura wrote: Replace bdrv_aio_writev() with bdrv_aio_writev_proxy() to let event-tap capture events from dma-helpers. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Same comment as -net here: it's not clear when should a device use bdrv_aio_writev_proxy and when bdrv_aio_writev. If all devices should just use _proxy, let's just make bdrv_aio_writev DTRT instead. Same as I replied to the net layer question. However, I had troubles with inserting event-tap functions into block.c before. block.c gets linked with utils like qemu-img, but they don't get linked with emulators code which event-tap uses in it. So I want to avoid linking block and event-tap for utils, but I guess we don't want to use ifdefs for this. I'm wondering how I can solve this problem cleanly. Kevin, do you have suggestions here? Michael's stubs (probably in qemu-tool.c) seem to be the right solution. Same here. I noticed kvm-stub to be a good example. Which requests do you actually want to intercept? I assume you're aware that for example qcow2 internally calls another bdrv_aio_readv/writev that accesses the image file. So if you only want to have the requests that come directly from devices, maybe you'll have to restrict it to BlockDriverStates that belongs to a drive. I think this is the case if it has a non-empty device name. Yes, exactly. I noticed that a little while ago. Thanks for making it clear. Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
Could you formulate the constraints so developers are aware of them in the future and can protect the codebase. How about expanding the Kemari wiki pages? If you like the idea above, I'm happy to make the list also on the wiki page. Here's a different question: what requirements must an emulated device meet in order to be added to the Kemari supported whitelist? That's what I want to know so that I don't break existing devices and can add new devices that work with Kemari :). Why isn't it completely device agnostic? i.e. if a device has to care about Kemari at all (of vice-versa) then IMO you're doing it wrong. The whole point of the internal block/net APIs is that they isolate the host implementation details from the device emulation. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
2010/11/29 Paul Brook p...@codesourcery.com: Could you formulate the constraints so developers are aware of them in the future and can protect the codebase. How about expanding the Kemari wiki pages? If you like the idea above, I'm happy to make the list also on the wiki page. Here's a different question: what requirements must an emulated device meet in order to be added to the Kemari supported whitelist? That's what I want to know so that I don't break existing devices and can add new devices that work with Kemari :). Why isn't it completely device agnostic? i.e. if a device has to care about Kemari at all (of vice-versa) then IMO you're doing it wrong. The whole point of the internal block/net APIs is that they isolate the host implementation details from the device emulation. You're right theoretically. But what I've learned so far, there are cases like virtio-net and e1000 woks but virtio-blk doesn't. Theoretically, any emulated device should be able to get into the whitelist if the event-tap is properly implemented but sometimes it doesn't seem to be that simple. To answer Stefan's question, there shouldn't be any requirement for a device, but must be tested with Kemari. If it doesn't work correctly, the problems must be fixed before adding to the list. Yoshi Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
2010/11/29 Paul Brook p...@codesourcery.com: Could you formulate the constraints so developers are aware of them in the future and can protect the codebase. How about expanding the Kemari wiki pages? If you like the idea above, I'm happy to make the list also on the wiki page. Here's a different question: what requirements must an emulated device meet in order to be added to the Kemari supported whitelist? That's what I want to know so that I don't break existing devices and can add new devices that work with Kemari :). Why isn't it completely device agnostic? i.e. if a device has to care about Kemari at all (of vice-versa) then IMO you're doing it wrong. The whole point of the internal block/net APIs is that they isolate the host implementation details from the device emulation. You're right theoretically. But what I've learned so far, there are cases like virtio-net and e1000 woks but virtio-blk doesn't. Theoretically, any emulated device should be able to get into the whitelist if the event-tap is properly implemented but sometimes it doesn't seem to be that simple. To answer Stefan's question, there shouldn't be any requirement for a device, but must be tested with Kemari. If it doesn't work correctly, the problems must be fixed before adding to the list. What exactly are the problems? Is this a device bus of a Kemari bug? If it's the former then that implies you're imposing additional requirements that weren't previously part of the API. If the latter, then it's a bug like any other. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On 11/29/2010 05:26 AM, Michael S. Tsirkin wrote: On Mon, Nov 29, 2010 at 12:52:29PM +0200, Avi Kivity wrote: The default is duplication. With C++ your default is tr1::unordered_map and you can optimize it later if you like. BTW not relevant to kvm, but for qemu, some people seem to care about building with an old migw compiler in Debian stable which is unlikely to have tr1. If we drop this we'll also have a working %llx in printf :) Boost has a very compatible tr1 library. Regards, Anthony Liguori -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
2010/11/29 Paul Brook p...@codesourcery.com: 2010/11/29 Paul Brook p...@codesourcery.com: Could you formulate the constraints so developers are aware of them in the future and can protect the codebase. How about expanding the Kemari wiki pages? If you like the idea above, I'm happy to make the list also on the wiki page. Here's a different question: what requirements must an emulated device meet in order to be added to the Kemari supported whitelist? That's what I want to know so that I don't break existing devices and can add new devices that work with Kemari :). Why isn't it completely device agnostic? i.e. if a device has to care about Kemari at all (of vice-versa) then IMO you're doing it wrong. The whole point of the internal block/net APIs is that they isolate the host implementation details from the device emulation. You're right theoretically. But what I've learned so far, there are cases like virtio-net and e1000 woks but virtio-blk doesn't. Theoretically, any emulated device should be able to get into the whitelist if the event-tap is properly implemented but sometimes it doesn't seem to be that simple. To answer Stefan's question, there shouldn't be any requirement for a device, but must be tested with Kemari. If it doesn't work correctly, the problems must be fixed before adding to the list. What exactly are the problems? Is this a device bus of a Kemari bug? If it's the former then that implies you're imposing additional requirements that weren't previously part of the API. If the latter, then it's a bug like any other. It's a problem if devices don't continue correctly upon failover. I would say it's a bug of live migration (not all of course) because Kemari is just live migrating at specific points. Yoshi Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On 11/29/2010 02:04 AM, Michael S. Tsirkin wrote: On Sun, Nov 28, 2010 at 05:13:41PM -0600, Anthony Liguori wrote: On 11/28/2010 04:28 PM, Michael S. Tsirkin wrote: But rather need to use ugly factory functions with all sorts of DO_UPCAST. This is really unfriendly especially for writing test cases. Yes, I agree. Just moving memory allocation out of there will fix most of the ugliness. So here's a short list of things I've been working on that I don't believe have nice implementations in C. 1) Factories with string based parameters with natural constructor arguments. This was the only item, right? So in fact, this is needed as part of configuration file/command line/monitor parser. I really see it more as an API interface. I think the best long term architecture for QEMU is where qemu is launched as essentially a daemon and is manipulated via an RPC interface to create an initial device model, etc. IMHO, this really should be separate from the device model. The fact that qdev currently mixes the device model with argument parsing is bad IMO. So we ended up with saying .driver = PCI in hw/pc_piix.c instead of an instance of the structure. There's no compile-time check that the correct string is used and that is pretty bad IMO. Indeed. Yes, this makes it easier to add new properties, but making it easy is exactly the wrong thing to do because we really have to support such properties forever. One advantage of having a factory and the objects in a single place is that the factory is the long term supported interface and the objects can evolve in a more natural fashion. So how about a compromise: libqemu written in C, with APIs that should not deal with string parsing at all, and should above all else make sense: i8254_init_drift_mode i8254_init_catchup net_set_link_up net_set_link_down (and it really needs to be C for portability: so that management written in C can use it). This API should be properly versioned, with a backwards compatibility story, and we should be careful about adding interfaces there. On top of this you can have a management interface written in any other language, and have that deal with string parsing. I still dislike the idea of implementing an object system in C. Besides reinventing vtables, we'll have to also reinvent RTTI to allow for safe upcasting (which is unavoidable). But really, let's defer this discussion for when patches are available. I understand your objections but I'm pretty convinced that the code will speak for itself when it's ready. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 2/4] Introduce a C++ wrapper for the kvm APIs
On 11/29/2010 03:44 PM, Anthony Liguori wrote: But really, let's defer this discussion for when patches are available. I understand your objections but I'm pretty convinced that the code will speak for itself when it's ready. It will, but everyone will hear something different. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] vhost: correctly set bits of dirty pages
Michael S. Tsirkin writes: On Mon, Nov 29, 2010 at 01:48:20PM +0800, Jason Wang wrote: When counting pages we should increase it by 1 instead of VHOST_PAGE_SIZE, and also make log_write() can correctly process the request across pages with write_address not start at page boundary. Signed-off-by: Jason Wang jasow...@redhat.com dd Thanks, good catch! But let's to it in small steps: first, a small patch to fix the bug: I think this is equivalent, right? Yes. Subject: vhost: correctly set bits of dirty pages When counting pages we should increase address by 1 instead of VHOST_PAGE_SIZE, and also make log_write() can correctly process the request across pages with write_address not starting at page boundary. Reported-by: Jason Wang jasow...@redhat.com Signed-off-by: Michael S. Tsirkin m...@redhat.com I'm fine with this, thanks! --- diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index 568eb70..d0a3552 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -887,6 +887,7 @@ static int log_write(void __user *log_base, int r; if (!write_length) return 0; +write_length += write_address % VHOST_PAGE_SIZE; write_address /= VHOST_PAGE_SIZE; for (;;) { u64 base = (u64)(unsigned long)log_base; @@ -900,7 +901,7 @@ static int log_write(void __user *log_base, if (write_length = VHOST_PAGE_SIZE) break; write_length -= VHOST_PAGE_SIZE; -write_address += VHOST_PAGE_SIZE; +write_address += 1; } return r; } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/1] Clean up page fault injection
Currently fault injection is somewhat confused with important information carried in the vcpu area where it has no place. This patch cleans it up. Gleb, Joerg, I'd appreciate review and testing of the apf and nnpt related changes. Goes on top of the previous 7-part emulator series. Also available in 'emulator' branch of kvm.git. Avi Kivity (1): KVM: Pull extra page fault information into struct x86_exception arch/x86/include/asm/kvm_emulate.h |2 + arch/x86/include/asm/kvm_host.h| 17 +++-- arch/x86/kvm/mmu.c |5 ++- arch/x86/kvm/paging_tmpl.h |6 ++-- arch/x86/kvm/svm.c |7 +++-- arch/x86/kvm/x86.c | 44 +++ 6 files changed, 40 insertions(+), 41 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
To answer Stefan's question, there shouldn't be any requirement for a device, but must be tested with Kemari. If it doesn't work correctly, the problems must be fixed before adding to the list. What exactly are the problems? Is this a device bus of a Kemari bug? If it's the former then that implies you're imposing additional requirements that weren't previously part of the API. If the latter, then it's a bug like any other. It's a problem if devices don't continue correctly upon failover. I would say it's a bug of live migration (not all of course) because Kemari is just live migrating at specific points. Ah, now we're getting somewhere. So you're saying that these devices are broken anyway, and Kemari happens to trigger that brokenness more frequently? If the requirement is that a device must support live migration, then that should be the criteria for enabling Kemari, not some arbitrary whitelist. If devices incorrectly claim support for live migration, then that should also be fixed, either by removing the broken code or by making it work. AFAICT your current proposal is just feeding back the results of some fairly specific QA testing. I'd rather not get into that game. The correct response in the context of upstream development is to file a bug and/or fix the code. We already have config files that allow third party packagers to remove devices they don't want to support. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/1] KVM: Pull extra page fault information into struct x86_exception
Currently page fault cr2 and nesting infomation are carried outside the fault data structure. Instead they are placed in the vcpu struct, which results in confusion as global variables are manipulated instead of passing parameters. Fix this issue by adding address and nested fields to struct x86_exception, so this struct can carry all information associated with a fault. Signed-off-by: Avi Kivity a...@redhat.com --- arch/x86/include/asm/kvm_emulate.h |2 + arch/x86/include/asm/kvm_host.h| 17 +++-- arch/x86/kvm/mmu.c |5 ++- arch/x86/kvm/paging_tmpl.h |6 ++-- arch/x86/kvm/svm.c |7 +++-- arch/x86/kvm/x86.c | 44 +++ 6 files changed, 40 insertions(+), 41 deletions(-) diff --git a/arch/x86/include/asm/kvm_emulate.h b/arch/x86/include/asm/kvm_emulate.h index 87d017e..bf70ece 100644 --- a/arch/x86/include/asm/kvm_emulate.h +++ b/arch/x86/include/asm/kvm_emulate.h @@ -19,6 +19,8 @@ struct x86_exception { u8 vector; bool error_code_valid; u16 error_code; + bool nested_page_fault; + u64 address; /* cr2 or nested page fault gpa */ }; /* diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 24bad0a..2621c4d 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -242,7 +242,8 @@ struct kvm_mmu { void (*set_cr3)(struct kvm_vcpu *vcpu, unsigned long root); unsigned long (*get_cr3)(struct kvm_vcpu *vcpu); int (*page_fault)(struct kvm_vcpu *vcpu, gva_t gva, u32 err, bool no_apf); - void (*inject_page_fault)(struct kvm_vcpu *vcpu); + void (*inject_page_fault)(struct kvm_vcpu *vcpu, + struct x86_exception *fault); void (*free)(struct kvm_vcpu *vcpu); gpa_t (*gva_to_gpa)(struct kvm_vcpu *vcpu, gva_t gva, u32 access, struct x86_exception *exception); @@ -318,16 +319,6 @@ struct kvm_vcpu_arch { */ struct kvm_mmu *walk_mmu; - /* -* This struct is filled with the necessary information to propagate a -* page fault into the guest -*/ - struct { - u64 address; - unsigned error_code; - bool nested; - } fault; - /* only needed in kvm_pv_mmu_op() path, but it's hot so * put it here to avoid allocation */ struct kvm_pv_mmu_op_buffer mmu_op_buffer; @@ -685,11 +676,11 @@ void kvm_queue_exception(struct kvm_vcpu *vcpu, unsigned nr); void kvm_queue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); void kvm_requeue_exception(struct kvm_vcpu *vcpu, unsigned nr); void kvm_requeue_exception_e(struct kvm_vcpu *vcpu, unsigned nr, u32 error_code); -void kvm_inject_page_fault(struct kvm_vcpu *vcpu); +void kvm_inject_page_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault); int kvm_read_guest_page_mmu(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu, gfn_t gfn, void *data, int offset, int len, u32 access); -void kvm_propagate_fault(struct kvm_vcpu *vcpu); +void kvm_propagate_fault(struct kvm_vcpu *vcpu, struct x86_exception *fault); bool kvm_require_cpl(struct kvm_vcpu *vcpu, int required_cpl); int kvm_pic_set_irq(void *opaque, int irq, int level); diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c index 28ddc13..e99afef 100644 --- a/arch/x86/kvm/mmu.c +++ b/arch/x86/kvm/mmu.c @@ -2737,9 +2737,10 @@ static unsigned long get_cr3(struct kvm_vcpu *vcpu) return vcpu-arch.cr3; } -static void inject_page_fault(struct kvm_vcpu *vcpu) +static void inject_page_fault(struct kvm_vcpu *vcpu, + struct x86_exception *fault) { - vcpu-arch.mmu.inject_page_fault(vcpu); + vcpu-arch.mmu.inject_page_fault(vcpu, fault); } static void paging_free(struct kvm_vcpu *vcpu) diff --git a/arch/x86/kvm/paging_tmpl.h b/arch/x86/kvm/paging_tmpl.h index aaed914..9a1990b 100644 --- a/arch/x86/kvm/paging_tmpl.h +++ b/arch/x86/kvm/paging_tmpl.h @@ -279,8 +279,8 @@ error: if (rsvd_fault) walker-fault.error_code |= PFERR_RSVD_MASK; - vcpu-arch.fault.address= addr; - vcpu-arch.fault.error_code = walker-fault.error_code; + walker-fault.address = addr; + walker-fault.nested_page_fault = mmu != vcpu-arch.walk_mmu; trace_kvm_mmu_walker_error(walker-fault.error_code); return 0; @@ -561,7 +561,7 @@ static int FNAME(page_fault)(struct kvm_vcpu *vcpu, gva_t addr, u32 error_code, */ if (!r) { pgprintk(%s: guest page fault\n, __func__); - inject_page_fault(vcpu); + inject_page_fault(vcpu, walker.fault); vcpu-arch.last_pt_write_count = 0; /* reset fork detector */ return 0; } diff --git a/arch/x86/kvm/svm.c
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
2010/11/29 Paul Brook p...@codesourcery.com: To answer Stefan's question, there shouldn't be any requirement for a device, but must be tested with Kemari. If it doesn't work correctly, the problems must be fixed before adding to the list. What exactly are the problems? Is this a device bus of a Kemari bug? If it's the former then that implies you're imposing additional requirements that weren't previously part of the API. If the latter, then it's a bug like any other. It's a problem if devices don't continue correctly upon failover. I would say it's a bug of live migration (not all of course) because Kemari is just live migrating at specific points. Ah, now we're getting somewhere. So you're saying that these devices are broken anyway, and Kemari happens to trigger that brokenness more frequently? If the requirement is that a device must support live migration, then that should be the criteria for enabling Kemari, not some arbitrary whitelist. Sorry, I though that criteria to be obvious one and didn't think to clarify. The whitelist is a guard not to let users get into trouble with arbitrary devices. If devices incorrectly claim support for live migration, then that should also be fixed, either by removing the broken code or by making it work. I totally agree with you. AFAICT your current proposal is just feeding back the results of some fairly specific QA testing. I'd rather not get into that game. The correct response in the context of upstream development is to file a bug and/or fix the code. We already have config files that allow third party packagers to remove devices they don't want to support. Sorry, I didn't get what you're trying to tell me. My plan would be to initially start from a subset of devices, and gradually grow the number of devices that Kemari works with. While this process, it'll include what you said above, file a but and/or fix the code. Am I missing what you're saying? Yoshi Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
If devices incorrectly claim support for live migration, then that should also be fixed, either by removing the broken code or by making it work. I totally agree with you. AFAICT your current proposal is just feeding back the results of some fairly specific QA testing. I'd rather not get into that game. The correct response in the context of upstream development is to file a bug and/or fix the code. We already have config files that allow third party packagers to remove devices they don't want to support. Sorry, I didn't get what you're trying to tell me. My plan would be to initially start from a subset of devices, and gradually grow the number of devices that Kemari works with. While this process, it'll include what you said above, file a but and/or fix the code. Am I missing what you're saying? My point is that the whitelist shouldn't exist at all. Devices either support migration or they don't. Having some sort of separate whitelist is the wrong way to determine which devices support migration. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
2010/11/29 Paul Brook p...@codesourcery.com: If devices incorrectly claim support for live migration, then that should also be fixed, either by removing the broken code or by making it work. I totally agree with you. AFAICT your current proposal is just feeding back the results of some fairly specific QA testing. I'd rather not get into that game. The correct response in the context of upstream development is to file a bug and/or fix the code. We already have config files that allow third party packagers to remove devices they don't want to support. Sorry, I didn't get what you're trying to tell me. My plan would be to initially start from a subset of devices, and gradually grow the number of devices that Kemari works with. While this process, it'll include what you said above, file a but and/or fix the code. Am I missing what you're saying? My point is that the whitelist shouldn't exist at all. Devices either support migration or they don't. Having some sort of separate whitelist is the wrong way to determine which devices support migration. Alright! Then if a user encounters a problem with Kemari, we'll fix Kemari or the devices or both. Correct? Yoshi Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu-kvm: remove unused setupcpuid
On Tue, Nov 23, 2010 at 04:54:42PM +0200, Michael S. Tsirkin wrote: kvm_setup_cpuid seems unused, so remove it. Signed-off-by: Michael S. Tsirkin m...@redhat.com Applied, thanks. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] KVM: X86: Introduce generic guest-mode representation
This patch introduces a generic representation of guest-mode fpr a vcpu. This currently only exists in the SVM code. Having this representation generic will help making the non-svm code aware of nesting when this is necessary. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/kvm_cache_regs.h | 15 +++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 54e42c8..d2a66be 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -784,6 +784,7 @@ enum { #define HF_VINTR_MASK (1 2) #define HF_NMI_MASK(1 3) #define HF_IRET_MASK (1 4) +#define HF_GUEST_MASK (1 5) /* VCPU is in guest-mode */ /* * Hardware virtualization extension instructions may fault if a diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 975bb45..7e7c52d 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -84,4 +84,19 @@ static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu) | ((u64)(kvm_register_read(vcpu, VCPU_REGS_RDX) -1u) 32); } +static inline void kvm_vcpu_enter_gm(struct kvm_vcpu *vcpu) +{ + vcpu-arch.hflags |= HF_GUEST_MASK; +} + +static inline void kvm_vcpu_leave_gm(struct kvm_vcpu *vcpu) +{ + vcpu-arch.hflags = ~HF_GUEST_MASK; +} + +static inline bool kvm_vcpu_is_gm(struct kvm_vcpu *vcpu) +{ + return !!(vcpu-arch.hflags HF_GUEST_MASK); +} + #endif -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] KVM: X86: Don't report L2 emulation failures to user-space
This patch prevents that emulation failures which result from emulating an instruction for an L2-Guest results in being reported to userspace. Without this patch a malicious L2-Guest would be able to kill the L1 by triggering a race-condition between an vmexit and the instruction emulator. With this patch the L2 will most likely only kill itself in this situation. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/x86.c | 14 ++ 1 files changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 410d2d1..78329dd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4320,13 +4320,19 @@ EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt); static int handle_emulation_failure(struct kvm_vcpu *vcpu) { + int r = EMULATE_DONE; + ++vcpu-stat.insn_emulation_fail; trace_kvm_emulate_insn_failed(vcpu); - vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR; - vcpu-run-internal.suberror = KVM_INTERNAL_ERROR_EMULATION; - vcpu-run-internal.ndata = 0; + if (!kvm_vcpu_is_gm(vcpu)) { + vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu-run-internal.suberror = KVM_INTERNAL_ERROR_EMULATION; + vcpu-run-internal.ndata = 0; + r = EMULATE_FAIL; + } kvm_queue_exception(vcpu, UD_VECTOR); - return EMULATE_FAIL; + + return r; } static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] KVM: SVM: Make Use of the generic guest-mode functions
This patch replaces the is_nested logic in the SVM module with the generic notion of guest-mode. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/svm.c | 44 +--- 1 files changed, 21 insertions(+), 23 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2fd2f4d..3376fca 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -192,11 +192,6 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_svm, vcpu); } -static inline bool is_nested(struct vcpu_svm *svm) -{ - return svm-nested.vmcb; -} - static inline void enable_gif(struct vcpu_svm *svm) { svm-vcpu.arch.hflags |= HF_GIF_MASK; @@ -727,7 +722,7 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) struct vcpu_svm *svm = to_svm(vcpu); u64 g_tsc_offset = 0; - if (is_nested(svm)) { + if (kvm_vcpu_is_gm(vcpu)) { g_tsc_offset = svm-vmcb-control.tsc_offset - svm-nested.hsave-control.tsc_offset; svm-nested.hsave-control.tsc_offset = offset; @@ -741,7 +736,7 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment) struct vcpu_svm *svm = to_svm(vcpu); svm-vmcb-control.tsc_offset += adjustment; - if (is_nested(svm)) + if (kvm_vcpu_is_gm(vcpu)) svm-nested.hsave-control.tsc_offset += adjustment; } @@ -1209,7 +1204,7 @@ static void update_cr0_intercept(struct vcpu_svm *svm) if (gcr0 == *hcr0 svm-vcpu.fpu_active) { vmcb-control.intercept_cr_read = ~INTERCEPT_CR0_MASK; vmcb-control.intercept_cr_write = ~INTERCEPT_CR0_MASK; - if (is_nested(svm)) { + if (kvm_vcpu_is_gm(svm-vcpu)) { struct vmcb *hsave = svm-nested.hsave; hsave-control.intercept_cr_read = ~INTERCEPT_CR0_MASK; @@ -1220,7 +1215,7 @@ static void update_cr0_intercept(struct vcpu_svm *svm) } else { svm-vmcb-control.intercept_cr_read |= INTERCEPT_CR0_MASK; svm-vmcb-control.intercept_cr_write |= INTERCEPT_CR0_MASK; - if (is_nested(svm)) { + if (kvm_vcpu_is_gm(svm-vcpu)) { struct vmcb *hsave = svm-nested.hsave; hsave-control.intercept_cr_read |= INTERCEPT_CR0_MASK; @@ -1233,7 +1228,7 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { struct vcpu_svm *svm = to_svm(vcpu); - if (is_nested(svm)) { + if (kvm_vcpu_is_gm(vcpu)) { /* * We are here because we run in nested mode, the host kvm * intercepts cr0 writes but the l1 hypervisor does not. @@ -1471,7 +1466,7 @@ static void svm_fpu_activate(struct kvm_vcpu *vcpu) struct vcpu_svm *svm = to_svm(vcpu); u32 excp; - if (is_nested(svm)) { + if (kvm_vcpu_is_gm(vcpu)) { u32 h_excp, n_excp; h_excp = svm-nested.hsave-control.intercept_exceptions; @@ -1700,7 +1695,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, { int vmexit; - if (!is_nested(svm)) + if (!kvm_vcpu_is_gm(svm-vcpu)) return 0; svm-vmcb-control.exit_code = SVM_EXIT_EXCP_BASE + nr; @@ -1718,7 +1713,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, /* This function returns true if it is save to enable the irq window */ static inline bool nested_svm_intr(struct vcpu_svm *svm) { - if (!is_nested(svm)) + if (!kvm_vcpu_is_gm(svm-vcpu)) return true; if (!(svm-vcpu.arch.hflags HF_VINTR_MASK)) @@ -1757,7 +1752,7 @@ static inline bool nested_svm_intr(struct vcpu_svm *svm) /* This function returns true if it is save to enable the nmi window */ static inline bool nested_svm_nmi(struct vcpu_svm *svm) { - if (!is_nested(svm)) + if (!kvm_vcpu_is_gm(svm-vcpu)) return true; if (!(svm-nested.intercept (1ULL INTERCEPT_NMI))) @@ -1994,7 +1989,8 @@ static int nested_svm_vmexit(struct vcpu_svm *svm) if (!nested_vmcb) return 1; - /* Exit nested SVM mode */ + /* Exit Guest-Mode */ + kvm_vcpu_leave_gm(svm-vcpu); svm-nested.vmcb = 0; /* Give the current vmcb to the guest */ @@ -2302,7 +2298,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm) nested_svm_unmap(page); - /* nested_vmcb is our indicator if nested SVM is activated */ + /* Enter Guest-Mode */ + kvm_vcpu_enter_gm(svm-vcpu); + svm-nested.vmcb = vmcb_gpa; enable_gif(svm); @@ -2588,7 +2586,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data) case MSR_IA32_TSC: { u64 tsc_offset; - if
[PATCH 0/3] KVM: Introduce VCPU-wide notion of guest-mode
Hi Avi, Hi Marcelo, this patch-set introduces a generic notion of guest-mode for VCPUs in KVM. This is already useful as seen in patch 3/3. Nested-VMX also has a guest-mode, so it will make sense for this code too. Regards, Joerg As usual: arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/kvm_cache_regs.h | 15 + arch/x86/kvm/svm.c | 44 ++ arch/x86/kvm/x86.c | 14 --- Joerg Roedel (3): KVM: X86: Introduce generic guest-mode representation KVM: SVM: Make Use of the generic guest-mode functions KVM: X86: Don't report L2 emulation failures to user-space 4 files changed, 47 insertions(+), 27 deletions(-) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
Sorry, I didn't get what you're trying to tell me. My plan would be to initially start from a subset of devices, and gradually grow the number of devices that Kemari works with. While this process, it'll include what you said above, file a but and/or fix the code. Am I missing what you're saying? My point is that the whitelist shouldn't exist at all. Devices either support migration or they don't. Having some sort of separate whitelist is the wrong way to determine which devices support migration. Alright! Then if a user encounters a problem with Kemari, we'll fix Kemari or the devices or both. Correct? Correct. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/1] Clean up page fault injection
On Mon, Nov 29, 2010 at 09:12:29AM -0500, Avi Kivity wrote: Gleb, Joerg, I'd appreciate review and testing of the apf and nnpt related changes. I'll give it a test as soon as possible. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH kvm-unit-tests 0/4] API test framework
On Wed, Nov 24, 2010 at 12:52:09PM +0200, Avi Kivity wrote: This patchset adds an API test framework. Rather than driving kvm from qemu, we now have a way of calling the kvm API directly and observing the results. We can switch to guest mode and back at will and see any micro effects such as the result of executing particular instructions. As an experiment, the framework is coded in C++. Avi Kivity (4): Makefile: add support for C++ Introduce a C++ wrapper for the kvm APIs Add support for calling a function in guest mode Add sample test using the api test harness Makefile |4 +- api/identity.cc | 76 ++ api/identity.h| 28 api/kvmxx.cc | 168 + api/kvmxx.h | 80 +++ config-x86-common.mak |8 +++ x86/api-sample.cc | 23 +++ 7 files changed, 386 insertions(+), 1 deletions(-) create mode 100644 api/identity.cc create mode 100644 api/identity.h create mode 100644 api/kvmxx.cc create mode 100644 api/kvmxx.h create mode 100644 x86/api-sample.cc I fail to see practical advantages of this compared to current unit tests. Could you give some exciting examples? The fact that it does not run inside QEMU means you can't test QEMU interactions (re: http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/59060). Perhaps you can write a save/restore example as mentioned in the URL above? To me it seems having an interface to QEMU from unit tests would be more beneficial. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] KVM: X86: Introduce generic guest-mode representation
On 11/29/2010 05:38 PM, Joerg Roedel wrote: This patch introduces a generic representation of guest-mode fpr a vcpu. This currently only exists in the SVM code. Having this representation generic will help making the non-svm code aware of nesting when this is necessary. +static inline void kvm_vcpu_enter_gm(struct kvm_vcpu *vcpu) +{ + vcpu-arch.hflags |= HF_GUEST_MASK; +} I don't like the name much - the meat is just two letters. Please spell it out. I guess we could do is_guest_mode() like we do is_long_mode(). + +static inline bool kvm_vcpu_is_gm(struct kvm_vcpu *vcpu) +{ + return !!(vcpu-arch.hflags HF_GUEST_MASK); +} + !! unneeded with bool. Note we need to live migrate this bit, but that's outside the scope of this patchset. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] KVM: Introduce VCPU-wide notion of guest-mode
On 11/29/2010 05:38 PM, Joerg Roedel wrote: Hi Avi, Hi Marcelo, this patch-set introduces a generic notion of guest-mode for VCPUs in KVM. This is already useful as seen in patch 3/3. Nested-VMX also has a guest-mode, so it will make sense for this code too. Looks good, apart from the trivial comments on patch 1. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on disable_kvm_x86_64_debian_5_0
The Buildbot has detected a new failure of disable_kvm_x86_64_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_debian_5_0/builds/653 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on disable_kvm_i386_debian_5_0
The Buildbot has detected a new failure of disable_kvm_i386_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_debian_5_0/builds/654 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on disable_kvm_x86_64_out_of_tree
The Buildbot has detected a new failure of disable_kvm_x86_64_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_x86_64_out_of_tree/builds/602 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on disable_kvm_i386_out_of_tree
The Buildbot has detected a new failure of disable_kvm_i386_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/disable_kvm_i386_out_of_tree/builds/602 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on default_x86_64_debian_5_0
The Buildbot has detected a new failure of default_x86_64_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_debian_5_0/builds/663 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on default_x86_64_out_of_tree
The Buildbot has detected a new failure of default_x86_64_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_x86_64_out_of_tree/builds/604 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_1 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on default_i386_debian_5_0
The Buildbot has detected a new failure of default_i386_debian_5_0 on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_debian_5_0/builds/665 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
buildbot failure in qemu-kvm on default_i386_out_of_tree
The Buildbot has detected a new failure of default_i386_out_of_tree on qemu-kvm. Full details are available at: http://buildbot.b1-systems.de/qemu-kvm/builders/default_i386_out_of_tree/builds/602 Buildbot URL: http://buildbot.b1-systems.de/qemu-kvm/ Buildslave for this Build: b1_qemu_kvm_2 Build Reason: Build Source Stamp: [branch next] HEAD Blamelist: Lai Jiangshan la...@cn.fujitsu.com,Xiao Guangrong xiaoguangr...@cn.fujitsu.com BUILD FAILED: failed git sincerely, -The Buildbot -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2905358 ] Doesn't support MS-DOS compatibility FPU mode
Bugs item #2905358, was opened at 2009-11-28 16:37 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2905358group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Closed Resolution: Later Priority: 5 Private: No Submitted By: Samuel Thibault (youpi_486) Assigned to: Nobody/Anonymous (nobody) Summary: Doesn't support MS-DOS compatibility FPU mode Initial Comment: Hello, Apparently, KVM always enables CR0_NE, i.e. FPU errors are always reported via the EXCP10_COPR exception rather than via the FPU interrupt. This breaks operating systems which assume MS-DOS compatability mode, i.e. exceptions are always reported via the FPU interrupt (like GNU Mach). Samuel -- Comment By: Jes Sorensen (jessorensen) Date: 2010-11-29 17:17 Message: Hi, I checked with Avi, this is still not supported but patches are welcome, so I have opened an RFE in bugzilla.kernel.org and migrated the bug there: https://bugzilla.kernel.org/show_bug.cgi?id=23992 Please use the bugzilla thread for further discussions of this problem. Thanks, Jes -- You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2905358group_id=180599 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Don't spam trace report about missing fields
Instead of spamming the output with complaints about missing fields, simply return an error to the caller, which can print something out or do something more intelligent about them. Fixes kvm plugin tracing older kernels. Signed-off-by: Avi Kivity a...@redhat.com --- parse-events.c | 15 +++ 1 files changed, 3 insertions(+), 12 deletions(-) diff --git a/parse-events.c b/parse-events.c index f0b0324..7a4c3a6 100644 --- a/parse-events.c +++ b/parse-events.c @@ -4446,17 +4446,11 @@ int get_field_val(struct trace_seq *s, struct format_field *field, const char *name, struct record *record, unsigned long long *val, int err) { - if (!field) { - if (err) - trace_seq_printf(s, CANT FIND FIELD %s, name); + if (!field) return -1; - } - if (pevent_read_number_field(field, record-data, val)) { - if (err) - trace_seq_printf(s, %s=INVALID, name); + if (pevent_read_number_field(field, record-data, val)) return -1; - } return 0; } @@ -4489,11 +4483,8 @@ void *pevent_get_field_raw(struct trace_seq *s, struct event_format *event, field = pevent_find_field(event, name); - if (!field) { - if (err) - trace_seq_printf(s, CANT FIND FIELD %s, name); + if (!field) return NULL; - } /* Allow @len to be NULL */ if (!len) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
On Mon, Nov 29, 2010 at 3:00 PM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/29 Paul Brook p...@codesourcery.com: If devices incorrectly claim support for live migration, then that should also be fixed, either by removing the broken code or by making it work. I totally agree with you. AFAICT your current proposal is just feeding back the results of some fairly specific QA testing. I'd rather not get into that game. The correct response in the context of upstream development is to file a bug and/or fix the code. We already have config files that allow third party packagers to remove devices they don't want to support. Sorry, I didn't get what you're trying to tell me. My plan would be to initially start from a subset of devices, and gradually grow the number of devices that Kemari works with. While this process, it'll include what you said above, file a but and/or fix the code. Am I missing what you're saying? My point is that the whitelist shouldn't exist at all. Devices either support migration or they don't. Having some sort of separate whitelist is the wrong way to determine which devices support migration. Alright! Then if a user encounters a problem with Kemari, we'll fix Kemari or the devices or both. Correct? Is this a fair summary: any device that supports live migration workw under Kemari? (If such a device does not work under Kemari then this is a bug that needs to be fixed in live migration, Kemari, or the device.) Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/3] KVM: Introduce VCPU-wide notion of guest-mode
On Mon, Nov 29, 2010 at 11:10:59AM -0500, Avi Kivity wrote: On 11/29/2010 05:38 PM, Joerg Roedel wrote: Hi Avi, Hi Marcelo, this patch-set introduces a generic notion of guest-mode for VCPUs in KVM. This is already useful as seen in patch 3/3. Nested-VMX also has a guest-mode, so it will make sense for this code too. Looks good, apart from the trivial comments on patch 1. Okay, I do a re-spin and fix this. Thanks for your comments. Joerg -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/1] KVM: Pull extra page fault information into struct x86_exception
On Mon, Nov 29, 2010 at 09:12:30AM -0500, Avi Kivity wrote: Currently page fault cr2 and nesting infomation are carried outside the fault data structure. Instead they are placed in the vcpu struct, which results in confusion as global variables are manipulated instead of passing parameters. Fix this issue by adding address and nested fields to struct x86_exception, so this struct can carry all information associated with a fault. Tested with nnpt and found no regressions. Signed-off-by: Avi Kivity a...@redhat.com Tested-by: Joerg Roedel joerg.roe...@amd.com -- AMD Operating System Research Center Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach General Managers: Alberto Bozzo, Andrew Bowd Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Don't spam trace report about missing fields
On Mon, 2010-11-29 at 18:20 +0200, Avi Kivity wrote: Instead of spamming the output with complaints about missing fields, simply return an error to the caller, which can print something out or do something more intelligent about them. Fixes kvm plugin tracing older kernels. Hmm, what about just passing in err=0? -- Steve Signed-off-by: Avi Kivity a...@redhat.com --- parse-events.c | 15 +++ 1 files changed, 3 insertions(+), 12 deletions(-) diff --git a/parse-events.c b/parse-events.c index f0b0324..7a4c3a6 100644 --- a/parse-events.c +++ b/parse-events.c @@ -4446,17 +4446,11 @@ int get_field_val(struct trace_seq *s, struct format_field *field, const char *name, struct record *record, unsigned long long *val, int err) { - if (!field) { - if (err) - trace_seq_printf(s, CANT FIND FIELD %s, name); + if (!field) return -1; - } - if (pevent_read_number_field(field, record-data, val)) { - if (err) - trace_seq_printf(s, %s=INVALID, name); + if (pevent_read_number_field(field, record-data, val)) return -1; - } return 0; } @@ -4489,11 +4483,8 @@ void *pevent_get_field_raw(struct trace_seq *s, struct event_format *event, field = pevent_find_field(event, name); - if (!field) { - if (err) - trace_seq_printf(s, CANT FIND FIELD %s, name); + if (!field) return NULL; - } /* Allow @len to be NULL */ if (!len) -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for November 22 (kvm)
On Mon, 22 Nov 2010 13:26:27 -0800 Randy Dunlap wrote: On Mon, 22 Nov 2010 13:49:11 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101119: kvm.c:(.init.text+0x11f49): undefined reference to `kvm_register_clock' when CONFIG_KVM_CLOCK is not enabled. BUild error still present in linux-next-2010-NOV-29. --- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for November 22 (kvm)
On 11/29/2010 06:33 PM, Randy Dunlap wrote: On Mon, 22 Nov 2010 13:26:27 -0800 Randy Dunlap wrote: On Mon, 22 Nov 2010 13:49:11 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101119: kvm.c:(.init.text+0x11f49): undefined reference to `kvm_register_clock' when CONFIG_KVM_CLOCK is not enabled. BUild error still present in linux-next-2010-NOV-29. Glauber, Zach? -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
On 11/29/2010 06:23 PM, Stefan Hajnoczi wrote: On Mon, Nov 29, 2010 at 3:00 PM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/29 Paul Brookp...@codesourcery.com: If devices incorrectly claim support for live migration, then that should also be fixed, either by removing the broken code or by making it work. I totally agree with you. AFAICT your current proposal is just feeding back the results of some fairly specific QA testing. I'd rather not get into that game. The correct response in the context of upstream development is to file a bug and/or fix the code. We already have config files that allow third party packagers to remove devices they don't want to support. Sorry, I didn't get what you're trying to tell me. My plan would be to initially start from a subset of devices, and gradually grow the number of devices that Kemari works with. While this process, it'll include what you said above, file a but and/or fix the code. Am I missing what you're saying? My point is that the whitelist shouldn't exist at all. Devices either support migration or they don't. Having some sort of separate whitelist is the wrong way to determine which devices support migration. Alright! Then if a user encounters a problem with Kemari, we'll fix Kemari or the devices or both. Correct? Is this a fair summary: any device that supports live migration workw under Kemari? It might be fair summary but practically we barely have live migration working w/o Kemari. In addition, last I checked Kemari needs additional hooks and it will be too hard to keep that out of tree until all devices get it. (If such a device does not work under Kemari then this is a bug that needs to be fixed in live migration, Kemari, or the device.) Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/3] KVM: X86: Introduce generic guest-mode representation
This patch introduces a generic representation of guest-mode fpr a vcpu. This currently only exists in the SVM code. Having this representation generic will help making the non-svm code aware of nesting when this is necessary. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/kvm_cache_regs.h | 15 +++ 2 files changed, 16 insertions(+), 0 deletions(-) diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h index 54e42c8..d2a66be 100644 --- a/arch/x86/include/asm/kvm_host.h +++ b/arch/x86/include/asm/kvm_host.h @@ -784,6 +784,7 @@ enum { #define HF_VINTR_MASK (1 2) #define HF_NMI_MASK(1 3) #define HF_IRET_MASK (1 4) +#define HF_GUEST_MASK (1 5) /* VCPU is in guest-mode */ /* * Hardware virtualization extension instructions may fault if a diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h index 975bb45..95ac3af 100644 --- a/arch/x86/kvm/kvm_cache_regs.h +++ b/arch/x86/kvm/kvm_cache_regs.h @@ -84,4 +84,19 @@ static inline u64 kvm_read_edx_eax(struct kvm_vcpu *vcpu) | ((u64)(kvm_register_read(vcpu, VCPU_REGS_RDX) -1u) 32); } +static inline void enter_guest_mode(struct kvm_vcpu *vcpu) +{ + vcpu-arch.hflags |= HF_GUEST_MASK; +} + +static inline void leave_guest_mode(struct kvm_vcpu *vcpu) +{ + vcpu-arch.hflags = ~HF_GUEST_MASK; +} + +static inline bool is_guest_mode(struct kvm_vcpu *vcpu) +{ + return vcpu-arch.hflags HF_GUEST_MASK; +} + #endif -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/3] KVM: X86: Don't report L2 emulation failures to user-space
This patch prevents that emulation failures which result from emulating an instruction for an L2-Guest results in being reported to userspace. Without this patch a malicious L2-Guest would be able to kill the L1 by triggering a race-condition between an vmexit and the instruction emulator. With this patch the L2 will most likely only kill itself in this situation. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/x86.c | 14 ++ 1 files changed, 10 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 410d2d1..4337a8b 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -4320,13 +4320,19 @@ EXPORT_SYMBOL_GPL(kvm_inject_realmode_interrupt); static int handle_emulation_failure(struct kvm_vcpu *vcpu) { + int r = EMULATE_DONE; + ++vcpu-stat.insn_emulation_fail; trace_kvm_emulate_insn_failed(vcpu); - vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR; - vcpu-run-internal.suberror = KVM_INTERNAL_ERROR_EMULATION; - vcpu-run-internal.ndata = 0; + if (!is_guest_mode(vcpu)) { + vcpu-run-exit_reason = KVM_EXIT_INTERNAL_ERROR; + vcpu-run-internal.suberror = KVM_INTERNAL_ERROR_EMULATION; + vcpu-run-internal.ndata = 0; + r = EMULATE_FAIL; + } kvm_queue_exception(vcpu, UD_VECTOR); - return EMULATE_FAIL; + + return r; } static bool reexecute_instruction(struct kvm_vcpu *vcpu, gva_t gva) -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/3] KVM: Introduce VCPU-wide notion of guest-mode V2
Hi Avi, Hi Marcelo, here is the re-spin I promised. The change to V1 are essentially the renames: kvm_vcpu_enter_gm - enter_guest_mode kvm_vcpu_leave_gm - leave_guest_mode kvm_vcpu_is_gm- is_guest_mode No other changes are in this patch-set compared to V1. Regards, Joerg arch/x86/include/asm/kvm_host.h |1 + arch/x86/kvm/kvm_cache_regs.h | 15 + arch/x86/kvm/svm.c | 44 ++ arch/x86/kvm/x86.c | 14 --- 4 files changed, 47 insertions(+), 27 deletions(-) Joerg Roedel (3): KVM: X86: Introduce generic guest-mode representation KVM: SVM: Make Use of the generic guest-mode functions KVM: X86: Don't report L2 emulation failures to user-space -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/3] KVM: SVM: Make Use of the generic guest-mode functions
This patch replaces the is_nested logic in the SVM module with the generic notion of guest-mode. Signed-off-by: Joerg Roedel joerg.roe...@amd.com --- arch/x86/kvm/svm.c | 44 +--- 1 files changed, 21 insertions(+), 23 deletions(-) diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index 2fd2f4d..bff391e 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -192,11 +192,6 @@ static inline struct vcpu_svm *to_svm(struct kvm_vcpu *vcpu) return container_of(vcpu, struct vcpu_svm, vcpu); } -static inline bool is_nested(struct vcpu_svm *svm) -{ - return svm-nested.vmcb; -} - static inline void enable_gif(struct vcpu_svm *svm) { svm-vcpu.arch.hflags |= HF_GIF_MASK; @@ -727,7 +722,7 @@ static void svm_write_tsc_offset(struct kvm_vcpu *vcpu, u64 offset) struct vcpu_svm *svm = to_svm(vcpu); u64 g_tsc_offset = 0; - if (is_nested(svm)) { + if (is_guest_mode(vcpu)) { g_tsc_offset = svm-vmcb-control.tsc_offset - svm-nested.hsave-control.tsc_offset; svm-nested.hsave-control.tsc_offset = offset; @@ -741,7 +736,7 @@ static void svm_adjust_tsc_offset(struct kvm_vcpu *vcpu, s64 adjustment) struct vcpu_svm *svm = to_svm(vcpu); svm-vmcb-control.tsc_offset += adjustment; - if (is_nested(svm)) + if (is_guest_mode(vcpu)) svm-nested.hsave-control.tsc_offset += adjustment; } @@ -1209,7 +1204,7 @@ static void update_cr0_intercept(struct vcpu_svm *svm) if (gcr0 == *hcr0 svm-vcpu.fpu_active) { vmcb-control.intercept_cr_read = ~INTERCEPT_CR0_MASK; vmcb-control.intercept_cr_write = ~INTERCEPT_CR0_MASK; - if (is_nested(svm)) { + if (is_guest_mode(svm-vcpu)) { struct vmcb *hsave = svm-nested.hsave; hsave-control.intercept_cr_read = ~INTERCEPT_CR0_MASK; @@ -1220,7 +1215,7 @@ static void update_cr0_intercept(struct vcpu_svm *svm) } else { svm-vmcb-control.intercept_cr_read |= INTERCEPT_CR0_MASK; svm-vmcb-control.intercept_cr_write |= INTERCEPT_CR0_MASK; - if (is_nested(svm)) { + if (is_guest_mode(svm-vcpu)) { struct vmcb *hsave = svm-nested.hsave; hsave-control.intercept_cr_read |= INTERCEPT_CR0_MASK; @@ -1233,7 +1228,7 @@ static void svm_set_cr0(struct kvm_vcpu *vcpu, unsigned long cr0) { struct vcpu_svm *svm = to_svm(vcpu); - if (is_nested(svm)) { + if (is_guest_mode(vcpu)) { /* * We are here because we run in nested mode, the host kvm * intercepts cr0 writes but the l1 hypervisor does not. @@ -1471,7 +1466,7 @@ static void svm_fpu_activate(struct kvm_vcpu *vcpu) struct vcpu_svm *svm = to_svm(vcpu); u32 excp; - if (is_nested(svm)) { + if (is_guest_mode(vcpu)) { u32 h_excp, n_excp; h_excp = svm-nested.hsave-control.intercept_exceptions; @@ -1700,7 +1695,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, { int vmexit; - if (!is_nested(svm)) + if (!is_guest_mode(svm-vcpu)) return 0; svm-vmcb-control.exit_code = SVM_EXIT_EXCP_BASE + nr; @@ -1718,7 +1713,7 @@ static int nested_svm_check_exception(struct vcpu_svm *svm, unsigned nr, /* This function returns true if it is save to enable the irq window */ static inline bool nested_svm_intr(struct vcpu_svm *svm) { - if (!is_nested(svm)) + if (!is_guest_mode(svm-vcpu)) return true; if (!(svm-vcpu.arch.hflags HF_VINTR_MASK)) @@ -1757,7 +1752,7 @@ static inline bool nested_svm_intr(struct vcpu_svm *svm) /* This function returns true if it is save to enable the nmi window */ static inline bool nested_svm_nmi(struct vcpu_svm *svm) { - if (!is_nested(svm)) + if (!is_guest_mode(svm-vcpu)) return true; if (!(svm-nested.intercept (1ULL INTERCEPT_NMI))) @@ -1994,7 +1989,8 @@ static int nested_svm_vmexit(struct vcpu_svm *svm) if (!nested_vmcb) return 1; - /* Exit nested SVM mode */ + /* Exit Guest-Mode */ + leave_guest_mode(svm-vcpu); svm-nested.vmcb = 0; /* Give the current vmcb to the guest */ @@ -2302,7 +2298,9 @@ static bool nested_svm_vmrun(struct vcpu_svm *svm) nested_svm_unmap(page); - /* nested_vmcb is our indicator if nested SVM is activated */ + /* Enter Guest-Mode */ + enter_guest_mode(svm-vcpu); + svm-nested.vmcb = vmcb_gpa; enable_gif(svm); @@ -2588,7 +2586,7 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, unsigned ecx, u64 *data) case MSR_IA32_TSC: { u64 tsc_offset; - if (is_nested(svm)) +
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
Is this a fair summary: any device that supports live migration workw under Kemari? It might be fair summary but practically we barely have live migration working w/o Kemari. In addition, last I checked Kemari needs additional hooks and it will be too hard to keep that out of tree until all devices get it. That's not what I've been hearing earlier in this thread. The responses from Yoshi indicate that Stefan's summary is correct. i.e. the current Kemari implementation may require per-device hooks, but that's a bug and should be fixed before merging. Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] tools/virtio: virtio_ring testing tool
This implements a virtio simulator: - adds stubs for enough support functions to compile virtio ring in userspace. - Adds a stub vhost based module this can talk to. This should help us decide things like which ring layout works best. Communication is currently done using an eventfd descriptor. This means there's a shared spinlock there: what I would like to do in the future, is run this under kvm and use interrupt injection and io for communication, to make it more real-life and avoid lock contention. This patchset applies on top of vhost-net-next branch in my tree. In particular you must have commits: commit 64e1c80748afca3b4818ebb232a9668bf529886d vhost-net: batch use/unuse mm commit 533a19b4b88fcf81da3106b94f0ac4ac8b33a248 vhost: put mm after thread stop I think it's probably best to keep this part of kernel tree, to avoid version skew and so we don't need to commit to any kind of API. Since there's a dependency on vhost here it's easiest to merge this through my vhost tree, so that's what I intend to do unless someone complains, soon. Signed-off-by: Michael S. Tsirkin m...@redhat.com -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
On 11/29/2010 10:53 AM, Paul Brook wrote: Is this a fair summary: any device that supports live migration workw under Kemari? It might be fair summary but practically we barely have live migration working w/o Kemari. In addition, last I checked Kemari needs additional hooks and it will be too hard to keep that out of tree until all devices get it. That's not what I've been hearing earlier in this thread. The responses from Yoshi indicate that Stefan's summary is correct. i.e. the current Kemari implementation may require per-device hooks, but that's a bug and should be fixed before merging. It's actually really important that Kemari make use of an intermediate layer such that the hooks can distinguish between a device access and a recursive access. You could s/bdrv_aio_multiwrite/bdrv_aio_multiwrite_internal/g and then within kemari, s/bdrv_aio_multiwrite_proxy/bdrv_aio_multiwrite/ but I don't think that results in a cleaner interface. I don't like the _proxy naming and I think it has led to some confusion. I think having a dev_aio_multiwrite interface is a better naming scheme and ultimately provides a clearer idea of why a separate interface is needed--to distinguish between device accesses and internal accesses. BTW, dev_aio_multiwrite should take a DeviceState * and a BlockDriverState. Regards, Anthony Liguori Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] vhost test module
This adds a test module for vhost infrastructure. Intentionally not tied to kbuild to prevent people from installing and loading it accidentally. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- diff --git a/drivers/vhost/test.c b/drivers/vhost/test.c new file mode 100644 index 000..099f302 --- /dev/null +++ b/drivers/vhost/test.c @@ -0,0 +1,320 @@ +/* Copyright (C) 2009 Red Hat, Inc. + * Author: Michael S. Tsirkin m...@redhat.com + * + * This work is licensed under the terms of the GNU GPL, version 2. + * + * test virtio server in host kernel. + */ + +#include linux/compat.h +#include linux/eventfd.h +#include linux/vhost.h +#include linux/miscdevice.h +#include linux/module.h +#include linux/mutex.h +#include linux/workqueue.h +#include linux/rcupdate.h +#include linux/file.h +#include linux/slab.h + +#include test.h +#include vhost.c + +/* Max number of bytes transferred before requeueing the job. + * Using this limit prevents one virtqueue from starving others. */ +#define VHOST_TEST_WEIGHT 0x8 + +enum { + VHOST_TEST_VQ = 0, + VHOST_TEST_VQ_MAX = 1, +}; + +struct vhost_test { + struct vhost_dev dev; + struct vhost_virtqueue vqs[VHOST_TEST_VQ_MAX]; +}; + +/* Expects to be always run from workqueue - which acts as + * read-size critical section for our kind of RCU. */ +static void handle_vq(struct vhost_test *n) +{ + struct vhost_virtqueue *vq = n-dev.vqs[VHOST_TEST_VQ]; + unsigned out, in; + int head; + size_t len, total_len = 0; + void *private; + + private = rcu_dereference_check(vq-private_data, 1); + if (!private) + return; + + mutex_lock(vq-mutex); + vhost_disable_notify(vq); + + for (;;) { + head = vhost_get_vq_desc(n-dev, vq, vq-iov, +ARRAY_SIZE(vq-iov), +out, in, +NULL, NULL); + /* On error, stop handling until the next kick. */ + if (unlikely(head 0)) + break; + /* Nothing new? Wait for eventfd to tell us they refilled. */ + if (head == vq-num) { + if (unlikely(vhost_enable_notify(vq))) { + vhost_disable_notify(vq); + continue; + } + break; + } + if (in) { + vq_err(vq, Unexpected descriptor format for TX: + out %d, int %d\n, out, in); + break; + } + len = iov_length(vq-iov, out); + /* Sanity check */ + if (!len) { + vq_err(vq, Unexpected 0 len for TX\n); + break; + } + vhost_add_used_and_signal(n-dev, vq, head, 0); + total_len += len; + if (unlikely(total_len = VHOST_TEST_WEIGHT)) { + vhost_poll_queue(vq-poll); + break; + } + } + + mutex_unlock(vq-mutex); +} + +static void handle_vq_kick(struct vhost_work *work) +{ + struct vhost_virtqueue *vq = container_of(work, struct vhost_virtqueue, + poll.work); + struct vhost_test *n = container_of(vq-dev, struct vhost_test, dev); + + handle_vq(n); +} + +static int vhost_test_open(struct inode *inode, struct file *f) +{ + struct vhost_test *n = kmalloc(sizeof *n, GFP_KERNEL); + struct vhost_dev *dev; + int r; + + if (!n) + return -ENOMEM; + + dev = n-dev; + n-vqs[VHOST_TEST_VQ].handle_kick = handle_vq_kick; + r = vhost_dev_init(dev, n-vqs, VHOST_TEST_VQ_MAX); + if (r 0) { + kfree(n); + return r; + } + + f-private_data = n; + + return 0; +} + +static void *vhost_test_stop_vq(struct vhost_test *n, + struct vhost_virtqueue *vq) +{ + void *private; + + mutex_lock(vq-mutex); + private = rcu_dereference_protected(vq-private_data, +lockdep_is_held(vq-mutex)); + rcu_assign_pointer(vq-private_data, NULL); + mutex_unlock(vq-mutex); + return private; +} + +static void vhost_test_stop(struct vhost_test *n, void **privatep) +{ + *privatep = vhost_test_stop_vq(n, n-vqs + VHOST_TEST_VQ); +} + +static void vhost_test_flush_vq(struct vhost_test *n, int index) +{ + vhost_poll_flush(n-dev.vqs[index].poll); +} + +static void vhost_test_flush(struct vhost_test *n) +{ + vhost_test_flush_vq(n, VHOST_TEST_VQ); +} + +static int vhost_test_release(struct inode *inode, struct file *f) +{ + struct vhost_test *n = f-private_data; + void *private; + + vhost_test_stop(n, private); +
[PATCH 2/2] tools/virtio: virtio_test tool
This is the userspace part of the tool: it includes a bunch of stubs for linux APIs, somewhat simular to linuxsched. This makes it possible to recompile the ring code in userspace. A small test example is implemented combining this with vhost_test module. Signed-off-by: Michael S. Tsirkin m...@redhat.com --- diff --git a/tools/virtio/Makefile b/tools/virtio/Makefile new file mode 100644 index 000..d1d442e --- /dev/null +++ b/tools/virtio/Makefile @@ -0,0 +1,12 @@ +all: test mod +test: virtio_test +virtio_test: virtio_ring.o virtio_test.o +CFLAGS += -g -O2 -Wall -I. -I ../../usr/include/ -Wno-pointer-sign -fno-strict-overflow -MMD +vpath %.c ../../drivers/virtio +mod: + ${MAKE} -C `pwd`/../.. M=`pwd`/vhost_test +.PHONY: all test mod clean +clean: + ${RM} *.o vhost_test/*.o vhost_test/.*.cmd \ + vhost_test/Module.symvers vhost_test/modules.order *.d +-include *.d diff --git a/tools/virtio/linux/device.h b/tools/virtio/linux/device.h new file mode 100644 index 000..4ad7e1d --- /dev/null +++ b/tools/virtio/linux/device.h @@ -0,0 +1,2 @@ +#ifndef LINUX_DEVICE_H +#endif diff --git a/tools/virtio/linux/slab.h b/tools/virtio/linux/slab.h new file mode 100644 index 000..81baeac --- /dev/null +++ b/tools/virtio/linux/slab.h @@ -0,0 +1,2 @@ +#ifndef LINUX_SLAB_H +#endif diff --git a/tools/virtio/linux/virtio.h b/tools/virtio/linux/virtio.h new file mode 100644 index 000..669bcdd --- /dev/null +++ b/tools/virtio/linux/virtio.h @@ -0,0 +1,223 @@ +#ifndef LINUX_VIRTIO_H +#define LINUX_VIRTIO_H + +#include stdbool.h +#include stdlib.h +#include stddef.h +#include stdio.h +#include string.h +#include assert.h + +#include linux/types.h +#include errno.h + +typedef unsigned long long dma_addr_t; + +struct scatterlist { + unsigned long page_link; + unsigned intoffset; + unsigned intlength; + dma_addr_t dma_address; +}; + +struct page { + unsigned long long dummy; +}; + +#define BUG_ON(__BUG_ON_cond) assert(!(__BUG_ON_cond)) + +/* Physical == Virtual */ +#define virt_to_phys(p) ((unsigned long)p) +#define phys_to_virt(a) ((void *)(unsigned long)(a)) +/* Page address: Virtual / 4K */ +#define virt_to_page(p) ((struct page*)((virt_to_phys(p) / 4096) * \ + sizeof(struct page))) +#define offset_in_page(p) (((unsigned long)p) % 4096) +#define sg_phys(sg) ((sg-page_link ~0x3) / sizeof(struct page) * 4096 + \ +sg-offset) +static inline void sg_mark_end(struct scatterlist *sg) +{ + /* +* Set termination bit, clear potential chain bit +*/ + sg-page_link |= 0x02; + sg-page_link = ~0x01; +} +static inline void sg_init_table(struct scatterlist *sgl, unsigned int nents) +{ + memset(sgl, 0, sizeof(*sgl) * nents); + sg_mark_end(sgl[nents - 1]); +} +static inline void sg_assign_page(struct scatterlist *sg, struct page *page) +{ + unsigned long page_link = sg-page_link 0x3; + + /* +* In order for the low bit stealing approach to work, pages +* must be aligned at a 32-bit boundary as a minimum. +*/ + BUG_ON((unsigned long) page 0x03); + sg-page_link = page_link | (unsigned long) page; +} + +static inline void sg_set_page(struct scatterlist *sg, struct page *page, + unsigned int len, unsigned int offset) +{ + sg_assign_page(sg, page); + sg-offset = offset; + sg-length = len; +} + +static inline void sg_set_buf(struct scatterlist *sg, const void *buf, + unsigned int buflen) +{ + sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf)); +} + +static inline void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen) +{ + sg_init_table(sg, 1); + sg_set_buf(sg, buf, buflen); +} + +typedef __u16 u16; + +typedef enum { + GFP_KERNEL, + GFP_ATOMIC, +} gfp_t; +typedef enum { + IRQ_NONE, + IRQ_HANDLED +} irqreturn_t; + +static inline void *kmalloc(size_t s, gfp_t gfp) +{ + return malloc(s); +} + +static inline void kfree(void *p) +{ + free(p); +} + +#define container_of(ptr, type, member) ({ \ + const typeof( ((type *)0)-member ) *__mptr = (ptr);\ + (type *)( (char *)__mptr - offsetof(type,member) );}) + +#define uninitialized_var(x) x = x + +# ifndef likely +# define likely(x)(__builtin_expect(!!(x), 1)) +# endif +# ifndef unlikely +# define unlikely(x) (__builtin_expect(!!(x), 0)) +# endif + +#define pr_err(format, ...) fprintf (stderr, format, ## __VA_ARGS__) +#ifdef DEBUG +#define pr_debug(format, ...) fprintf (stderr, format, ## __VA_ARGS__) +#else +#define pr_debug(format, ...) do {} while (0) +#endif +#define dev_err(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__) +#define dev_warn(dev, format, ...) fprintf (stderr, format, ## __VA_ARGS__) + +/* TODO: empty stubs for now. Broken but enough for
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
On 11/29/2010 10:53 AM, Paul Brook wrote: Is this a fair summary: any device that supports live migration workw under Kemari? It might be fair summary but practically we barely have live migration working w/o Kemari. In addition, last I checked Kemari needs additional hooks and it will be too hard to keep that out of tree until all devices get it. That's not what I've been hearing earlier in this thread. The responses from Yoshi indicate that Stefan's summary is correct. i.e. the current Kemari implementation may require per-device hooks, but that's a bug and should be fixed before merging. It's actually really important that Kemari make use of an intermediate layer such that the hooks can distinguish between a device access and a recursive access. I'm failing to understand how this is anything other than running sed over block/*.c (or hw/*.c, depending whether you choose to rename the internal or external API). Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Don't spam trace report about missing fields
On 11/29/2010 06:26 PM, Steven Rostedt wrote: On Mon, 2010-11-29 at 18:20 +0200, Avi Kivity wrote: Instead of spamming the output with complaints about missing fields, simply return an error to the caller, which can print something out or do something more intelligent about them. Fixes kvm plugin tracing older kernels. Hmm, what about just passing in err=0? I was just patching the printk out it mindlessly. Will send a new patch. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
(Sorry for late reply...) On Thu, 25 Nov 2010 17:23:13 +0100, Roedel, Joerg said: On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: On 11/25/2010 03:13 PM, Roedel, Joerg wrote: What about things like adding instructions and forgetting to add the corresponding svm.c code? Cannot happen. Every instruction that can be intercepted with SVM is already handled in this patch-set. Call us back when Intel releases the i9 and i11 with new instructions that need intercept handling. ;) pgp2d3vqVooZD.pgp Description: PGP signature
[PATCH trace-cmd] kvm: don't warn on new fields
The kvm plugin understands a few new fields; don't warn if they are missing, as expected on older kernels. Signed-off-by: Avi Kivity a...@redhat.com --- plugin_kvm.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/plugin_kvm.c b/plugin_kvm.c index c1cb2e4..8115235 100644 --- a/plugin_kvm.c +++ b/plugin_kvm.c @@ -249,15 +249,15 @@ static int kvm_exit_handler(struct trace_seq *s, struct record *record, if (pevent_get_field_val(s, event, exit_reason, record, val, 1) 0) return -1; - if (pevent_get_field_val(s, event, isa, record, isa, 1) 0) + if (pevent_get_field_val(s, event, isa, record, isa, 0) 0) isa = 1; trace_seq_printf(s, reason %s, find_exit_reason(isa, val)); pevent_print_num_field(s, rip 0x%lx, event, guest_rip, record, 1); - if (pevent_get_field_val(s, event, info1, record, info1, 1) = 0 -pevent_get_field_val(s, event, info2, record, info2, 1) = 0) + if (pevent_get_field_val(s, event, info1, record, info1, 0) = 0 +pevent_get_field_val(s, event, info2, record, info2, 0) = 0) trace_seq_printf(s, info %llx %llx\n, info1, info2); return 0; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
On 11/29/2010 11:18 AM, Paul Brook wrote: On 11/29/2010 10:53 AM, Paul Brook wrote: Is this a fair summary: any device that supports live migration workw under Kemari? It might be fair summary but practically we barely have live migration working w/o Kemari. In addition, last I checked Kemari needs additional hooks and it will be too hard to keep that out of tree until all devices get it. That's not what I've been hearing earlier in this thread. The responses from Yoshi indicate that Stefan's summary is correct. i.e. the current Kemari implementation may require per-device hooks, but that's a bug and should be fixed before merging. It's actually really important that Kemari make use of an intermediate layer such that the hooks can distinguish between a device access and a recursive access. I'm failing to understand how this is anything other than running sed over block/*.c (or hw/*.c, depending whether you choose to rename the internal or external API). You're right, it's not a big deal, and requiring everything in hw use the new interface is not a bad idea. If a device doesn't work with Kemari, that's okay as long as the non-Kemari case is essentially a nop. Regards, Anthony Liguori Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for November 22 (kvm)
On 11/29/2010 06:35 AM, Avi Kivity wrote: On 11/29/2010 06:33 PM, Randy Dunlap wrote: On Mon, 22 Nov 2010 13:26:27 -0800 Randy Dunlap wrote: On Mon, 22 Nov 2010 13:49:11 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101119: kvm.c:(.init.text+0x11f49): undefined reference to `kvm_register_clock' when CONFIG_KVM_CLOCK is not enabled. BUild error still present in linux-next-2010-NOV-29. Glauber, Zach? I can only speculate this reference is being called from smpboot without CONFIG guarding? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for November 22 (kvm)
On 11/29/10 09:47, Zachary Amsden wrote: On 11/29/2010 06:35 AM, Avi Kivity wrote: On 11/29/2010 06:33 PM, Randy Dunlap wrote: On Mon, 22 Nov 2010 13:26:27 -0800 Randy Dunlap wrote: On Mon, 22 Nov 2010 13:49:11 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101119: kvm.c:(.init.text+0x11f49): undefined reference to `kvm_register_clock' when CONFIG_KVM_CLOCK is not enabled. BUild error still present in linux-next-2010-NOV-29. Glauber, Zach? I can only speculate this reference is being called from smpboot without CONFIG guarding? Sorry, looks like I dropped the first line of the error messages: arch/x86/built-in.o: In function `kvm_smp_prepare_boot_cpu': kvm.c:(.init.text+0xad38): undefined reference to `kvm_register_clock' from arch/x86/kernel/kvm.c: #ifdef CONFIG_SMP static void __init kvm_smp_prepare_boot_cpu(void) { WARN_ON(kvm_register_clock(primary cpu clock)); kvm_guest_cpu_init(); native_smp_prepare_boot_cpu(); } so it looks like you are correct... -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for November 22 (kvm)
On 11/29/2010 07:52 AM, Randy Dunlap wrote: On 11/29/10 09:47, Zachary Amsden wrote: On 11/29/2010 06:35 AM, Avi Kivity wrote: On 11/29/2010 06:33 PM, Randy Dunlap wrote: On Mon, 22 Nov 2010 13:26:27 -0800 Randy Dunlap wrote: On Mon, 22 Nov 2010 13:49:11 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101119: kvm.c:(.init.text+0x11f49): undefined reference to `kvm_register_clock' when CONFIG_KVM_CLOCK is not enabled. BUild error still present in linux-next-2010-NOV-29. Glauber, Zach? I can only speculate this reference is being called from smpboot without CONFIG guarding? Sorry, looks like I dropped the first line of the error messages: arch/x86/built-in.o: In function `kvm_smp_prepare_boot_cpu': kvm.c:(.init.text+0xad38): undefined reference to `kvm_register_clock' from arch/x86/kernel/kvm.c: #ifdef CONFIG_SMP static void __init kvm_smp_prepare_boot_cpu(void) { WARN_ON(kvm_register_clock(primary cpu clock)); kvm_guest_cpu_init(); native_smp_prepare_boot_cpu(); } so it looks like you are correct... Looks like this is the appropriate fix: #ifdef CONFIG_SMP static void __init kvm_smp_prepare_boot_cpu(void) { #ifdef CONFIG_KVM_CLOCK WARN_ON(kvm_register_clock(primary cpu clock)); #endif kvm_guest_cpu_init(); native_smp_prepare_boot_cpu(); } The SMP code is still buggy as well, wrt printk timing, in that it doesn't get called early enough, correct? Has anyone thought of a good solution to that problem? Basically the problem is CPU-1 will get CPU-0's per-cpu areas copied over, and these are not valid for CPU-1. If the clocksource is used on CPU-1 before kvm clock gets setup, it can go backwards, wreaking havoc, causing panic, etc. What is the best test to guard against this? Perhaps we should keep the CPU number in the per-cpu data and test against it? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Transcendent Memory and KVM
At the 2010 Linux Plumber's Conference Virtualization Mini-Conference earlier this month, I did a presentation about physical memory utilization and Transcendent Memory (tmem). I mentioned that I am developing an in-kernel version of tmem that might be suitable, with small modifications, for use with KVM. I plan to post a working first draft of that code to linux-mm in the next few days. Although tmem was originally designed for Xen, some of the ideas that have evolved from tmem have broader application and may make an interesting project for someone working in the KVM area interested in memory management. While I personally am not able to do KVM development, I would be willing to assist and answer questions if someone else is interested in trying to plug my in-kernel tmem code together to work with KVM. Thanks, Dan Magenheimer P.S. I don't subscribe to this list, so please cc me on any replies or discussion. References: Tmem overview: http://oss.oracle.com/projects/tmem LPC Virt wiki: http://wiki.linuxplumbersconf.org/2010:virtualization Presentation slides: http://wiki.linuxplumbersconf.org/_media/2010:07-memmgmtvirtenv-lpc2010-final-v2.pdf Detailed script for above presentation: http://oss.oracle.com/projects/tmem/dist/documentation/presentations/MemMgmtVirtEnv-LPC2010-SpkNotes.pdf Future possible uses of tmem-related concepts (from LSF10/MM summit): http://marc.info/?l=linux-mmm=127811271605009 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Mon, Nov 29, 2010 at 12:23:38PM -0500, valdis.kletni...@vt.edu wrote: (Sorry for late reply...) On Thu, 25 Nov 2010 17:23:13 +0100, Roedel, Joerg said: On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: On 11/25/2010 03:13 PM, Roedel, Joerg wrote: What about things like adding instructions and forgetting to add the corresponding svm.c code? Cannot happen. Every instruction that can be intercepted with SVM is already handled in this patch-set. Call us back when Intel releases the i9 and i11 with new instructions that need intercept handling. ;) How does that affect SVM? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: linux-next: Tree for November 22 (kvm)
On 11/29/10 10:08, Zachary Amsden wrote: On 11/29/2010 07:52 AM, Randy Dunlap wrote: On 11/29/10 09:47, Zachary Amsden wrote: On 11/29/2010 06:35 AM, Avi Kivity wrote: On 11/29/2010 06:33 PM, Randy Dunlap wrote: On Mon, 22 Nov 2010 13:26:27 -0800 Randy Dunlap wrote: On Mon, 22 Nov 2010 13:49:11 +1100 Stephen Rothwell wrote: Hi all, Changes since 20101119: kvm.c:(.init.text+0x11f49): undefined reference to `kvm_register_clock' when CONFIG_KVM_CLOCK is not enabled. BUild error still present in linux-next-2010-NOV-29. Glauber, Zach? I can only speculate this reference is being called from smpboot without CONFIG guarding? Sorry, looks like I dropped the first line of the error messages: arch/x86/built-in.o: In function `kvm_smp_prepare_boot_cpu': kvm.c:(.init.text+0xad38): undefined reference to `kvm_register_clock' from arch/x86/kernel/kvm.c: #ifdef CONFIG_SMP static void __init kvm_smp_prepare_boot_cpu(void) { WARN_ON(kvm_register_clock(primary cpu clock)); kvm_guest_cpu_init(); native_smp_prepare_boot_cpu(); } so it looks like you are correct... Looks like this is the appropriate fix: #ifdef CONFIG_SMP static void __init kvm_smp_prepare_boot_cpu(void) { #ifdef CONFIG_KVM_CLOCK WARN_ON(kvm_register_clock(primary cpu clock)); #endif kvm_guest_cpu_init(); native_smp_prepare_boot_cpu(); } Sure, that works. Thanks. The SMP code is still buggy as well, wrt printk timing, in that it doesn't get called early enough, correct? Has anyone thought of a good solution to that problem? Basically the problem is CPU-1 will get CPU-0's per-cpu areas copied over, and these are not valid for CPU-1. If the clocksource is used on CPU-1 before kvm clock gets setup, it can go backwards, wreaking havoc, causing panic, etc. What is the best test to guard against this? Perhaps we should keep the CPU number in the per-cpu data and test against it? -- ~Randy *** Remember to use Documentation/SubmitChecklist when testing your code *** -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: buildbot for kvm.git
On Tuesday, November 30, 2010 01:20:17 am Jan Kiszka wrote: Unfortunately we're running (in meanwhile) an old version of buildbot as buildbot-master (0.7.8, as shipped by Debian 5). Last time I checked (quite a few moons ago, though), that version contained an unfixed security issue. I installed a vanilla version on my server for that reason (and forgot to patch that afterward...). Are you talking about this one? http://buildbot.net/trac/wiki/SecurityAlert0711 We applied that patch already. Are you aware of any other unfixed security issues? Best Regards, Daniel -- Daniel Gollub Linux Consultant Developer Mail: gol...@b1-systems.de B1 Systems GmbH Osterfeldstraße 7 / 85088 Vohburg / http://www.b1-systems.de GF: Ralph Dehner / Unternehmenssitz: Vohburg / AG: Ingolstadt,HRB 3537 signature.asc Description: This is a digitally signed message part.
Re: [Qemu-devel] Re: [PATCH] ceph/rbd block driver for qemu-kvm (v8)
On Mon, 2010-11-29 at 11:02 +0100, Kevin Wolf wrote: Am 29.11.2010 09:59, schrieb Kevin Wolf: Am 27.11.2010 08:12, schrieb Stefan Hajnoczi: On Fri, Nov 26, 2010 at 9:59 PM, Christian Brunner c.m.brun...@gmail.com wrote: Thanks for the review. What am I supposed to do now? Kevin is the block maintainer. His review is the next step, I have CCed him. After that rbd would be ready to merge. If I don't find anything really obvious and it doesn't break the build, I'll merge it based on your review. Which librados version is this supposed to require? My F12 one seems to be too old, however configure still automatically enables it (so the build fails in the default configuration for me). I think you need to add some check there. $ rpm -q ceph-devel ceph-devel-0.20.2-1.fc12.x86_64 $ LANG=C make CCblock/rbd.o block/rbd.c: In function 'rbd_register_image': block/rbd.c:191: error: 'CEPH_OSD_TMAP_SET' undeclared (first use in this function) block/rbd.c:191: error: (Each undeclared identifier is reported only once block/rbd.c:191: error: for each function it appears in.) cc1: warnings being treated as errors block/rbd.c: In function 'rbd_set_snapc': block/rbd.c:468: error: implicit declaration of function 'rados_set_snap_context' block/rbd.c:468: error: nested extern declaration of 'rados_set_snap_context' block/rbd.c: In function 'rbd_snap_create': block/rbd.c:844: error: implicit declaration of function 'rados_selfmanaged_snap_create' block/rbd.c:844: error: nested extern declaration of 'rados_selfmanaged_snap_create' make: *** [block/rbd.o] Error 1 Right. The CEPH_OSD_TMAP_SET can be fixed and in theory we can get it compiled without snapshots, but we're not sure that it is really a good idea at this point. The following patch disables rbd when librados is too old. Thanks, Yehuda -- From: Yehuda Sadeh yeh...@hq.newdream.net Date: Mon, 29 Nov 2010 10:38:41 -0800 Subject: [PATCH 1/1] rbd: disable rbd in configure if librados is too old This checks for the existence of the certain function and other definition. Signed-off-by: Yehuda Sadeh yeh...@hq.newdream.net --- configure | 27 --- 1 files changed, 24 insertions(+), 3 deletions(-) diff --git a/configure b/configure index 5d8f620..18ae07c 100755 --- a/configure +++ b/configure @@ -1770,15 +1770,36 @@ int main(void) { rados_initialize(0, NULL); return 0; } EOF rbd_libs=-lrados -lcrypto if compile_prog $rbd_libs ; then -rbd=yes -libs_tools=$rbd_libs $libs_tools -libs_softmmu=$rbd_libs $libs_softmmu +librados_too_old=no +cat $TMPC EOF +#include stdio.h +#include rados/librados.h +#ifndef CEPH_OSD_TMAP_SET +#error missing CEPH_OSD_TMAP_SET +#endif +int main(void) { +int (*func)(const rados_pool_t pool, uint64_t *snapid) = rados_selfmanaged_snap_create; +rados_initialize(0, NULL); +return 0; +} +EOF +if compile_prog $rbd_libs ; then + rbd=yes + libs_tools=$rbd_libs $libs_tools + libs_softmmu=$rbd_libs $libs_softmmu +else + rbd=no + librados_too_old=yes +fi else if test $rbd = yes ; then feature_not_found rados block device fi rbd=no fi + if test $librados_too_old = yes ; then +echo - Your librados version is too old - upgrade needed to have rbd support + fi fi ## -- 1.5.6.5 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/9] KVM: Make the instruction emulator aware of Nested Virtualization
On Mon, 29 Nov 2010 19:32:12 +0100, Joerg Roedel said: On Mon, Nov 29, 2010 at 12:23:38PM -0500, valdis.kletni...@vt.edu wrote: (Sorry for late reply...) On Thu, 25 Nov 2010 17:23:13 +0100, Roedel, Joerg said: On Thu, Nov 25, 2010 at 10:17:53AM -0500, Avi Kivity wrote: On 11/25/2010 03:13 PM, Roedel, Joerg wrote: What about things like adding instructions and forgetting to add the corresponding svm.c code? Cannot happen. Every instruction that can be intercepted with SVM is already handled in this patch-set. Call us back when Intel releases the i9 and i11 with new instructions that need intercept handling. ;) How does that affect SVM? It will quite possibly include instructions that can be intercepted with SVM that are not in this patch set. At which point Joerg's comment can apply - it will be possible to add it in one place and forget to add it in the svm.c code. pgpxviFJOpuUT.pgp Description: PGP signature
Re: [PATCH trace-cmd] kvm: don't warn on new fields
On Mon, 2010-11-29 at 19:25 +0200, Avi Kivity wrote: The kvm plugin understands a few new fields; don't warn if they are missing, as expected on older kernels. Signed-off-by: Avi Kivity a...@redhat.com --- plugin_kvm.c |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) Applied, Thanks Avi! -- Steve -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: limiting guest block i/o for qos
On Mon, Nov 29, 2010 at 4:58 AM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Nov 29, 2010 at 2:00 AM, T Johnson tjohnso...@gmail.com wrote: Hello, On Thu, Nov 25, 2010 at 3:33 AM, Nikola Ciprich extmaill...@linuxbox.cz wrote: Hello Thomas, I t hink blkio-cgroup really can't help You here, but since NFS is network protocol, why not just consider some kind of network shaping? n. I thought about this, but it's rather imprecise I imagine if I try to limit the number of packets per second and hope that matches reads or writes per second. Secondly, I have many guests running to the same NFS server which makes limiting per kvm guest somewhat impossible when the network tools I know if would limit per NFS server. Perhaps iptables/tc can mark the stream based on the client process ID? Each VM has a qemu-kvm userspace process that will issue file I/O. Someone with more networking knowledge could confirm whether or not it is possible to mark based on the process ID using the in-kernel NFS client. You don't need to limit based on packets per second. You can do bandwidth-based traffic shaping with tc. Thanks. That is an interesting idea. As far as packets or bandwidth, my thought was IOPS were more expensive than total bytes transferred.. so I would want to limit the number of distinct read/write requests if possible. Anyway, I'm surprised this doesn't seem to come up more often. If anyone out there is listening, please consider this a plea for something built in to limit i/o in this situation. It would be extremely useful. Thanks again -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 09/21] Introduce event-tap.
On Thu, Nov 25, 2010 at 03:06:48PM +0900, Yoshiaki Tamura wrote: event-tap controls when to start FT transaction, and provides proxy functions to called from net/block devices. While FT transaction, it queues up net/block requests, and flush them when the transaction gets completed. Signed-off-by: Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp Signed-off-by: OHMURA Kei ohmura@lab.ntt.co.jp +static void event_tap_alloc_blk_req(EventTapBlkReq *blk_req, +BlockDriverState *bs, BlockRequest *reqs, +int num_reqs, BlockDriverCompletionFunc *cb, +void *opaque, bool is_multiwrite) +{ +int i; + +blk_req-num_reqs = num_reqs; +blk_req-num_cbs = num_reqs; +blk_req-device_name = qemu_strdup(bs-device_name); +blk_req-is_multiwrite = is_multiwrite; + +for (i = 0; i num_reqs; i++) { +blk_req-reqs[i].sector = reqs[i].sector; +blk_req-reqs[i].nb_sectors = reqs[i].nb_sectors; +blk_req-reqs[i].qiov = reqs[i].qiov; +blk_req-reqs[i].cb = cb; +blk_req-reqs[i].opaque = opaque; +blk_req-cb[i] = reqs[i].cb; +blk_req-opaque[i] = reqs[i].opaque; +} +} bdrv_aio_flush should also be logged, so that guest initiated flush is respected on replay. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv6 00/16] boot order specification
On Sun, Nov 28, 2010 at 08:47:34PM +0200, Gleb Natapov wrote: On Sun, Nov 28, 2010 at 12:15:44PM -0500, Kevin O'Connor wrote: It's unclear to me how SeaBIOS is supposed to do that. Suppose we have /p...@i0cf8/s...@3/d...@0,0 with boot index 5 in boot devices list and suppose pci device in slot 3 function 0 has optionrom. When Seabios load the option rom from device to memory it looks for string that matches /p...@i0cf8/@3.* if the string is found option rom gets boot index from it. In our case /p...@i0cf8/s...@3/d...@0,0 will match and optionrom will get boot index 5. In practice Seabios will know that device is SCSI by reading device class so it will be able to construct string /p...@i0cf8/s...@3 and use simple strstr to find matching device path. I recognize that if we had a regex engine in seabios this would work, but I'm reluctant to add one. strstr doesn't work becuase @3 could match some unrelated part of the path (eg, don't want to match /p...@i0cf8/s...@1/d...@3,0) - so, what you seem to want is /p...@i0cf8/[^/]...@3(/|$). [...] I'm still stuck on how seabios is supposed to know it's an ethernet card. Sure, seabios could hardcode translations from classid to strings, but that seems fragile. What happens when the user wants to boot from myranet, or fiberchannel, or whatnot? This is not fragile since class to name translation is defined by the spec. But even that is not required if Seabios will be a little bit smarter and will implement fuzzy matching. i.e do not match /p...@i0cf8/ether...@4/ethernet-...@0 exactly but match /p...@i0cf8/@4.* instead. I think we're agreeing here that we don't want to put class to name translation in seabios. :-) Maybe we can compromise here - if the user selects booting from a device, and qemu sees there is a rom for that device, then qemu can specify two boot options: /p...@i0cf8/ether...@4/ethernet-...@0 /p...@i0cf8/r...@4 This patch series implement device paths as defined by Openfirmware spec. /p...@i0cf8/r...@4 sould be out of spec. But I do not see why Seabios can't build later from the former. Also I do not see why it would be needed at all. The name isn't important to me - call it something else if you want. It's value is that SeaBIOS doesn't then need to do fuzzy matching or parsing of the device names. That is, we turn the list from boot devices to boot methods which makes life easier for the firmware. SeaBIOS will ignore the first entry, and act on the second entry. If at some later point seabios knows how to natively boot from the device (eg, scsi), then it will use the first entry and ignore the second. If you let go to the idea of exact matching of string built by qemu in Seabios it will be easy to see that /p...@i0cf8/ether...@4/ethernet-...@0 provides everything that Seabios needs to know and even more. If you ignore all the noise it just says boot from pci device slot 4 fn 0. Seabios may have native support for the card in the slot or it can use option rom on the card. Qemu does not care. I'm having a hard time letting go of string matching. I understand all the info is there if SeaBIOS parses the string. However, I think parsing out openbios device strings is overkill in an x86 bios that just wants to order the boot objects it knows about. Is there an issue with qemu generating two strings for devices with roms? BTW, how are PCI locations specified in these paths? They should have a (bus, dev, fn) - your examples only seem to show dev. How are the other parts specified? Bus numbers are assigned by a guest. Qemu knows nothing about them, so it specify device path by topology. If pci bridge is present device path will look like this: /p...@i0cf8/p...@2/ether...@4,1/ethernet-...@0. This path point to device in slot 4 fn 1 sitting on pci-to-pci bridge in slot 2 fn 0 of host pci controller. Same is true for usb bus. I understand - it makes sense. This does mean that seabios will need to track where each pci bus comes from. -Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv6 00/16] boot order specification
On Mon, Nov 29, 2010 at 11:50:45AM +0100, Gerd Hoffmann wrote: If scsi card has optionrom with only one bcv then Seabios can determine its boot order from device path, so why not provide user with this option today? It's unclear to me how SeaBIOS is supposed to do that. Try to keep track of which bcv/bev belongs to which pci device? It should surely work for devices supported by seabios natively. No issues for any device with native support. I'm okay with the proposed syntax. SeaBIOS should also know which device's rom registered which entry. It doesn't today, but that shouldn't be an issue to add. It might become tricky though in case there are multiple identical devices are present, say two e1000 cards, where the first rom could register entries for both cards ... Right - here's where things get complicated. Maybe we can compromise here - if the user selects booting from a device, and qemu sees there is a rom for that device, then qemu can specify two boot options: /p...@i0cf8/ether...@4/ethernet-...@0 /p...@i0cf8/r...@4 SeaBIOS will ignore the first entry, and act on the second entry. SeaBIOS should be able to operate just fine with the first entry. ether...@4 means the nic at bus address 4. As this is a PCI bus 4 is the pci address. So SeaBIOS would just look what entries it has for 00:04.0, run the rom, and ignore the /ethernet-...@0 part as it can't handle it. Right - I'm not happy about trying to parse out openbios device descriptors though. The natural flow (as I see it) is for seabios to find all the boot methods in the system and then see which ones have been requested to be prioritized. Trying to do fuzzy matching of found device to requested device just seems like an unnecessary pain IMO. When booting via rom it can either just pick the first entry unconditionally (probably good enougth in 99% of the cases) or do some guesswork based on the order the entries are registered. I guess that's the crux of the matter - I'd rather not do guessing in the firmware. The emulator is in a much better position to do heuristics and guessing - if nothing else, the emulator can allow the user to pass it in on the command-line. BTW, how are PCI locations specified in these paths? They should have a (bus, dev, fn) - your examples only seem to show dev. How are the other parts specified? fn is optional for fn=0, IIRC the syntax is $cl...@$dev,$fn. Bus is specified via location in the tree, i.e. you'll see the bridge for the secondary pci bus in the path, like this: /p...@i0cf8/bri...@7/ether...@3/... (not sure it is actually named 'bridge' in the openfirmware specs though). Thanks. -Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Agenda for November 30
Please send any agenda items you are interested in covering. As I forgot to put the call for agenda befor, Anthony already suggested: - 2011 kvm conference - 0.14.0 release plan - infrastructure changes (irc channel migration, git tree migration) thanks, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2493108 ] Win2k problems on some (not all) Intel hosts
Bugs item #2493108, was opened at 2009-01-08 12:53 Message generated for change (Comment added) made by kmshanah You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2493108group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: intel Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Kevin Shanahan (kmshanah) Assigned to: Nobody/Anonymous (nobody) Summary: Win2k problems on some (not all) Intel hosts Initial Comment: I have a Windows 2000 guest that I have been testing on an desktop machine (E8400 CPU) and that has been working well (apart from being affected by bug 2314737). When I moved (not a live migration, just shutdown, rsync and boot on the new server) the guest to our server, which is an IBM X3550 with two Xeon 5130 CPUs the guest has short freezes where the guest CPU usage spikes on both virtual CPUs and the guest becomes unresponsive for several seconds at a time. These guest CPU spikes can last over a minute, but the guest might have moments where it briefly responds to the delayed keystrokes, etc. every few seconds. One symptom of this behaviour for me last night was that Win2k AD replication was failing, so it's not just interactive use that suffers. Something I noticed that may be relevant - due to the other bug (2314737) which causes each guest CPU to use 100% of the host CPU, whether the guest CPU is idle or not, I can tell when the guest is having this problem by looking at the 'top' output on the host. When the guest is operating normally, the host will show the qemu-system-x86_64 process using 200% CPU (the guest is using -smp 2). However, when the guest is misbehaving, the host will show the qemu-system-x86_64 process only using _100%_ CPU. Maybe one of the threads is stuck, or something is forcing them to share a single core? Both hosts are running Debian Lenny/Sid, 64-bit with a kernel.org 2.6.28 kernel and kvm-82. The command line used in both cases: /usr/local/kvm/bin/qemu-system-x86_64 \ -smp 2 \ -localtime -m 2048 \ -drive if=ide,file=kvm-ks-02a.img,index=0,media=disk,boot=on \ -drive if=ide,file=kvm-ks-02b.img,index=2,media=disk \ -net nic,vlan=0,macaddr=52:54:00:12:34:68,model=virtio \ -net tap,vlan=0,ifname=tap18,script=no \ -vnc 127.0.0.1:18 -usbdevice tablet \ -daemonize CPUs on the good host: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping: 10 cpu MHz : 3000.000 cache size : 6144 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority bogomips: 5984.98 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping: 10 cpu MHz : 3000.000 cache size : 6144 KB physical id : 0 siblings: 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority bogomips: 5984.97 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: And the bad host: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU5130 @ 2.00GHz stepping: 6 cpu MHz : 1995.117 cache size : 4096 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant _tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3
[ kvm-Bugs-2494730 ] Guests stalling on kvm-82 / Linux 2.6.28
Bugs item #2494730, was opened at 2009-01-09 09:59 Message generated for change (Comment added) made by kmshanah You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2494730group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: Kevin Shanahan (kmshanah) Assigned to: Nobody/Anonymous (nobody) Summary: Guests stalling on kvm-82 / Linux 2.6.28 Initial Comment: I am seeing periodic stalls in Linux and Windows guests with kvm-82 on an IBM X3550 server with 2 x Xeon 5130 CPUs and 32GB RAM. I am *reasonably* certain that this is a regression somewhere between kvm-72 and kvm-82. We had been running kvm-72 (actually, the debian kvm-source package) up until now and never noticed the problem. Now the stalls are very obvious. When the guest stalls, the at least one kvm process on the host gobbles up 100% CPU. I'll do my debugging with the Linux guest, as that's sure to be easier to deal with. As a simple demostration that the guest is unresponsive, here is the result of me pinging the guest from another machine on the (very quiet) LAN: --- hermes-old.wumi.org.au ping statistics --- 600 packets transmitted, 600 received, 0% packet loss, time 599659ms rtt min/avg/max/mdev = 0.255/181.211/6291.871/558.706 ms, pipe 7 The latency varies pretty badly, with spikes up to several seconds as you can see. The problem is not reproducable on other VT capable hardware that I have - e.g. my desktop has a E8400 CPU which runs the VMs just fine. Does knowing that make it any easier to guess where the problem might be? The Xeon 5130 does not have the smx, est, sse4_1, xsave, vnmi and flexpriority CPU flags that the E8400 does. Because this server is the only hardware I have which exhibits the problem and it's a production machine, I have limited times where I can do testing. However, I will try confirm that kvm-72 is okay and then bisect. Currently the host is running a 2.6.28 kernel with the kvm-82 modules. I guess I'm likely to have problems compiling the older kvm releases against this kernel, so I'll have to drop back to 2.6.27.something to run the tests. CPU Vendor: Intel CPU Type: Xeon 5130 Number of CPUs: 2 Host distribution: Debain Lenny/Sid KVM version: kvm-82 Host kernel: Linux 2.6.28 x86_64 Guest Distribution: Debian Etch Guest kernel: Linux 2.6.27.10 i686 Host's /proc/cpuinfo: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU5130 @ 2.00GHz stepping: 6 cpu MHz : 1995.117 cache size : 4096 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow bogomips: 3990.23 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU5130 @ 2.00GHz stepping: 6 cpu MHz : 1995.117 cache size : 4096 KB physical id : 0 siblings: 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm tpr_shadow bogomips: 3989.96 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 2 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU5130 @ 2.00GHz stepping: 6 cpu MHz : 1995.117 cache size : 4096 KB physical id : 3 siblings: 2 core id : 0 cpu cores : 2 apicid : 6 initial apicid : 6 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx tm2 ssse3 cx16 xtpr pdcm dca lahf_lm
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
2010/11/30 Dor Laor dl...@redhat.com: On 11/29/2010 06:23 PM, Stefan Hajnoczi wrote: On Mon, Nov 29, 2010 at 3:00 PM, Yoshiaki Tamura tamura.yoshi...@lab.ntt.co.jp wrote: 2010/11/29 Paul Brookp...@codesourcery.com: If devices incorrectly claim support for live migration, then that should also be fixed, either by removing the broken code or by making it work. I totally agree with you. AFAICT your current proposal is just feeding back the results of some fairly specific QA testing. I'd rather not get into that game. The correct response in the context of upstream development is to file a bug and/or fix the code. We already have config files that allow third party packagers to remove devices they don't want to support. Sorry, I didn't get what you're trying to tell me. My plan would be to initially start from a subset of devices, and gradually grow the number of devices that Kemari works with. While this process, it'll include what you said above, file a but and/or fix the code. Am I missing what you're saying? My point is that the whitelist shouldn't exist at all. Devices either support migration or they don't. Having some sort of separate whitelist is the wrong way to determine which devices support migration. Alright! Then if a user encounters a problem with Kemari, we'll fix Kemari or the devices or both. Correct? Is this a fair summary: any device that supports live migration workw under Kemari? It might be fair summary but practically we barely have live migration working w/o Kemari. In addition, last I checked Kemari needs additional hooks and it will be too hard to keep that out of tree until all devices get it. IIUC, the additional hook you're mentioning is the hack for virtio. Michael has commented on it, I hope his patch make the hack unnecessary. Yoshi (If such a device does not work under Kemari then this is a bug that needs to be fixed in live migration, Kemari, or the device.) Stefan -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 00/21] Kemari for KVM 0.2
2010/11/30 Anthony Liguori anth...@codemonkey.ws: On 11/29/2010 10:53 AM, Paul Brook wrote: Is this a fair summary: any device that supports live migration workw under Kemari? It might be fair summary but practically we barely have live migration working w/o Kemari. In addition, last I checked Kemari needs additional hooks and it will be too hard to keep that out of tree until all devices get it. That's not what I've been hearing earlier in this thread. The responses from Yoshi indicate that Stefan's summary is correct. i.e. the current Kemari implementation may require per-device hooks, but that's a bug and should be fixed before merging. It's actually really important that Kemari make use of an intermediate layer such that the hooks can distinguish between a device access and a recursive access. You could s/bdrv_aio_multiwrite/bdrv_aio_multiwrite_internal/g and then within kemari, s/bdrv_aio_multiwrite_proxy/bdrv_aio_multiwrite/ but I don't think that results in a cleaner interface. I don't like the _proxy naming and I think it has led to some confusion. I think having a dev_aio_multiwrite interface is a better naming scheme and ultimately provides a clearer idea of why a separate interface is needed--to distinguish between device accesses and internal accesses. Sorry about the naming. But from the discussion so far, adding an intermediate layer and exporting it to some/all approach needs a strong reason. Kemari itself can be implemented w/ or w/o the intermediate layer, and this makes the discussion toward folding the layer into block/net to be appropriate. I think there are two perspectives to decide which way to go: - What is clean interfaces for upper/lower layer? - If we introduce the intermediate layer, is there anyone who may use now or in the future? If not, it may not be worth to add. Yoshi BTW, dev_aio_multiwrite should take a DeviceState * and a BlockDriverState. Regards, Anthony Liguori Paul -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[ kvm-Bugs-2493108 ] Win2k problems on some (not all) Intel hosts
Bugs item #2493108, was opened at 2009-01-08 03:23 Message generated for change (Comment added) made by jessorensen You can respond by visiting: https://sourceforge.net/tracker/?func=detailatid=893831aid=2493108group_id=180599 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: intel Group: None Status: Closed Resolution: Out of Date Priority: 5 Private: No Submitted By: Kevin Shanahan (kmshanah) Assigned to: Nobody/Anonymous (nobody) Summary: Win2k problems on some (not all) Intel hosts Initial Comment: I have a Windows 2000 guest that I have been testing on an desktop machine (E8400 CPU) and that has been working well (apart from being affected by bug 2314737). When I moved (not a live migration, just shutdown, rsync and boot on the new server) the guest to our server, which is an IBM X3550 with two Xeon 5130 CPUs the guest has short freezes where the guest CPU usage spikes on both virtual CPUs and the guest becomes unresponsive for several seconds at a time. These guest CPU spikes can last over a minute, but the guest might have moments where it briefly responds to the delayed keystrokes, etc. every few seconds. One symptom of this behaviour for me last night was that Win2k AD replication was failing, so it's not just interactive use that suffers. Something I noticed that may be relevant - due to the other bug (2314737) which causes each guest CPU to use 100% of the host CPU, whether the guest CPU is idle or not, I can tell when the guest is having this problem by looking at the 'top' output on the host. When the guest is operating normally, the host will show the qemu-system-x86_64 process using 200% CPU (the guest is using -smp 2). However, when the guest is misbehaving, the host will show the qemu-system-x86_64 process only using _100%_ CPU. Maybe one of the threads is stuck, or something is forcing them to share a single core? Both hosts are running Debian Lenny/Sid, 64-bit with a kernel.org 2.6.28 kernel and kvm-82. The command line used in both cases: /usr/local/kvm/bin/qemu-system-x86_64 \ -smp 2 \ -localtime -m 2048 \ -drive if=ide,file=kvm-ks-02a.img,index=0,media=disk,boot=on \ -drive if=ide,file=kvm-ks-02b.img,index=2,media=disk \ -net nic,vlan=0,macaddr=52:54:00:12:34:68,model=virtio \ -net tap,vlan=0,ifname=tap18,script=no \ -vnc 127.0.0.1:18 -usbdevice tablet \ -daemonize CPUs on the good host: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping: 10 cpu MHz : 3000.000 cache size : 6144 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority bogomips: 5984.98 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: processor : 1 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU E8400 @ 3.00GHz stepping: 10 cpu MHz : 3000.000 cache size : 6144 KB physical id : 0 siblings: 2 core id : 1 cpu cores : 2 apicid : 1 initial apicid : 1 fpu : yes fpu_exception : yes cpuid level : 13 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm tpr_shadow vnmi flexpriority bogomips: 5984.97 clflush size: 64 cache_alignment : 64 address sizes : 36 bits physical, 48 bits virtual power management: And the bad host: processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Xeon(R) CPU5130 @ 2.00GHz stepping: 6 cpu MHz : 1995.117 cache size : 4096 KB physical id : 0 siblings: 2 core id : 0 cpu cores : 2 apicid : 0 initial apicid : 0 fpu : yes fpu_exception : yes cpuid level : 10 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant _tsc arch_perfmon pebs bts rep_good pni dtes64 monitor ds_cpl