Re: Call for help: moving the kvm wiki
Tomasz Chmielewski wrote:
>> As you may have noticed, the kvm wiki is overrun by spammers. In the past I've regularly cleaned up the spam, but some time ago I gave up.
>>
>> So I'm looking for a volunteer to locate a spam-free public wiki host (candidates include wiki.kernel.org and fedorahosted.org) and transfer the contents (minus the spam). I don't think we need to transfer the editing history, but the conversion should adapt to the target's wiki syntax.
>
> Just use captcha or a similar system with your existing wiki.

I'm not the wiki's administrator.

> My experience with administering wikis is:
>
> * if you don't use any preventive measures, your wiki will turn into a collection of garbage very soon, unless you spend lots of time monitoring changes

This has happened, even with registration.

> * requiring users to register before using the wiki results in:
>   - far fewer contributions from users - a very big disadvantage,
>   - less spam, but bots can register, so some spam will get through

I'm not worried about the registration burden; if someone is too lazy to register, they'll be lazy with the content as well.

> * using captcha or a similar system prevents 100% of spam
>   - lots of people don't like captchas very much, as they are often hard to read

Definitely.

>   - personally, on my site (http://wpkg.org) the user has to solve a simple arithmetic problem before the changes are accepted, like:
>
>     10 + 7 = ... [Accept button]
>
> So I suggest going through a list of extensions/plugins for MoinMoin (this is the wiki KVM uses), choosing something appropriate, and the spam should be gone.

Problem is, we're on an old version, with no prospect of upgrading. That's why I'd like to move to an existing, well maintained wiki host.

-- 
error compiling committee.c: too many arguments to function
--
To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] kvm: qemu: Improve virtio_net recv buffer allocation scheme
Mark McLoughlin wrote:
> From: Herbert Xu [EMAIL PROTECTED]
>
> Currently, in order to receive large packets, the guest must allocate max-sized packet buffers and pass them to the host. Each of these max-sized packets occupies 20 ring entries, which means we can only transfer a maximum of 12 packets in a single batch with a 256 entry ring.
>
> When receiving packets from external networks, we only receive MTU sized packets and so the throughput observed is throttled by the number of packets the ring can hold.
>
> Implement the VIRTIO_NET_F_MRG_RXBUF feature to let guests know that we can merge smaller buffers together in order to handle large packets.
>
> This scheme allows us to be efficient in our use of ring entries while still supporting large packets. Benchmarking using netperf from an external machine to a guest over a 10Gb/s network shows a 100% improvement from ~1Gb/s to ~2Gb/s. With a local host-guest benchmark with GSO disabled on the host side, throughput was seen to increase from 700Mb/s to 1.7Gb/s.
>
> Based on a patch from Herbert, with the feature renamed from datahead and some re-factoring for readability.
>
> diff --git a/qemu/hw/virtio-net.c b/qemu/hw/virtio-net.c
> index 403247b..afa5fe5 100644
> --- a/qemu/hw/virtio-net.c
> +++ b/qemu/hw/virtio-net.c
> @@ -34,9 +34,13 @@
>  #define VIRTIO_NET_F_HOST_TSO6 12 /* Host can handle TSOv6 in. */
>  #define VIRTIO_NET_F_HOST_ECN 13 /* Host can handle TSO[6] w/ ECN in. */
>  #define VIRTIO_NET_F_HOST_UFO 14 /* Host can handle UFO in. */
> +#define VIRTIO_NET_F_MRG_RXBUF 15 /* Host can merge receive buffers. */

What's the status of the guest side of this feature?

>  #define TX_TIMER_INTERVAL 15 /* 150 us */
>
> +/* Should be the largest MAX_SKB_FRAGS supported. */
> +#define VIRTIO_NET_MAX_FRAGS 18
> +

This should be advertised by the host to the guest (or vice-versa?). We're embedding Linux-specific magic numbers in a guest-OS-agnostic ABI.

Preferably, there shouldn't be a limit at all.

> @@ -209,7 +220,12 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
>      if (virtqueue_pop(n->rx_vq, &elem) == 0)
>          return;
>
> -    if (elem.in_num < 1 || elem.in_sg[0].iov_len != sizeof(*hdr)) {
> +    if (n->mergeable_rx_bufs) {
> +        if (elem.in_num < 1 || elem.in_sg[0].iov_len < TARGET_PAGE_SIZE) {
> +            fprintf(stderr, "virtio-net IOV is irregular\n");
> +            exit(1);
> +        }

Again, this is burying details of the current Linux stack into the ABI. The Linux stack may change not to be page oriented, or maybe this won't fit well with how Windows views things. Can this be made not to depend on the size of the iov elements?

> +    } else if (elem.in_num < 1 || elem.in_sg[0].iov_len != sizeof(*hdr)) {
>          fprintf(stderr, "virtio-net header not in first element\n");
>          exit(1);
>      }
> @@ -229,11 +245,49 @@ static void virtio_net_receive(void *opaque, const uint8_t *buf, int size)
>      }
>
>      /* copy in packet. ugh */
> -    iov_fill(&elem.in_sg[1], elem.in_num - 1,
> -             buf + offset, size - offset);
>
> -    /* signal other side */
> -    virtqueue_push(n->rx_vq, &elem, total);
> +    if (n->mergeable_rx_bufs) {
> +        int i = 0;
> +
> +        elem.in_sg[0].iov_base += sizeof(*hdr);
> +        elem.in_sg[0].iov_len -= sizeof(*hdr);
> +
> +        offset += iov_fill(&elem.in_sg[0], elem.in_num,
> +                           buf + offset, size - offset);
> +
> +        /* signal other side */
> +        virtqueue_fill(n->rx_vq, &elem, total, i++);
> +
> +        while (offset < size) {
> +            int len;
> +
> +            if (virtqueue_pop(n->rx_vq, &elem) == 0) {
> +                fprintf(stderr, "virtio-net truncating packet\n");
> +                exit(1);
> +            }
> +
> +            if (elem.in_num < 1 || elem.in_sg[0].iov_len < TARGET_PAGE_SIZE) {
> +                fprintf(stderr, "virtio-net IOV is irregular\n");
> +                exit(1);
> +            }
> +
> +            len = iov_fill(&elem.in_sg[0], elem.in_num,
> +                           buf + offset, size - offset);
> +
> +            virtqueue_fill(n->rx_vq, &elem, len, i++);
> +
> +            offset += len;
> +        }
> +
> +        virtqueue_flush(n->rx_vq, i);
> +    } else {
> +        iov_fill(&elem.in_sg[1], elem.in_num - 1,
> +                 buf + offset, size - offset);
> +
> +        /* signal other side */
> +        virtqueue_push(n->rx_vq, &elem, total);
> +    }
> +

Can we merge the two sides of the if () so that the only difference is the number of times we go through the loop?

Anthony, please review this as well, my virtio-foo is pretty superficial.

-- 
error compiling committee.c: too many arguments to function
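The ring arithmetic in the patch description (20 entries per max-sized packet, 256-entry ring, 12 packets per batch) can be checked with a few lines of C. This is only a sketch: the function names are made up for illustration, the 4096-byte buffer size used in the usage note is an assumption standing in for TARGET_PAGE_SIZE, and the virtio-net header that shares the first buffer is ignored.

```c
#include <assert.h>

/* Whole packets that fit in one batch when every packet consumes
 * entries_per_packet ring entries (hypothetical helper). */
unsigned packets_per_batch(unsigned ring_size, unsigned entries_per_packet)
{
    assert(entries_per_packet > 0);
    return ring_size / entries_per_packet;
}

/* Ring entries needed for one packet when it is spread over merged
 * buffers of buf_size bytes each (hypothetical helper; rounds up). */
unsigned entries_for_packet(unsigned packet_len, unsigned buf_size)
{
    assert(buf_size > 0);
    return (packet_len + buf_size - 1) / buf_size;
}
```

With max-sized buffers, packets_per_batch(256, 20) gives the 12 packets quoted above. With merged 4096-byte buffers, an MTU-sized packet of about 1500 bytes needs a single entry, so the same ring can hold far more small packets per batch, which is the effect the patch is after.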
[ANNOUNCE] kvm-77 release
This release fixes the -std-vga regression which bothered those of us who have large or widescreen monitors (note the option is now named '-vga std' due to upstream qemu changes). Other significant changes include better disk performance if you have a fast host storage subsystem.

Changes from kvm-76:
- merge bochs-bios-cvs
- merge qemu-svn
  - more -cpu options
  - faster disk emulation (esp. with scsi/virtio)
- improved NMI support (Jan Kiszka)
- improve 4GB memory support (Alex Williamson)
- memory alias cleanups (Glauber Costa)
- fix kvmtrace segfault (Ryota OZAKI)
- make external module compile on split source/object configs (Alexander Graf)
  - allows compiling on opensuse
- fix -std-vga regression
- fix migration failure at end of migration protocol
- map mmio pages for device assignment (Weidong Han)
- silence lapic kernel messages (Jan Kiszka)
- fix vcpu reset (Gleb Natapov)
- fix missed invlpg on EPT-enabled machines with EPT disabled (Marcelo Tosatti)
- device assignment on ia64 (Xiantao Zhang)
- memory type support on EPT (Sheng Yang)

Notes:
If you use the modules bundled with kvm-77, you can use any version of Linux from 2.6.16 upwards. You may also use kvm-77 userspace with the kvm modules provided by Linux 2.6.25 or above. Some features may only be available in newer releases.

http://kvm.qumranet.com

-- 
I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain.
Re: ACPI
jd wrote:
> Hi
>
> We ship some images (that kick off the install) out of the box and would like to know people's experiences and developers' opinions on ACPI. This would help us determine whether it should be enabled by default or not.
>
> -- For Windows guests
> -- For Linux guests

I recommend you enable ACPI for all guests.

-- 
error compiling committee.c: too many arguments to function
[PATCH][RFC] vmchannel a data channel between host and guest.
Hello,

Sometimes there is a need to pass various bits of information between host and guest (mostly for management purposes such as host screen resolution changes or runtime statistics of a guest). To do that we need some way to pass data between host and guest. The attached patch implements vmchannel, which can be used for this purpose. It is based on the virtio infrastructure and supports more than one channel. The vmchannel presents itself as a PCI device to a guest, so a guest driver is also required. The one for Linux is attached. It uses the netlink connector to communicate with userspace.

Comments are welcome.

--
	Gleb.

diff --git a/qemu/Makefile.target b/qemu/Makefile.target
index 5462092..6cf13f7 100644
--- a/qemu/Makefile.target
+++ b/qemu/Makefile.target
@@ -612,7 +612,7 @@ OBJS += rtl8139.o
 OBJS += e1000.o
 
 # virtio devices
-OBJS += virtio.o virtio-net.o virtio-blk.o virtio-balloon.o
+OBJS += virtio.o virtio-net.o virtio-blk.o virtio-balloon.o virtio-vmchannel.o
 
 OBJS += device-hotplug.o
 
diff --git a/qemu/hw/pc.c b/qemu/hw/pc.c
index 1d42aa7..e8c5531 100644
--- a/qemu/hw/pc.c
+++ b/qemu/hw/pc.c
@@ -1141,6 +1141,7 @@ static void pc_init1(ram_addr_t ram_size, int vga_ram_size,
                                drives_table[index].bdrv);
             unit_id++;
         }
+        virtio_vmchannel_init(pci_bus);
     }
 
     if (extboot_drive != -1) {
diff --git a/qemu/hw/virtio-vmchannel.c b/qemu/hw/virtio-vmchannel.c
new file mode 100644
index 0000000..1ce76ec
--- /dev/null
+++ b/qemu/hw/virtio-vmchannel.c
@@ -0,0 +1,239 @@
+/*
+ * Virtio VMChannel Device
+ *
+ * Copyright RedHat, inc. 2008
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "sysemu.h"
+#include "virtio.h"
+#include "pc.h"
+#include "qemu-kvm.h"
+#include "qemu-char.h"
+#include "virtio-vmchannel.h"
+
+#define DEBUG_VMCHANNEL
+
+#ifdef DEBUG_VMCHANNEL
+#define VMCHANNEL_DPRINTF(fmt, args...)                     \
+    do { printf("VMCHANNEL: " fmt , ##args); } while (0)
+#else
+#define VMCHANNEL_DPRINTF(fmt, args...)
+#endif
+
+typedef struct VirtIOVMChannel {
+    VirtIODevice vdev;
+    VirtQueue *sq;
+    VirtQueue *rq;
+} VirtIOVMChannel;
+
+typedef struct VMChannel {
+    CharDriverState *hd;
+    VirtQueueElement elem;
+    uint32_t id;
+    size_t len;
+} VMChannel;
+
+typedef struct VMChannelDesc {
+    uint32_t id;
+    uint32_t len;
+} VMChannelDesc;
+
+typedef struct VMChannelCfg {
+    uint32_t count;
+    uint32_t ids[MAX_VMCHANNEL_DEVICES];
+} VMChannelCfg;
+
+static VirtIOVMChannel *vmchannel;
+
+static VMChannel vmchannel_descs[MAX_VMCHANNEL_DEVICES];
+static int vmchannel_desc_idx;
+
+static int vmchannel_can_read(void *opaque)
+{
+    VMChannel *c = opaque;
+
+    /* device not yet configured */
+    if (vmchannel->rq->vring.avail == NULL)
+        return 0;
+
+    if (!c->len) {
+        int i;
+
+        if (virtqueue_pop(vmchannel->rq, &c->elem) == 0)
+            return 0;
+
+        if (c->elem.in_num < 1 ||
+                c->elem.in_sg[0].iov_len < sizeof(VMChannelDesc)) {
+            fprintf(stderr, "vmchannel: wrong receive descriptor\n");
+            return 0;
+        }
+
+        for (i = 0; i < c->elem.in_num; i++)
+            c->len += c->elem.in_sg[i].iov_len;
+
+        c->len -= sizeof(VMChannelDesc);
+    }
+
+    return (int)c->len;
+}
+
+static void vmchannel_read(void *opaque, const uint8_t *buf, int size)
+{
+    VMChannel *c = opaque;
+    VMChannelDesc *desc;
+    int i;
+
+    VMCHANNEL_DPRINTF("read %d bytes from channel %d\n", size, c->id);
+
+    if (!c->len) {
+        fprintf(stderr, "vmchannel: trying to receive into empty descriptor\n");
+        exit(1);
+    }
+
+    if (size <= 0 || size > c->len) {
+        fprintf(stderr, "vmchannel: read size is wrong\n");
+        exit(1);
+    }
+
+    desc = (VMChannelDesc*)c->elem.in_sg[0].iov_base;
+    desc->id = c->id;
+    desc->len = size;
+
+    c->elem.in_sg[0].iov_base = desc + 1;
+    c->elem.in_sg[0].iov_len -= sizeof(VMChannelDesc);
+
+    for (i = 0; i < c->elem.in_num && size; i++) {
+        struct iovec *iov = &c->elem.in_sg[i];
+        size_t len;
+
+        len = MIN(size, iov->iov_len);
+        memcpy(iov->iov_base, buf, len);
+        size -= len;
+        buf += len;
+    }
+
+    if (size) {
+        fprintf(stderr, "vmchannel: dropping %d bytes of data\n", size);
+        exit(1);
+    }
+
+    virtqueue_push(vmchannel->rq, &c->elem, desc->len);
+    c->len = 0;
+    virtio_notify(&vmchannel->vdev, vmchannel->rq);
+}
+
+static void virtio_vmchannel_handle_recv(VirtIODevice *vdev, VirtQueue *outputq)
+{
+    if (kvm_enabled())
+        qemu_kvm_notify_work();
+}
+
+static VMChannel *vmchannel_lookup(uint32_t id)
+{
+    int i;
+
+    for (i = 0; i < vmchannel_desc_idx; i++) {
+        if (vmchannel_descs[i].id == id)
+            return &vmchannel_descs[i];
+    }
+    return NULL;
+}
+
+static void virtio_vmchannel_handle_send(VirtIODevice *vdev, VirtQueue *outputq)
+{
+
bridging a wifi interface into kvm guest possible?
[cross-posted to netdev and kvm lists]
[..which failed due to wrong (old) kvm address. Please excuse me for the repost]

Hello!

I'm trying to set up a [virtual/guest] network of hosts to form something like a DMZ and a gateway, but in virtual hardware instead of real hardware. One of the things I tried is to run the gateway/router machine inside a guest system too, not only all the dmz hosts (there are some obscure historical reasons for that, don't ask ;).

Real hardware has 2 ethernet interfaces - external and internal LAN. In order for the gateway to run as a guest, one has to move the external interface into the guest.

Since kvm does not [fully] support PCI device moving (what's the right word for this?) from host to guest (which is the simplest solution possible), I was thinking about something different: bridging. Since a bridge is already used to connect the gateway host to the LAN, why not use it for the external<=>gateway link too? The difference is that there will be no IP address on the host on that external bridge, i.e. the host will not participate in the IP traffic transmission, only ethernet.

So far so good, and that setup worked in a test environment, worked flawlessly (well.. almost -- for some reason, under some circumstances, linux starts broadcasting certain packets over all bridges it has.. but that's a different issue/topic). It worked up until I tried it in production, which differs from the test setup in that for the external interface we have an old 11Mbps wifi card instead of a real ethernet NIC.

And I learned the hard way that bridging does not really work with wifi cards (it works with some, and even that requires.. some tweaking and additional software).

I tried to set the mac address on the guest-gateway to be the same as the one on the wifi card, but that obviously didn't help.

After browsing kernel options (unrelated to this issue), I noticed a device called macvlan. So I wonder if that can be used in my case, -- just to move a wifi interface to a guest system. I found very little documentation about macvlan. The patchset that introduced it back in 2007 says that macvlan puts the underlying device into promisc mode (which is where a wifi driver has problems).

Or maybe there's another solution to my problem (not counting getting additional hardware for the wifi link, which obviously will work; or replacing the wifi card with something more advanced).

Thank you!

/mjt
[ kvm-Bugs-2138166 ] Vista guest fails to start on kvm-76
Bugs item #2138166, was opened at 2008-09-30 08:39
Message generated for change (Comment added) made by johnrrousseau
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2138166&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: qemu
Group: None
Status: Open
Resolution: Fixed
Priority: 5
Private: No
Submitted By: John Rousseau (johnrrousseau)
Assigned to: Nobody/Anonymous (nobody)
Summary: Vista guest fails to start on kvm-76

Initial Comment:
CPU: Intel(R) Core(TM)2 Duo CPU T7250 @ 2.00GHz
Build: kvm-76
Host kernel: 2.6.26.3-29.fc9.x86_64
Host arch: x86_64
Guest: Windows Vista Ultimate 64-bit
QEMU command: qemu-system-x86_64 -hda /home/jrr/vista-x86_64.img -m 2048M -net nic,vlan=0,macaddr=52:54:00:12:32:00 -net tap,vlan=0,ifname=tap0 -std-vga -full-screen -smp 2

I've been running this guest on this host with kvm-75 without difficulty. kvm-76, built the same way that kvm-75 was (and on the same machine), fails to start my guest. The guest window is up, but the guest fails to complete startup.

Command line output is:

kvm_create_phys_mem: File existsset_vram_mapping: cannot allocate memory: File exists
set_vram_mapping failed
kvm: get_dirty_pages returned -2

The last line repeats hundreds of times.

--
Comment By: John Rousseau (johnrrousseau)
Date: 2008-10-12 08:50

Message:
I've confirmed that this issue is resolved with kvm-77.

--
Comment By: Marco Menardi (markit)
Date: 2008-10-10 08:02

Message:
I have the same issue with my XP-32 guests; I've Debian64 sid, Phenom 9550, kernel 2.6.26-1-amd64. Everything works like a charm with kvm-75 instead (and I've had to revert to 75, of course). Any news? Would love to have the forthcoming kvm-77 with this blocking bug fixed.

--
Comment By: John Rousseau (johnrrousseau)
Date: 2008-10-02 20:06

Message:
kvm-2646c5.tar.gz: Worked fine
kvm-d558461.tar.gz: Failed (showed this bug)

I've never used git before, but if you teach me to fish... I installed git, pulled the userspace and kernel trees, built kvm-75 and kvm-76 and got the expected results, but when I did a bisect on kvm-75 (good) and kvm-76 (bad) I kept getting sparse trees that I couldn't build. configure, among other things, was missing. What am I doing wrong? Also, what should I be syncing my kernel tree to when I am bisecting the userspace tree?

Thanks.

--
Comment By: Glauber de Oliveira Costa (glommer)
Date: 2008-10-02 12:27

Message:
Are you using git? If so, can you bisect to find out who the culprit is? If not, I've managed to archive two strategic commits you should try:

http://glommer.net/kvm-2646c5.tar.gz
and
http://glommer.net/kvm-d558461.tar.gz

Please report success or failure with them.

Thanks!

--
Comment By: John Rousseau (johnrrousseau)
Date: 2008-10-02 11:48

Message:
I applied the patch to kvm-76 and ran into basically the same problem. The guest still hung during boot and I got the plume of

kvm: get_dirty_pages returned -2

errors, but the first message

kvm_create_phys_mem: File existsset_vram_mapping: cannot allocate memory: File exists

wasn't displayed.

--
Comment By: Glauber de Oliveira Costa (glommer)
Date: 2008-10-02 09:01

Message:
Can you please test the patch at http://glommer.net/band-aid.patch ?

--
Comment By: Brian Jackson (iggy_cav)
Date: 2008-09-30 10:06

Message:
This was reported on the mailing list. It's a problem with sdl output. Not specific to any guest. Until the problem is fixed, I'd suggest using vnc output.
[PATCH] read UUID from qemu
A similar patch was sent to the bochs devel list, but I propose to apply this patch now rather than waiting for the bochs developers to apply it and then merging.

---
Add support for new FW configuration channel to the BIOS. Read UUID from QEMU using this channel.

Signed-off-by: Gleb Natapov [EMAIL PROTECTED]

diff --git a/bios/rombios32.c b/bios/rombios32.c
index 921e202..a91b155 100755
--- a/bios/rombios32.c
+++ b/bios/rombios32.c
@@ -444,31 +444,51 @@ void wrmsr_smp(uint32_t index, uint64_t val)
     p->ecx = 0;
 }
 
-void uuid_probe(void)
-{
 #ifdef BX_QEMU
-    uint32_t eax, ebx, ecx, edx;
+#define QEMU_CFG_CTL_PORT 0x510
+#define QEMU_CFG_DATA_PORT 0x511
+#define QEMU_CFG_SIGNATURE 0x00
+#define QEMU_CFG_ID 0x01
+#define QEMU_CFG_UUID 0x02
+
+int qemu_cfg_port;
+
+void qemu_cfg_select(int f)
+{
+    outw(QEMU_CFG_CTL_PORT, f);
+}
 
-    // check if backdoor port exists
-    asm volatile ("outl %%eax, %%dx"
-        : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
-        : "a" (0x564d5868), "b" (0), "c" (0xa), "d" (0x5658));
-    if (ebx == 0x564d5868) {
-        uint32_t *uuid_ptr = (uint32_t *)bios_uuid;
-        // get uuid
-        asm volatile ("outl %%eax, %%dx"
-            : "=a" (eax), "=b" (ebx), "=c" (ecx), "=d" (edx)
-            : "a" (0x564d5868), "c" (0x13), "d" (0x5658));
-        uuid_ptr[0] = eax;
-        uuid_ptr[1] = ebx;
-        uuid_ptr[2] = ecx;
-        uuid_ptr[3] = edx;
-    } else
+int qemu_cfg_port_probe()
+{
+    char *sig = "QEMU";
+    int i;
+
+    qemu_cfg_select(QEMU_CFG_SIGNATURE);
+
+    for (i = 0; i < 4; i++)
+        if (inb(QEMU_CFG_DATA_PORT) != sig[i])
+            return 0;
+
+    return 1;
+}
+
+void qemu_cfg_read(uint8_t *buf, int len)
+{
+    while (len--)
+        *(buf++) = inb(QEMU_CFG_DATA_PORT);
+}
 #endif
-    {
-        // UUID not set
-        memset(bios_uuid, 0, 16);
+
+void uuid_probe(void)
+{
+#ifdef BX_QEMU
+    if(qemu_cfg_port) {
+        qemu_cfg_select(QEMU_CFG_UUID);
+        qemu_cfg_read(bios_uuid, 16);
+        return;
     }
+#endif
+    memset(bios_uuid, 0, 16);
 }
 
 void cpu_probe(void)
@@ -2085,6 +2105,10 @@ void rombios32_init(void)
 
     init_smp_msrs();
 
+#ifdef BX_QEMU
+    qemu_cfg_port = qemu_cfg_port_probe();
+#endif
+
     ram_probe();
 
     cpu_probe();
--
	Gleb.
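The select-then-read pattern used by the patch (write a key to the control port, then read the item byte by byte from the data port) can be tried out in userspace against a mock device. The sketch below is not the BIOS code: struct mock_fw_cfg and the mock_* functions are hypothetical stand-ins for outw()/inb() on ports 0x510/0x511, and returning 0 for reads past the end of an item is an assumption made here for simplicity. Only the key values and the "QEMU" signature check are taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QEMU_CFG_SIGNATURE 0x00
#define QEMU_CFG_UUID      0x02

/* Hypothetical in-memory stand-in for the device behind the two ports. */
struct mock_fw_cfg {
    uint16_t selected;   /* last key written to the control port */
    size_t   offset;     /* read position within the selected item */
    uint8_t  uuid[16];   /* item served for QEMU_CFG_UUID */
};

/* Models outw(QEMU_CFG_CTL_PORT, key): selecting a key restarts the item. */
void mock_cfg_select(struct mock_fw_cfg *dev, uint16_t key)
{
    dev->selected = key;
    dev->offset = 0;
}

/* Models inb(QEMU_CFG_DATA_PORT): returns the next byte of the item.
 * Reads past the end return 0 (an assumption of this mock). */
uint8_t mock_cfg_inb(struct mock_fw_cfg *dev)
{
    static const uint8_t sig[4] = { 'Q', 'E', 'M', 'U' };
    size_t i = dev->offset++;

    switch (dev->selected) {
    case QEMU_CFG_SIGNATURE:
        return i < sizeof(sig) ? sig[i] : 0;
    case QEMU_CFG_UUID:
        return i < sizeof(dev->uuid) ? dev->uuid[i] : 0;
    default:
        return 0;
    }
}

/* Same logic as qemu_cfg_port_probe() in the patch, against the mock. */
int mock_cfg_probe(struct mock_fw_cfg *dev)
{
    static const char sig[] = "QEMU";
    int i;

    mock_cfg_select(dev, QEMU_CFG_SIGNATURE);
    for (i = 0; i < 4; i++)
        if (mock_cfg_inb(dev) != (uint8_t)sig[i])
            return 0;
    return 1;
}

/* Same logic as qemu_cfg_read() in the patch, against the mock. */
void mock_cfg_read(struct mock_fw_cfg *dev, uint8_t *buf, size_t len)
{
    assert(buf != NULL);
    while (len--)
        *buf++ = mock_cfg_inb(dev);
}
```

A caller would probe first, then select QEMU_CFG_UUID and read 16 bytes, mirroring what uuid_probe() does in the patch.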
Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Avi Kivity wrote:
> Chris Wright wrote:
>> I think it's safe to say the perf folks are concerned w/ data integrity first, stable/reproducible results second, and raw performance third.
>>
>> So seeing data cached in host was simply not what they expected. I think write through is sufficient. However I think that uncached vs. wt will show up on the radar under reproducible results (need to tune based on cache size). And in most overcommit scenarios memory is typically more precious than cpu, it's unclear to me if the extra buffering is anything other than memory overhead. As long as it's configurable then it's comparable and benchmarking and best practices can dictate best choice.
>
> Getting good performance because we have a huge amount of free memory in the host is not a good benchmark. Under most circumstances, the free memory will be used either for more guests, or will be given to the existing guests, which can utilize it more efficiently than the host.
>
> I can see two cases where this is not true:
>
> - using older, 32-bit guests which cannot utilize all of the cache. I think Windows XP is limited to 512MB of cache, and usually doesn't utilize even that. So if you have an application running on 32-bit Windows (or on 32-bit Linux with pae disabled), and a huge host, you will see a significant boost from cache=writethrough. This is a case where performance can exceed native, simply because native cannot exploit all the resources of the host.
>
> - if cache requirements vary in time across the different guests, and if some smart ballooning is not in place, having free memory on the host means we utilize it for whichever guest has the greatest need, so overall performance improves.

Another justification for O_DIRECT is that many production systems will use base images for their VMs. This is mainly true for desktop virtualization, but probably also for some server virtualization deployments.

In these types of scenarios, we can have all of the base image chain opened by default with caching for read-only, while the leaf images are opened with cache=off. Since there is an ongoing effort (both by IT and developers) to keep the base images as big as possible, it guarantees that this data is best suited for caching in the host, while the private leaf images will be uncached. This way we provide good performance and caching for the shared parent images while also promising correctness.

Actually this is what happens on mainline qemu with cache=off.

Cheers,
Dor
Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Dor Laor wrote:
> Actually this is what happens on mainline qemu with cache=off.

Have I understood right that cache=off on a qcow2 image only uses O_DIRECT for the leaf image, and the chain of base images doesn't use O_DIRECT?

Sometimes on a memory constrained host, where the (collective) guest memory is nearly as big as the host memory, I'm not sure this is what I want.

-- Jamie
Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Chris Wright wrote:
> Either wt or uncached (so host O_DSYNC or O_DIRECT) would suffice to get it through to host's storage subsystem, and I think that's been the core of the discussion (plus defaults, etc).

Just want to point out that the storage commitment from O_DIRECT can be _weaker_ than O_DSYNC.

On Linux, O_DIRECT never uses storage-device barriers or transactions, but O_DSYNC sometimes does, and fsync is even more likely to than O_DSYNC.

I'm not certain, but I think the same applies to other host OSes too - including Windows, which has its own equivalents to O_DSYNC and O_DIRECT, and extra documented semantics when they are used together.

Although this is a host implementation detail, unfortunately it means that O_DIRECT=no-cache and O_DSYNC=write-through-cache is not an accurate characterisation.

Some might be misled into assuming that cache=off is as strongly committing their data to hard storage as cache=wb would.

I think you can assume this only when the underlying storage devices' write caches are disabled. You cannot assume this if the host filesystem uses barriers instead of disabling the storage devices' write cache.

Unfortunately there's not a lot qemu can do about these various quirks, but at least it should be documented, so that someone requiring storage commitment (e.g. for a critical guest database) is advised to investigate whether O_DIRECT and/or O_DSYNC give them what they require with their combination of host kernel, filesystem, filesystem options and storage device(s).

-- Jamie
[PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
Currently qemu_fopen_ops accepts both get_buffer and put_buffer, but if both are given (non NULL) we encounter problems:
1. There is only one buffer and index, which may mean data corruption.
2. qemu_flush (which is also called by qemu_fclose) is writing (flushing) some of the data that was read (for the reader part).

Currently qemu_fopen_fd registers both get_buffer and put_buffer functions. This breaks migration for tcp and ssh migration protocols.

The following patch fixes the above:
1. It makes sure that at most one of get_buffer and put_buffer is given to qemu_fopen_ops.
2. It changes qemu_fopen_fd to register only get_buffer for a reader and only put_buffer for a writer (adding a 'reader' parameter).
3. The incoming fd migration code calls qemu_fopen_fd as a reader only.

Signed-off-by: Uri Lublin [EMAIL PROTECTED]
---
 qemu/hw/hw.h     |    2 +-
 qemu/migration.c |    2 +-
 qemu/vl.c        |   12 ++--
 3 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/qemu/hw/hw.h b/qemu/hw/hw.h
index c9390c1..d965c47 100644
--- a/qemu/hw/hw.h
+++ b/qemu/hw/hw.h
@@ -34,7 +34,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
                          QEMUFileCloseFunc *close,
                          QEMUFileRateLimit *rate_limit);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
-QEMUFile *qemu_fopen_fd(int fd);
+QEMUFile *qemu_fopen_fd(int fd, int reader);
 void qemu_fflush(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
diff --git a/qemu/migration.c b/qemu/migration.c
index 44cb9eb..587c67e 100644
--- a/qemu/migration.c
+++ b/qemu/migration.c
@@ -820,7 +820,7 @@ static int migrate_incoming_page(QEMUFile *f, uint32_t addr)
 static int migrate_incoming_fd(int fd)
 {
     int ret = 0;
-    QEMUFile *f = qemu_fopen_fd(fd);
+    QEMUFile *f = qemu_fopen_fd(fd, 1);
     uint32_t addr, size;
     extern void qemu_announce_self(void);
     unsigned char running;
diff --git a/qemu/vl.c b/qemu/vl.c
index 36e3bb7..1ce188b 100644
--- a/qemu/vl.c
+++ b/qemu/vl.c
@@ -6712,7 +6712,7 @@ static int fd_close(void *opaque)
     return 0;
 }
 
-QEMUFile *qemu_fopen_fd(int fd)
+QEMUFile *qemu_fopen_fd(int fd, int reader)
 {
     QEMUFileFD *s = qemu_mallocz(sizeof(QEMUFileFD));
 
@@ -6720,7 +6720,10 @@ QEMUFile *qemu_fopen_fd(int fd)
         return NULL;
 
     s->fd = fd;
-    s->file = qemu_fopen_ops(s, fd_put_buffer, fd_get_buffer, fd_close, NULL);
+    if (reader)
+        s->file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close, NULL);
+    else
+        s->file = qemu_fopen_ops(s, fd_put_buffer, NULL, fd_close, NULL);
     return s->file;
 }
 
@@ -6826,6 +6829,11 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer,
 {
     QEMUFile *f;
 
+    if (put_buffer && get_buffer) {
+        fprintf(stderr, "%s: only one of get_buffer and put_buffer "
+                "functions may be given\n", __FUNCTION__);
+        return NULL;
+    }
     f = qemu_mallocz(sizeof(QEMUFile));
     if (!f)
         return NULL;
-- 
1.5.5.1
[ kvm-Bugs-2149609 ] Booting IA32e Windows guest meets BSOD
Bugs item #2149609, was opened at 2008-10-06 16:25
Message generated for change (Comment added) made by kiszka
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2149609&group_id=180599

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Jiajun Xu (jiajun)
Assigned to: Nobody/Anonymous (nobody)
Summary: Booting IA32e Windows guest meets BSOD

Initial Comment:
With the latest commits, kvm.git 77d0a44d2393f43836fd38c235dfda2de4c4630a and userspace.git d32aaf6d02d102855e5d8d1e23f0c55ca214e871, IA32e Windows guests (including Windows XP, Windows 2003, Windows Vista) fail to boot. The guest hits a BSOD when booting, error code 0x000A(IRQL_NOT_LESS_OR_EQUAL).

The previous commits, kvm.git a509fff8ed134115f2fd413e92a92cddc1709a5f and userspace.git 42621e776ac3a12930c3fec19c60e68e563df4cc, have no such issue.

--
Comment By: Jan Kiszka (kiszka)
Date: 2008-10-12 18:32

Message:
OK, this (the NMI watchdog) now likely became a regression of -77.

I tried reproducing it with a Windows Server 2003 R2 (64-bit), but without success. Can you describe your scenario in more detail? Vanilla Windows installation? Which qemu command line switches precisely? Anything else I may need to know to reproduce?

Further tests: Does http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/22635 change the situation? What does http://permalink.gmane.org/gmane.comp.emulators.kvm.devel/22634 make kvm report to the kernel log?

--
Comment By: Jiajun Xu (jiajun)
Date: 2008-10-10 16:09

Message:
We find that kernel.git cbb44eaa2d961f1eb975b52c7be1c82178b3c580 is the first commit to introduce the issue. With kernel.git ba8ab77ebfba9898764c39bc2f00540a5a67a1e9, the Windows guest can boot up successfully.

--
Comment By: Jiajun Xu (jiajun)
Date: 2008-10-06 16:53

Message:
OK. I will try to find the cause. With no-kvm-irqchip or no-kvm-pit, we did not hit this issue.

--
Comment By: Jan Kiszka (kiszka)
Date: 2008-10-06 16:38

Message:
Can you try to narrow down the causing patches, specifically on the kernel side? I assume you run with in-kernel irqchip, right?
Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
Uri Lublin wrote: Currently qemu_fopen_ops accepts both get_buffer and put_buffer, but if both are given (non-NULL) we encounter problems: 1. There is only one buffer and index, which may mean data corruption. 2. qemu_fflush (which is also called by qemu_fclose) writes (flushes) some of the data that was read (for the reader part). Currently qemu_fopen_fd registers both get_buffer and put_buffer functions. This breaks migration over the tcp and ssh migration protocols. The following patch fixes the above: 1. It makes sure that at most one of get_buffer and put_buffer is given to qemu_fopen_ops. 2. It changes qemu_fopen_fd to register only get_buffer for a reader and only put_buffer for a writer (adding a 'reader' parameter). 3. The incoming fd migration code calls qemu_fopen_fd as a reader only. Anthony, this is a problem with qemu-upstream so I'd like to solve it in a way that's acceptable for upstream. The proposed patch is less than ideal IMO as it introduces limitations on what you can do with a file. An alternative implementation would add a read/write mode to the buffer, based on the last access type. When switching from read to write, we drop the buffer, and when switching from write to read, we flush it and then drop it. This is more complex but results in a cleaner API. -- error compiling committee.c: too many arguments to function
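The alternative described above (a single buffer whose mode follows the last access type) could look roughly like the sketch below. This is not QEMU code: struct sketch_file, the sketch_* functions and the in-memory source and sink arrays are hypothetical stand-ins for QEMUFile and its get_buffer/put_buffer backends, chosen only so the example is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_BUF_SIZE  8
#define SKETCH_SINK_SIZE 64

enum sketch_mode { SKETCH_IDLE, SKETCH_READING, SKETCH_WRITING };

/* Hypothetical stand-in for QEMUFile: one buffer shared by reads and writes. */
struct sketch_file {
    enum sketch_mode mode;          /* type of the last access */
    uint8_t buf[SKETCH_BUF_SIZE];
    size_t buf_index;               /* next byte to read or write in buf */
    size_t buf_size;                /* valid bytes in buf when reading */
    const uint8_t *src;             /* stand-in for the get_buffer backend */
    size_t src_len, src_pos;
    uint8_t sink[SKETCH_SINK_SIZE]; /* stand-in for the put_buffer backend */
    size_t sink_len;
};

void sketch_init(struct sketch_file *f, const uint8_t *src, size_t src_len)
{
    assert(f != NULL);
    memset(f, 0, sizeof(*f));
    f->mode = SKETCH_IDLE;
    f->src = src;
    f->src_len = src_len;
}

/* Push buffered write data to the sink; does nothing in read mode. */
void sketch_flush(struct sketch_file *f)
{
    size_t room, n;

    if (f->mode != SKETCH_WRITING || f->buf_index == 0)
        return;
    room = SKETCH_SINK_SIZE - f->sink_len;
    n = f->buf_index < room ? f->buf_index : room; /* sketch: excess is dropped */
    memcpy(f->sink + f->sink_len, f->buf, n);
    f->sink_len += n;
    f->buf_index = 0;
}

/* write -> read: flush, then drop. read -> write: drop (read-ahead is lost). */
static void sketch_set_mode(struct sketch_file *f, enum sketch_mode mode)
{
    if (f->mode == mode)
        return;
    sketch_flush(f);
    f->buf_index = 0;
    f->buf_size = 0;
    f->mode = mode;
}

void sketch_put_byte(struct sketch_file *f, uint8_t b)
{
    sketch_set_mode(f, SKETCH_WRITING);
    if (f->buf_index == SKETCH_BUF_SIZE)
        sketch_flush(f);
    f->buf[f->buf_index++] = b;
}

/* Returns the next byte, or -1 when the source is exhausted. */
int sketch_get_byte(struct sketch_file *f)
{
    sketch_set_mode(f, SKETCH_READING);
    if (f->buf_index >= f->buf_size) {
        size_t left = f->src_len - f->src_pos;
        size_t n = left < SKETCH_BUF_SIZE ? left : SKETCH_BUF_SIZE;

        if (n == 0)
            return -1;
        memcpy(f->buf, f->src + f->src_pos, n);
        f->src_pos += n;
        f->buf_size = n;
        f->buf_index = 0;
    }
    return f->buf[f->buf_index++];
}
```

Note that dropping the buffer on a read-to-write switch discards any read-ahead, which is harmless for a seekable file but loses data on a stream such as a socket. That may be part of why this approach is described as more complex.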
Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Jamie Lokier wrote: Dor Laor wrote: Actually this is what happens on mainline qemu with cache=off. Have I understood right that cache=off on a qcow2 image only uses O_DIRECT for the leaf image, and the chain of base images don't use O_DIRECT? Yeah, that's a bug IMHO and in my patch to add O_DSYNC, I fix that. I think an argument for O_DIRECT in a leaf and wb in the leaf is seriously flawed... Regards, Anthony Liguori Sometimes on a memory constrained host, where the (collective) guest memory is nearly as big as the host memory, I'm not sure this is what I want. -- Jamie
Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Dor Laor wrote: Avi Kivity wrote: Since there is ongoing effort (both by IT and developers) to keep the base images as big as possible, it guarantees that this data is best suited for caching in the host while the private leaf images will be uncached. A proper CAS solution is really such a better approach. qcow2 deduplication is an interesting concept, but such a hack :-) This way we provide good performance and caching for the shared parent images while also promising correctness. You get correctness by using O_DSYNC. cache=off should disable the use of the page cache everywhere. Regards, Anthony Liguori Actually this is what happens on mainline qemu with cache=off. Cheers, Dor
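To make the distinction in this exchange concrete, here is a minimal sketch of how a cache setting might map onto open(2) flags, with the same caching flags applied to every image in a backing chain. The enum, the mode names other than "off", and both functions are hypothetical; this is not how any particular qemu version implements it. O_DIRECT is a Linux extension, so the sketch falls back to no flag where it is unavailable.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT with glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>

#ifndef O_DIRECT
#define O_DIRECT 0 /* not available here: "off" degrades to ordinary cached I/O */
#endif

/* Hypothetical cache settings; only "off" is named in the thread. */
enum sketch_cache_mode {
    SKETCH_CACHE_OFF,          /* bypass the host page cache */
    SKETCH_CACHE_WRITETHROUGH, /* cached reads, writes complete only once on disk */
    SKETCH_CACHE_WRITEBACK     /* fully cached, no integrity guarantee on return */
};

/* Flags for opening one image file under the given cache mode. */
int sketch_open_flags(enum sketch_cache_mode mode, int writable)
{
    int flags = writable ? O_RDWR : O_RDONLY;

    switch (mode) {
    case SKETCH_CACHE_OFF:
        flags |= O_DIRECT;
        break;
    case SKETCH_CACHE_WRITETHROUGH:
        flags |= O_DSYNC;
        break;
    case SKETCH_CACHE_WRITEBACK:
        break;
    }
    return flags;
}

/*
 * Fill out[0..n-1] with flags for a backing chain: index 0 is the writable
 * leaf, the rest are read-only base images. Every image gets the same
 * caching flags, so "off" bypasses the page cache everywhere.
 */
void sketch_chain_flags(enum sketch_cache_mode mode, int *out, size_t n)
{
    size_t i;

    assert(out != NULL || n == 0);
    for (i = 0; i < n; i++)
        out[i] = sketch_open_flags(mode, i == 0);
}
```

The design point under discussion is only the loop in sketch_chain_flags: the behaviour called a bug in the thread corresponds to applying O_DIRECT to index 0 alone.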
Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
Avi Kivity wrote: Uri Lublin wrote: Currently qemu_fopen_ops accepts both get_buffer and put_buffer, but if both are given (non-NULL) we encounter problems: 1. There is only one buffer and index, which may mean data corruption. 2. qemu_fflush (which is also called by qemu_fclose) writes (flushes) some of the data that was read (for the reader part). Currently qemu_fopen_fd registers both get_buffer and put_buffer functions. This breaks migration over the tcp and ssh migration protocols. The following patch fixes the above: 1. It makes sure that at most one of get_buffer and put_buffer is given to qemu_fopen_ops. 2. It changes qemu_fopen_fd to register only get_buffer for a reader and only put_buffer for a writer (adding a 'reader' parameter). 3. The incoming fd migration code calls qemu_fopen_fd as a reader only. Anthony, this is a problem with qemu-upstream so I'd like to solve it in a way that's acceptable for upstream. The proposed patch is less than ideal IMO as it introduces limitations on what you can do with a file. An alternative implementation would add a read/write mode to the buffer, based on the last access type. When switching from read to write, we drop the buffer, and when switching from write to read, we flush it and then drop it. This is more complex but results in a cleaner API. I would think a better solution would introduce two buffers, one for read and one for write. That way, you can have a proper bidirectional stream. Regards, Anthony Liguori
Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
Anthony Liguori wrote: The proposed patch is less than ideal IMO as it introduces limitations on what you can do with a file. An alternative implementation would add a read/write mode to the buffer, based on the last access type. When switching from read to write, we drop the buffer, and when switching from write to read, we flush it and then drop it. This is more complex but results in a cleaner API. I would think a better solution would introduce two buffers, one for read and one for write. That way, you can have a proper bidirectional stream. Complexity goes way up. Now you need to intercept reads that go to the write buffer, and vice versa. -- Do not meddle in the internals of kernels, for they are subtle and quick to panic.
Re: kvm-76 fails to boot kubuntu 64bits (ok with kvm-75)
John Rousseau wrote: My guess is it's this: https://sourceforge.net/tracker/?func=detailatid=893831aid=2138166group_id=180599 -John Xavier Gnata wrote: Hi, kubuntu 64bits 8.10 beta boots without problem with kvm-75 using this command line: qemu-system-x86_64 -no-quit -serial file:serial.log -hda intrepid.img -boot c -m 1024 -smb qemu -soundhw es1370 It never boots with kvm-76. Here is the serial output: it always crashes at boot time with kvm-76: [0.00] Initializing cgroup subsys cpuset [0.00] Initializing cgroup subsys cpu [0.00] Linux version 2.6.27-4-generic ([EMAIL PROTECTED]) (gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu7) ) #1 SMP Wed Sep 24 01:29:06 UTC 2008 (Ubuntu 2.6.27-4. 6-generic) [0.00] Command line: root=UUID=cab01a2d-f54f-4c16-be37-4555fd50a068 cons ole=ttyS0,115200 earlyprintk=serial,ttyS0,115200 quiet splash [0.00] KERNEL supported cpus: [0.00] Intel GenuineIntel [0.00] AMD AuthenticAMD [0.00] Centaur CentaurHauls [0.00] BIOS-provided physical RAM map: [0.00] BIOS-e820: - 0009fc00 (usable) [0.00] BIOS-e820: 0009fc00 - 000a (reserved) [0.00] BIOS-e820: 000e8000 - 0010 (reserved) [0.00] BIOS-e820: 0010 - 3fff (usable) [0.00] BIOS-e820: 3fff - 4000 (ACPI data) [0.00] BIOS-e820: fffbd000 - 0001 (reserved) [0.00] console [earlyser0] enabled [1.575835] pci :00:01.0: PIIX3: Enabling Passive Release Loading, please wait... Couldnt get a file descriptor referring to the console *** glibc detected *** modprobe: realloc(): invalid next size: 0x00ef8c4 0 *** Aborted *** glibc detected *** modprobe: realloc(): invalid next size: 0x015ddc4 0 *** Aborted usplash: libusplash.c:289: switch_console: Assertion `(saved_vt = 0) (saved_ vt 10)' failed. 
*** glibc detected *** modprobe: realloc(): invalid old size: 0x00725160 *** Aborted *** glibc detected *** modprobe: realloc(): invalid old size: 0x01b76160 *** Aborted *** glibc detected *** modprobe: realloc(): invalid old size: 0x01f88160 *** Aborted *** glibc detected *** modprobe: realloc(): invalid old size: 0x00db4160 *** Aborted *** glibc detected *** modprobe: realloc(): invalid old size: 0x01d0d160 *** Aborted *** glibc detected *** modprobe: realloc(): invalid old size: 0x008d5160 *** Aborted udevd[952]: parse_config_file: error parsing /etc/udev/udev.conf, line 1:0 udevd[952]: add_to_rules: invalid rule '/etc/udev/rules.d/05-options.rules:1' udevd[952]: add_to_rules: invalid rule '/etc/udev/rules.d/05-options.rules:2' udevd[952]: parse_file: line too long, rule skipped '/etc/udev/rules.d/20-names.rules:7' udevd[952]: add_to_rules: invalid rule '/etc/udev/rules.d/40-basic-permissions.rules:7' udevd[952]: parse_file: line too long, rule skipped '/etc/udev/rules.d/60-persistent-storage.rules:7' udevd[952]: add_to_rules: invalid rule '/etc/udev/rules.d/61-persistent-storage-edd.rules:7' udevd[952]: parse_file: line too long, rule skipped '/etc/udev/rules.d/90-modprobe.rules:13' udevtrigger[954]: parse_config_file: error parsing /etc/udev/udev.conf, line 1:0 uname -a 2.6.26.5-1 #1 SMP processor : 0 (and 1) vendor_id : GenuineIntel cpu family : 6 model : 15 model name : Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz Any idea? Xavier It is the same with kvm-77. Same failure. Any patch to be tested? Xavier
Re: [Qemu-devel] [RFC] Disk integrity in QEMU
Avi Kivity wrote: LRU typically makes fairly bad decisions since it throws most of the information it has away. I recommend looking up LRU-K and similar algorithms, just to get a feel for this; it is basically the simplest possible algorithm short of random selection. Note that Linux doesn't even have an LRU; it has to approximate since it can't sample all of the pages all of the time. With a hypervisor that uses Intel's EPT, it's even worse since we don't have an accessed bit. On silly benchmarks that just exercise the disk and touch no memory, and if you tune the host very aggressively, LRU will win on long running guests since it will eventually page out all unused guest memory (with Linux guests, it will never even page guest memory in). On real life applications I don't think there is much chance. But when using O_DIRECT you actually make the pages not swappable at all... or am I wrong? Maybe some kind of combination with the mm shrink could be good; do_try_to_free_pages is a good point of reference.
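As a small illustration of the point about LRU discarding information, the sketch below compares plain LRU victim selection with LRU-2 (LRU-K for K = 2), which evicts the page whose second most recent access is oldest. The structure and function names are hypothetical and this is not how Linux or kvm selects pages; it only shows what the extra timestamp buys.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page access history; time 0 means "no such access yet". */
struct sketch_page {
    unsigned long last; /* most recent access time */
    unsigned long prev; /* second most recent access time */
};

/* Record an access at logical time 'now' (must be > 0). */
void sketch_touch(struct sketch_page *p, unsigned long now)
{
    assert(now > 0);
    p->prev = p->last;
    p->last = now;
}

/* Plain LRU: evict the page with the oldest most recent access. */
size_t sketch_lru_victim(const struct sketch_page *pages, size_t n)
{
    size_t i, v = 0;

    assert(n > 0);
    for (i = 1; i < n; i++)
        if (pages[i].last < pages[v].last)
            v = i;
    return v;
}

/*
 * LRU-2: evict the page with the oldest second most recent access. Pages
 * seen fewer than twice (prev == 0) sort first; ties fall back to plain LRU.
 */
size_t sketch_lru2_victim(const struct sketch_page *pages, size_t n)
{
    size_t i, v = 0;

    assert(n > 0);
    for (i = 1; i < n; i++) {
        if (pages[i].prev < pages[v].prev ||
            (pages[i].prev == pages[v].prev && pages[i].last < pages[v].last))
            v = i;
    }
    return v;
}
```

With a page touched at times 5 and 6 and another touched once at time 9, plain LRU evicts the first even though it has shown reuse, while LRU-2 evicts the page that was only touched once.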
Re: with kerenl 2.6.27, CONFIG_KVM_GUEST does not work
Does the attached work for you? Avi, do you have thoughts on how to proceed with pvmmu? Using hypercalls instead of faults can still be beneficial (for the first write before a page goes out of sync, or for non-leaf tables, which currently don't go oos). But in its current state pvmmu should be slower under most loads. Perhaps disable it?

KVM: MMU: sync root on paravirt TLB flush

The pvmmu TLB flush handler should request a root sync, similarly to a native read-write CR3.

Signed-off-by: Marcelo Tosatti [EMAIL PROTECTED]

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index 79cb4a9..7e70e97 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2747,6 +2747,7 @@ static int kvm_pv_mmu_write(struct kvm_vcpu *vcpu,
 static int kvm_pv_mmu_flush_tlb(struct kvm_vcpu *vcpu)
 {
 	kvm_x86_ops->tlb_flush(vcpu);
+	set_bit(KVM_REQ_MMU_SYNC, &vcpu->requests);
 	return 1;
 }

This patch works for me (kvm-77, 2.6.27 host and guest)! kvm-75 works fine, but kvm-76 and kvm-77 (all unpatched) show lots of segfaults in the guest (2.6.26.5 or 2.6.27, x86_64 on host and guest).

Thanks for the patch! HTH, Bernhard
Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
Avi Kivity wrote: Anthony Liguori wrote: The proposed patch is less than ideal IMO as it introduces limitations on what you can do with a file. An alternative implementation would add a read/write mode to the buffer, based on the last access type. When switching from read to write, we drop the buffer, and when switching from write to read, we flush it and then drop it. This is more complex but results in a cleaner API. I would think a better solution would introduce two buffers, one for read and one for write. That way, you can have a proper bidirectional stream. Complexity goes way up. Now you need to intercept reads that go to the write buffer, and vice versa. Yeah, Uri: instead of passing an argument to qemu_fopen_ops, it may be better to detect the cases where we do a write and set a flag. Then in the fflush() function, only do the put_buffer if the is_write flag is set. Also, having checks in the read and write functions that test whether the is_write flag is set along with whether buf_index > 0, and that fprintf() and abort on misuse, would be good for debugging. Regards, Anthony Liguori
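One way to read the is_write suggestion above is sketched below. It is an interpretation, not the patch that was later posted: struct sketch_qfile and the sketch_* functions are hypothetical, the backend write is replaced by a counter, and the exact misuse conditions are an assumption based on the wording of the message.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define SKETCH_IO_BUF 16

/* Hypothetical, cut-down stand-in for QEMUFile. */
struct sketch_qfile {
    int is_write;  /* set once the file has been written to */
    int buf_index; /* next position in buf */
    int buf_size;  /* valid bytes in buf when reading; 0 when writing */
    uint8_t buf[SKETCH_IO_BUF];
    int put_calls; /* counts backend writes; stands in for put_buffer() */
};

/* Report mixed read/write use loudly and stop, as a debugging aid. */
static void sketch_misuse(const char *what)
{
    fprintf(stderr, "sketch_qfile misuse: %s\n", what);
    abort();
}

/* Only buffered write data is pushed out; buffered read data is left alone. */
void sketch_fflush(struct sketch_qfile *f)
{
    if (f->is_write && f->buf_index > 0) {
        f->put_calls++; /* a real implementation would call put_buffer here */
        f->buf_index = 0;
    }
}

void sketch_put_byte(struct sketch_qfile *f, uint8_t b)
{
    if (!f->is_write && f->buf_index > 0)
        sketch_misuse("write attempted while read data is buffered");
    f->is_write = 1;
    if (f->buf_index == SKETCH_IO_BUF)
        sketch_fflush(f);
    f->buf[f->buf_index++] = b;
}

/* Returns the next buffered byte, or -1 when the buffer is exhausted. */
int sketch_get_byte(struct sketch_qfile *f)
{
    if (f->is_write)
        sketch_misuse("read attempted on a file used for writing");
    if (f->buf_index >= f->buf_size)
        return -1; /* a real implementation would refill via get_buffer */
    return f->buf[f->buf_index++];
}
```

The reader case is the one that matters for the original bug report: after a read, buf_index is non-zero, and without the flag a flush on close would hand previously read bytes to put_buffer.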
Re: [kvm] Re: [PATCH 0/5] bios: 4G updates
Hi, On Thu, Oct 02, 2008 at 03:33:58PM +0300, Avi Kivity wrote: Alex Williamson wrote: It works, so I pushed it out. Alex, can you rebase your bios patches on top of current HEAD? I updated and resent the first patch in the 4 patch follow-on to this one. The remaining 3 patches still apply cleanly. I think Sheng was going to send out a patch to better follow the SDM when changing the MTRRs, but the first 3 patches are independent of that. Thanks, Applied all, thanks. As an aside, is there any interest in using SeaBIOS with kvm? SeaBIOS is a port of bochs bios to gcc. I've been using SeaBIOS (along with coreboot) to boot and provide bios functions on real hardware. It works fine under qemu also. I looked at the changes that kvm has in its local bochs bios repo. Most of the code is the same, however I noticed a number of msr settings which I didn't fully understand. If there is interest, the source code repository can be pulled by running: git clone git://git.linuxtogo.org/home/seabios.git There is a git browser at: http://git.linuxtogo.org/?p=kevin/seabios.git;a=summary And some precompiled binaries at: http://linuxtogo.org/~kevin/SeaBIOS/ Thoughts? -Kevin
Re: [PATCH 1/2] KVM: Handle multiple interrupt sources
On Saturday 11 October 2008 16:10:51 Amit Shah wrote: From: Sheng Yang [EMAIL PROTECTED] Keep a record of current interrupt state before injecting. Don't assert/deassert repeatedly, so that every caller of kvm_set_irq() can be identified as a separate interrupt source for the IOAPIC/PIC to implement logical OR of level triggered interrupts on one IRQ line. Notice that userspace devices are treated as one device for each IRQ line. The correctness of sharing interrupt for each IRQ line should be ensured by the userspace program (QEmu). [Amit: rebase to kvm.git HEAD] Hi, Amit Thanks for your work! But maybe I miss something. I suppose my later patch can work indepently? I think the second patch should solve the whole problem (sorry to reply it to the second rather than [0/2] which made confusion...). Can you have a check? Thanks! -- regards Yang, Sheng Signed-off-by: Sheng Yang [EMAIL PROTECTED] Signed-off-by: Amit Shah [EMAIL PROTECTED] --- arch/x86/kvm/x86.c | 13 - include/linux/kvm_host.h |3 +++ virt/kvm/kvm_main.c | 12 +--- 3 files changed, 24 insertions(+), 4 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index dda478e..6f45428 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1816,7 +1816,18 @@ long kvm_arch_vm_ioctl(struct file *filp, goto out; if (irqchip_in_kernel(kvm)) { mutex_lock(kvm-lock); - kvm_set_irq(kvm, irq_event.irq, irq_event.level); + /* + * Take one IRQ line as from one device, shared IRQ + * line should also be handled in the userspace before + * use KVM_IRQ_LINE ioctl to change IRQ line state. 
+ */ + if (kvm-userspace_intrsource_states[irq_event.irq] + != irq_event.level) { + kvm_set_irq(kvm, irq_event.irq, + irq_event.level); + kvm-userspace_intrsource_states[irq_event.irq] + = irq_event.level; + } mutex_unlock(kvm-lock); r = 0; } diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3833c48..d392e31 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -129,6 +129,8 @@ struct kvm { unsigned long mmu_notifier_seq; long mmu_notifier_count; #endif + + int userspace_intrsource_states[KVM_IOAPIC_NUM_PINS]; }; /* The guest did something we don't support. */ @@ -306,6 +308,7 @@ struct kvm_assigned_dev_kernel { int host_irq; int guest_irq; int irq_requested; + int irq_state; struct pci_dev *dev; struct kvm *kvm; }; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index cf0ab8e..faa56fb 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -104,8 +104,11 @@ static void kvm_assigned_dev_interrupt_work_handler(struct work_struct *work) * finer-grained lock, update this */ mutex_lock(assigned_dev-kvm-lock); - kvm_set_irq(assigned_dev-kvm, - assigned_dev-guest_irq, 1); + if (assigned_dev-irq_state == 0) { + kvm_set_irq(assigned_dev-kvm, + assigned_dev-guest_irq, 1); + assigned_dev-irq_state = 1; + } mutex_unlock(assigned_dev-kvm-lock); kvm_put_kvm(assigned_dev-kvm); } @@ -134,7 +137,10 @@ static void kvm_assigned_dev_ack_irq(struct kvm_irq_ack_notifier *kian) dev = container_of(kian, struct kvm_assigned_dev_kernel, ack_notifier); - kvm_set_irq(dev-kvm, dev-guest_irq, 0); + if (dev-irq_state == 1) { + kvm_set_irq(dev-kvm, dev-guest_irq, 0); + dev-irq_state = 0; + } enable_irq(dev-host_irq); } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
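The "logical OR of level triggered interrupts on one IRQ line" mentioned in the patch description can be illustrated with a small standalone sketch. It is not the kernel implementation and does not use kvm_set_irq(); struct sketch_irq_line and sketch_set_irq() are hypothetical names. Each source owns one bit, the line is high while any bit is set, and the line is only re-driven when its level actually changes.

```c
#include <assert.h>

#define SKETCH_MAX_SOURCES 32

/* Hypothetical model of one shared, level-triggered IRQ line. */
struct sketch_irq_line {
    unsigned long asserted; /* bit n set: source n is holding the line high */
    unsigned injections;    /* number of times the line level changed */
};

/*
 * Record that 'source' now drives the line at 'level' (0 or non-zero).
 * Returns the resulting line level, which is the OR over all sources.
 */
int sketch_set_irq(struct sketch_irq_line *line, unsigned source, int level)
{
    int before, after;

    assert(source < SKETCH_MAX_SOURCES);
    before = line->asserted != 0;
    if (level)
        line->asserted |= 1UL << source;
    else
        line->asserted &= ~(1UL << source);
    after = line->asserted != 0;
    if (before != after)
        line->injections++; /* only here would the PIC/IOAPIC be updated */
    return after;
}
```

With two sources sharing a line, the second assertion causes no extra injection, and the line stays high after the first source deasserts until the second one does too. This is the behaviour the per-source state tracking in the patch is meant to make possible.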
Re: [ Re: unhandled vm exit: 0x80000021 vcpu_id 0]
Hi Pier The only thing I can tell is that the guest seems completely messed up... It ran into some non-code segment. unhandled vm exit: 0x8021 vcpu_id 0 rax 0007 rbx 1490 rcx rdx 19a0 rsi rdi rsp 0080 rbp 96bf r8 r9 r10 r11 r12 r13 r14 r15 rip 002a rflags 00023202 cs 14a2 (/ p 0 dpl 0 db 0 s 0 type 9 l 0 g 0 avl 0) ds 19a0 (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) es 1a31 (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) ss 1a29 (/ p 0 dpl 0 db 0 s 0 type 1 l 0 g 0 avl 0) Segments may be messed up... fs (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) gs (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) tr 0058 (00201ffa/ p 1 dpl 0 db 0 s 0 type b l 0 g 0 avl 0) ldt (/ p 0 dpl 0 db 0 s 0 type 0 l 0 g 0 avl 0) gdt 20/1dd8 idt 201df0/188 cr0 8019 cr2 0 cr3 144 cr4 0 cr8 0 efer 0 CR0.PE set (sorry for the wrong decode before...), CR0.PG set. The guest is in protected mode. But CR4 is wrong, at least CR4.PAE and CR4.VMXE should be set. code: 00 f0 53 ff 00 f0 53 ff 00 f0 a5 fe 00 f0 87 e9 00 f0 53 ff -- 00 f0 53 ff 00 f0 53 ff 00 f0 53 ff 00 f0 57 ef 00 f0 53 ff 00 f0 3a 83 00 c0 4d f8 00 f0 Seems like meaningless code... Well, I still don't know what the checkpoint did to cause this... At least it seems to be more than an emulation bug. Does anybody else have an idea?... -- regards Yang, Sheng
Re: [PATCH] qemu: qemu_fopen_fd: differentiate between reader and writer user
Anthony Liguori wrote: Also, having checks and the read and write functions to determine if the is_write flag is set along with whether buf_index 0 that fprintf()'d and aborted would be good for debugging. I have a patch that does this along with fixing a few other bugs. It's attached. Regards, Anthony Liguori Regards, Anthony Liguori diff --git a/hw/hw.h b/hw/hw.h index e130355..8edd788 100644 --- a/hw/hw.h +++ b/hw/hw.h @@ -11,8 +11,8 @@ * The pos argument can be ignored if the file is only being used for * streaming. The handler should try to write all of the data it can. */ -typedef void (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf, - int64_t pos, int size); +typedef int (QEMUFilePutBufferFunc)(void *opaque, const uint8_t *buf, +int64_t pos, int size); /* Read a chunk of data from a file at the given position. The pos argument * can be ignored if the file is only be used for streaming. The number of @@ -64,6 +64,7 @@ unsigned int qemu_get_be16(QEMUFile *f); unsigned int qemu_get_be32(QEMUFile *f); uint64_t qemu_get_be64(QEMUFile *f); int qemu_file_rate_limit(QEMUFile *f); +int qemu_file_has_error(QEMUFile *f); /* Try to send any outstanding data. 
This function is useful when output is * halted due to rate limiting or EAGAIN errors occur as it can be used to diff --git a/vl.c b/vl.c index 5659fea..d49c648 100644 --- a/vl.c +++ b/vl.c @@ -6197,12 +6197,15 @@ struct QEMUFile { QEMUFileCloseFunc *close; QEMUFileRateLimit *rate_limit; void *opaque; +int is_write; int64_t buf_offset; /* start of buffer when writing, end of buffer when reading */ int buf_index; int buf_size; /* 0 when writing */ uint8_t buf[IO_BUF_SIZE]; + +int has_error; }; typedef struct QEMUFileFD @@ -6211,34 +6214,6 @@ typedef struct QEMUFileFD QEMUFile *file; } QEMUFileFD; -static void fd_put_notify(void *opaque) -{ -QEMUFileFD *s = opaque; - -/* Remove writable callback and do a put notify */ -qemu_set_fd_handler2(s-fd, NULL, NULL, NULL, NULL); -qemu_file_put_notify(s-file); -} - -static void fd_put_buffer(void *opaque, const uint8_t *buf, - int64_t pos, int size) -{ -QEMUFileFD *s = opaque; -ssize_t len; - -do { -len = write(s-fd, buf, size); -} while (len == -1 errno == EINTR); - -if (len == -1) -len = -errno; - -/* When the fd becomes writable again, register a callback to do - * a put notify */ -if (len == -EAGAIN) -qemu_set_fd_handler2(s-fd, NULL, NULL, fd_put_notify, s); -} - static int fd_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) { QEMUFileFD *s = opaque; @@ -6269,7 +6244,7 @@ QEMUFile *qemu_fopen_fd(int fd) return NULL; s-fd = fd; -s-file = qemu_fopen_ops(s, fd_put_buffer, fd_get_buffer, fd_close, NULL); +s-file = qemu_fopen_ops(s, NULL, fd_get_buffer, fd_close, NULL); return s-file; } @@ -6278,12 +6253,13 @@ typedef struct QEMUFileStdio FILE *outfile; } QEMUFileStdio; -static void file_put_buffer(void *opaque, const uint8_t *buf, +static int file_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, int size) { QEMUFileStdio *s = opaque; fseek(s-outfile, pos, SEEK_SET); fwrite(buf, 1, size, s-outfile); +return size; } static int file_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) @@ -6331,11 
+6307,12 @@ typedef struct QEMUFileBdrv int64_t base_offset; } QEMUFileBdrv; -static void bdrv_put_buffer(void *opaque, const uint8_t *buf, -int64_t pos, int size) +static int bdrv_put_buffer(void *opaque, const uint8_t *buf, + int64_t pos, int size) { QEMUFileBdrv *s = opaque; bdrv_pwrite(s-bs, s-base_offset + pos, buf, size); +return size; } static int bdrv_get_buffer(void *opaque, uint8_t *buf, int64_t pos, int size) @@ -6384,18 +6361,29 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc *put_buffer, f-get_buffer = get_buffer; f-close = close; f-rate_limit = rate_limit; +f-is_write = 0; return f; } +int qemu_file_has_error(QEMUFile *f) +{ +return f-has_error; +} + void qemu_fflush(QEMUFile *f) { if (!f-put_buffer) return; -if (f-buf_index 0) { -f-put_buffer(f-opaque, f-buf, f-buf_offset, f-buf_index); -f-buf_offset += f-buf_index; +if (f-is_write f-buf_index 0) { +int len; + + len = f-put_buffer(f-opaque, f-buf, f-buf_offset, f-buf_index); + if (len 0) + f-buf_offset += f-buf_index; + else + f-has_error = 1; f-buf_index = 0; } } @@ -6407,13 +6395,16 @@ static void qemu_fill_buffer(QEMUFile *f) if (!f-get_buffer) return; -len = f-get_buffer(f-opaque, f-buf, f-buf_offset,
Re: [PATCH 1/2] KVM: Handle multiple interrupt sources
- Sheng Yang [EMAIL PROTECTED] wrote: On Saturday 11 October 2008 16:10:51 Amit Shah wrote: From: Sheng Yang [EMAIL PROTECTED] Keep a record of current interrupt state before injecting. Don't assert/deassert repeatedly, so that every caller of kvm_set_irq() can be identified as a separate interrupt source for the IOAPIC/PIC to implement logical OR of level triggered interrupts on one IRQ line. Notice that userspace devices are treated as one device for each IRQ line. The correctness of sharing interrupt for each IRQ line should be ensured by the userspace program (QEmu). [Amit: rebase to kvm.git HEAD] Hi, Amit Thanks for your work! But maybe I missed something. I suppose my later patch can work independently? I think the second patch should solve the whole problem (sorry for replying to the second rather than [0/2], which caused confusion...). Can you check? I'm not sure I understand. Which concern are you talking about? I used the latest patch that you sent and I also verified that it works. Amit
Re: [PATCH 1/2] KVM: Handle multiple interrupt sources
On Monday 13 October 2008 13:06:18 Amit Shah wrote: - Sheng Yang [EMAIL PROTECTED] wrote: On Saturday 11 October 2008 16:10:51 Amit Shah wrote: From: Sheng Yang [EMAIL PROTECTED] Keep a record of current interrupt state before injecting. Don't assert/deassert repeatedly, so that every caller of kvm_set_irq() can be identified as a separate interrupt source for the IOAPIC/PIC to implement logical OR of level triggered interrupts on one IRQ line. Notice that userspace devices are treated as one device for each IRQ line. The correctness of sharing interrupt for each IRQ line should be ensured by the userspace program (QEmu). [Amit: rebase to kvm.git HEAD] Hi, Amit Thanks for your work! But maybe I missed something. I suppose my later patch can work independently? I think the second patch should solve the whole problem (sorry for replying to the second rather than [0/2], which caused confusion...). Can you check? I'm not sure I understand. Which concern are you talking about? I used the latest patch that you sent and I also verified that it works. Well, at least I meant to replace all of my first two patches with my later one... I suppose the second patch (my later one, derived from Avi's suggestion) should work alone without the first one... -- regards Yang, Sheng