Re: [PATCH v2] kvm tools: Support multiple net devices
On Sun, Sep 25, 2011 at 2:11 PM, Sasha Levin levinsasha...@gmail.com wrote: This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] Each of the parameters is optional, and the config defaults to a TAP based networking with a sequential MAC. Signed-off-by: Sasha Levin levinsasha...@gmail.com Seems reasonable. Asias, is v2 OK to merge? -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT
On Mon, Sep 26, 2011 at 12:40:18AM -0400, Kevin O'Connor wrote: On Thu, Sep 22, 2011 at 09:09:49AM +0300, Michael S. Tsirkin wrote: On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote: The code to generate basic SSDT code isn't that difficult (see build_ssdt and src/ssdt-proc.dsl). Is there a compelling reason to patch the DSDT versus just generating the necessary blocks in an SSDT? I don't really care whether the code is in DSDT or SSDT, IMO there isn't much difference between build_ssdt and patching: main reason is build_ssdt uses offsets hardcoded to a specific binary (ssdt_proc and SD_OFFSET_* ) while I used a script to extract offsets. Yes - your script to extract the offsets is nice. If you still have doubts, it might make sense to merge just patch 1 - acpi: generate and parse mixed asl/aml listing - as the first step. With the infrastructure in place it will be easier to discuss the best way to use it. I think we should avoid relying on copy-pasted binary because I see the related ASL code changing in the near future (with multifunction and bridge support among others). Can you expand on this? Would multi-function and bridge support make patching easier than dynamic SSDT generation? Just generic concerns: 1. We are going to have to modify DSL, so binary bytecode would be hard to maintain. Specifically IMO the more work can be done compile-time, the better, both iasl and my script do compile-time checks to keep runtime simple. 2. There are going to be a lot of devices so one new table with all of them would be ok, a table per device would be expensive. I'm a little leary of patching the DSDT - doubly so when the DSDT is creating dummy devices that are then dynamically patched out. A small correction, my specific code only patches out methods, not whole devices. (For example, even with your patch series there are still two devices defined with _ADR 1.) So this is a bug, I think. I am not sure whether we just want ISA and VGA always non-removable - is there some reason that we have hotplug_slot macros for these slots? If no we should just remove hotplug macros for these two slots. Gleb, Marcelo, any preferences? If we want to keep the option of making these slots hotpluggable from qemu, it is still easy to fix the duplication. For example, I could put ACPI_EXTRACT tags in VGA/ISA devices without renaming them. But the easiest way is probably by using Scope. See patch below. It seems more straight-forward to just create the devices that are needed. -Kevin FWIW the acpi spec does mention that it's ok to describe unoccupied slots. Multiple device with the same _ADR relies on unspecified behavior of ACPI. Fix DSDT so we always have a single device per slot. Signed-off-by: Michael S. Tsirkin m...@redhat.com - diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 055202b..305d2e5 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -260,8 +260,8 @@ DefinitionBlock ( } Scope(\_SB.PCI0) { -Device (VGA) { - Name (_ADR, 0x0002) + /* VGA device */ +Scope (S2) { OperationRegion(PCIC, PCI_Config, Zero, 0x4) Field(PCIC, DWordAcc, NoLock, Preserve) { VEND, 32 @@ -282,13 +282,10 @@ DefinitionBlock ( Return (0x00) } } - Method(_RMV) { Return (0x00) } } /* PIIX3 ISA bridge */ -Device (ISA) { -Name (_ADR, 0x0001) -Method(_RMV) { Return (0x00) } +Scope (S1) { /* PIIX PCI to ISA irq remapping */ @@ -486,7 +483,7 @@ DefinitionBlock ( /* PCI IRQs */ Scope(\_SB) { - Field (\_SB.PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) + Field (\_SB.PCI0.S1.P40C, ByteAcc, NoLock, Preserve) { PRQ0, 8, PRQ1, 8, -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [SeaBIOS] [PATCH 0/4] acpi: fix up EJ0 in DSDT
Hi all, defined with _ADR 1.) It seems more straight-forward to just create the devices that are needed. Yes something like this: http://review.coreboot.org/gitweb?p=coreboot.git;a=blob;f=src/arch/x86/boot/acpigen.c;hb=HEAD Thanks Rudolf -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm] Re: Questions about duplicate memory work
On Mon, Sep 26, 2011 at 01:49:18PM +0800, Emmanuel Noobadmin wrote: On 9/25/11, Robin Lee Powell rlpow...@digitalkingdom.org wrote: OK, so I've got a Linux host, and a bunch of Linux VMs. This means that the host *and* all tho VMs do their own disk caches/buffers and do their own swap as well. If I'm not wrong, that's why the recommended and current default in libvirtd is to create storage devices with no caching to remove one layer of duplication. How do you do that? I have my VMs using LVs created on the host as their disks, but I'm open to other methods if there are significant advantages. I've considered turning off swap on the VMs so all the swapping at least happens in *one place*; I dunno if that's best. Not sure it's a good idea. If the VM needs more working memory than you allocated, I think it locks up dead if there is insufficient swap space. At least that appears to be what happened to one of mine. Good to know, thanks. -Robin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
On Sat, Sep 24, 2011 at 12:19 AM, Kevin Wolf kw...@redhat.com wrote: Am 08.09.2011 12:11, schrieb Zhi Yong Wu: Note: 1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario. 2.) When dd command is issued in guest, if its option bs is set to a large value such as bs=1024K, the result speed will slightly bigger than the limits. For these problems, if you have nice thought, pls let us know.:) Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- block.c | 259 --- block.h | 1 - 2 files changed, 248 insertions(+), 12 deletions(-) One general comment: What about synchronous and/or coroutine I/O operations? Do you think they are just not important enough to consider here or were they forgotten? For sync ops, we assume that it will be converse into async mode at some point of future, right? For coroutine I/O, it is introduced in image driver layer, and behind bdrv_aio_readv/writev. I think that we need not consider them, right? Also, do I understand correctly that you're always submitting the whole Right, when the block timer fire, it will flush whole request queue. queue at once? Does this effectively enforce the limit all the time or will it lead to some peaks and then no requests at all for a while until In fact, it only try to submit those enqueued request one by one. If fail to pass the limit, this request will be enqueued again. the average is right again? Yeah, it is possible. Do you better idea? Maybe some documentation on how it all works from a high level perspective would be helpful. diff --git a/block.c b/block.c index cd75183..c08fde8 100644 --- a/block.c +++ b/block.c @@ -30,6 +30,9 @@ #include qemu-objects.h #include qemu-coroutine.h +#include qemu-timer.h +#include block/blk-queue.h + #ifdef CONFIG_BSD #include sys/types.h #include sys/stat.h @@ -72,6 +75,13 @@ static int coroutine_fn bdrv_co_writev_em(BlockDriverState *bs, QEMUIOVector *iov); static int coroutine_fn bdrv_co_flush_em(BlockDriverState *bs); +static bool bdrv_exceed_bps_limits(BlockDriverState *bs, int nb_sectors, + bool is_write, double elapsed_time, uint64_t *wait); +static bool bdrv_exceed_iops_limits(BlockDriverState *bs, bool is_write, + double elapsed_time, uint64_t *wait); +static bool bdrv_exceed_io_limits(BlockDriverState *bs, int nb_sectors, + bool is_write, int64_t *wait); + static QTAILQ_HEAD(, BlockDriverState) bdrv_states = QTAILQ_HEAD_INITIALIZER(bdrv_states); @@ -745,6 +755,11 @@ int bdrv_open(BlockDriverState *bs, const char *filename, int flags, bs-change_cb(bs-change_opaque, CHANGE_MEDIA); } + /* throttling disk I/O limits */ + if (bs-io_limits_enabled) { + bdrv_io_limits_enable(bs); + } + return 0; unlink_and_fail: @@ -783,6 +798,18 @@ void bdrv_close(BlockDriverState *bs) if (bs-change_cb) bs-change_cb(bs-change_opaque, CHANGE_MEDIA); } + + /* throttling disk I/O limits */ + if (bs-block_queue) { + qemu_del_block_queue(bs-block_queue); + bs-block_queue = NULL; + } + + if (bs-block_timer) { + qemu_del_timer(bs-block_timer); + qemu_free_timer(bs-block_timer); + bs-block_timer = NULL; + } Why not io_limits_disable() instead of copying the code here? Good point, thanks. } void bdrv_close_all(void) @@ -2341,16 +2368,48 @@ BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num, BlockDriverCompletionFunc *cb, void *opaque) { BlockDriver *drv = bs-drv; - + BlockDriverAIOCB *ret; + int64_t wait_time = -1; +printf(sector_num=%ld, nb_sectors=%d\n, sector_num, nb_sectors); Debugging leftover (more of them follow, won't comment on each one) Removed. trace_bdrv_aio_readv(bs, sector_num, nb_sectors, opaque); - if (!drv) - return NULL; - if (bdrv_check_request(bs, sector_num, nb_sectors)) + if (!drv || bdrv_check_request(bs, sector_num, nb_sectors)) { return NULL; + } This part is unrelated. Have changed it to original. + + /* throttling disk read I/O */ + if (bs-io_limits_enabled) { + if (bdrv_exceed_io_limits(bs, nb_sectors, false, wait_time)) { + ret = qemu_block_queue_enqueue(bs-block_queue, bs, bdrv_aio_readv, + sector_num, qiov, nb_sectors, cb, opaque); + printf(wait_time=%ld\n, wait_time); + if (wait_time != -1) { + printf(reset block timer\n); + qemu_mod_timer(bs-block_timer, + wait_time + qemu_get_clock_ns(vm_clock)); + } + + if (ret) { + printf(ori ret is not
Re: [Qemu-devel] [PATCH v8 1/4] block: add the block queue support
On Fri, Sep 23, 2011 at 11:32 PM, Kevin Wolf kw...@redhat.com wrote: Am 08.09.2011 12:11, schrieb Zhi Yong Wu: Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com --- Makefile.objs | 2 +- block/blk-queue.c | 201 + block/blk-queue.h | 59 block_int.h | 27 +++ 4 files changed, 288 insertions(+), 1 deletions(-) create mode 100644 block/blk-queue.c create mode 100644 block/blk-queue.h diff --git a/Makefile.objs b/Makefile.objs index 26b885b..5dcf456 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -33,7 +33,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o -block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o +block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o blk-queue.o block-nested-$(CONFIG_WIN32) += raw-win32.o block-nested-$(CONFIG_POSIX) += raw-posix.o block-nested-$(CONFIG_CURL) += curl.o diff --git a/block/blk-queue.c b/block/blk-queue.c new file mode 100644 index 000..adef497 --- /dev/null +++ b/block/blk-queue.c @@ -0,0 +1,201 @@ +/* + * QEMU System Emulator queue definition for block layer + * + * Copyright (c) IBM, Corp. 2011 + * + * Authors: + * Zhi Yong Wu wu...@linux.vnet.ibm.com + * Stefan Hajnoczi stefa...@linux.vnet.ibm.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include block_int.h +#include block/blk-queue.h +#include qemu-common.h + +/* The APIs for block request queue on qemu block layer. + */ + +struct BlockQueueAIOCB { + BlockDriverAIOCB common; + QTAILQ_ENTRY(BlockQueueAIOCB) entry; + BlockRequestHandler *handler; + BlockDriverAIOCB *real_acb; + + int64_t sector_num; + QEMUIOVector *qiov; + int nb_sectors; +}; The idea is that each request is first queued on the QTAILQ, and at some point it's removed from the queue and gets a real_acb. But it never has both at the same time. Correct? NO. if block I/O throttling is enabled and I/O rate at runtime exceed this limits, this request will be enqueued. It represents the whole lifecycle of one enqueued request. Can we have the basic principle of operation spelled out as a comment somewhere near the top of the file? OK, i will. + +typedef struct BlockQueueAIOCB BlockQueueAIOCB; + +struct BlockQueue { + QTAILQ_HEAD(requests, BlockQueueAIOCB) requests; + bool req_failed; + bool flushing; +}; I find req_failed pretty confusing. Needs documentation at least, but most probably also a better name. OK. request_has_failed? + +static void qemu_block_queue_dequeue(BlockQueue *queue, + BlockQueueAIOCB *request) +{ + BlockQueueAIOCB *req; + + assert(queue); + while (!QTAILQ_EMPTY(queue-requests)) { + req = QTAILQ_FIRST(queue-requests); + if (req == request) { + QTAILQ_REMOVE(queue-requests, req, entry); + break; + } + } +} Is it just me or is this an endless loop if the request isn't the first element in the list? queue-requests is only used to store requests which exceed the limits. Why is the request not the first evlement? + +static void qemu_block_queue_cancel(BlockDriverAIOCB *acb) +{ + BlockQueueAIOCB *request = container_of(acb, BlockQueueAIOCB, common); + if (request-real_acb) { + bdrv_aio_cancel(request-real_acb); + } else { + assert(request-common.bs-block_queue); + qemu_block_queue_dequeue(request-common.bs-block_queue, + request); + } + + qemu_aio_release(request); +} + +static AIOPool
Re: [kvm] Re: Questions about duplicate memory work
On 9/26/11, Robin Lee Powell rlpow...@digitalkingdom.org wrote: On Mon, Sep 26, 2011 at 01:49:18PM +0800, Emmanuel Noobadmin wrote: On 9/25/11, Robin Lee Powell rlpow...@digitalkingdom.org wrote: OK, so I've got a Linux host, and a bunch of Linux VMs. This means that the host *and* all tho VMs do their own disk caches/buffers and do their own swap as well. If I'm not wrong, that's why the recommended and current default in libvirtd is to create storage devices with no caching to remove one layer of duplication. How do you do that? I have my VMs using LVs created on the host as their disks, but I'm open to other methods if there are significant advantages. It's unrelated to what you're actually using as the disks, whether file or block devices like LVs. I think it just makes KVM tell the host not to cache I/O done on the storage device. To do so just use the option cache=none when specify the storage. e.g. from http://www.linux-kvm.org/page/Tuning_KVM qemu -drive file=/dev/mapper/ImagesVolumeGroup-Guest1,cache=none,if=virtio or edit the cache attribute in the libvirt domain XML file if you're using that. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v8 3/4] block: add block timer and throttling algorithm
On Tue, Sep 20, 2011 at 8:34 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Mon, Sep 19, 2011 at 05:55:41PM +0800, Zhi Yong Wu wrote: On Wed, Sep 14, 2011 at 6:50 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Tue, Sep 13, 2011 at 11:09:46AM +0800, Zhi Yong Wu wrote: On Fri, Sep 9, 2011 at 10:44 PM, Marcelo Tosatti mtosa...@redhat.com wrote: On Thu, Sep 08, 2011 at 06:11:07PM +0800, Zhi Yong Wu wrote: Note: 1.) When bps/iops limits are specified to a small value such as 511 bytes/s, this VM will hang up. We are considering how to handle this senario. You can increase the length of the slice, if the request is larger than slice_time * bps_limit. Yeah, but it is a challenge for how to increase it. Do you have some nice idea? If the queue is empty, and the request being processed does not fit the queue, increase the slice so that the request fits. Sorry for late reply. actually, do you think that this scenario is meaningful for the user? Since we implement this, if the user limits the bps below 512 bytes/second, the VM can also not run every task. Can you let us know why we need to make such effort? It would be good to handle request larger than the slice. It is not strictly necessary, but in case its not handled, a minimum should be in place, to reflect maximum request size known. Being able to specify something which crashes is not acceptable. HI, Marcelo, any comments? I have post the implementation based on your suggestions -- Regards, Zhi Yong Wu -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu: Fix inject-nmi
On 09/25/2011 08:22 PM, Jan Kiszka wrote: On 2011-09-25 16:07, Avi Kivity wrote: On 09/23/2011 12:31 PM, Lai Jiangshan wrote: Moreover: wrong indention. You know that this won't work for qemu-kvm with in-kernel irqchip? You may want to provide a patch for that tree, emulating the unavailable LINT1 injection via testing the APIC configration and then raising an NMI as before if it is accepted. It works in my box but the NMI is not injected through the in-kernel irqchip, I will implement it as you suggested. Somewhat hacky; isn't it better to test LINT1 in the kernel (and redefine the KVM_NMI ioctl as toggle LINT1)? KVM_NMI is required for user space IRQ chip as well. We could define KVM_NMI as edging the core NMI input if !irqchip_in_kernel, and toggling LINT1 otherwise. Hardly nice though. The current KVM_NMI with irqchip_in_kernel is not meaningful, since it doesn't obey the rules of any NMI source. Introducing some KVM_SET_LINT1 is an option though. But emulating it for the NMI button on older kernels sounds worthwhile nevertheless. Perhaps this is the best option to avoid confusion. -- I have a truly marvellous patch that fixes the bug which this signature is too narrow to contain. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote: Based on the discussions over the last couple of weeks I have updated the device fd file layout proposal and tried to specify it a bit more formally. === 1. Overview This specification describes the layout of device files used in the context of vfio, which gives user space direct access to I/O devices that have been bound to vfio. When a device fd is opened and read, offset 0x0 contains a fixed sized header followed by a number of variable length records that describe different characteristics of the device-- addressable regions, interrupts, etc. 0x0 +-+-+ | magic | u32 // identifies this as a vfio device file +---+ and identifies the type of bus | version | u32 // specifies the version of this +---+ | flags | u32 // encodes any flags +---+ | dev info record 0| |type | u32 // type of record |rec_len| u32 // length in bytes of record | | (including record header) |flags | u32 // type specific flags |...content... | // record content, which could +---+ // include sub-records | dev info record 1| +---+ | dev info record N| +---+ I really should have chimed in on this earlier, but I've been very busy. Um, not to put too fine a point on it, this is madness. Yes, it's very flexible and can thereby cover a very wide range of cases. But it's much, much too complex. Userspace has to parse a complex, multilayered data structure, with variable length elements just to get an address at which to do IO. I can pretty much guarantee that if we went with this, most userspace programs using this interface would just ignore this metadata and directly map the offsets at which they happen to know the kernel will put things for the type of device they care about. _At least_ for PCI, I think the original VFIO layout of each BAR at a fixed, well known offset is much better. Despite its limitations, just advertising a device type ID which describes one of a few fixed layouts would be preferable to this. I'm still hoping, that we can do a bit better than that. But we should try really hard to at the very least force the metadata into a simple array of resources each with a fixed size record describing it, even if it means some space wastage with occasionally-used fields. Anything more complex than that and userspace is just never going to use it properly. -- David Gibson| I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Weird console issue.
I'm running qemu-kvm on Fedora, with Fedora VMs, normally via libvirt/virsh. I have serial consoles setup, like so: serial type='pty' target port='0'/ /serial console type='pty' target type='serial' port='0'/ /console $ sudo virsh ttyconsole stodi /dev/pts/3 With both PuTTY and shellinabox (web-based shell) to the master, connecting via either virsh console *or* simple screen /dev/pts/3, with agetty running on the VM like so: sudo /sbin/agetty 115200 ttyS0, I get the following problem: Everything seems to work until it asks for a password. As soon as that happens, any character I type causes it to respond as though I'd hit enter. This continues until /bin/login completes and a new getty starts, at which point it prints the right headers and I can enter my userid, and then it asks for a password and it breaks again in the same way. Obviously, this makes the console not so useful. Any help or suggestions would be welcome. -Robin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm] Re: Questions about duplicate memory work
On 09/26/2011 11:15 AM, Emmanuel Noobadmin wrote: This means that the host *and* all tho VMs do their own disk caches/buffers and do their own swap as well. If I'm not wrong, that's why the recommended and current default in libvirtd is to create storage devices with no caching to remove one layer of duplication. How do you do that? or edit the cache attribute in the libvirt domain XML file if you're using that. Just saw it: http://libvirt.org/formatdomain.html#elementsDisks disk ... driver ... cache=none ... / ... / /disk Good to know. -- RMA. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
KVM call agenda for September 26
Hi Please send in any agenda items you are interested in covering. Thanks, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: KVM call agenda for September 26
Juan Quintela quint...@redhat.com wrote: Hi I should learn to read the calendar Call is for September 27th, of course. Sorry for any inconvenience. Please send in any agenda items you are interested in covering. Thanks, Juan. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html #secure method=pgpmime mode=sign -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Am 26.09.2011 um 09:51 schrieb David Gibson da...@gibson.dropbear.id.au: On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote: Based on the discussions over the last couple of weeks I have updated the device fd file layout proposal and tried to specify it a bit more formally. === 1. Overview This specification describes the layout of device files used in the context of vfio, which gives user space direct access to I/O devices that have been bound to vfio. When a device fd is opened and read, offset 0x0 contains a fixed sized header followed by a number of variable length records that describe different characteristics of the device-- addressable regions, interrupts, etc. 0x0 +-+-+ | magic | u32 // identifies this as a vfio device file +---+ and identifies the type of bus | version | u32 // specifies the version of this +---+ | flags | u32 // encodes any flags +---+ | dev info record 0| |type | u32 // type of record |rec_len| u32 // length in bytes of record | | (including record header) |flags | u32 // type specific flags |...content... | // record content, which could +---+ // include sub-records | dev info record 1| +---+ | dev info record N| +---+ I really should have chimed in on this earlier, but I've been very busy. Um, not to put too fine a point on it, this is madness. Yes, it's very flexible and can thereby cover a very wide range of cases. But it's much, much too complex. Userspace has to parse a complex, multilayered data structure, with variable length elements just to get an address at which to do IO. I can pretty much guarantee that if we went with this, most userspace programs using this interface would just ignore this metadata and directly map the offsets at which they happen to know the kernel will put things for the type of device they care about. _At least_ for PCI, I think the original VFIO layout of each BAR at a fixed, well known offset is much better. Despite its limitations, just advertising a device type ID which describes one of a few fixed layouts would be preferable to this. I'm still hoping, that we can do a bit better than that. But we should try really hard to at the very least force the metadata into a simple array of resources each with a fixed size record describing it, even if it means some space wastage with occasionally-used fields. Anything more complex than that and userspace is just never going to use it properly. We will have 2 different types of user space. One wants to be as generic as possible and needs all this dynamic information. QEMU would fall into this category. The other one is device specific drivers in user space. Here hardcoding might make sense. For the generic interface, we need something that us as verbose as possible and lets us enumerate all the device properties, so we can properly map and forward them to the guest. However, nothing keeps us from mapping BARs always at static offsets into the file. If you don't need the generic info, then you don't need it. Also, if you can come up with an interface that does not have variable length descriptors but is still able to export all the required generic information, please send a proposal to the list :) Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] nVMX: Add KVM_REQ_IMMEDIATE_EXIT
On Sun, Sep 25, 2011 at 11:13:06AM +0300, Nadav Har'El wrote: On Fri, Sep 23, 2011, Marcelo Tosatti wrote about Re: [PATCH 1/2] nVMX: Add KVM_REQ_IMMEDIATE_EXIT: On Thu, Sep 22, 2011 at 01:52:56PM +0300, Nadav Har'El wrote: This patch adds a new vcpu-requests bit, KVM_REQ_IMMEDIATE_EXIT. This bit requests that when next entering the guest, we should run it only for as little as possible, and exit again. We use this new option in nested VMX: When L1 launches L2, but L0 wishes L1 ... @@ -5647,6 +5648,8 @@ static int vcpu_enter_guest(struct kvm_v } if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu)) record_steal_time(vcpu); + req_immediate_exit = + kvm_check_request(KVM_REQ_IMMEDIATE_EXIT, vcpu); ... The immediate exit information can be lost if entry decides to bail out. You can do req_immediate_exit = kvm_check_request(KVM_REQ_IMMEDIATE_EXIT) after preempt_disable() and then transfer back the bit in the bail out case in if (vcpu-mode == EXITING_GUEST_MODE || vcpu-requests Thanks. But thinking about this a bit, it seems to me that in my case *losing* this bit on a canceled entry is the correct thing to do, as turning on this bit was decided in the injection phase (in enable_irq_window()), and next time, if the reason to turn on this bit still exists (i.e., L0 has something to inject to L1, but L2 needs to run), we will turn it on again. Correct, the loss is irrelevant. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT
On Mon, Sep 26, 2011 at 10:04:13AM +0300, Michael S. Tsirkin wrote: On Mon, Sep 26, 2011 at 12:40:18AM -0400, Kevin O'Connor wrote: On Thu, Sep 22, 2011 at 09:09:49AM +0300, Michael S. Tsirkin wrote: On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote: The code to generate basic SSDT code isn't that difficult (see build_ssdt and src/ssdt-proc.dsl). Is there a compelling reason to patch the DSDT versus just generating the necessary blocks in an SSDT? I don't really care whether the code is in DSDT or SSDT, IMO there isn't much difference between build_ssdt and patching: main reason is build_ssdt uses offsets hardcoded to a specific binary (ssdt_proc and SD_OFFSET_* ) while I used a script to extract offsets. Yes - your script to extract the offsets is nice. If you still have doubts, it might make sense to merge just patch 1 - acpi: generate and parse mixed asl/aml listing - as the first step. With the infrastructure in place it will be easier to discuss the best way to use it. I think we should avoid relying on copy-pasted binary because I see the related ASL code changing in the near future (with multifunction and bridge support among others). Can you expand on this? Would multi-function and bridge support make patching easier than dynamic SSDT generation? Just generic concerns: 1. We are going to have to modify DSL, so binary bytecode would be hard to maintain. Specifically IMO the more work can be done compile-time, the better, both iasl and my script do compile-time checks to keep runtime simple. 2. There are going to be a lot of devices so one new table with all of them would be ok, a table per device would be expensive. I'm a little leary of patching the DSDT - doubly so when the DSDT is creating dummy devices that are then dynamically patched out. A small correction, my specific code only patches out methods, not whole devices. And note only the name of the method is changed to something which the guests do not identify, its not as if the entire method is added or removed (although IMO it would be interesting to patch out entirely with NOPs). (For example, even with your patch series there are still two devices defined with _ADR 1.) So this is a bug, I think. I am not sure whether we just want ISA and VGA always non-removable - is there some reason that we have hotplug_slot macros for these slots? Just so its a nice linear range. If no we should just remove hotplug macros for these two slots. Gleb, Marcelo, any preferences? Currently they are hardcoded as not hotpluggable (as can be noted by the RMV macros), but its nice to retrieve that information from QEMU instead. If we want to keep the option of making these slots hotpluggable from qemu, it is still easy to fix the duplication. For example, I could put ACPI_EXTRACT tags in VGA/ISA devices without renaming them. But the easiest way is probably by using Scope. See patch below. It seems more straight-forward to just create the devices that are needed. -Kevin FWIW the acpi spec does mention that it's ok to describe unoccupied slots. Multiple device with the same _ADR relies on unspecified behavior of ACPI. Fix DSDT so we always have a single device per slot. Signed-off-by: Michael S. Tsirkin m...@redhat.com - diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 055202b..305d2e5 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -260,8 +260,8 @@ DefinitionBlock ( } Scope(\_SB.PCI0) { -Device (VGA) { - Name (_ADR, 0x0002) + /* VGA device */ +Scope (S2) { OperationRegion(PCIC, PCI_Config, Zero, 0x4) Field(PCIC, DWordAcc, NoLock, Preserve) { VEND, 32 @@ -282,13 +282,10 @@ DefinitionBlock ( Return (0x00) } } - Method(_RMV) { Return (0x00) } } /* PIIX3 ISA bridge */ -Device (ISA) { -Name (_ADR, 0x0001) -Method(_RMV) { Return (0x00) } +Scope (S1) { /* PIIX PCI to ISA irq remapping */ @@ -486,7 +483,7 @@ DefinitionBlock ( /* PCI IRQs */ Scope(\_SB) { - Field (\_SB.PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) + Field (\_SB.PCI0.S1.P40C, ByteAcc, NoLock, Preserve) { PRQ0, 8, PRQ1, 8, -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GIT PULL] KVM updates for 3.1-rc7
Linus, please pull from: git://github.com/avikivity/kvm.git kvm-updates/3.1 to receive an emulator fix affecting shld/shrd and a softmmu issue on i386 hosts. Avi Kivity (1): KVM: x86 emulator: fix Src2CL decode Zhao Jin (1): KVM: MMU: fix incorrect return of spte arch/x86/kvm/emulate.c |2 +- arch/x86/kvm/mmu.c |3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm tools: Support multiple net devices
On 09/25/2011 07:11 PM, Sasha Levin wrote: This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] This syntax is actually, no? --network/-n mode=tap,guest_ip=x.x.x.x not --network/-n mode=tag guest_ip=x.x.x.x This works for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1 This does not work for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1 Each of the parameters is optional, and the config defaults to a TAP based networking with a sequential MAC. Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/builtin-run.c | 146 tools/kvm/virtio/net.c | 155 +- 2 files changed, 191 insertions(+), 110 deletions(-) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 28dc95a..070b1b6 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -87,9 +87,12 @@ static bool sdl; static bool balloon; static bool using_rootfs; static bool custom_rootfs; +static bool no_net; extern bool ioport_debug; extern int active_console; extern int debug_iodelay; +struct virtio_net_parameters *net_params; +int num_net_devices; bool do_debug_print = false; @@ -182,6 +185,89 @@ static int tty_parser(const struct option *opt, const char *arg, int unset) return 0; } +static inline void str_to_mac(const char *str, char *mac) +{ + sscanf(str, %hhx:%hhx:%hhx:%hhx:%hhx:%hhx, + mac, mac+1, mac+2, mac+3, mac+4, mac+5); +} +static int set_net_param(struct virtio_net_parameters *p, const char *param, + const char *val) +{ + if (strcmp(param, guest_mac) == 0) { + str_to_mac(val, p-guest_mac); + } else if (strcmp(param, mode) == 0) { + if (!strncmp(val, user, 4)) { + int i; + + for (i = 0; i num_net_devices; i++) + if (net_params[i].mode == NET_MODE_USER) + die(Only one usermode network device allowed at a time); + p-mode = NET_MODE_USER; + } else if (!strncmp(val, tap, 3)) { + p-mode = NET_MODE_TAP; + } else if (!strncmp(val, none, 4)) { + no_net = 1; + return -1; + } else + die(Unkown network mode %s, please use user, tap or none, network); + } else if (strcmp(param, script) == 0) { + p-script = val; + } else if (strcmp(param, guest_ip) == 0) { + p-guest_ip = val; + } else if (strcmp(param, host_ip) == 0) { + p-host_ip = val; + } + + return 0; +} + +static int netdev_parser(const struct option *opt, const char *arg, int unset) +{ + struct virtio_net_parameters p; + char *buf, *cmd = NULL, *cur = NULL; + bool on_cmd = true; + + if (arg) { + buf = strdup(arg); + if (buf == NULL) + die(Failed allocating new net buffer); + cur = strtok(buf, ,=); + } + + p = (struct virtio_net_parameters) { + .guest_ip = DEFAULT_GUEST_ADDR, + .host_ip= DEFAULT_HOST_ADDR, + .script = DEFAULT_SCRIPT, + .mode = NET_MODE_TAP, + }; + + str_to_mac(DEFAULT_GUEST_MAC, p.guest_mac); + p.guest_mac[5] += num_net_devices; + + while (cur) { + if (on_cmd) { + cmd = cur; + } else { + if (set_net_param(p, cmd, cur) 0) + goto done; + } + on_cmd = !on_cmd; + + cur = strtok(NULL, ,=); + }; + + num_net_devices++; + + net_params = realloc(net_params, num_net_devices * sizeof(*net_params)); + if (net_params == NULL) + die(Failed adding new network device); + + net_params[num_net_devices - 1] = p; + +done: + return 0; +} + static int shmem_parser(const struct option *opt, const char *arg, int unset) { const u64 default_size = SHMEM_DEFAULT_SIZE; @@ -339,18 +425,9 @@ static const struct option options[] = { Kernel command line arguments), OPT_GROUP(Networking options:), - OPT_STRING('n', network, network, user, tap, none, - Network to use), - OPT_STRING('\0', host-ip, host_ip, a.b.c.d, - Assign this address to the host side networking), - OPT_STRING('\0',
Re: [PATCH v2] kvm tools: Support multiple net devices
On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote: On 09/25/2011 07:11 PM, Sasha Levin wrote: This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] This syntax is actually, no? --network/-n mode=tap,guest_ip=x.x.x.x not --network/-n mode=tag guest_ip=x.x.x.x This works for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1 This does not work for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1 Yes, with ',' ofcourse :) -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm tools: Support multiple net devices
On 09/26/2011 07:59 PM, Sasha Levin wrote: On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote: On 09/25/2011 07:11 PM, Sasha Levin wrote: This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] This syntax is actually, no? --network/-n mode=tap,guest_ip=x.x.x.x not --network/-n mode=tag guest_ip=x.x.x.x This works for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1 This does not work for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1 Yes, with ',' ofcourse :) Sasha, Can you drop these []? There are way too many []. Although [] tells user this option is optional. --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] Does this look ok? --network/-n mode=tap|user|none,guest_ip=x.x.x.x,host_ip=x.x.x.x,guest_mac=xx:xx:xx:xx:xx:xx,script=script.sh -- Asias He -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm tools: Support multiple net devices
On Mon, 2011-09-26 at 20:11 +0800, Asias He wrote: On 09/26/2011 07:59 PM, Sasha Levin wrote: On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote: On 09/25/2011 07:11 PM, Sasha Levin wrote: This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] This syntax is actually, no? --network/-n mode=tap,guest_ip=x.x.x.x not --network/-n mode=tag guest_ip=x.x.x.x This works for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1 This does not work for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1 Yes, with ',' ofcourse :) Sasha, Can you drop these []? There are way too many []. Although [] tells user this option is optional. --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] Does this look ok? --network/-n mode=tap|user|none,guest_ip=x.x.x.x,host_ip=x.x.x.x,guest_mac=xx:xx:xx:xx:xx:xx,script=script.sh We really need them to specify they are optional. How about this: --network/-n [mode=tap/user/none][,guest_ip=ip][,host_ip=ip][,guest_mac=mac][,script=file] -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm tools: Support multiple net devices
On 09/26/2011 08:14 PM, Sasha Levin wrote: On Mon, 2011-09-26 at 20:11 +0800, Asias He wrote: On 09/26/2011 07:59 PM, Sasha Levin wrote: On Mon, 2011-09-26 at 19:57 +0800, Asias He wrote: On 09/25/2011 07:11 PM, Sasha Levin wrote: This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] This syntax is actually, no? --network/-n mode=tap,guest_ip=x.x.x.x not --network/-n mode=tag guest_ip=x.x.x.x This works for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap,host_ip=192.168.100.1 -n mode=tap,host_ip=192.168.200.1 This does not work for me: $ sudo ./kvm run -d sid.img -p root=/dev/vda1 -k bzImage \ -n mode=tap host_ip=192.168.100.1 -n mode=tap host_ip=192.168.200.1 Yes, with ',' ofcourse :) Sasha, Can you drop these []? There are way too many []. Although [] tells user this option is optional. --network/-n [mode=[tap/user/none]] [guest_ip=[guest ip]] [host_ip= [host_ip]] [guest_mac=[guest_mac]] [script=[script]] Does this look ok? --network/-n mode=tap|user|none,guest_ip=x.x.x.x,host_ip=x.x.x.x,guest_mac=xx:xx:xx:xx:xx:xx,script=script.sh We really need them to specify they are optional. How about this: --network/-n [mode=tap/user/none][,guest_ip=ip][,host_ip=ip][,guest_mac=mac][,script=file] OK. -- Asias He -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virt.virt_env_process: Abstract screenshot production
Hi, vm.screendump() doesn't have parameter 'debug'. So you should either add debug parameter to kvm_vm.py or remove this parameter (and perhaps add debug=False into kvm_vm.py). Regards, Lukáš Dne 24.9.2011 01:27, Lucas Meneghel Rodrigues napsal(a): In order to ease work with other virtualization types, make virt_env_process to call vm.screendump() instead of vm.monitor.screendump(). Signed-off-by: Lucas Meneghel Rodriguesl...@redhat.com --- client/virt/virt_env_process.py |6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/client/virt/virt_env_process.py b/client/virt/virt_env_process.py index 789fa01..a2e 100644 --- a/client/virt/virt_env_process.py +++ b/client/virt/virt_env_process.py @@ -110,7 +110,7 @@ def preprocess_vm(test, params, env, name): scrdump_filename = os.path.join(test.debugdir, pre_%s.ppm % name) try: if vm.monitor and params.get(take_regular_screendumps) == yes: -vm.monitor.screendump(scrdump_filename, debug=False) +vm.screendump(scrdump_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) @@ -151,7 +151,7 @@ def postprocess_vm(test, params, env, name): scrdump_filename = os.path.join(test.debugdir, post_%s.ppm % name) try: if vm.monitor and params.get(take_regular_screenshots) == yes: -vm.monitor.screendump(scrdump_filename, debug=False) +vm.screendump(scrdump_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) @@ -460,7 +460,7 @@ def _take_screendumps(test, params, env): if not vm.is_alive(): continue try: -vm.monitor.screendump(filename=temp_filename, debug=False) +vm.screendump(filename=temp_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) continue -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virt.virt_env_process: Abstract screenshot production
On 09/26/2011 09:27 AM, Lukáš Doktor wrote: Hi, vm.screendump() doesn't have parameter 'debug'. My fault, the screendump method on both qmp and human monitors does take this parameter, and since the implementation on virt_env_process was using the monitor method directly, I forgot to add the param to screendump. It's fixed now. debug=True by default, the only place where it is False is during screendump thread (to avoid polluting the logs). https://github.com/autotest/autotest/commit/49b1d9b65ab0061aaf631c19620987bc59592af6 So you should either add debug parameter to kvm_vm.py or remove this parameter (and perhaps add debug=False into kvm_vm.py). Regards, Lukáš Dne 24.9.2011 01:27, Lucas Meneghel Rodrigues napsal(a): In order to ease work with other virtualization types, make virt_env_process to call vm.screendump() instead of vm.monitor.screendump(). Signed-off-by: Lucas Meneghel Rodriguesl...@redhat.com --- client/virt/virt_env_process.py | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/client/virt/virt_env_process.py b/client/virt/virt_env_process.py index 789fa01..a2e 100644 --- a/client/virt/virt_env_process.py +++ b/client/virt/virt_env_process.py @@ -110,7 +110,7 @@ def preprocess_vm(test, params, env, name): scrdump_filename = os.path.join(test.debugdir, pre_%s.ppm % name) try: if vm.monitor and params.get(take_regular_screendumps) == yes: - vm.monitor.screendump(scrdump_filename, debug=False) + vm.screendump(scrdump_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) @@ -151,7 +151,7 @@ def postprocess_vm(test, params, env, name): scrdump_filename = os.path.join(test.debugdir, post_%s.ppm % name) try: if vm.monitor and params.get(take_regular_screenshots) == yes: - vm.monitor.screendump(scrdump_filename, debug=False) + vm.screendump(scrdump_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) @@ -460,7 +460,7 @@ def _take_screendumps(test, params, env): if not vm.is_alive(): continue try: - vm.monitor.screendump(filename=temp_filename, debug=False) + vm.screendump(filename=temp_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) continue -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT
On Mon, Sep 26, 2011 at 08:36:41AM -0300, Marcelo Tosatti wrote: If no we should just remove hotplug macros for these two slots. Gleb, Marcelo, any preferences? Currently they are hardcoded as not hotpluggable (as can be noted by the RMV macros), but its nice to retrieve that information from QEMU instead. OK, so you ACK the patch below? If we want to keep the option of making these slots hotpluggable from qemu, it is still easy to fix the duplication. For example, I could put ACPI_EXTRACT tags in VGA/ISA devices without renaming them. But the easiest way is probably by using Scope. See patch below. It seems more straight-forward to just create the devices that are needed. -Kevin FWIW the acpi spec does mention that it's ok to describe unoccupied slots. Multiple device with the same _ADR relies on unspecified behavior of ACPI. Fix DSDT so we always have a single device per slot. Signed-off-by: Michael S. Tsirkin m...@redhat.com - diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 055202b..305d2e5 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -260,8 +260,8 @@ DefinitionBlock ( } Scope(\_SB.PCI0) { -Device (VGA) { - Name (_ADR, 0x0002) + /* VGA device */ +Scope (S2) { OperationRegion(PCIC, PCI_Config, Zero, 0x4) Field(PCIC, DWordAcc, NoLock, Preserve) { VEND, 32 @@ -282,13 +282,10 @@ DefinitionBlock ( Return (0x00) } } - Method(_RMV) { Return (0x00) } } /* PIIX3 ISA bridge */ -Device (ISA) { -Name (_ADR, 0x0001) -Method(_RMV) { Return (0x00) } +Scope (S1) { /* PIIX PCI to ISA irq remapping */ @@ -486,7 +483,7 @@ DefinitionBlock ( /* PCI IRQs */ Scope(\_SB) { - Field (\_SB.PCI0.ISA.P40C, ByteAcc, NoLock, Preserve) + Field (\_SB.PCI0.S1.P40C, ByteAcc, NoLock, Preserve) { PRQ0, 8, PRQ1, 8, -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virt.virt_env_process: Abstract screenshot production
Dne 26.9.2011 15:10, Lucas Meneghel Rodrigues napsal(a): On 09/26/2011 09:27 AM, Lukáš Doktor wrote: Hi, vm.screendump() doesn't have parameter 'debug'. My fault, the screendump method on both qmp and human monitors does take this parameter, and since the implementation on virt_env_process was using the monitor method directly, I forgot to add the param to screendump. It's fixed now. debug=True by default, the only place where it is False is during screendump thread (to avoid polluting the logs). https://github.com/autotest/autotest/commit/49b1d9b65ab0061aaf631c19620987bc59592af6 We used the same fix ;-) Acked-by: Lukáš Doktor ldok...@redhat.com So you should either add debug parameter to kvm_vm.py or remove this parameter (and perhaps add debug=False into kvm_vm.py). Regards, Lukáš Dne 24.9.2011 01:27, Lucas Meneghel Rodrigues napsal(a): In order to ease work with other virtualization types, make virt_env_process to call vm.screendump() instead of vm.monitor.screendump(). Signed-off-by: Lucas Meneghel Rodriguesl...@redhat.com --- client/virt/virt_env_process.py | 6 +++--- 1 files changed, 3 insertions(+), 3 deletions(-) diff --git a/client/virt/virt_env_process.py b/client/virt/virt_env_process.py index 789fa01..a2e 100644 --- a/client/virt/virt_env_process.py +++ b/client/virt/virt_env_process.py @@ -110,7 +110,7 @@ def preprocess_vm(test, params, env, name): scrdump_filename = os.path.join(test.debugdir, pre_%s.ppm % name) try: if vm.monitor and params.get(take_regular_screendumps) == yes: - vm.monitor.screendump(scrdump_filename, debug=False) + vm.screendump(scrdump_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) @@ -151,7 +151,7 @@ def postprocess_vm(test, params, env, name): scrdump_filename = os.path.join(test.debugdir, post_%s.ppm % name) try: if vm.monitor and params.get(take_regular_screenshots) == yes: - vm.monitor.screendump(scrdump_filename, debug=False) + vm.screendump(scrdump_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) @@ -460,7 +460,7 @@ def _take_screendumps(test, params, env): if not vm.is_alive(): continue try: - vm.monitor.screendump(filename=temp_filename, debug=False) + vm.screendump(filename=temp_filename, debug=False) except kvm_monitor.MonitorError, e: logging.warn(e) continue -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] virt.virt_env_process: Abstract screenshot production
On 09/26/2011 10:20 AM, Lukáš Doktor wrote: Dne 26.9.2011 15:10, Lucas Meneghel Rodrigues napsal(a): On 09/26/2011 09:27 AM, Lukáš Doktor wrote: Hi, vm.screendump() doesn't have parameter 'debug'. My fault, the screendump method on both qmp and human monitors does take this parameter, and since the implementation on virt_env_process was using the monitor method directly, I forgot to add the param to screendump. It's fixed now. debug=True by default, the only place where it is False is during screendump thread (to avoid polluting the logs). https://github.com/autotest/autotest/commit/49b1d9b65ab0061aaf631c19620987bc59592af6 We used the same fix ;-) Yup :) Now, please rebase your topic branch, remove the fix and commit it with push --force. The pull request will be updated automatically, I will test it here and will let you know if there are more comments before I merge it with upstream. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] kvm tools: Support multiple net devices
On Mon, 26 Sep 2011, Asias He wrote: $ ./kvm run -n mode=tap [1.490781] registered taskstats version 1 [1.492781] BUG: unable to handle kernel NULL pointer dereference at 001c [1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408 [1.493781] *pde = [1.493781] Oops: [#1] PREEMPT SMP [1.493781] Modules linked in: [1.493781] [1.493781] Pid: 1, comm: swapper Tainted: GW 3.1.0-rc3+ #77 [1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1 [1.493781] EIP is at virtnet_poll+0x16e/0x408 [1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000 [1.493781] ESI: EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64 [1.493781] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 [1.493781] Process swapper (pid: 1, ti=db486000 task=db45 task.ti=db458000) [1.493781] Stack: [1.493781] db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000 0010 [1.493781] 0080 dbcebfe0 db5e1414 db5e1000 fffec005 db5e1414 db906dc0 [1.493781] c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080 012c [1.493781] Call Trace: [1.493781] [c15e4869] net_rx_action+0x8e/0x177 [1.493781] [c1066128] __do_softirq+0xa7/0x158 [1.493781] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.493781] IRQ [1.493781] [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.493781] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.493781] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.493781] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.493781] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.493781] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.493781] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.493781] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.493781] [c1393c86] ? extract_entropy+0x45/0x71 [1.493781] [c1059e35] ? get_parent_ip+0xb/0x31 [1.493781] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.493781] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.493781] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.493781] [c1001159] ? do_one_initcall+0x71/0x114 [1.493781] [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91 [1.493781] [c1a9c7ab] ? kernel_init+0xab/0x11d [1.493781] [c1a9c700] ? start_kernel+0x301/0x301 [1.493781] [c16acfb6] ? kernel_thread_helper+0x6/0xd [1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc 8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2 ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02 [1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP 0068:db487f64 [1.493781] CR2: 001c [1.549772] ---[ end trace 4eaa2a86a8e2da27 ]--- [1.550772] Kernel panic - not syncing: Fatal exception in interrupt [1.551772] Pid: 1, comm: swapper Tainted: G D W 3.1.0-rc3+ #77 [1.553771] Call Trace: [1.553771] [c169ca33] panic+0x58/0x156 [1.554771] [c16a921a] oops_end+0x8c/0x9b [1.555771] [c169c4e7] no_context+0x116/0x120 [1.555771] [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8 [1.557771] [c169c5f6] bad_area_nosemaphore+0xd/0x10 [1.558771] [c16aa4b5] do_page_fault+0x174/0x2fa [1.559770] [c107bad0] ? sched_clock_local+0x10/0x14b [1.560770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.561770] [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7 [1.563770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.564770] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.565770] [c16a89d6] error_code+0x5a/0x60 [1.566769] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.567769] [c14f3236] ? virtnet_poll+0x16e/0x408 [1.567769] [c15e4869] net_rx_action+0x8e/0x177 [1.568769] [c1066128] __do_softirq+0xa7/0x158 [1.569769] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.569769] IRQ [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.570769] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.571769] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.571769] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.572768] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.573768] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.573768] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.574768] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.574768] [c1393c86] ? extract_entropy+0x45/0x71 [1.575768] [c1059e35] ? get_parent_ip+0xb/0x31 [1.576768] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.576768] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.577768] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.578768] [c1001159] ? do_one_initcall+0x71/0x114 [1.578768] [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91 [1.579767] [c1a9c7ab] ? kernel_init+0xab/0x11d [1.580767] [c1a9c700] ? start_kernel+0x301/0x301 [1.581767] [c16acfb6] ? kernel_thread_helper+0x6/0xd [1.582767] Rebooting in 1 seconds.. # KVM session ended normally. This needs fixing before I can apply
Re: [PATCH v2] kvm tools: Support multiple net devices
On 09/26/2011 10:54 PM, Pekka Enberg wrote: On Mon, 26 Sep 2011, Asias He wrote: $ ./kvm run -n mode=tap [1.490781] registered taskstats version 1 [1.492781] BUG: unable to handle kernel NULL pointer dereference at 001c [1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408 [1.493781] *pde = [1.493781] Oops: [#1] PREEMPT SMP [1.493781] Modules linked in: [1.493781] [1.493781] Pid: 1, comm: swapper Tainted: GW 3.1.0-rc3+ #77 [1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1 [1.493781] EIP is at virtnet_poll+0x16e/0x408 [1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000 [1.493781] ESI: EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64 [1.493781] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 [1.493781] Process swapper (pid: 1, ti=db486000 task=db45 task.ti=db458000) [1.493781] Stack: [1.493781] db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000 0010 [1.493781] 0080 dbcebfe0 db5e1414 db5e1000 fffec005 db5e1414 db906dc0 [1.493781] c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080 012c [1.493781] Call Trace: [1.493781] [c15e4869] net_rx_action+0x8e/0x177 [1.493781] [c1066128] __do_softirq+0xa7/0x158 [1.493781] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.493781] IRQ [1.493781] [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.493781] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.493781] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.493781] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.493781] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.493781] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.493781] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.493781] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.493781] [c1393c86] ? extract_entropy+0x45/0x71 [1.493781] [c1059e35] ? get_parent_ip+0xb/0x31 [1.493781] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.493781] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.493781] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.493781] [c1001159] ? do_one_initcall+0x71/0x114 [1.493781] [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91 [1.493781] [c1a9c7ab] ? kernel_init+0xab/0x11d [1.493781] [c1a9c700] ? start_kernel+0x301/0x301 [1.493781] [c16acfb6] ? kernel_thread_helper+0x6/0xd [1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc 8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2 ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02 [1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP 0068:db487f64 [1.493781] CR2: 001c [1.549772] ---[ end trace 4eaa2a86a8e2da27 ]--- [1.550772] Kernel panic - not syncing: Fatal exception in interrupt [1.551772] Pid: 1, comm: swapper Tainted: G D W 3.1.0-rc3+ #77 [1.553771] Call Trace: [1.553771] [c169ca33] panic+0x58/0x156 [1.554771] [c16a921a] oops_end+0x8c/0x9b [1.555771] [c169c4e7] no_context+0x116/0x120 [1.555771] [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8 [1.557771] [c169c5f6] bad_area_nosemaphore+0xd/0x10 [1.558771] [c16aa4b5] do_page_fault+0x174/0x2fa [1.559770] [c107bad0] ? sched_clock_local+0x10/0x14b [1.560770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.561770] [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7 [1.563770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.564770] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.565770] [c16a89d6] error_code+0x5a/0x60 [1.566769] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.567769] [c14f3236] ? virtnet_poll+0x16e/0x408 [1.567769] [c15e4869] net_rx_action+0x8e/0x177 [1.568769] [c1066128] __do_softirq+0xa7/0x158 [1.569769] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.569769] IRQ [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.570769] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.571769] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.571769] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.572768] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.573768] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.573768] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.574768] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.574768] [c1393c86] ? extract_entropy+0x45/0x71 [1.575768] [c1059e35] ? get_parent_ip+0xb/0x31 [1.576768] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.576768] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.577768] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.578768] [c1001159] ? do_one_initcall+0x71/0x114 [1.578768] [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91 [1.579767] [c1a9c7ab] ? kernel_init+0xab/0x11d [1.580767] [c1a9c700] ? start_kernel+0x301/0x301 [1.581767] [c16acfb6] ?
Re: [PATCH v2] kvm tools: Support multiple net devices
On 09/26/2011 05:54 PM, Pekka Enberg wrote: On Mon, 26 Sep 2011, Asias He wrote: $ ./kvm run -n mode=tap [1.490781] registered taskstats version 1 [1.492781] BUG: unable to handle kernel NULL pointer dereference at 001c [1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408 [1.493781] *pde = [1.493781] Oops: [#1] PREEMPT SMP [1.493781] Modules linked in: [1.493781] [1.493781] Pid: 1, comm: swapper Tainted: GW 3.1.0-rc3+ #77 [1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1 [1.493781] EIP is at virtnet_poll+0x16e/0x408 [1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000 [1.493781] ESI: EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64 [1.493781] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 [1.493781] Process swapper (pid: 1, ti=db486000 task=db45 task.ti=db458000) [1.493781] Stack: [1.493781] db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000 0010 [1.493781] 0080 dbcebfe0 db5e1414 db5e1000 fffec005 db5e1414 db906dc0 [1.493781] c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080 012c [1.493781] Call Trace: [1.493781] [c15e4869] net_rx_action+0x8e/0x177 [1.493781] [c1066128] __do_softirq+0xa7/0x158 [1.493781] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.493781] IRQ [1.493781] [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.493781] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.493781] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.493781] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.493781] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.493781] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.493781] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.493781] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.493781] [c1393c86] ? extract_entropy+0x45/0x71 [1.493781] [c1059e35] ? get_parent_ip+0xb/0x31 [1.493781] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.493781] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.493781] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.493781] [c1001159] ? do_one_initcall+0x71/0x114 [1.493781] [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91 [1.493781] [c1a9c7ab] ? kernel_init+0xab/0x11d [1.493781] [c1a9c700] ? start_kernel+0x301/0x301 [1.493781] [c16acfb6] ? kernel_thread_helper+0x6/0xd [1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc 8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2 ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02 [1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP 0068:db487f64 [1.493781] CR2: 001c [1.549772] ---[ end trace 4eaa2a86a8e2da27 ]--- [1.550772] Kernel panic - not syncing: Fatal exception in interrupt [1.551772] Pid: 1, comm: swapper Tainted: G D W 3.1.0-rc3+ #77 [1.553771] Call Trace: [1.553771] [c169ca33] panic+0x58/0x156 [1.554771] [c16a921a] oops_end+0x8c/0x9b [1.555771] [c169c4e7] no_context+0x116/0x120 [1.555771] [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8 [1.557771] [c169c5f6] bad_area_nosemaphore+0xd/0x10 [1.558771] [c16aa4b5] do_page_fault+0x174/0x2fa [1.559770] [c107bad0] ? sched_clock_local+0x10/0x14b [1.560770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.561770] [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7 [1.563770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.564770] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.565770] [c16a89d6] error_code+0x5a/0x60 [1.566769] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.567769] [c14f3236] ? virtnet_poll+0x16e/0x408 [1.567769] [c15e4869] net_rx_action+0x8e/0x177 [1.568769] [c1066128] __do_softirq+0xa7/0x158 [1.569769] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.569769] IRQ [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.570769] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.571769] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.571769] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.572768] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.573768] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.573768] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.574768] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.574768] [c1393c86] ? extract_entropy+0x45/0x71 [1.575768] [c1059e35] ? get_parent_ip+0xb/0x31 [1.576768] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.576768] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.577768] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.578768] [c1001159] ? do_one_initcall+0x71/0x114 [1.578768] [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91 [1.579767] [c1a9c7ab] ? kernel_init+0xab/0x11d [1.580767] [c1a9c700] ? start_kernel+0x301/0x301 [1.581767] [c16acfb6] ? kernel_thread_helper+0x6/0xd [1.582767] Rebooting in 1 seconds.. # KVM session
Re: [PATCH v2] kvm tools: Support multiple net devices
On Mon, 2011-09-26 at 19:21 +0300, Avi Kivity wrote: On 09/26/2011 05:54 PM, Pekka Enberg wrote: On Mon, 26 Sep 2011, Asias He wrote: $ ./kvm run -n mode=tap [1.490781] registered taskstats version 1 [1.492781] BUG: unable to handle kernel NULL pointer dereference at 001c [1.493781] IP: [c14f3236] virtnet_poll+0x16e/0x408 [1.493781] *pde = [1.493781] Oops: [#1] PREEMPT SMP [1.493781] Modules linked in: [1.493781] [1.493781] Pid: 1, comm: swapper Tainted: GW 3.1.0-rc3+ #77 [1.493781] EIP: 0060:[c14f3236] EFLAGS: 00010286 CPU: 1 [1.493781] EIP is at virtnet_poll+0x16e/0x408 [1.493781] EAX: 1000 EBX: db4bb0c0 ECX: db7cd778 EDX: 1000 [1.493781] ESI: EDI: db7cd6c0 EBP: db487fa8 ESP: db487f64 [1.493781] DS: 007b ES: 007b FS: 00d8 GS: SS: 0068 [1.493781] Process swapper (pid: 1, ti=db486000 task=db45 task.ti=db458000) [1.493781] Stack: [1.493781] db487f98 19dfb000 db5e1400 0080 c1b0df60 db6ff000 0010 [1.493781] 0080 dbcebfe0 db5e1414 db5e1000 fffec005 db5e1414 db906dc0 [1.493781] c1a39a0c db487fd4 c15e4869 fffb71f7 db906dc8 0080 012c [1.493781] Call Trace: [1.493781] [c15e4869] net_rx_action+0x8e/0x177 [1.493781] [c1066128] __do_softirq+0xa7/0x158 [1.493781] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.493781] IRQ [1.493781] [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.493781] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.493781] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.493781] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.493781] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.493781] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.493781] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.493781] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.493781] [c1393c86] ? extract_entropy+0x45/0x71 [1.493781] [c1059e35] ? get_parent_ip+0xb/0x31 [1.493781] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.493781] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.493781] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.493781] [c1001159] ? do_one_initcall+0x71/0x114 [1.493781] [c1acfd0f] ? root_nfs_parse_addr+0x91/0x91 [1.493781] [c1a9c7ab] ? kernel_init+0xab/0x11d [1.493781] [c1a9c700] ? start_kernel+0x301/0x301 [1.493781] [c16acfb6] ? kernel_thread_helper+0x6/0xd [1.493781] Code: 89 d8 e8 23 94 0e 00 8b 4d dc 89 c7 f3 a4 8b 55 dc 8b 4d d8 29 55 f0 8b 75 e0 01 d1 eb 13 8d 45 f0 89 f2 50 89 d8 e8 ae f2 ff ff 8b 76 1c 31 c9 58 83 7d f0 00 75 e7 85 f6 89 75 e0 0f 84 6e 02 [1.493781] EIP: [c14f3236] virtnet_poll+0x16e/0x408 SS:ESP 0068:db487f64 [1.493781] CR2: 001c [1.549772] ---[ end trace 4eaa2a86a8e2da27 ]--- [1.550772] Kernel panic - not syncing: Fatal exception in interrupt [1.551772] Pid: 1, comm: swapper Tainted: G D W 3.1.0-rc3+ #77 [1.553771] Call Trace: [1.553771] [c169ca33] panic+0x58/0x156 [1.554771] [c16a921a] oops_end+0x8c/0x9b [1.555771] [c169c4e7] no_context+0x116/0x120 [1.555771] [c169c5e1] __bad_area_nosemaphore+0xf0/0xf8 [1.557771] [c169c5f6] bad_area_nosemaphore+0xd/0x10 [1.558771] [c16aa4b5] do_page_fault+0x174/0x2fa [1.559770] [c107bad0] ? sched_clock_local+0x10/0x14b [1.560770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.561770] [c10e9b84] ? __kmalloc_track_caller+0xb7/0xc7 [1.563770] [c15db33f] ? __netdev_alloc_skb+0x17/0x34 [1.564770] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.565770] [c16a89d6] error_code+0x5a/0x60 [1.566769] [c16aa341] ? spurious_fault+0xa8/0xa8 [1.567769] [c14f3236] ? virtnet_poll+0x16e/0x408 [1.567769] [c15e4869] net_rx_action+0x8e/0x177 [1.568769] [c1066128] __do_softirq+0xa7/0x158 [1.569769] [c1066081] ? __tasklet_hi_schedule_first+0x2b/0x2b [1.569769] IRQ [c1065e82] ? _local_bh_enable_ip.isra.9+0x65/0x86 [1.570769] [c1065eab] ? local_bh_enable_ip+0x8/0xa [1.571769] [c16a7a78] ? _raw_spin_unlock_bh+0x18/0x1a [1.571769] [c15e59c5] ? dev_set_rx_mode+0x22/0x26 [1.572768] [c15e5a5f] ? __dev_open+0x96/0xa6 [1.573768] [c15e5c23] ? __dev_change_flags+0x97/0x10e [1.573768] [c15e5cfe] ? dev_change_flags+0x13/0x3f [1.574768] [c1acfe6f] ? ip_auto_config+0x160/0xcf8 [1.574768] [c1393c86] ? extract_entropy+0x45/0x71 [1.575768] [c1059e35] ? get_parent_ip+0xb/0x31 [1.576768] [c16aa6b7] ? sub_preempt_count+0x7c/0x89 [1.576768] [c16a7d24] ? _raw_spin_unlock+0x1c/0x27 [1.577768] [c1629173] ? tcp_set_default_congestion_control+0x8c/0x95 [1.578768] [c1001159] ? do_one_initcall+0x71/0x114 [1.578768] [c1acfd0f] ?
Re: [PATCH v2] kvm tools: Support multiple net devices
On Mon, Sep 26, 2011 at 7:37 PM, Sasha Levin levinsasha...@gmail.com wrote: This needs fixing before I can apply the patch, right? Looks like a guest kernel bug, no? It's a kernel bug and should be fixed there, but it's caused by us not passing sane values to virtio-net, which we can fix on our side as well. So my plan is to prevent triggering it from within kvm tools while working on a kernel patch. Yup, that's what I meant as well. Even if it is a kernel issue, we should do enough validation on our side not to trigger it so easily. Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream
On 09/20/2011 07:49 PM, Jan Kiszka wrote: Upstream's version is about to be signal-free and will stop handling SIGUSR2 specially. So it's time to adopt its implementation, ie. switch from signalfd to a pipe. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
NFS share
Hi! I just installed KVM (yum) on a fresh CentOS 6 (64 bits). Everything works great with virt-manager. The only thing I'm having problem is accessing the HD from an NFS share. virt-manager mount fine the share. Locally I can write on the share. But when I'm trying to create a HD on the pool, it just don't do anything. Returns to the pool and don't create the file nor gives me an error. What should I do? Regards, Rodrigo. M. Rodrigo Monteiro Free as in Freedom, not free as in free beer As we are liberated from our own fear, our presence automatically liberates others Linux User # 403730 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream
On 2011-09-21 10:06, Stefan Hajnoczi wrote: On Tue, Sep 20, 2011 at 06:49:49PM +0200, Jan Kiszka wrote: Upstream's version is about to be signal-free and will stop handling SIGUSR2 specially. So it's time to adopt its implementation, ie. switch from signalfd to a pipe. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- This should help pulling upstream into qemu-kvm when block: avoid SIGUSR2 is merged. And will help merging further cleanups of this code I'm working on. posix-aio-compat.c | 73 ++- 1 files changed, 32 insertions(+), 41 deletions(-) Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com Perhaps qemu_eventfd() can be used in the future instead of an explicit pipe. Then Linux will do eventfd while other OSes will fall back to pipes. Basically simpler code, or does this also have runtime benefits? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream
On 09/26/2011 08:23 PM, Jan Kiszka wrote: Perhaps qemu_eventfd() can be used in the future instead of an explicit pipe. Then Linux will do eventfd while other OSes will fall back to pipes. Basically simpler code, or does this also have runtime benefits? In corner cases, the completion can block on the write() with pipes, but not eventfd. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] nVMX injection corrections
On 09/22/2011 01:52 PM, Nadav Har'El wrote: The following two patches solve two injection-related nested VMX issues: 1. When we must run L2 next (namely on L1's VMLAUNCH/VMRESUME), injection into L1 was delayed for an unknown amount of time - until L2 exits. We now force (using a self IPI) an exit immediately after entry to L2, so that the injection into L1 happens promptly. 2. unexpected, valid vectoring info warnings appeared in L1. These are fixed by correcting the emulation of concurrent L0-L1 and L1-L2 injections: We cannot inject into L1 until the injection into L2 has been processed. Applied, thanks. -- error compiling committee.c: too many arguments to function -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] virtio-net: Verify page list size before fitting into skb
This patch verifies that the length of a buffer stored in a linked list of pages is small enough to fit into a skb. If the size is larger than a max size of a skb, it means that we shouldn't go ahead building skbs anyway since we won't be able to send the buffer as the user requested. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com --- drivers/net/virtio_net.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0c7321c..64e0717 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, unsigned int copy, hdr_len, offset; char *p; + if (len MAX_SKB_FRAGS * PAGE_SIZE) + return NULL; + p = page_address(page); /* copy small packet so we can reuse these pages for small data */ -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] virtio-net: Prevent NULL dereference
This patch prevents a NULL dereference when the user has passed a length longer than an actual buffer to virtio-net. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com --- drivers/net/virtio_net.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 64e0717..8d32c1e 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -198,7 +198,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, len -= copy; offset += copy; - while (len) { + while (len page) { set_skb_frag(skb, page, offset, len); page = (struct page *)page-private; offset = 0; -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3] kvm tools: Support multiple net devices
This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=tap/user/none][,guest_ip=ip][,host_ip= ip][,guest_mac=mac][,script=file] Each of the parameters is optional, and the config defaults to a TAP based networking with a sequential MAC. Signed-off-by: Sasha Levin levinsasha...@gmail.com --- tools/kvm/builtin-run.c | 146 tools/kvm/virtio/net.c | 157 ++- 2 files changed, 193 insertions(+), 110 deletions(-) diff --git a/tools/kvm/builtin-run.c b/tools/kvm/builtin-run.c index 28dc95a..070b1b6 100644 --- a/tools/kvm/builtin-run.c +++ b/tools/kvm/builtin-run.c @@ -87,9 +87,12 @@ static bool sdl; static bool balloon; static bool using_rootfs; static bool custom_rootfs; +static bool no_net; extern bool ioport_debug; extern int active_console; extern int debug_iodelay; +struct virtio_net_parameters *net_params; +int num_net_devices; bool do_debug_print = false; @@ -182,6 +185,89 @@ static int tty_parser(const struct option *opt, const char *arg, int unset) return 0; } +static inline void str_to_mac(const char *str, char *mac) +{ + sscanf(str, %hhx:%hhx:%hhx:%hhx:%hhx:%hhx, + mac, mac+1, mac+2, mac+3, mac+4, mac+5); +} +static int set_net_param(struct virtio_net_parameters *p, const char *param, + const char *val) +{ + if (strcmp(param, guest_mac) == 0) { + str_to_mac(val, p-guest_mac); + } else if (strcmp(param, mode) == 0) { + if (!strncmp(val, user, 4)) { + int i; + + for (i = 0; i num_net_devices; i++) + if (net_params[i].mode == NET_MODE_USER) + die(Only one usermode network device allowed at a time); + p-mode = NET_MODE_USER; + } else if (!strncmp(val, tap, 3)) { + p-mode = NET_MODE_TAP; + } else if (!strncmp(val, none, 4)) { + no_net = 1; + return -1; + } else + die(Unkown network mode %s, please use user, tap or none, network); + } else if (strcmp(param, script) == 0) { + p-script = val; + } else if (strcmp(param, guest_ip) == 0) { + p-guest_ip = val; + } else if (strcmp(param, host_ip) == 0) { + p-host_ip = val; + } + + return 0; +} + +static int netdev_parser(const struct option *opt, const char *arg, int unset) +{ + struct virtio_net_parameters p; + char *buf, *cmd = NULL, *cur = NULL; + bool on_cmd = true; + + if (arg) { + buf = strdup(arg); + if (buf == NULL) + die(Failed allocating new net buffer); + cur = strtok(buf, ,=); + } + + p = (struct virtio_net_parameters) { + .guest_ip = DEFAULT_GUEST_ADDR, + .host_ip= DEFAULT_HOST_ADDR, + .script = DEFAULT_SCRIPT, + .mode = NET_MODE_TAP, + }; + + str_to_mac(DEFAULT_GUEST_MAC, p.guest_mac); + p.guest_mac[5] += num_net_devices; + + while (cur) { + if (on_cmd) { + cmd = cur; + } else { + if (set_net_param(p, cmd, cur) 0) + goto done; + } + on_cmd = !on_cmd; + + cur = strtok(NULL, ,=); + }; + + num_net_devices++; + + net_params = realloc(net_params, num_net_devices * sizeof(*net_params)); + if (net_params == NULL) + die(Failed adding new network device); + + net_params[num_net_devices - 1] = p; + +done: + return 0; +} + static int shmem_parser(const struct option *opt, const char *arg, int unset) { const u64 default_size = SHMEM_DEFAULT_SIZE; @@ -339,18 +425,9 @@ static const struct option options[] = { Kernel command line arguments), OPT_GROUP(Networking options:), - OPT_STRING('n', network, network, user, tap, none, - Network to use), - OPT_STRING('\0', host-ip, host_ip, a.b.c.d, - Assign this address to the host side networking), - OPT_STRING('\0', guest-ip, guest_ip, a.b.c.d, - Assign this address to the guest side networking), - OPT_STRING('\0', host-mac, host_mac, aa:bb:cc:dd:ee:ff, - Assign this address to the host side NIC), - OPT_STRING('\0', guest-mac, guest_mac, aa:bb:cc:dd:ee:ff, - Assign this address to the guest side NIC), - OPT_STRING('\0', tapscript, script, Script path, -Assign a script to process created tap device),
Re: [PATCH] qemu-kvm: Switch POSIX compat AIO implementation to upstream
On 09/26/2011 12:24 PM, Avi Kivity wrote: On 09/26/2011 08:23 PM, Jan Kiszka wrote: Perhaps qemu_eventfd() can be used in the future instead of an explicit pipe. Then Linux will do eventfd while other OSes will fall back to pipes. Basically simpler code, or does this also have runtime benefits? In corner cases, the completion can block on the write() with pipes, but not eventfd. The pipe is O_NONBLOCK and the only thing the caller cares about is whether there is *some* data in the PIPE so it can happily ignore EAGAIN. So the pipe implementation never blocks on write() either. It may require multiple reads to drain the queue though whereas eventfd would only require a single read to drain the queue. Regards, Anthony Liguori -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On Mon, 2011-09-26 at 12:04 +0200, Alexander Graf wrote: Am 26.09.2011 um 09:51 schrieb David Gibson da...@gibson.dropbear.id.au: On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote: Based on the discussions over the last couple of weeks I have updated the device fd file layout proposal and tried to specify it a bit more formally. === 1. Overview This specification describes the layout of device files used in the context of vfio, which gives user space direct access to I/O devices that have been bound to vfio. When a device fd is opened and read, offset 0x0 contains a fixed sized header followed by a number of variable length records that describe different characteristics of the device-- addressable regions, interrupts, etc. 0x0 +-+-+ | magic | u32 // identifies this as a vfio device file +---+ and identifies the type of bus | version | u32 // specifies the version of this +---+ | flags | u32 // encodes any flags +---+ | dev info record 0| |type | u32 // type of record |rec_len| u32 // length in bytes of record | | (including record header) |flags | u32 // type specific flags |...content... | // record content, which could +---+ // include sub-records | dev info record 1| +---+ | dev info record N| +---+ I really should have chimed in on this earlier, but I've been very busy. Um, not to put too fine a point on it, this is madness. Yes, it's very flexible and can thereby cover a very wide range of cases. But it's much, much too complex. Userspace has to parse a complex, multilayered data structure, with variable length elements just to get an address at which to do IO. I can pretty much guarantee that if we went with this, most userspace programs using this interface would just ignore this metadata and directly map the offsets at which they happen to know the kernel will put things for the type of device they care about. _At least_ for PCI, I think the original VFIO layout of each BAR at a fixed, well known offset is much better. Despite its limitations, just advertising a device type ID which describes one of a few fixed layouts would be preferable to this. I'm still hoping, that we can do a bit better than that. But we should try really hard to at the very least force the metadata into a simple array of resources each with a fixed size record describing it, even if it means some space wastage with occasionally-used fields. Anything more complex than that and userspace is just never going to use it properly. We will have 2 different types of user space. One wants to be as generic as possible and needs all this dynamic information. QEMU would fall into this category. The other one is device specific drivers in user space. Here hardcoding might make sense. For the generic interface, we need something that us as verbose as possible and lets us enumerate all the device properties, so we can properly map and forward them to the guest. However, nothing keeps us from mapping BARs always at static offsets into the file. If you don't need the generic info, then you don't need it. Also, if you can come up with an interface that does not have variable length descriptors but is still able to export all the required generic information, please send a proposal to the list :) Hi, The other obvious possibility is a pure ioctl interface. To match what this proposal is trying to describe, plus the runtime interfaces, we'd need something like: /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */ #define VFIO_DEVICE_GET_FLAGS _IOR(, , u64) /* Return number of mmio/iop/config regions. * For PCI this is always 8 (BAR0-5 + ROM + Config) */ #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int) /* Return length for region index (may be zero) */ #define VFIO_DEVICE_GET_REGION_LEN _IOWR(, , u64) /* Return flags for region index * :0 - mmap'able, :1 - read-only, 63:2 - reserved */ #define VFIO_DEVICE_GET_REGION_FLAGS_IOR(, , u64) /* Return file offset for region index */ #define VFIO_DEVICE_GET_REGION_OFFSET _IOWR(, , u64) /* Return physical address for region index - not implemented for PCI */ #define VFIO_DEVICE_GET_REGION_PHYS_ADDR_IOWR(, , u64) /* Return number of IRQs (Not including MSI/MSI-X for PCI)
Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, Sep 26, 2011 at 08:41:08PM +0300, Sasha Levin wrote: This patch verifies that the length of a buffer stored in a linked list of pages is small enough to fit into a skb. If the size is larger than a max size of a skb, it means that we shouldn't go ahead building skbs anyway since we won't be able to send the buffer as the user requested. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com Interesting. This is a theoretical issue, correct? Not a crash you actually see. This crash would mean device is giving us packets that are way too large. Avoiding crashes even in the face of a misbehaved device is a good idea, but should we print a diagnostic to a system log? Maybe rate-limited or print once to avoid filling up the disk. Other places in driver print with pr_debug I'm not sure that's right but better than nothing. --- drivers/net/virtio_net.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0c7321c..64e0717 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, unsigned int copy, hdr_len, offset; char *p; + if (len MAX_SKB_FRAGS * PAGE_SIZE) unlikely()? Also, this seems too aggressive: at this point len includes the header and the linear part. The right place for this test is probably where we fill in the frags, just before while (len) The whole can only happen when mergeable buffers are disabled, right? + return NULL; + p = page_address(page); /* copy small packet so we can reuse these pages for small data */ -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] virtio-net: Prevent NULL dereference
On Mon, Sep 26, 2011 at 08:41:09PM +0300, Sasha Levin wrote: This patch prevents a NULL dereference when the user has passed a length longer than an actual buffer to virtio-net. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com Hmm, another protection against a buggy device, right? No problem with that, but let's discard the packet and print a disgnostic, so that the user can discover what happened. --- drivers/net/virtio_net.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 64e0717..8d32c1e 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -198,7 +198,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, len -= copy; offset += copy; - while (len) { + while (len page) { set_skb_frag(skb, page, offset, len); page = (struct page *)page-private; offset = 0; -- 1.7.6.1 -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, 2011-09-26 at 21:44 +0300, Michael S. Tsirkin wrote: On Mon, Sep 26, 2011 at 08:41:08PM +0300, Sasha Levin wrote: This patch verifies that the length of a buffer stored in a linked list of pages is small enough to fit into a skb. If the size is larger than a max size of a skb, it means that we shouldn't go ahead building skbs anyway since we won't be able to send the buffer as the user requested. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com Interesting. This is a theoretical issue, correct? Not a crash you actually see. Actually it was an actual crash caused when our virtio-net driver in kvm tools did funny things and passed '(u32)-1' length as a buffer length to the guest kernel. This crash would mean device is giving us packets that are way too large. Avoiding crashes even in the face of a misbehaved device is a good idea, but should we print a diagnostic to a system log? Maybe rate-limited or print once to avoid filling up the disk. Other places in driver print with pr_debug I'm not sure that's right but better than nothing. Yup, I'll add some debug info. --- drivers/net/virtio_net.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0c7321c..64e0717 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, unsigned int copy, hdr_len, offset; char *p; + if (len MAX_SKB_FRAGS * PAGE_SIZE) unlikely()? Also, this seems too aggressive: at this point len includes the header and the linear part. The right place for this test is probably where we fill in the frags, just before while (len) The whole can only happen when mergeable buffers are disabled, right? From what I understand it can happen whenever you're going to build a skb longer than PAGE_SIZE. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, Sep 26, 2011 at 10:37 PM, Sasha Levin levinsasha...@gmail.com wrote: Interesting. This is a theoretical issue, correct? Not a crash you actually see. Actually it was an actual crash caused when our virtio-net driver in kvm tools did funny things and passed '(u32)-1' length as a buffer length to the guest kernel. I'm not sure what Michael means with theoretical issue here. Can the guest driver assume that the hypervisor doesn't attempt to do nasty things? Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, Sep 26, 2011 at 10:37:22PM +0300, Sasha Levin wrote: On Mon, 2011-09-26 at 21:44 +0300, Michael S. Tsirkin wrote: On Mon, Sep 26, 2011 at 08:41:08PM +0300, Sasha Levin wrote: This patch verifies that the length of a buffer stored in a linked list of pages is small enough to fit into a skb. If the size is larger than a max size of a skb, it means that we shouldn't go ahead building skbs anyway since we won't be able to send the buffer as the user requested. Cc: Rusty Russell ru...@rustcorp.com.au Cc: Michael S. Tsirkin m...@redhat.com Cc: virtualizat...@lists.linux-foundation.org Cc: net...@vger.kernel.org Cc: kvm@vger.kernel.org Signed-off-by: Sasha Levin levinsasha...@gmail.com Interesting. This is a theoretical issue, correct? Not a crash you actually see. Actually it was an actual crash caused when our virtio-net driver in kvm tools did funny things and passed '(u32)-1' length as a buffer length to the guest kernel. This crash would mean device is giving us packets that are way too large. Avoiding crashes even in the face of a misbehaved device is a good idea, but should we print a diagnostic to a system log? Maybe rate-limited or print once to avoid filling up the disk. Other places in driver print with pr_debug I'm not sure that's right but better than nothing. Yup, I'll add some debug info. --- drivers/net/virtio_net.c |3 +++ 1 files changed, 3 insertions(+), 0 deletions(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0c7321c..64e0717 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -165,6 +165,9 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi, unsigned int copy, hdr_len, offset; char *p; + if (len MAX_SKB_FRAGS * PAGE_SIZE) unlikely()? Also, this seems too aggressive: at this point len includes the header and the linear part. The right place for this test is probably where we fill in the frags, just before while (len) The whole can only happen when mergeable buffers are disabled, right? From what I understand it can happen whenever you're going to build a skb longer than PAGE_SIZE. Hmm how exactly? With mergeable buffers this only gets the length of the 1st chunk which is up to 4K unless the driver is buggy ... -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, Sep 26, 2011 at 10:45:35PM +0300, Pekka Enberg wrote: On Mon, Sep 26, 2011 at 10:37 PM, Sasha Levin levinsasha...@gmail.com wrote: Interesting. This is a theoretical issue, correct? Not a crash you actually see. Actually it was an actual crash caused when our virtio-net driver in kvm tools did funny things and passed '(u32)-1' length as a buffer length to the guest kernel. I'm not sure what Michael means with theoretical issue here. Can the guest driver assume that the hypervisor doesn't attempt to do nasty things? Pekka IMO yes, hypervisor has full access to guest memory so it's a safe assumption. But surviving in the face of hypervisor bugs is laudable goal, bugs do happen. -- MST -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On Mon, Sep 26, 2011 at 2:51 AM, David Gibson da...@gibson.dropbear.id.au wrote: On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote: Based on the discussions over the last couple of weeks I have updated the device fd file layout proposal and tried to specify it a bit more formally. === 1. Overview This specification describes the layout of device files used in the context of vfio, which gives user space direct access to I/O devices that have been bound to vfio. When a device fd is opened and read, offset 0x0 contains a fixed sized header followed by a number of variable length records that describe different characteristics of the device-- addressable regions, interrupts, etc. 0x0 +-+-+ | magic | u32 // identifies this as a vfio device file +---+ and identifies the type of bus | version | u32 // specifies the version of this +---+ | flags | u32 // encodes any flags +---+ | dev info record 0 | | type | u32 // type of record | rec_len | u32 // length in bytes of record | | (including record header) | flags | u32 // type specific flags | ...content... | // record content, which could +---+ // include sub-records | dev info record 1 | +---+ | dev info record N | +---+ I really should have chimed in on this earlier, but I've been very busy. Um, not to put too fine a point on it, this is madness. Yes, it's very flexible and can thereby cover a very wide range of cases. But it's much, much too complex. Userspace has to parse a complex, multilayered data structure, with variable length elements just to get an address at which to do IO. I can pretty much guarantee that if we went with this, most userspace programs using this interface would just ignore this metadata and directly map the offsets at which they happen to know the kernel will put things for the type of device they care about. _At least_ for PCI, I think the original VFIO layout of each BAR at a fixed, well known offset is much better. Despite its limitations, just advertising a device type ID which describes one of a few fixed layouts would be preferable to this. I'm still hoping, that we can do a bit better than that. But we should try really hard to at the very least force the metadata into a simple array of resources each with a fixed size record describing it, even if it means some space wastage with occasionally-used fields. Anything more complex than that and userspace is just never going to use it properly. So, is your issue really the variable length nature of what was proposed? I don't think it would be that hard to make the different resources fixed length. I think we have 2 types of resources now-- address regions and interrupts. The only thing that get's a bit tricky is device tree paths, which are obviously variable length. We could put a description of all the resources in an array with each element being something like 4KB?? Stuart -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, 2011-09-26 at 22:45 +0300, Pekka Enberg wrote: On Mon, Sep 26, 2011 at 10:37 PM, Sasha Levin levinsasha...@gmail.com wrote: Interesting. This is a theoretical issue, correct? Not a crash you actually see. Actually it was an actual crash caused when our virtio-net driver in kvm tools did funny things and passed '(u32)-1' length as a buffer length to the guest kernel. I'm not sure what Michael means with theoretical issue here. Can the guest driver assume that the hypervisor doesn't attempt to do nasty things? afaik if the hypervisor can access the vcpus and the memory of the guest, this shouldn't be a security issue - more of a bug prevention issue. I guess it'll be interesting the other way around, when it's the guest that passes this buggy information to the hypervisor. -- Sasha. -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
The other obvious possibility is a pure ioctl interface. To match what this proposal is trying to describe, plus the runtime interfaces, we'd need something like: /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */ #define VFIO_DEVICE_GET_FLAGS _IOR(, , u64) /* Return number of mmio/iop/config regions. * For PCI this is always 8 (BAR0-5 + ROM + Config) */ #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int) /* Return length for region index (may be zero) */ #define VFIO_DEVICE_GET_REGION_LEN _IOWR(, , u64) /* Return flags for region index * :0 - mmap'able, :1 - read-only, 63:2 - reserved */ #define VFIO_DEVICE_GET_REGION_FLAGS _IOR(, , u64) /* Return file offset for region index */ #define VFIO_DEVICE_GET_REGION_OFFSET _IOWR(, , u64) /* Return physical address for region index - not implemented for PCI */ #define VFIO_DEVICE_GET_REGION_PHYS_ADDR _IOWR(, , u64) /* Return number of IRQs (Not including MSI/MSI-X for PCI) */ #define VFIO_DEVICE_GET_NUM_IRQ _IOR(, , int) /* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */ #define VFIO_DEVICE_SET_IRQ_EVENTFD _IOW(, , int) /* Unmask IRQ index */ #define VFIO_DEVICE_UNMASK_IRQ _IOW(, , int) /* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */ #define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD _IOW(, , int) /* Return the device tree path for type/index into the user * allocated buffer */ struct dtpath { u32 type; (0 = region, 1 = IRQ) u32 index; u32 buf_len; char *buf; }; #define VFIO_DEVICE_GET_DTPATH _IOWR(, , struct dtpath) /* Return the device tree index for type/index */ struct dtindex { u32 type; (0 = region, 1 = IRQ) u32 index; u32 prop_type; u32 prop_index; }; #define VFIO_DEVICE_GET_DTINDEX _IOWR(, , struct dtindex) /* Reset the device */ #define VFIO_DEVICE_RESET _IO(, ,) /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */ #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS _IOW(, , int) #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int) Hope that covers it. Something I prefer about this interface is that everything can easily be generated on the fly, whereas reading out a table from the device means we really need to have that table somewhere in kernel memory to easily support reading random offsets. Thoughts? I think this could work, but I'm not sure it makes the problem David had any better-- you substitute the complexity of parsing the variable length regions with invoking a set of APIs. Stuart -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, Sep 26, 2011 at 10:45:35PM +0300, Pekka Enberg wrote: I'm not sure what Michael means with theoretical issue here. Can the guest driver assume that the hypervisor doesn't attempt to do nasty things? On Mon, Sep 26, 2011 at 10:57 PM, Michael S. Tsirkin m...@redhat.com wrote: IMO yes, hypervisor has full access to guest memory so it's a safe assumption. But surviving in the face of hypervisor bugs is laudable goal, bugs do happen. I was thinking of a compromised guest that is able to trick the hypervisor into doing nasty things to other guests without taking over the hypervisor completely. So for something like virtio networking, that's by definition exposed to rest of the world, I think it's very important not to be robust against hypervisor bugs. In any case, we were able to trigger this particular case rather easily with our buggy tool, so it's definitely worth fixing. ;-) FWIW, Acked-by: Pekka Enberg penb...@kernel.org Pekka -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 09/24/2011 02:47 AM, Alexander Graf wrote: On 22.09.2011, at 08:50, Liu Yu-B13201 wrote: -Original Message- From: kvm-ppc-ow...@vger.kernel.org [mailto:kvm-ppc-ow...@vger.kernel.org] On Behalf Of Alexander Graf Sent: Tuesday, September 20, 2011 7:36 AM To: kvm-...@vger.kernel.org Cc: kvm@vger.kernel.org Subject: [PATCH] KVM: PPC: E500: Support hugetlbfs With hugetlbfs support emerging on e500, we should also support KVM backing its guest memory by it. This patch adds support for hugetlbfs into the e500 shadow mmu code. Signed-off-by: Alexander Graf ag...@suse.de --- arch/powerpc/kvm/e500_tlb.c | 22 ++ 1 files changed, 22 insertions(+), 0 deletions(-) diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c index ec17148..64f75eb 100644 --- a/arch/powerpc/kvm/e500_tlb.c +++ b/arch/powerpc/kvm/e500_tlb.c @@ -24,6 +24,7 @@ #include linux/sched.h #include linux/rwsem.h #include linux/vmalloc.h +#include linux/hugetlb.h #include asm/kvm_ppc.h #include asm/kvm_e500.h @@ -673,13 +674,34 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500, pfn = ~(tsize_pages - 1); break; } + } else if (vma hva = vma-vm_start + (vma-vm_flags VM_HUGETLB)) { Why check (vma hva = vma-vm_start) twice? What would you do? :) I think he's just complaining about doing the check twice, in which case the answer could be it avoids extra indentation and the compiler should be able to factor out the common subexpression. In fact, I only copied the vm_start condition from the pfn code. Scott, why do we have to check this in the first place? We're calling find_vma. Can that return a vma that does not cover the hva we're passing in? Yes, it can. From find_vma(): /* Look up the first VMA which satisfies addr vm_end, NULL if none. */ You'll find similar checks in a lot of other places where find_vma() is used. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 09/24/2011 02:44 AM, Alexander Graf wrote: On 20.09.2011, at 19:54, Scott Wood wrote: On 09/19/2011 06:35 PM, Alexander Graf wrote: + asm (PPC_CNTLZL %0,%1 : =r (lz) : r (psize)); + tsize = min(21 - lz, tsize); No need to open-code the cntlz and subtract-from-21: tsize = min(ilog2(psize) - 10, tsize); /* * e500 doesn't implement the lowest tsize bit, * or 1K pages. */ tsize = max(BOOK3E_PAGESZ_4K, tsize ~1); There's still an open-coded subtraction of 10, but that relates more straightforwardly to the definition of tsize (and could be factored out into a size-to-tsize function). Yeah, no need to micro-optimized those few bits. The reason I used the asm statement was that I copied the hugetlbfs code which does it that way :). The 21 - lz thing is broken on 64-bit, FWIW. It works by chance in the hugetlbfs code (and in settlbcam) because it results in tsize being too low by 32, which only affects bits that subsequently get masked off. } up_read(current-mm-mmap_sem); } if (likely(!pfnmap)) { + unsigned long tsize_pages = 1 (tsize - 2); 1 (tsize + 10 - PAGE_SHIFT); Are we getting variable page sizes anytime soon? Will change it nevertheless, just curious :). Nothing imminent on our chips AFAIK, but there's some 64K page support in Linux for IBM's book3e chips, and it's not nice to hardcode (especially in a hidden way) regardless. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On Mon, 2011-09-26 at 15:03 -0500, Stuart Yoder wrote: The other obvious possibility is a pure ioctl interface. To match what this proposal is trying to describe, plus the runtime interfaces, we'd need something like: /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */ #define VFIO_DEVICE_GET_FLAGS _IOR(, , u64) /* Return number of mmio/iop/config regions. * For PCI this is always 8 (BAR0-5 + ROM + Config) */ #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int) /* Return length for region index (may be zero) */ #define VFIO_DEVICE_GET_REGION_LEN _IOWR(, , u64) /* Return flags for region index * :0 - mmap'able, :1 - read-only, 63:2 - reserved */ #define VFIO_DEVICE_GET_REGION_FLAGS_IOR(, , u64) /* Return file offset for region index */ #define VFIO_DEVICE_GET_REGION_OFFSET _IOWR(, , u64) /* Return physical address for region index - not implemented for PCI */ #define VFIO_DEVICE_GET_REGION_PHYS_ADDR_IOWR(, , u64) /* Return number of IRQs (Not including MSI/MSI-X for PCI) */ #define VFIO_DEVICE_GET_NUM_IRQ _IOR(, , int) /* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */ #define VFIO_DEVICE_SET_IRQ_EVENTFD _IOW(, , int) /* Unmask IRQ index */ #define VFIO_DEVICE_UNMASK_IRQ _IOW(, , int) /* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */ #define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD _IOW(, , int) /* Return the device tree path for type/index into the user * allocated buffer */ struct dtpath { u32 type; (0 = region, 1 = IRQ) u32 index; u32 buf_len; char*buf; }; #define VFIO_DEVICE_GET_DTPATH _IOWR(, , struct dtpath) /* Return the device tree index for type/index */ struct dtindex { u32 type; (0 = region, 1 = IRQ) u32 index; u32 prop_type; u32 prop_index; }; #define VFIO_DEVICE_GET_DTINDEX _IOWR(, , struct dtindex) /* Reset the device */ #define VFIO_DEVICE_RESET _IO(, ,) /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */ #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS_IOW(, , int) #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int) Hope that covers it. Something I prefer about this interface is that everything can easily be generated on the fly, whereas reading out a table from the device means we really need to have that table somewhere in kernel memory to easily support reading random offsets. Thoughts? I think this could work, but I'm not sure it makes the problem David had any better-- you substitute the complexity of parsing the variable length regions with invoking a set of APIs. I read it as mostly the complexity problem, which I think this makes fairly trivial. It also eliminates a lot of complexity on the kernel side of supporting the table driven interface. Thanks, Alex if (!(GET_FLAGS PCI)) return error; if (GET_NUM_REGIONS 8) return error; GET_REGION_LEN(7) GET_REGION_OFFSET(7) // setup config access for (i = 0; i 6; i++) { if (GET_REGION_LEN(i)) { GET_REGION_OFFSET(i) setup mmap/rw } } if (GET_REGION_LEN(6)) { GET_REGION_OFFSET(6) setup ROM access } -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode
-Original Message- From: Roopa Prabhu (roprabhu) Sent: Thursday, September 15, 2011 6:47 AM To: Michael S. Tsirkin Cc: Sridhar Samudrala; net...@vger.kernel.org; dragos.tatu...@gmail.com; a...@arndb.de; David Wang (dwang2); Christian Benvenuti (benve); ka...@trash.net; da...@davemloft.net; eric.duma...@gmail.com; mc...@broadcom.com; kvm@vger.kernel.org Subject: Re: [net-next-2.6 PATCH 0/3 RFC] macvlan: MAC Address filtering support for passthru mode The netlink patch is still in the works. I will post the patches after I clean it up a bit and also accommodate or find answers to most questions discussed for non-passthru case. Thought I will post the netlink interface here to see if anyone has any early comments. I have a rtnl_link_ops-set_rx_filter defined. [IFLA_RX_FILTER] = { [IFLA_ADDRESS_FILTER] = { [IFLA_ADDRESS_FILTER_FLAGS] [IFLA_ADDRESS_LIST] = { [IFLA_ADDRESS_LIST_ENTRY] } } [IFLA_VLAN_FILTER] = { [IFLA_VLAN_LIST] = { [IFLA_VLAN] } } } Some open questions: - The VLAN filter above shows a VLAN list. It could also be a bitmap or the interface could provide both a bitmap and VLAN list for more flexibility . Like the below [IFLA_RX_FILTER] = { [IFLA_ADDRESS_FILTER] = { [IFLA_ADDRESS_FILTER_FLAGS] [IFLA_ADDRESS_LIST] = { [IFLA_ADDRESS_LIST_ENTRY] } } [IFLA_VLAN_FILTER] = { [IFLA_VLAN_BITMAP] [IFLA_VLAN_LIST] = { [IFLA_VLAN] } } } The simplest interface probably is to stick to a bitmap (knowing that in the worst case it will take 256 bytes, but we can compress it ...), because sending a vlan list may end up requiring much more than that (on interfaces configured as trunks). This regardless of whether the most common use case is that of a server configured with just few vlans or that of a switch configured with few trunks. Another option would be a list of ranges, but that one would not work well in those cases where trunks are configured, for example, to carry big numbers of odd or even vlan IDs or other groups of vlans IDs that cannot be grouped into ranges. Probably an acceptable compromise, if we care about the size of this attribute, would be: - to use a list of IDs for less than 256 vlans (or a list of ranges) - to use a bitmap for more than 256 vlan. I would recommend the two attributes (IFLA_VLAN_BITMAP and IFLA_VLAN_LIST) to be mutually exclusive to reduce the complexity of the merging and error/misconfig detection code. - Do you see any advantage in keeping Unicast and multicast address list separate ? Something like the below : [IFLA_RX_FILTER] = { [IFLA_ADDRESS_FILTER_FLAGS] [IFLA_UC_ADDRESS_FILTER] = { [IFLA_ADDRESS_LIST] = { [IFLA_ADDRESS_LIST_ENTRY] } } [IFLA_MC_ADDRESS_FILTER] = { [IFLA_ADDRESS_LIST] = { [IFLA_ADDRESS_LIST_ENTRY] } } [IFLA_VLAN_FILTER] = { [IFLA_VLAN_LIST] = { [IFLA_VLAN] } } } I personally like the idea of grouping UC and MC addresses into two distinct attributes/groups. The receiver (the kernel) would have to check them anyway (I suppose), but I like the idea of having the kernel being able to detect the case where, for example, the user configures a MC address thinking he is actually configuring a UC address. Most probably the iproute2 commands used to configure/add/del UC and MC address will be assigned two different keywords. BTW, once this code will be in, it will be possible for ip link show to show all UC/MC MAC addresses; right now ip link only shows dev-dev_addr. This output is useful for debugging. The output from ip maddr only shows the MC list and anyway I think the best place for the list of MAC addresses is ip link show. Would ip link show also show the list of vlans? Probably, best would be to add new flags (to ask for the extended output) or simply add the extra output (uc/mc/vlan lists) under the already existent -s flag ? - Is there any need to keep address and vlan filters separate. And have two rtnl_link_ops, set_rx_address_filter, set_rx_vlan_filter ?. I don't see one . [IFLA_RX_ADDRESS_FILTER] = { [IFLA_ADDRESS_FILTER_FLAGS] [IFLA_ADDRESS_LIST] = { [IFLA_ADDRESS_LIST_ENTRY] } } [IFLA_RX_VLAN_FILTER] = { [IFLA_VLAN_LIST] = { [IFLA_VLAN] } } I think both approaches are good. Anyway, given that you can have/configure nested vlans, having IFLA_RX_VLAN_FILTER inside IFLA_RX_FILTER would be syntactically correct too. /Chris Thanks, Roopa On 9/12/11 10:02 AM, Roopa Prabhu ropra...@cisco.com wrote: On 9/11/11 12:03 PM, Michael S. Tsirkin m...@redhat.com wrote: On Sun, Sep 11,
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On 09/26/2011 01:34 PM, Alex Williamson wrote: The other obvious possibility is a pure ioctl interface. To match what this proposal is trying to describe, plus the runtime interfaces, we'd need something like: /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */ #define VFIO_DEVICE_GET_FLAGS _IOR(, , u64) /* Return number of mmio/iop/config regions. * For PCI this is always 8 (BAR0-5 + ROM + Config) */ #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int) How do you handle BARs that a particular device doesn't use? Zero-length? /* Return the device tree path for type/index into the user * allocated buffer */ struct dtpath { u32 type; (0 = region, 1 = IRQ) u32 index; u32 buf_len; char*buf; }; #define VFIO_DEVICE_GET_DTPATH_IOWR(, , struct dtpath) So now the user needs to guess a buffer length in advance... and what happens if it's too small? /* Reset the device */ #define VFIO_DEVICE_RESET _IO(, ,) What generic way do we have to do this? We should probably have a way to determine whether it's possible, without actually asking to do it. /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */ #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS _IOW(, , int) #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int) Hope that covers it. It could be done this way, but I predict that the code (both kernel and user side) will be larger. Maybe not much more complex, but more boilerplate. How will you manage extensions to the interface? With the table it's simple, you see a new (sub)record type and you either understand it or you skip it. With ioctls you need to call every information-gathering ioctl you know and care about (or are told is present via some feature advertisement), and see if there's anything there. Something I prefer about this interface is that everything can easily be generated on the fly, whereas reading out a table from the device means we really need to have that table somewhere in kernel memory to easily support reading random offsets. Thoughts? The table should not be particularly large, and you'll need to keep the information around in some form regardless. Maybe in the PCI case you could produce it dynamically (though I probably wouldn't), but it really wouldn't make sense in the device tree case. You also lose the ability to easily have a human look at the hexdump for debugging; you'll need a special lsvfio tool. You might want one anyway to pretty-print the info, but with ioctls it's mandatory. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] acpi: fix up EJ0 in DSDT
On Mon, Sep 26, 2011 at 10:04:13AM +0300, Michael S. Tsirkin wrote: On Mon, Sep 26, 2011 at 12:40:18AM -0400, Kevin O'Connor wrote: On Thu, Sep 22, 2011 at 09:09:49AM +0300, Michael S. Tsirkin wrote: On Thu, Sep 22, 2011 at 12:35:13AM -0400, Kevin O'Connor wrote: The code to generate basic SSDT code isn't that difficult (see build_ssdt and src/ssdt-proc.dsl). Is there a compelling reason to patch the DSDT versus just generating the necessary blocks in an SSDT? I don't really care whether the code is in DSDT or SSDT, IMO there isn't much difference between build_ssdt and patching: main reason is build_ssdt uses offsets hardcoded to a specific binary (ssdt_proc and SD_OFFSET_* ) while I used a script to extract offsets. Yes - your script to extract the offsets is nice. If you still have doubts, it might make sense to merge just patch 1 - acpi: generate and parse mixed asl/aml listing - as the first step. With the infrastructure in place it will be easier to discuss the best way to use it. I'm okay with your first patch. However, I wish to tag a release before committing ACPI changes. There was a concern raised with two-pass PCI initialization that I need to follow up on before tagging. -Kevin -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On 09/26/2011 02:57 PM, Stuart Yoder wrote: On Mon, Sep 26, 2011 at 2:51 AM, David Gibson Um, not to put too fine a point on it, this is madness. Yes, it's very flexible and can thereby cover a very wide range of cases. But it's much, much too complex. Userspace has to parse a complex, multilayered data structure, with variable length elements just to get an address at which to do IO. I can pretty much guarantee that if we went with this, most userspace programs using this interface would just ignore this metadata and directly map the offsets at which they happen to know the kernel will put things for the type of device they care about. Then they deserve to break when those offsets change, just like with all the other bad assumptions poorly written code tends to make. Really, it should not be that hard to parse this. A loop with a switch statement for toplevel records, and another loop with a switch statement for each context in which subrecords can appear. _At least_ for PCI, I think the original VFIO layout of each BAR at a fixed, well known offset is much better. This is what grew out of attempting to address the needs of assigning non-PCI devices based on the device tree. Personally, I'd rather be dealing with this for PCI devices as well. Despite its limitations, just advertising a device type ID which describes one of a few fixed layouts would be preferable to this. That's already more information than the original VFIO layout provided. So, is your issue really the variable length nature of what was proposed? I don't think it would be that hard to make the different resources fixed length. I think we have 2 types of resources now-- address regions and interrupts. The only thing that get's a bit tricky is device tree paths, which are obviously variable length. We could put a description of all the resources in an array with each element being something like 4KB?? I'm not sure what that would improve. If the user wants to put a limit on the size of a certain field it's willing to handle, so be it. It'll break in the same set of cases that would be unexpressable with such a limitation in the interface, except that since the breakage is not in the interface, it's more easily fixable. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
How many threads should a kvm vm be starting?
I just noticed something interesting, a virtual machine on one of my servers seems to have 69 threads (including the main thread). Other guests on the machine only have a couple threads. Is this normal? or has something gone horribly wrong? -- Thomas Fjellstrom tho...@fjellstrom.ca -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
On Mon, 2011-09-26 at 18:59 -0500, Scott Wood wrote: On 09/26/2011 01:34 PM, Alex Williamson wrote: The other obvious possibility is a pure ioctl interface. To match what this proposal is trying to describe, plus the runtime interfaces, we'd need something like: /* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */ #define VFIO_DEVICE_GET_FLAGS _IOR(, , u64) /* Return number of mmio/iop/config regions. * For PCI this is always 8 (BAR0-5 + ROM + Config) */ #define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int) How do you handle BARs that a particular device doesn't use? Zero-length? Yep /* Return the device tree path for type/index into the user * allocated buffer */ struct dtpath { u32 type; (0 = region, 1 = IRQ) u32 index; u32 buf_len; char*buf; }; #define VFIO_DEVICE_GET_DTPATH _IOWR(, , struct dtpath) So now the user needs to guess a buffer length in advance... and what happens if it's too small? -ENOSPC. Call with buf_len = 0 and it could indicate the size. /* Reset the device */ #define VFIO_DEVICE_RESET _IO(, ,) What generic way do we have to do this? We should probably have a way to determine whether it's possible, without actually asking to do it. It's not generic, it could be a VFIO_DEVICE_PCI_RESET or we could add a bit to the device flags to indicate if it's available or we could add a probe arg to the ioctl to either check for existence or do it. /* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */ #define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS_IOW(, , int) #define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int) Hope that covers it. It could be done this way, but I predict that the code (both kernel and user side) will be larger. Maybe not much more complex, but more boilerplate. How will you manage extensions to the interface? I would assume we'd do something similar to the kvm capabilities checks. I don't know if that's just bits of GET_FLAGS or a different ioctl. With the table it's simple, you see a new (sub)record type and you either understand it or you skip it. With ioctls you need to call every information-gathering ioctl you know and care about (or are told is present via some feature advertisement), and see if there's anything there. I don't really see much difference between the interfaces here. You'd pick and choose which table entries you care about and pick and choose ioctls. For one you see it in the table, for the other there's a bit indicating the capability exists. Something I prefer about this interface is that everything can easily be generated on the fly, whereas reading out a table from the device means we really need to have that table somewhere in kernel memory to easily support reading random offsets. Thoughts? The table should not be particularly large, and you'll need to keep the information around in some form regardless. Maybe in the PCI case you could produce it dynamically (though I probably wouldn't), but it really wouldn't make sense in the device tree case. It would be entirely dynamic for PCI, there's no advantage to caching it. Even for device tree, if you can't fetch it dynamically, you'd have to duplicate it between an internal data structure and a buffer reading the table. You also lose the ability to easily have a human look at the hexdump for debugging; you'll need a special lsvfio tool. You might want one anyway to pretty-print the info, but with ioctls it's mandatory. I don't think this alone justifies duplicating the data and making it difficult to parse on both ends. Chances are we won't need such a tool for the ioctl interface because it's easier to get it right the first time ;) Note that I'm not stuck on this interface, I was just thinking about how to generate the table last week, it seemed like a pain so I thought I'd spend a few minutes outlining an ioctl interface... turns out it's not so bad. Thanks, Alex -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] kvm tools: Support multiple net devices
On 09/27/2011 01:44 AM, Sasha Levin wrote: This patch adds support for multiple network devices. The command line syntax changes to the following: --network/-n [mode=tap/user/none][,guest_ip=ip][,host_ip= ip][,guest_mac=mac][,script=file] Each of the parameters is optional, and the config defaults to a TAP based networking with a sequential MAC. Signed-off-by: Sasha Levin levinsasha...@gmail.com Acked-by: Asias He asias.he...@gmail.com -- Asias He -- To unsubscribe from this list: send the line unsubscribe kvm in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 09/24/2011 02:44 AM, Alexander Graf wrote: On 20.09.2011, at 19:54, Scott Wood wrote: On 09/19/2011 06:35 PM, Alexander Graf wrote: + asm (PPC_CNTLZL %0,%1 : =r (lz) : r (psize)); + tsize = min(21 - lz, tsize); No need to open-code the cntlz and subtract-from-21: tsize = min(ilog2(psize) - 10, tsize); /* * e500 doesn't implement the lowest tsize bit, * or 1K pages. */ tsize = max(BOOK3E_PAGESZ_4K, tsize ~1); There's still an open-coded subtraction of 10, but that relates more straightforwardly to the definition of tsize (and could be factored out into a size-to-tsize function). Yeah, no need to micro-optimized those few bits. The reason I used the asm statement was that I copied the hugetlbfs code which does it that way :). The 21 - lz thing is broken on 64-bit, FWIW. It works by chance in the hugetlbfs code (and in settlbcam) because it results in tsize being too low by 32, which only affects bits that subsequently get masked off. } up_read(current-mm-mmap_sem); } if (likely(!pfnmap)) { + unsigned long tsize_pages = 1 (tsize - 2); 1 (tsize + 10 - PAGE_SHIFT); Are we getting variable page sizes anytime soon? Will change it nevertheless, just curious :). Nothing imminent on our chips AFAIK, but there's some 64K page support in Linux for IBM's book3e chips, and it's not nice to hardcode (especially in a hidden way) regardless. -Scott -- To unsubscribe from this list: send the line unsubscribe kvm-ppc in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html