Re: [Qemu-devel] [PATCH 2/2] LAPIC: make lapic support cpu hotplug
On 2011-10-06 03:13, liu ping fan wrote:
> On Wed, Oct 5, 2011 at 7:01 PM, Jan Kiszka wrote:
>> On 2011-10-05 12:26, liu ping fan wrote:
>>>>> And make the creation of apic as part of cpu initialization, so
>>>>> apic's state has been ready, before setting kvm_apic.
>>>>
>>>> There is no kvm-apic upstream yet, so it's hard to judge why we need
>>>> this here. If we do, this has to be a separate patch. But I seriously
>>>> doubt we need it (my hack worked without it, and that was not because
>>>> of its hack nature).
>>>
>>> Sorry, I did not explain it clearly. What I mean is that
>>> “env->apic_state” must be prepared before
>>> qemu_kvm_cpu_thread_fn() -> ... -> kvm_put_sregs(), where we get
>>> apic_base by
>>> “ sregs.apic_base = cpu_get_apic_base(env->apic_state);”
>>> and then call “kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);” which will
>>> finally affect the kvm_apic structure in kernel.
>>>
>>> But as current code, in pc_new_cpu(), we call apic_init() to initialize
>>> apic_state, after cpu_init(), so we can not guarantee the order of
>>> apic_state initializaion and the setting to kernel.
>>>
>>> Because LAPIC is part of x86 chip, I want to move it into
>>> cpu_x86_init(), and ensure apic_init() called before thread
>>> “qemu_kvm_cpu_thread_fn()” creation.
>>
>> The LAPIC is part of the CPU, the classic APIC was a dedicated chip.
>
> Sorry, a little puzzle. I think x86 interrupt system consists of two
> parts: IOAPIC/LAPIC.
> For we have "hw/ioapic.c" to simulate IOAPIC, I think "hw/apic.c"
> takes the role as LAPIC, especially that we create an APICState
> instance for each CPUX86State, just like each LAPIC for x86 CPU in
> real machine.
> So we can consider apic_init() to create a LAPIC instance, rather than
> create a "classic APIC"?
>
> I guess If there is lack of something in IOAPIC/LAPIC bus topology,
> that will be the arbitrator of ICC bus, right?
> So "the classic APIC was a dedicated chip" what you said, play this
> role, right?
> Could you tell me a sample chipset of APIC, and I can increase my
> knowledge about it, thanks.

The 82489DX was used as a discrete APIC on 486 SMP systems.

>> For various reasons, a safer approach for creating a new CPU is to stop
>> the machine, add the new device models, run cpu_synchronize_post_init on
>> that new cpu (looks like you missed that) and then resume everything.
>> See
>> http://git.kiszka.org/?p=qemu-kvm.git;a=commitdiff;h=be8f21c6b54eac82f7add7ee9d4ecf9cb8ebb320
>
> Great job. And I am interesting about it. Could you give some sample
> reason about why we need to stop the machine, or list all of the
> reasons, so we can resolve it one by one. I can not figure out such
> scenes by myself.
> From my view, especially for KVM, the creation of vcpu are protected
> well by lock mechanism from other vcpu threads in kernel, so we need
> not to stop all of the threads.

Maybe I was seeing ghosts: I thought that there is a race window between
VCPU_CREATE and the last initialization IOCTL when we allow other VCPUs
to interact with the new one already. However, I do not find the
scenario again ATM.

But if you want to move the VCPU resource initialization completely over
the VCPU thread, you also have to handle env->halted in that context.
See [1] for this topic and associated todos.

And don't forget the cpu_synchronize_post_init. Running this after each
VCPU creation directly should also obsolete
cpu_synchronize_all_post_init.

Jan

[1] http://thread.gmane.org/gmane.comp.emulators.qemu/100806
Re: balloon drivers missing in virtio-win-1.1.16.vfd
----- Original Message -----
> From: "Onkar N Mahajan"
> To: kvm@vger.kernel.org, qemu-de...@nongnu.org
> Sent: Thursday, September 29, 2011 6:03:26 AM
> Subject: balloon drivers missing in virtio-win-1.1.16.vfd
>
> virtio_balloon drivers are missing in the virtio-win floppy disk
> image found at
> http://alt.fedoraproject.org/pub/alt/virtio-win/latest/images/bin/
> whereas they are present in the ISO image , any specific reason for
> this ? Shouldn't they be ideally present ?

You probably want to be asking this on the Fedora virt list rather than
the kvm & qemu developer list.

> Regards,
> Onkar

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 5/5] virtio-blk: implement ->make_request
On Wed, 05 Oct 2011 15:54:08 -0400, Christoph Hellwig wrote:
> Add an alternate I/O path that implements ->make_request for virtio-blk.
> This is required for high IOPs devices which get slowed down to 1/5th of
> the native speed by all the locking, memory allocation and other overhead
> in the request based I/O path.

Ouch.

I'd be tempted to just switch across to this, though I'd be interested to
see if the simple add_buf change I referred to before has some effect by
itself (I doubt it).

Also, though it's overkill I'd use standard list primitives rather than
open-coding a single linked list.

Thanks!
Rusty.
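
For readers who have not used them, the "standard list primitives" idea
can be sketched in plain userspace C. This is only an illustration under
stated assumptions: the helpers below are local stand-ins that mimic the
kernel's list_head idiom (the driver itself would use <linux/list.h>),
and struct pending_item with its fields is hypothetical, not taken from
the patch.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct list_head (assumption: real
 * driver code would include <linux/list.h> rather than define this). */
struct list_head {
    struct list_head *next, *prev;
};

/* An empty list is a head that points at itself in both directions. */
void list_init(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

int list_is_empty(const struct list_head *head)
{
    return head->next == head;
}

/* Insert entry just before head, i.e. at the tail of the list. */
void list_push_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink entry from whatever list it is on and make it self-linked. */
void list_remove(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    entry->next = entry;
    entry->prev = entry;
}

/* Recover the containing structure from its embedded list node. */
#define list_container(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical queued work item; name and fields are illustrative only. */
struct pending_item {
    int id;
    struct list_head node;
};

/* Pop the oldest queued item, or return NULL when nothing is queued. */
struct pending_item *pending_pop(struct list_head *head)
{
    struct list_head *first;

    if (list_is_empty(head))
        return NULL;
    first = head->next;
    list_remove(first);
    return list_container(first, struct pending_item, node);
}
```

Compared with an open-coded singly linked list, the embedded node gives
O(1) tail insertion and removal without a separate tail pointer, at the
cost of one extra pointer per item.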
Re: [PATCHv3 4/4] acpi: automatically generated ssdt proc
On Wed, Oct 05, 2011 at 08:35:26AM -0200, Michael S. Tsirkin wrote:
> On Tue, Oct 04, 2011 at 10:52:33PM -0400, Kevin O'Connor wrote:
> > Something like:
> >
> > ACPI_EXTRACT ssdt_proc_obj
> > Processor (CPAA, 0xAA, 0xb010, 0x06) {
>
> I considered this, sure. We could parse AML to figure out
> what is the object type we are trying to look up.
>
> However I decided sanity-checking that we get the right type of object
> in AML is better. This way if iasl output format breaks
> we will have a better chance to detect that.
> Makes sense?

Yes. I guess one could do ACPI_EXTRACT_PROCESSOR for the sanity
check.

> Also this can not be as generic as it seems: each type of
> object in AML bytecode is encoded slightly differently.
> So we would still have types of objects we support
> and types of object we don't.

Yes.

> > which would produce something like:
> >
> > static struct aml_object ssdt_proc_obj = {.addr=0x24, .size=0x40,
> > .param=0x28};
>
> What is the param offset here?

The location of the first byte of the parameters (the same as you had
for ssdt_proc_name). Normally, AML objects take the form: . The is
itself of variable length, so passing in the start of the fixed
parameters would make manipulating the results easier.

> > As for the other parts of this patch series - I'm still leary of
> > changing the DSDT dynamically.
>
> Hmm, not sure I understand why. Could you explain pls?

Sure:

- The DSDT is big and has several cross-functional users. Patching up
  the DSDT for hotplug when the DSDT also has unrelated stuff (eg,
  mouse) seems ugly.

- The PCI hotplug stuff is generating a whole bunch of devices and the
  dynamic code is effectively disabling the unwanted ones. It seems
  nicer to dynamically generate the desired entries instead of bulk
  generating and dynamically blanking.

- The CPU hotplug has similar requirements, but is implemented
  differently - it generates the CPU objects dynamically. It's not
  desirable to bulk generate the CPU objects and "blank" them
  dynamically, because 255 CPU objects would noticeably increase
  SeaBIOS' static size.

- Some time back there were patches floating around to pass the DSDT
  into SeaBIOS via fw_cfg interface. Those patches never made it in (I
  forget why), but the basic functionality seemed sound. Patching the
  DSDT in SeaBIOS would seem to eliminate that possibility.

None of these would be road-blocks. However, they make me want to
consider other approaches.

> > and then just memcpy the "hotplug_obj" N number of times into the ssdt
> > for each available slot. (This would be on top of the DSDT
> > simplification patch series that I posted previously.)
>
> This assumes all devices are the same. But unfortunately this will not
> work for other devices such as VGA.

The VGA device can't be hotplugged, so I don't see why that would be
an issue.

-Kevin
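
The "extract offsets, then memcpy the template N times" approach
discussed in this message can be sketched roughly as below. This is a
sketch under assumptions, not SeaBIOS code: struct aml_object follows
the three fields shown in the mail, while aml_replicate(),
patch_cpu_name() and the byte layout they patch (a four-character name
at the parameter offset, followed by a one-byte processor ID) are
hypothetical names and choices made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets of one compiled object inside a template AML blob, using the
 * three fields shown in the mail. */
struct aml_object {
    uint32_t addr;   /* offset of the object's first byte in the template */
    uint32_t size;   /* total encoded size of the object in bytes */
    uint32_t param;  /* offset of the first fixed-parameter byte */
};

/* Copy the template object `count` times into `out`, letting `patch`
 * adjust the parameter bytes of each copy. Returns the number of bytes
 * written, or 0 if the offsets are inconsistent or `out` is too small.
 * (Hypothetical helper, not an existing SeaBIOS function.) */
size_t aml_replicate(const uint8_t *tmpl, const struct aml_object *obj,
                     unsigned count, uint8_t *out, size_t out_len,
                     void (*patch)(uint8_t *params, unsigned index))
{
    size_t need = (size_t)obj->size * count;
    uint32_t param_off;
    unsigned i;

    if (obj->param < obj->addr || obj->param - obj->addr >= obj->size)
        return 0;
    if (need > out_len)
        return 0;

    param_off = obj->param - obj->addr;
    for (i = 0; i < count; i++) {
        uint8_t *copy = out + (size_t)i * obj->size;

        memcpy(copy, tmpl + obj->addr, obj->size);
        patch(copy + param_off, i);
    }
    return need;
}

/* Example patcher (assumed layout): rewrite a name such as "CPAA" to
 * "CP" plus two hex digits and store the index in the byte after it. */
void patch_cpu_name(uint8_t *params, unsigned index)
{
    static const char hex_digits[] = "0123456789ABCDEF";

    params[2] = (uint8_t)hex_digits[(index >> 4) & 0xF];
    params[3] = (uint8_t)hex_digits[index & 0xF];
    params[4] = (uint8_t)index;
}
```

Passing the parameter offset separately, as suggested above, means the
patcher never has to decode the variable-length part of the object
header.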
Re: [Qemu-devel] [PATCH 2/2] LAPIC: make lapic support cpu hotplug
On 10/05/2011 08:13 PM, liu ping fan wrote: On Wed, Oct 5, 2011 at 7:01 PM, Jan Kiszka wrote: On 2011-10-05 12:26, liu ping fan wrote: > And make the creation of apic as part of cpu initialization, so apic's state has been ready, before setting kvm_apic. There is no kvm-apic upstream yet, so it's hard to judge why we need this here. If we do, this has to be a separate patch. But I seriously doubt we need it (my hack worked without it, and that was not because of its hack nature). Sorry, I did not explain it clearly. What I mean is that “env->apic_state” must be prepared before qemu_kvm_cpu_thread_fn() -> ... -> kvm_put_sregs(), where we get apic_base by “ sregs.apic_base = cpu_get_apic_base(env->apic_state);” and then call “kvm_vcpu_ioctl(env, KVM_SET_SREGS,&sregs);” which will finally affect the kvm_apic structure in kernel. But as current code, in pc_new_cpu(), we call apic_init() to initialize apic_state, after cpu_init(), so we can not guarantee the order of apic_state initializaion and the setting to kernel. Because LAPIC is part of x86 chip, I want to move it into cpu_x86_init(), and ensure apic_init() called before thread “qemu_kvm_cpu_thread_fn()” creation. The LAPIC is part of the CPU, the classic APIC was a dedicated chip. Sorry, a little puzzle. I think x86 interrupt system consists of two parts: IOAPIC/LAPIC. For we have "hw/ioapic.c" to simulate IOAPIC, I think "hw/apic.c" takes the role as LAPIC, especially that we create an APICState instance for each CPUX86State, just like each LAPIC for x86 CPU in real machine. So we can consider apic_init() to create a LAPIC instance, rather than create a "classic APIC"? I guess If there is lack of something in IOAPIC/LAPIC bus topology, that will be the arbitrator of ICC bus, right? So "the classic APIC was a dedicated chip" what you said, play this role, right? Could you tell me a sample chipset of APIC, and I can increase my knowledge about it, thanks. I think Jan meant the PIC was a dedicated chip. 
hw/apic.c implements an LAPIC, hw/i8259.c implements an I8259A otherwise known as the PIC. hw/ioapic.c implements an I/O APIC. Together, the I/O APIC and the LAPIC form an 'APIC Architecture'. Usually, the legacy PIC can hang off of the BSP LAPIC. Regards, Anthony Liguori For various reasons, a safer approach for creating a new CPU is to stop the machine, add the new device models, run cpu_synchronize_post_init on that new cpu (looks like you missed that) and then resume everything. See http://git.kiszka.org/?p=qemu-kvm.git;a=commitdiff;h=be8f21c6b54eac82f7add7ee9d4ecf9cb8ebb320 Great job. And I am interesting about it. Could you give some sample reason about why we need to stop the machine, or list all of the reasons, so we can resolve it one by one. I can not figure out such scenes by myself. From my view, especially for KVM, the creation of vcpu are protected well by lock mechanism from other vcpu threads in kernel, so we need not to stop all of the threads. Hope for suggestion and direct from you, and make a further step for CPU hot-plug feature, Thanks and regards, ping fan ... 
diff --git a/hw/icc_bus.c b/hw/icc_bus.c new file mode 100644 index 000..360ca2a --- /dev/null +++ b/hw/icc_bus.c @@ -0,0 +1,62 @@ +/* +*/ +#define ICC_BUS_PLUG +#ifdef ICC_BUS_PLUG +#include "icc_bus.h" + + + +struct icc_bus_info icc_info = { +.qinfo.name = "icc", +.qinfo.size = sizeof(struct icc_bus), +.qinfo.props = (Property[]) { +DEFINE_PROP_END_OF_LIST(), +} + +}; + + +static const VMStateDescription vmstate_icc_bus = { +.name = "icc_bus", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.pre_save = NULL, +.post_load = NULL, +}; + +struct icc_bus *g_iccbus; + +struct icc_bus *icc_init_bus(DeviceState *parent, const char *name) +{ +struct icc_bus *bus; + +bus = FROM_QBUS(icc_bus, qbus_create(&icc_info.qinfo, parent, name)); +bus->qbus.allow_hotplug = 1; /* Yes, we can */ +bus->qbus.name = "icc"; +vmstate_register(NULL, -1,&vmstate_icc_bus, bus); The chipset is the owner of this bus and instantiates it. So it also provides a vmstate. You can drop this unneeded one here (it's created via an obsolete API anyway). No familiar with Qemu bus emulation, keep on learning :) . But what I thought is, the x86-ICC bus is not the same as bus like PCI. For a PCI bus, it lies behind a host bridge, but ICC is shared by all x86 processors in SMP system, so there is not a outstanding owner. And I right? ICC is also attached to the chipset (due to the IOAPIC). So it looks reasonable to me to let the chipset do the lifecycle management as well. It is the fixed point, CPUs may come and go. Jan -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [Qemu-devel] [PATCH 2/2] LAPIC: make lapic support cpu hotplug
On Wed, Oct 5, 2011 at 7:01 PM, Jan Kiszka wrote: > On 2011-10-05 12:26, liu ping fan wrote: >>> > And make the creation of apic as part of cpu initialization, so apic's state has been ready, before setting kvm_apic. >>> >>> There is no kvm-apic upstream yet, so it's hard to judge why we need >>> this here. If we do, this has to be a separate patch. But I seriously >>> doubt we need it (my hack worked without it, and that was not because of >>> its hack nature). >>> >>> Sorry, I did not explain it clearly. What I mean is that “env->apic_state” >> must be prepared >> before qemu_kvm_cpu_thread_fn() -> ... -> kvm_put_sregs(), where we get >> apic_base by >> “ sregs.apic_base = cpu_get_apic_base(env->apic_state);” >> and then call “kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);” which will >> finally affect the >> kvm_apic structure in kernel. >> >> But as current code, in pc_new_cpu(), we call apic_init() to initialize >> apic_state, after cpu_init(), >> so we can not guarantee the order of apic_state initializaion and the >> setting to kernel. >> >> Because LAPIC is part of x86 chip, I want to move it into cpu_x86_init(), >> and ensure apic_init() >> called before thread “qemu_kvm_cpu_thread_fn()” creation. > > The LAPIC is part of the CPU, the classic APIC was a dedicated chip. Sorry, a little puzzle. I think x86 interrupt system consists of two parts: IOAPIC/LAPIC. For we have "hw/ioapic.c" to simulate IOAPIC, I think "hw/apic.c" takes the role as LAPIC, especially that we create an APICState instance for each CPUX86State, just like each LAPIC for x86 CPU in real machine. So we can consider apic_init() to create a LAPIC instance, rather than create a "classic APIC"? I guess If there is lack of something in IOAPIC/LAPIC bus topology, that will be the arbitrator of ICC bus, right? So "the classic APIC was a dedicated chip" what you said, play this role, right? Could you tell me a sample chipset of APIC, and I can increase my knowledge about it, thanks. 
> > For various reasons, a safer approach for creating a new CPU is to stop > the machine, add the new device models, run cpu_synchronize_post_init on > that new cpu (looks like you missed that) and then resume everything. > See > http://git.kiszka.org/?p=qemu-kvm.git;a=commitdiff;h=be8f21c6b54eac82f7add7ee9d4ecf9cb8ebb320 > Great job. And I am interesting about it. Could you give some sample reason about why we need to stop the machine, or list all of the reasons, so we can resolve it one by one. I can not figure out such scenes by myself. >From my view, especially for KVM, the creation of vcpu are protected well by lock mechanism from other vcpu threads in kernel, so we need not to stop all of the threads. Hope for suggestion and direct from you, and make a further step for CPU hot-plug feature, Thanks and regards, ping fan > ... diff --git a/hw/icc_bus.c b/hw/icc_bus.c new file mode 100644 index 000..360ca2a --- /dev/null +++ b/hw/icc_bus.c @@ -0,0 +1,62 @@ +/* +*/ +#define ICC_BUS_PLUG +#ifdef ICC_BUS_PLUG +#include "icc_bus.h" + + + +struct icc_bus_info icc_info = { + .qinfo.name = "icc", + .qinfo.size = sizeof(struct icc_bus), + .qinfo.props = (Property[]) { + DEFINE_PROP_END_OF_LIST(), + } + +}; + + +static const VMStateDescription vmstate_icc_bus = { + .name = "icc_bus", + .version_id = 1, + .minimum_version_id = 1, + .minimum_version_id_old = 1, + .pre_save = NULL, + .post_load = NULL, +}; + +struct icc_bus *g_iccbus; + +struct icc_bus *icc_init_bus(DeviceState *parent, const char *name) +{ + struct icc_bus *bus; + + bus = FROM_QBUS(icc_bus, qbus_create(&icc_info.qinfo, parent, >>> name)); + bus->qbus.allow_hotplug = 1; /* Yes, we can */ + bus->qbus.name = "icc"; + vmstate_register(NULL, -1, &vmstate_icc_bus, bus); >>> >>> The chipset is the owner of this bus and instantiates it. So it also >>> provides a vmstate. You can drop this unneeded one here (it's created >>> via an obsolete API anyway). 
>>> >> >> No familiar with Qemu bus emulation, keep on learning :) . But what I >> thought is, >> the x86-ICC bus is not the same as bus like PCI. >> For a PCI bus, it lies behind a host bridge, but ICC is shared by all x86 >> processors in SMP system, >> so there is not a outstanding owner. And I right? > > ICC is also attached to the chipset (due to the IOAPIC). So it looks > reasonable to me to let the chipset do the lifecycle management as well. > It is the fixed point, CPUs may come and go. > > Jan > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [kvm] Re: Perf for only one VM?
On Wed, Oct 05, 2011 at 04:03:13PM -0600, David Ahern wrote:
> On 10/05/2011 01:54 PM, Robin Lee Powell wrote:
> > Is it possible to get perf kvm record to record only what's
> > going on on a single VM? The options seem to be --host (all
> > host data) and --guest (all VM data).
>
> -p argument after the record.

Thanks!

-Robin
Re: [PATCH 1/5] block: add bio_map_sg
On 10/05/2011 09:54 PM, Christoph Hellwig wrote: > Add a helper to map a bio to a scatterlist, modelled after blk_rq_map_sg. > This helper is useful for any driver that wants to create a scatterlist > from its ->make_request method. > > Signed-off-by: Christoph Hellwig > I have some questions. - Could we later use this bio_map_sg() to implement blk_rq_map_sg() and remove some duplicated code? - Don't you need to support a chained bio (bio->next != NULL)? Because I did not see any looping in the last patch [PATCH 5/5] virtio-blk: implement ->make_request Or is it that ->make_request() is a single bio at a time? If so could we benefit from both bio-chaining and sg-chaning to make bigger IOs? Thanks Boaz > Index: linux-2.6/block/blk-merge.c > === > --- linux-2.6.orig/block/blk-merge.c 2011-10-04 11:49:32.857014742 -0400 > +++ linux-2.6/block/blk-merge.c 2011-10-04 13:37:51.305630951 -0400 > @@ -199,6 +199,69 @@ new_segment: > } > EXPORT_SYMBOL(blk_rq_map_sg); > > +/* > + * map a bio to a scatterlist, return number of sg entries setup. 
Caller > + * must make sure sg can hold bio->bi_phys_segments entries > + */ > +int bio_map_sg(struct request_queue *q, struct bio *bio, > + struct scatterlist *sglist) > +{ > + struct bio_vec *bvec, *bvprv; > + struct scatterlist *sg; > + int nsegs, cluster; > + unsigned long i; > + > + nsegs = 0; > + cluster = blk_queue_cluster(q); > + > + bvprv = NULL; > + sg = NULL; > + bio_for_each_segment(bvec, bio, i) { > + int nbytes = bvec->bv_len; > + > + if (bvprv && cluster) { > + if (sg->length + nbytes > queue_max_segment_size(q)) > + goto new_segment; > + > + if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec)) > + goto new_segment; > + if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec)) > + goto new_segment; > + > + sg->length += nbytes; > + } else { > +new_segment: > + if (!sg) > + sg = sglist; > + else { > + /* > + * If the driver previously mapped a shorter > + * list, we could see a termination bit > + * prematurely unless it fully inits the sg > + * table on each mapping. We KNOW that there > + * must be more entries here or the driver > + * would be buggy, so force clear the > + * termination bit to avoid doing a full > + * sg_init_table() in drivers for each command. 
> + */ > + sg->page_link &= ~0x02; > + sg = sg_next(sg); > + } > + > + sg_set_page(sg, bvec->bv_page, nbytes, bvec->bv_offset); > + nsegs++; > + } > + bvprv = bvec; > + } /* segments in bio */ > + > + if (sg) > + sg_mark_end(sg); > + > + BUG_ON(bio->bi_phys_segments && nsegs > bio->bi_phys_segments); > + return nsegs; > +} > +EXPORT_SYMBOL(bio_map_sg); > + > static inline int ll_new_hw_segment(struct request_queue *q, > struct request *req, > struct bio *bio) > Index: linux-2.6/include/linux/blkdev.h > === > --- linux-2.6.orig/include/linux/blkdev.h 2011-10-04 13:37:13.216148915 > -0400 > +++ linux-2.6/include/linux/blkdev.h 2011-10-04 13:37:51.317613617 -0400 > @@ -854,6 +854,8 @@ extern void blk_queue_flush_queueable(st > extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device > *bdev); > > extern int blk_rq_map_sg(struct request_queue *, struct request *, struct > scatterlist *); > +extern int bio_map_sg(struct request_queue *q, struct bio *bio, > + struct scatterlist *sglist); > extern void blk_dump_rq_flags(struct request *, char *); > extern long nr_blockdev_pages(void); > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Perf for only one VM?
On 10/05/2011 01:54 PM, Robin Lee Powell wrote:
> Is it possible to get perf kvm record to record only what's going on
> on a single VM? The options seem to be --host (all host data) and
> --guest (all VM data).

-p argument after the record.

David
Re: [PATCH 0/5] RFC: ->make_request support for virtio-blk
On Wed, Oct 05, 2011 at 04:31:16PM -0400, Vivek Goyal wrote:
> So you no longer believe that request queue overhead can be brought
> down to manageable levels for these fast devices. And instead go for
> bio based drivers and give up on merging and implement own FLUSH/FUA
> machinery.

Not in a reasonable time frame. Having a common interface is still the
grand plan, but the virtio performance issues are hurting people so
badly that we need a fix soon.
Re: [PATCH 0/5] RFC: ->make_request support for virtio-blk
On Wed, Oct 05, 2011 at 03:54:03PM -0400, Christoph Hellwig wrote:
> This patchset allows the virtio-blk driver to support much higher IOP
> rates which can be driven out of modern PCI-e flash devices. At this
> point it really is just a RFC due to various issues.

So you no longer believe that request queue overhead can be brought down
to manageable levels for these fast devices. And instead go for bio based
drivers and give up on merging and implement own FLUSH/FUA machinery.

Not that I am advocating for continuing with request based driver. Just
curious.

Thanks
Vivek
Re: [kvm] Re: tcpdump locks up kvm host for a while.
On Wed, Oct 05, 2011 at 11:36:53AM -0700, Robin Lee Powell wrote:
> On Wed, Oct 05, 2011 at 08:21:53PM +0200, Avi Kivity wrote:
> > On 10/04/2011 09:40 PM, Robin Lee Powell wrote:
> > >> >When I run tcpdump on a *guest*, the entire guest completely
> > >> >freezes up; no response even to hitting enter on the console.
> > >> >"virsh list" also locks up whenever it tries to print state
> > >> >about that VM (but the others work fine), as does any other
> > >> >operation that touches the state of that VM. The VM takes up
> > >> >100% of CPU on one core while this is happening. Eventually
> > >> >it gets better.
> > >>
> > >> You can use 'perf kvm' to figure out where the guest is
> > >> spinning.
> > >
> > >OK, gathered with:
> > >
> > >sudo perf kvm --guest --host record -o /tmp/kvm_perf -a
> > >
> > >I don't know how to read it at all, so it's at
> > >http://users.digitalkingdom.org/~rlpowell/media/public/kvm_perf
> > >
> >
> > Not accessible.
>
> -_- Fixed.
>
> > Please post the output of 'perf kvm report > log'.
>
> # Events: 42K
> #
> # Overhead  Command   Shared Object  Symbol
> # ........  ........  .............  ......
> #
>    100.00%  qemu-kvm  [unknown]      [g] 0x8111c7a5
>
>
> #
> # (For a higher level overview, try: perf report --sort comm,dso)
> #
>
> How helpful is that? -_-
>
> I'm guessing I need --guestkallsyms= ; since they're all the same
> kernel I thought it'd figure it out. I'll redo.

OK, here's a "better" version.

# Events: 46K cycles
#
# Overhead  Command   Shared Object            Symbol
# ........  ........  .......................  ......
#
    74.81%  qemu-kvm  [unknown]                [u] 0x7fbdffd4c18a
    25.14%  qemu-kvm  [guest.kernel.kallsyms]  [g] 0x82f0
     0.03%  qemu-kvm  [virtio_net]             [g] 0x83e8
     0.01%  qemu-kvm  [virtio_balloon]         [g] 0x103b
     0.00%  qemu-kvm  [ip6_tables]             [g] compat_standard_to_user
     0.00%  qemu-kvm  [ipv6]                   [g] icmpv6_send
     0.00%  qemu-kvm  [virtio_blk]             [g] 0x7783
     0.00%  qemu-kvm  [ipv6]                   [g] raw6_seq_show
     0.00%  qemu-kvm  [ipv6]                   [g] icmpv6_rcv
     0.00%  qemu-kvm  [virtio_net]             [g] fini
     0.00%  qemu-kvm  [ip6table_filter]        [g] 0x9b5
#
# (For a higher level overview, try: perf report --sort comm,dso)
#

The file is also updated.

-Robin
Re: [PATCH] pci_cap_init: add 82599 VF quirk
On Tue, Oct 04, 2011 at 03:02:18PM -0400, Donald Dutile wrote:
> v2: Updated to reflect changes requested by reviewers.
>
> Add check when PCIe capability structure is version 0
> and VID and DID is 82599 VF. In this case, the size
> of the PCIe cap structure should be the same as a version 2
> cap structure.
> Documented in 82599 Errata 35, and is still marked "No Fix".
> According to Intel, it's in silicon not fw, and needs a sw workaround.
>
> Signed-off-by: Donald Dutile
> ---
> hw/device-assignment.c | 17 +++--
> 1 files changed, 15 insertions(+), 2 deletions(-)

Applied, thanks.
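
The quirk described in the quoted commit message amounts to a small
decision function, sketched here for illustration only. It is not the
code from hw/device-assignment.c: pcie_cap_size() is a hypothetical
name, and the constants are assumptions (0x10ed as the 82599 VF device
ID, 0x14 and 0x3c bytes as the endpoint capability sizes for versions 1
and 2).

```c
#include <stdint.h>

#define VENDOR_INTEL      0x8086
#define DEVICE_82599_VF   0x10ed  /* assumption: 82599 VF device ID */

/* Assumed sizes of the PCI Express capability structure for an endpoint. */
#define PCIE_CAP_V1_SIZE  0x14
#define PCIE_CAP_V2_SIZE  0x3c

/* Return the size to use for the PCIe capability, or -1 if the reported
 * version is not one we know how to handle. (Hypothetical helper.) */
int pcie_cap_size(unsigned version, uint16_t vid, uint16_t did)
{
    /* Quirk: the 82599 VF reports version 0 but should be sized like a
     * version 2 capability structure. */
    if (version == 0 && vid == VENDOR_INTEL && did == DEVICE_82599_VF)
        version = 2;

    switch (version) {
    case 1:
        return PCIE_CAP_V1_SIZE;
    case 2:
        return PCIE_CAP_V2_SIZE;
    default:
        return -1;
    }
}
```

Keying the override on both vendor and device ID keeps any other device
that reports version 0 on the error path instead of silently treating
it as version 2.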
[PATCH 1/5] block: add bio_map_sg
Add a helper to map a bio to a scatterlist, modelled after blk_rq_map_sg.
This helper is useful for any driver that wants to create a scatterlist
from its ->make_request method.

Signed-off-by: Christoph Hellwig

Index: linux-2.6/block/blk-merge.c
===
--- linux-2.6.orig/block/blk-merge.c	2011-10-04 11:49:32.857014742 -0400
+++ linux-2.6/block/blk-merge.c	2011-10-04 13:37:51.305630951 -0400
@@ -199,6 +199,69 @@ new_segment:
 }
 EXPORT_SYMBOL(blk_rq_map_sg);
 
+/*
+ * map a bio to a scatterlist, return number of sg entries setup. Caller
+ * must make sure sg can hold bio->bi_phys_segments entries
+ */
+int bio_map_sg(struct request_queue *q, struct bio *bio,
+		struct scatterlist *sglist)
+{
+	struct bio_vec *bvec, *bvprv;
+	struct scatterlist *sg;
+	int nsegs, cluster;
+	unsigned long i;
+
+	nsegs = 0;
+	cluster = blk_queue_cluster(q);
+
+	bvprv = NULL;
+	sg = NULL;
+	bio_for_each_segment(bvec, bio, i) {
+		int nbytes = bvec->bv_len;
+
+		if (bvprv && cluster) {
+			if (sg->length + nbytes > queue_max_segment_size(q))
+				goto new_segment;
+
+			if (!BIOVEC_PHYS_MERGEABLE(bvprv, bvec))
+				goto new_segment;
+			if (!BIOVEC_SEG_BOUNDARY(q, bvprv, bvec))
+				goto new_segment;
+
+			sg->length += nbytes;
+		} else {
+new_segment:
+			if (!sg)
+				sg = sglist;
+			else {
+				/*
+				 * If the driver previously mapped a shorter
+				 * list, we could see a termination bit
+				 * prematurely unless it fully inits the sg
+				 * table on each mapping. We KNOW that there
+				 * must be more entries here or the driver
+				 * would be buggy, so force clear the
+				 * termination bit to avoid doing a full
+				 * sg_init_table() in drivers for each command.
+				 */
+				sg->page_link &= ~0x02;
+				sg = sg_next(sg);
+			}
+
+			sg_set_page(sg, bvec->bv_page, nbytes, bvec->bv_offset);
+			nsegs++;
+		}
+		bvprv = bvec;
+	} /* segments in bio */
+
+	if (sg)
+		sg_mark_end(sg);
+
+	BUG_ON(bio->bi_phys_segments && nsegs > bio->bi_phys_segments);
+	return nsegs;
+}
+EXPORT_SYMBOL(bio_map_sg);
+
 static inline int ll_new_hw_segment(struct request_queue *q,
 				    struct request *req,
 				    struct bio *bio)
Index: linux-2.6/include/linux/blkdev.h
===
--- linux-2.6.orig/include/linux/blkdev.h	2011-10-04 13:37:13.216148915 -0400
+++ linux-2.6/include/linux/blkdev.h	2011-10-04 13:37:51.317613617 -0400
@@ -854,6 +854,8 @@ extern void blk_queue_flush_queueable(st
 extern struct backing_dev_info *blk_get_backing_dev_info(struct block_device *bdev);
 
 extern int blk_rq_map_sg(struct request_queue *, struct request *, struct scatterlist *);
+extern int bio_map_sg(struct request_queue *q, struct bio *bio,
+		struct scatterlist *sglist);
 extern void blk_dump_rq_flags(struct request *, char *);
 extern long nr_blockdev_pages(void);
 
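
As an aside for readers following the merging logic in bio_map_sg, the
core idea can be shown in a simplified userspace sketch. It is not the
kernel code: struct seg and merge_segments() are hypothetical, physical
contiguity is modelled with plain addresses, and the segment-boundary
check (BIOVEC_SEG_BOUNDARY) as well as the scatterlist termination
handling are deliberately left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for one bio_vec / scatterlist entry. */
struct seg {
    uint64_t addr;  /* start address of the chunk */
    uint32_t len;   /* length of the chunk in bytes */
};

/* Coalesce adjacent input chunks into output segments.
 *
 * A chunk is appended to the current segment only when clustering is
 * enabled, it starts exactly where the segment ends, and the merged
 * length stays within max_seg_size; otherwise a new segment is started.
 * `out` must have room for n entries. Returns the number of segments. */
size_t merge_segments(const struct seg *in, size_t n, struct seg *out,
                      uint32_t max_seg_size, int cluster)
{
    size_t nsegs = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        if (nsegs > 0 && cluster) {
            struct seg *cur = &out[nsegs - 1];

            if (cur->addr + cur->len == in[i].addr &&
                (uint64_t)cur->len + in[i].len <= max_seg_size) {
                cur->len += in[i].len;
                continue;
            }
        }
        out[nsegs++] = in[i];
    }
    return nsegs;
}
```

With clustering on, two back-to-back 4 KiB chunks collapse into one
8 KiB segment, which is why the caller only needs room for at most
bio->bi_phys_segments entries.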
[PATCH 4/5] virtio-blk: reimplement the serial attribute without using requests
If we want to do bio-based I/O in virtio-blk we have to implement reading the serial attribute ourselves. Do that and also prepare struct virtblk_req for dealing with different types of requests. Signed-off-by: Christoph Hellwig Index: linux-2.6/drivers/block/virtio_blk.c === --- linux-2.6.orig/drivers/block/virtio_blk.c 2011-10-03 20:32:12.997713070 +0200 +++ linux-2.6/drivers/block/virtio_blk.c2011-10-03 20:37:28.836714193 +0200 @@ -38,12 +38,42 @@ struct virtio_blk struct virtblk_req { - struct request *req; + void *private; struct virtio_blk_outhdr out_hdr; struct virtio_scsi_inhdr in_hdr; + u8 kind; +#define VIRTIO_BLK_REQUEST 0x00 +#define VIRTIO_BLK_INTERNAL0x01 u8 status; }; +static inline int virtblk_result(struct virtblk_req *vbr) +{ + switch (vbr->status) { + case VIRTIO_BLK_S_OK: + return 0; + case VIRTIO_BLK_S_UNSUPP: + return -ENOTTY; + default: + return -EIO; + } +} + +static void virtblk_request_done(struct virtio_blk *vblk, + struct virtblk_req *vbr) +{ + struct request *req = vbr->private; + + if (req->cmd_type == REQ_TYPE_BLOCK_PC) { + req->resid_len = vbr->in_hdr.residual; + req->sense_len = vbr->in_hdr.sense_len; + req->errors = vbr->in_hdr.errors; + } + + __blk_end_request_all(req, virtblk_result(vbr)); + mempool_free(vbr, vblk->pool); +} + static void blk_done(struct virtqueue *vq) { struct virtio_blk *vblk = vq->vdev->priv; @@ -53,35 +83,16 @@ static void blk_done(struct virtqueue *v spin_lock_irqsave(&vblk->lock, flags); while ((vbr = virtqueue_get_buf(vblk->vq, &len)) != NULL) { - int error; - - switch (vbr->status) { - case VIRTIO_BLK_S_OK: - error = 0; - break; - case VIRTIO_BLK_S_UNSUPP: - error = -ENOTTY; - break; - default: - error = -EIO; - break; - } - - switch (vbr->req->cmd_type) { - case REQ_TYPE_BLOCK_PC: - vbr->req->resid_len = vbr->in_hdr.residual; - vbr->req->sense_len = vbr->in_hdr.sense_len; - vbr->req->errors = vbr->in_hdr.errors; + switch (vbr->kind) { + case VIRTIO_BLK_REQUEST: + virtblk_request_done(vblk, vbr); break; - 
case REQ_TYPE_SPECIAL: - vbr->req->errors = (error != 0); + case VIRTIO_BLK_INTERNAL: + complete(vbr->private); break; default: - break; + BUG(); } - - __blk_end_request_all(vbr->req, error); - mempool_free(vbr, vblk->pool); } /* In case queue is stopped waiting for more buffers. */ blk_start_queue(vblk->disk->queue); @@ -99,28 +110,24 @@ static bool do_req(struct request_queue /* When another request finishes we'll try again. */ return false; - vbr->req = req; + vbr->private = req; + vbr->kind = VIRTIO_BLK_REQUEST; if (req->cmd_flags & REQ_FLUSH) { vbr->out_hdr.type = VIRTIO_BLK_T_FLUSH; vbr->out_hdr.sector = 0; - vbr->out_hdr.ioprio = req_get_ioprio(vbr->req); + vbr->out_hdr.ioprio = req_get_ioprio(req); } else { switch (req->cmd_type) { case REQ_TYPE_FS: vbr->out_hdr.type = 0; - vbr->out_hdr.sector = blk_rq_pos(vbr->req); - vbr->out_hdr.ioprio = req_get_ioprio(vbr->req); + vbr->out_hdr.sector = blk_rq_pos(req); + vbr->out_hdr.ioprio = req_get_ioprio(req); break; case REQ_TYPE_BLOCK_PC: vbr->out_hdr.type = VIRTIO_BLK_T_SCSI_CMD; vbr->out_hdr.sector = 0; - vbr->out_hdr.ioprio = req_get_ioprio(vbr->req); - break; - case REQ_TYPE_SPECIAL: - vbr->out_hdr.type = VIRTIO_BLK_T_GET_ID; - vbr->out_hdr.sector = 0; - vbr->out_hdr.ioprio = req_get_ioprio(vbr->req); + vbr->out_hdr.ioprio = req_get_ioprio(req); break; default: /* We don't put anything else in the queue. */ @@ -136,13 +143,14 @@ static bool do_req(struct request_queue * block, and before the normal inhdr we p
[PATCH 0/5] RFC: ->make_request support for virtio-blk
This patchset allows the virtio-blk driver to support the much higher IOPS rates that can be driven out of modern PCI-e flash devices. At this point it really is just an RFC due to various issues. The first four patches are infrastructure that could go in fairly soon as far as I'm concerned. Patch 5 implements the actual ->make_request support and still has a few issues, see there for more details. With it I can drive my PCI-e test devices to 85-90% of the native IOPS and bandwidth, but be warned that this is still a fairly low end setup as far as expensive flash storage is concerned. One big downside is that it currently exposes a nasty race in the qemu virtqueue code - just running xfstests inside a guest using the new virtio-blk driver (even on a slow device) will trigger it and lead to a filesystem shutdown. I've tracked it down to data I/O segments getting overwritten with status s/g list entries, but got lost at that point. I can start a separate thread on it. Besides that it is missing a few features, and we have to decide how to select which mode to use in virtio-blk - either a module option, a sysfs attribute or something that the host communicates. Or maybe decide that just going with ->make_request alone is fine: even on my cheap laptop SSD it is just as fast as, if not slightly faster than, the request based variant. There are a few other bottlenecks in virtio that this exposes. The first one is the low queue length of just 128 entries in the virtio-blk queue - to drive higher IOPS with a deep queue we absolutely need to increase that. Comments welcome! -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/5] virtio-blk: remove the unused list of pending requests
Signed-off-by: Christoph Hellwig Index: linux-2.6/drivers/block/virtio_blk.c === --- linux-2.6.orig/drivers/block/virtio_blk.c 2011-10-03 19:55:29.061215040 +0200 +++ linux-2.6/drivers/block/virtio_blk.c2011-10-03 19:55:54.196412176 +0200 @@ -24,9 +24,6 @@ struct virtio_blk /* The disk structure for the kernel. */ struct gendisk *disk; - /* Request tracking. */ - struct list_head reqs; - mempool_t *pool; /* Process context for config space updates */ @@ -41,7 +38,6 @@ struct virtio_blk struct virtblk_req { - struct list_head list; struct request *req; struct virtio_blk_outhdr out_hdr; struct virtio_scsi_inhdr in_hdr; @@ -85,7 +81,6 @@ static void blk_done(struct virtqueue *v } __blk_end_request_all(vbr->req, error); - list_del(&vbr->list); mempool_free(vbr, vblk->pool); } /* In case queue is stopped waiting for more buffers. */ @@ -170,7 +165,6 @@ static bool do_req(struct request_queue return false; } - list_add_tail(&vbr->list, &vblk->reqs); return true; } @@ -368,7 +362,6 @@ static int __devinit virtblk_probe(struc goto out; } - INIT_LIST_HEAD(&vblk->reqs); spin_lock_init(&vblk->lock); vblk->vdev = vdev; vblk->sg_elems = sg_elems; @@ -526,9 +519,6 @@ static void __devexit virtblk_remove(str flush_work(&vblk->config_work); - /* Nothing should be pending. */ - BUG_ON(!list_empty(&vblk->reqs)); - /* Stop all the virtqueues. */ vdev->config->reset(vdev);
[PATCH 5/5] virtio-blk: implement ->make_request
Add an alternate I/O path that implements ->make_request for virtio-blk. This is required for high IOPS devices which get slowed down to 1/5th of the native speed by all the locking, memory allocation and other overhead in the request based I/O path. This patch is not quite merge ready due to two issues: it doesn't implement FUA and FLUSH requests yet, and it hardcodes which I/O path to choose. Signed-off-by: Christoph Hellwig Index: linux-2.6/drivers/block/virtio_blk.c === --- linux-2.6.orig/drivers/block/virtio_blk.c 2011-10-05 10:36:42.883913334 -0400 +++ linux-2.6/drivers/block/virtio_blk.c 2011-10-05 15:29:35.591405323 -0400 @@ -11,6 +11,8 @@ #define PART_BITS 4 +static int use_make_request = 1; + static int major, index; struct workqueue_struct *virtblk_wq; @@ -20,6 +22,7 @@ struct virtio_blk struct virtio_device *vdev; struct virtqueue *vq; + wait_queue_head_t queue_wait; /* The disk structure for the kernel. */ struct gendisk *disk; @@ -39,11 +42,13 @@ struct virtio_blk struct virtblk_req { void *private; + struct virtblk_req *next; struct virtio_blk_outhdr out_hdr; struct virtio_scsi_inhdr in_hdr; u8 kind; #define VIRTIO_BLK_REQUEST 0x00 -#define VIRTIO_BLK_INTERNAL 0x01 +#define VIRTIO_BLK_BIO 0x01 +#define VIRTIO_BLK_INTERNAL 0x02 u8 status; }; @@ -74,10 +79,17 @@ static void virtblk_request_done(struct mempool_free(vbr, vblk->pool); } +static void virtblk_bio_done(struct virtio_blk *vblk, + struct virtblk_req *vbr) +{ + bio_endio(vbr->private, virtblk_result(vbr)); + mempool_free(vbr, vblk->pool); +} + static void blk_done(struct virtqueue *vq) { struct virtio_blk *vblk = vq->vdev->priv; - struct virtblk_req *vbr; + struct virtblk_req *vbr, *head = NULL, *tail = NULL; unsigned int len; unsigned long flags; @@ -88,15 +100,47 @@ static void blk_done(struct virtqueue *v virtblk_request_done(vblk, vbr); break; case VIRTIO_BLK_INTERNAL: - complete(vbr->private); + case VIRTIO_BLK_BIO: + if (head) { + tail->next = vbr; + tail = vbr; + } else { + tail = head = vbr; 
+ } break; default: BUG(); } } - /* In case queue is stopped waiting for more buffers. */ - blk_start_queue(vblk->disk->queue); + + if (!use_make_request) { + /* In case queue is stopped waiting for more buffers. */ + blk_start_queue(vblk->disk->queue); + } spin_unlock_irqrestore(&vblk->lock, flags); + + wake_up(&vblk->queue_wait); + + /* +* Process completions after freeing up space in the virtqueue and +* dropping the lock. +*/ + while (head) { + vbr = head; + head = head->next; + + switch (vbr->kind) { + case VIRTIO_BLK_BIO: + virtblk_bio_done(vblk, vbr); + break; + case VIRTIO_BLK_INTERNAL: + complete(vbr->private); + break; + default: + BUG(); + } + + } } static bool do_req(struct request_queue *q, struct virtio_blk *vblk, @@ -111,6 +155,7 @@ static bool do_req(struct request_queue return false; vbr->private = req; + vbr->next = NULL; vbr->kind = VIRTIO_BLK_REQUEST; if (req->cmd_flags & REQ_FLUSH) { @@ -199,6 +244,128 @@ static void do_virtblk_request(struct re virtqueue_kick(vblk->vq); } +struct virtblk_plug_cb { + struct blk_plug_cb cb; + struct virtio_blk *vblk; +}; + +static void virtblk_unplug(struct blk_plug_cb *bcb) +{ + struct virtblk_plug_cb *cb = + container_of(bcb, struct virtblk_plug_cb, cb); + + virtqueue_notify(cb->vblk->vq); + kfree(cb); +} + +static bool virtblk_plugged(struct virtio_blk *vblk) +{ + struct blk_plug *plug = current->plug; + struct virtblk_plug_cb *cb; + + if (!plug) + return false; + + list_for_each_entry(cb, &plug->cb_list, cb.list) { + if (cb->cb.callback == virtblk_unplug && cb->vblk == vblk) + return true; + } + + /* Not currently on the callback list */ + cb = kmalloc(sizeof(*cb), GFP_ATOMIC); + if (!cb) + return false; + + cb->vblk = vblk; + cb->cb.callback = virtblk_unplug; + list_add(&cb->cb.list, &plug->cb_list); + return true;
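The blk_done() hunk above gathers bio and internal requests on a private singly linked list while the lock is held and completes them only after the lock is dropped. A minimal userspace sketch of that pattern follows. Everything named demo_* is hypothetical and only stands in for the kernel structures; the lock is modelled as a plain flag so an assert can show that completions run unlocked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request; stands in for struct virtblk_req. */
struct demo_req {
    struct demo_req *next;
    bool completed;
};

/* Hypothetical queue; "locked" models vblk->lock as a simple flag. */
struct demo_queue {
    bool locked;
    struct demo_req *ready; /* requests the "device" has finished */
};

/* Completion work that must not run under the queue lock. */
static void demo_complete(struct demo_queue *q, struct demo_req *r)
{
    assert(!q->locked);
    r->completed = true;
}

/* Drain finished requests under the lock, complete them afterwards. */
static unsigned demo_done(struct demo_queue *q)
{
    struct demo_req *head = NULL, *tail = NULL, *r;
    unsigned n = 0;

    q->locked = true;
    while ((r = q->ready) != NULL) {
        q->ready = r->next;
        r->next = NULL;
        if (head)
            tail->next = r;
        else
            head = r;
        tail = r;
    }
    q->locked = false;

    /* The lock is dropped; now walk the private list. */
    while (head) {
        r = head;
        head = head->next;
        demo_complete(q, r);
        n++;
    }
    return n;
}
```

Keeping the list private to the function means no other context can see half-processed entries once the lock is released.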
[PATCH 2/5] virtio: support unlocked queue kick
Split virtqueue_kick to be able to do the actual notification outside the lock protecting the virtqueue. This patch was originally done by Stefan Hajnoczi, but I can't find the original one anymore and had to recreate it from memory. Pointers to the original or corrections for the commit message are welcome. Index: linux-2.6/drivers/virtio/virtio_ring.c === --- linux-2.6.orig/drivers/virtio/virtio_ring.c 2011-09-15 15:28:55.891347016 +0200 +++ linux-2.6/drivers/virtio/virtio_ring.c 2011-10-03 18:45:32.492738431 +0200 @@ -237,9 +237,11 @@ add_head: } EXPORT_SYMBOL_GPL(virtqueue_add_buf_gfp); -void virtqueue_kick(struct virtqueue *_vq) +bool virtqueue_kick_prepare(struct virtqueue *_vq) { struct vring_virtqueue *vq = to_vvq(_vq); + bool need_kick = false; + u16 new, old; START_USE(vq); /* Descriptors and available array need to be set before we expose the @@ -253,15 +255,32 @@ void virtqueue_kick(struct virtqueue *_v /* Need to update avail index before checking if we should notify */ virtio_mb(); - if (vq->event ? - vring_need_event(vring_avail_event(&vq->vring), new, old) : - !(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY)) - /* Prod other side to tell it about changes. */ - vq->notify(&vq->vq); - + if (vq->event) { + if (vring_need_event(vring_avail_event(&vq->vring), new, old)) + need_kick = true; + } else { + if (!(vq->vring.used->flags & VRING_USED_F_NO_NOTIFY)) + need_kick = true; + } END_USE(vq); + return need_kick; +} +EXPORT_SYMBOL_GPL(virtqueue_kick_prepare); + +void virtqueue_notify(struct virtqueue *_vq) +{ + struct vring_virtqueue *vq = to_vvq(_vq); + + /* Prod other side to tell it about changes. 
*/ + vq->notify(_vq); +} +EXPORT_SYMBOL_GPL(virtqueue_notify); + +void virtqueue_kick(struct virtqueue *vq) +{ + if (virtqueue_kick_prepare(vq)) + virtqueue_notify(vq); } -EXPORT_SYMBOL_GPL(virtqueue_kick); static void detach_buf(struct vring_virtqueue *vq, unsigned int head) { Index: linux-2.6/include/linux/virtio.h === --- linux-2.6.orig/include/linux/virtio.h 2011-09-15 15:28:57.078857804 +0200 +++ linux-2.6/include/linux/virtio.h2011-10-03 18:41:07.309766531 +0200 @@ -86,6 +86,8 @@ static inline int virtqueue_add_buf(stru } void virtqueue_kick(struct virtqueue *vq); +bool virtqueue_kick_prepare(struct virtqueue *vq); +void virtqueue_notify(struct virtqueue *vq); void *virtqueue_get_buf(struct virtqueue *vq, unsigned int *len);
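The point of splitting virtqueue_kick() is that a caller can decide under its own lock whether a notification is needed and then issue it after unlocking. A rough userspace sketch of that calling pattern, under stated assumptions: struct mock_vq and all mock_* helpers are invented for illustration and are not the kernel API, and the lock is modelled as a flag so the asserts can check which step runs where.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a virtqueue plus the lock protecting it. */
struct mock_vq {
    bool locked;            /* models the driver's spinlock */
    bool no_notify;         /* other side asked not to be notified */
    unsigned pending;       /* buffers added since the last kick */
    unsigned notifications; /* how often the other side was prodded */
};

static void mock_lock(struct mock_vq *vq)
{
    assert(!vq->locked);
    vq->locked = true;
}

static void mock_unlock(struct mock_vq *vq)
{
    assert(vq->locked);
    vq->locked = false;
}

/* Runs with the lock held: publish buffers, report whether to kick. */
static bool mock_kick_prepare(struct mock_vq *vq)
{
    bool need_kick;

    assert(vq->locked);
    need_kick = vq->pending > 0 && !vq->no_notify;
    vq->pending = 0;
    return need_kick;
}

/* Runs without the lock: the potentially expensive notification. */
static void mock_notify(struct mock_vq *vq)
{
    assert(!vq->locked);
    vq->notifications++;
}

/* Caller pattern: prepare under the lock, notify after dropping it. */
static void mock_submit(struct mock_vq *vq, unsigned nbufs)
{
    bool need_kick;

    mock_lock(vq);
    vq->pending += nbufs;
    need_kick = mock_kick_prepare(vq);
    mock_unlock(vq);

    if (need_kick)
        mock_notify(vq);
}
```

The sketch is single threaded; in a real driver the notification counter would not exist and the lock would be a real spinlock.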
Perf for only one VM?
Is it possible to get perf kvm record to record only what's going on on a single VM? The options seem to be --host (all host data) and --guest (all VM data). I'm getting "74.86% qemu-kvm [unknown][u] 0x7f3795bc408d", and I think that's because I've got kallsyms loaded from a particular guest, but not the others. -Robin
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 10/05/2011 01:55 PM, Alexander Graf wrote: > > On 05.10.2011, at 18:06, Scott Wood wrote: > >> Any reason for __ilog2() rather than ilog2()? Shouldn't make a >> difference, just curious about avoiding the public interface. > > I grep'ed through the kernel tree and only found __ilog2 defined as well as > mostly users for __ilog2, so I figured there's got to be a reason ;) ilog2() is defined in include/linux/log2.h. It produces constant output if the input is constant, and appears to be the "front door" to __ilog2_u32/__ilog2_u64. Plain __ilog2 is older and powerpc-specific, which is probably why there are more users of that in arch/powerpc. -Scott
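For anyone skimming the thread, the expression under discussion is min(__ilog2(psize) - 10, tsize) from the quoted patch. Below is a small userspace sketch of the arithmetic only. demo_ilog2() is a hypothetical stand-in written as a plain loop, not the kernel's ilog2() or __ilog2(), and demo_pick_tsize() is an invented name; the sketch assumes the host page size is at least 1 KiB so the subtraction cannot underflow.

```c
#include <assert.h>

/* Stand-in for an integer log2: floor(log2(n)) for n > 0. */
static unsigned demo_ilog2(unsigned long n)
{
    unsigned log = 0;

    assert(n != 0);
    while (n >>= 1)
        log++;
    return log;
}

/*
 * Mirrors min(ilog2(psize) - 10, tsize) from the quoted hunk:
 * the host page size is turned into a log2-of-KiB value and the
 * smaller of the host and guest values is kept.
 */
static unsigned demo_pick_tsize(unsigned long host_psize, unsigned guest_tsize)
{
    unsigned host_tsize;

    assert(host_psize >= 1024); /* assumption made for this sketch */
    host_tsize = demo_ilog2(host_psize) - 10;
    return host_tsize < guest_tsize ? host_tsize : guest_tsize;
}
```

With a 16 MiB host page (2^24 bytes) the host side contributes 24 - 10 = 14, so a guest value of 20 is clamped to 14 while a guest value of 12 is kept as is.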
[QEMU PATCH] kvm: support TSC deadline MSR with subsection
Jinsong, please test this qemu-kvm patch by migrating a guest which is currently using TSC deadline timer. Using subsections avoids breaking migration to older qemu versions when the guest does not make use of TSC deadline feature. - From: Liu, Jinsong KVM add emulation of lapic tsc deadline timer for guest. This patch is co-operation work at qemu side. Use subsections to save/restore the field (mtosatti). Signed-off-by: Liu, Jinsong diff --git a/target-i386/cpu.h b/target-i386/cpu.h index ae36489..29412dc 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -283,6 +283,7 @@ #define MSR_IA32_APICBASE_BSP (1<<8) #define MSR_IA32_APICBASE_ENABLE(1<<11) #define MSR_IA32_APICBASE_BASE (0xf<<12) +#define MSR_IA32_TSCDEADLINE0x6e0 #define MSR_MTRRcap0xfe #define MSR_MTRRcap_VCNT 8 @@ -687,6 +688,7 @@ typedef struct CPUX86State { uint64_t async_pf_en_msr; uint64_t tsc; +uint64_t tsc_deadline; uint64_t mcg_status; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index b6eef04..90a6ffb 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -59,6 +59,7 @@ const KVMCapabilityInfo kvm_arch_required_capabilities[] = { static bool has_msr_star; static bool has_msr_hsave_pa; +static bool has_msr_tsc_deadline; static bool has_msr_async_pf_en; static int lm_capable_kernel; @@ -568,6 +569,10 @@ static int kvm_get_supported_msrs(KVMState *s) has_msr_hsave_pa = true; continue; } +if (kvm_msr_list->indices[i] == MSR_IA32_TSCDEADLINE) { +has_msr_tsc_deadline = true; +continue; +} } } @@ -881,6 +886,9 @@ static int kvm_put_msrs(CPUState *env, int level) if (has_msr_hsave_pa) { kvm_msr_entry_set(&msrs[n++], MSR_VM_HSAVE_PA, env->vm_hsave); } +if (has_msr_tsc_deadline) { +kvm_msr_entry_set(&msrs[n++], MSR_IA32_TSCDEADLINE, env->tsc_deadline); +} #ifdef TARGET_X86_64 if (lm_capable_kernel) { kvm_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar); @@ -1127,6 +1135,9 @@ static int kvm_get_msrs(CPUState *env) if (has_msr_hsave_pa) { msrs[n++].index = MSR_VM_HSAVE_PA; } +if 
(has_msr_tsc_deadline) { +msrs[n++].index = MSR_IA32_TSCDEADLINE; +} if (!env->tsc_valid) { msrs[n++].index = MSR_IA32_TSC; @@ -1195,6 +1206,9 @@ static int kvm_get_msrs(CPUState *env) case MSR_IA32_TSC: env->tsc = msrs[i].data; break; +case MSR_IA32_TSCDEADLINE: +env->tsc_deadline = msrs[i].data; +break; case MSR_VM_HSAVE_PA: env->vm_hsave = msrs[i].data; break; diff --git a/target-i386/machine.c b/target-i386/machine.c index 9aca8e0..176d372 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -310,6 +310,24 @@ static const VMStateDescription vmstate_fpop_ip_dp = { } }; +static bool tscdeadline_needed(void *opaque) +{ +CPUState *env = opaque; + +return env->tsc_deadline != 0; +} + +static const VMStateDescription vmstate_msr_tscdeadline = { +.name = "cpu/msr_tscdeadline", +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT64(tsc_deadline, CPUState), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_cpu = { .name = "cpu", .version_id = CPU_SAVE_VERSION, @@ -420,6 +438,9 @@ static const VMStateDescription vmstate_cpu = { } , { .vmsd = &vmstate_fpop_ip_dp, .needed = fpop_ip_dp_needed, +}, { +.vmsd = &vmstate_msr_tscdeadline, +.needed = tscdeadline_needed, } , { /* empty */ }
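The subsection in the patch above is only emitted when its "needed" callback returns true, which is what keeps migration to older qemu working while the guest has not armed the deadline timer. A toy model of that idea, assuming nothing about qemu's real vmstate machinery: the demo_* types and functions are hypothetical, and only the subsection name and the tsc_deadline != 0 test are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical CPU state with just the fields this sketch needs. */
struct demo_cpu {
    uint64_t tsc;
    uint64_t tsc_deadline;
};

/* Hypothetical subsection descriptor: a name plus a predicate. */
struct demo_subsection {
    const char *name;
    bool (*needed)(const struct demo_cpu *cpu);
};

/* Same test as tscdeadline_needed() in the patch. */
static bool demo_tscdeadline_needed(const struct demo_cpu *cpu)
{
    assert(cpu != NULL);
    return cpu->tsc_deadline != 0;
}

static const struct demo_subsection demo_subsections[] = {
    { "cpu/msr_tscdeadline", demo_tscdeadline_needed },
};

/* Count the subsections that would go into the migration stream. */
static size_t demo_count_sent(const struct demo_cpu *cpu)
{
    size_t i, sent = 0;
    size_t n = sizeof(demo_subsections) / sizeof(demo_subsections[0]);

    for (i = 0; i < n; i++)
        if (demo_subsections[i].needed(cpu))
            sent++;
    return sent;
}
```

A guest that never wrote the MSR produces a stream without the extra subsection, so a receiver that does not know about it has nothing unknown to parse.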
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 05.10.2011, at 18:06, Scott Wood wrote: > On 10/05/2011 09:37 AM, Alexander Graf wrote: >> diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c >> index ec17148..1dd96a9 100644 >> --- a/arch/powerpc/kvm/e500_tlb.c >> +++ b/arch/powerpc/kvm/e500_tlb.c >> @@ -24,6 +24,7 @@ >> #include >> #include >> #include >> +#include >> #include >> #include >> >> @@ -673,12 +674,31 @@ static inline void kvmppc_e500_shadow_map(struct >> kvmppc_vcpu_e500 *vcpu_e500, >> pfn &= ~(tsize_pages - 1); >> break; >> } >> +} else if (vma && hva >= vma->vm_start && >> + (vma->vm_flags & VM_HUGETLB)) { >> +unsigned long psize = vma_kernel_pagesize(vma); >> + >> +tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >> >> +MAS1_TSIZE_SHIFT; >> + >> +/* >> + * Take the largest page size that satisfies both host >> + * and guest mapping >> + */ >> +tsize = min(__ilog2(psize) - 10, tsize); > > Any reason for __ilog2() rather than ilog2()? Shouldn't make a > difference, just curious about avoiding the public interface. I grep'ed through the kernel tree and only found __ilog2 defined as well as mostly users for __ilog2, so I figured there's got to be a reason ;) Alex
Re: [kvm] accessing a 'native' disk from KVM ?
On Wed, Oct 05, 2011 at 11:41:13AM -0700, Sujee Maniyam wrote: > HI all > > Using KVM (qemu-kvm-0.12.1) on CentOS-6. > > how can I supply an entire disk for KVM to use? Is this correct > snippet in kvm.xml ? > > > > > > This is what I have: So, yeah, pretty much, although note that you're making /dev/sdi*1* available, which is a partition, not a disk. Regardless, http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=%2Fliaat%2Fliaattuncache.htm (and other things I've seen) suggest cache=none for these cases. -Robin
accessing a 'native' disk from KVM ?
Hi all, I'm using KVM (qemu-kvm-0.12.1) on CentOS-6. How can I supply an entire disk for KVM to use? Is this the correct snippet in kvm.xml? Thanks very much, Sujee Maniyam http://sujee.net
Re: tcpdump locks up kvm host for a while.
On 10/04/2011 09:40 PM, Robin Lee Powell wrote: > >When I run tcpdump on a *guest*, the entire guest completely > >freezes up; no response even to hitting enter on the console. > >"virsh list" also locks up whenever it tries to print state about > >that VM (but the others work fine), as does any other operation > >that touches the state of that VM. The VM takes up 100% of CPU > >on one core while this is happening. Eventually it gets better. > > You can use 'perf kvm' to figure out where the guest is spinning. OK, gathered with: sudo perf kvm --guest --host record -o /tmp/kvm_perf -a I don't know how to read it at all, so it's at http://users.digitalkingdom.org/~rlpowell/media/public/kvm_perf Not accessible. Please post the output of 'perf kvm report > log'. -- error compiling committee.c: too many arguments to function
RE: [PATCH] KVM: emulate lapic tsc deadline timer for guest
Marcelo Tosatti wrote: > On Sun, Sep 25, 2011 at 10:47:46PM +0800, Liu, Jinsong wrote: >> Marcelo Tosatti wrote: >>> On Fri, Sep 23, 2011 at 04:25:51PM +0800, Liu, Jinsong wrote: Marcelo Tosatti wrote: > On Thu, Sep 22, 2011 at 04:55:52PM +0800, Liu, Jinsong wrote: >>> From 4d5b83aba40ce0d421add9a41a6c591a8590a32e Mon Sep 17 >>> 00:00:00 2001 >> From: Liu, Jinsong >> Date: Thu, 22 Sep 2011 14:00:08 +0800 >> Subject: [PATCH 2/2] KVM: emulate lapic tsc deadline timer for >> guest >> >> This patch emulate lapic tsc deadline timer for guest: >> Enumerate tsc deadline timer capability by CPUID; >> Enable tsc deadline timer mode by lapic MMIO; >> Start tsc deadline timer by WRMSR; >> >> Signed-off-by: Liu, Jinsong --- >> arch/x86/include/asm/kvm_host.h |2 + >> arch/x86/kvm/kvm_timer.h|2 + >> arch/x86/kvm/lapic.c| 123 >> --- arch/x86/kvm/lapic.h >> |3 + arch/x86/kvm/x86.c | 16 +- >> 5 files changed, 123 insertions(+), 23 deletions(-) > > Looks good, please rebase against branch master of > > git://github.com/avikivity/kvm.git Rebased as attached. Thanks, Jinsong >>> >>> Please write a simple test case to arm a lapic timer via wrmsr (see >>> https://github.com/avikivity/kvm-unit-tests). >>> >>> Kernel patches have been applied, thanks. >> >> Marcelo, >> >> I'm not quite clear the purpose and usage of test case of the >> kvm-unit-tests. Can you give me some hint? > > The purpose is to add unit tests for new features (such as lapic > deadline timer). There are examples that make it relatively easy to > construct new test case (or modify existing ones to accommodate new > tests). > > Please add a new test case for lapic deadline timer, thanks. Thanks Marcelo. I will add the test case. Sorry for the slow email reply; I was away on holiday. Regards, Jinsong
RE: KVM: emulate lapic tsc deadline timer for guest
Yes, Avi has noticed this issue and fixed the bug, as attached. Thanks, Jinsong Dan Carpenter wrote: > This patch causes a NULL dereference for me when I start qemu. > > [ 136.130978] BUG: unable to handle kernel NULL pointer dereference > at 0078 [ 136.131032] IP: [] > update_cpuid+0x63/0x90 [kvm] [ 136.131076] PGD 3fcac4067 PUD > 3fc91a067 PMD 0 [ 136.131108] Oops: 0002 [#1] SMP > [ 136.131132] CPU 0 > [ 136.131145] Modules linked in: e1000e fuse kvm_intel kvm radeon > ttm [last unloaded: e1000e] [ 136.131208] > [ 136.131219] Pid: 2678, comm: qemu-system-x86 Not tainted > 3.1.0-rc8-next-20110930+ #92 System manufacturer System Product > Name/P8Z68-V PRO [ 136.131289] RIP: 0010:[] > [] update_cpuid+0x63/0x90 [kvm] [ 136.131341] RSP: > 0018:880404761d20 EFLAGS: 00010282 [ 136.131370] RAX: > RBX: 8803fc408000 RCX: 8803fc408b18 [ > 136.131408] RDX: 80802001 RSI: 0001 RDI: > 8803fc408000 [ 136.131445] RBP: 880404761d28 R08: > 0015 R09: 0003 [ 136.131483] R10: > 0003 R11: R12: 8803fc408000 [ > 136.131520] R13: 7fffb5e8d870 R14: 0015 R15: > [ 136.131559] FS: 7f96c6324760() > GS:88042f40() knlGS: [ 136.131602] CS: > 0010 DS: ES: CR0: 80050033 [ 136.131632] CR2: > 0078 CR3: 0003fcac5000 CR4: 000426f0 [ > 136.131670] DR0: DR1: DR2: > [ 136.131707] DR3: DR6: > 0ff0 DR7: 0400 [ 136.131745] Process > qemu-system-x86 (pid: 2678, threadinfo 88040476, task > 880404618000) [ 136.131792] Stack: [ 136.131804] > 880404761df8 a016635f [ > 136.131851] 0015 0200 > [ 136.131897] 88042f7ee000 880404761da8 > a01beba9 c90006722000 [ 136.131944] Call Trace: [ > 136.131966] [] kvm_arch_vcpu_ioctl+0xd5f/0x15d0 > [kvm] [ 136.132005] [] ? vmx_vcpu_load+0x39/0x1b0 > [kvm_intel] [ 136.132046] [] ? > kvm_arch_vcpu_load+0x6b/0x170 [kvm] > > Here is the code listing from gdb: > > (gdb) list *(update_cpuid+0x63) > 0xc3d3 is in update_cpuid (arch/x86/kvm/x86.c:618). 
> 613 } > 614 > 615 if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && > 616 best->function == 0x1) { > 617 best->ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER); > 618 vcpu->arch.apic->lapic_timer.timer_mode_mask = (3 << > 17); > 619 } else > 620 vcpu->arch.apic->lapic_timer.timer_mode_mask = (1 << > 17); > 621 } > 622 > (gdb) > > Reverting the patch fixes things for me. I'm using linux-next from > Friday. > > regards, > dan carpenter --- Begin Message --- vcpu->arch.apic may be NULL. Signed-off-by: Avi Kivity --- arch/x86/kvm/x86.c |9 +++-- 1 files changed, 7 insertions(+), 2 deletions(-) diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 83b839f..aa11707 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -600,6 +600,8 @@ static bool guest_cpuid_has_fsgsbase(struct kvm_vcpu *vcpu) static void update_cpuid(struct kvm_vcpu *vcpu) { struct kvm_cpuid_entry2 *best; + struct kvm_lapic *apic = vcpu->arch.apic; + u32 timer_mode_mask; best = kvm_find_cpuid_entry(vcpu, 1, 0); if (!best) @@ -615,9 +617,12 @@ static void update_cpuid(struct kvm_vcpu *vcpu) if (boot_cpu_data.x86_vendor == X86_VENDOR_INTEL && best->function == 0x1) { best->ecx |= bit(X86_FEATURE_TSC_DEADLINE_TIMER); - vcpu->arch.apic->lapic_timer.timer_mode_mask = (3 << 17); + timer_mode_mask = 3 << 17; } else - vcpu->arch.apic->lapic_timer.timer_mode_mask = (1 << 17); + timer_mode_mask = 1 << 17; + + if (apic) + apic->lapic_timer.timer_mode_mask = timer_mode_mask; } int kvm_set_cr4(struct kvm_vcpu *vcpu, unsigned long cr4) -- 1.7.6.3 --- End Message ---
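Avi's fix above computes the mask into a local variable first and only stores it when an APIC object exists. A minimal userspace sketch of that shape follows; struct demo_vcpu, struct demo_lapic and demo_update_timer_mask() are hypothetical names, and only the two mask values (3 << 17 and 1 << 17) and the NULL check are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel structures. */
struct demo_lapic {
    uint32_t timer_mode_mask;
};

struct demo_vcpu {
    struct demo_lapic *apic; /* may be NULL, as the fix points out */
};

/* Compute the mask unconditionally, store it only if an APIC exists. */
static uint32_t demo_update_timer_mask(struct demo_vcpu *vcpu,
                                       bool has_tsc_deadline)
{
    uint32_t mask;

    assert(vcpu != NULL);
    mask = has_tsc_deadline ? (3u << 17) : (1u << 17);

    if (vcpu->apic)
        vcpu->apic->timer_mode_mask = mask;

    return mask;
}
```

Returning the mask is only there so the sketch can be checked without an APIC object; the kernel function returns void.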
Re: [PATCH] KVM: PPC: E500: Support hugetlbfs
On 10/05/2011 09:37 AM, Alexander Graf wrote: > diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c > index ec17148..1dd96a9 100644 > --- a/arch/powerpc/kvm/e500_tlb.c > +++ b/arch/powerpc/kvm/e500_tlb.c > @@ -24,6 +24,7 @@ > #include > #include > #include > +#include > #include > #include > > @@ -673,12 +674,31 @@ static inline void kvmppc_e500_shadow_map(struct > kvmppc_vcpu_e500 *vcpu_e500, > pfn &= ~(tsize_pages - 1); > break; > } > + } else if (vma && hva >= vma->vm_start && > + (vma->vm_flags & VM_HUGETLB)) { > + unsigned long psize = vma_kernel_pagesize(vma); > + > + tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >> > + MAS1_TSIZE_SHIFT; > + > + /* > + * Take the largest page size that satisfies both host > + * and guest mapping > + */ > + tsize = min(__ilog2(psize) - 10, tsize); Any reason for __ilog2() rather than ilog2()? Shouldn't make a difference, just curious about avoiding the public interface. Either way, Acked-by: Scott Wood -Scott
Re: Kernel bug - 3.1.0 - rc9
Forwarded to linux-kernel. You'll need to get better debug output, perhaps using a serial console. Is the issue on the host or guest? If on the host, it's likely a virtualbox bug, report it there. On 10/05/2011 05:41 PM, Steve wrote: I'm not sure exactly where to report this (and the kernel.org "reporting bugs" page is still offline) RC9 of the 3.1 kernel is causing VirtualBox to fail (certainly here on amd64 - Phenom II 965 quad). syslog output is: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Oops: [#1] PREEMPT SMP Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Stack: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Call Trace: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Code: ac e0 66 8b b3 b8 02 00 00 8b bb b0 02 00 00 48 c7 83 68 02 00 00 00 00 00 00 e8 68 ff ff ff 48 c7 c7 00 aa 62 a0 e8 8e d4 cd e0 Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:CR2: 0088 vboxsrv drivers install correctly and there are no obvious errors from vbox itself. The bug manifests itself by simply hanging the VM at startup. Versions pre rc9 work fine in this regard. I hope that this is of some help. -- error compiling committee.c: too many arguments to function
Kernel bug - 3.1.0 - rc9
I'm not sure exactly where to report this (and the kernel.org "reporting bugs" page is still offline) RC9 of the 3.1 kernel is causing VirtualBox to fail (certainly here on amd64 - Phenom II 965 quad). syslog output is: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Oops: [#1] PREEMPT SMP Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Stack: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Call Trace: Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:Code: ac e0 66 8b b3 b8 02 00 00 8b bb b0 02 00 00 48 c7 83 68 02 00 00 00 00 00 00 e8 68 ff ff ff 48 c7 c7 00 aa 62 a0 e8 8e d4 cd e0 Message from syslogd@Tex at Oct 5 16:16:51 ... kernel:CR2: 0088 vboxsrv drivers install correctly and there are no obvious errors from vbox itself. The bug manifests itself by simply hanging the VM at startup. Versions pre rc9 work fine in this regard. I hope that this is of some help.
Re: [PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
On Wed, 2011-10-05 at 17:29 +0200, Gleb Natapov wrote: > On Wed, Oct 05, 2011 at 04:19:39PM +0200, Peter Zijlstra wrote: > > On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote: > > > +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) > > > +{ > > > + int i, nr_msrs; > > > + struct perf_guest_switch_msr *msrs; > > > + > > > + msrs = perf_guest_get_msrs(&nr_msrs); > > > + > > > + if (!msrs) > > > + return; > > > + > > > + for (i = 0; i < nr_msrs; i++) > > > + if (msrs[i].host == msrs[i].guest) > > > + clear_atomic_switch_msr(vmx, msrs[i].msr); > > > + else > > > + add_atomic_switch_msr(vmx, msrs[i].msr, > > > msrs[i].guest, > > > + msrs[i].host); > > > +} > > > > I don't think this will actually compile with PERF_EVENTS=n due to > > struct perf_guest_switch_msr not being defined. > Oops you are right. Turns out it is not enough to remove PERF_EVENTS > from .config to disable it. It re-appears after "make oldconfig". Should > I send incremental patch to fix that? Frederic, what's the status of being able to disable PERF on x86 again?
Re: [PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
On Wed, Oct 05, 2011 at 04:19:39PM +0200, Peter Zijlstra wrote: > On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote: > > +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) > > +{ > > + int i, nr_msrs; > > + struct perf_guest_switch_msr *msrs; > > + > > + msrs = perf_guest_get_msrs(&nr_msrs); > > + > > + if (!msrs) > > + return; > > + > > + for (i = 0; i < nr_msrs; i++) > > + if (msrs[i].host == msrs[i].guest) > > + clear_atomic_switch_msr(vmx, msrs[i].msr); > > + else > > + add_atomic_switch_msr(vmx, msrs[i].msr, > > msrs[i].guest, > > + msrs[i].host); > > +} > > I don't think this will actually compile with PERF_EVENTS=n due to > struct perf_guest_switch_msr not being defined. Oops you are right. Turns out it is not enough to remove PERF_EVENTS from .config to disable it. It re-appears after "make oldconfig". Should I send incremental patch to fix that? -- Gleb.
Re: [PATCH v2 2/2] virtio-net: Prevent NULL dereference
On Wed, Oct 05, 2011 at 03:50:14PM +0200, Sasha Levin wrote: > On Mon, 2011-10-03 at 20:40 +0200, Michael S. Tsirkin wrote: > > On Wed, Sep 28, 2011 at 05:40:55PM +0300, Sasha Levin wrote: > > > This patch prevents a NULL dereference when the user has passed a length > > > longer than an actual buffer to virtio-net. > > > > > > Cc: Rusty Russell > > > Cc: "Michael S. Tsirkin" > > > Cc: virtualizat...@lists.linux-foundation.org > > > Cc: net...@vger.kernel.org > > > Cc: kvm@vger.kernel.org > > > Signed-off-by: Sasha Levin > > > --- > > > drivers/net/virtio_net.c | 12 +++- > > > 1 files changed, 11 insertions(+), 1 deletions(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index bde0dec..4a53d2a 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -208,12 +208,22 @@ static struct sk_buff *page_to_skb(struct > > > virtnet_info *vi, > > > return NULL; > > > } > > > > > > - while (len) { > > > + while (len && page) { > > > set_skb_frag(skb, page, offset, &len); > > > page = (struct page *)page->private; > > > offset = 0; > > > } > > > > > > + /* > > > + * This is the case where we ran out of pages in our linked list, but > > > + * supposedly have more data to read. > > > > Again, let's clarify that this only happens with broken devices. > > I think that the code within the if() makes it clear that it isn't the > regular path. It doesn't make it clear that this never happens in absence of bugs. > -- > > Sasha.
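The quoted hunk changes the loop condition to while (len && page), so the walk stops when the page chain runs out even if the reported length has not been consumed. A simplified userspace sketch of that check follows. struct demo_page, DEMO_PAGE_SIZE and both functions are hypothetical; unlike the driver code, the sketch ignores the offset into the first page and treats every page as fully usable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u

/* Hypothetical page chain; "next" plays the role of page->private. */
struct demo_page {
    struct demo_page *next;
};

/* Walk the chain and return the bytes no page was left to cover. */
static size_t demo_consume(const struct demo_page *page, size_t len)
{
    while (len && page) {
        size_t chunk = len < DEMO_PAGE_SIZE ? len : DEMO_PAGE_SIZE;

        assert(chunk <= len);
        len -= chunk;
        page = page->next;
    }
    return len;
}

/* A leftover length means the reported size exceeded the buffer. */
static bool demo_length_fits(const struct demo_page *page, size_t len)
{
    return demo_consume(page, len) == 0;
}
```

With a two-page chain, a length of 5000 bytes fits, while 10000 bytes leaves 1808 bytes uncovered, which is the situation the thread attributes to a broken device.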
Re: [PATCH v2 1/2] virtio-net: Verify page list size before fitting into skb
On Wed, Oct 05, 2011 at 03:50:54PM +0200, Sasha Levin wrote: > On Mon, 2011-10-03 at 21:04 +0200, Michael S. Tsirkin wrote: > > On Wed, Sep 28, 2011 at 05:40:54PM +0300, Sasha Levin wrote: > > > This patch verifies that the length of a buffer stored in a linked list > > > of pages is small enough to fit into a skb. > > > > > > If the size is larger than a max size of a skb, it means that we shouldn't > > > go ahead building skbs anyway since we won't be able to send the buffer as > > > the user requested. > > > > > > Cc: Rusty Russell > > > Cc: "Michael S. Tsirkin" > > > Cc: virtualizat...@lists.linux-foundation.org > > > Cc: net...@vger.kernel.org > > > Cc: kvm@vger.kernel.org > > > Signed-off-by: Sasha Levin > > > --- > > > drivers/net/virtio_net.c | 13 + > > > 1 files changed, 13 insertions(+), 0 deletions(-) > > > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > > index 0c7321c..bde0dec 100644 > > > --- a/drivers/net/virtio_net.c > > > +++ b/drivers/net/virtio_net.c > > > @@ -195,6 +195,19 @@ static struct sk_buff *page_to_skb(struct > > > virtnet_info *vi, > > > len -= copy; > > > offset += copy; > > > > > > + /* > > > + * Verify that we can indeed put this data into a skb. > > > + * This is here to handle cases when the device erroneously > > > + * tries to receive more than is possible. This is usually > > > + * the case of a broken device. > > > + */ > > > + if (unlikely(len > MAX_SKB_FRAGS * PAGE_SIZE)) { > > > + if (net_ratelimit()) > > > + pr_debug("%s: too much data\n", skb->dev->name); > > > + dev_kfree_skb(skb); > > > + return NULL; > > > + } > > > + > > > > BTW, receive_mergeable does > > pr_debug("%s: packet too long\n", skb->dev->name); > > skb->dev->stats.rx_length_errors++; > > > > which makes sense. > > Do you think we should increase rx_length_errors here as well? this is all debugging tool for devices/drivers, right? so maybe not worth the noise. > -- > > Sasha. 
Re: [PATCH] qemu-kvm: Remove extboot support
On Wed, Oct 05, 2011 at 09:52:12AM +0200, Jan Kiszka wrote:
> From: Jan Kiszka
>
> Except for booting from SCSI via the LSI controller, SeaBIOS has native
> support for mass storage interfaces now. And SCSI can be worked around
> via [1] - or someone finally adds the necessary bits to SeaBIOS if there
> is a real need.
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/78467
>
> Signed-off-by: Jan Kiszka

Applied, thanks.
[PATCH] KVM: PPC: E500: Support hugetlbfs
With hugetlbfs support emerging on e500, we should also support KVM
backing its guest memory by it.

This patch adds support for hugetlbfs into the e500 shadow mmu code.

Signed-off-by: Alexander Graf

---

v1 -> v2:

  - address scott's comments
---
 arch/powerpc/kvm/e500_tlb.c |   24
 1 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kvm/e500_tlb.c b/arch/powerpc/kvm/e500_tlb.c
index ec17148..1dd96a9 100644
--- a/arch/powerpc/kvm/e500_tlb.c
+++ b/arch/powerpc/kvm/e500_tlb.c
@@ -24,6 +24,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -673,12 +674,31 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 				pfn &= ~(tsize_pages - 1);
 				break;
 			}
+		} else if (vma && hva >= vma->vm_start &&
+			   (vma->vm_flags & VM_HUGETLB)) {
+			unsigned long psize = vma_kernel_pagesize(vma);
+
+			tsize = (gtlbe->mas1 & MAS1_TSIZE_MASK) >>
+				MAS1_TSIZE_SHIFT;
+
+			/*
+			 * Take the largest page size that satisfies both host
+			 * and guest mapping
+			 */
+			tsize = min(__ilog2(psize) - 10, tsize);
+
+			/*
+			 * e500 doesn't implement the lowest tsize bit,
+			 * or 1K pages.
+			 */
+			tsize = max(BOOK3E_PAGESZ_4K, tsize & ~1);
 		}

 		up_read(&current->mm->mmap_sem);
 	}

 	if (likely(!pfnmap)) {
+		unsigned long tsize_pages = 1 << (tsize + 10 - PAGE_SHIFT);
 		pfn = gfn_to_pfn_memslot(vcpu_e500->vcpu.kvm, slot, gfn);
 		if (is_error_pfn(pfn)) {
 			printk(KERN_ERR "Couldn't get real page for gfn %lx!\n",
@@ -686,6 +706,10 @@ static inline void kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
 			kvm_release_pfn_clean(pfn);
 			return;
 		}
+
+		/* Align guest and physical address to page map boundaries */
+		pfn &= ~(tsize_pages - 1);
+		gvaddr &= ~((tsize_pages << PAGE_SHIFT) - 1);
 	}

 	/* Drop old ref and setup new one. */
--
1.6.0.2
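For anyone checking the tsize arithmetic in the patch above, here is a minimal standalone sketch of it, not the kernel code itself. All sketch_ names and SKETCH_PAGESZ_4K are hypothetical stand-ins; the value 2 for the 4K encoding is an assumption that matches the patch's own "tsize + 10" arithmetic (2 + 10 = 12 bits = 4K).

```c
/* Assumed tsize encoding of a 4K page: a page is 2^(tsize + 10) bytes. */
#define SKETCH_PAGESZ_4K 2

/* floor(log2(x)) for x > 0; hypothetical stand-in for __ilog2(). */
static int sketch_ilog2(unsigned long x)
{
	int n = -1;

	while (x) {
		x >>= 1;
		n++;
	}
	return n;
}

/*
 * Pick the largest tsize that both the host huge page and the guest
 * TLB entry can satisfy, then round it to what e500 implements.
 */
static int sketch_pick_tsize(unsigned long host_psize, int guest_tsize)
{
	int tsize = sketch_ilog2(host_psize) - 10;

	if (guest_tsize < tsize)
		tsize = guest_tsize;	/* min(host, guest) */

	tsize &= ~1;			/* lowest tsize bit is not implemented */
	if (tsize < SKETCH_PAGESZ_4K)
		tsize = SKETCH_PAGESZ_4K;	/* no 1K pages */
	return tsize;
}

/* Align a pfn down to the start of the mapping chosen above. */
static unsigned long sketch_align_pfn(unsigned long pfn, int tsize,
				      int page_shift)
{
	unsigned long tsize_pages = 1UL << (tsize + 10 - page_shift);

	return pfn & ~(tsize_pages - 1);
}
```

With a 16M host huge page and a guest entry asking for 64M, the sketch settles on the 16M encoding, and the pfn is then aligned down to a 16M boundary.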
Re: [PATCH v2 0/9] perf support for x86 guest/host-only bits
On Wed, 2011-10-05 at 15:48 +0200, Avi Kivity wrote:
> On 10/05/2011 02:01 PM, Gleb Natapov wrote:
> > This patch series consists of Joerg series named "perf support for amd
> > guest/host-only bits v2" [1] rebased to 3.1.0-rc7 and in addition,
> > support for intel cpus for the same functionality.
> >
>
> Looks good to me. Peter, Ingo, if it works for you as well, please
> merge 1-6 into a branch on tip that I can pull and apply the rest on top of.

I queued bits, and they should hopefully find their way to tip:perf/core
soonish.
Re: [PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
On Wed, 2011-10-05 at 14:01 +0200, Gleb Natapov wrote: > +static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx) > +{ > + int i, nr_msrs; > + struct perf_guest_switch_msr *msrs; > + > + msrs = perf_guest_get_msrs(&nr_msrs); > + > + if (!msrs) > + return; > + > + for (i = 0; i < nr_msrs; i++) > + if (msrs[i].host == msrs[i].guest) > + clear_atomic_switch_msr(vmx, msrs[i].msr); > + else > + add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest, > + msrs[i].host); > +} I don't think this will actually compile with PERF_EVENTS=n due to struct perf_guest_switch_msr not being defined. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 6/9] perf, intel: Use GO/HO bits in perf-ctr
Had some serious conflicts with tip/master, queued the below merge. --- Subject: perf, intel: Use GO/HO bits in perf-ctr From: Gleb Natapov Date: Wed, 5 Oct 2011 14:01:21 +0200 Intel does not have guest/host-only bit in perf counters like AMD does. To support GO/HO bits KVM needs to switch EVENTSELn values (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is configured to count only in a guest mode it stays disabled in a host, but VMX is configured to switch it to enabled value during guest entry. This patch adds GO/HO tracking to Intel perf code and provides interface for KVM to get a list of MSRs that need to be switched on a guest entry. Only cpus with architectural PMU (v1 or later) are supported with this patch. To my knowledge there is not p6 models with VMX but without architectural PMU and p4 with VMX are rare and the interface is general enough to support them if need arise. Signed-off-by: Gleb Natapov Signed-off-by: Peter Zijlstra Link: http://lkml.kernel.org/r/1317816084-18026-7-git-send-email-g...@redhat.com --- arch/x86/include/asm/perf_event.h | 12 arch/x86/kernel/cpu/perf_event.h | 12 arch/x86/kernel/cpu/perf_event_intel.c | 91 +++-- 3 files changed, 112 insertions(+), 3 deletions(-) Index: linux-2.6/arch/x86/include/asm/perf_event.h === --- linux-2.6.orig/arch/x86/include/asm/perf_event.h +++ linux-2.6/arch/x86/include/asm/perf_event.h @@ -162,7 +162,19 @@ extern unsigned long perf_misc_flags(str ); \ } +struct perf_guest_switch_msr { + unsigned msr; + u64 host, guest; +}; + +extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr); #else +static inline perf_guest_switch_msr *perf_guest_get_msrs(int *nr) +{ + *nr = 0; + return NULL; +} + static inline void perf_events_lapic_init(void){ } #endif Index: linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c === --- linux-2.6.orig/arch/x86/kernel/cpu/perf_event_intel.c +++ linux-2.6/arch/x86/kernel/cpu/perf_event_intel.c @@ -749,7 +749,8 @@ static void intel_pmu_enable_all(int add 
intel_pmu_pebs_enable_all(); intel_pmu_lbr_enable_all(); - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, + x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask); if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc->active_mask)) { struct perf_event *event = @@ -872,6 +873,7 @@ static void intel_pmu_disable_fixed(stru static void intel_pmu_disable_event(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) { intel_pmu_disable_bts(); @@ -879,6 +881,9 @@ static void intel_pmu_disable_event(stru return; } + cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx); + cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx); + if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) { intel_pmu_disable_fixed(hwc); return; @@ -924,6 +929,7 @@ static void intel_pmu_enable_fixed(struc static void intel_pmu_enable_event(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) { if (!__this_cpu_read(cpu_hw_events.enabled)) @@ -933,6 +939,11 @@ static void intel_pmu_enable_event(struc return; } + if (event->attr.exclude_host) + cpuc->intel_ctrl_guest_mask |= (1ull << hwc->idx); + if (event->attr.exclude_guest) + cpuc->intel_ctrl_host_mask |= (1ull << hwc->idx); + if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) { intel_pmu_enable_fixed(hwc); return; @@ -1302,12 +1313,84 @@ static int intel_pmu_hw_config(struct pe return 0; } +struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr) +{ + if (x86_pmu.guest_get_msrs) + return x86_pmu.guest_get_msrs(nr); + *nr = 0; + return NULL; +} +EXPORT_SYMBOL_GPL(perf_guest_get_msrs); + +static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr) +{ + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); + struct perf_guest_switch_msr *arr 
= cpuc->guest_switch_msrs; + + arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL; + arr[0].host = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask; + arr[0].guest = x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_host_mask; + + *nr = 1; + return arr; +} + +static struct perf_guest_switch_ms
Re: [PATCH v2 0/9] perf support for x86 guest/host-only bits
On 10/05/2011 02:01 PM, Gleb Natapov wrote:
> This patch series consists of Joerg series named "perf support for amd
> guest/host-only bits v2" [1] rebased to 3.1.0-rc7 and in addition,
> support for intel cpus for the same functionality.
>

Looks good to me. Peter, Ingo, if it works for you as well, please
merge 1-6 into a branch on tip that I can pull and apply the rest on top of.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v4 00/11] KVM: x86: optimize for writing guest page
On 09/23/2011 02:51 PM, Marcelo Tosatti wrote:
> On Thu, Sep 22, 2011 at 04:52:40PM +0800, Xiao Guangrong wrote:
> > This patchset is against https://github.com/avikivity/kvm.git next branch.
> >
> > In this version, some changes come from Avi's comments:
> > - fix instruction retried for nested guest
> > - skip write-flooding for the sp whose level is 1
> > - rename some functions
>
> Looks good to me.

To me as well.

--
error compiling committee.c: too many arguments to function
Re: [PATCH v2 1/2] virtio-net: Verify page list size before fitting into skb
On Mon, 2011-10-03 at 21:04 +0200, Michael S. Tsirkin wrote: > On Wed, Sep 28, 2011 at 05:40:54PM +0300, Sasha Levin wrote: > > This patch verifies that the length of a buffer stored in a linked list > > of pages is small enough to fit into a skb. > > > > If the size is larger than a max size of a skb, it means that we shouldn't > > go ahead building skbs anyway since we won't be able to send the buffer as > > the user requested. > > > > Cc: Rusty Russell > > Cc: "Michael S. Tsirkin" > > Cc: virtualizat...@lists.linux-foundation.org > > Cc: net...@vger.kernel.org > > Cc: kvm@vger.kernel.org > > Signed-off-by: Sasha Levin > > --- > > drivers/net/virtio_net.c | 13 + > > 1 files changed, 13 insertions(+), 0 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index 0c7321c..bde0dec 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -195,6 +195,19 @@ static struct sk_buff *page_to_skb(struct virtnet_info > > *vi, > > len -= copy; > > offset += copy; > > > > + /* > > +* Verify that we can indeed put this data into a skb. > > +* This is here to handle cases when the device erroneously > > +* tries to receive more than is possible. This is usually > > +* the case of a broken device. > > +*/ > > + if (unlikely(len > MAX_SKB_FRAGS * PAGE_SIZE)) { > > + if (net_ratelimit()) > > + pr_debug("%s: too much data\n", skb->dev->name); > > + dev_kfree_skb(skb); > > + return NULL; > > + } > > + > > BTW, receive_mergeable does > pr_debug("%s: packet too long\n", skb->dev->name); > skb->dev->stats.rx_length_errors++; > > which makes sense. Do you think we should increase rx_length_errors here as well? -- Sasha. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/2] virtio-net: Prevent NULL dereference
On Mon, 2011-10-03 at 20:40 +0200, Michael S. Tsirkin wrote: > On Wed, Sep 28, 2011 at 05:40:55PM +0300, Sasha Levin wrote: > > This patch prevents a NULL dereference when the user has passed a length > > longer than an actual buffer to virtio-net. > > > > Cc: Rusty Russell > > Cc: "Michael S. Tsirkin" > > Cc: virtualizat...@lists.linux-foundation.org > > Cc: net...@vger.kernel.org > > Cc: kvm@vger.kernel.org > > Signed-off-by: Sasha Levin > > --- > > drivers/net/virtio_net.c | 12 +++- > > 1 files changed, 11 insertions(+), 1 deletions(-) > > > > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c > > index bde0dec..4a53d2a 100644 > > --- a/drivers/net/virtio_net.c > > +++ b/drivers/net/virtio_net.c > > @@ -208,12 +208,22 @@ static struct sk_buff *page_to_skb(struct > > virtnet_info *vi, > > return NULL; > > } > > > > - while (len) { > > + while (len && page) { > > set_skb_frag(skb, page, offset, &len); > > page = (struct page *)page->private; > > offset = 0; > > } > > > > + /* > > +* This is the case where we ran out of pages in our linked list, but > > +* supposedly have more data to read. > > Again, let's clarify that this only happens with broken devices. I think that the code within the if() makes it clear that it isn't the regular path. -- Sasha. -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 8/9] KVM, VMX: Add support for guest/host-only profiling
Support guest/host-only profiling by switching the perf msrs on
guest entry if needed.

Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/vmx.c |   21 +
 1 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index a938ddf..d0d4afa 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -39,6 +39,7 @@
 #include
 #include
 #include
+#include

 #include "trace.h"

@@ -6054,6 +6055,24 @@ static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
 	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, 0);
 }

+static void atomic_switch_perf_msrs(struct vcpu_vmx *vmx)
+{
+	int i, nr_msrs;
+	struct perf_guest_switch_msr *msrs;
+
+	msrs = perf_guest_get_msrs(&nr_msrs);
+
+	if (!msrs)
+		return;
+
+	for (i = 0; i < nr_msrs; i++)
+		if (msrs[i].host == msrs[i].guest)
+			clear_atomic_switch_msr(vmx, msrs[i].msr);
+		else
+			add_atomic_switch_msr(vmx, msrs[i].msr, msrs[i].guest,
+					msrs[i].host);
+}
+
 #ifdef CONFIG_X86_64
 #define R "r"
 #define Q "q"
@@ -6103,6 +6122,8 @@ static void __noclone vmx_vcpu_run(struct kvm_vcpu *vcpu)
 	if (vcpu->guest_debug & KVM_GUESTDBG_SINGLESTEP)
 		vmx_set_interrupt_shadow(vcpu, 0);

+	atomic_switch_perf_msrs(vmx);
+
 	vmx->__launched = vmx->loaded_vmcs->launched;
 	asm(
 		/* Store host registers */
--
1.7.5.3
[PATCH v2 2/9] perf, amd: Use GO/HO bits in perf-ctr
From: Joerg Roedel

The AMD perf-counters support counting in guest or host-mode
only. Make use of that feature when user-space specified
guest/host-mode only counting.

Signed-off-by: Joerg Roedel
Signed-off-by: Gleb Natapov
---
 arch/x86/include/asm/perf_event.h    |    3 +++
 arch/x86/kernel/cpu/perf_event_amd.c |   13 +
 2 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index 094fb30..ce2bfb3 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -29,6 +29,9 @@
 #define ARCH_PERFMON_EVENTSEL_INV			(1ULL << 23)
 #define ARCH_PERFMON_EVENTSEL_CMASK			0xFF00ULL

+#define AMD_PERFMON_EVENTSEL_GUESTONLY			(1ULL << 40)
+#define AMD_PERFMON_EVENTSEL_HOSTONLY			(1ULL << 41)
+
 #define AMD64_EVENTSEL_EVENT	\
 	(ARCH_PERFMON_EVENTSEL_EVENT | (0x0FULL << 32))
 #define INTEL_ARCH_EVENT_MASK	\
diff --git a/arch/x86/kernel/cpu/perf_event_amd.c b/arch/x86/kernel/cpu/perf_event_amd.c
index 941caa2..5c01a73 100644
--- a/arch/x86/kernel/cpu/perf_event_amd.c
+++ b/arch/x86/kernel/cpu/perf_event_amd.c
@@ -132,6 +132,19 @@ static int amd_pmu_hw_config(struct perf_event *event)
 	if (ret)
 		return ret;

+	if (event->attr.exclude_host && event->attr.exclude_guest)
+		/*
+		 * When HO == GO == 1 the hardware treats that as GO == HO == 0
+		 * and will count in both modes. We don't want to count in that
+		 * case so we emulate no-counting by setting US = OS = 0.
+		 */
+		event->hw.config &= ~(ARCH_PERFMON_EVENTSEL_USR |
+				      ARCH_PERFMON_EVENTSEL_OS);
+	else if (event->attr.exclude_host)
+		event->hw.config |= AMD_PERFMON_EVENTSEL_GUESTONLY;
+	else if (event->attr.exclude_guest)
+		event->hw.config |= AMD_PERFMON_EVENTSEL_HOSTONLY;
+
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;

--
1.7.5.3
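As a rough illustration of the decision the AMD hunk above makes, the following standalone sketch models it as a pure function on the event-select value. The sketch_ and SKETCH_ names are hypothetical; the GO/HO bit positions (40 and 41) are taken from the patch, and the USR/OS positions (16 and 17) are assumed to match the kernel's ARCH_PERFMON_EVENTSEL_USR/OS definitions.

```c
#include <stdint.h>

#define SKETCH_EVENTSEL_USR	(1ULL << 16)	/* assumed: count in user mode */
#define SKETCH_EVENTSEL_OS	(1ULL << 17)	/* assumed: count in kernel mode */
#define SKETCH_GUESTONLY	(1ULL << 40)	/* from the patch: GO bit */
#define SKETCH_HOSTONLY		(1ULL << 41)	/* from the patch: HO bit */

/* Return the event-select value for the requested exclusions. */
static uint64_t sketch_amd_config(uint64_t config, int exclude_host,
				  int exclude_guest)
{
	if (exclude_host && exclude_guest)
		/* GO == HO == 1 would count everywhere, so count nowhere. */
		config &= ~(SKETCH_EVENTSEL_USR | SKETCH_EVENTSEL_OS);
	else if (exclude_host)
		config |= SKETCH_GUESTONLY;
	else if (exclude_guest)
		config |= SKETCH_HOSTONLY;

	return config;
}
```

The case worth noting is the first branch: excluding both modes cannot be expressed with the GO/HO bits, so the counter is silenced through the USR/OS bits instead.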
[PATCH v2 9/9] KVM, VMX: Check for automatic switch msr table overflow.
Signed-off-by: Gleb Natapov
---
 arch/x86/kvm/vmx.c |    6 +-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index d0d4afa..6e28d58 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -1280,7 +1280,11 @@ static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr,
 		if (m->guest[i].index == msr)
 			break;

-	if (i == m->nr) {
+	if (i == NR_AUTOLOAD_MSRS) {
+		printk_once(KERN_WARNING"Not enough msr switch entries. "
+				"Can't add msr %x\n", msr);
+		return;
+	} else if (i == m->nr) {
 		++m->nr;
 		vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, m->nr);
 		vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr);
--
1.7.5.3
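To make the overflow check above easier to reason about, here is a small standalone model of a fixed-size autoload table. It is only a sketch: struct sketch_autoload and sketch_add() are hypothetical, the VMCS writes are left out, and the return code replaces the printk_once() warning.

```c
#include <stdint.h>

#define SKETCH_NR_AUTOLOAD 8	/* mirrors NR_AUTOLOAD_MSRS after patch 7/9 */

struct sketch_autoload {
	unsigned nr;				/* entries in use */
	unsigned index[SKETCH_NR_AUTOLOAD];	/* MSR numbers */
	uint64_t guest[SKETCH_NR_AUTOLOAD];	/* value loaded on entry */
	uint64_t host[SKETCH_NR_AUTOLOAD];	/* value loaded on exit */
};

/* Add or update an entry; returns 0 on success, -1 if the table is full. */
static int sketch_add(struct sketch_autoload *m, unsigned msr,
		      uint64_t guest_val, uint64_t host_val)
{
	unsigned i;

	for (i = 0; i < m->nr; ++i)
		if (m->index[i] == msr)
			break;

	/* Not found and no free slot left: refuse instead of overflowing. */
	if (i == SKETCH_NR_AUTOLOAD)
		return -1;

	if (i == m->nr)
		++m->nr;	/* append a new entry */

	m->index[i] = msr;
	m->guest[i] = guest_val;
	m->host[i] = host_val;
	return 0;
}
```

The check only fires when the MSR is not already present, so updating an existing entry keeps working even when the table is full.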
[PATCH v2 0/9] perf support for x86 guest/host-only bits
This patch series consists of Joerg series named "perf support for amd
guest/host-only bits v2" [1] rebased to 3.1.0-rc7 and in addition,
support for intel cpus for the same functionality.

[1] https://lkml.org/lkml/2011/6/17/171

Changelog:
 v1->v2
   - move perf_guest_switch_msr array to perf code.
   - small cosmetic changes.

Gleb Natapov (4):
  perf, intel: Use GO/HO bits in perf-ctr
  KVM, VMX: add support for switching of PERF_GLOBAL_CTRL
  KVM, VMX: Add support for guest/host-only profiling
  KVM, VMX: Check for automatic switch msr table overflow.

Joerg Roedel (5):
  perf, core: Introduce attrs to count in either host or guest mode
  perf, amd: Use GO/HO bits in perf-ctr
  perf, tools: Add support for guest/host-only profiling
  perf, tools: Fix copy&paste error in perf-kvm option description
  perf, tools: Do guest-only counting in perf-kvm by default

 arch/x86/include/asm/perf_event.h      |   15
 arch/x86/kernel/cpu/perf_event.c       |   14
 arch/x86/kernel/cpu/perf_event_amd.c   |   13 +++
 arch/x86/kernel/cpu/perf_event_intel.c |   90 +-
 arch/x86/kvm/vmx.c                     |  131 +---
 include/linux/perf_event.h             |    5 +-
 tools/perf/builtin-kvm.c               |    5 +-
 tools/perf/util/event.c                |    8 ++
 tools/perf/util/event.h                |    2 +
 tools/perf/util/evlist.c               |    5 +-
 tools/perf/util/parse-events.c         |   15 +++-
 11 files changed, 282 insertions(+), 21 deletions(-)

--
1.7.5.3
[PATCH v2 4/9] perf, tools: Fix copy&paste error in perf-kvm option description
From: Joerg Roedel

The --host option certainly enables host-data collection.

Signed-off-by: Joerg Roedel
Signed-off-by: Gleb Natapov
---
 tools/perf/builtin-kvm.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 34d1e85..032324a 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -38,7 +38,7 @@ static const struct option kvm_options[] = {
 	OPT_BOOLEAN(0, "guest", &perf_guest,
 		    "Collect guest os data"),
 	OPT_BOOLEAN(0, "host", &perf_host,
-		    "Collect guest os data"),
+		    "Collect host os data"),
 	OPT_STRING(0, "guestmount", &symbol_conf.guestmount, "directory",
 		   "guest mount directory under which every guest os"
 		   " instance has a subdir"),
--
1.7.5.3
[PATCH v2 7/9] KVM, VMX: add support for switching of PERF_GLOBAL_CTRL
Some cpus have special support for switching PERF_GLOBAL_CTRL msr. Add logic to detect if such support exists and works properly and extend msr switching code to use it if available. Also extend number of generic msr switching entries to 8. Signed-off-by: Gleb Natapov --- arch/x86/kvm/vmx.c | 104 ++- 1 files changed, 93 insertions(+), 11 deletions(-) diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index f3ec38f..a938ddf 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -118,7 +118,7 @@ module_param(ple_gap, int, S_IRUGO); static int ple_window = KVM_VMX_DEFAULT_PLE_WINDOW; module_param(ple_window, int, S_IRUGO); -#define NR_AUTOLOAD_MSRS 1 +#define NR_AUTOLOAD_MSRS 8 #define VMCS02_POOL_SIZE 1 struct vmcs { @@ -622,6 +622,7 @@ static unsigned long *vmx_msr_bitmap_legacy; static unsigned long *vmx_msr_bitmap_longmode; static bool cpu_has_load_ia32_efer; +static bool cpu_has_load_perf_global_ctrl; static DECLARE_BITMAP(vmx_vpid_bitmap, VMX_NR_VPIDS); static DEFINE_SPINLOCK(vmx_vpid_lock); @@ -1191,15 +1192,34 @@ static void update_exception_bitmap(struct kvm_vcpu *vcpu) vmcs_write32(EXCEPTION_BITMAP, eb); } +static void clear_atomic_switch_msr_special(unsigned long entry, + unsigned long exit) +{ + vmcs_clear_bits(VM_ENTRY_CONTROLS, entry); + vmcs_clear_bits(VM_EXIT_CONTROLS, exit); +} + static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) { unsigned i; struct msr_autoload *m = &vmx->msr_autoload; - if (msr == MSR_EFER && cpu_has_load_ia32_efer) { - vmcs_clear_bits(VM_ENTRY_CONTROLS, VM_ENTRY_LOAD_IA32_EFER); - vmcs_clear_bits(VM_EXIT_CONTROLS, VM_EXIT_LOAD_IA32_EFER); - return; + switch (msr) { + case MSR_EFER: + if (cpu_has_load_ia32_efer) { + clear_atomic_switch_msr_special(VM_ENTRY_LOAD_IA32_EFER, + VM_EXIT_LOAD_IA32_EFER); + return; + } + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + if (cpu_has_load_perf_global_ctrl) { + clear_atomic_switch_msr_special( + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, + VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL); 
+ return; + } + break; } for (i = 0; i < m->nr; ++i) @@ -1215,18 +1235,44 @@ static void clear_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr) vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, m->nr); } +static void add_atomic_switch_msr_special(unsigned long entry, + unsigned long exit, unsigned long guest_val_vmcs, + unsigned long host_val_vmcs, u64 guest_val, u64 host_val) +{ + vmcs_write64(guest_val_vmcs, guest_val); + vmcs_write64(host_val_vmcs, host_val); + vmcs_set_bits(VM_ENTRY_CONTROLS, entry); + vmcs_set_bits(VM_EXIT_CONTROLS, exit); +} + static void add_atomic_switch_msr(struct vcpu_vmx *vmx, unsigned msr, u64 guest_val, u64 host_val) { unsigned i; struct msr_autoload *m = &vmx->msr_autoload; - if (msr == MSR_EFER && cpu_has_load_ia32_efer) { - vmcs_write64(GUEST_IA32_EFER, guest_val); - vmcs_write64(HOST_IA32_EFER, host_val); - vmcs_set_bits(VM_ENTRY_CONTROLS, VM_ENTRY_LOAD_IA32_EFER); - vmcs_set_bits(VM_EXIT_CONTROLS, VM_EXIT_LOAD_IA32_EFER); - return; + switch (msr) { + case MSR_EFER: + if (cpu_has_load_ia32_efer) { + add_atomic_switch_msr_special(VM_ENTRY_LOAD_IA32_EFER, + VM_EXIT_LOAD_IA32_EFER, + GUEST_IA32_EFER, + HOST_IA32_EFER, + guest_val, host_val); + return; + } + break; + case MSR_CORE_PERF_GLOBAL_CTRL: + if (cpu_has_load_perf_global_ctrl) { + add_atomic_switch_msr_special( + VM_ENTRY_LOAD_IA32_PERF_GLOBAL_CTRL, + VM_EXIT_LOAD_IA32_PERF_GLOBAL_CTRL, + GUEST_IA32_PERF_GLOBAL_CTRL, + HOST_IA32_PERF_GLOBAL_CTRL, + guest_val, host_val); + return; + } + break; } for (i = 0; i < m->nr; ++i) @@ -2455,6 +2501,42 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf) && allow_1_setting(MSR_IA32_VMX_EXIT_CTLS, VM_EXIT_LOAD_IA32_EFER); + cpu_has_loa
[PATCH v2 6/9] perf, intel: Use GO/HO bits in perf-ctr
Intel does not have guest/host-only bit in perf counters like AMD does. To support GO/HO bits KVM needs to switch EVENTSELn values (or PERF_GLOBAL_CTRL if available) at a guest entry. If a counter is configured to count only in a guest mode it stays disabled in a host, but VMX is configured to switch it to enabled value during guest entry. This patch adds GO/HO tracking to Intel perf code and provides interface for KVM to get a list of MSRs that need to be switched on a guest entry. Only cpus with architectural PMU (v1 or later) are supported with this patch. To my knowledge there is not p6 models with VMX but without architectural PMU and p4 with VMX are rare and the interface is general enough to support them if need arise. Signed-off-by: Gleb Natapov --- arch/x86/include/asm/perf_event.h | 12 arch/x86/kernel/cpu/perf_event.c | 14 + arch/x86/kernel/cpu/perf_event_intel.c | 90 ++- 3 files changed, 113 insertions(+), 3 deletions(-) diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h index ce2bfb3..e47cb61 100644 --- a/arch/x86/include/asm/perf_event.h +++ b/arch/x86/include/asm/perf_event.h @@ -162,7 +162,19 @@ extern unsigned long perf_misc_flags(struct pt_regs *regs); ); \ } +struct perf_guest_switch_msr { + unsigned msr; + u64 host, guest; +}; + +extern struct perf_guest_switch_msr *perf_guest_get_msrs(int *nr); #else +static inline perf_guest_switch_msr *perf_guest_get_msrs(int *nr) +{ + *nr = 0; + return NULL; +} + static inline void perf_events_lapic_init(void){ } #endif diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c index cfa62ec..531f5c6 100644 --- a/arch/x86/kernel/cpu/perf_event.c +++ b/arch/x86/kernel/cpu/perf_event.c @@ -119,6 +119,10 @@ struct cpu_hw_events { struct perf_branch_stacklbr_stack; struct perf_branch_entrylbr_entries[MAX_LBR_ENTRIES]; + /* Intel host/guest exclude bits */ + u64 intel_ctrl_guest_mask; + u64 intel_ctrl_host_mask; + /* * manage shared (per-core, per-cpu) 
registers * used on Intel NHM/WSM/SNB @@ -129,6 +133,11 @@ struct cpu_hw_events { * AMD specific bits */ struct amd_nb *amd_nb; + + /* +* Intel guest/host-only support +*/ + struct perf_guest_switch_msr guest_switch_msrs[X86_PMC_IDX_MAX]; }; #define __EVENT_CONSTRAINT(c, n, m, w) {\ @@ -292,6 +301,11 @@ struct x86_pmu { */ struct extra_reg *extra_regs; unsigned int er_flags; + + /* +* Guest event support +*/ + struct perf_guest_switch_msr* (*guest_get_msrs)(int *nr); }; #define ERF_NO_HT_SHARING 1 diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c index f88af2c..e35b916 100644 --- a/arch/x86/kernel/cpu/perf_event_intel.c +++ b/arch/x86/kernel/cpu/perf_event_intel.c @@ -746,7 +746,8 @@ static void intel_pmu_enable_all(int added) intel_pmu_pebs_enable_all(); intel_pmu_lbr_enable_all(); - wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, x86_pmu.intel_ctrl); + wrmsrl(MSR_CORE_PERF_GLOBAL_CTRL, + x86_pmu.intel_ctrl & ~cpuc->intel_ctrl_guest_mask); if (test_bit(X86_PMC_IDX_FIXED_BTS, cpuc->active_mask)) { struct perf_event *event = @@ -869,6 +870,7 @@ static void intel_pmu_disable_fixed(struct hw_perf_event *hwc) static void intel_pmu_disable_event(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) { intel_pmu_disable_bts(); @@ -876,6 +878,9 @@ static void intel_pmu_disable_event(struct perf_event *event) return; } + cpuc->intel_ctrl_guest_mask &= ~(1ull << hwc->idx); + cpuc->intel_ctrl_host_mask &= ~(1ull << hwc->idx); + if (unlikely(hwc->config_base == MSR_ARCH_PERFMON_FIXED_CTR_CTRL)) { intel_pmu_disable_fixed(hwc); return; @@ -921,6 +926,7 @@ static void intel_pmu_enable_fixed(struct hw_perf_event *hwc) static void intel_pmu_enable_event(struct perf_event *event) { struct hw_perf_event *hwc = &event->hw; + struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events); if (unlikely(hwc->idx == X86_PMC_IDX_FIXED_BTS)) { 
if (!__this_cpu_read(cpu_hw_events.enabled)) @@ -930,6 +936,11 @@ static void intel_pmu_enable_event(struct perf_event *event) return; } + if (event->attr.exclude_host) + cpuc->intel_ctrl_guest_mask |= (1ull << hwc->idx); + if (event->attr.exclude_guest) + cpuc->intel_ctrl_host_mas
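The mask bookkeeping in this patch can be modelled outside the kernel. The sketch below is only an illustration under stated assumptions: struct sketch_cpuc and the sketch_ functions are hypothetical, and intel_ctrl is passed in as a plain value rather than read from x86_pmu. It shows how the two masks produce the host and guest values of PERF_GLOBAL_CTRL.

```c
#include <stdint.h>

struct sketch_cpuc {
	uint64_t guest_mask;	/* counters that must be off in the host */
	uint64_t host_mask;	/* counters that must be off in the guest */
};

/* Record the exclusions of a counter when it is enabled. */
static void sketch_enable(struct sketch_cpuc *c, int idx,
			  int exclude_host, int exclude_guest)
{
	if (exclude_host)
		c->guest_mask |= 1ull << idx;
	if (exclude_guest)
		c->host_mask |= 1ull << idx;
}

/* Forget the exclusions of a counter when it is disabled. */
static void sketch_disable(struct sketch_cpuc *c, int idx)
{
	c->guest_mask &= ~(1ull << idx);
	c->host_mask &= ~(1ull << idx);
}

/* Value of PERF_GLOBAL_CTRL while running the host. */
static uint64_t sketch_host_ctrl(const struct sketch_cpuc *c,
				 uint64_t intel_ctrl)
{
	return intel_ctrl & ~c->guest_mask;
}

/* Value of PERF_GLOBAL_CTRL while running the guest. */
static uint64_t sketch_guest_ctrl(const struct sketch_cpuc *c,
				  uint64_t intel_ctrl)
{
	return intel_ctrl & ~c->host_mask;
}
```

When the two values are equal there is nothing to switch, which is the case atomic_switch_perf_msrs() in patch 8/9 handles by clearing the entry.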
[PATCH v2 3/9] perf, tools: Add support for guest/host-only profiling
From: Joerg Roedel

To restrict a counter to either host or guest mode this patch
introduces two new event modifiers: G and H. With G the
counter is configured in guest-only mode and with H in
host-only mode.

Signed-off-by: Joerg Roedel
Signed-off-by: Gleb Natapov
---
 tools/perf/util/parse-events.c |   14 --
 1 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 928918b..3b00775 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -735,8 +735,8 @@ static int
 parse_event_modifier(const char **strp, struct perf_event_attr *attr)
 {
 	const char *str = *strp;
-	int exclude = 0;
-	int eu = 0, ek = 0, eh = 0, precise = 0;
+	int exclude = 0, exclude_GH = 0;
+	int eu = 0, ek = 0, eh = 0, eH = 0, eG = 0, precise = 0;

 	if (!*str)
 		return 0;
@@ -760,6 +760,14 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
 			if (!exclude)
 				exclude = eu = ek = eh = 1;
 			eh = 0;
+		} else if (*str == 'G') {
+			if (!exclude_GH)
+				exclude_GH = eG = eH = 1;
+			eG = 0;
+		} else if (*str == 'H') {
+			if (!exclude_GH)
+				exclude_GH = eG = eH = 1;
+			eH = 0;
 		} else if (*str == 'p') {
 			precise++;
 		} else
@@ -776,6 +784,8 @@ parse_event_modifier(const char **strp, struct perf_event_attr *attr)
 	attr->exclude_kernel = ek;
 	attr->exclude_hv = eh;
 	attr->precise_ip = precise;
+	attr->exclude_host = eH;
+	attr->exclude_guest = eG;

 	return 0;
 }
--
1.7.5.3
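The G/H handling above follows the same "first modifier excludes everything, each modifier then re-enables its own mode" pattern as the existing u/k/h modifiers. A minimal standalone sketch of just that part, assuming a hypothetical struct sketch_attr and a parser that rejects unknown characters (the real parse_event_modifier() stops at them instead), might look like this:

```c
struct sketch_attr {
	int exclude_host;
	int exclude_guest;
};

/* Parse a modifier string made of 'G' and 'H'; returns -1 on other input. */
static int sketch_parse_gh(const char *str, struct sketch_attr *attr)
{
	int exclude_GH = 0, eH = 0, eG = 0;

	for (; *str; str++) {
		if (*str == 'G') {
			if (!exclude_GH)
				exclude_GH = eG = eH = 1;	/* exclude both first */
			eG = 0;				/* then allow guest */
		} else if (*str == 'H') {
			if (!exclude_GH)
				exclude_GH = eG = eH = 1;
			eH = 0;				/* then allow host */
		} else {
			return -1;
		}
	}

	attr->exclude_host = eH;
	attr->exclude_guest = eG;
	return 0;
}
```

So "G" alone yields exclude_host = 1 (guest-only), "H" alone yields exclude_guest = 1 (host-only), and "GH" or no modifier at all leaves both modes counted.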
[PATCH v2 1/9] perf, core: Introduce attrs to count in either host or guest mode
From: Joerg Roedel

The two new attributes exclude_guest and exclude_host can be used by
user-space to tell the kernel to set up a performance counter to count
only while the CPU is in guest mode or only while it is in host mode. An
additional check is also introduced to make sure user-space does not try
to exclude both guest and host mode from counting.

Signed-off-by: Joerg Roedel
Signed-off-by: Gleb Natapov
---
 include/linux/perf_event.h |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c816075..1e9ebe5 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -220,7 +220,10 @@ struct perf_event_attr {
 				mmap_data      :  1, /* non-exec mmap data */
 				sample_id_all  :  1, /* sample_type all events */
 
-				__reserved_1   : 45;
+				exclude_host   :  1, /* don't count in host */
+				exclude_guest  :  1, /* don't count in guest */
+
+				__reserved_1   : 43;
 
 	union {
 		__u32		wakeup_events;	/* wakeup every n events */
-- 
1.7.5.3
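The bit accounting and the "do not exclude both" check described above can be sketched as follows. This is an illustration under stated assumptions, not the kernel code: `struct sketch_attr` and `sketch_check_exclusion()` are hypothetical, the 19 pre-existing flag bits are inferred from 64 - 45, the check itself is not part of the diff shown in this mail, and returning -EINVAL is an assumption. The sketch also relies on GCC/Clang accepting 64-bit bit-fields and on C11 `_Static_assert`.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical mirror of the 64-bit flag word in struct perf_event_attr. */
struct sketch_attr {
	uint64_t existing_flags : 19; /* flag bits in use before the patch (64 - 45) */
	uint64_t exclude_host   : 1;  /* don't count in host */
	uint64_t exclude_guest  : 1;  /* don't count in guest */
	uint64_t reserved       : 43; /* 45 reserved bits minus the two new flags */
};

/* Taking two bits out of the reserved field must not grow the flag word. */
_Static_assert(sizeof(struct sketch_attr) == sizeof(uint64_t),
	       "flag word must stay 64 bits wide");

/*
 * Reject an attribute that excludes both host and guest mode, since such
 * a counter could never count anything.
 */
static int sketch_check_exclusion(const struct sketch_attr *attr)
{
	if (attr->exclude_host && attr->exclude_guest)
		return -EINVAL;
	return 0;
}
```

Shrinking __reserved_1 from 45 to 43 keeps the layout of the structure unchanged for existing user-space, which is why the new flags are carved out of the reserved bits rather than appended.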
[PATCH v2 5/9] perf, tools: Do guest-only counting in perf-kvm by default
From: Joerg Roedel

Make use of exclude_guest and exclude_host in perf-kvm to do guest-only
counting by default.

Signed-off-by: Joerg Roedel
Signed-off-by: Gleb Natapov
---
 tools/perf/builtin-kvm.c       |    3 ++-
 tools/perf/util/event.c        |    8 ++++++++
 tools/perf/util/event.h        |    2 ++
 tools/perf/util/evlist.c       |    5 ++++-
 tools/perf/util/parse-events.c |    1 +
 5 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/tools/perf/builtin-kvm.c b/tools/perf/builtin-kvm.c
index 032324a..9b05afa 100644
--- a/tools/perf/builtin-kvm.c
+++ b/tools/perf/builtin-kvm.c
@@ -107,7 +107,8 @@ static int __cmd_buildid_list(int argc, const char **argv)
 
 int cmd_kvm(int argc, const char **argv, const char *prefix __used)
 {
-	perf_host = perf_guest = 0;
+	perf_host = 0;
+	perf_guest = 1;
 
 	argc = parse_options(argc, argv, kvm_options, kvm_usage,
 			PARSE_OPT_STOP_AT_NON_OPTION);
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 437f8ca..31a6d7f 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -44,6 +44,14 @@ static struct perf_sample synth_sample = {
 	.period = 1,
 };
 
+void event_attr_init(struct perf_event_attr *attr)
+{
+	if (!perf_host)
+		attr->exclude_host = 1;
+	if (!perf_guest)
+		attr->exclude_guest = 1;
+}
+
 static pid_t perf_event__synthesize_comm(union perf_event *event, pid_t pid,
 					 int full, perf_event__handler_t process,
 					 struct perf_session *session)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 357a85b..e5f101e 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -150,6 +150,8 @@ typedef int (*perf_event__handler_t)(union perf_event *event,
 				     struct perf_sample *sample,
 				     struct perf_session *session);
 
+void event_attr_init(struct perf_event_attr *attr);
+
 int perf_event__synthesize_thread_map(struct thread_map *threads,
 				      perf_event__handler_t process,
 				      struct perf_session *session);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 72e9f48..4773fbe 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -82,8 +82,11 @@ int perf_evlist__add_default(struct perf_evlist *evlist)
 		.type = PERF_TYPE_HARDWARE,
 		.config = PERF_COUNT_HW_CPU_CYCLES,
 	};
-	struct perf_evsel *evsel = perf_evsel__new(&attr, 0);
+	struct perf_evsel *evsel;
+
+	event_attr_init(&attr);
 
+	evsel = perf_evsel__new(&attr, 0);
 	if (evsel == NULL)
 		goto error;
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 3b00775..620ba98 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -848,6 +848,7 @@ int parse_events(struct perf_evlist *evlist , const char *str, int unset __used)
 	for (;;) {
 		ostr = str;
 		memset(&attr, 0, sizeof(attr));
+		event_attr_init(&attr);
 		ret = parse_event_symbols(evlist, &str, &attr);
 		if (ret == EVT_FAILED)
 			return -1;
-- 
1.7.5.3
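To make the effect of the new perf-kvm defaults concrete, here is a small sketch of how event_attr_init() maps the two switches onto the attribute bits. It is an illustration only: `struct sketch_attr` and `sketch_event_attr_init()` are hypothetical names, and passing perf_host/perf_guest as parameters instead of reading the globals is a simplification made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two perf_event_attr bits involved. */
struct sketch_attr {
	unsigned int exclude_host  : 1;
	unsigned int exclude_guest : 1;
};

/*
 * Same decision as event_attr_init() in the patch, with the two global
 * switches passed in explicitly: a side that is not requested gets
 * excluded; bits that are already set are left alone.
 */
static void sketch_event_attr_init(struct sketch_attr *attr,
				   bool perf_host, bool perf_guest)
{
	if (!perf_host)
		attr->exclude_host = 1;
	if (!perf_guest)
		attr->exclude_guest = 1;
}
```

With the values cmd_kvm() now sets (perf_host = 0, perf_guest = 1), a zeroed attribute ends up with exclude_host = 1 and exclude_guest = 0, i.e. guest-only counting. Because parse_events() calls event_attr_init() before parse_event_symbols(), an explicit modifier parsed afterwards can still overwrite these defaults.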
Re: [Qemu-devel] [PATCH 2/2] LAPIC: make lapic support cpu hotplug
On 2011-10-05 12:26, liu ping fan wrote:
>>> And make the creation of apic as part of cpu initialization, so
>>> apic's state has been ready, before setting kvm_apic.
>>
>> There is no kvm-apic upstream yet, so it's hard to judge why we need
>> this here. If we do, this has to be a separate patch. But I seriously
>> doubt we need it (my hack worked without it, and that was not because of
>> its hack nature).
>>
> Sorry, I did not explain it clearly. What I mean is that “env->apic_state”
> must be prepared
> before qemu_kvm_cpu_thread_fn() -> ... -> kvm_put_sregs(), where we get
> apic_base by
> “ sregs.apic_base = cpu_get_apic_base(env->apic_state);”
> and then call “kvm_vcpu_ioctl(env, KVM_SET_SREGS, &sregs);” which will
> finally affect the
> kvm_apic structure in kernel.
>
> But as current code, in pc_new_cpu(), we call apic_init() to initialize
> apic_state, after cpu_init(),
> so we can not guarantee the order of apic_state initializaion and the
> setting to kernel.
>
> Because LAPIC is part of x86 chip, I want to move it into cpu_x86_init(),
> and ensure apic_init()
> called before thread “qemu_kvm_cpu_thread_fn()” creation.

The LAPIC is part of the CPU, the classic APIC was a dedicated chip.

For various reasons, a safer approach for creating a new CPU is to stop
the machine, add the new device models, run cpu_synchronize_post_init on
that new cpu (looks like you missed that) and then resume everything.

See
http://git.kiszka.org/?p=qemu-kvm.git;a=commitdiff;h=be8f21c6b54eac82f7add7ee9d4ecf9cb8ebb320

...
>>> diff --git a/hw/icc_bus.c b/hw/icc_bus.c
>>> new file mode 100644
>>> index 000..360ca2a
>>> --- /dev/null
>>> +++ b/hw/icc_bus.c
>>> @@ -0,0 +1,62 @@
>>> +/*
>>> +*/
>>> +#define ICC_BUS_PLUG
>>> +#ifdef ICC_BUS_PLUG
>>> +#include "icc_bus.h"
>>> +
>>> +
>>> +
>>> +struct icc_bus_info icc_info = {
>>> +    .qinfo.name = "icc",
>>> +    .qinfo.size = sizeof(struct icc_bus),
>>> +    .qinfo.props = (Property[]) {
>>> +        DEFINE_PROP_END_OF_LIST(),
>>> +    }
>>> +
>>> +};
>>> +
>>> +
>>> +static const VMStateDescription vmstate_icc_bus = {
>>> +    .name = "icc_bus",
>>> +    .version_id = 1,
>>> +    .minimum_version_id = 1,
>>> +    .minimum_version_id_old = 1,
>>> +    .pre_save = NULL,
>>> +    .post_load = NULL,
>>> +};
>>> +
>>> +struct icc_bus *g_iccbus;
>>> +
>>> +struct icc_bus *icc_init_bus(DeviceState *parent, const char *name)
>>> +{
>>> +    struct icc_bus *bus;
>>> +
>>> +    bus = FROM_QBUS(icc_bus, qbus_create(&icc_info.qinfo, parent, name));
>>> +    bus->qbus.allow_hotplug = 1; /* Yes, we can */
>>> +    bus->qbus.name = "icc";
>>> +    vmstate_register(NULL, -1, &vmstate_icc_bus, bus);
>>
>> The chipset is the owner of this bus and instantiates it. So it also
>> provides a vmstate. You can drop this unneeded one here (it's created
>> via an obsolete API anyway).
>>
>
> No familiar with Qemu bus emulation, keep on learning :) . But what I
> thought is,
> the x86-ICC bus is not the same as bus like PCI.
> For a PCI bus, it lies behind a host bridge, but ICC is shared by all x86
> processors in SMP system,
> so there is not a outstanding owner. And I right?

ICC is also attached to the chipset (due to the IOAPIC). So it looks
reasonable to me to let the chipset do the lifecycle management as well.
It is the fixed point, CPUs may come and go.

Jan
[PATCH] qemu-kvm: Remove extboot support
From: Jan Kiszka Except for booting from SCSI via the LSI controller, SeaBIOS has native support for mass storage interfaces now. And SCSI can be worked around via [1] - or someone finally adds the necessary bits to SeaBIOS if there is a real need. [1] http://thread.gmane.org/gmane.comp.emulators.qemu/78467 Signed-off-by: Jan Kiszka Acked-by: Anthony Liguori --- .gitignore |1 - Makefile|1 - Makefile.target |1 - blockdev.c | 13 - blockdev.h |2 - hw/extboot.c| 123 hw/pc.c | 22 -- hw/pc.h |4 - pc-bios/optionrom/Makefile |2 +- pc-bios/optionrom/extboot.S | 691 --- qemu-config.c |4 - qemu-options.hx |4 +- 12 files changed, 2 insertions(+), 866 deletions(-) delete mode 100644 hw/extboot.c delete mode 100644 pc-bios/optionrom/extboot.S diff --git a/.gitignore b/.gitignore index 625f28e..ea2bd8a 100644 --- a/.gitignore +++ b/.gitignore @@ -66,7 +66,6 @@ pc-bios/vgabios-pq/status pc-bios/optionrom/linuxboot.bin pc-bios/optionrom/multiboot.bin pc-bios/optionrom/multiboot.raw -pc-bios/optionrom/extboot.bin pc-bios/optionrom/vapic.bin .stgit-* cscope.* diff --git a/Makefile b/Makefile index 936f130..f7f33a6 100644 --- a/Makefile +++ b/Makefile @@ -252,7 +252,6 @@ mpc8544ds.dtb \ multiboot.bin linuxboot.bin \ s390-zipl.rom \ spapr-rtas.bin slof.bin -BLOBS += extboot.bin BLOBS += vapic.bin else BLOBS= diff --git a/Makefile.target b/Makefile.target index 324b4f1..f84d8cb 100644 --- a/Makefile.target +++ b/Makefile.target @@ -229,7 +229,6 @@ obj-i386-y += mc146818rtc.o i8259.o pc.o obj-i386-y += cirrus_vga.o sga.o apic.o ioapic.o piix_pci.o obj-i386-y += vmport.o obj-i386-y += device-hotplug.o pci-hotplug.o smbios.o wdt_ib700.o -obj-i386-y += extboot.o obj-i386-y += debugcon.o multiboot.o obj-i386-y += pc_piix.o obj-i386-$(CONFIG_KVM) += kvmclock.o diff --git a/blockdev.c b/blockdev.c index da74171..0827bf7 100644 --- a/blockdev.c +++ b/blockdev.c @@ -16,8 +16,6 @@ #include "sysemu.h" #include "block_int.h" -DriveInfo *extboot_drive = NULL; - static QTAILQ_HEAD(drivelist, 
DriveInfo) drives = QTAILQ_HEAD_INITIALIZER(drives); static const char *const if_name[IF_COUNT] = { @@ -237,7 +235,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) int on_read_error, on_write_error; const char *devaddr; DriveInfo *dinfo; -int is_extboot = 0; int snapshot = 0; int ret; @@ -356,12 +353,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) } } -is_extboot = qemu_opt_get_bool(opts, "boot", 0); -if (is_extboot && extboot_drive) { -fprintf(stderr, "qemu: two bootable drives specified\n"); -return NULL; -} - on_write_error = BLOCK_ERR_STOP_ENOSPC; if ((buf = qemu_opt_get(opts, "werror")) != NULL) { if (type != IF_IDE && type != IF_SCSI && type != IF_VIRTIO && type != IF_NONE) { @@ -467,10 +458,6 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi) strncpy(dinfo->serial, serial, sizeof(dinfo->serial) - 1); QTAILQ_INSERT_TAIL(&drives, dinfo, next); -if (is_extboot) { -extboot_drive = dinfo; -} - bdrv_set_on_error(dinfo->bdrv, on_read_error, on_write_error); switch(type) { diff --git a/blockdev.h b/blockdev.h index 0a5144c..3587786 100644 --- a/blockdev.h +++ b/blockdev.h @@ -66,6 +66,4 @@ int do_drive_del(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_snapshot_blkdev(Monitor *mon, const QDict *qdict, QObject **ret_data); int do_block_resize(Monitor *mon, const QDict *qdict, QObject **ret_data); -extern DriveInfo *extboot_drive; - #endif diff --git a/hw/extboot.c b/hw/extboot.c deleted file mode 100644 index d517834..000 --- a/hw/extboot.c +++ /dev/null @@ -1,123 +0,0 @@ -/* - * Extended boot option ROM support. - * - * Copyright IBM, Corp. 2007 - * - * Authors: - * Anthony Liguori - * - * This work is licensed under the terms of the GNU GPL, version 2. See - * the COPYING file in the top-level directory. 
- * - */ - -#include "hw.h" -#include "pc.h" -#include "isa.h" -#include "block.h" - -/* Extended Boot ROM suport */ - -union extboot_cmd -{ -uint16_t type; -struct { - uint16_t type; - uint16_t cylinders; - uint16_t heads; - uint16_t sectors; - uint64_t nb_sectors; -} query_geometry; -struct { - uint16_t type; - uint16_t nb_sectors; - uint16_t segment; - uint16_t offset; - uint64_t sector; -} xfer; -}; - -static void get_translated_chs(BlockDriverState *bs, int *c, int *h, int *s) -{ -bdrv_get_geometry_hint(bs, c, h, s); - -if (*c <= 1024) { - *c >>= 0; - *h <<= 0; -} else if (*c <= 2048) { - *c >>= 1; - *h <<= 1; -} else if (*c <=