[Qemu-devel] [PATCH 3/3] vfio-pci: rework of EOI
Originally VFIO is coded to support IOAPIC only (i.e. x86). The patch adds XICS (POWERPC interrupt controller) and replaces ioapic_add_gsi_eoi_notifier with unified macro to have as little #ifdef TARGET_PPC64 as possible. Still needs some rework to get rid of #ifdef TARGET_PPC64. Signed-off-by: Alexey Kardashevskiy a...@ozlabs.ru --- hw/vfio_pci.c | 24 hw/vfio_pci.h |1 - 2 files changed, 16 insertions(+), 9 deletions(-) diff --git a/hw/vfio_pci.c b/hw/vfio_pci.c index fd65731..cd68fe0 100644 --- a/hw/vfio_pci.c +++ b/hw/vfio_pci.c @@ -21,7 +21,6 @@ #include dirent.h #include stdio.h #include unistd.h -#include sys/io.h #include sys/ioctl.h #include sys/mman.h #include sys/types.h @@ -44,6 +43,15 @@ #include range.h #include vfio_pci.h +#ifndef TARGET_PPC64 +#include sys/io.h +#include ioapic.h +#define vfio_irq_add_eoi_notifier ioapic_add_gsi_eoi_notifier +#else +#include xics.h +#define vfio_irq_add_eoi_notifier xics_add_eoi_notifier +#endif + //#define DEBUG_VFIO #ifdef DEBUG_VFIO #define DPRINTF(fmt, ...) \ @@ -258,7 +266,7 @@ static void vfio_enable_intx_kvm(VFIODevice *vdev) irqfd.fd = event_notifier_get_fd(vdev-intx.interrupt); qemu_set_fd_handler(irqfd.fd, NULL, NULL, vdev); -ioapic_remove_gsi_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); +notifier_remove(vdev-intx.eoi); vfio_mask_intx(vdev); vdev-intx.pending = false; qemu_set_irq(vdev-pdev.irq[vdev-intx.pin], 0); @@ -294,7 +302,7 @@ static void vfio_enable_intx_kvm(VFIODevice *vdev) return; fail: -ioapic_add_gsi_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); +vfio_irq_add_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev); vfio_unmask_intx(vdev); #endif @@ -341,7 +349,7 @@ static void vfio_disable_intx_kvm(VFIODevice *vdev) event_notifier_cleanup(vdev-intx.unmask); -ioapic_add_gsi_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); +vfio_irq_add_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); qemu_set_fd_handler(irqfd.fd, vfio_intx_interrupt, NULL, vdev); vfio_unmask_intx(vdev); @@ -366,7 +374,7 @@ static void vfio_update_irq(Notifier *notify, void *data) vdev-host.func, vdev-intx.irq, irq); vfio_disable_intx_kvm(vdev); -ioapic_remove_gsi_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); +notifier_remove(vdev-intx.eoi); vdev-intx.irq = irq; @@ -375,7 +383,7 @@ static void vfio_update_irq(Notifier *notify, void *data) return; } -ioapic_add_gsi_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); +vfio_irq_add_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); vfio_enable_intx_kvm(vdev); /* Re-enable the interrupt in cased we missed an EOI */ @@ -404,7 +412,7 @@ static int vfio_enable_intx(VFIODevice *vdev) vdev-intx.pin = pin - 1; /* Pin A (1) - irq[0] */ vdev-intx.irq = pci_get_irq(vdev-pdev, vdev-intx.pin); vdev-intx.eoi.notify = vfio_eoi; -ioapic_add_gsi_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); +vfio_irq_add_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); vdev-intx.update_irq.notify = vfio_update_irq; pci_add_irq_update_notifier(vdev-pdev, vdev-intx.update_irq); @@ -441,7 +449,7 @@ static void vfio_disable_intx(VFIODevice *vdev) vfio_disable_irqindex(vdev, VFIO_PCI_INTX_IRQ_INDEX); pci_remove_irq_update_notifier(vdev-intx.update_irq); -ioapic_remove_gsi_eoi_notifier(vdev-intx.eoi, vdev-intx.irq); +notifier_remove(vdev-intx.eoi); fd = event_notifier_get_fd(vdev-intx.interrupt); qemu_set_fd_handler(fd, NULL, NULL, vdev); diff --git a/hw/vfio_pci.h b/hw/vfio_pci.h index 00bb3dd..d1a7434 100644 --- a/hw/vfio_pci.h +++ b/hw/vfio_pci.h @@ -4,7 +4,6 @@ #include qemu-common.h #include qemu-queue.h #include pci.h -#include ioapic.h #include event_notifier.h typedef struct VFIOPCIHostDevice { -- 1.7.10.4
Re: [Qemu-devel] Closing an opened telnet monitor?
On 23.07.2012 02:37, Erik Rull wrote: Hi all, how can I close an open telnet session to my qemu monitor? I didn't find any possiblity beside killing my telnet client on my remote connected system. Is there a nicer way of doing that? quit is definitively the wrong command :-) I guess every telnet client supports Ctrl+[ escape sequence. So the answer to this is --- Ctrl+[, q. man telnet. /mjt
Re: [Qemu-devel] [PATCH v7] kvm: notify host when the guest is panicked
At 07/23/2012 04:19 AM, Sasha Levin Wrote: On 07/22/2012 09:22 PM, Anthony Liguori wrote: Sasha Levin levinsasha...@gmail.com writes: On 07/21/2012 09:12 AM, Wen Congyang wrote: +#define KVM_PV_PORT (0x505UL) + #ifdef __KERNEL__ #include asm/processor.h @@ -221,6 +223,11 @@ static inline void kvm_disable_steal_time(void) } #endif +static inline unsigned int kvm_arch_pv_features(void) +{ + return inl(KVM_PV_PORT); +} + Why is this safe? I'm not sure you can just pick any ioport you'd like and use it. There are three ways I/O ports get used on a PC: 1) Platform devices - This is well defined since the vast majority of platform devices are implemented within a single chip. If you're emulating an i440fx chipset, the PIIX4 spec has an exhaustive list. 2) PCI devices - Typically, PCI only allocates ports starting at 0x0d00 to avoid conflicts with ISA devices. 3) ISA devices - ISA uses subtractive decoding so any ISA device can access. In theory, an ISA device could attempt to use port 0x0505 but it's unlikely. In a modern guest, there aren't really any ISA devices being added either. So yes, picking port 0x0505 is safe for something like this (as long as you check to make sure that you really are under KVM). Is there anything that actually prevents me from using PCI ports lower than 0x0d00? As you said in (3), ISA isn't really used anymore (nor is implemented by lkvm for example), so placing PCI below 0x0d00 might even make sense in that case. Furthermore, I can place one of these brand new virtio-mmio devices which got introduced recently wherever I want right now - Having a device that uses 0x505 would cause a pretty non-obvious failure mode. Either way, If we are going to grab an ioport, then: - It should be documented well somewhere in Documentation/virt/kvm - It should go through request_region() to actually claim those ioports. Good idea. - It should fail gracefully if that port is taken for some reason, instead of not even checking it. Yes, I agree it. I will update it. Thanks Wen Congyang -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [Qemu-devel] [PATCH v2 4/4] ARM: exynos4210_pmu: Add software reset support
20.07.2012 18:32, Peter Maydell пишет: On 12 July 2012 17:54, Maksim Kozlovm.koz...@samsung.com wrote: Signed-off-by: Maksim Kozlovm.koz...@samsung.com --- hw/exynos4210_pmu.c | 40 +--- 1 files changed, 33 insertions(+), 7 deletions(-) diff --git a/hw/exynos4210_pmu.c b/hw/exynos4210_pmu.c index 7f09c79..96588d9 100644 --- a/hw/exynos4210_pmu.c +++ b/hw/exynos4210_pmu.c @@ -18,13 +18,8 @@ * with this program; if not, seehttp://www.gnu.org/licenses/. */ -/* - * This model implements PMU registers just as a bulk of memory. Currently, - * the only reason this device exists is that secondary CPU boot loader - * uses PMU INFORM5 register as a holding pen. - */ - #include sysbus.h +#include sysemu.h #ifndef DEBUG_PMU #define DEBUG_PMU 0 @@ -230,6 +225,8 @@ #define EXYNOS4210_PMU_REGS_MEM_SIZE 0x3d0c +#define SWRESET_SYSTEM_MASK 0x0001 + typedef struct Exynos4210PmuReg { const char *name; /* for debug only */ uint32_t offset; @@ -458,7 +455,17 @@ static void exynos4210_pmu_write(void *opaque, target_phys_addr_t offset, PRINT_DEBUG_EXTEND(%s [0x%04x]- 0x%04x\n, exynos4210_pmu_regs[index].name, (uint32_t)offset, (uint32_t)val); -s-reg[index] = val; +switch (offset) { +case SWRESET: +if (val SWRESET_SYSTEM_MASK) { +s-reg[index] = val; +qemu_system_reset_request(); +} +break; +default: +s-reg[index] = val; +break; +} } static const MemoryRegionOps exynos4210_pmu_ops = { @@ -477,9 +484,28 @@ static void exynos4210_pmu_reset(DeviceState *dev) Exynos4210PmuState *s = container_of(dev, Exynos4210PmuState, busdev.qdev); unsigned i; +uint32_t index = exynos4210_pmu_get_register_index(s, SWRESET); +uint32_t swreset = s-reg[index]; /* Set default values for registers */ for (i = 0; i PMU_NUM_OF_REGISTERS; i++) { +if (swreset) { +switch (exynos4210_pmu_regs[i].offset) { +case INFORM0: +case INFORM1: +case INFORM2: +case INFORM3: +case INFORM4: +case INFORM5: +case INFORM6: +case INFORM7: +case PS_HOLD_CONTROL: +/* keep these registers during SW reset */ +continue; +default: +break; +} +} s-reg[i] = exynos4210_pmu_regs[i].reset_value; } } This patch seems to be trying to make a distinction that QEMU doesn't support, ie between system wide software reset and system wide hard reset. I'm not convinced about the wisdom of trying to paper over this lack with a single-device workaround. Ok. Thank for review And I found out that this patch contains a mistake, therefore it should be rejected in any case -- PMM
Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/2] spapr: Add support for -vga option
On Wed, 2012-06-27 at 17:25 -0500, Anthony Liguori wrote: If a user asks for something and we can't make it work, we should fail. Note that QEMU's cirrus works fine with the new kernel cirrusdrmfb, so I say we should allow it (needs to be able to the powerpc device .mak as well tho. It shouldn't be primary because old RHEL's iirc used to have cirrusfb enabled and that would blow due to guest kernel bugs among others. I haven't had a chance to try whatever userspace DDX the Xorg folks have come up with to use on top of cirrusdrmfb, so that might need a bit of fixing but there's no reason not to allow -vga cirrus at this point. Note to Matthew: cirrusdrmfb is a LOT SLOWER than offb for a similar SW only dumb framebuffer, probably has to do with the way it does the dirty stuff, not sure ... Why not draw directly into the emulated vram ? Cheers, Ben.
Re: [Qemu-devel] [PATCH 4/7] RTC: Update the RTC clock only when reading it
Il 23/07/2012 07:17, Juan Quintela ha scritto: Paolo Bonzini pbonz...@redhat.com wrote: From: Zhang, Yang Z yang.z.zh...@intel.com Calculate guest RTC based on the time of the last update, instead of using timers. The formula is (base_rtc + guest_time_now - guest_time_last_update + offset) Base_rtc is the RTC value when the RTC was last updated. Guest_time_now is the guest time when the access happens. Guest_time_last_update was the guest time when the RTC was last updated. Offset is used when divider reset happens or the set bit is toggled. The timer is kept in order to signal interrupts, but it only needs to run when either UF or AF is cleared. When the bits are both set, the timer does not run. UIP is now synthesized when reading register A. If the timer is not set, or if there is more than one second before it (as is the case at the end of this series), the leading edge of UIP is computed and the rising edge occurs 220us later. If the update timer occurs within one second, however, the rising edge of the AF and UF bits should coincide withe the falling edge of UIP. We do not know exactly when this will happen because there could be delays in the servicing of the timer. Hence, in this case reading register A only computes for the rising edge of UIP, and latches the bit until the timer is fired and clears it. Signed-off-by: Yang Zhang yang.z.zh...@intel.com Signed-off-by: Paolo Bonzini pbonz...@redhat.com @@ -540,11 +593,12 @@ static const VMStateDescription vmstate_rtc = { VMSTATE_INT32(current_tm.tm_mday, RTCState), VMSTATE_INT32(current_tm.tm_mon, RTCState), VMSTATE_INT32(current_tm.tm_year, RTCState), +VMSTATE_UINT64(base_rtc, RTCState), +VMSTATE_UINT64(last_update, RTCState), +VMSTATE_INT64(offset, RTCState), VMSTATE_UINT64_V(base_rtc, RTCState, 3) same ofr the others. Normally, new fields are added at the end of the structure. Doesn't really matter if you bump the minimum version but you're right that it is more correct. VMSTATE_TIMER(periodic_timer, RTCState), VMSTATE_INT64(next_periodic_time, RTCState), -VMSTATE_INT64(next_second_time, RTCState), -VMSTATE_TIMER(second_timer, RTCState), -VMSTATE_TIMER(second_timer2, RTCState), +VMSTATE_TIMER(update_timer, RTCState), I have to read the rest of the patch to know what is the relation of this 4 fields, to see if there is any way to create this in any sane way that is compatible. next_second_time is computed like this (see check_update_timer): guest_nsec = get_guest_rtc_ns(s) % NSEC_PER_SEC; next_update_time = qemu_get_clock_ns(rtc_clock) + NSEC_PER_SEC - guest_nsec; One of second_timer and second_timer2 is unset, depending on whether UIP=1 or UIP=0 respectively. second_timer if set is next_second_time. second_timer2 if set is next_second_time + get_ticks_per_sec/100. The new fields can go in a different subsection. But then you would transmit the subsection always, and old QEMU doesn't know how to skip it. So it doesn't really help. This is because base_rtc, last_update and offset need to be transmitted always. next_alarm_time only if no alarm fired already, but the common case is yes. Only for update_timer we can hack and transmit it in the slot that was used for next_second_time. Unless you want to transmit the subsection only for the new machine version, which is never done in the QEMU tree. Paolo
Re: [Qemu-devel] [PATCH] ide scsi: Mess with geometry only for hard disk devices
Ping? Markus Armbruster arm...@redhat.com writes: Legacy -drive cyls=... are now ignored completely when the drive doesn't back a hard disk device. Before, they were first checked against a hard disk's limits, then ignored. Signed-off-by: Markus Armbruster arm...@redhat.com --- hw/ide/qdev.c |3 ++- hw/scsi-disk.c |3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index 22e58df..5ea9b8f 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -149,7 +149,8 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind) } blkconf_serial(dev-conf, dev-serial); -if (blkconf_geometry(dev-conf, dev-chs_trans, 65536, 16, 255) 0) { +if (kind != IDE_CD + blkconf_geometry(dev-conf, dev-chs_trans, 65536, 16, 255) 0) { return -1; } diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 525816c..318318c 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1750,7 +1750,8 @@ static int scsi_initfn(SCSIDevice *dev) } blkconf_serial(s-qdev.conf, s-serial); -if (blkconf_geometry(dev-conf, NULL, 65535, 255, 255) 0) { +if (dev-type == TYPE_DISK + blkconf_geometry(dev-conf, NULL, 65535, 255, 255) 0) { return -1; }
Re: [Qemu-devel] [Qemu-ppc] [PATCH 2/2] spapr: Add support for -vga option
On Mon, 2012-07-23 at 16:40 +1000, Benjamin Herrenschmidt wrote: Note to Matthew: cirrusdrmfb is a LOT SLOWER than offb for a similar SW only dumb framebuffer, probably has to do with the way it does the dirty stuff, not sure ... Why not draw directly into the emulated vram ? More note to Matthew ... any objection to doing something similar for qemu bochs style framebuffer ? (ia -vga std). This is the preferred one on ppc, does 32-bpp fine and is generally simpler. Cheers, Ben.
Re: [Qemu-devel] [PATCH 7/7] RTC: Allow to migrate from old QEMU
Il 23/07/2012 07:12, Juan Quintela ha scritto: .fields = (VMStateField []) { VMSTATE_BUFFER(cmos_data, RTCState), VMSTATE_UINT8(cmos_index, RTCState), VMSTATE_INT32(current_tm.tm_sec, RTCState), VMSTATE_INT32(current_tm.tm_min, RTCState), VMSTATE_INT32(current_tm.tm_hour, RTCState), VMSTATE_INT32(current_tm.tm_wday, RTCState), VMSTATE_INT32(current_tm.tm_mday, RTCState), VMSTATE_INT32(current_tm.tm_mon, RTCState), VMSTATE_INT32(current_tm.tm_year, RTCState), you can change this to: VMSTATE_UNUSED(7*4); I think this is not safe. In the load_old case, we ignore the struct tm because we call rtc_set_time and check_update_timer. We need this to make up some values of base_rtc, last_update and offset. In the normal case we need both the CMOS data and the timer data. Though perhaps I can make a further change and remove current_tm altogether. I'll take a look this week; however that will make it even harder to produce the old version. Paolo
Re: [Qemu-devel] [PATCH v2] MP initialization protocol differs between cpu families, and for P6 and onward models it is up to CPU to decide if it will be BSP using this protocol, so try to model this.
Hello Gleb, Is this v2 patch more acceptable then v1?
Re: [Qemu-devel] [PATCH 0/4 v2] target-i386: move tcg intialization inside CPU object
ping. On 06/25/2012 03:55 PM, Igor Mammedov wrote: v2: - drop usage of prev_debug_excp_handler consistently in all users - split from reset patches to avoid confusion of inter-dependency Compile Run tested: target-i386: tcg and kvm mode i386-linux-user: running of /bin/ls Compile tested: xtensa-softmmu xtensaeb-softmmu git tree for testing: https://github.com/imammedo/qemu/tree/x86cpu_qom_tcg_v2 Igor Mammedov (4): target-i386: drop usage of prev_debug_excp_handler target-xtensa: drop usage of prev_debug_excp_handler cleanup cpu_set_debug_excp_handler target-i386: move tcg initialization into x86_cpu_initfn() cpu-exec.c |5 + exec-all.h |2 +- target-i386/cpu.c | 10 ++ target-i386/cpu.h |1 + target-i386/helper.c | 16 +--- target-xtensa/helper.c |8 +--- 6 files changed, 15 insertions(+), 27 deletions(-) -- - Igor
Re: [Qemu-devel] [PATCH v2] MP initialization protocol differs between cpu families, and for P6 and onward models it is up to CPU to decide if it will be BSP using this protocol, so try to model this.
On Mon, Jul 23, 2012 at 09:44:05AM +0200, Igor Mammedov wrote: Hello Gleb, Is this v2 patch more acceptable then v1? Yes. Sorry for not being explicit about it :) -- Gleb.
Re: [Qemu-devel] [RFC PATCH 2/2] block: gluster as block backend
On Sun, Jul 22, 2012 at 04:38:00PM +0100, Stefan Hajnoczi wrote: On Sat, Jul 21, 2012 at 9:31 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: +typedef struct GlusterAIOCB { +BlockDriverAIOCB common; +QEMUIOVector *qiov; The qiov field is unused. +char *bounce; Unused. Yes, removed these two. +struct BDRVGlusterState *s; You can get this through common.bs-opaque, but if you like having a shortcut, that's fine. +int cancelled; bool Ok. +} GlusterAIOCB; + +typedef struct GlusterCBKData { +GlusterAIOCB *acb; +struct BDRVGlusterState *s; +int64_t size; +int ret; +} GlusterCBKData; I think GlusterCBKData could just be part of GlusterAIOCB. That would simplify the code a little and avoid some malloc/free. Are you suggesting to put a field GlusterCBKData gcbk; inside GlusterAIOCB and use gcbk from there or Are you suggesting that I make the fields of GlusterCBKData part of GlusterAIOCB and get rid of GlusterCBKData altogether ? This means I would have to pass the GlusterAIOCB to gluster async calls and update its fields from gluster callback routine. I can do this, but I am not sure if you can touch the fields of GlusterAIOCB in non-QEMU threads (gluster callback thread). + +typedef struct BDRVGlusterState { +struct glfs *glfs; +int fds[2]; +int open_flags; +struct glfs_fd *fd; +int qemu_aio_count; +int event_reader_pos; +GlusterCBKData *event_gcbk; +} BDRVGlusterState; + +#define GLUSTER_FD_READ 0 +#define GLUSTER_FD_WRITE 1 + +static void qemu_gluster_complete_aio(GlusterCBKData *gcbk) +{ +GlusterAIOCB *acb = gcbk-acb; +int ret; + +if (acb-cancelled) { Where does cancelled get set? I realised that I am not supporting bdrv_aio_cancel(). I guess I will have to add support for this in next version. +qemu_aio_release(acb); +goto done; +} + +if (gcbk-ret == gcbk-size) { +ret = 0; /* Success */ +} else if (gcbk-ret 0) { +ret = gcbk-ret; /* Read/Write failed */ +} else { +ret = -EINVAL; /* Partial read/write - fail it */ EINVAL is for invalid arguments. EIO would be better. Ok. +/* + * file=protocol:server@port:volname:image + */ +static int qemu_gluster_parsename(GlusterConf *c, const char *filename) +{ +char *file = g_strdup(filename); +char *token, *next_token, *saveptr; +char *token_s, *next_token_s, *saveptr_s; +int ret = -EINVAL; + +/* Discard the protocol */ +token = strtok_r(file, :, saveptr); +if (!token) { +goto out; +} + +/* server@port */ +next_token = strtok_r(NULL, :, saveptr); +if (!next_token) { +goto out; +} +if (strchr(next_token, '@')) { +token_s = strtok_r(next_token, @, saveptr_s); +if (!token_s) { +goto out; +} +strncpy(c-server, token_s, HOST_NAME_MAX); strncpy(3) will not NUL-terminate when token_s is HOST_NAME_MAX characters long. QEMU has cutils.c:pstrcpy(). Will use pstrcpy. When the argument is too long we should probably report an error instead of truncating. Or should we let gluster APIs to flag an error with truncated server and volume names ? Same below. +next_token_s = strtok_r(NULL, @, saveptr_s); +if (!next_token_s) { +goto out; +} +c-port = atoi(next_token_s); No error checking. If the input is invalid an error message would help the user here. Fixed. +static struct glfs *qemu_gluster_init(GlusterConf *c, const char *filename) +{ +struct glfs *glfs = NULL; +int ret; + +ret = qemu_gluster_parsename(c, filename); +if (ret 0) { +errno = -ret; +goto out; +} + +glfs = glfs_new(c-volname); +if (!glfs) { +goto out; +} + +ret = glfs_set_volfile_server(glfs, socket, c-server, c-port); +if (ret 0) { +goto out; +} + +/* + * TODO: Logging is not necessary but instead nice to have. + * Can QEMU optionally log into a standard place ? QEMU prints to stderr, can you do that here too? The global log file is not okay, especially when multiple QEMU instances are running. Ok, I can do glfs_set_logging(glfs, /dev/stderr, loglevel); + * Need to use defines like gf_loglevel_t:GF_LOG_INFO instead of + * hard coded values like 7 here. + */ +ret = glfs_set_logging(glfs, /tmp/qemu-gluster.log, 7); +if (ret 0) { +goto out; +} + +ret = glfs_init(glfs); +if (ret 0) { +goto out; +} +return glfs; + +out: +if (glfs) { +(void)glfs_fini(glfs); +} +return NULL; +} + +static int qemu_gluster_open(BlockDriverState *bs, const char *filename,
Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote: On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: -drive file=gluster:server@port:volname:image - Here 'gluster' is the protocol. - 'server@port' specifies the server where the volume file specification for the given volume resides. 'port' is the port number on which gluster management daemon (glusterd) is listening. This is optional and if not specified, QEMU will send 0 which will make libgfapi to use the default port. 'server@port' is weird notation. Normally it is 'server:port' (e.g. URLs). Can you change it? I don't like but, but settled for it since port was optional and : was being used as separator here. What about the other transports supported by libgfapi: UNIX domain sockets and RDMA? My reading of glfs.h is that there are 3 connection options: 1. 'transport': 'socket' (default), 'unix', 'rdma' 2. 'host': server hostname for 'socket', path to UNIX domain socket for 'unix', or something else for 'rdma' 3. 'port': TCP port when 'socket' is used. Ignored otherwise. Unfortunately QEMU block drivers cannot take custom options yet. That would make it possible to cleanly map these connection options and save you from inventing syntax which doesn't expose all options. In the meantime it would be nice if the syntax exposed all options. So without the capability to pass custom options to block drivers, am I forced to keep extending the file= with more and more options ? file=gluster:transport:server:port:volname:image ? Looks ugly and not easy to make any particular option optional. If needed I can support this from GlusterFS backend. Note that we are no longer using volfiles directly and use volume names instead. For this to work, gluster management daemon (glusterd) needs to be running on the QEMU node. This limits the QEMU user to access the volumes by the default volfiles that are generated by gluster CLI. This should be fine as long as gluster CLI provides the capability to generate or regenerate volume files for a given volume with the xlator set that QEMU user is interested in. GlusterFS developers tell me that this can be provided with some enhancements to Gluster CLI/glusterd. Note that the custom volume files is typically needed when GlusterFS server is co-located with QEMU in which case it would be beneficial to get rid of client-server overhead and RPC communication overhead. My knowledge of GlusterFS is limited. Here is what I am thinking: 1. The user cannot specify a local configuration file, you require that there is a glusterd running which provides configuration information. Yes. User only specifies a volume name and glusterd is used to fetch the right volume file for that volume name. 2. It is currently not possible to bypass RPC because the glusterd managed configuration file doesn't support that. It is possible. Gluster already supports custom extensions to volume names and it is possible to use the required volfile by specifying this custom volname extension. For eg, if I have a volume named test, by default the volfile used for it will be test-fuse.vol. Currently I can put my own custom volfile into the standard location and get glusterd pick that up. I can specify test.rpcbypass as volname and glusterd will pick test.rpcbypass.vol. What is currently not supported is the ability to create test.rpcbypass.vol from gluster CLI. I believe that gluster developers are ok with enhancing gluster CLI to support generating/regenerating volfiles for a given volume with custom translator set. I'm not sure if these statements are true? Would you support local volfiles in the future again? Why force users to run glusterd? I will let gluster folks on CC to answer this and let us know the benefits of always depending on glusterd. I guess running glusterd would be beneficial when supporting migration. QEMU working from a local volume (with volname=test.rpcbypass) can be easily restarted on a different node by just changing volname to test. glusterd will take care of fetching the right volfile automatically for us. - As mentioned above, the VM image on gluster volume can be specified like this: -drive file=gluster:localhost:testvol:/F17,format=gluster Note that format=gluster is not needed ideally and its a work around I have until libgfapi provides a working connection cleanup routine (glfs_fini()). When the format isn't specified, QEMU figures out the format by doing find_image_format that results in one open and close before opening the image file long term for standard read and write. Gluster connection initialization is done from open and connection termination is done from close. But since glfs_fini() isn't working yet, I am bypassing find_image_format by specifying format=gluster directly which results in just one open and
Re: [Qemu-devel] [RFC seabios PATCH] enumerate APIC IDs directly from CPUs
On Thu, Jul 19, 2012 at 04:46:35PM -0300, Eduardo Habkost wrote: On Thu, Jul 19, 2012 at 11:28:54AM -0300, Eduardo Habkost wrote: On Thu, Jul 19, 2012 at 12:58:46PM +0300, Gleb Natapov wrote: On Tue, Jul 17, 2012 at 06:56:30PM -0300, Eduardo Habkost wrote: This patch is an attempt to fix the non-continguous-APIC-ID problem without the FW_CFG_LAPIC_INFO approach I have sent proposed last week. Basically, this changes Seabios to probe for APIC IDs directly from the CPUs on boot, instead of getting it using fw_cfg, store the found APIC IDs on a bitmap, and use that information whe building the MADT, SRAT, and SSDT ACPI tables. To do this properly, we have to decide the meaning of CPU IDs in the QEMU-Seabios interfaces, too. I see two possible approaches: 1) Have Seabios and QEMU agree on a a CPU identifier, that is independent from the APIC ID. 2) Always use the Initial APIC ID on all communication between QEMU and Seabios. We need to be prepared to support more than 255 cpus. With 255 cpus comes x2apic and x2apic has 32bit apic ids. HW does not have to support all of the bits though, but potentially all the bitmasks can grow prohibitedly large. I see only two solutions: - Specify an interface/convention for QEMU and Seabios agree upon a CPU identifier = x2apic = LAPIC ID mapping for all CPUs. - Specify new NUMA-information and CPU hotplug interfaces (or extend the existing ones) based on x2apic ID, when Seabios start supporting x2apic. I am not particularly inclined towards any of those two solutions. I dislike them equally. :-) Oh, it is simpler than I have expected. x2APIC specification: The local APIC ID is initialized by hardware with a 32 bit ID (x2APIC ID). The lowest 8 bits of the x2APIC ID is the legacy local xAPIC ID, and is stored in the upper 8 bits of the APIC register for access in xAPIC mode. And the ACPI specification: Logical processors with APIC ID values of 255 and greater are required to have a Processor Device object and must convey the processor’s APIC information to OSPM using the Processor Local X2APIC structure. Logical processors with APIC ID values less than 255 must use the Processor Local APIC structure to convey their APIC information to OSPM. That means the x2APIC ID and the xAPIC ID are interchangeable, for values = 255. That means the QEMU=Seabios communication can be safely based on APIC IDs without any ambiguity. Yes for = 255 they interchangeable. That's why we can add +x2apic to our cpu models without changes to the BIOS. The CPU hotplug interface is a bit of a problem because it is based on a 256-bit bitmap. But on the day it gets extended to support more than 256 CPUs, it can safely be based on APIC IDs and still keep compatibility with systems without x2APIC. The bitmap will have to be extended if we will go beyond 256 cpus. Using apic-id to index the bitmap means that the size of the bitmap is a function of max apic-id we want to support, not max number of cpus. So, now I am strongly inclined towards the second option from the list above: just use APIC IDs everywhere to identify CPUs when QEMU and Seabios communicate with each other, and QEMU can completely ignore the Processor ID used by Seabios. I agree with making Processor ID Seabios internal thing. Note that I am more worried about the QEMU-Seabios interfaces. The APIC ID bitmap on smp.c, for example, is just an implementation detail: if we make Seabios support x2apic, that code can be changed to use a different data structure instead. [...] To try to make things less likely to break on the common, non-hotplug case, this patch makes the Processor IDs be chosen this way: - The CPUs present on boot get contiguous Processor IDs (even if the APIC IDs are not contiguous); - The remaining Processor declarations are going to associated to the remaining APIC IDs (immediately after the last present APIC ID), sequentially. This means that hotplugged CPUs may not get contiguous Processor declarations if their APIC IDs are not contiguous. I am curious what will happen if cpu will be hot plugged, than hibernate and resume is done. After resume hot plugged cpu will have different Processor ID in ACPI. This may or may not be a problem. True. Keeping those tables stable after hotplug and hibernate may be a challenge. Maybe it would be easier to just leave holes on the MADT and SSDT tables (making APIC ID and Processor ID always match), and hope no OS will be confused by the holes. I am inclined to try this approach first (keep APIC ID == ACPI Processor ID), to keep things simple in Seabios. I am hoping no OS will have problems with the holes in the list of enabled Processor IDs. They shouldn't. -- Gleb.
Re: [Qemu-devel] [RFC PATCH 2/2] block: gluster as block backend
On Mon, Jul 23, 2012 at 9:32 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: On Sun, Jul 22, 2012 at 04:38:00PM +0100, Stefan Hajnoczi wrote: On Sat, Jul 21, 2012 at 9:31 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: +} GlusterAIOCB; + +typedef struct GlusterCBKData { +GlusterAIOCB *acb; +struct BDRVGlusterState *s; +int64_t size; +int ret; +} GlusterCBKData; I think GlusterCBKData could just be part of GlusterAIOCB. That would simplify the code a little and avoid some malloc/free. Are you suggesting to put a field GlusterCBKData gcbk; inside GlusterAIOCB and use gcbk from there or Are you suggesting that I make the fields of GlusterCBKData part of GlusterAIOCB and get rid of GlusterCBKData altogether ? This means I would have to pass the GlusterAIOCB to gluster async calls and update its fields from gluster callback routine. I can do this, but I am not sure if you can touch the fields of GlusterAIOCB in non-QEMU threads (gluster callback thread). The fields in GlusterCBKData could become part of GlusterAIOCB. Different threads can access fields in a struct, they just need to ensure access is synchronized if they touch the same fields. In the case of this code I think there is nothing that requires synchronization beyond the pipe mechanism that you already use to complete processing in a QEMU thread. When the argument is too long we should probably report an error instead of truncating. Or should we let gluster APIs to flag an error with truncated server and volume names ? What if the truncated name is a valid but different object? For example: Max chars = 5 Objects: helloworld hello If helloworld is truncated to hello we get no error back because it's a valid object! We need to either check sizes explicitly without truncating or use a g_strdup() approach without any size limits and let the gfapi functions error out if the input string is too long. +static struct glfs *qemu_gluster_init(GlusterConf *c, const char *filename) +{ +struct glfs *glfs = NULL; +int ret; + +ret = qemu_gluster_parsename(c, filename); +if (ret 0) { +errno = -ret; +goto out; +} + +glfs = glfs_new(c-volname); +if (!glfs) { +goto out; +} + +ret = glfs_set_volfile_server(glfs, socket, c-server, c-port); +if (ret 0) { +goto out; +} + +/* + * TODO: Logging is not necessary but instead nice to have. + * Can QEMU optionally log into a standard place ? QEMU prints to stderr, can you do that here too? The global log file is not okay, especially when multiple QEMU instances are running. Ok, I can do glfs_set_logging(glfs, /dev/stderr, loglevel); Yes. I think - is best since it is supported by gfapi (libglusterfs/src/logging.c:gf_log_init). /dev/stderr is not POSIX. + * Need to use defines like gf_loglevel_t:GF_LOG_INFO instead of + * hard coded values like 7 here. + */ +ret = glfs_set_logging(glfs, /tmp/qemu-gluster.log, 7); +if (ret 0) { +goto out; +} + +ret = glfs_init(glfs); +if (ret 0) { +goto out; +} +return glfs; + +out: +if (glfs) { +(void)glfs_fini(glfs); +} +return NULL; +} + +static int qemu_gluster_open(BlockDriverState *bs, const char *filename, +int bdrv_flags) +{ +BDRVGlusterState *s = bs-opaque; +GlusterConf *c = g_malloc(sizeof(GlusterConf)); Can this be allocated on the stack? It consists of PATH_MAX(4096), HOST_NAME_MAX(255) and GLUSTERD_MAX_VOLUME_NAME (1000). A bit heavy to be on stack ? This is userspace, stacks are big but it's up to you. Stefan
Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
On Sat, Jul 21, 2012 at 01:59:17PM +0530, Bharata B Rao wrote: Hi, Here is the v2 patchset for supporting GlusterFS protocol from QEMU. This set of patches enables QEMU to boot VM images from gluster volumes. This is achieved by adding gluster as a new block backend driver in QEMU. Its already possible to boot from VM images on gluster volumes, but this patchset provides the ability to boot VM images from gluster volumes by by-passing the FUSE layer in gluster. In case the image is present on the local system, it is possible to even bypass client and server translator and hence the RPC overhead. The major change in this version is to not implement libglusterfs based gluster backend within QEMU but instead use libgfapi. libgfapi library from GlusterFS project provides APIs to access gluster volumes directly. With the use of libgfapi, the specification of gluster backend from QEMU matches more closely with the GlusterFS's way of specifying volumes. We now specify the gluster backed image like this: -drive file=gluster:server@port:volname:image - Here 'gluster' is the protocol. - 'server@port' specifies the server where the volume file specification for the given volume resides. 'port' is the port number on which gluster management daemon (glusterd) is listening. This is optional and if not specified, QEMU will send 0 which will make libgfapi to use the default port. - 'volname' is the name of the gluster volume which contains the VM image. - 'image' is the path to the actual VM image in the gluster volume. I don't think we should be using '@' as a field separator here, when : can do that job just fine. In addition we already have a precendent set with the sheepdog driver for using ':' for separating all fields: -drive file=sheepdog:example.org:6000:imagename If you want to allow for port number to be omitted, this can be handled thus: -drive file=sheepdog:example.org::imagename which is how -chardev deals with omitted port numbers Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
On Mon, Jul 23, 2012 at 9:50 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote: On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: -drive file=gluster:server@port:volname:image - Here 'gluster' is the protocol. - 'server@port' specifies the server where the volume file specification for the given volume resides. 'port' is the port number on which gluster management daemon (glusterd) is listening. This is optional and if not specified, QEMU will send 0 which will make libgfapi to use the default port. 'server@port' is weird notation. Normally it is 'server:port' (e.g. URLs). Can you change it? I don't like but, but settled for it since port was optional and : was being used as separator here. What about the other transports supported by libgfapi: UNIX domain sockets and RDMA? My reading of glfs.h is that there are 3 connection options: 1. 'transport': 'socket' (default), 'unix', 'rdma' 2. 'host': server hostname for 'socket', path to UNIX domain socket for 'unix', or something else for 'rdma' 3. 'port': TCP port when 'socket' is used. Ignored otherwise. Unfortunately QEMU block drivers cannot take custom options yet. That would make it possible to cleanly map these connection options and save you from inventing syntax which doesn't expose all options. In the meantime it would be nice if the syntax exposed all options. So without the capability to pass custom options to block drivers, am I forced to keep extending the file= with more and more options ? file=gluster:transport:server:port:volname:image ? Looks ugly and not easy to make any particular option optional. If needed I can support this from GlusterFS backend. Kevin, Markus: Any thoughts on passing options to block drivers? Encoding GlusterFS options into a filename string is pretty cumbersome.
Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
Why not use -drive file=gluster://server[:port]/volname/image A great many protocols today use the form protocol://server:port]/path so this would make it consistent with a lot of other naming schemes out there, and imho make the url more intuitive. FTP looks like this : ftp://user:password@host:port/path NFS looks like this : nfs://host:porturl-path CIFS looks like this : smb://[[[authdomain;]user@]host[:port][/share[/path][/name]]][?context] For iSCSI we use : iscsi://server[:port]/target/lun (The iscsi syntax was picked explicitely to be consistent with the de-facto url naming scheme.) I would argue that this is the de-facto way to create a url for different protocols, so it would imho be natural to specify a glusterfs url in a similar format. ronnie sahlberg On Mon, Jul 23, 2012 at 7:16 PM, Daniel P. Berrange berra...@redhat.com wrote: On Sat, Jul 21, 2012 at 01:59:17PM +0530, Bharata B Rao wrote: Hi, Here is the v2 patchset for supporting GlusterFS protocol from QEMU. This set of patches enables QEMU to boot VM images from gluster volumes. This is achieved by adding gluster as a new block backend driver in QEMU. Its already possible to boot from VM images on gluster volumes, but this patchset provides the ability to boot VM images from gluster volumes by by-passing the FUSE layer in gluster. In case the image is present on the local system, it is possible to even bypass client and server translator and hence the RPC overhead. The major change in this version is to not implement libglusterfs based gluster backend within QEMU but instead use libgfapi. libgfapi library from GlusterFS project provides APIs to access gluster volumes directly. With the use of libgfapi, the specification of gluster backend from QEMU matches more closely with the GlusterFS's way of specifying volumes. We now specify the gluster backed image like this: -drive file=gluster:server@port:volname:image - Here 'gluster' is the protocol. - 'server@port' specifies the server where the volume file specification for the given volume resides. 'port' is the port number on which gluster management daemon (glusterd) is listening. This is optional and if not specified, QEMU will send 0 which will make libgfapi to use the default port. - 'volname' is the name of the gluster volume which contains the VM image. - 'image' is the path to the actual VM image in the gluster volume. I don't think we should be using '@' as a field separator here, when : can do that job just fine. In addition we already have a precendent set with the sheepdog driver for using ':' for separating all fields: -drive file=sheepdog:example.org:6000:imagename If you want to allow for port number to be omitted, this can be handled thus: -drive file=sheepdog:example.org::imagename which is how -chardev deals with omitted port numbers Regards, Daniel -- |: http://berrange.com -o-http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://autobuild.org -o- http://search.cpan.org/~danberr/ :| |: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
Stefan, in iscsi, i just specify those extra arguments that are required that are not part of the url itself as just command line options : qemu-system-i386 -iscsi initiator-name=iqn.qemu.test:my-initiator \ -boot d -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \ -cdrom iscsi://127.0.0.1/iqn.qemu.test/2 Here initiator-name is a custom option to the iscsi layer to tell it which name to use when identifying/logging in to the target. Similar concept could be used by glusterfs ? regards ronnie sahlberg On Mon, Jul 23, 2012 at 7:20 PM, Stefan Hajnoczi stefa...@gmail.com wrote: On Mon, Jul 23, 2012 at 9:50 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: On Sun, Jul 22, 2012 at 03:42:28PM +0100, Stefan Hajnoczi wrote: On Sat, Jul 21, 2012 at 9:29 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: -drive file=gluster:server@port:volname:image - Here 'gluster' is the protocol. - 'server@port' specifies the server where the volume file specification for the given volume resides. 'port' is the port number on which gluster management daemon (glusterd) is listening. This is optional and if not specified, QEMU will send 0 which will make libgfapi to use the default port. 'server@port' is weird notation. Normally it is 'server:port' (e.g. URLs). Can you change it? I don't like but, but settled for it since port was optional and : was being used as separator here. What about the other transports supported by libgfapi: UNIX domain sockets and RDMA? My reading of glfs.h is that there are 3 connection options: 1. 'transport': 'socket' (default), 'unix', 'rdma' 2. 'host': server hostname for 'socket', path to UNIX domain socket for 'unix', or something else for 'rdma' 3. 'port': TCP port when 'socket' is used. Ignored otherwise. Unfortunately QEMU block drivers cannot take custom options yet. That would make it possible to cleanly map these connection options and save you from inventing syntax which doesn't expose all options. In the meantime it would be nice if the syntax exposed all options. So without the capability to pass custom options to block drivers, am I forced to keep extending the file= with more and more options ? file=gluster:transport:server:port:volname:image ? Looks ugly and not easy to make any particular option optional. If needed I can support this from GlusterFS backend. Kevin, Markus: Any thoughts on passing options to block drivers? Encoding GlusterFS options into a filename string is pretty cumbersome.
Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
On Mon, Jul 23, 2012 at 10:34 AM, ronnie sahlberg ronniesahlb...@gmail.com wrote: in iscsi, i just specify those extra arguments that are required that are not part of the url itself as just command line options : qemu-system-i386 -iscsi initiator-name=iqn.qemu.test:my-initiator \ -boot d -drive file=iscsi://127.0.0.1/iqn.qemu.test/1 \ -cdrom iscsi://127.0.0.1/iqn.qemu.test/2 Here initiator-name is a custom option to the iscsi layer to tell it which name to use when identifying/logging in to the target. Similar concept could be used by glusterfs ? That works for global options only. If it's per-drive then this approach can't be used because different drives might need different option values. Stefan
Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2
On 07/23/2012 02:20 PM, Bharata B Rao wrote: 2. It is currently not possible to bypass RPC because the glusterd managed configuration file doesn't support that. It is possible. Gluster already supports custom extensions to volume names and it is possible to use the required volfile by specifying this custom volname extension. For eg, if I have a volume named test, by default the volfile used for it will be test-fuse.vol. Currently I can put my own custom volfile into the standard location and get glusterd pick that up. I can specify test.rpcbypass as volname and glusterd will pick test.rpcbypass.vol. What is currently not supported is the ability to create test.rpcbypass.vol from gluster CLI. I believe that gluster developers are ok with enhancing gluster CLI to support generating/regenerating volfiles for a given volume with custom translator set. Yes, this would be the preferred approach. We can tune the volume file generation to evolve the desired configuration file. I'm not sure if these statements are true? Would you support local volfiles in the future again? Why force users to run glusterd? I will let gluster folks on CC to answer this and let us know the benefits of always depending on glusterd. I guess running glusterd would be beneficial when supporting migration. QEMU working from a local volume (with volname=test.rpcbypass) can be easily restarted on a different node by just changing volname to test. glusterd will take care of fetching the right volfile automatically for us. Yes, running glusterd would be beneficial in migration. Without deriving the file from glusterd features like volume tuning, client monitoring etc. would not be available to to clients that talk to a gluster volume. Additionally, driving configuration generation and management through glusterd helps in standardizing and stabilizing gluster configurations. Has libgfapi been released yet? Its part of gluster mainline now. Does it have versioning which will allow the QEMU GlusterFS block driver to build against different versions? I'm just wondering how the pieces will fit together once distros start shipping them. I request gluster folks on CC to comment about version and shipping information. There is no release that contains libgfapi as yet. Once that is done, we can probably have the dependency specified in QEMU as well. Regards, Vijay
Re: [Qemu-devel] [PATCH 3/3] migration: block migration when streaming block jobs are running.
On Fri, Jul 20, 2012 at 8:32 PM, benoit.ca...@gmail.com wrote: From: Benoît Canet ben...@irqsave.net Signed-off-by: Benoit Canet ben...@irqsave.net --- migration.c |5 + 1 file changed, 5 insertions(+) diff --git a/migration.c b/migration.c index 8db1b43..dfce680 100644 --- a/migration.c +++ b/migration.c @@ -425,6 +425,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (bdrv_have_block_jobs()) { +error_set(errp, QERR_STREAMING_BLOCKS_MIGRATION); +return; +} I think bdrv_have_block_jobs() is too specific and would use bdrv_in_use(bs) here to give basically an EBUSY-type error. Stefan
Re: [Qemu-devel] [PATCH 1/3] block: Add bdrv_have_block_jobs() so migration code abort if needed.
Il 20/07/2012 21:32, benoit.ca...@gmail.com ha scritto: +bool bdrv_have_block_jobs(void) +{ +BlockDriverState *bs; + +QTAILQ_FOREACH(bs, bdrv_states, list) { +if (bs-job || bdrv_in_use(bs)) { +return true; +} +bdrv_close(bs); Why close the device here? Paolo +} + +return false; +} +
Re: [Qemu-devel] [PATCH] ide scsi: Mess with geometry only for hard disk devices
Il 23/07/2012 09:25, Markus Armbruster ha scritto: Ping? Markus Armbruster arm...@redhat.com writes: Legacy -drive cyls=... are now ignored completely when the drive doesn't back a hard disk device. Before, they were first checked against a hard disk's limits, then ignored. Signed-off-by: Markus Armbruster arm...@redhat.com --- hw/ide/qdev.c |3 ++- hw/scsi-disk.c |3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/hw/ide/qdev.c b/hw/ide/qdev.c index 22e58df..5ea9b8f 100644 --- a/hw/ide/qdev.c +++ b/hw/ide/qdev.c @@ -149,7 +149,8 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind kind) } blkconf_serial(dev-conf, dev-serial); -if (blkconf_geometry(dev-conf, dev-chs_trans, 65536, 16, 255) 0) { +if (kind != IDE_CD + blkconf_geometry(dev-conf, dev-chs_trans, 65536, 16, 255) 0) { return -1; } diff --git a/hw/scsi-disk.c b/hw/scsi-disk.c index 525816c..318318c 100644 --- a/hw/scsi-disk.c +++ b/hw/scsi-disk.c @@ -1750,7 +1750,8 @@ static int scsi_initfn(SCSIDevice *dev) } blkconf_serial(s-qdev.conf, s-serial); -if (blkconf_geometry(dev-conf, NULL, 65535, 255, 255) 0) { +if (dev-type == TYPE_DISK + blkconf_geometry(dev-conf, NULL, 65535, 255, 255) 0) { return -1; } Acked-by: Paolo Bonzini pbonz...@redhat.com Kevin, are you taking this patch? Paolo
Re: [Qemu-devel] [PATCH 3/3] migration: block migration when streaming block jobs are running.
Would int bdrv_are_busy(void) { BlockDriverState *bs; QTAILQ_FOREACH(bs, bdrv_states, list) { if (bs-job || bdrv_in_use(bs)) { return -EBUSY; } } return 0; } be more acceptable ? Benoît Le Monday 23 Jul 2012 à 10:55:41 (+0100), Stefan Hajnoczi a écrit : On Fri, Jul 20, 2012 at 8:32 PM, benoit.ca...@gmail.com wrote: From: Benoît Canet ben...@irqsave.net Signed-off-by: Benoit Canet ben...@irqsave.net --- migration.c |5 + 1 file changed, 5 insertions(+) diff --git a/migration.c b/migration.c index 8db1b43..dfce680 100644 --- a/migration.c +++ b/migration.c @@ -425,6 +425,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (bdrv_have_block_jobs()) { +error_set(errp, QERR_STREAMING_BLOCKS_MIGRATION); +return; +} I think bdrv_have_block_jobs() is too specific and would use bdrv_in_use(bs) here to give basically an EBUSY-type error. Stefan
Re: [Qemu-devel] [PATCH 1/3] block: Add bdrv_have_block_jobs() so migration code abort if needed.
Why close the device here? Thanks Benoît
Re: [Qemu-devel] [PATCH 3/3] migration: block migration when streaming block jobs are running.
On Mon, Jul 23, 2012 at 11:17 AM, Benoît Canet benoit.ca...@irqsave.net wrote: Would int bdrv_are_busy(void) { BlockDriverState *bs; QTAILQ_FOREACH(bs, bdrv_states, list) { if (bs-job || bdrv_in_use(bs)) { return -EBUSY; } } return 0; } be more acceptable ? Looks fine to me. Stefan
[Qemu-devel] [PATCH] pc: Fix max_cpus
From: Dunrong Huang riegama...@gmail.com The VCPU count limit in kernel now is 254, defined by KVM_MAX_VCPUS in kernel's header files. But the count limit in QEMU is 255, so QEMU will failed to start if user passes -enable-kvm and -smp 255 to it. This patch intruduces a Macro MAX_VCPUS whose value is KVM_MAX_VCPUS if CONFIG_KVM is defined. If user do not use kvm, set it's value to 255. Signed-off-by: Dunrong Huang riegama...@gmail.com --- hw/pc_piix.c | 28 +++- 1 files changed, 19 insertions(+), 9 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 0c0096f..49cda51 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -49,6 +49,16 @@ #define MAX_IDE_BUS 2 +#ifndef KVM_MAX_VCPUS +#define KVM_MAX_VCPUS 254 +#endif + +#ifdef CONFIG_KVM +#define MAX_VCPUS KVM_MAX_VCPUS +#else +#define MAX_VCPUS 255 +#endif + static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 }; static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 }; static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; @@ -354,7 +364,7 @@ static QEMUMachine pc_machine_v1_2 = { .alias = pc, .desc = Standard PC, .init = pc_init_pci, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .is_default = 1, }; @@ -381,7 +391,7 @@ static QEMUMachine pc_machine_v1_1 = { .name = pc-1.1, .desc = Standard PC, .init = pc_init_pci, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_1_1, { /* end of list */ } @@ -416,7 +426,7 @@ static QEMUMachine pc_machine_v1_0 = { .name = pc-1.0, .desc = Standard PC, .init = pc_init_pci, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_1_0, { /* end of list */ } @@ -431,7 +441,7 @@ static QEMUMachine pc_machine_v0_15 = { .name = pc-0.15, .desc = Standard PC, .init = pc_init_pci, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_15, { /* end of list */ } @@ -463,7 +473,7 @@ static QEMUMachine pc_machine_v0_14 = { .name = pc-0.14, .desc = Standard PC, .init = pc_init_pci, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_14, { @@ -496,7 +506,7 @@ static QEMUMachine pc_machine_v0_13 = { .name = pc-0.13, .desc = Standard PC, .init = pc_init_pci_no_kvmclock, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_13, { @@ -533,7 +543,7 @@ static QEMUMachine pc_machine_v0_12 = { .name = pc-0.12, .desc = Standard PC, .init = pc_init_pci_no_kvmclock, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_12, { @@ -566,7 +576,7 @@ static QEMUMachine pc_machine_v0_11 = { .name = pc-0.11, .desc = Standard PC, qemu 0.11, .init = pc_init_pci_no_kvmclock, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_11, { @@ -587,7 +597,7 @@ static QEMUMachine pc_machine_v0_10 = { .name = pc-0.10, .desc = Standard PC, qemu 0.10, .init = pc_init_pci_no_kvmclock, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .compat_props = (GlobalProperty[]) { PC_COMPAT_0_11, { -- 1.7.8.6
Re: [Qemu-devel] [PATCH] pc: Fix max_cpus
Am 23.07.2012 12:47, schrieb riegama...@gmail.com: From: Dunrong Huang riegama...@gmail.com The VCPU count limit in kernel now is 254, defined by KVM_MAX_VCPUS in kernel's header files. But the count limit in QEMU is 255, so QEMU will failed to start if user passes -enable-kvm and -smp 255 to it. This patch intruduces a Macro MAX_VCPUS whose value is KVM_MAX_VCPUS if CONFIG_KVM is defined. If user do not use kvm, set it's value to 255. Signed-off-by: Dunrong Huang riegama...@gmail.com --- hw/pc_piix.c | 28 +++- 1 files changed, 19 insertions(+), 9 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 0c0096f..49cda51 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -49,6 +49,16 @@ #define MAX_IDE_BUS 2 +#ifndef KVM_MAX_VCPUS +#define KVM_MAX_VCPUS 254 +#endif + +#ifdef CONFIG_KVM +#define MAX_VCPUS KVM_MAX_VCPUS +#else +#define MAX_VCPUS 255 +#endif + static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 }; static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 }; static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; @@ -354,7 +364,7 @@ static QEMUMachine pc_machine_v1_2 = { .alias = pc, .desc = Standard PC, .init = pc_init_pci, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .is_default = 1, }; [snip] This is not so ideal: -enable-kvm is a runtime switch whereas you are changing a compile-time limit here. Any chance to change the runtime usage of .max_cpus instead? Possibly introducing a helper function? Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
Re: [Qemu-devel] Qemu-devel Digest, Vol 112, Issue 522
On Jul 23, 2012 5:36 PM, qemu-devel-requ...@nongnu.org wrote: Send Qemu-devel mailing list submissions to qemu-devel@nongnu.org To subscribe or unsubscribe via the World Wide Web, visit https://lists.nongnu.org/mailman/listinfo/qemu-devel or, via email, send a message with subject or body 'help' to qemu-devel-requ...@nongnu.org You can reach the person managing the list at qemu-devel-ow...@nongnu.org When replying, please edit your Subject line so it is more specific than Re: Contents of Qemu-devel digest... Today's Topics: 1. Re: [RFC PATCH 2/2] block: gluster as block backend (Stefan Hajnoczi) 2. Re: [RFC PATCH 0/2] GlusterFS support in QEMU - v2 (Daniel P. Berrange) 3. Re: [RFC PATCH 0/2] GlusterFS support in QEMU - v2 (Stefan Hajnoczi) 4. Re: [RFC PATCH 0/2] GlusterFS support in QEMU - v2 (ronnie sahlberg) 5. Re: [RFC PATCH 0/2] GlusterFS support in QEMU - v2 (ronnie sahlberg) 6. Re: [RFC PATCH 0/2] GlusterFS support in QEMU - v2 (Stefan Hajnoczi) -- Message: 1 Date: Mon, 23 Jul 2012 10:06:40 +0100 From: Stefan Hajnoczi stefa...@gmail.com To: bhar...@linux.vnet.ibm.com Cc: Amar Tumballi ama...@redhat.com, Anand Avati aav...@redhat.com,qemu-devel@nongnu.org, Vijay Bellur vbel...@redhat.com Subject: Re: [Qemu-devel] [RFC PATCH 2/2] block: gluster as block backend Message-ID: cajsp0qu+exbjp9e_r2gaewhszkkaj8rpajoqdcouhy5jzpt...@mail.gmail.com Content-Type: text/plain; charset=ISO-8859-1 On Mon, Jul 23, 2012 at 9:32 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: On Sun, Jul 22, 2012 at 04:38:00PM +0100, Stefan Hajnoczi wrote: On Sat, Jul 21, 2012 at 9:31 AM, Bharata B Rao bhar...@linux.vnet.ibm.com wrote: +} GlusterAIOCB; + +typedef struct GlusterCBKData { +GlusterAIOCB *acb; +struct BDRVGlusterState *s; +int64_t size; +int ret; +} GlusterCBKData; I think GlusterCBKData could just be part of GlusterAIOCB. That would simplify the code a little and avoid some malloc/free. Are you suggesting to put a field GlusterCBKData gcbk; inside GlusterAIOCB and use gcbk from there or Are you suggesting that I make the fields of GlusterCBKData part of GlusterAIOCB and get rid of GlusterCBKData altogether ? This means I would have to pass the GlusterAIOCB to gluster async calls and update its fields from gluster callback routine. I can do this, but I am not sure if you can touch the fields of GlusterAIOCB in non-QEMU threads (gluster callback thread). The fields in GlusterCBKData could become part of GlusterAIOCB. Different threads can access fields in a struct, they just need to ensure access is synchronized if they touch the same fields. In the case of this code I think there is nothing that requires synchronization beyond the pipe mechanism that you already use to complete processing in a QEMU thread. When the argument is too long we should probably report an error instead of truncating. Or should we let gluster APIs to flag an error with truncated server and volume names ? What if the truncated name is a valid but different object? For example: Max chars = 5 Objects: helloworld hello If helloworld is truncated to hello we get no error back because it's a valid object! We need to either check sizes explicitly without truncating or use a g_strdup() approach without any size limits and let the gfapi functions error out if the input string is too long. +static struct glfs *qemu_gluster_init(GlusterConf *c, const char *filename) +{ +struct glfs *glfs = NULL; +int ret; + +ret = qemu_gluster_parsename(c, filename); +if (ret 0) { +errno = -ret; +goto out; +} + +glfs = glfs_new(c-volname); +if (!glfs) { +goto out; +} + +ret = glfs_set_volfile_server(glfs, socket, c-server, c-port); +if (ret 0) { +goto out; +} + +/* + * TODO: Logging is not necessary but instead nice to have. + * Can QEMU optionally log into a standard place ? QEMU prints to stderr, can you do that here too? The global log file is not okay, especially when multiple QEMU instances are running. Ok, I can do glfs_set_logging(glfs, /dev/stderr, loglevel); Yes. I think - is best since it is supported by gfapi (libglusterfs/src/logging.c:gf_log_init). /dev/stderr is not POSIX. + * Need to use defines like gf_loglevel_t:GF_LOG_INFO instead of + * hard coded values like 7 here. + */ +ret = glfs_set_logging(glfs, /tmp/qemu-gluster.log, 7); +if (ret 0) { +goto out; +} + +ret = glfs_init(glfs); +if (ret 0) { +goto out; +} +return glfs; +
Re: [Qemu-devel] Qemu-devel Digest, Vol 112, Issue 523
On Jul 23, 2012 6:47 PM, qemu-devel-requ...@nongnu.org wrote: Send Qemu-devel mailing list submissions to qemu-devel@nongnu.org To subscribe or unsubscribe via the World Wide Web, visit https://lists.nongnu.org/mailman/listinfo/qemu-devel or, via email, send a message with subject or body 'help' to qemu-devel-requ...@nongnu.org You can reach the person managing the list at qemu-devel-ow...@nongnu.org When replying, please edit your Subject line so it is more specific than Re: Contents of Qemu-devel digest... Today's Topics: 1. Re: [RFC PATCH 0/2] GlusterFS support in QEMU - v2 (Vijay Bellur) 2. Re: [PATCH 3/3] migration: block migration when streaming block jobs are running. (Stefan Hajnoczi) 3. Re: [PATCH 1/3] block: Add bdrv_have_block_jobs() so migration code abort if needed. (Paolo Bonzini) 4. Re: [PATCH] ide scsi: Mess with geometry only for harddisk devices (Paolo Bonzini) 5. Re: [PATCH 3/3] migration: block migration when streaming block jobs are running. (Beno?t Canet) 6. Re: [PATCH 1/3] block: Add bdrv_have_block_jobs() so migration code abort if needed. (Beno?t Canet) 7. Re: [PATCH 3/3] migration: block migration when streaming block jobs are running. (Stefan Hajnoczi) 8. [PATCH] pc: Fix max_cpus (riegama...@gmail.com) -- Message: 1 Date: Mon, 23 Jul 2012 15:06:06 +0530 From: Vijay Bellur vbel...@redhat.com To: bhar...@linux.vnet.ibm.com Cc: Stefan Hajnoczi stefa...@gmail.com, Anand Avati aav...@redhat.com,qemu-devel@nongnu.org, Amar Tumballi ama...@redhat.com Subject: Re: [Qemu-devel] [RFC PATCH 0/2] GlusterFS support in QEMU - v2 Message-ID: 500d1b06.3020...@redhat.com Content-Type: text/plain; charset=ISO-8859-1; format=flowed On 07/23/2012 02:20 PM, Bharata B Rao wrote: 2. It is currently not possible to bypass RPC because the glusterd managed configuration file doesn't support that. It is possible. Gluster already supports custom extensions to volume names and it is possible to use the required volfile by specifying this custom volname extension. For eg, if I have a volume named test, by default the volfile used for it will be test-fuse.vol. Currently I can put my own custom volfile into the standard location and get glusterd pick that up. I can specify test.rpcbypass as volname and glusterd will pick test.rpcbypass.vol. What is currently not supported is the ability to create test.rpcbypass.vol from gluster CLI. I believe that gluster developers are ok with enhancing gluster CLI to support generating/regenerating volfiles for a given volume with custom translator set. Yes, this would be the preferred approach. We can tune the volume file generation to evolve the desired configuration file. I'm not sure if these statements are true? Would you support local volfiles in the future again? Why force users to run glusterd? I will let gluster folks on CC to answer this and let us know the benefits of always depending on glusterd. I guess running glusterd would be beneficial when supporting migration. QEMU working from a local volume (with volname=test.rpcbypass) can be easily restarted on a different node by just changing volname to test. glusterd will take care of fetching the right volfile automatically for us. Yes, running glusterd would be beneficial in migration. Without deriving the file from glusterd features like volume tuning, client monitoring etc. would not be available to to clients that talk to a gluster volume. Additionally, driving configuration generation and management through glusterd helps in standardizing and stabilizing gluster configurations. Has libgfapi been released yet? Its part of gluster mainline now. Does it have versioning which will allow the QEMU GlusterFS block driver to build against different versions? I'm just wondering how the pieces will fit together once distros start shipping them. I request gluster folks on CC to comment about version and shipping information. There is no release that contains libgfapi as yet. Once that is done, we can probably have the dependency specified in QEMU as well. Regards, Vijay -- Message: 2 Date: Mon, 23 Jul 2012 10:55:41 +0100 From: Stefan Hajnoczi stefa...@gmail.com To: benoit.ca...@gmail.com Cc: Kevin Wolf kw...@redhat.com, Beno?t Canet ben...@irqsave.net, qemu-devel@nongnu.org, stefa...@linux.vnet.ibm.com Subject: Re: [Qemu-devel] [PATCH 3/3] migration: block migration when streaming block jobs are running. Message-ID: cajsp0qx6krazmkxmfvwhxjdizg6x5uvhqzgshvfqk+tsrxe...@mail.gmail.com Content-Type: text/plain; charset=ISO-8859-1 On Fri, Jul 20, 2012 at 8:32 PM, benoit.ca...@gmail.com wrote: From: Beno?t
Re: [Qemu-devel] [PATCH v2 4/4] ARM: exynos4210_pmu: Add software reset support
20.07.2012 18:32, Peter Maydell пишет: On 12 July 2012 17:54, Maksim Kozlovm.koz...@samsung.com wrote: Signed-off-by: Maksim Kozlovm.koz...@samsung.com --- hw/exynos4210_pmu.c | 40 +--- 1 files changed, 33 insertions(+), 7 deletions(-) diff --git a/hw/exynos4210_pmu.c b/hw/exynos4210_pmu.c index 7f09c79..96588d9 100644 --- a/hw/exynos4210_pmu.c +++ b/hw/exynos4210_pmu.c @@ -18,13 +18,8 @@ * with this program; if not, seehttp://www.gnu.org/licenses/. */ -/* - * This model implements PMU registers just as a bulk of memory. Currently, - * the only reason this device exists is that secondary CPU boot loader - * uses PMU INFORM5 register as a holding pen. - */ - #include sysbus.h +#include sysemu.h #ifndef DEBUG_PMU #define DEBUG_PMU 0 @@ -230,6 +225,8 @@ #define EXYNOS4210_PMU_REGS_MEM_SIZE 0x3d0c +#define SWRESET_SYSTEM_MASK 0x0001 + typedef struct Exynos4210PmuReg { const char *name; /* for debug only */ uint32_t offset; @@ -458,7 +455,17 @@ static void exynos4210_pmu_write(void *opaque, target_phys_addr_t offset, PRINT_DEBUG_EXTEND(%s [0x%04x]- 0x%04x\n, exynos4210_pmu_regs[index].name, (uint32_t)offset, (uint32_t)val); -s-reg[index] = val; +switch (offset) { +case SWRESET: +if (val SWRESET_SYSTEM_MASK) { +s-reg[index] = val; +qemu_system_reset_request(); +} +break; +default: +s-reg[index] = val; +break; +} } static const MemoryRegionOps exynos4210_pmu_ops = { @@ -477,9 +484,28 @@ static void exynos4210_pmu_reset(DeviceState *dev) Exynos4210PmuState *s = container_of(dev, Exynos4210PmuState, busdev.qdev); unsigned i; +uint32_t index = exynos4210_pmu_get_register_index(s, SWRESET); +uint32_t swreset = s-reg[index]; /* Set default values for registers */ for (i = 0; i PMU_NUM_OF_REGISTERS; i++) { +if (swreset) { +switch (exynos4210_pmu_regs[i].offset) { +case INFORM0: +case INFORM1: +case INFORM2: +case INFORM3: +case INFORM4: +case INFORM5: +case INFORM6: +case INFORM7: +case PS_HOLD_CONTROL: +/* keep these registers during SW reset */ +continue; +default: +break; +} +} s-reg[i] = exynos4210_pmu_regs[i].reset_value; } } This patch seems to be trying to make a distinction that QEMU doesn't support, ie between system wide software reset and system wide hard reset. I'm not convinced about the wisdom of trying to paper over this lack with a single-device workaround. Peter, if we add callback (*swreset)() into DeviceClass, whether it will be an acceptable solution? And devices which have different behavior during swreset can register different function for that. What do you think about this? -- PMM -- Best regards, Maksim Kozlov Leading Engineer Advanced Software Group, Samsung Moscow Research Center e-mail: m.koz...@samsung.com Tel: 7(495) 797-25-00 ext. 3949
Re: [Qemu-devel] [PATCH v2] MP initialization protocol differs between cpu families, and for P6 and onward models it is up to CPU to decide if it will be BSP using this protocol, so try to model this.
Am 12.07.2012 15:22, schrieb Igor Mammedov: This patch: - moves decision to designate BSP from board into cpu, making cpu self-sufficient in this regard. Later it will allow to cleanup hw/pc.c and remove cpu_reset and wrappers from there. - stores flag that CPU is BSP in IA32_APIC_BASE to model behavior described in Inted SDM vol 3a part 1 chapter 8.4.1 - uses MSR_IA32_APICBASE_BSP flag in apic_base for checking if cpu is BSP patch is based on Jan Kiszka's proposal: http://thread.gmane.org/gmane.comp.emulators.qemu/100806 v2: - fix build for i386-linux-user spotted-by: Peter Maydell peter.mayd...@linaro.org v3: - style change requested by Andreas Färber afaer...@suse.de v4: - reuse cpu_is_bsp() rather than open code check if apicbase has BSP bit set requested by Gleb Natapov g...@redhat.com - hijacked Andreas' patch [1] to use X86CPU instead of CPUX86State in cpu_is_bsp() 1) http://lists.gnu.org/archive/html/qemu-devel/2012-05/msg03185.html Signed-off-by: Igor Mammedov imamm...@redhat.com --- hw/apic.h|5 - hw/apic_common.c | 16 +--- hw/pc.c |9 - target-i386/cpu.c| 18 ++ target-i386/helper.c |1 - target-i386/kvm.c|4 +++- 6 files changed, 38 insertions(+), 15 deletions(-) diff --git a/hw/apic.h b/hw/apic.h index 62179ce..4da10b6 100644 --- a/hw/apic.h +++ b/hw/apic.h @@ -20,9 +20,12 @@ void apic_init_reset(DeviceState *s); void apic_sipi(DeviceState *s); void apic_handle_tpr_access_report(DeviceState *d, target_ulong ip, TPRAccess access); +void apic_designate_bsp(DeviceState *d); /* pc.c */ -int cpu_is_bsp(CPUX86State *env); DeviceState *cpu_get_current_apic(void); +/* cpu.c */ +bool cpu_is_bsp(X86CPU *cpu); + #endif diff --git a/hw/apic_common.c b/hw/apic_common.c index 60b8259..58e63b0 100644 --- a/hw/apic_common.c +++ b/hw/apic_common.c @@ -43,8 +43,8 @@ uint64_t cpu_get_apic_base(DeviceState *d) trace_cpu_get_apic_base((uint64_t)s-apicbase); return s-apicbase; } else { -trace_cpu_get_apic_base(0); -return 0; +trace_cpu_get_apic_base(MSR_IA32_APICBASE_BSP); +return MSR_IA32_APICBASE_BSP; } } @@ -201,13 +201,23 @@ void apic_init_reset(DeviceState *d) s-timer_expiry = -1; } +void apic_designate_bsp(DeviceState *d) +{ +if (d == NULL) { +return; +} + +APICCommonState *s = APIC_COMMON(d); +s-apicbase |= MSR_IA32_APICBASE_BSP; +} + static void apic_reset_common(DeviceState *d) { APICCommonState *s = DO_UPCAST(APICCommonState, busdev.qdev, d); APICCommonClass *info = APIC_COMMON_GET_CLASS(s); bool bsp; -bsp = cpu_is_bsp(s-cpu_env); +bsp = cpu_is_bsp(x86_env_get_cpu(s-cpu_env)); s-apicbase = 0xfee0 | (bsp ? MSR_IA32_APICBASE_BSP : 0) | MSR_IA32_APICBASE_ENABLE; diff --git a/hw/pc.c b/hw/pc.c index c7e9ab3..50c1715 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -871,12 +871,6 @@ void pc_init_ne2k_isa(ISABus *bus, NICInfo *nd) nb_ne2k++; } -int cpu_is_bsp(CPUX86State *env) -{ -/* We hard-wire the BSP to the first CPU. */ -return env-cpu_index == 0; -} - DeviceState *cpu_get_current_apic(void) { if (cpu_single_env) { @@ -927,10 +921,7 @@ void pc_acpi_smi_interrupt(void *opaque, int irq, int level) static void pc_cpu_reset(void *opaque) { X86CPU *cpu = opaque; -CPUX86State *env = cpu-env; - cpu_reset(CPU(cpu)); -env-halted = !cpu_is_bsp(env); } static X86CPU *pc_new_cpu(const char *cpu_model) diff --git a/target-i386/cpu.c b/target-i386/cpu.c index 5521709..0c38b7f 100644 --- a/target-i386/cpu.c +++ b/target-i386/cpu.c @@ -1686,6 +1686,24 @@ static void x86_cpu_reset(CPUState *s) env-dr[7] = DR7_FIXED_1; cpu_breakpoint_remove_all(env, BP_CPU); cpu_watchpoint_remove_all(env, BP_CPU); + +#if !defined(CONFIG_USER_ONLY) +/* We hard-wire the BSP to the first CPU. */ +if (env-cpu_index == 0) { +apic_designate_bsp(env-apic_state); +} + +env-halted = !cpu_is_bsp(cpu); +#endif +} + +#ifndef CONFIG_USER_ONLY +bool cpu_is_bsp(X86CPU *cpu) +{ +return cpu_get_apic_base(cpu-env.apic_state) MSR_IA32_APICBASE_BSP; +} +#endif + } I'm okay with this approach too, but I think the above brace is a merge conflict? Did you git-grep for cpu_is_bsp to be sure you caught all usages? Andreas static void mce_init(X86CPU *cpu) diff --git a/target-i386/helper.c b/target-i386/helper.c index d3af6ea..b748d90 100644 --- a/target-i386/helper.c +++ b/target-i386/helper.c @@ -1191,7 +1191,6 @@ void do_cpu_init(X86CPU *cpu) env-interrupt_request = sipi; env-pat = pat; apic_init_reset(env-apic_state); -env-halted = !cpu_is_bsp(env); } void
Re: [Qemu-devel] [PATCH 2/6] s390: sclp base support
Andreas, thanks for having a look. I no other comments arrive today I will resubmit the patch set with all comments addressed. On 20/07/12 16:06, Andreas Färber wrote: +#ifdef DEBUG_HELPER +printf(KVM: invalid sclp call 0x%x / 0x% PRIx64 x\n, sccb, code); sccb is a pointer so %x seems wrong for 64-bit host. %p? yes its broken. We will probably just remove that, since it will be changed by followon patch. +typedef struct S390SCLPDevice { +DeviceState qdev; I note that this is probably the first device directly derived from DeviceState. This should be possible now after having merged the QOM QBus series. Test case to check is info qdm from the monitor. info qdm seems to work. what test did you have in mind? [...] //#define DEBUG_S390 @@ -61,6 +62,7 @@ #define MAX_BLK_DEVS10 static VirtIOS390Bus *s390_bus; +SCLPS390Bus *sclp_bus; I don't like this so much. We shouldn't be following that example and adding global state for each bus there. Can't we add a child property to the machine for the hosting device so that we can obtain that via QOM path and access the bus from there? Since Blue already compained Heinz already moved that bus into s390-sclp.c as a static variable without any export since we only touch that there. We also added an comment that by architecture definition there is only one sclp per machine, so this should be fine. --- a/hw/s390x/Makefile.objs +++ b/hw/s390x/Makefile.objs @@ -1,3 +1,4 @@ obj-y = s390-virtio-bus.o s390-virtio.o +obj-y += s390-sclp.o obj-y := $(addprefix ../,$(obj-y)) IIUC we have a dependency on the CPU inside the device code. If so, please move the s390-sclp.c file into hw/s390x/ and move the obj-y += line below the addprefix line. Ok. diff --git a/target-s390x/cpu.c b/target-s390x/cpu.c index 619b202..8900872 100644 --- a/target-s390x/cpu.c +++ b/target-s390x/cpu.c @@ -21,9 +21,26 @@ */ #include cpu.h +#include kvm.h #include qemu-common.h #include qemu-timer.h +/* service interrupts are floating therefore we must not pass an cpustate */ +void s390_sclp_extint(uint32_t parm) +{ +S390CPU *dummy_cpu = s390_cpu_addr2state(0); +CPUS390XState *env = dummy_cpu-env; + +if (kvm_enabled()) { +#ifdef CONFIG_KVM +kvm_s390_interrupt_internal(env, KVM_S390_INT_SERVICE, parm, 0, 1); +#endif +} else { +env-psw.addr += 4; +cpu_inject_ext(env, EXT_SERVICE, parm, 0); +} +} Not so happy about this placement, it is not called s390_cpu_, the prototype is in cpu.h not cpu-qom.h and it operates only indirectly on the CPU. Is there another place for it? Not yet. I will try to introduce a new one, e.g. target-s390x/interrupt.c We can then move other code there as well.
Re: [Qemu-devel] [PATCH v2 4/4] ARM: exynos4210_pmu: Add software reset support
On 23 July 2012 12:01, Maksim Kozlov m.koz...@samsung.com wrote: 20.07.2012 18:32, Peter Maydell пишет: This patch seems to be trying to make a distinction that QEMU doesn't support, ie between system wide software reset and system wide hard reset. I'm not convinced about the wisdom of trying to paper over this lack with a single-device workaround. Peter, if we add callback (*swreset)() into DeviceClass, whether it will be an acceptable solution? And devices which have different behavior during swreset can register different function for that. What do you think about this? I think reset is a key part of device lifecycle and adding support for 'warm reset' or whatever needs general discussion so we can get the semantics right... cc'ing a couple of people who might have opinions. -- PMM
Re: [Qemu-devel] [Bug 992067] Re: Windows 2008R2 very slow cold boot when 4GB memory
We have been experiencing this problem for a while now too, using qemu-kvm (currently at 1.1.1). Unfortunately, hv_relaxed doesn't seem to fix it. The following command line produces the issue: qemu-kvm -nodefaults -m 4096 -smp 8 -cpu host,hv_relaxed -vga cirrus -usbdevice tablet -vnc :99 -monitor stdio -hda test.img The hardware consists of dual AMD Opteron 6128 processors (16 cores in total) and 64GB of memory. This command line was tested on kernel 3.1.4. I've also tested with -no-hpet. What I have seen is much as described: the memory fills out slowly, and top on the host will show the process using 100% on all allocated CPU cores. The most extreme case was a machine which took something between 6 and 8 hours to boot. This seems to be related to the assigned memory, as described, but also the number of processor cores (which makes sense if we believe it's a timing issue?). I have seen slow-booting guests improved by switching down to a single or even two cores. Matthew, I agree that this seems to be linked to the number of VMs running - in fact, shutting down other VMs on a dedicated test host caused the machine to start booting at a normal speed (with no reboot required). However, the level of contention is never such that this could be explained by the host simply being overcommitted. If it helps anyone, there's an image of the hard drive I've been using to test at: http://46.20.114.253/ It's 5G of gzip file containing a fairly standard Windows 2008 trial installation. Since it's in the trial period, anyone who wants to use it may have to re-arm the trial: http://support.microsoft.com/kb/948472 Please let me know if I can provide any more information, or test anything. Best wishes, Owen Tuz
[Qemu-devel] [PATCH V2 0/3] Block migration when streaming block jobs are running
From: Benoît Canet ben...@irqsave.net This patchset is designed to avoid starting a live migration while one or more streaming block jobs are running. Tested with the following sequence: QEMU 1.1.50 monitor - type 'help' for more information (qemu) block_stream virtio0 1k (qemu) migrate tcp:localhost: migrate: Streaming blocks migration (qemu) migrate tcp:localhost: migrate: Streaming blocks migration (qemu) block_job_cancel virtio0 (qemu) migrate tcp:localhost: migrate: Connection can not be completed immediately (qemu) = migration then succeed in v2: stefanha: Rename bdrv_have_block_jobs() to bdrv_are_busy() and make it return -EBUSY. paolo: remove spurious bdrv_close() Benoît Canet (3): block: Add bdrv_are_busy() so migration code abort if needed. qerror: Add error telling that streaming blocks migration migration: block migration when streaming block jobs are running. block.c | 13 + block.h |2 ++ migration.c |5 + qerror.c|4 qerror.h|3 +++ 5 files changed, 27 insertions(+) -- 1.7.9.5
[Qemu-devel] [PATCH V2 2/3] qerror: Add error telling that streaming blocks migration
From: Benoît Canet ben...@irqsave.net Signed-off-by: Benoit Canet ben...@irqsave.net --- qerror.c |4 qerror.h |3 +++ 2 files changed, 7 insertions(+) diff --git a/qerror.c b/qerror.c index 92c4eff..bcd74b7 100644 --- a/qerror.c +++ b/qerror.c @@ -283,6 +283,10 @@ static const QErrorStringTable qerror_table[] = { .desc = Could not set password, }, { +.error_fmt = QERR_STREAMING_BLOCKS_MIGRATION, +.desc = Streaming blocks migration, +}, +{ .error_fmt = QERR_TOO_MANY_FILES, .desc = Too many open files, }, diff --git a/qerror.h b/qerror.h index b4c8758..95d9e8d 100644 --- a/qerror.h +++ b/qerror.h @@ -233,6 +233,9 @@ QError *qobject_to_qerror(const QObject *obj); #define QERR_SET_PASSWD_FAILED \ { 'class': 'SetPasswdFailed', 'data': {} } +#define QERR_STREAMING_BLOCKS_MIGRATION \ +{ 'class': 'StreamingFeatureBlocksMigration', 'data': {} } + #define QERR_TOO_MANY_FILES \ { 'class': 'TooManyFiles', 'data': {} } -- 1.7.9.5
[Qemu-devel] [PATCH V2 1/3] block: Add bdrv_are_busy()
From: Benoît Canet ben...@irqsave.net bdrv_are_busy will be used to check if any of the bs are in use or if one of them have a running block job. The first user will be qmp_migrate(). Signed-off-by: Benoit Canet ben...@irqsave.net --- block.c | 13 + block.h |2 ++ 2 files changed, 15 insertions(+) diff --git a/block.c b/block.c index ce7eb8f..bc8f160 100644 --- a/block.c +++ b/block.c @@ -4027,6 +4027,19 @@ out: return ret; } +int bdrv_are_busy(void) +{ +BlockDriverState *bs; + +QTAILQ_FOREACH(bs, bdrv_states, list) { +if (bs-job || bdrv_in_use(bs)) { +return -EBUSY; +} +} + +return 0; +} + void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs, int64_t speed, BlockDriverCompletionFunc *cb, void *opaque, Error **errp) diff --git a/block.h b/block.h index c89590d..0a3de2f 100644 --- a/block.h +++ b/block.h @@ -337,6 +337,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs); void bdrv_set_in_use(BlockDriverState *bs, int in_use); int bdrv_in_use(BlockDriverState *bs); +int bdrv_are_busy(void); + enum BlockAcctType { BDRV_ACCT_READ, BDRV_ACCT_WRITE, -- 1.7.9.5
[Qemu-devel] [PATCH V2 3/3] migration: block migration when streaming block jobs are running.
From: Benoît Canet ben...@irqsave.net Signed-off-by: Benoit Canet ben...@irqsave.net --- migration.c |5 + 1 file changed, 5 insertions(+) diff --git a/migration.c b/migration.c index 8db1b43..5196d7e 100644 --- a/migration.c +++ b/migration.c @@ -425,6 +425,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (bdrv_are_busy()) { +error_set(errp, QERR_STREAMING_BLOCKS_MIGRATION); +return; +} + s = migrate_init(params); if (strstart(uri, tcp:, p)) { -- 1.7.9.5
Re: [Qemu-devel] [PATCH V2 2/3] qerror: Add error telling that streaming blocks migration
On 23 July 2012 12:23, benoit.ca...@gmail.com wrote: --- a/qerror.c +++ b/qerror.c @@ -283,6 +283,10 @@ static const QErrorStringTable qerror_table[] = { .desc = Could not set password, }, { +.error_fmt = QERR_STREAMING_BLOCKS_MIGRATION, +.desc = Streaming blocks migration, +}, An error should be a description of something that went wrong (this isn't supported, I couldn't do X, couldn't find Y, that isn't valid on devices of type Z). This text isn't describing anything that's gone wrong in any comprehensible way. In particular I just parsed Streaming blocks migration as (streaming blocks) migration ie migration of streaming blocks. If you meant Migration was blocked by streaming or something similar you need to rephrase. -- PMM
[Qemu-devel] [PATCH] check for available room when formatting OpenFirmware device path
Signed-off-by: Laszlo Ersek ler...@redhat.com --- hw/qdev.c | 14 +- vl.c |7 ++- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/hw/qdev.c b/hw/qdev.c index af54467..f1e83a4 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -502,6 +502,10 @@ static int qdev_get_fw_dev_path_helper(DeviceState *dev, char *p, int size) if (dev dev-parent_bus) { char *d; l = qdev_get_fw_dev_path_helper(dev-parent_bus-parent, p, size); +if (l = size) { +return l; +} + d = bus_get_fw_dev_path(dev-parent_bus, dev); if (d) { l += snprintf(p + l, size - l, %s, d); @@ -509,6 +513,10 @@ static int qdev_get_fw_dev_path_helper(DeviceState *dev, char *p, int size) } else { l += snprintf(p + l, size - l, %s, object_get_typename(OBJECT(dev))); } + +if (l = size) { +return l; +} } l += snprintf(p + l , size - l, /); @@ -520,8 +528,12 @@ char* qdev_get_fw_dev_path(DeviceState *dev) char path[128]; int l; -l = qdev_get_fw_dev_path_helper(dev, path, 128); +l = qdev_get_fw_dev_path_helper(dev, path, sizeof(path)); +assert(l 0); +if (l = sizeof(path)) { +return NULL; +} path[l-1] = '\0'; return strdup(path); diff --git a/vl.c b/vl.c index 8904db1..78dcc93 100644 --- a/vl.c +++ b/vl.c @@ -913,7 +913,12 @@ char *get_boot_devices_list(uint32_t *size) if (i-dev) { devpath = qdev_get_fw_dev_path(i-dev); -assert(devpath); +if (devpath == NULL) { +fprintf(stderr, +OpenFirmware Device Path too long (boot index %d)\n, +i-bootindex); +exit(1); +} } if (i-suffix devpath) { -- 1.7.1
Re: [Qemu-devel] [seabios PATCH 1/2] acpi: set I/O APIC ID to 0 by default
On Fri, Jul 20, 2012 at 01:22:43PM -0300, Eduardo Habkost wrote: On Fri, Jul 20, 2012 at 12:18:59AM +0300, Gleb Natapov wrote: On Thu, Jul 19, 2012 at 05:52:41PM -0300, Eduardo Habkost wrote: When resetting an I/O APIC, its ID is set to 0, so set it to 0 on the MADT table too. Actually BIOS needs to configure ioapic id to a uniqe value. This does not really matter for KVM though. Where does this requirement comes from? I am guessing it matters only when the I/O APIC is directly connected to the APIC bus (according to Intel SDM, that's the case only for old Pentium and P6 CPUs)[1]. http://www.intel.com/design/chipsets/datashts/29056601.pdf says that it should be programmed to unique value. I checked 4 machines on 3 of them this was the case on one (AMD) there are 3 IOAPICs and they all overlap with APICs. Anyway, even if some hardware has this unique-ID requirement, today Seabios does not fulfill it, leaving the I/O APIC ID as 0. The patch at least makes the MADT table match reality. The currant code was written when we could only dream supporting 16 cpus and back than it did correct thing, but now it just broken. QEMU do not care about IOAPIC ID for sure so I am OK with the patch. [1] I have checked 3 different machines, and all machines I have looked have an I/O APIC ID that conflicts with an existing Local APIC ID, on the ACPI MADT table. Some iasl dumps may be found online by googling for: Subtable Type : 01 I/O Apic ID I looked at 5 or 6 matches, and almost every one have an I/O APIC ID conflicting with a Local APIC ID. 1 from 4 for me :) Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- src/acpi.c |2 +- src/config.h |2 ++ 2 files changed, 3 insertions(+), 1 deletion(-) diff --git a/src/acpi.c b/src/acpi.c index 55e4607..3f55de9 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -335,7 +335,7 @@ build_madt(void) struct madt_io_apic *io_apic = (void*)apic; io_apic-type = APIC_IO; io_apic-length = sizeof(*io_apic); -io_apic-io_apic_id = CountCPUs; +io_apic-io_apic_id = BUILD_IOAPIC_ID; io_apic-address = cpu_to_le32(BUILD_IOAPIC_ADDR); io_apic-interrupt = cpu_to_le32(0); diff --git a/src/config.h b/src/config.h index 3a70867..878c691 100644 --- a/src/config.h +++ b/src/config.h @@ -52,9 +52,11 @@ #define BUILD_PCIMEM64_END0x100ULL #define BUILD_IOAPIC_ADDR 0xfec0 +#define BUILD_IOAPIC_ID 0 #define BUILD_HPET_ADDRESS0xfed0 #define BUILD_APIC_ADDR 0xfee0 + // Important real-mode segments #define SEG_IVT 0x #define SEG_BDA 0x0040 -- 1.7.10.4 -- Gleb. -- Eduardo -- Gleb.
[Qemu-devel] [PATCH 1/2] usb: Clean common object and dependency files
Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile b/Makefile index ab82ef3..1b3882a 100644 --- a/Makefile +++ b/Makefile @@ -218,7 +218,7 @@ clean: rm -Rf .libs rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o qga/*.d rm -f qom/*.o qom/*.d - rm -f usb/*.o usb/*.d hw/*.o hw/*.d + rm -f hw/usb/*.o hw/usb/*.d hw/*.o hw/*.d rm -f qemu-img-cmds.h rm -f trace/*.o trace/*.d rm -f trace-dtrace.dtrace trace-dtrace.dtrace-timestamp -- 1.7.3.4
[Qemu-devel] [PATCH 2/2] qom: Clean libuser object and dependency files
Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Makefile |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/Makefile b/Makefile index 1b3882a..964e592 100644 --- a/Makefile +++ b/Makefile @@ -217,7 +217,7 @@ clean: rm -f *.o *.d *.a *.lo $(TOOLS) $(HELPERS-y) qemu-ga TAGS cscope.* *.pod *~ */*~ rm -Rf .libs rm -f slirp/*.o slirp/*.d audio/*.o audio/*.d block/*.o block/*.d net/*.o net/*.d fsdev/*.o fsdev/*.d ui/*.o ui/*.d qapi/*.o qapi/*.d qga/*.o qga/*.d - rm -f qom/*.o qom/*.d + rm -f qom/*.o qom/*.d libuser/qom/*.o libuser/qom/*.d rm -f hw/usb/*.o hw/usb/*.d hw/*.o hw/*.d rm -f qemu-img-cmds.h rm -f trace/*.o trace/*.d -- 1.7.3.4
[Qemu-devel] [PATCH 03/19] qapi: add test case for deallocating traversal of incomplete structure
From: Laszlo Ersek ler...@redhat.com v3: - new patch Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- tests/test-qmp-commands.c | 42 ++ 1 file changed, 42 insertions(+) diff --git a/tests/test-qmp-commands.c b/tests/test-qmp-commands.c index 60cbf01..dc3c507 100644 --- a/tests/test-qmp-commands.c +++ b/tests/test-qmp-commands.c @@ -3,6 +3,9 @@ #include test-qmp-commands.h #include qapi/qmp-core.h #include module.h +#include qapi/qmp-input-visitor.h +#include tests/test-qapi-types.h +#include tests/test-qapi-visit.h void qmp_user_def_cmd(Error **errp) { @@ -123,6 +126,44 @@ static void test_dealloc_types(void) qapi_free_UserDefOneList(ud1list); } +/* test generated deallocation on an object whose construction was prematurely + * terminated due to an error */ +static void test_dealloc_partial(void) +{ +static const char text[] = don't leak me; + +UserDefTwo *ud2 = NULL; +Error *err = NULL; + +/* create partial object */ +{ +QDict *ud2_dict; +QmpInputVisitor *qiv; + +ud2_dict = qdict_new(); +qdict_put_obj(ud2_dict, string, QOBJECT(qstring_from_str(text))); + +qiv = qmp_input_visitor_new(QOBJECT(ud2_dict)); +visit_type_UserDefTwo(qmp_input_get_visitor(qiv), ud2, NULL, err); +qmp_input_visitor_cleanup(qiv); +QDECREF(ud2_dict); +} + +/* verify partial success */ +assert(ud2 != NULL); +assert(ud2-string != NULL); +assert(strcmp(ud2-string, text) == 0); +assert(ud2-dict.dict.userdef == NULL); + +/* confirm release construction error */ +assert(err != NULL); +error_free(err); + +/* tear down partial object */ +qapi_free_UserDefTwo(ud2); +} + + int main(int argc, char **argv) { g_test_init(argc, argv, NULL); @@ -131,6 +172,7 @@ int main(int argc, char **argv) g_test_add_func(/0.15/dispatch_cmd_error, test_dispatch_cmd_error); g_test_add_func(/0.15/dispatch_cmd_io, test_dispatch_cmd_io); g_test_add_func(/0.15/dealloc_types, test_dealloc_types); +g_test_add_func(/0.15/dealloc_partial, test_dealloc_partial); module_call_init(MODULE_INIT_QAPI); g_test_run(); -- 1.7.10.4
[Qemu-devel] [PATCH 04/19] qapi: generate C types for fixed-width integers
From: Laszlo Ersek ler...@redhat.com (Long line folded using parens: http://www.python.org/dev/peps/pep-0008/#maximum-line-length.) Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- scripts/qapi.py |4 1 file changed, 4 insertions(+) diff --git a/scripts/qapi.py b/scripts/qapi.py index e062336..1292476 100644 --- a/scripts/qapi.py +++ b/scripts/qapi.py @@ -159,6 +159,10 @@ def c_type(name): return 'char *' elif name == 'int': return 'int64_t' +elif (name == 'int8' or name == 'int16' or name == 'int32' or + name == 'int64' or name == 'uint8' or name == 'uint16' or + name == 'uint32' or name == 'uint64'): +return name + '_t' elif name == 'bool': return 'bool' elif name == 'number': -- 1.7.10.4
[Qemu-devel] [PATCH 12/19] convert net_init_nic() to NetClientOptions
From: Laszlo Ersek ler...@redhat.com v1-v2: - NetLegacyNicOptions::vectors is of type uint32 Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net.c | 39 ++- 1 file changed, 22 insertions(+), 17 deletions(-) diff --git a/net.c b/net.c index af544b2..a62e902 100644 --- a/net.c +++ b/net.c @@ -748,12 +748,15 @@ int net_handle_fd_param(Monitor *mon, const char *param) return fd; } -static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts, +static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { int idx; NICInfo *nd; -const char *netdev; +const NetLegacyNicOptions *nic; + +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_NIC); +nic = opts-nic; idx = nic_get_free_idx(); if (idx == -1 || nb_nics = MAX_NICS) { @@ -765,10 +768,10 @@ static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts, memset(nd, 0, sizeof(*nd)); -if ((netdev = qemu_opt_get(opts, netdev))) { -nd-netdev = qemu_find_netdev(netdev); +if (nic-has_netdev) { +nd-netdev = qemu_find_netdev(nic-netdev); if (!nd-netdev) { -error_report(netdev '%s' not found, netdev); +error_report(netdev '%s' not found, nic-netdev); return -1; } } else { @@ -778,26 +781,28 @@ static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts, if (name) { nd-name = g_strdup(name); } -if (qemu_opt_get(opts, model)) { -nd-model = g_strdup(qemu_opt_get(opts, model)); +if (nic-has_model) { +nd-model = g_strdup(nic-model); } -if (qemu_opt_get(opts, addr)) { -nd-devaddr = g_strdup(qemu_opt_get(opts, addr)); +if (nic-has_addr) { +nd-devaddr = g_strdup(nic-addr); } -if (qemu_opt_get(opts, macaddr) -net_parse_macaddr(nd-macaddr.a, qemu_opt_get(opts, macaddr)) 0) { +if (nic-has_macaddr +net_parse_macaddr(nd-macaddr.a, nic-macaddr) 0) { error_report(invalid syntax for ethernet address); return -1; } qemu_macaddr_default_if_unset(nd-macaddr); -nd-nvectors = qemu_opt_get_number(opts, vectors, - DEV_NVECTORS_UNSPECIFIED); -if (nd-nvectors != DEV_NVECTORS_UNSPECIFIED -(nd-nvectors 0 || nd-nvectors 0x7ff)) { -error_report(invalid # of vectors: %d, nd-nvectors); -return -1; +if (nic-has_vectors) { +if (nic-vectors 0x7ff) { +error_report(invalid # of vectors: %PRIu32, nic-vectors); +return -1; +} +nd-nvectors = nic-vectors; +} else { +nd-nvectors = DEV_NVECTORS_UNSPECIFIED; } nd-used = 1; -- 1.7.10.4
[Qemu-devel] [PATCH 09/19] qapi schema: add Netdev types
From: Laszlo Ersek ler...@redhat.com NetdevTapOptions::sndbuf and NetdevDumpOptions::len use the new size type. v1-v2: - NetLegacy::name is optional - NetLegacyNicOptions::vectors is of type uint32 - NetdevVdeOptions::port and ::mode are of type uint16 - NetLegacy::vlan has type int32 v2-v3: - NetLegacy::id is allowed and takes precedence over NetLegacy::name - replace @traits with @opts in NetLegacy Netdev descriptions Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- qapi-schema.json | 278 ++ 1 file changed, 278 insertions(+) diff --git a/qapi-schema.json b/qapi-schema.json index d2f8e02..bc55ed2 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -1872,6 +1872,284 @@ { 'command': 'netdev_del', 'data': {'id': 'str'} } ## +# @NetdevNoneOptions +# +# Use it alone to have zero network devices. +# +# Since 1.2 +## +{ 'type': 'NetdevNoneOptions', + 'data': { } } + +## +# @NetLegacyNicOptions +# +# Create a new Network Interface Card. +# +# @netdev: #optional id of -netdev to connect to +# +# @macaddr: #optional MAC address +# +# @model: #optional device model (e1000, rtl8139, virtio etc.) +# +# @addr: #optional PCI device address +# +# @vectors: #optional number of MSI-x vectors, 0 to disable MSI-X +# +# Since 1.2 +## +{ 'type': 'NetLegacyNicOptions', + 'data': { +'*netdev': 'str', +'*macaddr': 'str', +'*model': 'str', +'*addr':'str', +'*vectors': 'uint32' } } + +## +# @String +# +# A fat type wrapping 'str', to be embedded in lists. +# +# Since 1.2 +## +{ 'type': 'String', + 'data': { +'str': 'str' } } + +## +# @NetdevUserOptions +# +# Use the user mode network stack which requires no administrator privilege to +# run. +# +# @hostname: #optional client hostname reported by the builtin DHCP server +# +# @restrict: #optional isolate the guest from the host +# +# @ip: #optional legacy parameter, use net= instead +# +# @net: #optional IP address and optional netmask +# +# @host: #optional guest-visible address of the host +# +# @tftp: #optional root directory of the built-in TFTP server +# +# @bootfile: #optional BOOTP filename, for use with tftp= +# +# @dhcpstart: #optional the first of the 16 IPs the built-in DHCP server can +# assign +# +# @dns: #optional guest-visible address of the virtual nameserver +# +# @smb: #optional root directory of the built-in SMB server +# +# @smbserver: #optional IP address of the built-in SMB server +# +# @hostfwd: #optional redirect incoming TCP or UDP host connections to guest +# endpoints +# +# @guestfwd: #optional forward guest TCP connections +# +# Since 1.2 +## +{ 'type': 'NetdevUserOptions', + 'data': { +'*hostname': 'str', +'*restrict': 'bool', +'*ip':'str', +'*net': 'str', +'*host': 'str', +'*tftp': 'str', +'*bootfile': 'str', +'*dhcpstart': 'str', +'*dns': 'str', +'*smb': 'str', +'*smbserver': 'str', +'*hostfwd': ['String'], +'*guestfwd': ['String'] } } + +## +# @NetdevTapOptions +# +# Connect the host TAP network interface name to the VLAN. +# +# @ifname: #optional interface name +# +# @fd: #optional file descriptor of an already opened tap +# +# @script: #optional script to initialize the interface +# +# @downscript: #optional script to shut down the interface +# +# @helper: #optional command to execute to configure bridge +# +# @sndbuf: #optional send buffer limit. Understands [TGMKkb] suffixes. +# +# @vnet_hdr: #optional enable the IFF_VNET_HDR flag on the tap interface +# +# @vhost: #optional enable vhost-net network accelerator +# +# @vhostfd: #optional file descriptor of an already opened vhost net device +# +# @vhostforce: #optional vhost on for non-MSIX virtio guests +# +# Since 1.2 +## +{ 'type': 'NetdevTapOptions', + 'data': { +'*ifname': 'str', +'*fd': 'str', +'*script': 'str', +'*downscript': 'str', +'*helper': 'str', +'*sndbuf': 'size', +'*vnet_hdr': 'bool', +'*vhost': 'bool', +'*vhostfd':'str', +'*vhostforce': 'bool' } } + +## +# @NetdevSocketOptions +# +# Connect the VLAN to a remote VLAN in another QEMU virtual machine using a TCP +# socket connection. +# +# @fd: #optional file descriptor of an already opened socket +# +# @listen: #optional port number, and optional hostname, to listen on +# +# @connect: #optional port number, and optional hostname, to connect to +# +# @mcast: #optional UDP multicast address and port number +# +# @localaddr: #optional source address and port for multicast and udp packets +# +# @udp: #optional UDP unicast address and port number +# +# Since 1.2 +## +{ 'type': 'NetdevSocketOptions', + 'data': { +'*fd':'str', +'*listen':'str', +'*connect': 'str', +'*mcast': 'str', +'*localaddr': 'str', +'*udp': 'str' } } + +## +#
[Qemu-devel] [PATCH 18/19] convert net_init_bridge() to NetClientOptions
From: Laszlo Ersek ler...@redhat.com Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net/tap.c | 23 --- 1 file changed, 12 insertions(+), 11 deletions(-) diff --git a/net/tap.c b/net/tap.c index c5563c0..d2736ea 100644 --- a/net/tap.c +++ b/net/tap.c @@ -513,21 +513,22 @@ static int net_bridge_run_helper(const char *helper, const char *bridge) return -1; } -int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_bridge(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { +const NetdevBridgeOptions *bridge; +const char *helper, *br; + TAPState *s; int fd, vnet_hdr; -if (!qemu_opt_get(opts, br)) { -qemu_opt_set(opts, br, DEFAULT_BRIDGE_INTERFACE); -} -if (!qemu_opt_get(opts, helper)) { -qemu_opt_set(opts, helper, DEFAULT_BRIDGE_HELPER); -} +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_BRIDGE); +bridge = opts-bridge; + +helper = bridge-has_helper ? bridge-helper : DEFAULT_BRIDGE_HELPER; +br = bridge-has_br ? bridge-br : DEFAULT_BRIDGE_INTERFACE; -fd = net_bridge_run_helper(qemu_opt_get(opts, helper), - qemu_opt_get(opts, br)); +fd = net_bridge_run_helper(helper, br); if (fd == -1) { return -1; } @@ -542,8 +543,8 @@ int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, return -1; } -snprintf(s-nc.info_str, sizeof(s-nc.info_str), helper=%s,br=%s, - qemu_opt_get(opts, helper), qemu_opt_get(opts, br)); +snprintf(s-nc.info_str, sizeof(s-nc.info_str), helper=%s,br=%s, helper, + br); return 0; } -- 1.7.10.4
[Qemu-devel] [PATCH 13/19] convert net_init_dump() to NetClientOptions
From: Laszlo Ersek ler...@redhat.com v1-v2: - NetdevDumpOptions::len is of type 'size', whose C type was changed to uint64_t. Adapt the printf() format specifier macro. Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net/dump.c | 21 + 1 file changed, 17 insertions(+), 4 deletions(-) diff --git a/net/dump.c b/net/dump.c index 27e9528..f3d2fa9 100644 --- a/net/dump.c +++ b/net/dump.c @@ -144,22 +144,35 @@ static int net_dump_init(VLANState *vlan, const char *device, return 0; } -int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_dump(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { int len; const char *file; char def_file[128]; +const NetdevDumpOptions *dump; + +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_DUMP); +dump = opts-dump; assert(vlan); -file = qemu_opt_get(opts, file); -if (!file) { +if (dump-has_file) { +file = dump-file; +} else { snprintf(def_file, sizeof(def_file), qemu-vlan%d.pcap, vlan-id); file = def_file; } -len = qemu_opt_get_size(opts, len, 65536); +if (dump-has_len) { +if (dump-len INT_MAX) { +error_report(invalid length: %PRIu64, dump-len); +return -1; +} +len = dump-len; +} else { +len = 65536; +} return net_dump_init(vlan, dump, name, file, len); } -- 1.7.10.4
[Qemu-devel] [PATCH 06/19] expose QemuOpt and QemuOpts struct definitions to interested parties
From: Laszlo Ersek ler...@redhat.com The only clients should be the existent qemu-option.c, and the upcoming qapi/opts-visitor.c. Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- qemu-option-internal.h | 53 qemu-option.c | 24 +- 2 files changed, 54 insertions(+), 23 deletions(-) create mode 100644 qemu-option-internal.h diff --git a/qemu-option-internal.h b/qemu-option-internal.h new file mode 100644 index 000..19fdc1c --- /dev/null +++ b/qemu-option-internal.h @@ -0,0 +1,53 @@ +/* + * Commandline option parsing functions + * + * Copyright (c) 2003-2008 Fabrice Bellard + * Copyright (c) 2009 Kevin Wolf kw...@redhat.com + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#ifndef QEMU_OPTIONS_INTERNAL_H +#define QEMU_OPTIONS_INTERNAL_H + +#include qemu-option.h + +struct QemuOpt { +const char *name; +const char *str; + +const QemuOptDesc *desc; +union { +bool boolean; +uint64_t uint; +} value; + +QemuOpts *opts; +QTAILQ_ENTRY(QemuOpt) next; +}; + +struct QemuOpts { +char *id; +QemuOptsList *list; +Location loc; +QTAILQ_HEAD(QemuOptHead, QemuOpt) head; +QTAILQ_ENTRY(QemuOpts) next; +}; + +#endif diff --git a/qemu-option.c b/qemu-option.c index bb3886c..8334190 100644 --- a/qemu-option.c +++ b/qemu-option.c @@ -29,9 +29,9 @@ #include qemu-common.h #include qemu-error.h #include qemu-objects.h -#include qemu-option.h #include error.h #include qerror.h +#include qemu-option-internal.h /* * Extracts the name of an option from the parameter string (p points at the @@ -511,28 +511,6 @@ void print_option_help(QEMUOptionParameter *list) /* -- */ -struct QemuOpt { -const char *name; -const char *str; - -const QemuOptDesc *desc; -union { -bool boolean; -uint64_t uint; -} value; - -QemuOpts *opts; -QTAILQ_ENTRY(QemuOpt) next; -}; - -struct QemuOpts { -char *id; -QemuOptsList *list; -Location loc; -QTAILQ_HEAD(QemuOptHead, QemuOpt) head; -QTAILQ_ENTRY(QemuOpts) next; -}; - static QemuOpt *qemu_opt_find(QemuOpts *opts, const char *name) { QemuOpt *opt; -- 1.7.10.4
Re: [Qemu-devel] [RFC] introduce a dynamic library to expose qemu block API
Stefan Hajnoczi writes: [...] So I tried trimming down the list of files needed to compile qemu tools, and here is a list: Easy to relicense to LGPLv2+: block/raw.c none (GPLv2+: Red Hat, IBM) error.c LGPLv2 (Red Hat, IBM, Stefan Weil) iov.c GPLv2 (Red Hat, SuSE/Hannes Reinecke, Michael Tokarev) module.cGPLv2 (Red Hat, IBM, Blue Swirl) qemu-error.cGPLv2+ (Red Hat, Blue Swirl, IBM) trace/control.c GPLv2 (Lluis Vilanova) trace/default.c GPLv2 (Lluis Vilanova) (I added some people to Cc. Lluis and Michael, can you also look at http://wiki.qemu.org/Relicensing if you're willing to relicense your past contributions from GPLv2 to GPLv2+?. Blue Swirl said he'd accept any other GPLv2 or GPLv3 compatible license, which should include LGPLv2+). I have no problems relicensing to GPLv2 or later or GPLv3 or later. What about LGPLv2+? (Note the L.) I'd prefer to keep it non-lesser. Is it absolutely necessary? PS: I suppose the + stands for or later Lluis -- And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer. -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On Sat, 21 Jul 2012 15:16:56 +0200 Jan Kiszka jan.kis...@web.de wrote: On 2012-07-21 14:57, Peter Maydell wrote: On 21 July 2012 13:35, Jan Kiszka jan.kis...@web.de wrote: On 2012-07-21 14:17, Peter Maydell wrote: You still haven't really explained why we can't just ignore irqfd for now. I don't see how it would particularly affect the design of the kernel implementation very much, and the userspace interface seems to just be an extra ioctl. I bet you ignored MSI so far. Physical IRQ lines are just a part of the whole picture. How are MSIs delivered on the systems you want to add? You're using random acronyms without defining them again. It looks as if MSI is a PCI specific term. That would seem to me to fall under the heading of routing across a board model which we can't do anyway, because you have no idea how this all wired up, it will depend on the details of the SoC and the PCI controller. For sure you can. You need to discover those wiring, cache it, and then let the source inject to the final destination directly. See the INTx routing notifier and pci_device_route_intx_to_irq from [1] for that simplistic approach we are taking on the x86/PC architecture. Once you support the backend (KVM_SET_GSI_ROUTING + KVM_IRQ_LINE), adding irqfd is in fact simple. I don't really understand where KVM_SET_GSI_ROUTING comes into this -- the documentation is a bit opaque. It says Sets the GSI routing table entries but it doesn't define what a GSI is or what we're routing to where. Googling suggests GSI is an APIC specific term so it doesn't sound like it's relevant for non-x86. As I said before: GSI needs to be read as physical or virtual IRQ line. The virtual ones can be of any source you define, irqfd is just one. What's a virtual irq line in this context? We're modelling a physical bit of hardware which has N interrupt lines, so I'm not sure what a virtual irq line would be or how it would appear to the guest... A virtual line is an input of the in-kernel IRQ router you configure via SET_GSI_ROUTING. A physical line is a potential output of it that goes into some in-kernel interrupt controller model. It can also be an interrupt message sent to a specific CPU - provided the arch supports such a delivery protocol. Of course, the current router was modeled after x86 and ia64. So I wouldn't be surprised if some ARM system configuration cannot be expressed this way. Then we need to discuss reasonable extensions. But it should provide a sound foundation at least. OK, so I was reading through this thread since I want to add irqfd support for s390, but we don't have any kind of irqchip. The understanding I got so far is that !s390 architectures have some kind of mechanism that allows them to route an interrupt between a device and a cpu, meaning that there's a fixed tie-in between a device and a cpu. If that's correct, I don't see how to model irqfds via this irqchip infrastructure for s390. Here's in a nutshell how (external and I/O) interrupts work on s390: - Interrupts have an internal prioritation, that means different types of interrupts (external, I/O, machine check, ...) take precedent over other types - External and I/O interrupts are floating, i. e. they are not tied to a specific cpu, but can be delivered to any cpu that has external resp. I/O interrupts enabled - Interrupts take payload with them that defines which device they are for So, for example, if a specific subchannel (=device) has pending status and an I/O interrupt is to be generated, this interrupt remains pending until an arbitrary cpu is enabled for I/O interrupts. If several cpus are enabled for I/O interrupts, any of them may be interrupted. When an I/O interrupt is delivered on a cpu, the cpu's lowcore contains the interrupt payload which defines the subchannel (=device) the interrupt is for. Any idea on how this architecture can be married with the irqchip concept is welcome. If all else fails, would a special irqfd concept for !irqchip be acceptable? Cornelia
[Qemu-devel] [PULL 00/14] Migration next
Hi Anthony This series include the ram_save_live() split. XBZRLE patches got dropped until we fix a new bug. Please apply. Thanks, Juan. The following changes since commit 61dc008f3529fa74a63aad1907438dad857e255a: Revert audio: Make PC speaker audio card available by default (2012-07-19 18:25:52 -0500) are available in the git repository at: http://repo.or.cz/r/qemu/quintela.git for you to fetch changes up to 6c779f22a93cc6e4565b940ef616e3efc5b50ba5: Change ram_save_block to return -1 if there are no more changes (2012-07-23 14:02:28 +0200) Juan Quintela (12): savevm: Use a struct to pass all handlers savevm: Live migration handlers register the struct directly savevm: remove SaveSetParamsHandler savevm: remove SaveLiveStateHandler savevm: Refactor cancel operation in its own operation savevm: introduce is_active method savevm: split save_live_setup from save_live_state savevm: split save_live into stage2 and stage3 ram: save_live_setup() don't need to sent pages ram: save_live_complete() only do one loop ram: iterate phase ram: save_live_setup() we don't need to synchronize the dirty bitmap. Orit Wasserman (2): Add migration capabilities Change ram_save_block to return -1 if there are no more changes arch_init.c | 137 ++- block-migration.c | 153 +++-- hmp-commands.hx | 16 ++ hmp.c | 64 ++ hmp.h |2 + migration.c | 72 - migration.h |6 ++- monitor.c |7 +++ qapi-schema.json | 53 ++- qmp-commands.hx | 71 +++-- savevm.c | 77 +++ vl.c |3 +- vmstate.h | 18 --- 13 files changed, 525 insertions(+), 154 deletions(-) -- 1.7.10.4
[Qemu-devel] KVM call agenda for Tuesday, July 24
Hi Please send in any agenda items you are interested in covering. Later, Juan.
Re: [Qemu-devel] [RFC] introduce a dynamic library to expose qemu block API
Il 23/07/2012 13:55, Lluís Vilanova ha scritto: I have no problems relicensing to GPLv2 or later or GPLv3 or later. What about LGPLv2+? (Note the L.) I'd prefer to keep it non-lesser. Is it absolutely necessary? This is about making the block layer part of a library. The block layer is 90% under a BSD license, but the LGPLv2 is a good compromise. Paolo
Re: [Qemu-devel] [SeaBIOS PATCH 1/2] acpi: report real I/O APIC ID (0) on MADT table (v2)
On Fri, Jul 20, 2012 at 02:04:49PM -0300, Eduardo Habkost wrote: When resetting an I/O APIC, its ID is set to 0, and SeaBIOS doesn't change it, so report it correctly on the MADT table. Some hardware may require the BIOS to initialize I/O APIC ID to an unique value, but SeaBIOS doesn't do that. This patch at least makes the MADT table reflect reality. Changes v1 - v2: - Cosmetic: whitespace change (removed extra newline) - New patch description Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- src/acpi.c |2 +- src/config.h |1 + 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/src/acpi.c b/src/acpi.c index d39cbd9..da3bc57 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -336,7 +336,7 @@ build_madt(void) struct madt_io_apic *io_apic = (void*)apic; io_apic-type = APIC_IO; io_apic-length = sizeof(*io_apic); -io_apic-io_apic_id = CountCPUs; +io_apic-io_apic_id = BUILD_IOAPIC_ID; io_apic-address = cpu_to_le32(BUILD_IOAPIC_ADDR); io_apic-interrupt = cpu_to_le32(0); mptable also have ioapic_id. diff --git a/src/config.h b/src/config.h index 3a70867..0d4066d 100644 --- a/src/config.h +++ b/src/config.h @@ -52,6 +52,7 @@ #define BUILD_PCIMEM64_END0x100ULL #define BUILD_IOAPIC_ADDR 0xfec0 +#define BUILD_IOAPIC_ID 0 #define BUILD_HPET_ADDRESS0xfed0 #define BUILD_APIC_ADDR 0xfee0 -- 1.7.10.4 -- Gleb.
[Qemu-devel] [PATCH 07/19] qapi: introduce OptsVisitor
From: Laszlo Ersek ler...@redhat.com This visitor supports parsing -option [type=]discriminator[,optarg1=val1][,optarg2=val2][,...] style QemuOpts objects into native C structures. After defining the type tree in the qapi schema (see below), a root type traversal with this visitor linked to the underlying QemuOpts object will build the native C representation of the option. The type tree in the schema, corresponding to an option with a discriminator, must have the following structure: struct scalar member for non-discriminated optarg 1 [*] list for repeating non-discriminated optarg 2 [*] wrapper struct single scalar member union struct for discriminator case 1 scalar member for optarg 3 [*] list for repeating optarg 4 [*] wrapper struct single scalar member scalar member for optarg 5 [*] struct for discriminator case 2 ... The type optarg name is fixed for the discriminator role. Its schema representation is union of structures, and each discriminator value must correspond to a member name in the union. If the option takes no type descriminator, then the type subtree rooted at the union must be absent from the schema (including the union itself). Optarg values can be of scalar types str / bool / integers / size. Members marked with [*] may be defined as optional in the schema, describing an optional optarg. Repeating an optarg is supported; its schema representation must be list of structure with single mandatory scalar member. If an optarg is not described as repeating in the schema (ie. it is defined as a scalar field instead of a list), its last occurrence will take effect. Ordering between differently named optargs is not preserved. A mandatory list (or an optional one which is reported to be available), corresponding to a repeating optarg, has at least one element after successful parsing. v1-v2: - Update opts_type_size() prototype to uint64_t. - Add opts_type_uint64() for options needing the full uint64_t range. (Internals could be extracted to cutils.c.) - Allow negative values in opts_type_int(). - Rebase to nested Makefiles. v2-v3: - Factor opts_visitor_insert() out of opts_start_struct() and call it separately for opts_root-id if there's any. - Don't require non-negative values in opts_type_int()'s error message. - g_malloc0() may return NULL for zero-sized requests. Support empty structures by requesting 1 byte for them instead. Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- qapi/Makefile.objs |2 +- qapi/opts-visitor.c | 427 +++ qapi/opts-visitor.h | 31 3 files changed, 459 insertions(+), 1 deletion(-) create mode 100644 qapi/opts-visitor.c create mode 100644 qapi/opts-visitor.h diff --git a/qapi/Makefile.objs b/qapi/Makefile.objs index d0b0c16..5f5846e 100644 --- a/qapi/Makefile.objs +++ b/qapi/Makefile.objs @@ -1,3 +1,3 @@ qapi-obj-y = qapi-visit-core.o qapi-dealloc-visitor.o qmp-input-visitor.o qapi-obj-y += qmp-output-visitor.o qmp-registry.o qmp-dispatch.o -qapi-obj-y += string-input-visitor.o string-output-visitor.o +qapi-obj-y += string-input-visitor.o string-output-visitor.o opts-visitor.o diff --git a/qapi/opts-visitor.c b/qapi/opts-visitor.c new file mode 100644 index 000..a59d306 --- /dev/null +++ b/qapi/opts-visitor.c @@ -0,0 +1,427 @@ +/* + * Options Visitor + * + * Copyright Red Hat, Inc. 2012 + * + * Author: Laszlo Ersek ler...@redhat.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#include opts-visitor.h +#include qemu-queue.h +#include qemu-option-internal.h +#include qapi-visit-impl.h + + +struct OptsVisitor +{ +Visitor visitor; + +/* Ownership remains with opts_visitor_new()'s caller. */ +const QemuOpts *opts_root; + +unsigned depth; + +/* Non-null iff depth is positive. Each key is a QemuOpt name. Each value + * is a non-empty GQueue, enumerating all QemuOpt occurrences with that + * name. */ +GHashTable *unprocessed_opts; + +/* The list currently being traversed with opts_start_list() / + * opts_next_list(). The list must have a struct element type in the + * schema, with a single mandatory scalar member. */ +GQueue *repeated_opts; +bool repeated_opts_first; + +/* If opts_root-id is set, reinstantiate it as a fake QemuOpt for + * uniformity. Only its name and str fields are set. fake_id_opt does + * not survive or escape the OptsVisitor object. + */ +QemuOpt *fake_id_opt; +}; + + +static void +destroy_list(gpointer list) +{ + g_queue_free(list); +} + + +static void +opts_visitor_insert(GHashTable *unprocessed_opts, const QemuOpt *opt) +{ +GQueue *list; + +list = g_hash_table_lookup(unprocessed_opts, opt-name); +if (list == NULL) { +
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 07/23/2012 03:04 PM, Cornelia Huck wrote: OK, so I was reading through this thread since I want to add irqfd support for s390, but we don't have any kind of irqchip. The understanding I got so far is that !s390 architectures have some kind of mechanism that allows them to route an interrupt between a device and a cpu, meaning that there's a fixed tie-in between a device and a cpu. It's not fixed, but it changes rarely. Various interrupt routers can be programmed to change the route, but guests do so only rarely. If that's correct, I don't see how to model irqfds via this irqchip infrastructure for s390. Here's in a nutshell how (external and I/O) interrupts work on s390: - Interrupts have an internal prioritation, that means different types of interrupts (external, I/O, machine check, ...) take precedent over other types - External and I/O interrupts are floating, i. e. they are not tied to a specific cpu, but can be delivered to any cpu that has external resp. I/O interrupts enabled - Interrupts take payload with them that defines which device they are for So, for example, if a specific subchannel (=device) has pending status and an I/O interrupt is to be generated, this interrupt remains pending until an arbitrary cpu is enabled for I/O interrupts. If several cpus are enabled for I/O interrupts, any of them may be interrupted. This may be costly to emulate. On x86 we do not have access to a guest's interrupt status while it is running. Is this not the case for s390? Oh, let me guess. You write some interrupt descriptor in memory somewhere, issue one of your famous instructions, and the hardware finds a guest vcpu and injects the interrupt. x86 has a least priority mode which is similar to what you're describing, but I don't think we emulate it correctly. When an I/O interrupt is delivered on a cpu, the cpu's lowcore contains the interrupt payload which defines the subchannel (=device) the interrupt is for. Any idea on how this architecture can be married with the irqchip concept is welcome. If all else fails, would a special irqfd concept for !irqchip be acceptable? I don't see an issue. You need an arch-specific irqfd configuration ioctl (or your own data field in the existing ioctl) that defines the payload. Then the kernel installs a poll function on that eventfd that, when called, does the magic sequence needed to get the interrupt there. While you don't have an irqchip, you do have asynchronous interrupt injection, yes? That's what irqchip really is all about. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 02/19] qapi: fix error propagation
From: Paolo Bonzini pbonz...@redhat.com Don't overwrite / leak previously set errors. Make traversal cope with missing mandatory sub-structs. Don't try to end a container that could not be started. v1-v2: - unchanged v2-v3: - instead of examining, assert that we never overwrite errors with error_set() - allow visitors to set a NULL struct pointer successfully, so traversal of incomplete objects can continue - check for a NULL obj before accessing (*obj)-has_XXX (this is not a typo, obj != NULL implies *obj != NULL here) - fix start_struct / end_struct balance for unions as well Signed-off-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- docs/qapi-code-gen.txt |2 + error.c|3 +- error.h|2 +- qapi/qapi-visit-core.c | 10 ++- scripts/qapi-visit.py | 150 +--- tests/test-qmp-input-visitor.c | 24 --- 6 files changed, 121 insertions(+), 70 deletions(-) diff --git a/docs/qapi-code-gen.txt b/docs/qapi-code-gen.txt index ad11767..cccb11e 100644 --- a/docs/qapi-code-gen.txt +++ b/docs/qapi-code-gen.txt @@ -220,6 +220,8 @@ Example: #endif mdroth@illuin:~/w/qemu2.git$ +(The actual structure of the visit_type_* functions is a bit more complex +in order to propagate errors correctly and avoid leaking memory). === scripts/qapi-commands.py === diff --git a/error.c b/error.c index a52b771..58f55a0 100644 --- a/error.c +++ b/error.c @@ -32,6 +32,7 @@ void error_set(Error **errp, const char *fmt, ...) if (errp == NULL) { return; } +assert(*errp == NULL); err = g_malloc0(sizeof(*err)); @@ -132,7 +133,7 @@ bool error_is_type(Error *err, const char *fmt) void error_propagate(Error **dst_err, Error *local_err) { -if (dst_err) { +if (dst_err !*dst_err) { *dst_err = local_err; } else if (local_err) { error_free(local_err); diff --git a/error.h b/error.h index 45ff6c1..3d9d96d 100644 --- a/error.h +++ b/error.h @@ -57,7 +57,7 @@ void error_set_field(Error *err, const char *field, const char *value); /** * Propagate an error to an indirect pointer to an error. This function will * always transfer ownership of the error reference and handles the case where - * dst_err is NULL correctly. + * dst_err is NULL correctly. Errors after the first are discarded. */ void error_propagate(Error **dst_err, Error *local_err); diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c index 705eca9..d41595e 100644 --- a/qapi/qapi-visit-core.c +++ b/qapi/qapi-visit-core.c @@ -39,9 +39,8 @@ void visit_start_struct(Visitor *v, void **obj, const char *kind, void visit_end_struct(Visitor *v, Error **errp) { -if (!error_is_set(errp)) { -v-end_struct(v, errp); -} +assert(!error_is_set(errp)); +v-end_struct(v, errp); } void visit_start_list(Visitor *v, const char *name, Error **errp) @@ -62,9 +61,8 @@ GenericList *visit_next_list(Visitor *v, GenericList **list, Error **errp) void visit_end_list(Visitor *v, Error **errp) { -if (!error_is_set(errp)) { -v-end_list(v, errp); -} +assert(!error_is_set(errp)); +v-end_list(v, errp); } void visit_start_optional(Visitor *v, bool *present, const char *name, diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py index 8d4e94a..04ef7c4 100644 --- a/scripts/qapi-visit.py +++ b/scripts/qapi-visit.py @@ -17,32 +17,49 @@ import os import getopt import errno -def generate_visit_struct_body(field_prefix, members): -ret = +def generate_visit_struct_body(field_prefix, name, members): +ret = mcgen(''' +if (!error_is_set(errp)) { +''') +push_indent() + if len(field_prefix): field_prefix = field_prefix + . +ret += mcgen(''' +Error **errp = err; /* from outer scope */ +Error *err = NULL; +visit_start_struct(m, NULL, , %(name)s, 0, err); +''', +name=name) +else: +ret += mcgen(''' +Error *err = NULL; +visit_start_struct(m, (void **)obj, %(name)s, name, sizeof(%(name)s), err); +''', +name=name) + +ret += mcgen(''' +if (!err) { +if (!obj || *obj) { +''') + +push_indent() +push_indent() for argname, argentry, optional, structured in parse_args(members): if optional: ret += mcgen(''' -visit_start_optional(m, (obj *obj) ? (*obj)-%(c_prefix)shas_%(c_name)s : NULL, %(name)s, errp); -if ((*obj)-%(prefix)shas_%(c_name)s) { +visit_start_optional(m, obj ? (*obj)-%(c_prefix)shas_%(c_name)s : NULL, %(name)s, err); +if (obj (*obj)-%(prefix)shas_%(c_name)s) { ''', c_prefix=c_var(field_prefix), prefix=field_prefix, c_name=c_var(argname), name=argname) push_indent() if structured: -ret += mcgen(''' -visit_start_struct(m, NULL,
Re: [Qemu-devel] [SeaBIOS PATCH 2/2] allow CPUs to have non-contiguous Local APIC IDs (v2)
On Fri, Jul 20, 2012 at 02:04:50PM -0300, Eduardo Habkost wrote: Extract Local APIC IDs directly from the CPUs, and instead of check for i CountCPUs, check if the APIC ID was present on boot, when building ACPI tables and the MP-Table. This keeps ACPI Processor ID == APIC ID, but allows the hardware-SeaBIOS interface be completely APIC-ID based and not depend on any other kind of CPU identifier. This way, SeaBIOS may change the way ACPI Processor IDs are chosen in the future. As currently SeaBIOS supports only xAPIC and not x2APIC, the list of present-on-boot APIC IDs is a 256-bit bitmap. If one day SeaBIOS starts to support x2APIC, the data structure used to enumerate the APIC IDs will have to be changed (but this is an internal implementation detail, not visible to the OS or on any hardware=SeaBIOS interface). For current QEMU versions (that always make the APIC IDs contiguous), the OS-visible behavior and resulting ACPI tables should be exactly the same. This patch will simply allow QEMU to start setting non-contiguous APIC IDs (that is a requirement for some sockets/cores/threads topology settings). Changes v1 - v2: - Use size suffixes on all asm instructions on smp.c - New patch description Signed-off-by: Eduardo Habkost ehabk...@redhat.com --- src/acpi-dsdt.dsl |4 +++- src/acpi.c|9 + src/mptable.c |2 +- src/smp.c | 17 + src/util.h|1 + 5 files changed, 27 insertions(+), 6 deletions(-) diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl index 2060686..72dc7d8 100644 --- a/src/acpi-dsdt.dsl +++ b/src/acpi-dsdt.dsl @@ -676,6 +676,7 @@ DefinitionBlock ( /* Methods called by run-time generated SSDT Processor objects */ Method (CPMA, 1, NotSerialized) { // _MAT method - create an madt apic buffer +// Arg0 = Processor ID = Local APIC ID // Local0 = CPON flag for this cpu Store(DerefOf(Index(CPON, Arg0)), Local0) // Local1 = Buffer (in madt apic form) to return @@ -688,6 +689,7 @@ DefinitionBlock ( } Method (CPST, 1, NotSerialized) { // _STA method - return ON status of cpu +// Arg0 = Processor ID = Local APIC ID // Local0 = CPON flag for this cpu Store(DerefOf(Index(CPON, Arg0)), Local0) If (Local0) { Return(0xF) } Else { Return(0x0) } @@ -708,7 +710,7 @@ DefinitionBlock ( Store (PRS, Local5) // Local2 = last read byte from bitmap Store (Zero, Local2) -// Local0 = cpuid iterator +// Local0 = Processor ID / APIC ID iterator Store (Zero, Local0) While (LLess(Local0, SizeOf(CPON))) { // Local1 = CPON flag for this cpu diff --git a/src/acpi.c b/src/acpi.c index da3bc57..39b7172 100644 --- a/src/acpi.c +++ b/src/acpi.c @@ -327,7 +327,7 @@ build_madt(void) apic-length = sizeof(*apic); apic-processor_id = i; apic-local_apic_id = i; -if (i CountCPUs) +if (apic_id_is_present(apic-local_apic_id)) apic-flags = cpu_to_le32(1); else apic-flags = cpu_to_le32(0); @@ -445,6 +445,7 @@ build_ssdt(void) } // build Method(NTFY, 2) {If (LEqual(Arg0, 0x00)) {Notify(CP00, Arg1)} ...} +// Arg0 = Processor ID = APIC ID *(ssdt_ptr++) = 0x14; // MethodOp ssdt_ptr = encodeLen(ssdt_ptr, 2+5+(12*acpi_cpus), 2); *(ssdt_ptr++) = 'N'; @@ -477,7 +478,7 @@ build_ssdt(void) ssdt_ptr = encodeLen(ssdt_ptr, 2+1+(1*acpi_cpus), 2); *(ssdt_ptr++) = acpi_cpus; for (i=0; iacpi_cpus; i++) -*(ssdt_ptr++) = (i CountCPUs) ? 0x01 : 0x00; +*(ssdt_ptr++) = (apic_id_is_present(i)) ? 0x01 : 0x00; // store pci io windows: start, end, length // this way we don't have to do the math in the dsdt @@ -656,10 +657,10 @@ build_srat(void) core-proximity_lo = curnode; memset(core-proximity_hi, 0, 3); core-local_sapic_eid = 0; -if (i CountCPUs) +if (apic_id_is_present(i)) core-flags = cpu_to_le32(1); else -core-flags = 0; +core-flags = cpu_to_le32(0); core++; } diff --git a/src/mptable.c b/src/mptable.c index 103f462..9406f98 100644 --- a/src/mptable.c +++ b/src/mptable.c @@ -59,7 +59,7 @@ mptable_init(void) cpu-apicid = i; cpu-apicver = apic_version; /* cpu flags: enabled, bootstrap cpu */ -cpu-cpuflag = ((iCountCPUs) ? 0x01 : 0x00) | ((i==0) ? 0x02 : 0x00); +cpu-cpuflag = (apic_id_is_present(i) ? 0x01 : 0x00) | ((i==0) ? 0x02 : 0x00); cpu-cpusignature = cpuid_signature; cpu-featureflag = cpuid_features; cpu++; diff --git a/src/smp.c b/src/smp.c index 8c077a1..3c36f8c
[Qemu-devel] [PULL 00/19] Net patches
The main piece is Laszlo's OptsVisitor and -net/-netdev parsing conversion. The following changes since commit 61dc008f3529fa74a63aad1907438dad857e255a: Revert audio: Make PC speaker audio card available by default (2012-07-19 18:25:52 -0500) are available in the git repository at: git://github.com/stefanha/qemu.git net for you to fetch changes up to 1a0c09583df097d62b0580f9073ba45c9d18351a: remove unused QemuOpts parameter from net init functions (2012-07-23 11:55:18 +0100) Laszlo Ersek (17): qapi: add test case for deallocating traversal of incomplete structure qapi: generate C types for fixed-width integers qapi: introduce size type expose QemuOpt and QemuOpts struct definitions to interested parties qapi: introduce OptsVisitor qapi schema: remove trailing whitespace qapi schema: add Netdev types hw, net: net_client_type - NetClientOptionsKind (qapi-generated) convert net_client_init() to OptsVisitor convert net_init_nic() to NetClientOptions convert net_init_dump() to NetClientOptions convert net_init_slirp() to NetClientOptions convert net_init_socket() to NetClientOptions convert net_init_vde() to NetClientOptions convert net_init_tap() to NetClientOptions convert net_init_bridge() to NetClientOptions remove unused QemuOpts parameter from net init functions Paolo Bonzini (1): qapi: fix error propagation Stefan Hajnoczi (1): MAINTAINERS: Replace net maintainer Mark McLoughlin with Stefan Hajnoczi MAINTAINERS|3 +- docs/qapi-code-gen.txt |2 + error.c|3 +- error.h|2 +- hw/cadence_gem.c |2 +- hw/dp8393x.c |2 +- hw/e1000.c |2 +- hw/eepro100.c |2 +- hw/etraxfs_eth.c |2 +- hw/lan9118.c |2 +- hw/lance.c |2 +- hw/mcf_fec.c |2 +- hw/milkymist-minimac2.c|2 +- hw/mipsnet.c |2 +- hw/musicpal.c |2 +- hw/ne2000-isa.c|2 +- hw/ne2000.c|2 +- hw/opencores_eth.c |2 +- hw/pcnet-pci.c |2 +- hw/rtl8139.c |2 +- hw/smc91c111.c |2 +- hw/spapr_llan.c|2 +- hw/stellaris_enet.c|2 +- hw/usb/dev-network.c |2 +- hw/vhost_net.c |2 +- hw/virtio-net.c| 10 +- hw/xen_nic.c |2 +- hw/xgmac.c |2 +- hw/xilinx_axienet.c|2 +- hw/xilinx_ethlite.c|2 +- net.c | 494 +++- net.h | 16 +- net/dump.c | 24 +- net/dump.h |5 +- net/slirp.c| 96 +++- net/slirp.h|4 +- net/socket.c | 124 -- net/socket.h |5 +- net/tap-aix.c |2 +- net/tap-bsd.c |2 +- net/tap-haiku.c|2 +- net/tap-linux.c|9 +- net/tap-solaris.c |2 +- net/tap-win32.c| 14 +- net/tap.c | 152 ++--- net/tap.h | 10 +- net/vde.c | 20 +- net/vde.h |5 +- qapi-schema.json | 288 ++- qapi/Makefile.objs |2 +- qapi/opts-visitor.c| 427 ++ qapi/opts-visitor.h| 31 +++ qapi/qapi-visit-core.c | 17 +- qapi/qapi-visit-core.h |3 + qemu-option-internal.h | 53 + qemu-option.c | 24 +- scripts/qapi-visit.py | 150 +++- scripts/qapi.py|6 + tests/test-qmp-commands.c | 42 tests/test-qmp-input-visitor.c | 24 +- 60 files changed, 1352 insertions(+), 771 deletions(-) create mode 100644 qapi/opts-visitor.c create mode 100644 qapi/opts-visitor.h create mode 100644 qemu-option-internal.h -- 1.7.10.4
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 23 July 2012 13:18, Avi Kivity a...@redhat.com wrote: While you don't have an irqchip, you do have asynchronous interrupt injection, yes? That's what irqchip really is all about. This seems an odd point of view -- async interrupt injection doesn't have anything to do with whether we're modelling the irqchip in the kernel or in QEMU, I thought... -- PMM
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 07/21/2012 11:54 AM, Peter Maydell wrote: On 21 July 2012 07:57, Jan Kiszka jan.kis...@web.de wrote: On 2012-07-20 21:14, Peter Maydell wrote: I'm sure this isn't the only x86ism in the KVM generic source files. However the thing I'm specifically trying to do is nuke all the uses of kvm_irqchip_in_kernel() in common code, No, irqchip in kernel is supposed to be a generic concept. We will also have it on Power. Not sure what your plans are for ARM, maybe it will always be true there. I agree that irqchip in kernel? is generic (though as you'll see below there's disagreement about what that ought to mean or imply). irq0_override though seems to me to be absolutely x86 specific. That said, maybe there is room for discussion about what it means for the general KVM code and its users if the irqchip is in the kernel. Two things that should be common for every arch: - VCPU idle management is done inside the kernel The trouble is that at the moment QEMU assumes that is the irqchip in kernel? == is VCPU idle management in kernel, for instance. For ARM, VCPU idle management is in kernel whether we're using the kernel's model of the VGIC or not. Alex tells me PPC is the same way. It's only x86 that has tied these two concepts together. Really, irqchip in kernel means asynchronous interrupts - you can inject an interrupt from outside the vcpu thread. Obviously if the vcpu is sleeping you need to wake it up and that pulls in idle management. irqchip for x86 really means the local APIC, which has a synchronous interface with the cpu core. local APIC in kernel really is equivalent to kernel idle management, KVM_IRQ_LINE, and irqfd. The ioapic and pit, on the other hand, don't contribute anything to this (just performance). So yes, ARM with and without GIC are irqchip_in_kernel, since the ARM-GIC interface is asynchronous. Whether to emulate the GIC or not is just a performance question. The reason I want to get rid of common-code uses of kvm_irqchip_in_kernel() is because I think they're all similar to this -- the common code is using the check as a proxy for something else, and it should be directly asking about that something else. The only bits of code that should care about is the irqchip in kernel? are: * target-specific device/machine setup code which needs to know which apic/etc to instantiate * target-specific x86 code which has this weird synchronous IRQ delivery model for irqchip-not-in-kernel (Obviously I might have missed something, I'm flailing around trying to understand this code :-)) Agree naming should be improved. In fact the early series I pushed to decompose local apic, ioapic, and pic, but that didn't happen. If it did we'd probably not have this conversation. -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 07/23/2012 03:25 PM, Peter Maydell wrote: On 23 July 2012 13:18, Avi Kivity a...@redhat.com wrote: While you don't have an irqchip, you do have asynchronous interrupt injection, yes? That's what irqchip really is all about. This seems an odd point of view -- async interrupt injection doesn't have anything to do with whether we're modelling the irqchip in the kernel or in QEMU, I thought... It does on x86. The relationship between the APIC and the core is synchronous - the APIC presents the interrupt, the core grabs is when it is ready (interrupt flag, etc.) and signals the APIC it has done so. The APIC then clears the interrupt from the interrupt request register and sets it in the interrupt status register. This sequence has to be done while the vcpu is stopped, since we don't have access to the interrupt flag otherwise. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 16/19] convert net_init_vde() to NetClientOptions
From: Laszlo Ersek ler...@redhat.com v1-v2: - NetdevVdeOptions::port and ::mode are of type uint16. Remove superfluous range checks. Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net/vde.c | 17 +++-- 1 file changed, 7 insertions(+), 10 deletions(-) diff --git a/net/vde.c b/net/vde.c index 8e60f68..703888c 100644 --- a/net/vde.c +++ b/net/vde.c @@ -110,20 +110,17 @@ static int net_vde_init(VLANState *vlan, const char *model, return 0; } -int net_init_vde(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_vde(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { -const char *sock; -const char *group; -int port, mode; +const NetdevVdeOptions *vde; -sock = qemu_opt_get(opts, sock); -group = qemu_opt_get(opts, group); +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_VDE); +vde = opts-vde; -port = qemu_opt_get_number(opts, port, 0); -mode = qemu_opt_get_number(opts, mode, 0700); - -if (net_vde_init(vlan, vde, name, sock, port, group, mode) == -1) { +/* missing optional values have been initialized to all bits zero */ +if (net_vde_init(vlan, vde, name, vde-sock, vde-port, vde-group, + vde-has_mode ? vde-mode : 0700) == -1) { return -1; } -- 1.7.10.4
Re: [Qemu-devel] [PATCH] pc: Fix max_cpus
2012/7/23 Andreas Färber afaer...@suse.de: Am 23.07.2012 12:47, schrieb riegama...@gmail.com: From: Dunrong Huang riegama...@gmail.com The VCPU count limit in kernel now is 254, defined by KVM_MAX_VCPUS in kernel's header files. But the count limit in QEMU is 255, so QEMU will failed to start if user passes -enable-kvm and -smp 255 to it. This patch intruduces a Macro MAX_VCPUS whose value is KVM_MAX_VCPUS if CONFIG_KVM is defined. If user do not use kvm, set it's value to 255. Signed-off-by: Dunrong Huang riegama...@gmail.com --- hw/pc_piix.c | 28 +++- 1 files changed, 19 insertions(+), 9 deletions(-) diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 0c0096f..49cda51 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -49,6 +49,16 @@ #define MAX_IDE_BUS 2 +#ifndef KVM_MAX_VCPUS +#define KVM_MAX_VCPUS 254 +#endif + +#ifdef CONFIG_KVM +#define MAX_VCPUS KVM_MAX_VCPUS +#else +#define MAX_VCPUS 255 +#endif + static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 }; static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 }; static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; @@ -354,7 +364,7 @@ static QEMUMachine pc_machine_v1_2 = { .alias = pc, .desc = Standard PC, .init = pc_init_pci, -.max_cpus = 255, +.max_cpus = MAX_VCPUS, .is_default = 1, }; [snip] This is not so ideal: -enable-kvm is a runtime switch whereas you are changing a compile-time limit here. Any chance to change the runtime usage of .max_cpus instead? Possibly introducing a helper function? Do you mean do some hacks in smp_parse? I agree with you, there has codes for checking max_cpus in runtime: if (max_cpus 255) { // Should be changed to 254 if enable kvm. fprintf(stderr, Unsupported number of maxcpus\n); exit(1); } I think we also need to change the hard coded value of .max_cpus from 255 to macro which should be synchronized with KVM_MAX_VCPUS or something else. Andreas -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg -- Best Regards, Dunrong Huang
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 07/23/2012 03:31 PM, Avi Kivity wrote: On 07/23/2012 03:25 PM, Peter Maydell wrote: On 23 July 2012 13:18, Avi Kivity a...@redhat.com wrote: While you don't have an irqchip, you do have asynchronous interrupt injection, yes? That's what irqchip really is all about. This seems an odd point of view -- async interrupt injection doesn't have anything to do with whether we're modelling the irqchip in the kernel or in QEMU, I thought... It does on x86. The relationship between the APIC and the core is synchronous - the APIC presents the interrupt, the core grabs is when it is ready (interrupt flag, etc.) and signals the APIC it has done so. The APIC then clears the interrupt from the interrupt request register and sets it in the interrupt status register. This sequence has to be done while the vcpu is stopped, since we don't have access to the interrupt flag otherwise. Again, this is just the local APIC. The IOAPIC (which is x86 equivalent to the GIC, IIUC) doesn't change this picture and could have been emulated in userspace even with async interrupts. As it happens local APIC emulation pulls in the IOAPIC and PIC. So my view is that ARM with and without kernel GIC are irqchip_in_kernel, since whatever is the local APIC in ARM is always emulated in the kernel. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 11/19] convert net_client_init() to OptsVisitor
From: Laszlo Ersek ler...@redhat.com The net_client_init() prototype is kept intact. Based on is_netdev, the QemuOpts-rooted QemuOpt-list is parsed as a Netdev or a NetLegacy. The original meat of net_client_init() is moved to and simplified in net_client_init1(): Fields not common between -net and -netdev are clearly separated. Getting the name for the init functions is cleaner: Netdev::id is mandatory, and all init functions handle a NULL NetLegacy::name. NetLegacy::vlan explicitly depends on -net (see below). Verifying the type= option for -netdev can be turned into a switch. Format validation with qemu_opts_validate() can be removed because the visitor covers it. Relatedly, the net_client_types array is reduced to an array of init functions that can be directly indexed by opts-kind. (Help text is available in the schema JSON.) The outermost negation in the condition around qemu_find_vlan() was flattened, because it expresses the dependent code's requirements more clearly. VLAN lookup is avoided if there's no init function to pass the VLAN to. Whenever the value of type=... is needed, we substitute NetClientOptionsKind_lookup[kind]. The individual init functions are not converted yet, thus the original QemuOpts instance is passed transparently. v1-v2: - NetLegacy::name is optional. Tracked it through all init functions: they all handle a NULL name. Updated commit message accordingly. v2-v3: - NetLegacy::id is allowed and takes precedence over NetLegacy::name. Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net.c | 429 +-- net/dump.c |3 +- net/dump.h |4 +- net/slirp.c |3 +- net/slirp.h |4 +- net/socket.c|3 +- net/socket.h|4 +- net/tap-win32.c |3 +- net/tap.c |6 +- net/tap.h |7 +- net/vde.c |3 +- net/vde.h |4 +- 12 files changed, 127 insertions(+), 346 deletions(-) diff --git a/net.c b/net.c index c46695f..af544b2 100644 --- a/net.c +++ b/net.c @@ -37,6 +37,9 @@ #include qmp-commands.h #include hw/qdev.h #include iov.h +#include qapi-visit.h +#include qapi/opts-visitor.h +#include qapi/qapi-dealloc-visitor.h /* Net bridge is currently not supported for W32. */ #if !defined(_WIN32) @@ -745,7 +748,8 @@ int net_handle_fd_param(Monitor *mon, const char *param) return fd; } -static int net_init_nic(QemuOpts *opts, const char *name, VLANState *vlan) +static int net_init_nic(QemuOpts *opts, const NetClientOptions *new_opts, +const char *name, VLANState *vlan) { int idx; NICInfo *nd; @@ -802,371 +806,130 @@ static int net_init_nic(QemuOpts *opts, const char *name, VLANState *vlan) return idx; } -#define NET_COMMON_PARAMS_DESC \ -{ \ -.name = type,\ -.type = QEMU_OPT_STRING, \ -.help = net client type (nic, tap etc.), \ - }, { \ -.name = vlan,\ -.type = QEMU_OPT_NUMBER, \ -.help = vlan number, \ - }, { \ -.name = name,\ -.type = QEMU_OPT_STRING, \ -.help = identifier for monitor commands, \ - } - -typedef int (*net_client_init_func)(QemuOpts *opts, -const char *name, -VLANState *vlan); - -/* magic number, but compiler will warn if too small */ -#define NET_MAX_DESC 20 - -static const struct { -const char *type; -net_client_init_func init; -QemuOptDesc desc[NET_MAX_DESC]; -} net_client_types[NET_CLIENT_OPTIONS_KIND_MAX] = { -[NET_CLIENT_OPTIONS_KIND_NONE] = { -.type = none, -.desc = { -NET_COMMON_PARAMS_DESC, -{ /* end of list */ } -}, -}, -[NET_CLIENT_OPTIONS_KIND_NIC] = { -.type = nic, -.init = net_init_nic, -.desc = { -NET_COMMON_PARAMS_DESC, -{ -.name = netdev, -.type = QEMU_OPT_STRING, -.help = id of -netdev to connect to, -}, -{ -.name = macaddr, -.type = QEMU_OPT_STRING, -.help = MAC address, -}, { -.name = model, -.type = QEMU_OPT_STRING, -.help = device model (e1000, rtl8139, virtio etc.), -}, { -.name = addr, -.type = QEMU_OPT_STRING, -.help = PCI device address, -}, { -.name = vectors, -.type = QEMU_OPT_NUMBER, -.help = number of MSI-x
Re: [Qemu-devel] [PATCH] check for available room when formatting OpenFirmware device path
On 23 July 2012 12:33, Laszlo Ersek ler...@redhat.com wrote: Signed-off-by: Laszlo Ersek ler...@redhat.com I think it would be much nicer to just rewrite qdev_get_fw_dev_path so we weren't trying to fill the path into a fixed string buffer at all. Here is an entirely untested implementation: char *qdev_get_fw_dev_path(DeviceState *dev) { char *path; char **strarray; int depth = 0; DeviceState *d = dev; for (d = dev; d d-parent_bus; d = d-parent_bus-parent) { depth++; } depth++; strarray = g_new(char*, depth); for (d = dev; d d-parent_bus; d = d-parent_bus-parent) { depth--; strarray[depth] = bus_get_fw_dev_path(dev-parent_bus, dev); if (!strarray[depth]) { strarray[depth] = g_strdup(object_get_typename(OBJECT(dev))); } } strarray[0] = g_strdup(); path = g_strjoinv(/, strarray); g_strfreev(strarray); return path; } Bonus extra patch: check that all the implementations of get_fw_dev_path() are returning a g_malloc'd string rather than a plain malloc'd one (both this code and the current implementation assume so but at least the scsi bus implementation doesn't use the glib functions.) -- PMM
[Qemu-devel] [PATCH 19/19] remove unused QemuOpts parameter from net init functions
From: Laszlo Ersek ler...@redhat.com v1-v2: - unchanged v2-v3: - keep qemu-option.h included in net/slirp.h Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net.c | 14 ++ net/dump.c |4 ++-- net/dump.h |5 ++--- net/slirp.c |4 ++-- net/slirp.h |4 ++-- net/socket.c|4 ++-- net/socket.h|5 ++--- net/tap-win32.c |4 ++-- net/tap.c |8 net/tap.h |9 - net/vde.c |4 ++-- net/vde.h |5 ++--- 12 files changed, 32 insertions(+), 38 deletions(-) diff --git a/net.c b/net.c index a62e902..dbca77b 100644 --- a/net.c +++ b/net.c @@ -748,8 +748,8 @@ int net_handle_fd_param(Monitor *mon, const char *param) return fd; } -static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts, -const char *name, VLANState *vlan) +static int net_init_nic(const NetClientOptions *opts, const char *name, +VLANState *vlan) { int idx; NICInfo *nd; @@ -813,8 +813,7 @@ static int net_init_nic(QemuOpts *old_opts, const NetClientOptions *opts, static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])( -QemuOpts *old_opts, -const NetClientOptions *new_opts, +const NetClientOptions *opts, const char *name, VLANState *vlan) = { [NET_CLIENT_OPTIONS_KIND_NIC]= net_init_nic, @@ -833,8 +832,7 @@ static int (* const net_client_init_fun[NET_CLIENT_OPTIONS_KIND_MAX])( }; -static int net_client_init1(const void *object, int is_netdev, -QemuOpts *old_opts, Error **errp) +static int net_client_init1(const void *object, int is_netdev, Error **errp) { union { const Netdev*netdev; @@ -885,7 +883,7 @@ static int net_client_init1(const void *object, int is_netdev, vlan = qemu_find_vlan(u.net-has_vlan ? u.net-vlan : 0, true); } -if (net_client_init_fun[opts-kind](old_opts, opts, name, vlan) 0) { +if (net_client_init_fun[opts-kind](opts, name, vlan) 0) { /* TODO push error reporting into init() methods */ error_set(errp, QERR_DEVICE_INIT_FAILED, NetClientOptionsKind_lookup[opts-kind]); @@ -920,7 +918,7 @@ int net_client_init(QemuOpts *opts, int is_netdev, Error **errp) } if (!err) { -ret = net_client_init1(object, is_netdev, opts, err); +ret = net_client_init1(object, is_netdev, err); } if (object) { diff --git a/net/dump.c b/net/dump.c index f3d2fa9..b575430 100644 --- a/net/dump.c +++ b/net/dump.c @@ -144,8 +144,8 @@ static int net_dump_init(VLANState *vlan, const char *device, return 0; } -int net_init_dump(QemuOpts *old_opts, const NetClientOptions *opts, - const char *name, VLANState *vlan) +int net_init_dump(const NetClientOptions *opts, const char *name, + VLANState *vlan) { int len; const char *file; diff --git a/net/dump.h b/net/dump.h index 85ac00b..0fa2dd7 100644 --- a/net/dump.h +++ b/net/dump.h @@ -25,10 +25,9 @@ #define QEMU_NET_DUMP_H #include net.h -#include qemu-common.h #include qapi-types.h -int net_init_dump(QemuOpts *opts, const NetClientOptions *new_opts, - const char *name, VLANState *vlan); +int net_init_dump(const NetClientOptions *opts, const char *name, + VLANState *vlan); #endif /* QEMU_NET_DUMP_H */ diff --git a/net/slirp.c b/net/slirp.c index 44b059f..5c2e6b2 100644 --- a/net/slirp.c +++ b/net/slirp.c @@ -702,8 +702,8 @@ net_init_slirp_configs(const StringList *fwd, int flags) } } -int net_init_slirp(QemuOpts *old_opts, const NetClientOptions *opts, - const char *name, VLANState *vlan) +int net_init_slirp(const NetClientOptions *opts, const char *name, + VLANState *vlan) { struct slirp_config_str *config; char *vnet; diff --git a/net/slirp.h b/net/slirp.h index ef13a65..e2c71ee 100644 --- a/net/slirp.h +++ b/net/slirp.h @@ -31,8 +31,8 @@ #ifdef CONFIG_SLIRP -int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts, - const char *name, VLANState *vlan); +int net_init_slirp(const NetClientOptions *opts, const char *name, + VLANState *vlan); void net_slirp_hostfwd_add(Monitor *mon, const QDict *qdict); void net_slirp_hostfwd_remove(Monitor *mon, const QDict *qdict); diff --git a/net/socket.c b/net/socket.c index e3cba20..600c287 100644 --- a/net/socket.c +++ b/net/socket.c @@ -586,8 +586,8 @@ static int net_socket_udp_init(VLANState *vlan, return 0; } -int net_init_socket(QemuOpts *old_opts, const NetClientOptions *opts, -const char *name, VLANState *vlan) +int net_init_socket(const NetClientOptions *opts, const char *name, +VLANState *vlan) { const
[Qemu-devel] [PATCH 10/19] hw, net: net_client_type - NetClientOptionsKind (qapi-generated)
From: Laszlo Ersek ler...@redhat.com NET_CLIENT_TYPE_ - NET_CLIENT_OPTIONS_KIND_ Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- hw/cadence_gem.c|2 +- hw/dp8393x.c|2 +- hw/e1000.c |2 +- hw/eepro100.c |2 +- hw/etraxfs_eth.c|2 +- hw/lan9118.c|2 +- hw/lance.c |2 +- hw/mcf_fec.c|2 +- hw/milkymist-minimac2.c |2 +- hw/mipsnet.c|2 +- hw/musicpal.c |2 +- hw/ne2000-isa.c |2 +- hw/ne2000.c |2 +- hw/opencores_eth.c |2 +- hw/pcnet-pci.c |2 +- hw/rtl8139.c|2 +- hw/smc91c111.c |2 +- hw/spapr_llan.c |2 +- hw/stellaris_enet.c |2 +- hw/usb/dev-network.c|2 +- hw/vhost_net.c |2 +- hw/virtio-net.c | 10 +- hw/xen_nic.c|2 +- hw/xgmac.c |2 +- hw/xilinx_axienet.c |2 +- hw/xilinx_ethlite.c |2 +- net.c | 50 +++ net.h | 16 ++- net/dump.c |2 +- net/slirp.c |2 +- net/socket.c|4 ++-- net/tap-win32.c |2 +- net/tap.c | 16 +++ net/vde.c |2 +- 34 files changed, 71 insertions(+), 83 deletions(-) diff --git a/hw/cadence_gem.c b/hw/cadence_gem.c index 87143ca..a0f51de 100644 --- a/hw/cadence_gem.c +++ b/hw/cadence_gem.c @@ -1161,7 +1161,7 @@ static void gem_set_link(VLANClientState *nc) } static NetClientInfo net_gem_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = gem_can_receive, .receive = gem_receive, diff --git a/hw/dp8393x.c b/hw/dp8393x.c index 017d074..756d630 100644 --- a/hw/dp8393x.c +++ b/hw/dp8393x.c @@ -872,7 +872,7 @@ static void nic_cleanup(VLANClientState *nc) } static NetClientInfo net_dp83932_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = nic_can_receive, .receive = nic_receive, diff --git a/hw/e1000.c b/hw/e1000.c index 4573f13..ad24298 100644 --- a/hw/e1000.c +++ b/hw/e1000.c @@ -1206,7 +1206,7 @@ pci_e1000_uninit(PCIDevice *dev) } static NetClientInfo net_e1000_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = e1000_can_receive, .receive = e1000_receive, diff --git a/hw/eepro100.c b/hw/eepro100.c index 6279ae3..f343685 100644 --- a/hw/eepro100.c +++ b/hw/eepro100.c @@ -1845,7 +1845,7 @@ static int pci_nic_uninit(PCIDevice *pci_dev) } static NetClientInfo net_eepro100_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = nic_can_receive, .receive = nic_receive, diff --git a/hw/etraxfs_eth.c b/hw/etraxfs_eth.c index 16a0637..45fb40c 100644 --- a/hw/etraxfs_eth.c +++ b/hw/etraxfs_eth.c @@ -579,7 +579,7 @@ static void eth_cleanup(VLANClientState *nc) } static NetClientInfo net_etraxfs_info = { - .type = NET_CLIENT_TYPE_NIC, + .type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = eth_can_receive, .receive = eth_receive, diff --git a/hw/lan9118.c b/hw/lan9118.c index 7b4fe87..40fb765 100644 --- a/hw/lan9118.c +++ b/hw/lan9118.c @@ -1310,7 +1310,7 @@ static void lan9118_cleanup(VLANClientState *nc) } static NetClientInfo net_lan9118_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = lan9118_can_receive, .receive = lan9118_receive, diff --git a/hw/lance.c b/hw/lance.c index ce3d46c..91c0e16 100644 --- a/hw/lance.c +++ b/hw/lance.c @@ -93,7 +93,7 @@ static void lance_cleanup(VLANClientState *nc) } static NetClientInfo net_lance_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = pcnet_can_receive, .receive = pcnet_receive, diff --git a/hw/mcf_fec.c b/hw/mcf_fec.c index ae37bef..4ab4ff5 100644 --- a/hw/mcf_fec.c +++ b/hw/mcf_fec.c @@ -450,7 +450,7 @@ static void mcf_fec_cleanup(VLANClientState *nc) } static NetClientInfo net_mcf_fec_info = { -.type = NET_CLIENT_TYPE_NIC, +.type = NET_CLIENT_OPTIONS_KIND_NIC, .size = sizeof(NICState), .can_receive = mcf_fec_can_receive, .receive = mcf_fec_receive, diff --git a/hw/milkymist-minimac2.c b/hw/milkymist-minimac2.c index 70bf336..3924b83 100644 --- a/hw/milkymist-minimac2.c +++ b/hw/milkymist-minimac2.c @@ -448,7 +448,7 @@ static void milkymist_minimac2_reset(DeviceState *d) } static NetClientInfo
[Qemu-devel] [PATCH v5 0/3] refactor PC machine, i440fx and piix3 to take advantage of QOM
This series aggressively refactors the PC machine initialization to be more modelled and less ad-hoc. The highlights of this series are: 1) Things like -m and -bios-name are now device model properties 2) The i440fx and piix3 are now modelled in a thorough fashion 3) i440fx_init is trivialized to creating devices and setting properties 4) convert PCI host bridge to QOM The point (3) is the most important one. As we refactor in this fashion, we should quickly get to the point where machine-init disappears completely in favor of just creating a handful of devices. The two stage initialization of QOM is important here. instance_init() is when composed devices are created which means that after you've created a device, all of its children are visible in the device model. This lets you set properties of the parent and its children. realize() (which is still called DeviceState::init today) will be called right before the guest starts up for the first time. Signed-off-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Wanpeng Li liw...@linux.vnet.ibm.com Change in v5: * drop patch convert MemoryRegion to QOM and prepare to create HPET, RTC and i8254 through composition * add Andreas' recent attempt against pci_host Change in v4: *rebase patchset Changes in v3: * fix coding style issues * fix rebase error * add changes log Changes in v2: * Rebase patch series of i440fx in Anthony's qom-rebase.12 branch to upstream * convert MemoryRegion to QOM * convert pci_host to QOM Anthony Liguori (3): eliminate piix_pci.c and module i440fx and piix3 merge pc_piix.c to pc.c convert pci-host to QOM hw/i386/Makefile.objs |3 +- hw/i440fx.c | 434 ++ hw/i440fx.h | 77 ++ hw/pc.c | 695 + hw/pc.h | 46 +--- hw/pc_piix.c | 661 -- hw/pci_host.c | 14 + hw/pci_host.h |2 + hw/piix3.c| 234 + hw/piix3.h| 69 + hw/piix_pci.c | 599 -- 11 files changed, 1481 insertions(+), 1353 deletions(-) create mode 100644 hw/i440fx.c create mode 100644 hw/i440fx.h delete mode 100644 hw/pc_piix.c create mode 100644 hw/piix3.c create mode 100644 hw/piix3.h delete mode 100644 hw/piix_pci.c -- 1.7.7.6
[Qemu-devel] [PATCH v5 3/3] convert pci-host to QOM
From: Anthony Liguori aligu...@us.ibm.com makes pci_host a proper QOM type. Changelog: * against Andreas pci_host branch * make host bridge TypeInfos const * use PCI_HOST_BRIDGE() where appropriate Signed-off-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Wanpeng Li liw...@linux.vnet.ibm.com --- hw/i440fx.c |6 +++--- hw/pc.c |2 +- hw/pci_host.c | 14 ++ hw/pci_host.h |2 ++ hw/piix3.c|4 ++-- 5 files changed, 22 insertions(+), 6 deletions(-) diff --git a/hw/i440fx.c b/hw/i440fx.c index 720a25a..fdf040b 100644 --- a/hw/i440fx.c +++ b/hw/i440fx.c @@ -191,7 +191,7 @@ static const VMStateDescription vmstate_i440fx_pmc = { static int i440fx_realize(SysBusDevice *dev) { I440FXState *s = I440FX(dev); -PCIHostState *h = PCI_HOST(s); +PCIHostState *h = PCI_HOST_BRIDGE(s); int bios_size, isa_bios_size; char *filename; int ret; @@ -401,7 +401,7 @@ static void i440fx_pmc_class_init(ObjectClass *klass, void *data) dc-vmsd = vmstate_i440fx_pmc; } -static TypeInfo i440fx_pmc_info = { +static const TypeInfo i440fx_pmc_info = { .name = TYPE_I440FX_PMC, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(I440FXPMCState), @@ -418,7 +418,7 @@ static void i440fx_class_init(ObjectClass *klass, void *data) dc-no_user = 1; } -static TypeInfo i440fx_info = { +static const TypeInfo i440fx_info = { .name = TYPE_I440FX, .parent= TYPE_PCI_HOST_BRIDGE, .instance_size = sizeof(I440FXState), diff --git a/hw/pc.c b/hw/pc.c index d9a0443..f095109 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -1217,7 +1217,7 @@ static PCIBus *i440fx_init(I440FXPMCState **pi440fx_state, int *piix3_devfn, PCIHostState *h; s = I440FX(object_new(TYPE_I440FX)); -h = PCI_HOST(s); +h = PCI_HOST_BRIDGE(s); /* FIXME make a properties */ h-address_space = address_space_mem; diff --git a/hw/pci_host.c b/hw/pci_host.c index 3950e94..4e10042 100644 --- a/hw/pci_host.c +++ b/hw/pci_host.c @@ -165,11 +165,25 @@ const MemoryRegionOps pci_host_data_be_ops = { .endianness = DEVICE_BIG_ENDIAN, }; +void pci_host_set_mmio(PCIHostState *s, MemoryRegion *value) +{ +object_property_set_link(OBJECT(s), OBJECT(value), mmio, NULL); +} + +static void pci_host_initfn(Object *obj) +{ +PCIHostState *s = PCI_HOST_BRIDGE(obj); + +object_property_add_link(obj, mmio, memory-region, +(Object **)s-address_space, NULL); +} + static const TypeInfo pci_host_type_info = { .name = TYPE_PCI_HOST_BRIDGE, .parent = TYPE_SYS_BUS_DEVICE, .abstract = true, .instance_size = sizeof(PCIHostState), +.instance_init = pci_host_initfn, }; static void pci_host_register_types(void) diff --git a/hw/pci_host.h b/hw/pci_host.h index 4b9c300..9f28728 100644 --- a/hw/pci_host.h +++ b/hw/pci_host.h @@ -54,6 +54,8 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr, void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len); uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len); +void pci_host_set_mmio(PCIHostState *s, MemoryRegion *value); + extern const MemoryRegionOps pci_host_conf_le_ops; extern const MemoryRegionOps pci_host_conf_be_ops; extern const MemoryRegionOps pci_host_data_le_ops; diff --git a/hw/piix3.c b/hw/piix3.c index eca6ec8..3b69b15 100644 --- a/hw/piix3.c +++ b/hw/piix3.c @@ -204,7 +204,7 @@ static void piix3_class_init(ObjectClass *klass, void *data) k-class_id = PCI_CLASS_BRIDGE_ISA; } -static TypeInfo piix3_info = { +static const TypeInfo piix3_info = { .name = TYPE_PIIX3, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(PIIX3State), @@ -219,7 +219,7 @@ static void piix3_xen_class_init(ObjectClass *klass, void *data) k-config_write = piix3_write_config_xen; }; -static TypeInfo piix3_xen_info = { +static const TypeInfo piix3_xen_info = { .name = PIIX3-xen, .parent= TYPE_PIIX3, .instance_size = sizeof(PIIX3State), -- 1.7.5.4
[Qemu-devel] [PATCH v5 2/3] merge pc_piix.c to pc.c
From: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Wanpeng Li liw...@linux.vnet.ibm.com --- hw/i386/Makefile.objs |1 - hw/pc.c | 695 + hw/pc.h | 46 +--- hw/pc_piix.c | 661 -- 4 files changed, 650 insertions(+), 753 deletions(-) delete mode 100644 hw/pc_piix.c diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index 49b32d0..868020c 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -4,7 +4,6 @@ obj-y += sga.o ioapic_common.o ioapic.o i440fx.o piix3.o obj-y += vmport.o obj-y += pci-hotplug.o smbios.o wdt_ib700.o obj-y += debugcon.o multiboot.o -obj-y += pc_piix.o obj-y += pc_sysfw.o obj-$(CONFIG_XEN) += xen_platform.o xen_apic.o obj-$(CONFIG_XEN_PCI_PASSTHROUGH) += xen-host-pci-device.o diff --git a/hw/pc.c b/hw/pc.c index 598267a..d9a0443 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -27,6 +27,7 @@ #include fdc.h #include ide.h #include pci.h +#include usb.h #include vmware_vga.h #include monitor.h #include fw_cfg.h @@ -48,7 +49,10 @@ #include ui/qemu-spice.h #include memory.h #include exec-memory.h +#include kvm/clock.h #include arch_init.h +#include smbus.h +#include boards.h /* output Bochs bios info messages */ //#define DEBUG_BIOS @@ -76,6 +80,8 @@ #define E820_NR_ENTRIES16 +#define MAX_IDE_BUS 2 + struct e820_entry { uint64_t address; uint64_t length; @@ -87,10 +93,14 @@ struct e820_table { struct e820_entry entry[E820_NR_ENTRIES]; } QEMU_PACKED __attribute((__aligned__(4))); +static const int ide_iobase[MAX_IDE_BUS] = { 0x1f0, 0x170 }; +static const int ide_iobase2[MAX_IDE_BUS] = { 0x3f6, 0x376 }; +static const int ide_irq[MAX_IDE_BUS] = { 14, 15 }; + static struct e820_table e820_table; struct hpet_fw_config hpet_cfg = {.count = UINT8_MAX}; -void gsi_handler(void *opaque, int n, int level) +static void gsi_handler(void *opaque, int n, int level) { GSIState *s = opaque; @@ -108,7 +118,7 @@ static void ioport80_write(void *opaque, uint32_t addr, uint32_t data) /* MSDOS compatibility mode FPU exception support */ static qemu_irq ferr_irq; -void pc_register_ferr_irq(qemu_irq irq) +static void pc_register_ferr_irq(qemu_irq irq) { ferr_irq = irq; } @@ -323,7 +333,7 @@ static void pc_cmos_init_late(void *opaque) qemu_unregister_reset(pc_cmos_init_late, opaque); } -void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size, +static void pc_cmos_init(ram_addr_t ram_size, ram_addr_t above_4g_mem_size, const char *boot_device, ISADevice *floppy, BusState *idebus0, BusState *idebus1, ISADevice *s) @@ -846,7 +856,7 @@ static const int ne2000_irq[NE2000_NB_MAX] = { 9, 10, 11, 3, 4, 5 }; static const int parallel_io[MAX_PARALLEL_PORTS] = { 0x378, 0x278, 0x3bc }; static const int parallel_irq[MAX_PARALLEL_PORTS] = { 7, 7, 7 }; -void pc_init_ne2k_isa(ISABus *bus, NICInfo *nd) +static void pc_init_ne2k_isa(ISABus *bus, NICInfo *nd) { static int nb_ne2k = 0; @@ -901,7 +911,7 @@ static DeviceState *apic_init(void *env, uint8_t apic_id) return dev; } -void pc_acpi_smi_interrupt(void *opaque, int irq, int level) +static void pc_acpi_smi_interrupt(void *opaque, int irq, int level) { CPUX86State *s = opaque; @@ -938,7 +948,7 @@ static X86CPU *pc_new_cpu(const char *cpu_model) return cpu; } -void pc_cpus_init(const char *cpu_model) +static void pc_cpus_init(const char *cpu_model) { int i; @@ -956,55 +966,18 @@ void pc_cpus_init(const char *cpu_model) } } -void *pc_memory_init(MemoryRegion *system_memory, +static void *pc_memory_init(MemoryRegion *system_memory, const char *kernel_filename, const char *kernel_cmdline, const char *initrd_filename, ram_addr_t below_4g_mem_size, -ram_addr_t above_4g_mem_size, -MemoryRegion *rom_memory, -MemoryRegion **ram_memory) +ram_addr_t above_4g_mem_size) { int linux_boot, i; -MemoryRegion *ram, *option_rom_mr; -MemoryRegion *ram_below_4g, *ram_above_4g; void *fw_cfg; linux_boot = (kernel_filename != NULL); -/* Allocate RAM. We allocate it as a single memory region and use - * aliases to address portions of it, mostly for backwards compatibility - * with older qemus that used qemu_ram_alloc(). - */ -ram = g_malloc(sizeof(*ram)); -memory_region_init_ram(ram, pc.ram, - below_4g_mem_size + above_4g_mem_size); -vmstate_register_ram_global(ram); -*ram_memory = ram; -ram_below_4g = g_malloc(sizeof(*ram_below_4g)); -memory_region_init_alias(ram_below_4g, ram-below-4g, ram, - 0,
[Qemu-devel] [PATCH v5 1/3] eliminate piix_pci.c and module i440fx and piix3
From: Anthony Liguori aligu...@us.ibm.com The big picture about the patch is shown as follows: 1) pc_init creates an I440FX, any bus devices (ISA serial port, PCI vga and nics, etc.), sets properties appropriately, and realizes the devices. 2) I440FX is-a PCIHost, has-a I440FX-PMC, has-a PIIX3 3) PIIX3 has-a RTC, has-a I8042, has-a DMAController, etc. i440fx-pcihost = i440fx i440fx = i440fx-pmc i440fx_pmc is Programmable Memory Controller which integrated in I440FX chipset, and move ram initialization into i440fx-pmc. It might seem like a small change, but it better reflects the fact that the PMC is contained within the i440fx which we will now reflect in composition in the next few changesets. Signed-off-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Wanpeng Li liw...@linux.vnet.ibm.com --- hw/i386/Makefile.objs |2 +- hw/i440fx.c | 434 +++ hw/i440fx.h | 77 +++ hw/piix3.c| 234 +++ hw/piix3.h| 69 ++ hw/piix_pci.c | 599 - 6 files changed, 815 insertions(+), 600 deletions(-) create mode 100644 hw/i440fx.c create mode 100644 hw/i440fx.h create mode 100644 hw/piix3.c create mode 100644 hw/piix3.h delete mode 100644 hw/piix_pci.c diff --git a/hw/i386/Makefile.objs b/hw/i386/Makefile.objs index 8c764bb..49b32d0 100644 --- a/hw/i386/Makefile.objs +++ b/hw/i386/Makefile.objs @@ -1,6 +1,6 @@ obj-y += mc146818rtc.o pc.o obj-y += apic_common.o apic.o kvmvapic.o -obj-y += sga.o ioapic_common.o ioapic.o piix_pci.o +obj-y += sga.o ioapic_common.o ioapic.o i440fx.o piix3.o obj-y += vmport.o obj-y += pci-hotplug.o smbios.o wdt_ib700.o obj-y += debugcon.o multiboot.o diff --git a/hw/i440fx.c b/hw/i440fx.c new file mode 100644 index 000..720a25a --- /dev/null +++ b/hw/i440fx.c @@ -0,0 +1,434 @@ +/* + * QEMU i440FX PCI Host Bridge Emulation + * + * Copyright (c) 2006 Fabrice Bellard + * + * Permission is hereby granted, free of charge, to any person obtaining a copy + * of this software and associated documentation files (the Software), to deal + * in the Software without restriction, including without limitation the rights + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell + * copies of the Software, and to permit persons to whom the Software is + * furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice shall be included in + * all copies or substantial portions of the Software. + * + * THE SOFTWARE IS PROVIDED AS IS, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN + * THE SOFTWARE. + */ + +#include i440fx.h +#include range.h +#include xen.h +#include loader.h +#include pc.h + +#define BIOS_FILENAME bios.bin + +/* + * I440FX chipset data sheet. + * http://download.intel.com/design/chipsets/datashts/29054901.pdf + * + * The I440FX is a package that contains an integrated PCI Host controller, + * memory controller, and is usually packaged with a PCI-ISA bus and super I/O + * chipset. + * + * The i440FX device is the PCI host controller. On function 0.0, there is a + * memory controller called the Programmable Memory Controller (PMC). On + * function 1.0, there is the PCI-ISA bus/super I/O chip called the PIIX3. + */ + +#define I440FX_PMC_PCI_HOLE 0xE000ULL +#define I440FX_PMC_PCI_HOLE_END 0x1ULL + +#define I440FX_PAM 0x59 +#define I440FX_PAM_SIZE 7 +#define I440FX_SMRAM0x72 + +static void piix3_set_irq(void *opaque, int pirq, int level) +{ +PIIX3State *piix3 = opaque; +piix3_set_irq_level(piix3, pirq, level); +} + +/* + * return the global irq number corresponding to a given device irq + * pin. We could also use the bus number to have a more precise + * mapping. + */ +static int pci_slot_get_pirq(PCIDevice *pci_dev, int pci_intx) +{ +int slot_addend; +slot_addend = (pci_dev-devfn 3) - 1; +return (pci_intx + slot_addend) 3; +} + +static void update_pam(I440FXPMCState *d, uint32_t start, uint32_t end, int r, + PAMMemoryRegion *mem) +{ +if (mem-initialized) { +memory_region_del_subregion(d-system_memory, mem-mem); +memory_region_destroy(mem-mem); +} + +switch (r) { +case 3: +/* RAM */ +memory_region_init_alias(mem-mem, pam-ram, d-ram_memory, + start, end - start); +break; +case 1: +/* ROM (XXX: not quite correct) */ +memory_region_init_alias(mem-mem, pam-rom, d-ram_memory, +
Re: [Qemu-devel] [PATCH] check for available room when formatting OpenFirmware device path
On 23 July 2012 13:34, Peter Maydell peter.mayd...@linaro.org wrote: On 23 July 2012 12:33, Laszlo Ersek ler...@redhat.com wrote: Signed-off-by: Laszlo Ersek ler...@redhat.com I think it would be much nicer to just rewrite qdev_get_fw_dev_path so we weren't trying to fill the path into a fixed string buffer at all. Here is an entirely untested implementation: char *qdev_get_fw_dev_path(DeviceState *dev) { char *path; char **strarray; int depth = 0; DeviceState *d = dev; for (d = dev; d d-parent_bus; d = d-parent_bus-parent) { depth++; } depth++; strarray = g_new(char*, depth); for (d = dev; d d-parent_bus; d = d-parent_bus-parent) { depth--; strarray[depth] = bus_get_fw_dev_path(dev-parent_bus, dev); d not dev here and in the line below, obviously. I said it was untested :-) if (!strarray[depth]) { strarray[depth] = g_strdup(object_get_typename(OBJECT(dev))); } } strarray[0] = g_strdup(); path = g_strjoinv(/, strarray); g_strfreev(strarray); return path; } -- PMM
[Qemu-devel] [PATCH 08/19] qapi schema: remove trailing whitespace
From: Laszlo Ersek ler...@redhat.com Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- qapi-schema.json | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/qapi-schema.json b/qapi-schema.json index a92adb1..d2f8e02 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -343,7 +343,7 @@ # @CPU: the index of the virtual CPU # # @current: this only exists for backwards compatible and should be ignored -# +# # @halted: true if the virtual CPU is in the halt state. Halt usually refers # to a processor specific low power mode. # @@ -686,7 +686,7 @@ # @SpiceInfo # # Information about the SPICE session. -# +# # @enabled: true if the SPICE server is enabled, false otherwise # # @host: #optional The hostname the SPICE server is bound to. This depends on @@ -1297,7 +1297,7 @@ ## { 'command': 'human-monitor-command', 'data': {'command-line': 'str', '*cpu-index': 'int'}, - 'returns': 'str' } + 'returns': 'str' } ## # @migrate_cancel @@ -1458,7 +1458,7 @@ # @password: the new password # # @connected: #optional how to handle existing clients when changing the -# password. If nothing is specified, defaults to `keep' +# password. If nothing is specified, defaults to `keep' # `fail' to fail the command if clients are connected # `disconnect' to disconnect existing clients # `keep' to maintain existing clients @@ -1598,7 +1598,7 @@ # If the argument combination is invalid, InvalidParameterCombination # # Since: 1.1 -## +## { 'command': 'block_set_io_throttle', 'data': { 'device': 'str', 'bps': 'int', 'bps_rd': 'int', 'bps_wr': 'int', 'iops': 'int', 'iops_rd': 'int', 'iops_wr': 'int' } } -- 1.7.10.4
Re: [Qemu-devel] [PATCH 01/16] net: Add a hub net client
Two hairs to split: On 07/20/12 14:01, Stefan Hajnoczi wrote: +static NetHubPort *net_hub_port_new(NetHub *hub, const char *name) +{ +VLANClientState *nc; +NetHubPort *port; +unsigned int id = hub-num_ports++; There are projects that don't like to put logic or externally visible side-effects into initializers. I don't know about qemu. diff --git a/qapi-schema.json b/qapi-schema.json index bc55ed2..6618eb5 100644 --- a/qapi-schema.json +++ b/qapi-schema.json @@ -2094,6 +2094,19 @@ '*helper': 'str' } } ## +# @NetdevHubPortOptions +# +# Connect two or more net clients through a software hub. +# +# @hubid: hub identifier number +# +# Since 1.2 +## +{ 'type': 'NetdevHubPortOptions', + 'data': { +'hubid': 'int' } } I think this should say 'uint32'. Thanks, Laszlo
[Qemu-devel] [PATCH 14/19] convert net_init_slirp() to NetClientOptions
From: Laszlo Ersek ler...@redhat.com Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net/slirp.c | 93 --- 1 file changed, 25 insertions(+), 68 deletions(-) diff --git a/net/slirp.c b/net/slirp.c index 1243d43..44b059f 100644 --- a/net/slirp.c +++ b/net/slirp.c @@ -686,89 +686,46 @@ void do_info_usernet(Monitor *mon) } } -static int net_init_slirp_configs(const char *name, const char *value, void *opaque) +static void +net_init_slirp_configs(const StringList *fwd, int flags) { -struct slirp_config_str *config; - -if (strcmp(name, hostfwd) != 0 strcmp(name, guestfwd) != 0) { -return 0; -} - -config = g_malloc0(sizeof(*config)); +while (fwd) { +struct slirp_config_str *config; -pstrcpy(config-str, sizeof(config-str), value); +config = g_malloc0(sizeof(*config)); +pstrcpy(config-str, sizeof(config-str), fwd-value-str); +config-flags = flags; +config-next = slirp_configs; +slirp_configs = config; -if (!strcmp(name, hostfwd)) { -config-flags = SLIRP_CFG_HOSTFWD; +fwd = fwd-next; } - -config-next = slirp_configs; -slirp_configs = config; - -return 0; } -int net_init_slirp(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_slirp(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { struct slirp_config_str *config; -const char *vhost; -const char *vhostname; -const char *vdhcp_start; -const char *vnamesrv; -const char *tftp_export; -const char *bootfile; -const char *smb_export; -const char *vsmbsrv; -const char *restrict_opt; -char *vnet = NULL; -int restricted = 0; +char *vnet; int ret; +const NetdevUserOptions *user; -vhost = qemu_opt_get(opts, host); -vhostname = qemu_opt_get(opts, hostname); -vdhcp_start = qemu_opt_get(opts, dhcpstart); -vnamesrv= qemu_opt_get(opts, dns); -tftp_export = qemu_opt_get(opts, tftp); -bootfile= qemu_opt_get(opts, bootfile); -smb_export = qemu_opt_get(opts, smb); -vsmbsrv = qemu_opt_get(opts, smbserver); - -restrict_opt = qemu_opt_get(opts, restrict); -if (restrict_opt) { -if (!strcmp(restrict_opt, on) || -!strcmp(restrict_opt, yes) || !strcmp(restrict_opt, y)) { -restricted = 1; -} else if (strcmp(restrict_opt, off) -strcmp(restrict_opt, no) strcmp(restrict_opt, n)) { -error_report(invalid option: 'restrict=%s', restrict_opt); -return -1; -} -} - -if (qemu_opt_get(opts, ip)) { -const char *ip = qemu_opt_get(opts, ip); -int l = strlen(ip) + strlen(/24) + 1; +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_USER); +user = opts-user; -vnet = g_malloc(l); +vnet = user-has_net ? g_strdup(user-net) : + user-has_ip ? g_strdup_printf(%s/24, user-ip) : + NULL; -/* emulate legacy ip= parameter */ -pstrcpy(vnet, l, ip); -pstrcat(vnet, l, /24); -} - -if (qemu_opt_get(opts, net)) { -if (vnet) { -g_free(vnet); -} -vnet = g_strdup(qemu_opt_get(opts, net)); -} +/* all optional fields are initialized to all bits zero */ -qemu_opt_foreach(opts, net_init_slirp_configs, NULL, 0); +net_init_slirp_configs(user-hostfwd, SLIRP_CFG_HOSTFWD); +net_init_slirp_configs(user-guestfwd, 0); -ret = net_slirp_init(vlan, user, name, restricted, vnet, vhost, - vhostname, tftp_export, bootfile, vdhcp_start, - vnamesrv, smb_export, vsmbsrv); +ret = net_slirp_init(vlan, user, name, user-restrict, vnet, user-host, + user-hostname, user-tftp, user-bootfile, + user-dhcpstart, user-dns, user-smb, + user-smbserver); while (slirp_configs) { config = slirp_configs; -- 1.7.10.4
[Qemu-devel] [PATCH 05/19] qapi: introduce size type
From: Laszlo Ersek ler...@redhat.com v1-v2: - fall back to uint64 rather than int Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- qapi/qapi-visit-core.c |7 +++ qapi/qapi-visit-core.h |3 +++ scripts/qapi.py|2 ++ 3 files changed, 12 insertions(+) diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c index d41595e..7a82b637 100644 --- a/qapi/qapi-visit-core.c +++ b/qapi/qapi-visit-core.c @@ -234,6 +234,13 @@ void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp) } } +void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +(v-type_size ? v-type_size : v-type_uint64)(v, obj, name, errp); +} +} + void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp) { if (!error_is_set(errp)) { diff --git a/qapi/qapi-visit-core.h b/qapi/qapi-visit-core.h index a19d70c..60aceda 100644 --- a/qapi/qapi-visit-core.h +++ b/qapi/qapi-visit-core.h @@ -60,6 +60,8 @@ struct Visitor void (*type_int16)(Visitor *v, int16_t *obj, const char *name, Error **errp); void (*type_int32)(Visitor *v, int32_t *obj, const char *name, Error **errp); void (*type_int64)(Visitor *v, int64_t *obj, const char *name, Error **errp); +/* visit_type_size() falls back to (*type_uint64)() if type_size is unset */ +void (*type_size)(Visitor *v, uint64_t *obj, const char *name, Error **errp); }; void visit_start_handle(Visitor *v, void **obj, const char *kind, @@ -85,6 +87,7 @@ void visit_type_int8(Visitor *v, int8_t *obj, const char *name, Error **errp); void visit_type_int16(Visitor *v, int16_t *obj, const char *name, Error **errp); void visit_type_int32(Visitor *v, int32_t *obj, const char *name, Error **errp); void visit_type_int64(Visitor *v, int64_t *obj, const char *name, Error **errp); +void visit_type_size(Visitor *v, uint64_t *obj, const char *name, Error **errp); void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp); void visit_type_str(Visitor *v, char **obj, const char *name, Error **errp); void visit_type_number(Visitor *v, double *obj, const char *name, Error **errp); diff --git a/scripts/qapi.py b/scripts/qapi.py index 1292476..8082af3 100644 --- a/scripts/qapi.py +++ b/scripts/qapi.py @@ -163,6 +163,8 @@ def c_type(name): name == 'int64' or name == 'uint8' or name == 'uint16' or name == 'uint32' or name == 'uint64'): return name + '_t' +elif name == 'size': +return 'uint64_t' elif name == 'bool': return 'bool' elif name == 'number': -- 1.7.10.4
[Qemu-devel] [PATCH 01/19] MAINTAINERS: Replace net maintainer Mark McLoughlin with Stefan Hajnoczi
The net subsystem has lacked an active maintainer since 2009. I have built and tested a net-next tree to get the ball rolling again. Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- MAINTAINERS |3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 30ed56d..2d219d2 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -560,9 +560,10 @@ F: monitor.c Network device layer M: Anthony Liguori aligu...@us.ibm.com -M: Mark McLoughlin mar...@redhat.com +M: Stefan Hajnoczi stefa...@linux.vnet.ibm.com S: Maintained F: net/ +T: git git://github.com/stefanha/qemu.git net Network Block Device (NBD) M: Paolo Bonzini pbonz...@redhat.com -- 1.7.10.4
Re: [Qemu-devel] [PATCH v5 3/3] convert pci-host to QOM
Am 23.07.2012 14:35, schrieb Wanpeng Li: From: Anthony Liguori aligu...@us.ibm.com makes pci_host a proper QOM type. Changelog: * against Andreas pci_host branch * make host bridge TypeInfos const * use PCI_HOST_BRIDGE() where appropriate Signed-off-by: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Wanpeng Li liw...@linux.vnet.ibm.com --- hw/i440fx.c |6 +++--- hw/pc.c |2 +- hw/pci_host.c | 14 ++ hw/pci_host.h |2 ++ hw/piix3.c|4 ++-- 5 files changed, 22 insertions(+), 6 deletions(-) diff --git a/hw/i440fx.c b/hw/i440fx.c index 720a25a..fdf040b 100644 --- a/hw/i440fx.c +++ b/hw/i440fx.c @@ -191,7 +191,7 @@ static const VMStateDescription vmstate_i440fx_pmc = { static int i440fx_realize(SysBusDevice *dev) { I440FXState *s = I440FX(dev); -PCIHostState *h = PCI_HOST(s); +PCIHostState *h = PCI_HOST_BRIDGE(s); int bios_size, isa_bios_size; char *filename; int ret; Either there's a miscommunication or a technical error: My branch surely is using PCI_HOST_BRIDGE(), so these PCI_HOST - PCI_HOST_BRIDGE changes look bogus. Did you make sure each patch compiles? @@ -401,7 +401,7 @@ static void i440fx_pmc_class_init(ObjectClass *klass, void *data) dc-vmsd = vmstate_i440fx_pmc; } -static TypeInfo i440fx_pmc_info = { +static const TypeInfo i440fx_pmc_info = { .name = TYPE_I440FX_PMC, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(I440FXPMCState), @@ -418,7 +418,7 @@ static void i440fx_class_init(ObjectClass *klass, void *data) dc-no_user = 1; } -static TypeInfo i440fx_info = { +static const TypeInfo i440fx_info = { .name = TYPE_I440FX, .parent= TYPE_PCI_HOST_BRIDGE, .instance_size = sizeof(I440FXState), Patch 1/3 does not have const, patch 2/3 adds new TypeInfos without const. So my guess is you've not rebased this on my pci-host branch [1] but onto something else? If they're not against master it's advisable to mark patches [PATCH treename xx/nn] btw, for clarity. For the new/changed TypeInfos please add const from the start. Regards, Andreas [1] http://repo.or.cz/w/qemu/afaerber.git/shortlog/refs/heads/pci-host git://repo.or.cz/qemu/afaerber.git pci-host diff --git a/hw/pc.c b/hw/pc.c index d9a0443..f095109 100644 --- a/hw/pc.c +++ b/hw/pc.c @@ -1217,7 +1217,7 @@ static PCIBus *i440fx_init(I440FXPMCState **pi440fx_state, int *piix3_devfn, PCIHostState *h; s = I440FX(object_new(TYPE_I440FX)); -h = PCI_HOST(s); +h = PCI_HOST_BRIDGE(s); /* FIXME make a properties */ h-address_space = address_space_mem; diff --git a/hw/pci_host.c b/hw/pci_host.c index 3950e94..4e10042 100644 --- a/hw/pci_host.c +++ b/hw/pci_host.c @@ -165,11 +165,25 @@ const MemoryRegionOps pci_host_data_be_ops = { .endianness = DEVICE_BIG_ENDIAN, }; +void pci_host_set_mmio(PCIHostState *s, MemoryRegion *value) +{ +object_property_set_link(OBJECT(s), OBJECT(value), mmio, NULL); +} + +static void pci_host_initfn(Object *obj) +{ +PCIHostState *s = PCI_HOST_BRIDGE(obj); + +object_property_add_link(obj, mmio, memory-region, +(Object **)s-address_space, NULL); +} + static const TypeInfo pci_host_type_info = { .name = TYPE_PCI_HOST_BRIDGE, .parent = TYPE_SYS_BUS_DEVICE, .abstract = true, .instance_size = sizeof(PCIHostState), +.instance_init = pci_host_initfn, }; static void pci_host_register_types(void) diff --git a/hw/pci_host.h b/hw/pci_host.h index 4b9c300..9f28728 100644 --- a/hw/pci_host.h +++ b/hw/pci_host.h @@ -54,6 +54,8 @@ uint32_t pci_host_config_read_common(PCIDevice *pci_dev, uint32_t addr, void pci_data_write(PCIBus *s, uint32_t addr, uint32_t val, int len); uint32_t pci_data_read(PCIBus *s, uint32_t addr, int len); +void pci_host_set_mmio(PCIHostState *s, MemoryRegion *value); + extern const MemoryRegionOps pci_host_conf_le_ops; extern const MemoryRegionOps pci_host_conf_be_ops; extern const MemoryRegionOps pci_host_data_le_ops; diff --git a/hw/piix3.c b/hw/piix3.c index eca6ec8..3b69b15 100644 --- a/hw/piix3.c +++ b/hw/piix3.c @@ -204,7 +204,7 @@ static void piix3_class_init(ObjectClass *klass, void *data) k-class_id = PCI_CLASS_BRIDGE_ISA; } -static TypeInfo piix3_info = { +static const TypeInfo piix3_info = { .name = TYPE_PIIX3, .parent= TYPE_PCI_DEVICE, .instance_size = sizeof(PIIX3State), @@ -219,7 +219,7 @@ static void piix3_xen_class_init(ObjectClass *klass, void *data) k-config_write = piix3_write_config_xen; }; -static TypeInfo piix3_xen_info = { +static const TypeInfo piix3_xen_info = { .name = PIIX3-xen, .parent= TYPE_PIIX3, .instance_size = sizeof(PIIX3State), -- SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg,
[Qemu-devel] [PATCH 15/19] convert net_init_socket() to NetClientOptions
From: Laszlo Ersek ler...@redhat.com I reverse engineered the following permissions between the -socket sub-options: fd listen connect mcast udp | localaddr fd x . .. . | . listen . x .. . | . connect. . x. . | . mcast . . .x . | x udp. . .. x | x ---+ localaddr . . .x x x I transformed the code accordingly. The real fix would be to embed fd, listen, connect, mcast and udp in a separate union. However OptsVisitor's enum parser only supports the type=XXX QemuOpt instance as union discriminator. Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net/socket.c | 119 +- 1 file changed, 43 insertions(+), 76 deletions(-) diff --git a/net/socket.c b/net/socket.c index 563447d0..e3cba20 100644 --- a/net/socket.c +++ b/net/socket.c @@ -586,101 +586,68 @@ static int net_socket_udp_init(VLANState *vlan, return 0; } -int net_init_socket(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_socket(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { -if (qemu_opt_get(opts, fd)) { -int fd; +const NetdevSocketOptions *sock; -if (qemu_opt_get(opts, listen) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, mcast) || -qemu_opt_get(opts, localaddr)) { -error_report(listen=, connect=, mcast= and localaddr= is invalid with fd=); -return -1; -} +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_SOCKET); +sock = opts-socket; -fd = net_handle_fd_param(cur_mon, qemu_opt_get(opts, fd)); -if (fd == -1) { -return -1; -} +if (sock-has_fd + sock-has_listen + sock-has_connect + sock-has_mcast + +sock-has_udp != 1) { +error_report(exactly one of fd=, listen=, connect=, mcast= or udp= + is required); +return -1; +} -if (!net_socket_fd_init(vlan, socket, name, fd, 1)) { -return -1; -} -} else if (qemu_opt_get(opts, listen)) { -const char *listen; - -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, mcast) || -qemu_opt_get(opts, localaddr)) { -error_report(fd=, connect=, mcast= and localaddr= is invalid with listen=); -return -1; -} +if (sock-has_localaddr !sock-has_mcast !sock-has_udp) { +error_report(localaddr= is only valid with mcast= or udp=); +return -1; +} -listen = qemu_opt_get(opts, listen); +if (sock-has_fd) { +int fd; -if (net_socket_listen_init(vlan, socket, name, listen) == -1) { -return -1; -} -} else if (qemu_opt_get(opts, connect)) { -const char *connect; - -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, listen) || -qemu_opt_get(opts, mcast) || -qemu_opt_get(opts, localaddr)) { -error_report(fd=, listen=, mcast= and localaddr= is invalid with connect=); +fd = net_handle_fd_param(cur_mon, sock-fd); +if (fd == -1 || !net_socket_fd_init(vlan, socket, name, fd, 1)) { return -1; } +return 0; +} -connect = qemu_opt_get(opts, connect); - -if (net_socket_connect_init(vlan, socket, name, connect) == -1) { +if (sock-has_listen) { +if (net_socket_listen_init(vlan, socket, name, sock-listen) == -1) { return -1; } -} else if (qemu_opt_get(opts, mcast)) { -const char *mcast, *localaddr; +return 0; +} -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, listen)) { -error_report(fd=, connect= and listen= is invalid with mcast=); +if (sock-has_connect) { +if (net_socket_connect_init(vlan, socket, name, sock-connect) == +-1) { return -1; } +return 0; +} -mcast = qemu_opt_get(opts, mcast); -localaddr = qemu_opt_get(opts, localaddr); - -if (net_socket_mcast_init(vlan, socket, name, mcast, localaddr) == -1) { -return -1; -} -} else if (qemu_opt_get(opts, udp)) { -const char *udp, *localaddr; - -if (qemu_opt_get(opts, fd) || -qemu_opt_get(opts, connect) || -qemu_opt_get(opts, listen) || -qemu_opt_get(opts, mcast)) { -error_report(fd=, connect=, listen= - and mcast= is invalid with udp=); +if (sock-has_mcast) { +/* if
[Qemu-devel] [PATCH 17/19] convert net_init_tap() to NetClientOptions
From: Laszlo Ersek ler...@redhat.com Signed-off-by: Laszlo Ersek ler...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- net/tap-aix.c |2 +- net/tap-bsd.c |2 +- net/tap-haiku.c |2 +- net/tap-linux.c |9 +++-- net/tap-solaris.c |2 +- net/tap-win32.c | 11 +++--- net/tap.c | 111 ++--- net/tap.h |2 +- 8 files changed, 71 insertions(+), 70 deletions(-) diff --git a/net/tap-aix.c b/net/tap-aix.c index e19aaba..f27c177 100644 --- a/net/tap-aix.c +++ b/net/tap-aix.c @@ -31,7 +31,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return -1; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-bsd.c b/net/tap-bsd.c index 937a94b..a3b717d 100644 --- a/net/tap-bsd.c +++ b/net/tap-bsd.c @@ -117,7 +117,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return fd; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-haiku.c b/net/tap-haiku.c index 91dda8e..34739d1 100644 --- a/net/tap-haiku.c +++ b/net/tap-haiku.c @@ -31,7 +31,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return -1; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-linux.c b/net/tap-linux.c index 41d581b..c6521be 100644 --- a/net/tap-linux.c +++ b/net/tap-linux.c @@ -98,16 +98,19 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required */ #define TAP_DEFAULT_SNDBUF 0 -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { int sndbuf; -sndbuf = qemu_opt_get_size(opts, sndbuf, TAP_DEFAULT_SNDBUF); +sndbuf = !tap-has_sndbuf ? TAP_DEFAULT_SNDBUF : + tap-sndbuf INT_MAX ? INT_MAX : + tap-sndbuf; + if (!sndbuf) { sndbuf = INT_MAX; } -if (ioctl(fd, TUNSETSNDBUF, sndbuf) == -1 qemu_opt_get(opts, sndbuf)) { +if (ioctl(fd, TUNSETSNDBUF, sndbuf) == -1 tap-has_sndbuf) { error_report(TUNSETSNDBUF ioctl failed: %s, strerror(errno)); return -1; } diff --git a/net/tap-solaris.c b/net/tap-solaris.c index cf76463..5d6ac42 100644 --- a/net/tap-solaris.c +++ b/net/tap-solaris.c @@ -197,7 +197,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr, int vnet_hdr_required return fd; } -int tap_set_sndbuf(int fd, QemuOpts *opts) +int tap_set_sndbuf(int fd, const NetdevTapOptions *tap) { return 0; } diff --git a/net/tap-win32.c b/net/tap-win32.c index b738f45..b6099cd 100644 --- a/net/tap-win32.c +++ b/net/tap-win32.c @@ -699,19 +699,20 @@ static int tap_win32_init(VLANState *vlan, const char *model, return 0; } -int net_init_tap(QemuOpts *opts, const NetClientOptions *new_opts, +int net_init_tap(QemuOpts *old_opts, const NetClientOptions *opts, const char *name, VLANState *vlan) { -const char *ifname; +const NetdevTapOptions *tap; -ifname = qemu_opt_get(opts, ifname); +assert(opts-kind == NET_CLIENT_OPTIONS_KIND_TAP); +tap = opts-tap; -if (!ifname) { +if (!tap-has_ifname) { error_report(tap: no interface name); return -1; } -if (tap_win32_init(vlan, tap, name, ifname) == -1) { +if (tap_win32_init(vlan, tap, name, tap-ifname) == -1) { return -1; } diff --git a/net/tap.c b/net/tap.c index 0fc856c..c5563c0 100644 --- a/net/tap.c +++ b/net/tap.c @@ -548,29 +548,32 @@ int net_init_bridge(QemuOpts *opts, const NetClientOptions *new_opts, return 0; } -static int net_tap_init(QemuOpts *opts, int *vnet_hdr) +static int net_tap_init(const NetdevTapOptions *tap, int *vnet_hdr, +const char *setup_script, char *ifname, +size_t ifname_sz) { int fd, vnet_hdr_required; -char ifname[128] = {0,}; -const char *setup_script; -if (qemu_opt_get(opts, ifname)) { -pstrcpy(ifname, sizeof(ifname), qemu_opt_get(opts, ifname)); +if (tap-has_ifname) { +pstrcpy(ifname, ifname_sz, tap-ifname); +} else { +assert(ifname_sz 0); +ifname[0] = '\0'; } -*vnet_hdr = qemu_opt_get_bool(opts, vnet_hdr, 1); -if (qemu_opt_get(opts, vnet_hdr)) { +if (tap-has_vnet_hdr) { +*vnet_hdr = tap-vnet_hdr; vnet_hdr_required = *vnet_hdr; } else { +*vnet_hdr = 1; vnet_hdr_required = 0; } -TFR(fd = tap_open(ifname, sizeof(ifname), vnet_hdr, vnet_hdr_required)); +TFR(fd = tap_open(ifname, ifname_sz, vnet_hdr, vnet_hdr_required)); if (fd 0) { return -1; } -setup_script
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 23 July 2012 13:26, Avi Kivity a...@redhat.com wrote: Really, irqchip in kernel means asynchronous interrupts - you can inject an interrupt from outside the vcpu thread. Obviously if the vcpu is sleeping you need to wake it up and that pulls in idle management. irqchip for x86 really means the local APIC, which has a synchronous interface with the cpu core. local APIC in kernel really is equivalent to kernel idle management, KVM_IRQ_LINE, and irqfd. The ioapic and pit, on the other hand, don't contribute anything to this (just performance). So yes, ARM with and without GIC are irqchip_in_kernel, since the ARM-GIC interface is asynchronous. Whether to emulate the GIC or not is just a performance question. So should we be using something other than KVM_CREATE_IRQCHIP to ask the kernel to create a GIC model for us (and leave KVM_CREATE_IRQCHIP as a dummy always succeed ioctl)? So my view is that ARM with and without kernel GIC are irqchip_in_kernel, since whatever is the local APIC in ARM is always emulated in the kernel. I'm not sure ARM has any equivalent to the local APIC -- the GIC deals with everything and we don't have any equivalent division of labour to the x86 LAPIC-IOAPIC one. -- PMM
Re: [Qemu-devel] [PATCHv4 0/4] Sandboxing Qemu guests with Libseccomp
On Tue, Jul 17, 2012 at 04:19:11PM -0300, Eduardo Otubo wrote: Hello all, This patch is an effort to sandbox Qemu guests using Libseccomp[0]. The patches that follows are pretty simple and straightforward. I added the correct options and checks to the configure script and the basic calls to libseccomp in the main loop at vl.c. Details of each one are in the emails of the patch set. This support limits the system call footprint of the entire QEMU process to a limited set of syscalls, those that we know QEMU uses. The idea is to limit the allowable syscalls, therefore limiting the impact that an attacked guest could have on the host system. It's important to note that the libseccomp itself needs the seccomp mode 2 feature in the kernel, which is only available in kernel versions older (or equal) than 3.5-rc1. v2: Files separated in qemu-seccomp.c and qemu-seccomp.h for a cleaner implementation. The development was tested with the 3.5-rc1 kernel. v3: As we discussed in previous emails in this mailing list, this feature is not supposed to replace existing security feature, but add another layer to the whole. The whitelist should contain all the syscalls QEMU needs. And as stated by Will Drewry's commit message[1]: Filter programs will be inherited across fork/clone and execve., the same white list should be passed along from the father process to the child, then execve() shouldn't be a problem. Note that there's a feature PR_SET_NO_NEW_PRIVS in seccomp mode 2 in the kernel, this prevents processes from gaining privileges on execve. For example, this will prevent qemu (if running unprivileged) from executing setuid programs[2]. v4: Introducing debug mode on libseccomp support. The debug mode will set the flag SCMP_ACT_TRAP when calling seccomp_start(). It will verbosely print a message to the stderr in the form seccomp: illegal system call execution trapped: XXX and resume the execution. This is really just used as debug mode, it helps users and developers to full fill the whitelist. As always, comments are more than welcome. Hello folks, Does anyone got a chance to take a look at these? Thanks in advance :) Regards, [0] - http://sourceforge.net/projects/libseccomp/ [1] - http://git.kernel.org/?p=linux/kernel/git/next/linux-next.git;a=commit;h=e2cfabdfd075648216f99c2c03821cf3f47c1727 [2] - https://lkml.org/lkml/2012/4/12/457 Eduardo Otubo (4): Adding support for libseccomp in configure and Makefile Adding qemu-seccomp.[ch] Adding qemu-seccomp-debug.[ch] Adding seccomp calls to vl.c Makefile.objs| 10 configure| 34 ++ qemu-seccomp-debug.c | 95 + qemu-seccomp-debug.h | 38 +++ qemu-seccomp.c | 126 ++ qemu-seccomp.h | 22 + vl.c | 31 + 7 files changed, 356 insertions(+) create mode 100644 qemu-seccomp-debug.c create mode 100644 qemu-seccomp-debug.h create mode 100644 qemu-seccomp.c create mode 100644 qemu-seccomp.h -- 1.7.9.5 -- Eduardo Otubo Software Engineer Linux Technology Center IBM Systems Technology Group Mobile: +55 19 8135 0885 eot...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On Mon, 23 Jul 2012 15:18:49 +0300 Avi Kivity a...@redhat.com wrote: So, for example, if a specific subchannel (=device) has pending status and an I/O interrupt is to be generated, this interrupt remains pending until an arbitrary cpu is enabled for I/O interrupts. If several cpus are enabled for I/O interrupts, any of them may be interrupted. This may be costly to emulate. On x86 we do not have access to a guest's interrupt status while it is running. Is this not the case for s390? Oh, let me guess. You write some interrupt descriptor in memory somewhere, issue one of your famous instructions, and the hardware finds a guest vcpu and injects the interrupt. Basically, we have some flags in our control block we can set so that the cpu drops out of SIE whenever external/I/O/... interrupts are enabled and then have the host do the lowcore updates, psw swaps, etc. x86 has a least priority mode which is similar to what you're describing, but I don't think we emulate it correctly. When an I/O interrupt is delivered on a cpu, the cpu's lowcore contains the interrupt payload which defines the subchannel (=device) the interrupt is for. Any idea on how this architecture can be married with the irqchip concept is welcome. If all else fails, would a special irqfd concept for !irqchip be acceptable? I don't see an issue. You need an arch-specific irqfd configuration ioctl (or your own data field in the existing ioctl) that defines the payload. Then the kernel installs a poll function on that eventfd that, when called, does the magic sequence needed to get the interrupt there. If extending the existing ioctl is acceptable, I think we will go that route. While you don't have an irqchip, you do have asynchronous interrupt injection, yes? That's what irqchip really is all about. You mean injection via ioctl() that is asynchronous to vcpu execution? Yes, although we use a different ioctl than the others. Cornelia
[Qemu-devel] [PATCH V3 2/3] qerror: Add error telling that streaming blocks migration
From: Benoît Canet ben...@irqsave.net Signed-off-by: Benoit Canet ben...@irqsave.net --- qerror.c |4 qerror.h |3 +++ 2 files changed, 7 insertions(+) diff --git a/qerror.c b/qerror.c index 92c4eff..c7889fe 100644 --- a/qerror.c +++ b/qerror.c @@ -283,6 +283,10 @@ static const QErrorStringTable qerror_table[] = { .desc = Could not set password, }, { +.error_fmt = QERR_MIGRATION_BLOCKED_BY_STREAMING, +.desc = Migration is blocked by streaming, +}, +{ .error_fmt = QERR_TOO_MANY_FILES, .desc = Too many open files, }, diff --git a/qerror.h b/qerror.h index b4c8758..dca27f7 100644 --- a/qerror.h +++ b/qerror.h @@ -233,6 +233,9 @@ QError *qobject_to_qerror(const QObject *obj); #define QERR_SET_PASSWD_FAILED \ { 'class': 'SetPasswdFailed', 'data': {} } +#define QERR_MIGRATION_BLOCKED_BY_STREAMING \ +{ 'class': 'MigrationBlockedByStreaming', 'data': {} } + #define QERR_TOO_MANY_FILES \ { 'class': 'TooManyFiles', 'data': {} } -- 1.7.9.5
[Qemu-devel] [PATCH V3 3/3] migration: block migration when streaming block jobs are running.
From: Benoît Canet ben...@irqsave.net Signed-off-by: Benoit Canet ben...@irqsave.net --- migration.c |5 + 1 file changed, 5 insertions(+) diff --git a/migration.c b/migration.c index 8db1b43..4ffdcf2 100644 --- a/migration.c +++ b/migration.c @@ -425,6 +425,11 @@ void qmp_migrate(const char *uri, bool has_blk, bool blk, return; } +if (bdrv_are_busy()) { +error_set(errp, QERR_MIGRATION_BLOCKED_BY_STREAMING); +return; +} + s = migrate_init(params); if (strstart(uri, tcp:, p)) { -- 1.7.9.5
[Qemu-devel] [PATCH V3 1/3] block: Add bdrv_are_busy()
From: Benoît Canet ben...@irqsave.net bdrv_are_busy will be used to check if any of the bs are in use or if one of them have a running block job. The first user will be qmp_migrate(). Signed-off-by: Benoit Canet ben...@irqsave.net --- block.c | 13 + block.h |2 ++ 2 files changed, 15 insertions(+) diff --git a/block.c b/block.c index ce7eb8f..bc8f160 100644 --- a/block.c +++ b/block.c @@ -4027,6 +4027,19 @@ out: return ret; } +int bdrv_are_busy(void) +{ +BlockDriverState *bs; + +QTAILQ_FOREACH(bs, bdrv_states, list) { +if (bs-job || bdrv_in_use(bs)) { +return -EBUSY; +} +} + +return 0; +} + void *block_job_create(const BlockJobType *job_type, BlockDriverState *bs, int64_t speed, BlockDriverCompletionFunc *cb, void *opaque, Error **errp) diff --git a/block.h b/block.h index c89590d..0a3de2f 100644 --- a/block.h +++ b/block.h @@ -337,6 +337,8 @@ void bdrv_disable_copy_on_read(BlockDriverState *bs); void bdrv_set_in_use(BlockDriverState *bs, int in_use); int bdrv_in_use(BlockDriverState *bs); +int bdrv_are_busy(void); + enum BlockAcctType { BDRV_ACCT_READ, BDRV_ACCT_WRITE, -- 1.7.9.5
[Qemu-devel] [PATCH V3 0/3] Block migration when streaming block jobs are running
From: Benoît Canet ben...@irqsave.net This patchset is designed to avoid starting a live migration while one or more streaming block jobs are running. Tested with the following sequence: QEMU 1.1.50 monitor - type 'help' for more information (qemu) block_stream virtio0 1k (qemu) migrate tcp:localhost: migrate: Migration is blocked by streaming (qemu) block_job_cancel virtio0 (qemu) migrate tcp:localhost: migrate: Connection can not be completed immediately (qemu) = migration then succeed in v2: stefanha: Rename bdrv_have_block_jobs() to bdrv_are_busy() and make it return -EBUSY. paolo: remove spurious bdrv_close() in v3 pm215: rewrite confusing error message Benoît Canet (3): block: Add bdrv_are_busy() qerror: Add error telling that streaming blocks migration migration: block migration when streaming block jobs are running. block.c | 13 + block.h |2 ++ migration.c |5 + qerror.c|4 qerror.h|3 +++ 5 files changed, 27 insertions(+) -- 1.7.9.5
Re: [Qemu-devel] [PATCH] check for available room when formatting OpenFirmware device path
On 07/23/12 14:46, Markus Armbruster wrote: Laszlo Ersek ler...@redhat.com writes: Signed-off-by: Laszlo Ersek ler...@redhat.com --- hw/qdev.c | 14 +- vl.c |7 ++- 2 files changed, 19 insertions(+), 2 deletions(-) diff --git a/hw/qdev.c b/hw/qdev.c index af54467..f1e83a4 100644 --- a/hw/qdev.c +++ b/hw/qdev.c @@ -502,6 +502,10 @@ static int qdev_get_fw_dev_path_helper(DeviceState *dev, char *p, int size) if (dev dev-parent_bus) { char *d; l = qdev_get_fw_dev_path_helper(dev-parent_bus-parent, p, size); +if (l = size) { +return l; +} + d = bus_get_fw_dev_path(dev-parent_bus, dev); if (d) { l += snprintf(p + l, size - l, %s, d); @@ -509,6 +513,10 @@ static int qdev_get_fw_dev_path_helper(DeviceState *dev, char *p, int size) } else { l += snprintf(p + l, size - l, %s, object_get_typename(OBJECT(dev))); } + +if (l = size) { +return l; +} } l += snprintf(p + l , size - l, /); If the return value is less than the size argument, it's the length of the string written into p[]. Else, it means p[] has insufficient space. Yes. (snprintf() returns the number of bytes it would store, excluding the terminating NUL, had there been enough room. http://pubs.opengroup.org/onlinepubs/9699919799/functions/snprintf.html#tag_16_159_04) Did I make a mistake? Supposing snprintf() encounters no error, it returns a positive value P in the above. P = snprintf(..., size - l0, ...) l1 = l0 + P; l1 = size - l0 + P = size - P = size - l0 The return value of qdev_get_fw_dev_path_helper() comes from another invocation of itself, or from the trailing snprintf(), so it behaves like snprintf. Or what do you have in mind? @@ -520,8 +528,12 @@ char* qdev_get_fw_dev_path(DeviceState *dev) char path[128]; int l; -l = qdev_get_fw_dev_path_helper(dev, path, 128); +l = qdev_get_fw_dev_path_helper(dev, path, sizeof(path)); +assert(l 0); +if (l = sizeof(path)) { +return NULL; +} Failure mode could be avoided with the common technique: make qdev_get_fw_dev_path_helper() return the true length. If it fit into path[], return strdup(path). Else, allocate a suitable buffer and try again. What do you think? You're right of course, but I didn't have (or wanted to spend) the time to change the code that much (and to test it...), especially because it would have been fixed already if it had caused actual problems. I think the patch could work as a last resort, small improvement, but I don't mind if someone posts a better fix :) Laszlo
[Qemu-devel] [PATCH 2/2] qxl: add QXL_IO_MONITORS_CONFIG_ASYNC
bumps spice-protocol to 0.12.0 for new IO. revision bumped to 4 for new IO support, enabled for spice-server = 0.11.1 RHBZ: 770842 Signed-off-by: Alon Levy al...@redhat.com --- configure|2 +- hw/qxl.c | 29 - hw/qxl.h |4 trace-events |1 + 4 files changed, 34 insertions(+), 2 deletions(-) diff --git a/configure b/configure index cef0a71..5fcd315 100755 --- a/configure +++ b/configure @@ -2630,7 +2630,7 @@ EOF spice_cflags=$($pkg_config --cflags spice-protocol spice-server 2/dev/null) spice_libs=$($pkg_config --libs spice-protocol spice-server 2/dev/null) if $pkg_config --atleast-version=0.8.2 spice-server /dev/null 21 \ - $pkg_config --atleast-version=0.8.1 spice-protocol /dev/null 21 \ + $pkg_config --atleast-version=0.12.0 spice-protocol /dev/null 21 \ compile_prog $spice_cflags $spice_libs ; then spice=yes libs_softmmu=$libs_softmmu $spice_libs diff --git a/hw/qxl.c b/hw/qxl.c index 3a883ce..5b5d380 100644 --- a/hw/qxl.c +++ b/hw/qxl.c @@ -249,6 +249,18 @@ static void qxl_spice_destroy_surfaces(PCIQXLDevice *qxl, qxl_async_io async) } } +#if SPICE_SERVER_VERSION = 0x000b01 /* 0.11.1 */ +static void qxl_spice_monitors_config_async(PCIQXLDevice *qxl) +{ +trace_qxl_spice_monitors_config(qxl-id); +spice_qxl_monitors_config_async(qxl-ssd.qxl, +qxl-ram-monitors_config, +MEMSLOT_GROUP_GUEST, +(uintptr_t)qxl_cookie_new(QXL_COOKIE_TYPE_IO, + QXL_IO_MONITORS_CONFIG_ASYNC)); +} +#endif + void qxl_spice_reset_image_cache(PCIQXLDevice *qxl) { trace_qxl_spice_reset_image_cache(qxl-id); @@ -538,6 +550,7 @@ static const char *io_port_to_string(uint32_t io_port) = QXL_IO_DESTROY_ALL_SURFACES_ASYNC, [QXL_IO_FLUSH_SURFACES_ASYNC] = QXL_IO_FLUSH_SURFACES_ASYNC, [QXL_IO_FLUSH_RELEASE] = QXL_IO_FLUSH_RELEASE, +[QXL_IO_MONITORS_CONFIG_ASYNC] = QXL_IO_MONITORS_CONFIG_ASYNC, }; return io_port_to_string[io_port]; } @@ -1333,7 +1346,7 @@ static void ioport_write(void *opaque, target_phys_addr_t addr, io_port, io_port_to_string(io_port)); /* be nice to buggy guest drivers */ if (io_port = QXL_IO_UPDATE_AREA_ASYNC -io_port = QXL_IO_DESTROY_ALL_SURFACES_ASYNC) { +io_port = QXL_IO_MONITORS_CONFIG_ASYNC) { qxl_send_events(d, QXL_INTERRUPT_IO_CMD); } return; @@ -1361,6 +1374,9 @@ static void ioport_write(void *opaque, target_phys_addr_t addr, io_port = QXL_IO_DESTROY_ALL_SURFACES; goto async_common; case QXL_IO_FLUSH_SURFACES_ASYNC: +#if SPICE_SERVER_VERSION = 0x000b01 /* 0.11.1 */ +case QXL_IO_MONITORS_CONFIG_ASYNC: +#endif async_common: async = QXL_ASYNC; qemu_mutex_lock(d-async_lock); @@ -1490,6 +1506,11 @@ async_common: d-mode = QXL_MODE_UNDEFINED; qxl_spice_destroy_surfaces(d, async); break; +#if SPICE_SERVER_VERSION = 0x000b01 /* 0.11.1 */ +case QXL_IO_MONITORS_CONFIG_ASYNC: +qxl_spice_monitors_config_async(d); +break; +#endif default: qxl_set_guest_bug(d, %s: unexpected ioport=0x%x\n, __func__, io_port); } @@ -1785,9 +1806,15 @@ static int qxl_init_common(PCIQXLDevice *qxl) io_size = 16; break; case 3: /* qxl-3 */ +pci_device_rev = QXL_REVISION_STABLE_V10; +io_size = 32; /* PCI region size must be pow2 */ +break; +#if SPICE_SERVER_VERSION = 0x000b01 /* 0.11.1 */ +case 4: /* qxl-4 */ pci_device_rev = QXL_DEFAULT_REVISION; io_size = msb_mask(QXL_IO_RANGE_SIZE * 2 - 1); break; +#endif default: error_report(Invalid revision %d for qxl device (max %d), qxl-revision, QXL_DEFAULT_REVISION); diff --git a/hw/qxl.h b/hw/qxl.h index 172baf6..c1aadaa 100644 --- a/hw/qxl.h +++ b/hw/qxl.h @@ -128,7 +128,11 @@ typedef struct PCIQXLDevice { } \ } while (0) +#if SPICE_SERVER_VERSION = 0x000b01 /* 0.11.1 */ +#define QXL_DEFAULT_REVISION QXL_REVISION_STABLE_V12 +#else #define QXL_DEFAULT_REVISION QXL_REVISION_STABLE_V10 +#endif /* qxl.c */ void *qxl_phys2virt(PCIQXLDevice *qxl, QXLPHYSICAL phys, int group_id); diff --git a/trace-events b/trace-events index 2a5f074..3db04ad 100644 --- a/trace-events +++ b/trace-events @@ -954,6 +954,7 @@ qxl_spice_destroy_surfaces(int qid, int async) %d async=%d qxl_spice_destroy_surface_wait_complete(int qid, uint32_t id) %d sid=%d qxl_spice_destroy_surface_wait(int qid, uint32_t id, int async) %d sid=%d async=%d qxl_spice_flush_surfaces_async(int qid, uint32_t surface_count, uint32_t num_free_res) %d s#=%d, res#=%d +qxl_spice_monitors_config(int id) %d qxl_spice_loadvm_commands(int qid, void *ext, uint32_t count) %d ext=%p
[Qemu-devel] [PATCH 1/2] qxl: disallow unknown revisions
Signed-off-by: Alon Levy al...@redhat.com --- hw/qxl.c |5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/hw/qxl.c b/hw/qxl.c index c2dd3b4..3a883ce 100644 --- a/hw/qxl.c +++ b/hw/qxl.c @@ -1785,10 +1785,13 @@ static int qxl_init_common(PCIQXLDevice *qxl) io_size = 16; break; case 3: /* qxl-3 */ -default: pci_device_rev = QXL_DEFAULT_REVISION; io_size = msb_mask(QXL_IO_RANGE_SIZE * 2 - 1); break; +default: +error_report(Invalid revision %d for qxl device (max %d), + qxl-revision, QXL_DEFAULT_REVISION); +return -1; } pci_set_byte(config[PCI_REVISION_ID], pci_device_rev); -- 1.7.10.1
[Qemu-devel] [PATCH v5 5/6] block: Convert close calls to qemu_close
This patch converts all block layer close calls, that correspond to qemu_open calls, to qemu_close. v5: -This patch is new in v5. (kw...@redhat.com, ebl...@redhat.com) Signed-off-by: Corey Bryant cor...@linux.vnet.ibm.com --- block/raw-posix.c | 24 block/raw-win32.c |2 +- block/vmdk.c |4 ++-- block/vpc.c |2 +- block/vvfat.c | 12 ++-- osdep.c |5 + qemu-common.h |1 + savevm.c |4 ++-- 8 files changed, 30 insertions(+), 24 deletions(-) diff --git a/block/raw-posix.c b/block/raw-posix.c index 7408a42..a172de3 100644 --- a/block/raw-posix.c +++ b/block/raw-posix.c @@ -271,7 +271,7 @@ static int raw_open_common(BlockDriverState *bs, const char *filename, out_free_buf: qemu_vfree(s-aligned_buf); out_close: -close(fd); +qemu_close(fd); return -errno; } @@ -376,7 +376,7 @@ static void raw_close(BlockDriverState *bs) { BDRVRawState *s = bs-opaque; if (s-fd = 0) { -close(s-fd); +qemu_close(s-fd); s-fd = -1; if (s-aligned_buf != NULL) qemu_vfree(s-aligned_buf); @@ -580,7 +580,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) if (ftruncate(fd, total_size * BDRV_SECTOR_SIZE) != 0) { result = -errno; } -if (close(fd) != 0) { +if (qemu_close(fd) != 0) { result = -errno; } } @@ -850,7 +850,7 @@ static int hdev_open(BlockDriverState *bs, const char *filename, int flags) if (fd 0) { bsdPath[strlen(bsdPath)-1] = '1'; } else { -close(fd); +qemu_close(fd); } filename = bsdPath; } @@ -889,7 +889,7 @@ static int fd_open(BlockDriverState *bs) last_media_present = (s-fd = 0); if (s-fd = 0 (get_clock() - s-fd_open_time) = FD_OPEN_TIMEOUT) { -close(s-fd); +qemu_close(s-fd); s-fd = -1; #ifdef DEBUG_FLOPPY printf(Floppy closed\n); @@ -988,7 +988,7 @@ static int hdev_create(const char *filename, QEMUOptionParameter *options) else if (lseek(fd, 0, SEEK_END) total_size * BDRV_SECTOR_SIZE) ret = -ENOSPC; -close(fd); +qemu_close(fd); return ret; } @@ -1038,7 +1038,7 @@ static int floppy_open(BlockDriverState *bs, const char *filename, int flags) return ret; /* close fd so that we can reopen it as needed */ -close(s-fd); +qemu_close(s-fd); s-fd = -1; s-fd_media_changed = 1; @@ -1070,7 +1070,7 @@ static int floppy_probe_device(const char *filename) prio = 100; outc: -close(fd); +qemu_close(fd); out: return prio; } @@ -1105,14 +1105,14 @@ static void floppy_eject(BlockDriverState *bs, bool eject_flag) int fd; if (s-fd = 0) { -close(s-fd); +qemu_close(s-fd); s-fd = -1; } fd = qemu_open(bs-filename, s-open_flags | O_NONBLOCK); if (fd = 0) { if (ioctl(fd, FDEJECT, 0) 0) perror(FDEJECT); -close(fd); +qemu_close(fd); } } @@ -1173,7 +1173,7 @@ static int cdrom_probe_device(const char *filename) prio = 100; outc: -close(fd); +qemu_close(fd); out: return prio; } @@ -1281,7 +1281,7 @@ static int cdrom_reopen(BlockDriverState *bs) * FreeBSD seems to not notice sometimes... */ if (s-fd = 0) -close(s-fd); +qemu_close(s-fd); fd = qemu_open(bs-filename, s-open_flags, 0644); if (fd 0) { s-fd = -1; diff --git a/block/raw-win32.c b/block/raw-win32.c index 8d7838d..c56bf83 100644 --- a/block/raw-win32.c +++ b/block/raw-win32.c @@ -261,7 +261,7 @@ static int raw_create(const char *filename, QEMUOptionParameter *options) return -EIO; set_sparse(fd); ftruncate(fd, total_size * 512); -close(fd); +qemu_close(fd); return 0; } diff --git a/block/vmdk.c b/block/vmdk.c index 557dc1b..daee426 100644 --- a/block/vmdk.c +++ b/block/vmdk.c @@ -1258,7 +1258,7 @@ static int vmdk_create_extent(const char *filename, int64_t filesize, ret = 0; exit: -close(fd); +qemu_close(fd); return ret; } @@ -1506,7 +1506,7 @@ static int vmdk_create(const char *filename, QEMUOptionParameter *options) } ret = 0; exit: -close(fd); +qemu_close(fd); return ret; } diff --git a/block/vpc.c b/block/vpc.c index 60ebf5a..c0b82c4 100644 --- a/block/vpc.c +++ b/block/vpc.c @@ -744,7 +744,7 @@ static int vpc_create(const char *filename, QEMUOptionParameter *options) } fail: -close(fd); +qemu_close(fd); return ret; } diff --git a/block/vvfat.c b/block/vvfat.c index 22b586a..59d3c5b 100644 --- a/block/vvfat.c +++ b/block/vvfat.c @@ -1105,7 +1105,7 @@ static inline void vvfat_close_current_file(BDRVVVFATState *s) if(s-current_mapping) {
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 07/23/2012 03:58 PM, Peter Maydell wrote: On 23 July 2012 13:26, Avi Kivity a...@redhat.com wrote: Really, irqchip in kernel means asynchronous interrupts - you can inject an interrupt from outside the vcpu thread. Obviously if the vcpu is sleeping you need to wake it up and that pulls in idle management. irqchip for x86 really means the local APIC, which has a synchronous interface with the cpu core. local APIC in kernel really is equivalent to kernel idle management, KVM_IRQ_LINE, and irqfd. The ioapic and pit, on the other hand, don't contribute anything to this (just performance). So yes, ARM with and without GIC are irqchip_in_kernel, since the ARM-GIC interface is asynchronous. Whether to emulate the GIC or not is just a performance question. So should we be using something other than KVM_CREATE_IRQCHIP to ask the kernel to create a GIC model for us (and leave KVM_CREATE_IRQCHIP as a dummy always succeed ioctl)? Some time ago I suggested using the parameter to KVM_CREATE_IRQCHIP to select the irqchip type. So my view is that ARM with and without kernel GIC are irqchip_in_kernel, since whatever is the local APIC in ARM is always emulated in the kernel. I'm not sure ARM has any equivalent to the local APIC -- the GIC deals with everything and we don't have any equivalent division of labour to the x86 LAPIC-IOAPIC one. It's probably a tiny part of the core with no name. The point is that the x86-lapic interface is synchronous and bidirectional, while the lapic-ioapic interface is asynchronous (it is still bidirectional, but not in a stop-the-vcpu way). I assume the ARM-GIC interface is unidirectional? -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH] kvm: Move kvm_allows_irq0_override() to target-i386
On 07/23/2012 04:06 PM, Cornelia Huck wrote: On Mon, 23 Jul 2012 15:18:49 +0300 Avi Kivity a...@redhat.com wrote: So, for example, if a specific subchannel (=device) has pending status and an I/O interrupt is to be generated, this interrupt remains pending until an arbitrary cpu is enabled for I/O interrupts. If several cpus are enabled for I/O interrupts, any of them may be interrupted. This may be costly to emulate. On x86 we do not have access to a guest's interrupt status while it is running. Is this not the case for s390? Oh, let me guess. You write some interrupt descriptor in memory somewhere, issue one of your famous instructions, and the hardware finds a guest vcpu and injects the interrupt. Basically, we have some flags in our control block we can set so that the cpu drops out of SIE whenever external/I/O/... interrupts are enabled and then have the host do the lowcore updates, psw swaps, etc. Can you write them from a different cpu and expect them to take effect? How do you emulate an interrupt with a large guest? You have to update the flags in the control blocks for all vcpus; then restore them when the interrupt is delivered? I don't see an issue. You need an arch-specific irqfd configuration ioctl (or your own data field in the existing ioctl) that defines the payload. Then the kernel installs a poll function on that eventfd that, when called, does the magic sequence needed to get the interrupt there. If extending the existing ioctl is acceptable, I think we will go that route. Sure. While you don't have an irqchip, you do have asynchronous interrupt injection, yes? That's what irqchip really is all about. You mean injection via ioctl() that is asynchronous to vcpu execution? Yes, although we use a different ioctl than the others. Ok. The difference between irqfd and the ioctl is that with irqfd everything (the payload in your case, the interrupt number for others) is prepared beforehand, while with the ioctl the extra information is delivered with the ioctl. -- error compiling committee.c: too many arguments to function