[Qemu-devel] [PATCH] net: Add the missing option declaration of vhostforce
Signed-off-by: Jason Wang <jasow...@redhat.com>
---
 net.c |    6 +++++-
 1 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/net.c b/net.c
index 9ba5be2..21d4443 100644
--- a/net.c
+++ b/net.c
@@ -1025,7 +1025,11 @@ static const struct {
                 .name = "vhostfd",
                 .type = QEMU_OPT_STRING,
                 .help = "file descriptor of an already opened vhost net device",
-            },
+            }, {
+                .name = "vhostforce",
+                .type = QEMU_OPT_BOOL,
+                .help = "force vhost on for non-MSIX virtio guests",
+            },
 #endif /* _WIN32 */
             { /* end of list */ }
         },
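The patch does not say why the declaration matters. A minimal sketch, assuming a table-driven parser that only accepts names present in a descriptor list: the types and the find_opt() helper below are hypothetical stand-ins, not QEMU's own definitions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for an option descriptor table; QEMU's real
 * types are not reproduced here. */
enum opt_type { OPT_STRING, OPT_BOOL };

struct opt_desc {
    const char *name;   /* NULL marks the end of the list */
    enum opt_type type;
    const char *help;
};

static const struct opt_desc tap_opts[] = {
    { "vhostfd",    OPT_STRING,
      "file descriptor of an already opened vhost net device" },
    { "vhostforce", OPT_BOOL,
      "force vhost on for non-MSIX virtio guests" },
    { NULL, OPT_STRING, NULL }  /* end of list */
};

/* Return the descriptor for `name`, or NULL if it was never declared. */
static const struct opt_desc *find_opt(const struct opt_desc *list,
                                       const char *name)
{
    for (; list->name != NULL; list++) {
        if (strcmp(list->name, name) == 0) {
            return list;
        }
    }
    return NULL;
}
```

Under that assumption, an option that is handled elsewhere in the code but missing from the table would be rejected at lookup time, which is the kind of gap the patch closes.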
Re: [Qemu-devel] [PATCH] mst_fpga: correct irq level settings
On 16 February 2011 14:22, Dmitry Eremin-Solenikov <dbarysh...@gmail.com> wrote:
> Final corrections for IRQ levels that are set by mst_fpga:
> * Don't retranslate IRQ if previously IRQ was masked.
> * After setting or clearing IRQs through register, apply mask
>   before setting parent IRQ level.

Thanks, applied this change. However, to get completely correct
behaviour now, I think we need something like the following. What do
you think? (prev_level is now unused, but the main change is not
masking 1u << irq.)

diff --git a/hw/mst_fpga.c b/hw/mst_fpga.c
index 407bac9..f66de69 100644
--- a/hw/mst_fpga.c
+++ b/hw/mst_fpga.c
@@ -31,7 +31,6 @@ typedef struct mst_irq_state{
     qemu_irq parent;
-    uint32_t prev_level;
     uint32_t leddat1;
     uint32_t leddat2;
     uint32_t ledctrl;
@@ -53,11 +52,6 @@ mst_fpga_set_irq(void *opaque, int irq, int level)
     uint32_t oldint = s->intsetclr & s->intmskena;
 
     if (level)
-        s->prev_level |= 1u << irq;
-    else
-        s->prev_level &= ~(1u << irq);
-
-    if ((s->intmskena & (1u << irq)) && level)
         s->intsetclr |= 1u << irq;
 
     if (oldint != (s->intsetclr & s->intmskena))
@@ -193,12 +187,11 @@ static int mst_fpga_init(SysBusDevice *dev)
 
 static VMStateDescription vmstate_mst_fpga_regs = {
     .name = "mainstone_fpga",
-    .version_id = 0,
-    .minimum_version_id = 0,
-    .minimum_version_id_old = 0,
+    .version_id = 1,
+    .minimum_version_id = 1,
+    .minimum_version_id_old = 1,
     .post_load = mst_fpga_post_load,
     .fields = (VMStateField []) {
-        VMSTATE_UINT32(prev_level, mst_irq_state),
         VMSTATE_UINT32(leddat1, mst_irq_state),
         VMSTATE_UINT32(leddat2, mst_irq_state),
         VMSTATE_UINT32(ledctrl, mst_irq_state),

Cheers
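The behaviour proposed above (latch the pending bit even while the source is masked, and gate only the parent level by the mask) can be sketched as follows. This is a trimmed-down model under stated assumptions: struct irq_state, update_parent() and write_mask() are hypothetical names, and the real device drives a qemu_irq rather than storing a level.

```c
#include <stdint.h>

/* Hypothetical, reduced model of the FPGA interrupt state. */
struct irq_state {
    uint32_t intsetclr;  /* pending interrupt bits */
    uint32_t intmskena;  /* mask register: 1 = source enabled */
    int parent_level;    /* level last driven on the parent line */
};

/* The parent line is high only if some pending bit is also enabled. */
static void update_parent(struct irq_state *s)
{
    s->parent_level = (s->intsetclr & s->intmskena) != 0;
}

/* Latch a raised input regardless of the mask; the mask is applied
 * only when computing the parent level. */
static void set_irq(struct irq_state *s, int irq, int level)
{
    uint32_t oldint = s->intsetclr & s->intmskena;

    if (level) {
        s->intsetclr |= 1u << irq;
    }
    if (oldint != (s->intsetclr & s->intmskena)) {
        update_parent(s);
    }
}

/* Assumed register write path: changing the mask re-evaluates the line. */
static void write_mask(struct irq_state *s, uint32_t mask)
{
    s->intmskena = mask;
    update_parent(s);
}
```

With this shape, an interrupt raised while masked stays pending and reaches the parent as soon as the guest unmasks it, so no separate prev_level bookkeeping is needed.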
[Qemu-devel] Re: [PATCH] net: Add the missing option declaration of vhostforce
On Fri, Feb 25, 2011 at 04:11:27PM +0800, Jason Wang wrote:
> Signed-off-by: Jason Wang <jasow...@redhat.com>

Acked-by: Michael S. Tsirkin <m...@redhat.com>

> ---
>  net.c |    6 +++++-
>  1 files changed, 5 insertions(+), 1 deletions(-)
>
> diff --git a/net.c b/net.c
> index 9ba5be2..21d4443 100644
> --- a/net.c
> +++ b/net.c
> @@ -1025,7 +1025,11 @@ static const struct {
>                  .name = "vhostfd",
>                  .type = QEMU_OPT_STRING,
>                  .help = "file descriptor of an already opened vhost net device",
> -            },
> +            }, {
> +                .name = "vhostforce",
> +                .type = QEMU_OPT_BOOL,
> +                .help = "force vhost on for non-MSIX virtio guests",
> +            },
>  #endif /* _WIN32 */
>              { /* end of list */ }
>          },
Re: [Qemu-devel] [PATCH] Remove a detached device from qemu_device_opts.
Minoru Usui <u...@mxm.nes.nec.co.jp> writes:

> Hi, William, Markus and other people.
>
> On Wed, 23 Feb 2011 10:42:02 +0100
> William Dauchy <wdau...@gmail.com> wrote:
>
>> Hi Minoru,
>>
>> On Tue, Feb 15, 2011 at 3:32 AM, Minoru Usui <u...@mxm.nes.nec.co.jp> wrote:
>>> I can reproduce, too.
>>> But strangely, it don't occur in case of loading acpiphp driver
>>> to the guest VM on below environment.
>>>
>>>   Host : RHEL6.0
>>>   Guest: RHEL5.5
>>>
>>> Unfortunately, I'm not familiar with qemu-kvm.
>>> I investigated below questions about this problem, but I couldn't
>>> resolve them.
>>>
>>>   - How to call qdev_free() asynchronously.
>>>     (How should we fix this problem)
>>>   - Why it don't occur with acpiphp driver
>>>
>>> If anyone knows answer of above questions or its clue, please let
>>> me know.
>>
>> If fact this is not a bug. `qdev_free` is called when the acpi detach
>> succeed in `pciej_write`. The virtual machine has to correctly support
>> acpi signals.
>> Please read the explanation from Markus Armbruster on
>> http://lists.nongnu.org/archive/html/qemu-devel/2011-02/msg02637.html
>
> William, Thank you for your help and telling me about it.
> Markus, Thank you for your detailed explanation.
>
> Basically, I understand behaviour of device_del command.
> The result of pci hotunplug depends on behaviour of guest OS,
> but device_del command doesn't wait hotunplug's result.
>
> May I ask you a question?
> Which device does qemu_device_opts manage?
> just hotplugged to virtual machine? Or hotplugged to guest OS?
>
> By the present implementation, device_add command adds
> qemu_device_opts immediately, whether guest OS can hotplug the device
> or not. Nevertheless, device_del command waits for the device
> appropriately until it is hotunplugged by the guest OS.
>
> By Markus's explanation, device_del command can't wait for the device
> which hotunplugged from guest OS.
> So, I feel it's better that qemu_device_opts manages the device which
> hotplugged to guest OS.
>
> If I am wrong, please let me know.

qemu_device_opts holds the currently defined device configurations.

A device configuration becomes defined the moment its QemuOpts get
created (for -device and device_add: right when the argument gets
parsed, which is *before* the device gets created, let alone plugged).
It ceases to be defined when device creation fails, or when the device
is deleted after unplug completed.

qemu_device_opts is *not* the set of devices currently plugged in. That
information is encoded in the device tree.
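The lifecycle described above can be sketched as a small state model. All names here (dev_event, dev_config, apply) are hypothetical and only illustrate the ordering of events; they are not QEMU's data structures.

```c
#include <stdbool.h>

/* Hypothetical events in the life of one device configuration. */
enum dev_event {
    EV_OPTS_PARSED,       /* -device / device_add argument parsed */
    EV_CREATE_FAILED,     /* device creation failed */
    EV_PLUGGED,           /* device created and plugged */
    EV_UNPLUG_REQUESTED,  /* device_del issued, guest has not acted yet */
    EV_UNPLUG_COMPLETED   /* guest finished the unplug, device deleted */
};

struct dev_config {
    bool opts_defined;  /* entry exists in the set of defined options */
    bool plugged;       /* tracked separately, like the device tree */
};

static void apply(struct dev_config *c, enum dev_event ev)
{
    switch (ev) {
    case EV_OPTS_PARSED:
        c->opts_defined = true;   /* defined before the device exists */
        break;
    case EV_CREATE_FAILED:
        c->opts_defined = false;
        break;
    case EV_PLUGGED:
        c->plugged = true;
        break;
    case EV_UNPLUG_REQUESTED:
        /* nothing changes until the guest completes the unplug */
        break;
    case EV_UNPLUG_COMPLETED:
        c->plugged = false;
        c->opts_defined = false;
        break;
    }
}
```

The two flags change at different times, which is the point of the explanation: "defined" and "plugged in" are separate facts.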
Re: [Qemu-devel] [PATCH V6 3/4] qmp, nmi: convert do_inject_nmi() to QObject
Anthony Liguori <aligu...@linux.vnet.ibm.com> writes:

> On 02/24/2011 10:20 AM, Markus Armbruster wrote:
>> Anthony Liguori <aligu...@linux.vnet.ibm.com> writes:
>>
>>> On 02/24/2011 02:33 AM, Markus Armbruster wrote:
>>>> Anthony Liguori <anth...@codemonkey.ws> writes:
>>>>
>>>>> [...]
>>>>> Please describe all expected errors.
>>>>
>>>> Quoting qmp-commands.hx:
>>>>
>>>>     3. Errors, in special, are not documented. Applications should
>>>>        NOT check for specific errors classes or data (it's strongly
>>>>        recommended to only check for the error key)
>>>>
>>>> Indeed, not a single error is documented there. This is intentional.
>>>
>>> Yeah, but we're not 0.14 anymore and for 0.15, we need to document
>>> errors. If you are suggesting I send a patch to remove that section,
>>> I'm more than happy to.
>>
>> Two separate issues here: 1. Are we ready to commit to the current
>> design of errors, and 2. Is it fair to reject Lai's patch now because
>> he doesn't document his errors.
>>
>> I'm not commenting on 1. here.
>>
>> Regarding 2.: rejecting a patch because it doesn't document an aspect
>> that current master intentionally leaves undocumented is not how you
>> treat contributors. At least not if you want any other than certified
>> masochists who enjoy pain, and professionals who get adequately
>> compensated for it.
>>
>> Lead by example, not by fiat.
>
> http://repo.or.cz/w/qemu/aliguori.git/blob/refs/heads/glib:/qmp-schema.json
>
> I am in the process of documenting the errors of every command. It's a
> royal pain but I'm going to document everything we have right now.
> It's actually the last bit of work I need to finish before sending
> QAPI out.
>
> So for new commands being added, it would be hugely helpful for the
> authors to document the errors such that I don't have to reverse
> engineer all of the possible error conditions.

The moment this lands in master, you can begin to demand error
descriptions from contributors.

Until then, I'll NAK error descriptions in qmp-commands.txt. We left
them undocumented there for good reasons: Once we have an error design
in place that has a reasonable hope to stand the test of time, and have
errors documented for at least some of the commands here, we can start
to require proper error documentation for new commands. But not now.

I won't NAK non-normative error descriptions, say in commit messages, or
in comments. And I won't object to you asking for them. But I feel you
really shouldn't make it a condition for committing patches. Especially
not for simple patches that have been on list for months.
Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 06/15] xen: Add the Xen platform pci device
On Thu, 2011-02-24 at 17:36 +, Paolo Bonzini wrote:
>>> +/* Send bytes to syslog */
>>> +static void log_writeb(PCIXenPlatformState *s, char val)
>>> +{
>>> +    if (val == '\n' || s->log_buffer_off == sizeof(s->log_buffer) - 1) {
>>> +        /* Flush buffer */
>>> +        s->log_buffer[s->log_buffer_off] = 0;
>>> +        DPRINTF("%s\n", s->log_buffer);
>>
>> This should go to a chardev.
>
> Or it should just go away. Guests can already write to 0xe9 and see
> the output on the host's xm dmesg ring and serial console.

Only true if you have configured the guest log level to include debug
messages.

In any case host dmesg is not really the same as going to a file in
dom0 from a supportability PoV.

Ian.
Re: [Qemu-devel] [PATCH v3 01/16] vnc: qemu can die if the client is disconnected while updating screen
On Wed, Feb 23, 2011 at 11:23 PM, Anthony Liguori <aligu...@linux.vnet.ibm.com> wrote:
> On 02/04/2011 02:05 AM, Corentin Chary wrote:
>> agraf reported that qemu_mutex_destroy(&vs->output_mutex) while
>> failing in vnc_disconnect_finish().
>>
>> It's because vnc_worker_thread_loop() tries to unlock the mutex while
>> not locked. The unlocking call doesn't fail (pthread bug ?), but
>> the destroy call does.
>>
>> Signed-off-by: Corentin Chary <corenti...@iksaif.net>
>
> Applied 2/16. Thanks!
>
> Regards,
>
> Anthony Liguori

Great, thanks! Please also merge these two patches:

http://patchwork.ozlabs.org/patch/84517/
http://patchwork.ozlabs.org/patch/84496/

--
Corentin Chary
http://xf.iksaif.net
[Qemu-devel] [PATCH] linux-user: Fix unlock_user() call in return from poll()
Correct the broken attempt to calculate the third argument to
unlock_user() in the code path which unlocked the pollfd array on
return from poll() and ppoll() emulation. (This only caused a problem
if unlock_user() wasn't a no-op, eg if DEBUG_REMAP is defined.)

Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
---
 linux-user/syscall.c |    4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index cf8a4c3..822b863 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6314,10 +6314,8 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1,
                 for(i = 0; i < nfds; i++) {
                     target_pfd[i].revents = tswap16(pfd[i].revents);
                 }
-                ret += nfds * (sizeof(struct target_pollfd)
-                               - sizeof(struct pollfd));
             }
-            unlock_user(target_pfd, arg1, ret);
+            unlock_user(target_pfd, arg1, sizeof(struct target_pollfd) * nfds);
         }
         break;
 #endif
--
1.7.1
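To make the difference between the two length calculations concrete, here is a small sketch. The struct layouts and the helper names old_unlock_len() and new_unlock_len() are assumptions for illustration; in particular both layouts are chosen to be the same size, which need not hold for every host and target pair.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layouts; both are 8 bytes in this sketch. */
struct target_pollfd { int32_t fd; int16_t events; int16_t revents; };
struct host_pollfd   { int32_t fd; int16_t events; int16_t revents; };

/* Old calculation: starts from poll()'s return value, which counts
 * ready descriptors and is not a byte length. */
static long old_unlock_len(long ret, unsigned nfds)
{
    return ret + (long)nfds * ((long)sizeof(struct target_pollfd)
                               - (long)sizeof(struct host_pollfd));
}

/* Patched calculation: the size of the whole guest-side array. */
static size_t new_unlock_len(unsigned nfds)
{
    return sizeof(struct target_pollfd) * nfds;
}
```

With four descriptors of which one is ready, the old expression yields 1 in this model while the array to write back is 32 bytes, which matches the commit message's note that the bug only shows when unlock_user() really copies data.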
Re: [Qemu-devel] [PATCH V10 06/15] xen: Add the Xen platform pci device
On 02/25/2011 10:58 AM, Ian Campbell wrote:
>> Or it should just go away. Guests can already write to 0xe9 and see
>> the output on the host's xm dmesg ring and serial console.
>
> Only true if you have configured the guest log level to include debug
> messages.

If you can recompile QEMU to add DEBUG_PLATFORM, you can usually do that
too. To avoid recompilation, rather than a chardev it would be even
better to keep it as a trace event.

Paolo
RE: [Qemu-devel] Re: Strategic decision: COW format
> On 02/23/2011 05:50 PM, Anthony Liguori wrote:
>>> I still don't see. What would you do with thousands of checkpoints?
>>
>> For reverse debugging, if you store checkpoints at a rate of save,
>> every 10ms, and then degrade to storing every 100ms after 1 second,
>> etc. you'll have quite a large number of snapshots pretty quickly.
>> The idea of snapshotting with reverse debugging is that instead of
>> undoing every instruction, you can revert to the snapshot before, and
>> then replay the instruction stream until you get to the desired point
>> in time.
>
> You cannot replay the instruction stream since inputs (interrupts,
> rdtsc or other timers, I/O) will be different. You need Kemari for
> this.

I've created the technology for replaying instruction stream and all of
the inputs. This technology is similar to deterministic replay in
VMWare.

Now I need something to save machine state in many checkpoints to
implement reverse debugging. I think COW2 may be useful for it (or I
should create something like this).

Pavel Dovgaluk
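The checkpoint-and-replay idea discussed in this thread can be sketched as a lookup over a sorted list of checkpoint timestamps. This is an illustration under stated assumptions: timestamps are abstract units, and latest_checkpoint() and replay_distance() are hypothetical helpers, not part of any existing implementation mentioned here.

```c
#include <stddef.h>
#include <stdint.h>

/* Index of the latest checkpoint taken at or before `target`, or -1 if
 * there is none. `times` must be sorted in ascending order. */
static long latest_checkpoint(const uint64_t *times, size_t n,
                              uint64_t target)
{
    size_t lo = 0, hi = n;  /* search window is [lo, hi) */

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (times[mid] <= target) {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    return (long)lo - 1;
}

/* Amount of recorded execution to replay after restoring that
 * checkpoint; with no usable checkpoint, replay starts from time 0. */
static uint64_t replay_distance(const uint64_t *times, size_t n,
                                uint64_t target)
{
    long i = latest_checkpoint(times, n, target);

    return i < 0 ? target : target - times[i];
}
```

Denser checkpoints shorten the replay at the cost of more stored state, which is why the number of snapshots grows quickly in the scenario described above.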
Re: [Qemu-devel] [PATCH 02/10] pxa2xx_pic: update to use qdev and arm-pic
Hi Dmitry, On 20 February 2011 14:50, Dmitry Eremin-Solenikov dbarysh...@gmail.com wrote: Use qdev/sysbus framework to handle pxa2xx-pic. Instead of exposing IRQs via array, reference them via qdev_get_gpio_in(). Also pxa2xx_pic duplicated some code from arm-pic. Drop it, replacing with references to arm-pic, as all other ARM SoCs do for their PIC code. As I said earlier not using arm-pic was deliberate (and I also asked what the gain was from converting the pic to a separate sysbus device from the CPU) so I skipped this part of the patch and pushed the rest of it, please check that everything works. Signed-off-by: Dmitry Eremin-Solenikov dbarysh...@gmail.com --- hw/mainstone.c | 2 +- hw/pxa.h | 12 +++-- hw/pxa2xx.c | 84 +++ hw/pxa2xx_gpio.c | 11 +++-- hw/pxa2xx_pic.c | 126 - hw/pxa2xx_timer.c | 16 +++--- 6 files changed, 144 insertions(+), 107 deletions(-) diff --git a/hw/mainstone.c b/hw/mainstone.c index aec8d34..4eabdb9 100644 --- a/hw/mainstone.c +++ b/hw/mainstone.c @@ -140,7 +140,7 @@ static void mainstone_common_init(ram_addr_t ram_size, } mst_irq = sysbus_create_simple(mainstone-fpga, MST_FPGA_PHYS, - cpu-pic[PXA2XX_PIC_GPIO_0]); + qdev_get_gpio_in(cpu-pic, PXA2XX_PIC_GPIO_0)); I'm also wondering if this device should really use the interrupt line instead of using a GPIO. It seems wrong that both the fpga and the gpio module are connected to the same line. 
/* setup keypad */ printf(map addr %p\n, map); diff --git a/hw/pxa.h b/hw/pxa.h index f73d33b..7c6fd44 100644 --- a/hw/pxa.h +++ b/hw/pxa.h @@ -63,15 +63,16 @@ # define PXA2XX_INTERNAL_SIZE 0x4 /* pxa2xx_pic.c */ -qemu_irq *pxa2xx_pic_init(target_phys_addr_t base, CPUState *env); +DeviceState *pxa2xx_pic_init(target_phys_addr_t base, CPUState *env, + qemu_irq *arm_pic); /* pxa2xx_timer.c */ -void pxa25x_timer_init(target_phys_addr_t base, qemu_irq *irqs); -void pxa27x_timer_init(target_phys_addr_t base, qemu_irq *irqs, qemu_irq irq4); +void pxa25x_timer_init(target_phys_addr_t base, DeviceState *pic); +void pxa27x_timer_init(target_phys_addr_t base, DeviceState *pic); /* pxa2xx_gpio.c */ DeviceState *pxa2xx_gpio_init(target_phys_addr_t base, - CPUState *env, qemu_irq *pic, int lines); + CPUState *env, DeviceState *pic, int lines); void pxa2xx_gpio_read_notifier(DeviceState *dev, qemu_irq handler); /* pxa2xx_dma.c */ @@ -125,7 +126,7 @@ typedef struct PXA2xxFIrState PXA2xxFIrState; typedef struct { CPUState *env; - qemu_irq *pic; + DeviceState *pic; qemu_irq reset; PXA2xxDMAState *dma; DeviceState *gpio; @@ -180,6 +181,7 @@ typedef struct { QEMUTimer *rtc_swal1; QEMUTimer *rtc_swal2; QEMUTimer *rtc_pi; + qemu_irq rtc_irq; } PXA2xxState; struct PXA2xxI2SState { diff --git a/hw/pxa2xx.c b/hw/pxa2xx.c index 9ebbce6..58e6e7b 100644 --- a/hw/pxa2xx.c +++ b/hw/pxa2xx.c @@ -16,6 +16,7 @@ #include qemu-timer.h #include qemu-char.h #include blockdev.h +#include arm-misc.h static struct { target_phys_addr_t io_base; @@ -888,7 +889,7 @@ static int pxa2xx_ssp_init(SysBusDevice *dev) static inline void pxa2xx_rtc_int_update(PXA2xxState *s) { - qemu_set_irq(s-pic[PXA2XX_PIC_RTCALARM], !!(s-rtsr 0x2553)); + qemu_set_irq(s-rtc_irq, !!(s-rtsr 0x2553)); } static void pxa2xx_rtc_hzupdate(PXA2xxState *s) @@ -1197,6 +1198,8 @@ static void pxa2xx_rtc_init(PXA2xxState *s) s-rtc_swal1 = qemu_new_timer(rt_clock, pxa2xx_rtc_swal1_tick, s); s-rtc_swal2 = qemu_new_timer(rt_clock, 
pxa2xx_rtc_swal2_tick, s); s-rtc_pi = qemu_new_timer(rt_clock, pxa2xx_rtc_pi_tick, s); + + s-rtc_irq = qdev_get_gpio_in(s-pic, PXA2XX_PIC_RTCALARM); } static void pxa2xx_rtc_save(QEMUFile *f, void *opaque) @@ -2069,6 +2072,8 @@ PXA2xxState *pxa270_init(unsigned int sdram_size, const char *revision) PXA2xxState *s; int iomemtype, i; DriveInfo *dinfo; + qemu_irq *arm_pic; + s = (PXA2xxState *) qemu_mallocz(sizeof(PXA2xxState)); if (revision strncmp(revision, pxa27, 5)) { @@ -2093,12 +2098,13 @@ PXA2xxState *pxa270_init(unsigned int sdram_size, const char *revision) 0x4, qemu_ram_alloc(NULL, pxa270.internal, 0x4) | IO_MEM_RAM); - s-pic = pxa2xx_pic_init(0x40d0, s-env); + arm_pic = arm_pic_init_cpu(s-env); + s-pic = pxa2xx_pic_init(0x40d0, s-env, arm_pic); - s-dma = pxa27x_dma_init(0x4000, s-pic[PXA2XX_PIC_DMA]); + s-dma = pxa27x_dma_init(0x4000, + qdev_get_gpio_in(s-pic, PXA2XX_PIC_DMA)); - pxa27x_timer_init(0x40a0, s-pic[PXA2XX_PIC_OST_0], - s-pic[PXA27X_PIC_OST_4_11]); +
Re: [Qemu-devel] Re: Strategic decision: COW format
On Fri, Feb 25, 2011 at 11:20 AM, Pavel Dovgaluk <pavel.dovga...@ispras.ru> wrote:
>> On 02/23/2011 05:50 PM, Anthony Liguori wrote:
>>>> I still don't see. What would you do with thousands of checkpoints?
>>>
>>> For reverse debugging, if you store checkpoints at a rate of save,
>>> every 10ms, and then degrade to storing every 100ms after 1 second,
>>> etc. you'll have quite a large number of snapshots pretty quickly.
>>> The idea of snapshotting with reverse debugging is that instead of
>>> undoing every instruction, you can revert to the snapshot before,
>>> and then replay the instruction stream until you get to the desired
>>> point in time.
>>
>> You cannot replay the instruction stream since inputs (interrupts,
>> rdtsc or other timers, I/O) will be different. You need Kemari for
>> this.
>
> I've created the technology for replaying instruction stream and all
> of the inputs. This technology is similar to deterministic replay in
> VMWare.
>
> Now I need something to save machine state in many checkpoints to
> implement reverse debugging. I think COW2 may be useful for it (or I
> should create something like this).

Or the BTRFS_IOC_CLONE ioctl on the btrfs filesystem. You can
copy-on-write clone a file using it.

Stefan
Re: [Qemu-devel] [PATCH 02/10] pxa2xx_pic: update to use qdev and arm-pic
On 2/25/11, andrzej zaborowski balr...@gmail.com wrote: Hi Dmitry, On 20 February 2011 14:50, Dmitry Eremin-Solenikov dbarysh...@gmail.com wrote: Use qdev/sysbus framework to handle pxa2xx-pic. Instead of exposing IRQs via array, reference them via qdev_get_gpio_in(). Also pxa2xx_pic duplicated some code from arm-pic. Drop it, replacing with references to arm-pic, as all other ARM SoCs do for their PIC code. As I said earlier not using arm-pic was deliberate (and I also asked what the gain was from converting the pic to a separate sysbus device from the CPU) so I skipped this part of the patch and pushed the rest of it, please check that everything works. The primary goal was using arm-pic IRQs in pxa2xx-gpio and not having to mess with passing CPUEnv around. Moreover all other ARM SoCs use arm-pic w/o any references to performance gains/loses. I can still provide a patch that will use arm-pic only for pxa2xx-gpio, will that be suitable for you? BTW: it seems that your version won't work: using of sysbus_init_mmio() is hackish and there is no place where that mmio region will be mapped to base. About mapping pic to a separate device from CPU. Initially I wanted to reuse somehow pxa2xx-pic for sa-11[0-1]0 emulation. It doesn't seem reasonable for me anymore anyway. Second, the pic is already in separate module, so I didn't want to disturb main pxa2xx.c with it. I might still later use pxa2xx-pic for allocating main CPU structure and making all other device hang on ot. diff --git a/hw/mainstone.c b/hw/mainstone.c index aec8d34..4eabdb9 100644 --- a/hw/mainstone.c +++ b/hw/mainstone.c @@ -140,7 +140,7 @@ static void mainstone_common_init(ram_addr_t ram_size, } mst_irq = sysbus_create_simple(mainstone-fpga, MST_FPGA_PHYS, -cpu-pic[PXA2XX_PIC_GPIO_0]); +qdev_get_gpio_in(cpu-pic, PXA2XX_PIC_GPIO_0)); I'm also wondering if this device should really use the interrupt line instead of using a GPIO. 
It seems wrong that both the fpga and the gpio module are connected to the same line. Fixed, will submit a fix soon. @@ -241,53 +239,33 @@ static CPUWriteMemoryFunc * const pxa2xx_pic_writefn[] = { pxa2xx_pic_mem_write, }; -static void pxa2xx_pic_save(QEMUFile *f, void *opaque) +static int pxa2xx_pic_post_load(void *opaque, int version_id) { -PXA2xxPICState *s = (PXA2xxPICState *) opaque; -int i; - -for (i = 0; i 2; i ++) -qemu_put_be32s(f, s-int_enabled[i]); -for (i = 0; i 2; i ++) -qemu_put_be32s(f, s-int_pending[i]); -for (i = 0; i 2; i ++) -qemu_put_be32s(f, s-is_fiq[i]); -qemu_put_be32s(f, s-int_idle); -for (i = 0; i PXA2XX_PIC_SRCS; i ++) -qemu_put_be32s(f, s-priority[i]); +pxa2xx_pic_update(opaque); +return 0; } -static int pxa2xx_pic_load(QEMUFile *f, void *opaque, int version_id) +DeviceState *pxa2xx_pic_init(target_phys_addr_t base, CPUState *env, +qemu_irq *arm_pic) { -PXA2xxPICState *s = (PXA2xxPICState *) opaque; -int i; - -for (i = 0; i 2; i ++) -qemu_get_be32s(f, s-int_enabled[i]); -for (i = 0; i 2; i ++) -qemu_get_be32s(f, s-int_pending[i]); -for (i = 0; i 2; i ++) -qemu_get_be32s(f, s-is_fiq[i]); -qemu_get_be32s(f, s-int_idle); -for (i = 0; i PXA2XX_PIC_SRCS; i ++) -qemu_get_be32s(f, s-priority[i]); +DeviceState *dev; -pxa2xx_pic_update(opaque); -return 0; +dev = sysbus_create_varargs(pxa2xx_pic, base, +arm_pic[ARM_PIC_CPU_IRQ], +arm_pic[ARM_PIC_CPU_FIQ], +arm_pic[ARM_PIC_CPU_WAKE], +NULL); + +/* Enable IC coprocessor access. */ +cpu_arm_set_cp_io(env, 6, pxa2xx_pic_cp_read, pxa2xx_pic_cp_write, dev); I changed the last parameter to s as passing dev here was hacky. Fine with me. BTW: what about all other patches? -- With best wishes Dmitry
Re: [Qemu-devel] [PATCH V10 05/15] xen: Add xenfv machine
On Thu, Feb 24, 2011 at 17:31, Anthony Liguori <anth...@codemonkey.ws> wrote:
>> diff --git a/hw/pc_piix.c b/hw/pc_piix.c
>> index 7b74473..0ab8907 100644
>> --- a/hw/pc_piix.c
>> +++ b/hw/pc_piix.c
>> @@ -36,6 +36,10 @@
>>  #include "sysbus.h"
>>  #include "arch_init.h"
>>  #include "blockdev.h"
>> +#include "xen.h"
>> +#ifdef CONFIG_XEN
>> +#  include "xen/hvm/hvm_info_table.h"
>> +#endif
>
> Admittedly a nit, but isn't this a system header?

It belongs to Xen. I use it for HVM_MAX_VCPUS. I can put it in xen.h,
if you prefer.

Regards,

--
Anthony PERARD
Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 03/15] xen: Support new libxc calls from xen unstable.
On Thu, Feb 24, 2011 at 17:29, Anthony Liguori anth...@codemonkey.ws wrote: On 02/02/2011 08:49 AM, anthony.per...@citrix.com wrote: From: Anthony PERARDanthony.per...@citrix.com This patch adds a generic layer for xc calls, allowing us to choose between the xenner and xen implementations at runtime. It also update the libxenctrl calls in Qemu to use the new interface, otherwise Qemu wouldn't be able to build against new versions of the library. We check libxenctrl version in configure, from Xen 3.3.0 to Xen unstable. Signed-off-by: Anthony PERARDanthony.per...@citrix.com Signed-off-by: Stefano Stabellinistefano.stabell...@eu.citrix.com Acked-by: Alexander Grafag...@suse.de --- Makefile.target | 3 + configure | 62 +++- hw/xen_backend.c | 74 ++- hw/xen_backend.h | 7 +- hw/xen_common.h | 38 ++ hw/xen_console.c | 10 +- hw/xen_devconfig.c | 10 +- hw/xen_disk.c | 28 --- hw/xen_domainbuild.c | 29 hw/xen_interfaces.c | 191 hw/xen_interfaces.h | 198 ++ hw/xen_nic.c | 36 +- hw/xenfb.c | 14 ++-- 13 files changed, 584 insertions(+), 116 deletions(-) create mode 100644 hw/xen_interfaces.c create mode 100644 hw/xen_interfaces.h diff --git a/Makefile.target b/Makefile.target index db29e96..d09719f 100644 --- a/Makefile.target +++ b/Makefile.target @@ -205,6 +205,9 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS) QEMU_CFLAGS += $(VNC_JPEG_CFLAGS) QEMU_CFLAGS += $(VNC_PNG_CFLAGS) +# xen support +obj-$(CONFIG_XEN) += xen_interfaces.o + # xen backend driver support obj-$(CONFIG_XEN) += xen_backend.o xen_devconfig.o obj-$(CONFIG_XEN) += xen_console.o xenfb.o xen_disk.o xen_nic.o diff --git a/configure b/configure index 5a9121d..fde9bad 100755 --- a/configure +++ b/configure @@ -126,6 +126,7 @@ vnc_jpeg= vnc_png= vnc_thread=no xen= +xen_ctrl_version= linux_aio= attr= vhost_net= @@ -1144,13 +1145,71 @@ fi if test $xen != no ; then xen_libs=-lxenstore -lxenctrl -lxenguest + + # Xen unstable cat $TMPCEOF #includexenctrl.h #includexs.h -int main(void) { xs_daemon_open(); xc_interface_open(); return 
0; } +#includestdint.h +#includexen/hvm/hvm_info_table.h +#if !defined(HVM_MAX_VCPUS) +# error HVM_MAX_VCPUS not defined +#endif +int main(void) { + xc_interface *xc; + xs_daemon_open(); + xc = xc_interface_open(0, 0, 0); + xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0); + xc_gnttab_open(NULL, 0); + return 0; +} EOF if compile_prog $xen_libs ; then + xen_ctrl_version=410 + xen=yes + + # Xen 4.0.0 + elif ( + cat $TMPCEOF +#includexenctrl.h +#includexs.h +#includestdint.h +#includexen/hvm/hvm_info_table.h +#if !defined(HVM_MAX_VCPUS) +# error HVM_MAX_VCPUS not defined +#endif +int main(void) { + xs_daemon_open(); + xc_interface_open(); + xc_gnttab_open(); + xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0); + return 0; +} +EOF + compile_prog $xen_libs + ) ; then + xen_ctrl_version=400 + xen=yes + + # Xen 3.3.0, 3.4.0 + elif ( + cat $TMPCEOF +#includexenctrl.h +#includexs.h +int main(void) { + xs_daemon_open(); + xc_interface_open(); + xc_gnttab_open(); + xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0); + return 0; +} +EOF + compile_prog $xen_libs + ) ; then + xen_ctrl_version=330 xen=yes + + # Xen not found or unsupported else if test $xen = yes ; then feature_not_found xen @@ -3009,6 +3068,7 @@ case $target_arch2 in if test $xen = yes -a $target_softmmu = yes ; then echo CONFIG_XEN=y $config_target_mak echo LIBS+=$xen_libs $config_target_mak + echo CONFIG_XEN_CTRL_INTERFACE_VERSION=$xen_ctrl_version $config_target_mak fi esac case $target_arch2 in diff --git a/hw/xen_backend.c b/hw/xen_backend.c index 860b038..cf081e1 100644 --- a/hw/xen_backend.c +++ b/hw/xen_backend.c @@ -43,7 +43,8 @@ /* - */ /* public */ -int xen_xc; +XenXC xen_xc = XC_HANDLER_INITIAL_VALUE; +XenGnttab xen_xcg = XC_HANDLER_INITIAL_VALUE; struct xs_handle *xenstore = NULL; const char *xen_protocol; @@ -58,7 +59,7 @@ int xenstore_write_str(const char *base, const char *node, const char *val) char abspath[XEN_BUFSIZE]; snprintf(abspath, sizeof(abspath), %s/%s, base, node); - if (!xs_write(xenstore, 0, 
abspath, val, strlen(val))) + if (!xs_ops.write(xenstore, 0, abspath, val, strlen(val))) return -1; return 0; } @@ -70,7 +71,7 @@ char *xenstore_read_str(const char *base, const char *node) char *str, *ret = NULL; snprintf(abspath, sizeof(abspath), %s/%s, base, node); - str = xs_read(xenstore, 0, abspath,len);
Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 03/15] xen: Support new libxc calls from xen unstable.
On 02/25/2011 08:06 AM, Anthony PERARD wrote: On Thu, Feb 24, 2011 at 17:29, Anthony Liguorianth...@codemonkey.ws wrote: On 02/02/2011 08:49 AM, anthony.per...@citrix.com wrote: From: Anthony PERARDanthony.per...@citrix.com This patch adds a generic layer for xc calls, allowing us to choose between the xenner and xen implementations at runtime. It also update the libxenctrl calls in Qemu to use the new interface, otherwise Qemu wouldn't be able to build against new versions of the library. We check libxenctrl version in configure, from Xen 3.3.0 to Xen unstable. Signed-off-by: Anthony PERARDanthony.per...@citrix.com Signed-off-by: Stefano Stabellinistefano.stabell...@eu.citrix.com Acked-by: Alexander Grafag...@suse.de --- Makefile.target |3 + configure| 62 +++- hw/xen_backend.c | 74 ++- hw/xen_backend.h |7 +- hw/xen_common.h | 38 ++ hw/xen_console.c | 10 +- hw/xen_devconfig.c | 10 +- hw/xen_disk.c| 28 --- hw/xen_domainbuild.c | 29 hw/xen_interfaces.c | 191 hw/xen_interfaces.h | 198 ++ hw/xen_nic.c | 36 +- hw/xenfb.c | 14 ++-- 13 files changed, 584 insertions(+), 116 deletions(-) create mode 100644 hw/xen_interfaces.c create mode 100644 hw/xen_interfaces.h diff --git a/Makefile.target b/Makefile.target index db29e96..d09719f 100644 --- a/Makefile.target +++ b/Makefile.target @@ -205,6 +205,9 @@ QEMU_CFLAGS += $(VNC_SASL_CFLAGS) QEMU_CFLAGS += $(VNC_JPEG_CFLAGS) QEMU_CFLAGS += $(VNC_PNG_CFLAGS) +# xen support +obj-$(CONFIG_XEN) += xen_interfaces.o + # xen backend driver support obj-$(CONFIG_XEN) += xen_backend.o xen_devconfig.o obj-$(CONFIG_XEN) += xen_console.o xenfb.o xen_disk.o xen_nic.o diff --git a/configure b/configure index 5a9121d..fde9bad 100755 --- a/configure +++ b/configure @@ -126,6 +126,7 @@ vnc_jpeg= vnc_png= vnc_thread=no xen= +xen_ctrl_version= linux_aio= attr= vhost_net= @@ -1144,13 +1145,71 @@ fi if test $xen != no ; then xen_libs=-lxenstore -lxenctrl -lxenguest + + # Xen unstable cat$TMPCEOF #includexenctrl.h #includexs.h -int main(void) { 
xs_daemon_open(); xc_interface_open(); return 0; } +#includestdint.h +#includexen/hvm/hvm_info_table.h +#if !defined(HVM_MAX_VCPUS) +# error HVM_MAX_VCPUS not defined +#endif +int main(void) { + xc_interface *xc; + xs_daemon_open(); + xc = xc_interface_open(0, 0, 0); + xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0); + xc_gnttab_open(NULL, 0); + return 0; +} EOF if compile_prog $xen_libs ; then +xen_ctrl_version=410 +xen=yes + + # Xen 4.0.0 + elif ( + cat$TMPCEOF +#includexenctrl.h +#includexs.h +#includestdint.h +#includexen/hvm/hvm_info_table.h +#if !defined(HVM_MAX_VCPUS) +# error HVM_MAX_VCPUS not defined +#endif +int main(void) { + xs_daemon_open(); + xc_interface_open(); + xc_gnttab_open(); + xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0); + return 0; +} +EOF + compile_prog $xen_libs +) ; then +xen_ctrl_version=400 +xen=yes + + # Xen 3.3.0, 3.4.0 + elif ( + cat$TMPCEOF +#includexenctrl.h +#includexs.h +int main(void) { + xs_daemon_open(); + xc_interface_open(); + xc_gnttab_open(); + xc_hvm_set_mem_type(0, 0, HVMMEM_ram_ro, 0, 0); + return 0; +} +EOF + compile_prog $xen_libs +) ; then +xen_ctrl_version=330 xen=yes + + # Xen not found or unsupported else if test $xen = yes ; then feature_not_found xen @@ -3009,6 +3068,7 @@ case $target_arch2 in if test $xen = yes -a $target_softmmu = yes ; then echo CONFIG_XEN=y$config_target_mak echo LIBS+=$xen_libs$config_target_mak + echo CONFIG_XEN_CTRL_INTERFACE_VERSION=$xen_ctrl_version $config_target_mak fi esac case $target_arch2 in diff --git a/hw/xen_backend.c b/hw/xen_backend.c index 860b038..cf081e1 100644 --- a/hw/xen_backend.c +++ b/hw/xen_backend.c @@ -43,7 +43,8 @@ /* - */ /* public */ -int xen_xc; +XenXC xen_xc = XC_HANDLER_INITIAL_VALUE; +XenGnttab xen_xcg = XC_HANDLER_INITIAL_VALUE; struct xs_handle *xenstore = NULL; const char *xen_protocol; @@ -58,7 +59,7 @@ int xenstore_write_str(const char *base, const char *node, const char *val) char abspath[XEN_BUFSIZE]; snprintf(abspath, sizeof(abspath), %s/%s, base, 
node); -if (!xs_write(xenstore, 0, abspath, val, strlen(val))) +if (!xs_ops.write(xenstore, 0, abspath, val, strlen(val))) return -1; return 0; } @@ -70,7 +71,7 @@ char *xenstore_read_str(const char *base, const char *node) char *str, *ret = NULL; snprintf(abspath, sizeof(abspath), %s/%s, base, node); -str = xs_read(xenstore, 0, abspath,len); +str = xs_ops.read(xenstore, 0,
Re: [Qemu-devel] [PATCH V10 05/15] xen: Add xenfv machine
On 02/25/2011 07:55 AM, Anthony PERARD wrote: On Thu, Feb 24, 2011 at 17:31, Anthony Liguorianth...@codemonkey.ws wrote: diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 7b74473..0ab8907 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -36,6 +36,10 @@ #include sysbus.h #include arch_init.h #include blockdev.h +#include xen.h +#ifdef CONFIG_XEN +# include xen/hvm/hvm_info_table.h +#endif Admittedly a nit, but isn't this a system header? It belongs to Xen. I use it for HVM_MAX_VCPUS. I can put it in xen.h, if you prefer. I meant, you should use: #include xen/hvm/hvm_info_table.h Regards, Anthony Liguori Regards,
Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 06/15] xen: Add the Xen platform pci device
On Fri, Feb 25, 2011 at 10:54, Paolo Bonzini pbonz...@redhat.com wrote: On 02/25/2011 10:58 AM, Ian Campbell wrote: Or it should just go away. Guests can already write to 0xe9 and see the output on the host's xm dmesg ring and serial console. Only true if you have configured the guest log level to include debug messages. If you can recompile QEMU to add DEBUG_PLATFORM, you can usually do that too. To avoid recompilation, rather than a chardev it would be even better to keep it as a trace event. The trace event seems a good idea, let's go for that! Regards, -- Anthony PERARD
Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 05/15] xen: Add xenfv machine
On Fri, Feb 25, 2011 at 14:09, Anthony Liguori anth...@codemonkey.ws wrote: On 02/25/2011 07:55 AM, Anthony PERARD wrote: On Thu, Feb 24, 2011 at 17:31, Anthony Liguorianth...@codemonkey.ws wrote: diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 7b74473..0ab8907 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -36,6 +36,10 @@ #include sysbus.h #include arch_init.h #include blockdev.h +#include xen.h +#ifdef CONFIG_XEN +# include xen/hvm/hvm_info_table.h +#endif Admittedly a nit, but isn't this a system header? It belongs to Xen. I use it for HVM_MAX_VCPUS. I can put it in xen.h, if you prefer. I meant, you should use: #include xen/hvm/hvm_info_table.h Sure, I will do that. Thanks, -- Anthony PERARD
[Qemu-devel] [PATCH] target-arm: Don't decode old cp15 WFI instructions on v7 cores
In v7 of the ARM architecture, WFI (wait for interrupt) is a first-class instruction, but in previous versions this functionality was provided via a cp15 coprocessor register. Add correct feature checks to the decoding of the cp15 WFI instructions so that they behave correctly for newer cores. In particular, the old 0,c7,c8,2 encoding used on ARM940 has been reused for VA-to-PA translation in v6 and v7.

Signed-off-by: Peter Maydell peter.mayd...@linaro.org
---
This patch stands alone as a fix to target-arm; it's a prerequisite for Adam's VA-PA translation patch, because otherwise attempting a user-read translation will get you a WFI instead...

 target-arm/translate.c | 35 ++-
 1 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index dbd958b..baa1256 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -2538,13 +2538,38 @@ static int disas_cp15_insn(CPUState *env, DisasContext *s, uint32_t insn)
     if (IS_USER(s) && !cp15_user_ok(insn)) {
         return 1;
     }
-    if ((insn & 0x0fff0fff) == 0x0e070f90
-        || (insn & 0x0fff0fff) == 0x0e070f58) {
-        /* Wait for interrupt. */
-        gen_set_pc_im(s->pc);
-        s->is_jmp = DISAS_WFI;
+
+    /* Pre-v7 versions of the architecture implemented WFI via coprocessor
+     * instructions rather than a separate instruction.
+     */
+    if ((insn & 0x0fff0fff) == 0x0e070f90) {
+        /* 0,c7,c0,4: Standard v6 WFI (also used in some pre-v6 cores).
+         * In v7, this must NOP.
+         */
+        if (!arm_feature(env, ARM_FEATURE_V7)) {
+            /* Wait for interrupt. */
+            gen_set_pc_im(s->pc);
+            s->is_jmp = DISAS_WFI;
+        }
         return 0;
     }
+
+    if ((insn & 0x0fff0fff) == 0x0e070f58) {
+        /* 0,c7,c8,2: Not all pre-v6 cores implemented this WFI,
+         * so this is slightly over-broad.
+         */
+        if (!arm_feature(env, ARM_FEATURE_V6)) {
+            /* Wait for interrupt. */
+            gen_set_pc_im(s->pc);
+            s->is_jmp = DISAS_WFI;
+            return 0;
+        }
+        /* Otherwise fall through to handle via helper function.
+         * In particular, on v7 and some v6 cores this is one of
+         * the VA-PA registers.
+         */
+    }
+
     rd = (insn >> 12) & 0xf;
 
     if (cp15_tls_load_store(env, s, insn, rd))
-- 
1.7.1
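The decoding rule in this patch can be sketched outside QEMU as a small standalone classifier. This is only an illustration under stated assumptions: the enum, the function name classify_cp15_wfi and the two boolean feature flags are hypothetical stand-ins for arm_feature(env, ...), not QEMU code. The mask and the two encodings are the ones from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values taken from the patch: the mask drops the condition field
 * (bits 31:28) and the Rt field (bits 15:12) of the MCR encoding. */
#define CP15_WFI_MASK  0x0fff0fffu
#define CP15_WFI_C0_4  0x0e070f90u  /* 0,c7,c0,4 */
#define CP15_WFI_C8_2  0x0e070f58u  /* 0,c7,c8,2 */

/* Hypothetical result type, for illustration only. */
enum cp15_wfi_action {
    CP15_ACT_WFI,    /* treat the write as wait-for-interrupt */
    CP15_ACT_NOP,    /* accept the instruction and do nothing */
    CP15_ACT_HELPER  /* not a WFI here: handle as a normal cp15 access */
};

/* has_v6/has_v7 are assumed to mirror ARM_FEATURE_V6/ARM_FEATURE_V7. */
enum cp15_wfi_action classify_cp15_wfi(uint32_t insn, bool has_v6, bool has_v7)
{
    if ((insn & CP15_WFI_MASK) == CP15_WFI_C0_4) {
        /* Standard v6 WFI; on v7 it must behave as a NOP. */
        return has_v7 ? CP15_ACT_NOP : CP15_ACT_WFI;
    }
    if ((insn & CP15_WFI_MASK) == CP15_WFI_C8_2) {
        /* Only a WFI before v6; later cores reuse the encoding. */
        return has_v6 ? CP15_ACT_HELPER : CP15_ACT_WFI;
    }
    return CP15_ACT_HELPER;
}
```

For example, 0xee070f90 (mcr p15, 0, r0, c7, c0, 4 with condition AL) classifies as a WFI on a v6 core and as a NOP on a v7 core, whichever source register is used.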
Re: [Qemu-devel] [PATCH 2/2] microblaze: Allow targeting little-endian mb
On Mon, Feb 21, 2011 at 3:44 PM, Edgar E. Iglesias edgar.igles...@petalogix.com wrote: Signed-off-by: Edgar E. Iglesias edgar.igles...@petalogix.com --- configure | 7 +-- default-configs/microblazeel-linux-user.mak | 1 + default-configs/microblazeel-softmmu.mak | 4 3 files changed, 10 insertions(+), 2 deletions(-) create mode 100644 default-configs/microblazeel-linux-user.mak create mode 100644 default-configs/microblazeel-softmmu.mak diff --git a/configure b/configure index 791b71d..3036faf 100755 --- a/configure +++ b/configure @@ -984,6 +984,7 @@ arm-softmmu \ cris-softmmu \ m68k-softmmu \ microblaze-softmmu \ +microblazeel-softmmu \ mips-softmmu \ mipsel-softmmu \ mips64-softmmu \ @@ -1008,6 +1009,7 @@ armeb-linux-user \ cris-linux-user \ m68k-linux-user \ microblaze-linux-user \ +microblazeel-linux-user \ mips-linux-user \ mipsel-linux-user \ ppc-linux-user \ @@ -3005,7 +3007,8 @@ case $target_arch2 in target_long_alignment=2 target_llong_alignment=2 ;; - microblaze) + microblaze|microblazeel) + TARGET_ARCH=microblaze bflt=yes target_nptl=yes target_phys_bits=32 @@ -3231,7 +3234,7 @@ for i in $ARCH $TARGET_BASE_ARCH ; do echo CONFIG_M68K_DIS=y $config_target_mak echo CONFIG_M68K_DIS=y $libdis_config_mak ;; - microblaze) + microblaze*) echo CONFIG_MICROBLAZE_DIS=y $config_target_mak echo CONFIG_MICROBLAZE_DIS=y $libdis_config_mak ;; diff --git a/default-configs/microblazeel-linux-user.mak b/default-configs/microblazeel-linux-user.mak new file mode 100644 index 000..566fdc0 --- /dev/null +++ b/default-configs/microblazeel-linux-user.mak @@ -0,0 +1 @@ +# Default configuration for microblaze-linux-user microblazeel-linux-user? diff --git a/default-configs/microblazeel-softmmu.mak b/default-configs/microblazeel-softmmu.mak new file mode 100644 index 000..4399b8b --- /dev/null +++ b/default-configs/microblazeel-softmmu.mak @@ -0,0 +1,4 @@ +# Default configuration for microblaze-softmmu microblazeel-softmmu?
Re: [Qemu-devel] when to check external interrupt request ? or what is the timing to check and arise external interrupt ?
On Tue, Feb 22, 2011 at 6:47 AM, wang sheng wans...@gmail.com wrote:
I'm porting QEMU to a new architecture. My difficulty is that I can't work out when QEMU's main thread should be interrupted so that it checks for external interrupts.

I understand how MIPS checks for external interrupts. In qemu-system-mips, during translation, if an instruction accesses CP0's Status or Cause register, target-mips/translate.c adds a call to helper_interrupt_restart at the end of the translation block.

My architecture, however, uses load/store instructions to access the control register in the interrupt controller. Because I can't distinguish accesses to normal memory from accesses to the interrupt controller's registers, I can't add an interrupt_restart call at the end of the translation block. What can I do to give QEMU a chance to check for external interrupts?

Please try something similar to how the cpu_request_exit function and signal are used by hw/dma.c and hw/pc.c.
Re: [Qemu-devel] [PATCH 3/3] target-arm: Use TCG temporary leak debugging facilities
On Wed, Feb 23, 2011 at 5:19 PM, Peter Maydell peter.mayd...@linaro.org wrote: Use the new TCG temporary leak debugging facilities to check that each ARM instruction does not leak temporaries. Signed-off-by: Peter Maydell peter.mayd...@linaro.org --- target-arm/translate.c | 7 +++ 1 files changed, 7 insertions(+), 0 deletions(-) diff --git a/target-arm/translate.c b/target-arm/translate.c index 31067d5..b96a136 100644 --- a/target-arm/translate.c +++ b/target-arm/translate.c @@ -9125,6 +9125,8 @@ static inline void gen_intermediate_code_internal(CPUState *env, gen_icount_start(); + tcg_clear_temp_count(); + /* A note on handling of the condexec (IT) bits: * * We want to avoid the overhead of having to write the updated condexec @@ -9234,6 +9236,11 @@ static inline void gen_intermediate_code_internal(CPUState *env, gen_set_label(dc-condlabel); dc-condjmp = 0; } + + if (tcg_check_temp_count()) { + fprintf(stderr, TCG temporary leak before %08x\n, dc-pc); + } Perhaps this check and tcg_clear_temp_count() calls should be added instead to tb_gen_code() in exec.c, to benefit all targets at once. PC information will not be as accurate, though.
Re: [Qemu-devel] checkpatch.pl false positive: wants braces on #if
On Wed, Feb 23, 2011 at 6:07 PM, Peter Maydell peter.mayd...@linaro.org wrote:
If you run checkpatch.pl on this patch: http://patchwork.ozlabs.org/patch/84189/ it complains:

WARNING: braces {} are necessary even for single statement blocks
#29: FILE: tcg/tcg.c:454:
+#if defined(CONFIG_DEBUG_TCG)
+    s->temps_in_use++;

...but braces on a cpp conditional are a bit tricky :-) The script is sufficiently hairy perl that I'm afraid I can't suggest a solution, only report the problem.

Maybe this helps:

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 075b614..4b1e2c2 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2537,7 +2537,7 @@ sub process {
         }
         if (!defined $suppress_ifbraces{$linenr - 1} &&
                     $line =~ /\b(if|while|for|else)\b/ &&
-                    $line !~ /\#\s*else/) {
+                    $line !~ /\#\s*(if|else|elif)/) {
             my $allowed = 0;
 
             # Check the pre-context.
Re: [Qemu-devel] [PATCH 3/3] target-arm: Use TCG temporary leak debugging facilities
On 25 February 2011 15:32, Blue Swirl blauwir...@gmail.com wrote: On Wed, Feb 23, 2011 at 5:19 PM, Peter Maydell peter.mayd...@linaro.org wrote: + + if (tcg_check_temp_count()) { + fprintf(stderr, TCG temporary leak before %08x\n, dc-pc); + } Perhaps this check and tcg_clear_temp_count() calls should be added instead to tb_gen_code() in exec.c, to benefit all targets at once. PC information will not be as accurate, though. You'd get a pile of false positives, for instance target-arm doesn't bother to destroy the whole-TB temporaries like cpu_F0s because there's no need to. We're trying to check whether the translator could unboundedly leak temporaries... -- PMM
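The per-instruction check discussed in this thread can be pictured with a minimal counter. This is a sketch under assumptions, not the TCG implementation: every name below (temp_new, temp_free, temp_count_clear, temp_count_check) is hypothetical, and the real facility lives in tcg/tcg.c. The idea is only that allocations and frees adjust a counter, the counter is cleared before translating, and a non-zero value after an instruction signals a leak.

```c
/* Hypothetical stand-in for a per-context "temps in use" counter. */
static int temps_in_use;

/* Allocate a temporary: returns a dummy handle and counts it. */
int temp_new(void)
{
    temps_in_use++;
    return temps_in_use;
}

/* Free a temporary: the handle is unused in this sketch. */
void temp_free(int temp)
{
    (void)temp;
    temps_in_use--;
}

/* Reset the counter, e.g. before translating a block. */
void temp_count_clear(void)
{
    temps_in_use = 0;
}

/* Returns 1 if temporaries are still live, and resets the counter so
 * the same leak is not reported again for the next instruction. */
int temp_count_check(void)
{
    if (temps_in_use != 0) {
        temps_in_use = 0;
        return 1;
    }
    return 0;
}
```

Checking after each instruction, rather than once per translation block, is what keeps temporaries that intentionally live for the whole block from showing up as leaks.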
Re: [Xen-devel] Re: [Qemu-devel] [PATCH V10 03/15] xen: Support new libxc calls from xen unstable.
On Fri, Feb 25, 2011 at 14:11, Anthony Liguori anth...@codemonkey.ws wrote:
I think I gave this feedback before but I'd really like to see static inlines here. It's very likely that you'll either want to have tracing or some commands can have a NULL function pointer in which case having a central location to do this is very useful. Plus, it's more natural to read code that's making a function call instead of going through a function pointer in a structure redirection. Can probably do this with just a sed over the current patch.

Is it good to have a .h with functions like this?

static inline XenXC qemu_xc_interface_open(xentoollog_logger *logger,
                                           xentoollog_logger *dombuild_logger,
                                           unsigned open_flags)
{
#if CONFIG_XEN_CTRL_INTERFACE_VERSION < 410
    return xc_interface_open();
#else
    return xc_interface_open(logger, dombuild_logger, open_flags);
#endif
}

So there will be no more structure redirection.

It would be better to have two versions of the header, one that implemented the 410 functions and one that implemented the newer functions. If you're just using the new signature for everything, you could even just #define in the later header.

Actually, the #define in the later header was done in a previous version of this patch series. But I changed to the structure redirection after a comment from Alexander Graf and by taking one of his patches for Xenner. Here is Alexander's comment: http://lists.nongnu.org/archive/html/qemu-devel/2010-11/msg01251.html
The function pointers help switch at run time to either the Xen or the Xenner implementation. This message is why I did not use static inline: http://lists.nongnu.org/archive/html/qemu-devel/2011-01/msg03125.html

So, I can go for multiple versions of the header that define the static inline functions, or just have a few defines. BTW, I think there are now only 4 functions with a different prototype between the old and new versions of Xen. The other prototype changes are only in the handle parameter, and a typedef handles that.
Regards, -- Anthony PERARD
Re: [Qemu-devel] Memory Map
On Thu, Feb 24, 2011 at 11:08 AM, Salvatore Lionetti salvatorelione...@yahoo.it wrote: Hi, This is what my board do cpu_register_physical_memory(0, 128*1024*1024, ...) cpu_register_physical_memory(0xFF80, 8*1024*1024, ...) and this layout does not change over the entire live (virtual) of the board. For the following offset (1st column) and size in bytes (2nd column) {0x00, 512}, {0x000200, 16}, {0x000300, 32}, {0x000400, 32}, {0x000500, 64}, {0x000600, 64}, {0x000700, 128}, {0x000800, 30}, {0x000900, 256}, {0x000A00, 44}, {0x000B00, 256}, {0x000C00, 24}, {0x000F00, 20}, {0x001000, 20}, {0x001100, 20}, {0x001400, 168}, {0x001800, 24}, {0x002000, 4096}, {0x003000, 24}, {0x003100, 24}, {0x004500, 36}, {0x005000, 224}, {0x008000, 768}, {0x008300, 16}, i do, for each item, a = cpu_register_io_memory(r, w, o, DEVICE_NATIVE_ENDIAN) cpu_register_physical_memory(_base+offset, len, a) And _base could be reprogrammed at any time. So before to change _base i: cpu_unregister_io_memory(a) What i see is that accessing to _base+ _base+0x005000 = Wake up r/w with offset 0 _base+0x000204 = Wake up r/w with offset 0x204 So the question - Am i wrong something? cpu_unregister_io_memory() is the counterpart of cpu_register_io_memory(), it does not affect mappings created by cpu_register_physical_memory(). They should be removed first. - Is possible to map address with last TARGET_PAGE_BITS (es 0x200) bits set? Yes.
[Qemu-devel] Re: qemu compiling error on ppc64: kvm.c:81: error: 'struct kvm_sregs' has no member named 'pvr'
On Wednesday 16 February 2011 02:39 PM, Avi Kivity wrote: On 02/15/2011 05:59 PM, Dushyant Bansal wrote: 2. How to configure makefiles to get output of printk statements inside kvm/arch/powerpc/kvm/trace.h Better don't make them printks - just use the tracing framework. I'd write up a small howto here myself, but I'm pretty much on the jump to my plane for vacation. Avi, could you please guide him a bit on how to get data out of tracepoints? Thanks for the quick reply :) I have added some more trace parameters in the tracing framework and currently, it is working fine. 1. Add new field in struct kvm_vcpu_stat (kvm_host.h) 2. Add corresponding entry in struct kvm_stats_debugfs_item debugfs_entries[] (book3s.c) 3. Increment or Decrement that field where ever necessary. Those aren't tracepoints; they're deprecated debug statistics. For tracepoints, see include/trace/events/kvm.h (general kvm tracepoints) arch/powerpc/kvm/trace.h (ppc specific tracepoints) arch/powerpc/kvm/book3s_mmu_hpte.c (examples of use, look for trace_kvm_*) Documentation/trace/tracepoints.txt (documentation, likely outdated) Thanks a lot for the information. Dushyant
Re: [Qemu-devel] Missing op on SPARC
On Thu, Feb 24, 2011 at 11:12 AM, 陳韋任 che...@iis.sinica.edu.tw wrote: Hi, all I have a Linux/SPARC machine and want to run QEMU on it. Here is the system information. -- $ uname -a Linux sparc 2.6.37-rc5-git #1 SMP Tue Dec 21 17:03:53 CST 2010 sparc64 sun4v UltraSparc T2 (Niagara2) GNU/Linux $ gcc --version gcc (Gentoo 4.3.4 p1.0, pie-10.1.5) 4.3.4 -- QEMU is configured with --sparc_cpu=v8plus. QEMU report there are some missing op definitions. See below, -- $ qemu-sparc hello Missing op definition for qemu_ld64 Missing op definition for qemu_st64 /tmp/chenwj/qemu-0.14.0/tcg/tcg.c:1116: tcg fatal error Aborted -- Is it possible to fix it? If so, how? Yes, the place is in tcg/sparc/tcg-target.[ch]. Sparc generator for TCG only implements the functions qemu_ld64/st64 on V9 (full 64 bit). These should be implemented also for v8plus. This can be implemented by adding a helper function to call the V9 versions of tcg_out_qemu_ld/st. One problem is that v8plus gives few 64 bit registers, %g1 to %g7, so addr_reg should probably be set up to %g1 and data_reg to %g2 in the v8plus helper. Data and address must be moved to/from these registers from/to 32 bit registers allocated by TCG.
Re: [Qemu-devel] [PATCH] Split machine creation from the main loop
On Wed, Feb 23, 2011 at 11:38 PM, Anthony Liguori aligu...@us.ibm.com wrote: The goal is to enable the monitor to run independently of whether the machine has been created such that the monitor can be used to specify all of the parameters for machine initialization. Signed-off-by: Anthony Liguori aligu...@us.ibm.com diff --git a/vl.c b/vl.c index b436952..181cc77 100644 --- a/vl.c +++ b/vl.c @@ -1917,17 +1917,360 @@ static const QEMUOption *lookup_opt(int argc, char **argv, return popt; } +static int qemu_machine_init(QEMUMachine *machine, const char *kernel_filename, + const char *kernel_cmdline, + const char *initrd_filename, + const char *boot_devices, const char *cpu_model, + int snapshot, int tb_size, const char *gdbstub_dev, + const char *loadvm, const char *incoming) qemu_machine_init() would mix host state initialization and machine initialization. I'd make instead two functions, qemu_host_init() and qemu_machine_init(). For example parameters snapshot, tb_size, gdbstub_dev, (maybe also loadvm and incoming if handled elsewhere) do not change how the machine is initialized. Also KVM, drive, chardev and display init should go to qemu_host_init() if possible.
Re: [Qemu-devel] Re: KVM call agenda for Jan 25
On Saturday 29 January 2011 04:20 PM, Dushyant Bansal wrote:
Or this: which is faster, qemu-img convert -f <format> -O <format> <src-image> <dst-image> or cp <src-image> <dst-image>? What about for raw images, shouldn't that be the same speed as cp(1)? Poke around the source code, profile it, understand what it's doing, think about ways to improve it. No need to do everything, just doing part of this will give you background on QEMU's block layer. Contributing patches is a good way to get up to speed and show your skills. If time doesn't permit that, just think about the problem and how you intend to solve it, and feel free to bounce ideas off me.

I explored 'qemu-img create' and 'qemu-img convert' and got a basic understanding of how they work.

cp is faster than qemu-img convert. For raw to raw: cp just copies all the disk blocks actually occupied by the file, whereas qemu-img convert checks all the sectors and copies only those that contain at least one non-NUL byte. The better performance of cp over qemu-img convert is the result of the overhead of this checking.

I tried a few variations:
1. Just copy all the sectors without checking. The actual size then becomes equal to the virtual size.
2. In is_allocated_sectors, out of n sectors, if any sector has a non-NUL byte then break and copy all n sectors. As expected, the resulting raw image was quite large.

Looking forward to your comments.

Thanks,
Dushyant
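The per-sector check described in this message can be sketched as follows. This is an illustration under assumptions, not the qemu-img code: the function names and the fixed 512-byte sector size are hypothetical. It shows where the extra cost relative to a plain copy comes from, since every byte of an all-zero sector has to be read before the sector can be skipped.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512  /* assumed sector size for this sketch */

/* Returns 1 if the sector contains at least one non-NUL byte. */
int sector_has_data(const uint8_t *sector)
{
    for (size_t i = 0; i < SECTOR_SIZE; i++) {
        if (sector[i] != 0) {
            return 1;
        }
    }
    return 0;
}

/* Counts how many of the n sectors in buf a converter that skips
 * all-zero sectors would actually write to the destination. */
size_t count_sectors_to_copy(const uint8_t *buf, size_t n)
{
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        if (sector_has_data(buf + i * SECTOR_SIZE)) {
            count++;
        }
    }
    return count;
}
```

Variation 2 from the message corresponds to stopping at the first sector with data and copying the whole run of n sectors, which trades a larger output image for fewer comparisons.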
Re: [Qemu-devel] [PATCH v3 00/16] vnc: adapative tight, zrle, zywrle, and bitmap module
On Fri, Feb 25, 2011 at 12:43 AM, Corentin Chary corentin.ch...@gmail.com wrote: Is there a special reason why you use __always_inline instead of inline in bitops.h? Because it's not only a hint, I really want this function to be inlined. This breaks compilation for mingw :-( mingw also fails at timersub() in vnc.c. Then we should defined timersub when not available. There's also this one, struct timeval is missing: CCui/vnc-enc-zlib.o In file included from /src/qemu/ui/vnc.c:27:0: /src/qemu/ui/vnc.h:105:20: error: array type has incomplete element type /src/qemu/ui/vnc.h:116:20: error: field 'last_freq_check' has incomplete type
Re: [Qemu-devel] [PATCH] Fixing network over sockets implementation for win32
Thanks, applied.

On Mon, Feb 21, 2011 at 1:46 PM, Pavel Dovgaluk pavel.dovga...@ispras.ru wrote:
MSDN includes the following in the WSAEALREADY error description for the connect() function: "To preserve backward compatibility, this error is reported as WSAEINVAL to Winsock applications that link to either Winsock.dll or Wsock32.dll." So a check for this error code was added to allow network connections through sockets on Windows.

Signed-off-by: Pavel Dovgalyuk pavel.dovga...@gmail.com
---
 net/socket.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 3182b37..7337f4f 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -457,7 +457,7 @@ static int net_socket_connect_init(VLANState *vlan,
             } else if (err == EINPROGRESS) {
                 break;
 #ifdef _WIN32
-            } else if (err == WSAEALREADY) {
+            } else if (err == WSAEALREADY || err == WSAEINVAL) {
                 break;
 #endif
             } else {
Re: [Qemu-devel] [PATCH] Fixing tap adapter for win32
Thanks, applied.

On Mon, Feb 21, 2011 at 1:47 PM, Pavel Dovgaluk pavel.dovga...@ispras.ru wrote:
This fix allows an internal VLAN to be connected to an external TAP interface. If the tap_win32_write function always returns 0, the TAP network interface in QEMU is disabled.

Signed-off-by: Pavel Dovgalyuk pavel.dovga...@gmail.com
---
 net/tap-win32.c | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/net/tap-win32.c b/net/tap-win32.c
index 081904e..596132e 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -480,7 +480,7 @@ static int tap_win32_write(tap_win32_overlapped_t *overlapped,
         }
     }
 
-    return 0;
+    return write_size;
 }
 
 static DWORD WINAPI tap_win32_thread_entry(LPVOID param)
Re: [Qemu-devel] [PATCH] slirp: Remove some type casts caused by bad declaration of x.tp_buf
Thanks, applied. On Wed, Feb 23, 2011 at 8:40 PM, Stefan Weil w...@mail.berlios.de wrote: x.tp_buf was declared as a uint8_t array, but always used as a char array (which needed a lot of type casts). The patch includes these changes: * Fix declaration of x.tp_buf and remove all type casts. * Use offsetof() to get the offset of x.tp_buf. Signed-off-by: Stefan Weil w...@mail.berlios.de --- slirp/tftp.c | 14 +++--- slirp/tftp.h | 2 +- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/slirp/tftp.c b/slirp/tftp.c index 1821648..8055ccc 100644 --- a/slirp/tftp.c +++ b/slirp/tftp.c @@ -136,9 +136,9 @@ static int tftp_send_oack(struct tftp_session *spt, m-m_data += sizeof(struct udpiphdr); tp-tp_op = htons(TFTP_OACK); - n += snprintf((char *)tp-x.tp_buf + n, sizeof(tp-x.tp_buf) - n, %s, + n += snprintf(tp-x.tp_buf + n, sizeof(tp-x.tp_buf) - n, %s, key) + 1; - n += snprintf((char *)tp-x.tp_buf + n, sizeof(tp-x.tp_buf) - n, %u, + n += snprintf(tp-x.tp_buf + n, sizeof(tp-x.tp_buf) - n, %u, value) + 1; saddr.sin_addr = recv_tp-ip.ip_dst; @@ -283,7 +283,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t *tp, int pktlen) /* skip header fields */ k = 0; - pktlen -= ((uint8_t *)tp-x.tp_buf[0] - (uint8_t *)tp); + pktlen -= offsetof(struct tftp_t, x.tp_buf); /* prepend tftp_prefix */ prefix_len = strlen(slirp-tftp_prefix); @@ -299,7 +299,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t *tp, int pktlen) tftp_send_error(spt, 2, Access violation, tp); return; } - req_fname[k] = (char)tp-x.tp_buf[k]; + req_fname[k] = tp-x.tp_buf[k]; if (req_fname[k++] == '\0') { break; } @@ -311,7 +311,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t *tp, int pktlen) return; } - if (strcasecmp((const char *)tp-x.tp_buf[k], octet) != 0) { + if (strcasecmp(tp-x.tp_buf[k], octet) != 0) { tftp_send_error(spt, 4, Unsupported transfer mode, tp); return; } @@ -340,7 +340,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t *tp, int pktlen) while (k pktlen) 
{ const char *key, *value; - key = (const char *)tp-x.tp_buf[k]; + key = tp-x.tp_buf[k]; k += strlen(key) + 1; if (k = pktlen) { @@ -348,7 +348,7 @@ static void tftp_handle_rrq(Slirp *slirp, struct tftp_t *tp, int pktlen) return; } - value = (const char *)tp-x.tp_buf[k]; + value = tp-x.tp_buf[k]; k += strlen(value) + 1; if (strcasecmp(key, tsize) == 0) { diff --git a/slirp/tftp.h b/slirp/tftp.h index b9f0847..72e5e91 100644 --- a/slirp/tftp.h +++ b/slirp/tftp.h @@ -26,7 +26,7 @@ struct tftp_t { uint16_t tp_error_code; uint8_t tp_msg[512]; } tp_error; - uint8_t tp_buf[512 + 2]; + char tp_buf[512 + 2]; } x; }; -- 1.7.2.3
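The offsetof() change in this patch can be illustrated on a reduced structure. The struct below is a hypothetical, simplified stand-in for struct tftp_t (the real one also carries IP and UDP headers), and both function names are made up for the sketch. It shows that the pointer subtraction with casts and offsetof() on the nested member yield the same header length.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical packet layout: an opcode followed by a
 * union whose last member is a char buffer, as in the patch. */
struct pkt {
    uint16_t op;
    union {
        struct {
            uint16_t block;
            uint8_t data[512];
        } d;
        char buf[512 + 2];
    } x;
};

/* Old style: subtract the struct address from the member address. */
size_t header_len_by_pointer(const struct pkt *p)
{
    return (size_t)((const uint8_t *)&p->x.buf[0] - (const uint8_t *)p);
}

/* New style: let the compiler compute the member offset. */
size_t header_len_by_offsetof(void)
{
    return offsetof(struct pkt, x.buf);
}
```

Besides removing the casts, the offsetof() form does not need an object at all, so it cannot accidentally dereference an invalid pointer.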
Re: [Qemu-devel] [PATCH] bitops: fix test_and_change_bit()
Thanks, applied.

On Fri, Feb 25, 2011 at 12:47 AM, Corentin Chary corenti...@iksaif.net wrote:
./bitops.h:192: warning: ‘old’ is used uninitialized in this function

Signed-off-by: Corentin Chary corenti...@iksaif.net
---
 bitops.h | 2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/bitops.h b/bitops.h
index ae7bcb1..e2b9df3 100644
--- a/bitops.h
+++ b/bitops.h
@@ -187,7 +187,7 @@ static inline int test_and_change_bit(int nr, volatile unsigned long *addr)
 {
     unsigned long mask = BIT_MASK(nr);
     unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
-    unsigned long old;
+    unsigned long old = *p;
 
     *p = old ^ mask;
     return (old & mask) != 0;
-- 
1.7.4.1
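For reference, the fixed function can be exercised in isolation. The sketch below is a standalone approximation: the BITS_PER_LONG, BIT_MASK and BIT_WORD definitions are assumptions written for this example rather than copied from bitops.h, and the function is a plain (not static inline) definition so it can be compiled on its own.

```c
#include <limits.h>

/* Assumed helper macros, defined here so the example is standalone. */
#define BITS_PER_LONG  (sizeof(unsigned long) * CHAR_BIT)
#define BIT_MASK(nr)   (1UL << ((nr) % BITS_PER_LONG))
#define BIT_WORD(nr)   ((nr) / BITS_PER_LONG)

/* Toggle bit nr in the bitmap at addr and return its previous value. */
int test_and_change_bit(int nr, volatile unsigned long *addr)
{
    unsigned long mask = BIT_MASK(nr);
    unsigned long *p = ((unsigned long *)addr) + BIT_WORD(nr);
    unsigned long old = *p;  /* the fix: read the word before using it */

    *p = old ^ mask;
    return (old & mask) != 0;
}
```

With a zeroed two-word bitmap, toggling bit 3 returns 0 and sets the first word to 8; toggling it again returns 1 and clears the word.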
Re: [Qemu-devel] [PATCH v3 00/16] vnc: adapative tight, zrle, zywrle, and bitmap module
On Fri, Feb 25, 2011 at 12:43 AM, Corentin Chary corentin.ch...@gmail.com wrote: Is there a special reason why you use __always_inline instead of inline in bitops.h? Because it's not only a hint, I really want this function to be inlined. I applied a patch which changes this to just inline. See osdep.h.
Re: [Qemu-devel] null mac address
On Fri, Feb 25, 2011 at 4:55 AM, Wen Congyang we...@cn.fujitsu.com wrote: At 02/24/2011 10:40 PM, William Dauchy Write: Hi, I got some troubles hot plugging network pci devices. An attach works as expected but the mac address is still set to 00:00:00:00:00:00 on the guest machine. I have to reboot the guest to get the correct mac address. I first tried through libvirt with: # virsh attach-interface dom0 network default --mac 52:54:00:f6:84:ba and then through qemu monitor to make sure that it wasn't a libvirt issue: device_add rtl8139 or device_add rtl8139,mac=01:02:03:04:05:06 Always the same result on the guest. A device info on qemu give the correct result, that is to say, with a correct mac address. I went through rtl8139.c and saw that the mac address is set in `rtl8139_reset`. This function was called in `pci_rtl8139_init` but removed since c169998802505c244b8bcad562633f29de7d74a4 commit, because it doesn't make sense to call it when the virtual machine is shutdown. I'm now wondering where I am supposed to call this reset function when live attaching a pci device. I think it could fix the mac address issue. I will be very pleased to receive some tips to create a patch for this issue. Please try the following patch. 
Thanks Wen Congyang From efa0632f563a69dc299daaf4b235c1a0521d6e02 Mon Sep 17 00:00:00 2001 From: Wen Congyang we...@cn.fujitsu.com Date: Fri, 25 Feb 2011 09:56:27 +0800 Subject: [PATCH] move eeprom init from reset function to init function --- hw/pcnet-pci.c | 12 hw/pcnet.c | 13 - hw/rtl8139.c | 24 3 files changed, 24 insertions(+), 25 deletions(-) diff --git a/hw/pcnet-pci.c b/hw/pcnet-pci.c index 339a401..d7c4fc3 100644 --- a/hw/pcnet-pci.c +++ b/hw/pcnet-pci.c @@ -270,6 +270,8 @@ static int pci_pcnet_init(PCIDevice *pci_dev) PCIPCNetState *d = DO_UPCAST(PCIPCNetState, pci_dev, pci_dev); PCNetState *s = d-state; uint8_t *pci_conf; + int i; + uint16_t checksum; #if 0 printf(sizeof(RMD)=%d, sizeof(TMD)=%d\n, @@ -292,6 +294,16 @@ static int pci_pcnet_init(PCIDevice *pci_dev) pci_conf[PCI_MIN_GNT] = 0x06; pci_conf[PCI_MAX_LAT] = 0xff; + /* Initialize the PROM */ + + memcpy(s-prom, s-conf.macaddr.a, 6); + s-prom[12] = s-prom[13] = 0x00; + s-prom[14] = s-prom[15] = 0x57; + + for (i = 0,checksum = 0; i 16; i++) Please add braces to fix the CODING_STYLE problem while moving. + checksum += s-prom[i]; + *(uint16_t *)s-prom[12] = cpu_to_le16(checksum); This is not the right place, since lance.c uses the common part of pcnet.c. Please put the lines instead to pcnet_common_init(). + // PCI vendor and device ID should be mirrored here Also here it would be nice to convert C99 comments to C89 while moving.
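The PROM setup being moved in this patch can be sketched as a standalone function. This is an approximation under assumptions: prom_init is a hypothetical name, the sketch zeroes the buffer itself (the quoted diff only shows bytes 12 to 15 being set), and it stores the checksum byte by byte in little-endian order instead of using the cast and cpu_to_le16() from the diff.

```c
#include <stdint.h>
#include <string.h>

/* Fill a 16-byte PROM image: MAC address in bytes 0..5, 0x57 markers
 * in bytes 14..15, and a 16-bit little-endian checksum in bytes 12..13. */
void prom_init(uint8_t prom[16], const uint8_t mac[6])
{
    uint16_t checksum = 0;
    int i;

    memset(prom, 0, 16);      /* assumption: unused bytes are zero */
    memcpy(prom, mac, 6);
    prom[12] = prom[13] = 0x00;
    prom[14] = prom[15] = 0x57;

    /* Sum all 16 bytes while the checksum field is still zero. */
    for (i = 0; i < 16; i++) {
        checksum += prom[i];
    }

    /* Store little-endian without relying on host byte order. */
    prom[12] = (uint8_t)(checksum & 0xff);
    prom[13] = (uint8_t)(checksum >> 8);
}
```

For the MAC address 52:54:00:12:34:56 the byte sum is 0x01f0, so bytes 12 and 13 end up as 0xf0 and 0x01.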
Re: [Qemu-devel] [FYI] memory leak in 0.14.0rc1 ?
On Tuesday, February 15, 2011 21:16:49 Stefan Hajnoczi wrote: 2011/2/15 Torsten Förtsch torsten.foert...@gmx.net: On Tuesday, February 15, 2011 15:43:32 Stefan Hajnoczi wrote: I have installed winxp and run the machine as /usr/bin/qemu-kvm -name xp.home -m 768 Are you able to try QEMU 0.14.0-rc2 from source? $ git clone git://git.qemu.org/qemu.git $ git checkout v0.14.0-rc2 $ ./configure --target-list=x86_64-softmmu --enable-io-thread --disable-strip --prefix=/usr $ make $ x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 768 -name xp.home ... Now, the process size stays around 1300 Mb and RSS is very constant at 794 Mb. Thank you for checking this. This is probably a Suse-specific or qemu-kvm issue. Just for your information, it turns out that --enable-vnc-thread is the culprit, see https://bugzilla.novell.com/show_bug.cgi?id=671809 The method explained there (comment 4) also makes a 0.14.0 compiled from the sources and configured as ./configure --target-list=x86_64-softmmu \ --enable-io-thread --enable-vnc-thread grow. Torsten Förtsch -- Need professional modperl support? Hire me! (http://foertsch.name) Like fantasy? http://kabatinte.net
Re: [Qemu-devel] [PATCH 2/3] target-arm: Implement cp15 VA-PA translation
On 21 February 2011 23:19, Adam Lackorzynski a...@os.inf.tu-dresden.de wrote: Implement VA-PA translations by cp15-c7 that went through unchanged previously. Signed-off-by: Adam Lackorzynski a...@os.inf.tu-dresden.de Reviewed-by: Peter Maydell peter.mayd...@linaro.org (Sorry for the delay, I only got time to knock up a test program for this functionality this afternoon.) Note that without the patch I posted today that cleans up cp15 wfi decoding, you won't be able to get at one of the translation types. -- PMM
Re: [Qemu-devel] [FYI] memory leak in 0.14.0rc1 ?
On 2/25/2011 at 11:21 AM, Torsten Förtschtorsten.foert...@gmx.net wrote: On Tuesday, February 15, 2011 21:16:49 Stefan Hajnoczi wrote: 2011/2/15 Torsten Förtsch torsten.foert...@gmx.net: On Tuesday, February 15, 2011 15:43:32 Stefan Hajnoczi wrote: I have installed winxp and run the machine as /usr/bin/qemu-kvm -name xp.home -m 768 Are you able to try QEMU 0.14.0-rc2 from source? $ git clone git://git.qemu.org/qemu.git $ git checkout v0.14.0-rc2 $ ./configure --target-list=x86_64-softmmu --enable-io-thread --disable-strip --prefix=/usr $ make $ x86_64-softmmu/qemu-system-x86_64 -enable-kvm -m 768 -name xp.home ... Now, the process size stays around 1300 Mb and RSS is very constant at 794 Mb. Thank you for checking this. This is probably a Suse-specific or qemu-kvm issue. Just for your information, it turns out that --enable-vnc-thread is the culprit, see https://bugzilla.novell.com/show_bug.cgi?id=671809 The method explained there (comment 4) also makes a 0.14.0 compiled from the sources and configured as ./configure --target-list=x86_64-softmmu \ --enable-io-thread --enable-vnc-thread grow. Torsten Förtsch I haven't played much in the vnc code, but the following patch at least gets rid of the leak. I'm not sure if it's the correct solution. 
If someone more familiar with the vnc code wants to look into this, that would be great:

diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index 0b5d750..ebdba41 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -52,7 +52,6 @@ struct VncJobQueue {
     QemuCond cond;
     QemuMutex mutex;
     QemuThread thread;
-    Buffer buffer;
     bool exit;
     QTAILQ_HEAD(, VncJob) jobs;
 };
@@ -171,10 +170,9 @@ static void vnc_async_encoding_start(VncState *orig, VncState *local)
     local->tight = orig->tight;
     local->zlib = orig->zlib;
     local->hextile = orig->hextile;
-    local->output = queue->buffer;
     local->csock = -1; /* Don't do any network work on this thread */
 
-    buffer_reset(&local->output);
+    buffer_free(&local->output);
 }
 
 static void vnc_async_encoding_end(VncState *orig, VncState *local)
@@ -288,7 +286,6 @@ static void vnc_queue_clear(VncJobQueue *q)
 {
     qemu_cond_destroy(&queue->cond);
     qemu_mutex_destroy(&queue->mutex);
-    buffer_free(&queue->buffer);
     qemu_free(q);
     queue = NULL; /* Unset global queue */
 }

Bruce
[Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
On 02/23/2011 12:39 PM, Jan Kiszka wrote:
You should try to trace the event flow in qemu, either via strace, via the built-in tracer (which likely requires a bit more tracepoints), or via a system-level tracer (ftrace / kernelshark).

The apparent problem is that 25% of cycles is spent in mutex locking and unlocking. But in fact, the real problem is that 90% of the time is spent doing something else than executing code.

QEMU exits _a lot_ due to the vm_clock timers. The deadlines are rarely more than a few ms ahead, and at 1 MIPS that leaves room for executing a few thousand instructions for each context switch. The iothread overhead is what makes the situation so bad, because it takes a lot more time to execute those instructions.

We do so many (almost) useless passes through cpu_exec_all that even microoptimization helps, for example this:

--- a/cpus.c
+++ b/cpus.c
@@ -767,10 +767,6 @@ static void qemu_wait_io_event_common(CPUState *env)
 {
     CPUState *env;
 
-    while (all_cpu_threads_idle()) {
-        qemu_cond_timedwait(tcg_halt_cond, &qemu_global_mutex, 1000);
-    }
-
     qemu_mutex_unlock(&qemu_global_mutex);
 
     /*
@@ -1110,7 +,15 @@ bool cpu_exec_all(void)
         }
     }
     exit_request = 0;
+
+#ifdef CONFIG_IOTHREAD
+    while (all_cpu_threads_idle()) {
+       qemu_cond_timedwait(tcg_halt_cond, &qemu_global_mutex, 1000);
+    }
+    return true;
+#else
     return !all_cpu_threads_idle();
+#endif
 }
 
 void set_numa_modes(void)

is enough to cut all_cpu_threads_idle from 9 to 4.5% (not unexpected: the number of calls is halved). But it shouldn't be that high anyway, so I'm not proposing the patch formally.

Additionally, the fact that the execution is 99.99% lockstep means you cannot really overlap any part of the I/O and VCPU threads.

I found a couple of inaccuracies in my patches that already cut 50% of the time, though.

Did my patches contribute a bit to overhead reduction? They specifically target the costly vcpu/iothread switches in TCG mode (caused by TCGs excessive lock-holding times).

Yes, they cut 15%.

Paolo
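The "few thousand instructions for each context switch" figure above follows from simple arithmetic. The helper below is only a back-of-the-envelope sketch, not QEMU code: the function name and the assumption of a fixed guest speed expressed in MIPS are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of QEMU): number of guest instructions
 * that fit before the next timer deadline, assuming a fixed guest speed. */
static inline uint64_t insns_before_deadline(uint64_t mips, uint64_t deadline_us)
{
    /* Guard against overflow of the multiplication below. */
    assert(mips == 0 || deadline_us <= UINT64_MAX / mips);

    /* 1 MIPS is one instruction per microsecond. */
    return mips * deadline_us;
}
```

Under these assumptions, a deadline 3 ms ahead at 1 MIPS gives insns_before_deadline(1, 3000), that is 3000 instructions per pass.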
[Qemu-devel] Re: virtio-serial semantics for binary data and guest agents
On 02/24/2011 06:48 AM, Amit Shah wrote:
On (Wed) 23 Feb 2011 [08:31:52], Michael Roth wrote:
On 02/22/2011 10:59 PM, Amit Shah wrote:
On (Tue) 22 Feb 2011 [16:40:55], Michael Roth wrote:

If something in the guest is attempting to read/write from the virtio-serial device, and nothing is connected to virtio-serial's host character device (say, a socket)

1. writes will block until something connect()s, at which point the write will succeed
2. reads will always return 0 until something connect()s, at which point the reads will block until there's data

This makes it difficult (impossible?) to implement the notion of connect/disconnect or open/close over virtio-serial without layering another protocol on top using hackish things like length-encoded payloads or sentinel values to determine the end of one RPC/request/response/session and the start of the next. For instance, if the host side disconnects, then reconnects before we read(), we may never get the read()=0, and our FD remains valid. Whereas with a tcp/unix socket our FD is no longer valid, and the read()=0 is an event we can check for at any point after the other end does a close/disconnect.

There's SIGIO support, so host connect-disconnect notifications can be caught via the signal.

I recall looking into this at some point, but don't we get a SIGIO for read/write-ability in general?

I don't get you -- the virtio_console driver emits the SIGIO signal only when the host side connects or disconnects. See http://www.linux-kvm.org/page/Virtio-serial_API So whenever you receive a SIGIO, poll() in the signal handler for all fds of interest and whichever has POLLIN set is writable. Whichever has POLLHUP set is not. If you maintain previous state of the fd (before signal), you can figure out if something happened on the host side.

I tried this on RHEL6+rhn updates but the O_ASYNC flag doesn't seem to be supported. Has this been backported? Either way, it seems we can still lose the disconnect event/poll state change if the host reconnects before the signal is delivered. So SIGIO in an application would need to be reserved for absolutely 2 things: a host connect or disconnect (distinguishing between the 2 may not be so important, we could treat either as the previous session having been closed). Which limits the application to only having 1 O_ASYNC FD open at a time. But even if we do that, it seems like there might still be a small window where the application could read/write data intended for the previous connection before the signal handler is invoked. Not too sure on that point though. Assuming this isn't the case...it could work. But what about Windows guests?

So you still need some way to differentiate, say, readability from a disconnect/EOF, and the read()=0 that could determine this is still racing with host-side reconnects.

Also, nonblocking reads/writes will return -EPIPE if the host-side connection is not up.

But we still essentially need to poll() for a host-side disconnected state, which is still racy since they may reconnect before we've done a read/write that would've generated the -EPIPE.

It seems like what we really need is for the FD to be invalid from that point forward.

This would go against (or abuse) a chardev interface. It would effectively treat a host-side port close as a hot-unplug event.

Well, not a complete hot-unplug. The port would still be there, we'd just need to re-open it after a read()=0. Personally I'm not necessarily advocating we change the default behavior, but couldn't we support this as a separate mode? -device virtserialport,inv_fd_on_host_close=1 or something along that line?

Also, I focused more on the guest-side connect/disconnect detection, but as Anthony mentioned I think the host side shares similar limitations as well. AFAIK once we connect to the chardev that FD remains valid until the connected process closes it, and so races with the guest side on detecting connect/disconnect events in a similar manner. For the host side it looks like virtio-console has guest_close/guest_open callbacks already that we could potentially use...seems like it's just a matter of tying them to the chardev... basically having virtio-serial's guest_close() result in a close() on the corresponding chardev connection's FD.

Yes, this could be used. However, the problem with that will be that the chardev can't be opened again (AFAIR) and a new chardev will have to be used.

Hmm...yeah I was thinking more specifically about the socket chardev, where we can leave the listen_fd alone but close anything we've accept()'d prior to a guest-side disconnect. But isn't that enough? Just add this option for chardevs where this actually makes sense? For instance: -chardev socket,inv_fd_on_guest_close=1 Although, this wouldn't make sense if we're using the chardev for anything other than virtio-serial...so that flag makes more sense as a virtio-serial flag, but that
[Qemu-devel] x86_64 debugging while in 32-bit mode
Hi,

I have a problem debugging 64-bit emulation with the QEMU GDB stub. QEMU always sends the x86_64 register set, regardless of the current mode of the emulated CPU. This results in an error message in GDB: Remote 'g' packet reply is too long:

Yes, I understand that if I execute the set architecture i386:x86-64:intel command, GDB will show me the correct register contents. But in that case it will disassemble the code and unwind the stack incorrectly: it will interpret them as 64-bit while they are actually 32-bit.

In my understanding, QEMU should dynamically change the format of the g and G packets depending on the current CPU mode. On the other end, the user could manually change GDB's current architecture with the corresponding set architecture command. Please correct me if I am wrong. Maybe there is an existing methodology for debugging the QEMU-emulated x86_64 architecture in different CPU modes.

For now, I firmly intend to write a patch for the QEMU GDB stub, at least for my own use. But I have the impression that this should be corrected in the official release too.

--
Best regards, Artyom.
Re: [Qemu-devel] [PATCH] Use sigwait instead of sigwaitinfo.
Thanks, applied.

On Fri, Feb 18, 2011 at 3:17 PM, Tristan Gingold ging...@adacore.com wrote:
Fix compilation failure on Darwin.

Signed-off-by: Tristan Gingold ging...@adacore.com
---
 compatfd.c | 36 ++--
 1 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/compatfd.c b/compatfd.c
index a7cebc4..bd377c4 100644
--- a/compatfd.c
+++ b/compatfd.c
@@ -26,45 +26,45 @@ struct sigfd_compat_info
 static void *sigwait_compat(void *opaque)
 {
     struct sigfd_compat_info *info = opaque;
-    int err;
     sigset_t all;
 
     sigfillset(&all);
     sigprocmask(SIG_BLOCK, &all, NULL);
 
-    do {
-        siginfo_t siginfo;
+    while (1) {
+        int sig;
+        int err;
 
-        err = sigwaitinfo(&info->mask, &siginfo);
-        if (err == -1 && errno == EINTR) {
-            err = 0;
-            continue;
-        }
-
-        if (err > 0) {
-            char buffer[128];
+        err = sigwait(&info->mask, &sig);
+        if (err != 0) {
+            if (errno == EINTR) {
+                continue;
+            } else {
+                return NULL;
+            }
+        } else {
+            struct qemu_signalfd_siginfo buffer;
             size_t offset = 0;
 
-            memcpy(buffer, &err, sizeof(err));
+            memset(&buffer, 0, sizeof(buffer));
+            buffer.ssi_signo = sig;
+
             while (offset < sizeof(buffer)) {
                 ssize_t len;
 
-                len = write(info->fd, buffer + offset,
+                len = write(info->fd, (char *)&buffer + offset,
                             sizeof(buffer) - offset);
                 if (len == -1 && errno == EINTR)
                     continue;
 
                 if (len <= 0) {
-                    err = -1;
-                    break;
+                    return NULL;
                 }
 
                 offset += len;
             }
         }
-    } while (err >= 0);
-
-    return NULL;
+    }
 }
 
 static int qemu_signalfd_compat(const sigset_t *mask)
-- 
1.7.3.GIT
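For readers unfamiliar with the pattern in this patch, the following is a minimal standalone sketch of one iteration of such a loop: wait for a blocked signal with sigwait() and forward its number through a file descriptor. It is not the QEMU code. The names sig_record, write_all and forward_one_signal are hypothetical, and the 128-byte record merely stands in for qemu_signalfd_siginfo. Note that POSIX specifies that sigwait() reports failure through its return value, so the sketch checks that value rather than errno.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical fixed-size record, standing in for qemu_signalfd_siginfo. */
struct sig_record {
    int signo;
    char pad[124];
};

/* Write the whole buffer, retrying on EINTR and on short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t offset = 0;

    while (offset < len) {
        ssize_t n = write(fd, p + offset, len - offset);
        if (n == -1 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return -1;
        }
        offset += (size_t)n;
    }
    return 0;
}

/* Wait for one signal from `mask` (which the caller must have blocked)
 * and forward it to `fd`. Returns the signal number, or -1 on error. */
static int forward_one_signal(const sigset_t *mask, int fd)
{
    struct sig_record rec;
    int sig;
    int err;

    assert(mask != NULL);
    err = sigwait(mask, &sig);   /* returns 0 or an error number */
    if (err != 0) {
        return -1;
    }

    memset(&rec, 0, sizeof(rec));
    rec.signo = sig;
    return write_all(fd, &rec, sizeof(rec)) == 0 ? sig : -1;
}
```

A caller would block the signals of interest with sigprocmask() first, then call forward_one_signal() in a loop from a dedicated thread and read the records from the other end of a pipe.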
[Qemu-devel] Re: [PATCH] target-arm: Don't decode old cp15 WFI instructions on v7 cores
On Fri Feb 25, 2011 at 15:04:12 +, Peter Maydell wrote: In v7 of the ARM architecture, WFI (wait for interrupt) is a first-class instruction, but in previous versions this functionality was provided via a cp15 coprocessor register. Add correct feature checks to the decoding of the cp15 WFI instructions so that they behave correctly for newer cores. In particular, the old 0,c7,c8,2 encoding used on ARM940 has been reused for VA-to-PA translation in v6 and v7. Signed-off-by: Peter Maydell peter.mayd...@linaro.org Reviewed-by: Adam Lackorzynski a...@os.inf.tu-dresden.de --- This patch stands alone as a fix to target-arm; it's a prerequisite for Adam's VA-PA translation patch, because otherwise attempting a user-read translation will get you a WFI instead... Thanks, (un)fortunately I never triggered this case in my setup. Adam -- Adam a...@os.inf.tu-dresden.de Lackorzynski http://os.inf.tu-dresden.de/~adam/
[Qemu-devel] [PATCH] vnc: fix a memory leak in threaded vnc server
VncJobQueue's buffer is intended to be used as the output buffer for all operations in this queue. Unfortunately, although vnc_async_encoding_start() is in charge of setting this buffer as the current output buffer, vnc_async_encoding_end() was not writing the changes back to VncJobQueue, resulting in a big and ugly memory leak.

Signed-off-by: Corentin Chary corenti...@iksaif.net
---
I believe this is a (slightly) better patch than Bruce's one, because it reduces memory allocations by always using the same buffer.

 ui/vnc-jobs-async.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index 1d4c5e7..f596247 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -186,6 +186,8 @@ static void vnc_async_encoding_end(VncState *orig, VncState *local)
     orig->hextile = local->hextile;
     orig->zrle = local->zrle;
     orig->lossy_rect = local->lossy_rect;
+
+    queue->buffer = local->output;
 }
 
 static int vnc_worker_thread_loop(VncJobQueue *queue)
-- 
1.7.4
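The write-back described in this commit message can be illustrated with a small standalone sketch. It is not the QEMU code: Buf, Queue, Worker and the three functions are hypothetical stand-ins for Buffer, VncJobQueue, the local VncState and the start/end helpers. The point it tries to show is that the worker receives a struct copy of the queue's buffer, so any reallocation done through the copy is invisible to the queue unless the copy is handed back.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for QEMU's growable Buffer. */
typedef struct {
    size_t capacity;
    size_t offset;
    unsigned char *data;
} Buf;

/* Make room for `len` more bytes; may move `data` and change `capacity`. */
static void buf_reserve(Buf *b, size_t len)
{
    if (b->capacity - b->offset < len) {
        size_t cap = b->offset + len + 1024;
        unsigned char *p = realloc(b->data, cap);
        assert(p != NULL);   /* sketch only: no error recovery */
        b->data = p;
        b->capacity = cap;
    }
}

typedef struct { Buf buffer; } Queue;    /* shared, long-lived */
typedef struct { Buf output; } Worker;   /* per-job copy */

static void encoding_start(Queue *q, Worker *w)
{
    w->output = q->buffer;   /* struct copy: borrow the queue's storage */
    w->output.offset = 0;    /* reuse it from the beginning */
}

static void encoding_end(Queue *q, Worker *w)
{
    q->buffer = w->output;   /* hand back the (possibly moved) storage */
}
```

Without the assignment in encoding_end(), the queue in this sketch would keep its stale pointer and capacity, and the next job would allocate again while the previous allocation is no longer referenced anywhere.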
Re: [Qemu-devel] [PATCH] Outdated comment in HACKING
This patch won't apply with git-am because your mailer is doing weird things. Please use git-send-email to send the patch.

Regards,

Anthony Liguori

On 02/24/2011 06:27 PM, Joey Trebbien wrote:
All printf-style functions in the source (except for a few in tests/) already have a format __attribute__ (via the GCC_ATTR or GCC_FMT_ATTR macros).

Signed-off-by: Joseph Trebbien jtrebb...@gmail.com
---
 HACKING | 3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/HACKING b/HACKING
index 6ba9d7e..3af53fd 100644
--- a/HACKING
+++ b/HACKING
@@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
 This makes it so gcc's -Wformat and -Wformat-security options can do
 their jobs and cross-check format strings with the number and types
 of arguments.
-
-Currently many functions in QEMU are not following this rule but
-patches to add the attribute would be very much appreciated.
[Qemu-devel] [RESENT][PATCH] HACKING: Update status of format checking
This patch was already sent on 2011-01-24:

Hopefully all functions with printf like arguments now use format checking.

This was tested with default build configuration on linux and windows hosts (including some cross compilations), so chances are good that there remain few (if any) functions without format checking.

Therefore the last comment in HACKING is no longer valid but misleading.

Cc: Blue Swirl blauwir...@gmail.com
Signed-off-by: Stefan Weil w...@mail.berlios.de
---
 HACKING |    3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/HACKING b/HACKING
index 6ba9d7e..3af53fd 100644
--- a/HACKING
+++ b/HACKING
@@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
 This makes it so gcc's -Wformat and -Wformat-security options can do
 their jobs and cross-check format strings with the number and types
 of arguments.
-
-Currently many functions in QEMU are not following this rule but
-patches to add the attribute would be very much appreciated.
-- 
1.7.2.3
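As a reminder of what the HACKING text is referring to, here is a minimal sketch of a printf-style function whose prototype carries gcc's format attribute. MY_FMT_ATTR and log_to_buf are hypothetical names chosen for this example; QEMU uses its own GCC_FMT_ATTR macro for the same purpose.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical macro for this sketch; QEMU has its own GCC_FMT_ATTR. */
#if defined(__GNUC__)
#define MY_FMT_ATTR(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define MY_FMT_ATTR(fmt_idx, first_arg)
#endif

/* Argument 3 is the format string; the variadic arguments start at 4. */
static int log_to_buf(char *buf, size_t size, const char *fmt, ...)
    MY_FMT_ATTR(3, 4);

static int log_to_buf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    assert(buf != NULL && size > 0);
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, gcc's -Wformat can flag a mismatched call such as log_to_buf(buf, sizeof(buf), "%s", 42) at compile time, exactly as it does for printf itself.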
Re: [Qemu-devel] [PATCH V6 3/4] qmp, nmi: convert do_inject_nmi() to QObject
On 02/25/2011 03:54 AM, Markus Armbruster wrote: Anthony Liguorialigu...@linux.vnet.ibm.com writes: On 02/24/2011 10:20 AM, Markus Armbruster wrote: Anthony Liguorialigu...@linux.vnet.ibm.com writes: On 02/24/2011 02:33 AM, Markus Armbruster wrote: Anthony Liguorianth...@codemonkey.wswrites: [...] Please describe all expected errors. Quoting qmp-commands.hx: 3. Errors, in special, are not documented. Applications should NOT check for specific errors classes or data (it's strongly recommended to only check for the error key) Indeed, not a single error is documented there. This is intentional. Yeah, but we're not 0.14 anymore and for 0.15, we need to document errors. If you are suggesting I send a patch to remove that section, I'm more than happy to. Two separate issues here: 1. Are we ready to commit to the current design of errors, and 2. Is it fair to reject Lai's patch now because he doesn't document his errors. I'm not commenting on 1. here. Regarding 2.: rejecting a patch because it doesn't document an aspect that current master intentionally leaves undocumented is not how you treat contributors. At least not if you want any other than certified masochists who enjoy pain, and professionals who get adequately compensated for it. Lead by example, not by fiat. http://repo.or.cz/w/qemu/aliguori.git/blob/refs/heads/glib:/qmp-schema.json I am in the process of documenting the errors of every command. It's a royal pain but I'm going to document everything we have right now. It's actually the last bit of work I need to finish before sending QAPI out. So for new commands being added, it would be hugely helpful for the authors to document the errors such that I don't have to reverse engineer all of the possible error conditions. The moment this lands in master, you can begin to demand error descriptions from contributors. Until then, I'll NAK error descriptions in qmp-commands.txt. We left them undocumented there for good reasons: No, it was always a bad reason. 
Good documentation is necessary to build good commands. Errors are a huge part of the semantics of a command. We cannot properly assess a command unless its behavior in error conditions is well defined.

Once we have an error design in place that has a reasonable hope to stand the test of time, and have errors documented for at least some of the commands here, we can start to require proper error documentation for new commands. But not now. I won't NAK non-normative error descriptions, say in commit messages, or in comments. And I won't object to you asking for them. But I feel you really shouldn't make it a condition for committing patches. Especially not for simple patches that have been on list for months.

I'm strongly committed to making QMP a first class interface in QEMU for 0.15. I feel documentation is a crucial part to making that happen. I'm not asking for test cases even though that's something that we'll need for 0.15 because there's not enough infrastructure in master yet to do that in a reasonable way. I realize I'm likely going to have to write that test case and I'm happy to do that. But there's no reason that we shouldn't require thorough documentation for all new QMP commands moving forward including error semantics. This is a critical part of having a first class API and no additional infrastructure is needed in master to do this.

Regards,

Anthony Liguori
Re: [Qemu-devel] [PATCH] Outdated comment in HACKING
On 25.02.2011 23:08, Anthony Liguori wrote:
This patch won't apply with git-am because your mailer is doing weird things. Please use git-send-email to send the patch.

Regards,

Anthony Liguori

On 02/24/2011 06:27 PM, Joey Trebbien wrote:
All printf-style functions in the source (except for a few in tests/) already have a format __attribute__ (via the GCC_ATTR or GCC_FMT_ATTR macros).

Signed-off-by: Joseph Trebbien jtrebb...@gmail.com
---
 HACKING | 3 ---
 1 files changed, 0 insertions(+), 3 deletions(-)

diff --git a/HACKING b/HACKING
index 6ba9d7e..3af53fd 100644
--- a/HACKING
+++ b/HACKING
@@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
 This makes it so gcc's -Wformat and -Wformat-security options can do
 their jobs and cross-check format strings with the number and types
 of arguments.
-
-Currently many functions in QEMU are not following this rule but
-patches to add the attribute would be very much appreciated.

Hi Anthony,

the same patch is on my list of missing patches which I had sent weeks ago, so no need for Joey to resend his patch. I'll resend my version.

Regards,

Stefan W.
[Qemu-devel] [PATCH 10/26] FVD: add impl of interface bdrv_file_open()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_file_open() interface. It supports opening an FVD image.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-journal.c  |    6 +
 block/fvd-open.c     |  445 +-
 block/fvd-prefetch.c |   17 ++
 block/fvd.c          |    1 +
 4 files changed, 468 insertions(+), 1 deletions(-)
 create mode 100644 block/fvd-prefetch.c

diff --git a/block/fvd-journal.c b/block/fvd-journal.c
index 246f425..5ba34bd 100644
--- a/block/fvd-journal.c
+++ b/block/fvd-journal.c
@@ -22,6 +22,12 @@ static inline int64_t calc_min_journal_size(int64_t table_entries)
     return 512;
 }
 
+static int init_journal(int read_only, BlockDriverState * bs,
+                        FvdHeader * header)
+{
+    return -ENOTSUP;
+}
+
 void fvd_emulate_host_crash(bool cond)
 {
     emulate_host_crash = cond;
diff --git a/block/fvd-open.c b/block/fvd-open.c
index 056b994..8caf8d3 100644
--- a/block/fvd-open.c
+++ b/block/fvd-open.c
@@ -11,7 +11,450 @@
  *
  */
 
+static void init_prefetch_timer(BlockDriverState * bs, BDRVFvdState * s);
+static int init_data_file(BDRVFvdState * s, FvdHeader * header, int flags);
+static int init_bitmap(BlockDriverState * bs, BDRVFvdState * s,
+                       FvdHeader * header, const char *const filename);
+static int load_table(BDRVFvdState * s, FvdHeader * header,
+                      const char *const filename);
+static int init_journal(int read_only, BlockDriverState * bs,
+                        FvdHeader * header);
+static int init_compact_image(BDRVFvdState * s, FvdHeader * header,
+                              const char *const filename);
+
 static int fvd_open(BlockDriverState * bs, const char *filename, int flags)
 {
-    return -ENOTSUP;
+    BDRVFvdState *s = bs->opaque;
+    int ret;
+    FvdHeader header;
+    BlockDriver *drv;
+    int i;
+
+    const char *protocol = strchr(filename, ':');
+    if (protocol) {
+        drv = bdrv_find_protocol(filename);
+        filename = protocol + 1;
+    } else {
+        /* Use raw instead of file to allow storing the image on device.
*/ +drv = bdrv_find_format(raw); +if (!drv) { +fprintf(stderr, Failed to find the block device driver\n); +return -EINVAL; +} +} + +s-fvd_metadata = bdrv_new(); +ret = bdrv_open(s-fvd_metadata, filename, flags, drv); +if (ret 0) { +fprintf(stderr, Failed to open %s\n, filename); +return ret; +} + +/* Initialize so that jumping to 'fail' would do cleanup properly. */ +s-stale_bitmap = NULL; +s-fresh_bitmap = NULL; +s-table = NULL; +s-outstanding_copy_on_read_data = 0; +QLIST_INIT(s-write_locks); +QLIST_INIT(s-copy_locks); +s-prefetch_acb = NULL; +s-add_storage_cmd = NULL; +#ifdef FVD_DEBUG +s-total_copy_on_read_data = s-total_prefetch_data = 0; +#endif + +if (bdrv_pread(s-fvd_metadata, 0, header, sizeof(header)) != +sizeof(header)) { +fprintf(stderr, Failed to read the header of %s\n, filename); +ret = -EIO; +goto fail; +} + +fvd_header_le_to_cpu(header); + +if (header.magic != FVD_MAGIC) { +fprintf(stderr, Incorrect magic number in header: %0X\n, +header.magic); +ret = -EINVAL; +goto fail; +} + +/* Check incompatible features. */ +for (i = 0; i INCOMPATIBLE_FEATURES_SPACE; i++) { +if (header.incompatible_features[i] != 0) { +fprintf(stderr, The image was created by FVD version %d + and uses features not supported by this FVD version %d\n, +header.create_version, FVD_VERSION); +ret = -ENOTSUP; +} +} + +if (header.virtual_disk_size % 512 != 0) { +fprintf(stderr, Disk size % PRId64 in the header of %s is not +a multple of 512.\n, header.virtual_disk_size, filename); +ret = -EINVAL; +goto fail; +} + +/* Initialize the fields of BDRVFvdState. 
*/ +s-chunks_relocated = header.chunks_relocated; +s-dirty_image = false; +s-metadata_err_prohibit_write = false; +s-block_size = header.block_size / 512; +s-bitmap_size = header.bitmap_size; +s-prefetch_timer = NULL; +s-sectors_per_prefetch = (header.bytes_per_prefetch + 511) / 512; +s-prefetch_throttle_time = header.prefetch_throttle_time; +s-prefetch_read_throughput_measure_time = +header.prefetch_read_throughput_measure_time; +s-prefetch_write_throughput_measure_time = +header.prefetch_write_throughput_measure_time; + +/* Convert KB/s to bytes/millisec. */ +s-prefetch_min_read_throughput = +((double)header.prefetch_min_read_throughput) * 1024.0 / 1000.0; +s-prefetch_min_write_throughput = +((double)header.prefetch_min_write_throughput)
[Qemu-devel] [PATCH 20/26] FVD: add impl of interface bdrv_get_info()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds FVD's implementation of the bdrv_get_info() interface. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-misc.c | 98 +- 1 files changed, 97 insertions(+), 1 deletions(-) diff --git a/block/fvd-misc.c b/block/fvd-misc.c index a42bfac..c515d74 100644 --- a/block/fvd-misc.c +++ b/block/fvd-misc.c @@ -11,6 +11,7 @@ * */ +static int read_fvd_header(BDRVFvdState * s, FvdHeader * header); static void fvd_aio_cancel_bjnl_buf_write(FvdAIOCB * acb); static void fvd_aio_cancel_bjnl_flush(FvdAIOCB * acb); static void fvd_aio_cancel_read(FvdAIOCB * acb); @@ -95,7 +96,102 @@ static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num, static int fvd_get_info(BlockDriverState * bs, BlockDriverInfo * bdi) { -return -ENOTSUP; +BDRVFvdState *s = bs-opaque; +FvdHeader header; + +if (read_fvd_header(s, header) 0) { +return -1; +} + +printf(= Begin of FVD specific information ==\n); +printf(magic\t\t\t\t\t\t%0X\n, header.magic); +printf(header_size\t\t\t\t\t%d\n, header.header_size); +printf(create_version\t\t\t\t\t%d\n, header.create_version); +printf(last_open_version\t\t\t\t%d\n, header.last_open_version); +printf(virtual_disk_size (bytes)\t\t\t% PRId64 \n, + header.virtual_disk_size); +printf(disk_metadata_size (bytes)\t\t\t% PRId64 \n, header.data_offset); +if (header.data_file[0]) { +printf(data_file\t\t\t\t\t%s\n, header.data_file); +} +if (header.data_file_fmt[0]) { +printf(data_file_fmt\t\t\t\t\t%s\n, header.data_file_fmt); +} + +if (header.table_offset 0) { +printf(table_size (bytes)\t\t\t\t% PRId64 \n, header.table_size); +printf(avail_storage (bytes)\t\t\t\t% PRId64 \n, + s-avail_storage * 512); +printf(chunk_size (bytes)\t\t\t\t% PRId64 \n, header.chunk_size); +printf(used_chunks (bytes)\t\t\t\t% PRId64 \n, + s-used_storage * 512); +printf(storage_grow_unit (bytes)\t\t\t% PRId64 \n, + header.storage_grow_unit); +printf(table_offset 
(bytes)\t\t\t\t% PRId64 \n, + header.table_offset); +printf(table_size (bytes)\t\t\t\t% PRId64 \n, s-table_size); +printf(chunks_relocated\t\t\t\t%s\n, BOOL(s-chunks_relocated)); + +if (header.add_storage_cmd[0] != 0) { +printf(add_storage_cmd\t\t\t\t\t%s\n, header.add_storage_cmd); +} +} + +printf(clean_shutdown\t\t\t\t\t%s\n, BOOL(header.clean_shutdown)); +if (header.journal_size 0) { +printf(journal_offset\t\t\t\t\t% PRId64 \n, header.journal_offset); +printf(journal_size\t\t\t\t\t% PRId64 \n, header.journal_size); +printf(stable_journal_epoch\t\t\t\t% PRId64 \n, + header.stable_journal_epoch); +printf(journal_buf_size (bytes)\t\t\t% PRId64 \n, + header.journal_buf_size); +printf(journal_clean_buf_period (ms)\t\t\t% PRId64 \n, + header.journal_clean_buf_period); +} + +if (header.base_img[0] != 0) { +printf(base_img_fully_prefetched\t\t\t%s\n, + BOOL(header.base_img_fully_prefetched)); +printf(base_img\t\t\t\t\t%s\n, header.base_img); +if (header.base_img_fmt[0]) { +printf(base_img_fmt\t\t\t\t\t%s\n, header.base_img_fmt); +} +printf(base_img_size (bytes)\t\t\t\t% PRId64 \n, + header.base_img_size); +printf(bitmap_offset (bytes)\t\t\t\t% PRId64 \n, + header.bitmap_offset); +printf(bitmap_size (bytes)\t\t\t\t% PRId64 \n, header.bitmap_size); +printf(block_size\t\t\t\t\t% PRId64 \n, header.block_size); +printf(copy_on_read\t\t\t\t\t%s\n, BOOL(header.copy_on_read)); +printf(max_outstanding_copy_on_read_data (bytes)\t% PRId64 \n, + header.max_outstanding_copy_on_read_data); +printf(need_zero_init\t\t\t\t\t%s\n, BOOL(header.need_zero_init)); +printf(prefetch_start_delay (sec)\t\t\t% PRId64 \n, + header.prefetch_start_delay); +printf(num_prefetch_slots\t\t\t\t%d\n, header.num_prefetch_slots); +printf(bytes_per_prefetch\t\t\t\t% PRIu64 \n, + header.bytes_per_prefetch); +printf(prefetch_over_threshold_throttle_time (ms)\t% PRIu64 \n, + header.prefetch_throttle_time); +printf(prefetch_read_throughput_measure_time (ms)\t% PRIu64 \n, + 
header.prefetch_read_throughput_measure_time); +printf(prefetch_write_throughput_measure_time (ms)\t% PRIu64 \n, + header.prefetch_write_throughput_measure_time); +printf(prefetch_min_read_throughput (KB/s)\t\t% PRIu64 \n, + header.prefetch_min_read_throughput); +
[Qemu-devel] [PATCH 24/26] FVD: add impl of interface bdrv_has_zero_init()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_has_zero_init() interface.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-misc.c |    9 -
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index 766b62b..61e39bb 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -341,5 +341,12 @@ static int fvd_get_info(BlockDriverState * bs, BlockDriverInfo * bdi)
 
 static int fvd_has_zero_init(BlockDriverState * bs)
 {
-    return 0;
+    BDRVFvdState *s = bs->opaque;
+
+    /* For a non-compact image, chunks_relocated is always false. For a
+     * compact image with chunks_relocated=true, it can no longer guarantee
+     * zero init even if the file system does that. This is because a partially
+     * written chunk X may be relocated to a location previously used by
+     * another chunk Y and some garbage data are left there by Y. */
+    return s->chunks_relocated ? 0 : bdrv_has_zero_init(s->fvd_data);
 }
-- 
1.7.0.4
[Qemu-devel] [PATCH 08/26] FVD: add debugging utilities
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds some debugging utilities to FVD. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/blksim.c |7 +- block/fvd-debug.c | 369 +++ block/fvd-ext.h | 71 ++ block/fvd-journal.c | 23 +++ block/fvd.c |2 + block/fvd.h |1 + qemu-io-auto.c | 17 ++- 7 files changed, 478 insertions(+), 12 deletions(-) create mode 100644 block/fvd-debug.c create mode 100644 block/fvd-ext.h create mode 100644 block/fvd-journal.c diff --git a/block/blksim.c b/block/blksim.c index 5c7ef43..16e44ee 100644 --- a/block/blksim.c +++ b/block/blksim.c @@ -19,12 +19,7 @@ #include qemu-queue.h #include qemu-common.h #include block/blksim.h - -#if 1 -# define QDEBUG(format,...) do {} while (0) -#else -# define QDEBUG printf -#endif +#include block/fvd-ext.h typedef enum { diff --git a/block/fvd-debug.c b/block/fvd-debug.c new file mode 100644 index 000..36b4c43 --- /dev/null +++ b/block/fvd-debug.c @@ -0,0 +1,369 @@ +/* + * QEMU Fast Virtual Disk Format Debugging Utilities + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#ifndef ENABLE_TRACE_IO +#define TRACE_REQUEST(...) do {} while (0) +#define TRACE_STORE_IN_FVD(...) 
do {} while (0) + +#else + +static void TRACE_REQUEST(int do_write, int64_t sector_num, int nb_sectors) +{ +if (do_write) { +QDEBUG(TRACE_REQUEST: write sector_num=% PRId64 +nb_sectors=%d[ , sector_num, nb_sectors); +} else { +QDEBUG(TRACE_REQUEST: read sector_num=% PRId64 nb_sectors=%d + [ , sector_num, nb_sectors); +} + +int64_t end = sector_num + nb_sectors; +int64_t sec; +for (sec = sector_num; sec end; sec++) { +QDEBUG(sec% PRId64 , sec); +} +QDEBUG( ]\n); +} + +static void TRACE_STORE_IN_FVD(const char *str, int64_t sector_num, + int nb_sectors) +{ +QDEBUG(TRACE_STORE: %s sector_num=% PRId64 nb_sectors=%d [ , + str, sector_num, nb_sectors); +int64_t end = sector_num + nb_sectors; +int64_t sec; +for (sec = sector_num; sec end; sec++) { +QDEBUG(sec% PRId64 , sec); +} +QDEBUG( ]\n); +} +#endif + +#ifndef FVD_DEBUG +#define my_qemu_malloc qemu_malloc +#define my_qemu_mallocz qemu_mallocz +#define my_qemu_blockalign qemu_blockalign +#define my_qemu_free qemu_free +#define my_qemu_vfree qemu_vfree +#define my_qemu_aio_get qemu_aio_get +#define my_qemu_aio_release qemu_aio_release +#define COPY_UUID(to,from) do {} while (0) + +#else +FILE *__fvd_debug_fp; +static unsigned long long int fvd_uuid = 1; +static int64_t pending_qemu_malloc = 0; +static int64_t pending_qemu_aio_get = 0; +static int64_t pending_local_writes = 0; +static const char *alloc_file; +static int alloc_line; + +#define my_qemu_malloc(size) \ +((void*)(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_malloc(size))) + +#define my_qemu_mallocz(size) \ +((void*)(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_mallocz(size))) + +#define my_qemu_blockalign(bs,size) \ +((void*)(alloc_file=__FILE__, \ + alloc_line=__LINE__, \ + _my_qemu_blockalign(bs,size))) + +#define my_qemu_aio_get(pool,bs,cb,op) \ +((void*)(alloc_file=__FILE__, \ + alloc_line=__LINE__, \ + _my_qemu_aio_get(pool,bs,cb,op))) + +#define my_qemu_free(p) \ +(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_free(p)) + +#define 
my_qemu_vfree(p) \ +(alloc_file=__FILE__, alloc_line=__LINE__, _my_qemu_vfree(p)) + +static void COPY_UUID(FvdAIOCB * to, FvdAIOCB * from) +{ +if (from) { +to-uuid = from-uuid; +FVD_DEBUG_ACB(to); +} +} + +#ifdef DEBUG_MEMORY_LEAK +#define MAX_TRACER 10485760 +static int alloc_tracer_used = 1; /* slot 0 is not used. */ +static void **alloc_tracers = NULL; + +static void __attribute__ ((constructor)) init_mem_alloc_tracers(void) +{ +if (!alloc_tracers) { +alloc_tracers = qemu_mallocz(sizeof(void *) * MAX_TRACER); +} +} + +static void trace_alloc(void *p, size_t size) +{ +alloc_tracer_t *t = p; +t-magic = FVD_ALLOC_MAGIC; +t-alloc_file = alloc_file; +t-alloc_line = alloc_line; +t-size = size; + +if (alloc_tracer_used MAX_TRACER) { +t-alloc_tracer = alloc_tracer_used++; +alloc_tracers[t-alloc_tracer] = t; +QDEBUG(Allocate memory using tracer%d in %s on line %d.\n, + t-alloc_tracer, alloc_file, alloc_line); +} else { +t-alloc_tracer = 0; +} + +/* Set header and footer to detect out-of-range writes. */ +if (size != (size_t) - 1) { +uint8_t *q = (uint8_t *) p; +
[Qemu-devel] [PATCH 16/26] FVD: add impl for buffered journal updates
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch enhances FVD's journal with the capability of buffering multiple metadata updates and sending them to the journal in a single write. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-journal-buf.c | 336 ++- 1 files changed, 333 insertions(+), 3 deletions(-) diff --git a/block/fvd-journal-buf.c b/block/fvd-journal-buf.c index 3efdd47..b4077ce 100644 --- a/block/fvd-journal-buf.c +++ b/block/fvd-journal-buf.c @@ -20,15 +20,345 @@ * case for cache!=writethrough. **/ +static inline int bjnl_write_buf(FvdAIOCB *acb); +static void bjnl_send_current_buf_to_write_queue(BlockDriverState *bs); + +static inline void bjnl_finish_write_buf(FvdAIOCB *acb, int ret) +{ +ASSERT (acb-type == OP_BJNL_BUF_WRITE); +BlockDriverState *bs = acb-common.bs; +BDRVFvdState *s = bs-opaque; + +QDEBUG(JOURNAL: bjnl_finish_write_buf acb%llu-%p\n, acb-uuid, acb); + +my_qemu_vfree(acb-jcb.iov.iov_base); +QTAILQ_REMOVE(s-bjnl.queued_bufs, acb, jcb.bjnl_next_queued_buf); +my_qemu_aio_release(acb); + +if (ret != 0) { +s-metadata_err_prohibit_write = true; +} +} + +static inline void bjnl_write_next_buf(BDRVFvdState *s) +{ +FvdAIOCB *acb; +while ((acb = QTAILQ_FIRST(s-bjnl.queued_bufs))) { +if (bjnl_write_buf(acb) == 0) { +return; +} +} +} + +static inline void bjnl_aio_flush_cb(void *opaque, int ret) +{ +FvdAIOCB *acb = (FvdAIOCB *) opaque; + +if (acb-cancel_in_progress) { +return; +} + +QDEBUG(JOURNAL: bjnl_aio_flush_cb acb%llu-%p\n, acb-uuid, acb); + +/* Invoke the callback initially provided to fvd_aio_flush(). 
*/ +acb-common.cb(acb-common.opaque, ret); +my_qemu_aio_release(acb); +} + +static inline void bjnl_write_buf_cb(void *opaque, int ret) +{ +FvdAIOCB *acb = (FvdAIOCB *) opaque; +BlockDriverState *bs = acb-common.bs; +BDRVFvdState *s = bs-opaque; + +if (acb-cancel_in_progress) { +return; +} + +QDEBUG(JOURNAL: bjnl_write_buf_cb acb%llu-%p\n, acb-uuid, acb); +bjnl_finish_write_buf(acb, ret); +bjnl_write_next_buf(s); +} + +#ifndef ENABLE_QDEBUG +# define PRINT_JRECORDS(buf,len) do{}while(0) +#else +static void print_jrecords(const uint8_t *buf, size_t len); +# define PRINT_JRECORDS print_jrecords +#endif + +static int bjnl_write_buf_start(FvdAIOCB *acb) +{ +BlockDriverState *bs = acb-common.bs; +BDRVFvdState *s = bs-opaque; +int64_t journal_sec; +int nb_sectors = acb-jcb.iov.iov_len / 512; +int ret; + +ASSERT (nb_sectors = s-journal_size); +QDEBUG(JOURNAL: bjnl_write_buf_start acb%llu-%p\n, acb-uuid, acb); + +if (s-next_journal_sector + nb_sectors = s-journal_size) { +journal_sec = s-next_journal_sector; +s-next_journal_sector += nb_sectors; +} else { +if ((ret = recycle_journal(bs))) { +goto fail; +} +journal_sec = 0; +s-next_journal_sector = nb_sectors; +} + +PRINT_JRECORDS(acb-jcb.iov.iov_base, acb-jcb.iov.iov_len); + +acb-jcb.hd_acb = bdrv_aio_writev(s-fvd_metadata, + s-journal_offset + journal_sec, + acb-jcb.qiov, nb_sectors, + bjnl_write_buf_cb, acb); +if (acb-jcb.hd_acb) { +return 0; +} else { +ret = -EIO; +} + +fail: +bjnl_finish_write_buf(acb, ret); +return ret; +} + +static void bjnl_flush_data_before_update_bitmap_cb(void *opaque, int ret) +{ +FvdAIOCB *acb = opaque; + +if (acb-cancel_in_progress) { +return; +} + +QDEBUG(JOURNAL: bjnl_flush_data_before_update_bitmap_cb acb%llu-%p\n, + acb-uuid, acb); + +if (ret != 0) { +bjnl_finish_write_buf(acb, ret); +} else if (bjnl_write_buf_start(acb) == 0) { +return; +} + +bjnl_write_next_buf(acb-common.bs-opaque); +} + +static inline int bjnl_write_buf(FvdAIOCB *acb) +{ +BlockDriverState *bs = acb-common.bs; 
+BDRVFvdState *s = bs-opaque; + +QDEBUG(JOURNAL: bjnl_write_buf acb%llu-%p\n, acb-uuid, acb); + +if (!acb-jcb.bitmap_updated) { +return bjnl_write_buf_start(acb); +} + +/* If bitmap_updated, fvd_data need be flushed first before bitmap changes + * can be committed. Otherwise, a host crashes after bitmap metadata are + * updated but before the corresponding data are persisted on disk, the VM + * will get corrupted data, as correct data may be in the base image. */ +acb-jcb.hd_acb = bdrv_aio_flush(s-fvd_data, + bjnl_flush_data_before_update_bitmap_cb, + acb); +if (acb-jcb.hd_acb) { +return 0; +} else { +
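For readers following the sector bookkeeping in bjnl_write_buf_start(), here is a minimal standalone sketch of that one step: reserve nb_sectors at the current journal position, or wrap to sector 0 when the buffer would not fit. The JournalCursor type and journal_reserve() are hypothetical names used only for illustration. In the patch the wrap goes through recycle_journal(), which can fail; this sketch only reports that a wrap happened.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the journal fields of BDRVFvdState. */
typedef struct {
    int64_t journal_size;        /* journal capacity, in sectors */
    int64_t next_journal_sector; /* first unused sector */
} JournalCursor;

/* Reserve nb_sectors in the journal and return the starting sector.
 * *recycled is set to 1 when the journal had to wrap around to sector 0
 * (where the real driver would call recycle_journal()), else 0. */
static int64_t journal_reserve(JournalCursor *j, int nb_sectors, int *recycled)
{
    int64_t start;

    assert(nb_sectors > 0 && nb_sectors <= j->journal_size);

    if (j->next_journal_sector + nb_sectors <= j->journal_size) {
        /* The buffer fits in the remaining space. */
        start = j->next_journal_sector;
        j->next_journal_sector += nb_sectors;
        *recycled = 0;
    } else {
        /* Not enough room left: start over from the beginning. */
        start = 0;
        j->next_journal_sector = nb_sectors;
        *recycled = 1;
    }
    return start;
}
```

With a 10-sector journal, three 4-sector reservations land at sectors 0, 4 and then 0 again, the last one after a wrap.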
[Qemu-devel] [PATCH 17/26] FVD: add impl of bdrv_flush() and bdrv_aio_flush()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds FVD's implementation of the bdrv_flush() and bdrv_aio_flush() interfaces. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-flush.c | 176 +- block/fvd-journal-buf.c | 218 +++ 2 files changed, 390 insertions(+), 4 deletions(-) diff --git a/block/fvd-flush.c b/block/fvd-flush.c index 34bd5cb..6658d27 100644 --- a/block/fvd-flush.c +++ b/block/fvd-flush.c @@ -1,5 +1,5 @@ /* - * QEMU Fast Virtual Disk Format bdrv_flush() and bdrv_aio_flush() + * QEMU Fast Virtual Disk Format Misc Functions of BlockDriver Interface * * Copyright IBM, Corp. 2010 * @@ -11,14 +11,182 @@ * */ +static void aio_wrapper_bh(void *opaque); +static int bjnl_sync_flush(BlockDriverState * bs); +static bool bjnl_clean_buf_on_aio_flush(BlockDriverState *bs, + BlockDriverCompletionFunc * cb, + void *opaque, BlockDriverAIOCB **p_acb); +static BlockDriverAIOCB *fvd_aio_flush_start(BlockDriverState * bs, + BlockDriverCompletionFunc * cb, + void *opaque, FvdAIOCB *parent_acb); + +static int fvd_flush(BlockDriverState * bs) +{ +BDRVFvdState *s = bs-opaque; +int ret; + +QDEBUG(fvd_flush() invoked\n); + +if (s-metadata_err_prohibit_write) { +return -EIO; +} + +if (!s-fvd_metadata-enable_write_cache) { +/* No need to flush since it uses O_DSYNC. */ +return 0; +} + +if (s-use_bjnl) { +return bjnl_sync_flush(bs); +} + +/* Simply flush for unbuffered journal update. */ +if ((ret = bdrv_flush(s-fvd_data))) { +return ret; +} +if (s-fvd_metadata == s-fvd_data) { +return 0; +} +return bdrv_flush(s-fvd_metadata); +} + static BlockDriverAIOCB *fvd_aio_flush(BlockDriverState * bs, BlockDriverCompletionFunc * cb, void *opaque) { -return NULL; +BDRVFvdState *s = bs-opaque; +BlockDriverAIOCB * pacb; +FvdAIOCB *acb; + +QDEBUG(fvd_aio_flush() invoked\n); + +if (s-metadata_err_prohibit_write) { +return NULL; +} + +if (!s-fvd_data-enable_write_cache) { +/* Need to flush since it uses O_DSYNC. 
Use a QEMUBH to invoke the + * callback. */ + +if (!(acb = my_qemu_aio_get(fvd_aio_pool, bs, cb, opaque))) { +return NULL; +} + +acb-type = OP_WRAPPER; +acb-cancel_in_progress = false; +acb-wrapper.bh = qemu_bh_new(aio_wrapper_bh, acb); +qemu_bh_schedule(acb-wrapper.bh); +return acb-common; +} + +if (!s-use_bjnl) { +QDEBUG(FLUSH: start now for unbuffered journal update); +return fvd_aio_flush_start(bs, cb, opaque, NULL); +} + +if (bjnl_clean_buf_on_aio_flush(bs, cb, opaque, pacb)) { +/* Waiting for the journal buffer to be cleaned first. */ +return pacb; +} + +/* No buffered journal data. Start flush now. */ +QDEBUG(FLUSH: start now as no buffered journal data); +return fvd_aio_flush_start(bs, cb, opaque, NULL); +} + +static inline void finish_flush(FvdAIOCB * acb) +{ +QDEBUG(FLUSH: acb%llu-%p finish_flush ret=%d\n, + acb-uuid, acb, acb-flush.ret); +acb-common.cb(acb-common.opaque, acb-flush.ret); +my_qemu_aio_release(acb); } -static int fvd_flush(BlockDriverState * bs) +static void flush_data_cb(void *opaque, int ret) { -return -ENOTSUP; +FvdAIOCB *acb = opaque; + +if (acb-cancel_in_progress) { +return; +} + +QDEBUG(FLUSH: acb%llu-%p flush_data_cb ret=%d\n, acb-uuid, acb, ret); + +if (acb-flush.ret == 0) { +acb-flush.ret = ret; +} + +acb-flush.data_acb = NULL; +acb-flush.num_finished++; +if (acb-flush.num_finished == 2) { +finish_flush(acb); +} +} + +static void flush_metadata_cb(void *opaque, int ret) +{ +FvdAIOCB *acb = opaque; + +if (acb-cancel_in_progress) { +return; +} + +QDEBUG(FLUSH: acb%llu-%p flush_metadata_cb ret=%d\n, + acb-uuid, acb, ret); + +if (acb-flush.ret == 0) { +acb-flush.ret = ret; +} + +acb-flush.metadata_acb = NULL; +acb-flush.num_finished++; +if (acb-flush.num_finished == 2) { +finish_flush(acb); +} +} + +static BlockDriverAIOCB *fvd_aio_flush_start(BlockDriverState * bs, + BlockDriverCompletionFunc * cb, + void *opaque, FvdAIOCB *parent_acb) +{ +BDRVFvdState *s = bs-opaque; +FvdAIOCB *acb; + +if (s-fvd_data == s-fvd_metadata) { +if 
(parent_acb) { +QDEBUG(FLUSH: acb%llu-%p started.\n,parent_acb-uuid,parent_acb); +} +return
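The completion logic shared by flush_data_cb() and flush_metadata_cb() is a small join: each child records the first error it sees, bumps a counter, and the last one to finish invokes the caller's callback. A minimal sketch of that pattern follows, outside of QEMU. FlushJoin, flush_join_init(), flush_join_child_done() and record_result() are hypothetical names, not part of the patch, and cancellation handling is left out.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*completion_fn)(void *opaque, int ret);

/* Hypothetical join state; the patch keeps the equivalent fields in
 * acb->flush (ret, num_finished) and acb->common (cb, opaque). */
typedef struct {
    int ret;           /* first nonzero result reported by any child */
    int num_finished;  /* children that have completed so far */
    int num_expected;  /* 2 in the patch: data flush + metadata flush */
    completion_fn cb;
    void *opaque;
} FlushJoin;

static void flush_join_init(FlushJoin *j, int num_expected,
                            completion_fn cb, void *opaque)
{
    j->ret = 0;
    j->num_finished = 0;
    j->num_expected = num_expected;
    j->cb = cb;
    j->opaque = opaque;
}

/* Called once per child completion. Keeps the first error and fires the
 * caller's callback exactly once, when the last child reports in. */
static void flush_join_child_done(FlushJoin *j, int ret)
{
    if (j->ret == 0) {
        j->ret = ret;
    }
    j->num_finished++;
    if (j->num_finished == j->num_expected) {
        j->cb(j->opaque, j->ret);
    }
}

/* Example callback: stores the final result in an int owned by the caller. */
static void record_result(void *opaque, int ret)
{
    *(int *)opaque = ret;
}
```

The callback is not invoked after the first child completes; it runs only once both have reported, and it receives the first error if either child failed.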
[Qemu-devel] [PATCH 23/26] FVD: add impl of interface bdrv_is_allocated()
This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_is_allocated() interface.

Signed-off-by: Chunqiang Tang <ct...@us.ibm.com>
---
 block/fvd-misc.c |   67 ++
 1 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index 63ed168..766b62b 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -169,6 +169,73 @@ static int fvd_probe(const uint8_t * buf, int buf_size, const char *filename)
 static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num,
                             int nb_sectors, int *pnum)
 {
+    BDRVFvdState *s = bs->opaque;
+
+    if (s->prefetch_state == PREFETCH_STATE_FINISHED ||
+        sector_num >= s->base_img_sectors ||
+        !fresh_bitmap_show_sector_in_base_img(sector_num, s)) {
+        /* For the three cases that data may be saved in the FVD data file, we
+         * still need to check the underlying storage because those data could
+         * be holes in a sparse image, due to the optimization of free write
+         * to zero-filled blocks. See Section 3.3.3 of the FVD-cow paper.
+         * This also covers the case of no base image. */
+
+        if (!s->table) {
+            return bdrv_is_allocated(s->fvd_data, s->data_offset + sector_num,
+                                     nb_sectors, pnum);
+        }
+
+        /* Use the table to figure it out. */
+        int64_t first_chunk = sector_num / s->chunk_size;
+        int64_t last_chunk = (sector_num + nb_sectors - 1) / s->chunk_size;
+        int allocated = !IS_EMPTY(s->table[first_chunk]);
+        int count;
+
+        if (first_chunk == last_chunk) {
+            /* All data in one chunk. */
+            *pnum = nb_sectors;
+            return allocated;
+        }
+
+        /* Data in the first chunk. */
+        count = s->chunk_size - (sector_num % s->chunk_size);
+
+        /* Full chunks. */
+        first_chunk++;
+        while (first_chunk < last_chunk) {
+            if ((allocated && IS_EMPTY(s->table[first_chunk]))
+                || (!allocated && !IS_EMPTY(s->table[first_chunk]))) {
+                *pnum = count;
+                return allocated;
+            }
+
+            count += s->chunk_size;
+            first_chunk++;
+        }
+
+        /* Data in the last chunk. */
+        if ((allocated && !IS_EMPTY(s->table[last_chunk]))
+            || (!allocated && IS_EMPTY(s->table[last_chunk]))) {
+            int nb = (sector_num + nb_sectors) % s->chunk_size;
+            count += nb ? nb : s->chunk_size;
+        }
+
+        *pnum = count;
+        return allocated;
+    }
+
+    /* Use the FVD metadata to find out sectors in the base image. */
+    int64_t end = sector_num + nb_sectors;
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+
+    int64_t next = sector_num + 1;
+    while (next < end && fresh_bitmap_show_sector_in_base_img(next, s)) {
+        next++;
+    }
+
+    *pnum = next - sector_num;
     return 0;
 }
 
-- 
1.7.0.4
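The chunk walk in fvd_is_allocated() can be hard to follow inside the diff, so here is a minimal standalone sketch of the same idea: report the allocation state of the first sector and count how many following sectors share it. It is not the patch's code. chunk_is_mapped() and run_is_allocated() are hypothetical names, the table is reduced to one byte per chunk (the driver tests entries with IS_EMPTY()), and the first, middle and last chunk cases are folded into one loop.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for FVD's lookup table: nonzero means the chunk
 * has storage allocated. */
static int chunk_is_mapped(const uint8_t *table, int64_t chunk)
{
    return table[chunk] != 0;
}

/* Return whether the sector at sector_num is allocated, and store in *pnum
 * how many sectors starting there (at most nb_sectors) share that state.
 * chunk_size is in sectors. The table must cover the requested range. */
static int run_is_allocated(const uint8_t *table, int64_t chunk_size,
                            int64_t sector_num, int nb_sectors, int *pnum)
{
    assert(chunk_size > 0 && nb_sectors > 0);

    int64_t end = sector_num + nb_sectors;
    int64_t chunk = sector_num / chunk_size;
    int allocated = chunk_is_mapped(table, chunk);

    /* The run reaches at least the end of the first chunk; extend it one
     * chunk at a time while the next chunk has the same state. */
    int64_t run_end = (chunk + 1) * chunk_size;
    while (run_end < end && chunk_is_mapped(table, chunk + 1) == allocated) {
        chunk++;
        run_end += chunk_size;
    }
    if (run_end > end) {
        run_end = end;
    }

    *pnum = (int)(run_end - sector_num);
    return allocated;
}
```

With 8-sector chunks and a table of {mapped, mapped, empty, mapped}, a query for 20 sectors starting at sector 3 reports "allocated" with a run of 13 sectors, which is where the empty third chunk begins.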
[Qemu-devel] [PATCH 03/26] FVD: add fully automated test-qcow2.sh
This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

test-qcow2.sh drives 'qemu-io --auto' to perform fully automated testing
for QCOW2.

Signed-off-by: Chunqiang Tang <ct...@us.ibm.com>
---
 test-qcow2.sh |   89 +
 1 files changed, 89 insertions(+), 0 deletions(-)
 create mode 100755 test-qcow2.sh

diff --git a/test-qcow2.sh b/test-qcow2.sh
new file mode 100755
index 000..d1e4dc0
--- /dev/null
+++ b/test-qcow2.sh
@@ -0,0 +1,89 @@
+#!/bin/bash
+
+# Drive 'qemu-io --auto' to test the QCOW2 image format.
+#
+# Copyright IBM, Corp. 2010
+#
+# Authors:
+#    Chunqiang Tang <ct...@us.ibm.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+if [ "$USER" != "root" ]; then
+    echo "This command must be run by root in order to mount tmpfs."
+    exit 1
+fi
+
+QEMU_DIR=.
+QEMU_IMG=$QEMU_DIR/qemu-img
+QEMU_IO=$QEMU_DIR/qemu-io
+
+if [ ! -e $QEMU_IMG ]; then
+    echo "$QEMU_IMG does not exist."
+    exit 1;
+fi
+
+if [ ! -e $QEMU_IO ]; then
+    echo "$QEMU_IO does not exist."
+    exit 1;
+fi
+
+DATA_DIR=/var/ramdisk
+TRUTH_IMG=$DATA_DIR/truth.raw
+TEST_IMG=$DATA_DIR/test.qcow2
+TEST_BASE=$DATA_DIR/zero-500M.raw
+CMD_LOG=./test-qcow2.log
+
+parallel=100
+round=1
+fail_prob=0.1
+cancel_prob=0
+instant_qemubh=true
+seed=$RANDOM$RANDOM
+count=0
+
+function invoke() {
+    echo "$*" >> $CMD_LOG
+    $*
+    ret=$?
+    if [ $ret -ne 0 ]; then
+        echo "Exit with error code $ret: $*"
+        exit $ret
+    fi
+}
+
+mount | grep $DATA_DIR > /dev/null
+if [ $? -ne 0 ]; then
+    echo "Create tmpfs at $DATA_DIR to store testing images."
+    if [ ! -e $DATA_DIR ]; then mkdir -p $DATA_DIR ; fi
+    invoke mount -t tmpfs none $DATA_DIR -o size=4G
+    if [ $? -ne 0 ]; then exit 1; fi
+fi
+
+/bin/rm -f $CMD_LOG $DATA_DIR/*
+touch $CMD_LOG
+
+while [ -t ]; do
+for cache in none writethrough writeback; do
+for cluster_size in 65536 ; do
+for io_size in 1048576 ; do
+    count=$[$count + 1]
+    echo "Round $count" >> $CMD_LOG
+
+    # QCOW2 image is about 1G
+    img_size=$[(1073741824 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 512]
+
+    # base image is about 500MB
+    base_size=$[(536870912 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 512]
+
+    invoke /bin/rm -rf $TRUTH_IMG $TEST_IMG $TEST_BASE
+    invoke $QEMU_IO --auto --create=$TEST_BASE --seed=$seed --block_size=1048576 --empty_block_prob=0 --empty_block_chain=1 --file_size=$base_size
+    invoke cp --sparse=always $TEST_BASE $TRUTH_IMG
+    invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size
+    invoke $QEMU_IMG create -f qcow2 -ocluster_size=$cluster_size,backing_fmt=blksim -b $TEST_BASE $TEST_IMG $img_size
+
+    invoke $QEMU_IO --auto --cache=$cache --seed=$seed --truth=$TRUTH_IMG --format=qcow2 --test=blksim:$TEST_IMG --verify_write=true --compare_before=false --compare_after=true --round=$round --parallel=$parallel --io_size=$io_size --fail_prob=$fail_prob --cancel_prob=$cancel_prob --instant_qemubh=$instant_qemubh
+
+    seed=$[$seed + 1]
+done; done; done; done
-- 
1.7.0.4
[Qemu-devel] [PATCH 26/26] FVD: add fully automated test-fvd.sh
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. test-fvd.sh drives 'qemu-io --auto' to perform fully automated testing for FVD. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- test-fvd.sh | 161 +++ 1 files changed, 161 insertions(+), 0 deletions(-) create mode 100755 test-fvd.sh diff --git a/test-fvd.sh b/test-fvd.sh new file mode 100755 index 000..3d67c3f --- /dev/null +++ b/test-fvd.sh @@ -0,0 +1,161 @@ +#!/bin/bash + +# Drive 'qemu-io --auto' to test the FVD image format. +# +# Copyright IBM, Corp. 2010 +# +# Authors: +# Chunqiang Tang ct...@us.ibm.com +# +# This work is licensed under the terms of the GNU GPL, version 2 or later. +# See the COPYING.LIB file in the top-level directory. + +if [ $USER != root ]; then +echo This command must be run by root in order to mount tmpfs. +exit 1 +fi + +QEMU_DIR=. +QEMU_IMG=$QEMU_DIR/qemu-img +QEMU_IO=$QEMU_DIR/qemu-io + +if [ ! -e $QEMU_IMG ]; then +echo $QEMU_IMG does not exist. +exit 1; +fi + +if [ ! -e $QEMU_IO ]; then +echo $QEMU_IO does not exist. +exit 1; +fi + +DATA_DIR=/var/ramdisk +TRUTH_IMG=$DATA_DIR/truth.raw +TEST_IMG=$DATA_DIR/test.fvd +TEST_BASE=$DATA_DIR/zero-500M.raw +TEST_IMG_DATA=$DATA_DIR/test.dat +CMD_LOG=./test-fvd.log + +G1=1073741824 +MAX_MEM=536870912 +MAX_ROUND=100 +MAX_IO_SIZE=1 +fail_prob=0.1 +cancel_prob=0.1 +flush_prob_base=0.05 +aio_flush_prob_base=0.1 +seed=$RANDOM$RANDOM +count=0 + +function invoke() { +echo $* $CMD_LOG +sync +$* +ret=$? +if [ $ret -ne 0 ]; then +echo $Exit with error code $ret: $* +exit $ret +fi +} + +mount | grep $DATA_DIR /dev/null +if [ $? -ne 0 ]; then +echo Create tmpfs at $DATA_DIR to store testing images. +if [ ! -e $DATA_DIR ]; then mkdir -p $DATA_DIR ; fi +invoke mount -t tmpfs none $DATA_DIR -o size=4G +if [ $? 
-ne 0 ]; then exit 1; fi +fi + +/bin/rm -f $CMD_LOG $DATA_DIR/* +touch $CMD_LOG + +while [ -t ]; do +for block_size in 7680 512 1024 15872 65536 65024 1048576 1048064; do +for chunk_mult in 5 1 2 3 7 9 12 16 33 99 ; do +for cache in writeback writethrough ; do +#for compact_image in on off ; do +for compact_image in on ; do +for prefetch_delay in 1 0; do +for copy_on_read in on off; do +for base_img in -b $TEST_BASE ; do +chunk_size=$[$block_size * $chunk_mult] +large_io_size=$[$chunk_size * 5] +if [ $large_io_size -gt $MAX_IO_SIZE ]; then large_io_size=$MAX_IO_SIZE; fi +for io_size in $large_io_size 1048576 ; do +for use_data_file in data_file=$TEST_IMG_DATA, ; do + +if [ cache == writethrough ]; then +JOURNAL_BUF_SIZE=0 +JOURNAL_CLEAN_BUF_PERIOD=0 +else +JOURNAL_BUF_SIZE=512 1024 65536 +JOURNAL_CLEAN_BUF_PERIOD=5000 1000 6 +fi + +for journal_buf_size in $JOURNAL_BUF_SIZE ; do +for journal_clean_buf_period in $JOURNAL_CLEAN_BUF_PERIOD ; do +/bin/rm -rf /tmp/fvd.log* + +# FVD image is about 1G +img_size=$[(1073741824 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 512] + +# base image is about 500MB +base_size=$[(536870912 + ($RANDOM$RANDOM$RANDOM % 104857600)) / 512 * 512] + +count=$[$count + 1] +echo Round $count $CMD_LOG + +invoke /bin/rm -rf $TRUTH_IMG $TEST_IMG $TEST_BASE $TEST_IMG_DATA + +if [ -z $base_img ]; then +# Use zero-filled empty images. +invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size +else +# Use images with random contents. +invoke $QEMU_IO --auto --create=$TEST_BASE --seed=$seed --block_size=$block_size --empty_block_prob=0.2 --empty_block_chain=10 --file_size=$base_size +invoke cp --sparse=always $TEST_BASE $TRUTH_IMG +invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size +fi + +if [ ! -z $use_data_file ]; then invoke touch $TEST_IMG_DATA; fi + +# Ensure the journal is large enough to hold at least one write. 
+mixed_records_per_journal_sector=119 +if [ cache == writethrough ]; then +journal_size_factor=1000 +else +journal_size_factor=100 +fi +journal_size=$[$io_size / $chunk_size ) + 1 ) / $mixed_records_per_journal_sector ) + 1) * 512 * (1 + $RANDOM$RANDOM % $journal_size_factor) ] + +invoke $QEMU_IMG create -f fvd $base_img -ojournal_buf_size=$journal_buf_size,journal_clean_buf_period=$journal_clean_buf_period,${use_data_file}data_file_fmt=blksim,backing_fmt=blksim,compact_image=$compact_image,copy_on_read=$copy_on_read,block_size=$block_size,chunk_size=$chunk_size,journal_size=$journal_size,prefetch_start_delay=$prefetch_delay $TEST_IMG $img_size +invoke $QEMU_IMG update -oinit_data_region=on $TEST_IMG +if [ $prefetch_delay -eq 1 ]; then
[Qemu-devel] [PATCH 21/26] FVD: add impl of interface bdrv_close()
This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_close() interface.

Signed-off-by: Chunqiang Tang <ct...@us.ibm.com>
---
 block/fvd-misc.c |   78 ++
 1 files changed, 78 insertions(+), 0 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index c515d74..63ed168 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -81,6 +81,84 @@ static void fvd_aio_cancel(BlockDriverAIOCB * blockacb)
 
 static void fvd_close(BlockDriverState * bs)
 {
+    BDRVFvdState *s = bs->opaque;
+    FvdAIOCB *acb;
+    int i;
+
+    if (s->prefetch_state == PREFETCH_STATE_RUNNING) {
+        s->prefetch_state = PREFETCH_STATE_DISABLED;
+    }
+    if (s->prefetch_timer) {
+        qemu_del_timer(s->prefetch_timer);
+        qemu_free_timer(s->prefetch_timer);
+        s->prefetch_timer = NULL;
+    }
+
+    if (s->prefetch_acb) {
+        /* Clean up prefetch operations. */
+        for (i = 0; i < s->num_prefetch_slots; i++) {
+            if (s->prefetch_acb[i] != NULL) {
+                fvd_aio_cancel_copy(s->prefetch_acb[i]);
+                s->prefetch_acb[i] = NULL;
+            }
+        }
+        my_qemu_free(s->prefetch_acb);
+        s->prefetch_acb = NULL;
+    }
+
+    if (s->use_bjnl) {
+        /* Clean up buffered journal update. */
+        bjnl_sync_flush(bs);
+        if (s->bjnl.timer_scheduled) {
+            qemu_del_timer(s->bjnl.clean_buf_timer);
+        }
+        qemu_free_timer(s->bjnl.clean_buf_timer);
+    }
+
+    /* Clean up unfinished copy_on_read operations. */
+    QLIST_FOREACH(acb, &s->copy_locks, copy_lock.next) {
+        fvd_aio_cancel_copy(acb);
+    }
+
+    flush_metadata_to_disk_on_exit(bs);
+
+    if (s->stale_bitmap) {
+        my_qemu_vfree(s->stale_bitmap);
+        if (s->fresh_bitmap != s->stale_bitmap) {
+            my_qemu_vfree(s->fresh_bitmap);
+        }
+        s->stale_bitmap = NULL;
+        s->fresh_bitmap = NULL;
+    }
+
+    if (s->table) {
+        my_qemu_vfree(s->table);
+        s->table = NULL;
+    }
+
+    if (s->fvd_metadata) {
+        if (s->fvd_metadata != s->fvd_data) {
+            bdrv_delete(s->fvd_metadata);
+        }
+        s->fvd_metadata = NULL;
+    }
+    if (s->fvd_data) {
+        bdrv_delete(s->fvd_data);
+        s->fvd_data = NULL;
+    }
+
+    if (s->add_storage_cmd) {
+        my_qemu_free(s->add_storage_cmd);
+        s->add_storage_cmd = NULL;
+    }
+
+    if (s->leaked_chunks) {
+        my_qemu_free(s->leaked_chunks);
+        s->leaked_chunks = NULL;
+    }
+#ifdef FVD_DEBUG
+    dump_resource_summary(s);
+#endif
 }
 
 static int fvd_probe(const uint8_t * buf, int buf_size, const char *filename)
-- 
1.7.0.4
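One detail of fvd_close() worth spelling out is that stale_bitmap and fresh_bitmap may point at the same buffer, so the cleanup has to avoid freeing it twice. A minimal sketch of that aliasing check, using plain malloc()/free() in place of QEMU's allocators, is below. The Bitmaps struct and release_bitmaps() are hypothetical names for illustration only. Unlike the patch, the sketch compares the two pointers before either is freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reduction of the two bitmap pointers kept in BDRVFvdState.
 * They may refer to the same buffer or to two separate buffers. */
typedef struct {
    unsigned char *stale_bitmap;
    unsigned char *fresh_bitmap;
} Bitmaps;

/* Free both bitmaps without double-freeing a shared buffer.
 * Returns the number of buffers actually released. */
static int release_bitmaps(Bitmaps *b)
{
    int freed = 0;

    /* Free fresh_bitmap only when it is a distinct allocation. */
    if (b->fresh_bitmap && b->fresh_bitmap != b->stale_bitmap) {
        free(b->fresh_bitmap);
        freed++;
    }
    if (b->stale_bitmap) {
        free(b->stale_bitmap);
        freed++;
    }

    /* Clear both pointers so that a second call is harmless. */
    b->stale_bitmap = NULL;
    b->fresh_bitmap = NULL;
    return freed;
}
```

When both fields alias one buffer the function releases one allocation; when they are separate it releases two.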
[Qemu-devel] [PATCH 22/26] FVD: add impl of interface bdrv_update()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds FVD's implementation of the bdrv_update() interface. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-update.c | 274 +++- 1 files changed, 272 insertions(+), 2 deletions(-) diff --git a/block/fvd-update.c b/block/fvd-update.c index 2498618..4ef4969 100644 --- a/block/fvd-update.c +++ b/block/fvd-update.c @@ -1,5 +1,5 @@ /* - * QEMU Fast Virtual Disk Format bdrv_update + * QEMU Fast Virtual Disk Format Misc Functions of BlockDriver Interface * * Copyright IBM, Corp. 2010 * @@ -13,9 +13,279 @@ static int fvd_update(BlockDriverState * bs, QEMUOptionParameter * options) { -return -ENOTSUP; +BDRVFvdState *s = bs-opaque; +FvdHeader header; +int ret; + +read_fvd_header(s, header); + +while (options options-name) { +if (!strcmp(options-name, BLOCK_OPT_SIZE)) { +if (header.table_offset 0) { +fprintf(stderr, Cannot resize a compact FVD image.\n); +return -EINVAL; +} +if (options-value.n header.virtual_disk_size) { +printf(Warning: image's new size % PRId64 +is smaller than the original size % PRId64 + . 
Some image data will be truncated.\n, + options-value.n, header.virtual_disk_size); +} +header.virtual_disk_size = options-value.n; +printf(Image resized to % PRId64 bytes.\n, options-value.n); +} else if (!strcmp(options-name, BLOCK_OPT_BACKING_FILE)) { +if (strlen(options-value.s) 1023) { +fprintf(stderr, Error: the new base image name is longer +than 1023, which is not allowed.\n); +return -EINVAL; +} +memset(header.base_img, 0, 1024); +pstrcpy(header.base_img, 1024, options-value.s); +printf(Backing file updated to '%s'.\n, options-value.s); +} else if (!strcmp(options-name, data_file)) { +if (strlen(options-value.s) 1023) { +fprintf(stderr, Error: the new data file name is longer +than 1023, which is not allowed.\n); +return -EINVAL; +} + +memset(header.data_file, 0, 1024); +pstrcpy(header.data_file, 1024, options-value.s); +printf(Data file updated to '%s'.\n, options-value.s); +} else if (!strcmp(options-name, need_zero_init)) { +header.need_zero_init = options-value.n; +if (header.need_zero_init) { +printf(need_zero_init is turned on.\n); +} else { +printf(need_zero_init is turned off.\n); +} +} else if (!strcmp(options-name, copy_on_read)) { +header.copy_on_read = options-value.n; +if (header.copy_on_read) { +printf(Copy on read is enabled for this disk.\n); +} else { +printf(Copy on read is disabled for this disk.\n); +} +} else if (!strcmp(options-name, clean_shutdown)) { +header.clean_shutdown = options-value.n; +if (header.clean_shutdown) { +printf(clean_shutdown is manually set to true\n); +} else { +printf(clean_shutdown is manually set to false\n); +} +} else if (!strcmp(options-name, journal_buf_size)) { +header.journal_buf_size = options-value.n; +printf(journal_buf_size is updated to %PRIu64 bytes.\n, + header.journal_buf_size); +} else if (!strcmp(options-name, journal_clean_buf_period)) { +header.journal_clean_buf_period = options-value.n; +printf(journal_clean_buf_period is updated to %PRIu64 +milliseconds.\n, + 
header.journal_clean_buf_period); +} else if (!strcmp(options-name,max_outstanding_copy_on_read_data)) { +header.max_outstanding_copy_on_read_data = options-value.n; +if (header.max_outstanding_copy_on_read_data = 0) { +fprintf(stderr, Error: max_outstanding_copy_on_read_data +must be positive.\n); +return -EINVAL; +} +printf(max_outstanding_copy_on_read_data updated to % PRId64 + .\n, header.max_outstanding_copy_on_read_data); +} else if (!strcmp(options-name, init_data_region)) { +if (options-value.n !s-data_region_prepared) { +init_data_region(s); +} +} else if (!strcmp(options-name, prefetch_start_delay)) { +if (options-value.n = 0) { +header.prefetch_start_delay = -1; +} else { +header.prefetch_start_delay =
[Qemu-devel] [PATCH 15/26] FVD: add basic journal functionality
This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the basic journal functionality to FVD. The journal
provides several benefits. First, updating both the bitmap and the lookup
table requires only a single write to the journal. Second, K concurrent
updates to any portions of the bitmap or the lookup table are converted to
K sequential writes in the journal, which can be merged into a single write
by the host Linux kernel. Third, it increases concurrency by avoiding
locking the bitmap or the lookup table. For example, updating one bit in
the bitmap requires writing a 512-byte sector to the on-disk bitmap. This
bitmap sector covers a total of 512*8*64K=256MB of data, and any two writes
to that same bitmap sector cannot proceed concurrently. The journal solves
this problem and eliminates unnecessary locking.

Signed-off-by: Chunqiang Tang <ct...@us.ibm.com>
---
 block.c                 |    2 +-
 block/fvd-bitmap.c      |   57
 block/fvd-journal-buf.c |   34 ++
 block/fvd-journal.c     |  814 ++-
 block/fvd-write.c       |    1 +
 block/fvd.c             |   19 ++
 6 files changed, 920 insertions(+), 7 deletions(-)
 create mode 100644 block/fvd-journal-buf.c

diff --git a/block.c b/block.c
index f7d91a2..8b3083d 100644
--- a/block.c
+++ b/block.c
@@ -58,7 +58,7 @@ static int bdrv_read_em(BlockDriverState *bs, int64_t sector_num,
 static int bdrv_write_em(BlockDriverState *bs, int64_t sector_num,
                          const uint8_t *buf, int nb_sectors);
 
-static QTAILQ_HEAD(, BlockDriverState) bdrv_states =
+QTAILQ_HEAD(, BlockDriverState) bdrv_states =
     QTAILQ_HEAD_INITIALIZER(bdrv_states);
 
 static QLIST_HEAD(, BlockDriver) bdrv_drivers =
diff --git a/block/fvd-bitmap.c b/block/fvd-bitmap.c
index 30e4a4b..06d7912 100644
--- a/block/fvd-bitmap.c
+++ b/block/fvd-bitmap.c
@@ -66,6 +66,63 @@ static inline void update_fresh_bitmap(int64_t sector_num, int nb_sectors,
     }
 }
 
+static void update_stale_bitmap(BDRVFvdState * s, int64_t sector_num,
+                                int nb_sectors)
+{
+    if (sector_num >= s->base_img_sectors) {
+        return;
+    }
+
+    int64_t end = sector_num + nb_sectors;
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+
+    int64_t block_num = sector_num / s->block_size;
+    const int64_t block_end = (end - 1) / s->block_size;
+
+    for (; block_num <= block_end; block_num++) {
+        int64_t bitmap_byte_offset = block_num / 8;
+        uint8_t bitmap_bit_offset = block_num % 8;
+        uint8_t mask = (uint8_t) (0x01 << bitmap_bit_offset);
+        uint8_t b = s->stale_bitmap[bitmap_byte_offset];
+        if (!(b & mask)) {
+            ASSERT(s->stale_bitmap == s->fresh_bitmap ||
+                   (s->fresh_bitmap[bitmap_byte_offset] & mask));
+            b |= mask;
+            s->stale_bitmap[bitmap_byte_offset] = b;
+        }
+    }
+}
+
+static void update_both_bitmaps(BDRVFvdState * s, int64_t sector_num,
+                                int nb_sectors)
+{
+    if (sector_num >= s->base_img_sectors) {
+        return;
+    }
+
+    int64_t end = sector_num + nb_sectors;
+    if (end > s->base_img_sectors) {
+        end = s->base_img_sectors;
+    }
+
+    int64_t block_num = sector_num / s->block_size;
+    const int64_t block_end = (end - 1) / s->block_size;
+
+    for (; block_num <= block_end; block_num++) {
+        int64_t bitmap_byte_offset = block_num / 8;
+        uint8_t bitmap_bit_offset = block_num % 8;
+        uint8_t mask = (uint8_t) (0x01 << bitmap_bit_offset);
+        uint8_t b = s->fresh_bitmap[bitmap_byte_offset];
+        if (!(b & mask)) {
+            b |= mask;
+            s->fresh_bitmap[bitmap_byte_offset] =
+                s->stale_bitmap[bitmap_byte_offset] = b;
+        }
+    }
+}
+
 static inline bool bitmap_show_sector_in_base_img(int64_t sector_num,
                                                   const BDRVFvdState * s,
                                                   int bitmap_offset,
diff --git a/block/fvd-journal-buf.c b/block/fvd-journal-buf.c
new file mode 100644
index 000..3efdd47
--- /dev/null
+++ b/block/fvd-journal-buf.c
@@ -0,0 +1,34 @@
+/*
+ * QEMU Fast Virtual Disk Format Metadata Journal
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *    Chunqiang Tang <ct...@us.ibm.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+/*=
+ * There are two different ways of writing metadata changes to the journal:
+ * immediate write or buffered write. If cache=writethrough, metadata changes
+ * are written to the journal immediately. If cache!=writethrough, metadata
+ * changes are buffered in memory and later written to the journal either
+ * triggered by bdrv_aio_flush() or by a timeout. This module implements the
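To make the 512*8*64K=256MB figure from the commit message concrete, and to show the bit arithmetic used by the bitmap update helpers in this patch, here is a small standalone sketch. bytes_covered_per_bitmap_sector() and mark_blocks() are hypothetical names, the 64K block size is the one assumed in the commit message, and the clamping against base_img_sectors done in the patch is left out.

```c
#include <assert.h>
#include <stdint.h>

enum { SECTOR_BYTES = 512 };

/* Bytes of disk data tracked by one 512-byte bitmap sector, when each bit
 * stands for one block of block_bytes bytes. */
static int64_t bytes_covered_per_bitmap_sector(int64_t block_bytes)
{
    return (int64_t)SECTOR_BYTES * 8 * block_bytes;
}

/* Set the bit of every block touched by the sector range
 * [sector_num, sector_num + nb_sectors). block_size is in sectors,
 * matching how the patch divides sector_num by s->block_size. */
static void mark_blocks(uint8_t *bitmap, int64_t block_size,
                        int64_t sector_num, int nb_sectors)
{
    assert(block_size > 0 && nb_sectors > 0);

    int64_t block = sector_num / block_size;
    int64_t block_end = (sector_num + nb_sectors - 1) / block_size;

    for (; block <= block_end; block++) {
        /* Byte index is block / 8, bit index within the byte is block % 8. */
        bitmap[block / 8] |= (uint8_t)(1u << (block % 8));
    }
}
```

For 64 KB blocks one bitmap sector holds 4096 bits and therefore covers 4096 * 64 KB = 256 MB. This is why, without the journal, two writes 200 MB apart can still contend for the same on-disk bitmap sector.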
[Qemu-devel] [PATCH 14/26] FVD: add impl of loading data from compact image
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds the implementation of load data from a compact image. This capability is to support fvd_aio_readv() when FVD is configured to use its one-level lookup table to do storage allocation. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-load.c | 448 + block/fvd-utils.c | 40 + 2 files changed, 488 insertions(+), 0 deletions(-) diff --git a/block/fvd-load.c b/block/fvd-load.c index 80ab32c..88e5fb4 100644 --- a/block/fvd-load.c +++ b/block/fvd-load.c @@ -11,10 +11,458 @@ * */ +static void load_data_from_compact_image_cb(void *opaque, int ret); +static BlockDriverAIOCB *load_data_from_compact_image(FvdAIOCB *parent_acb, +BlockDriverState * bs, int64_t sector_num, +QEMUIOVector * qiov, int nb_sectors, +BlockDriverCompletionFunc * cb, void *opaque); +static inline FvdAIOCB *init_load_acb(FvdAIOCB * parent_acb, +BlockDriverState * bs, int64_t sector_num, +QEMUIOVector * orig_qiov, int nb_sectors, +BlockDriverCompletionFunc * cb, void *opaque); +static int load_create_child_requests(bool count_only, BDRVFvdState *s, +QEMUIOVector * orig_qiov, int64_t sector_num, +int nb_sectors, int *p_nziov, int *p_niov, int *p_nqiov, +FvdAIOCB *acb, QEMUIOVector *q, struct iovec *v); + static inline BlockDriverAIOCB *load_data(FvdAIOCB * parent_acb, BlockDriverState * bs, int64_t sector_num, QEMUIOVector * orig_qiov, int nb_sectors, BlockDriverCompletionFunc * cb, void *opaque) { +BDRVFvdState *s = bs-opaque; + +if (!s-table) { +/* Load directly since it is not a compact image. 
*/ +return bdrv_aio_readv(s-fvd_data, s-data_offset + sector_num, + orig_qiov, nb_sectors, cb, opaque); +} else { +return load_data_from_compact_image(parent_acb, bs, sector_num, +orig_qiov, nb_sectors, cb, opaque); +} +} + +static BlockDriverAIOCB *load_data_from_compact_image(FvdAIOCB * parent_acb, +BlockDriverState * bs, int64_t sector_num, +QEMUIOVector * orig_qiov, int nb_sectors, +BlockDriverCompletionFunc * cb, void *opaque) +{ +BDRVFvdState *s = bs-opaque; +FvdAIOCB * acb; +int64_t start_sec = -1; +int nziov = 0; +int nqiov = 0; +int niov = 0; +int i; + +/* Count the number of qiov and iov needed to cover the continuous regions + * of the compact image. */ +load_create_child_requests(true/*count_only*/, s, orig_qiov, sector_num, + nb_sectors, nziov, niov, nqiov, NULL, NULL, NULL); + +if (nqiov + nziov == 1) { +/* All data can be read in one qiov. Reuse orig_qiov. */ +if (nziov == 1) { +/* This is a zero-filled region. */ +for (i = 0; i orig_qiov-niov; i++) { +memset(orig_qiov-iov[i].iov_base, + 0, orig_qiov-iov[i].iov_len); +} + +/* Use a bh to invoke the callback. */ +if (!(acb = my_qemu_aio_get(fvd_aio_pool, bs, cb, opaque))) { +return NULL; +} +COPY_UUID(acb, parent_acb); +QDEBUG(LOAD: acb%llu-%p load_fill_all_with_zeros\n, + acb-uuid, acb); +acb-type = OP_WRAPPER; +acb-cancel_in_progress = false; +acb-wrapper.bh = qemu_bh_new(aio_wrapper_bh, acb); +qemu_bh_schedule(acb-wrapper.bh); +return acb-common; +} else { +/* A non-empty region. */ +const uint32_t first_chunk = sector_num / s-chunk_size; +start_sec = READ_TABLE(s-table[first_chunk]) * s-chunk_size + +(sector_num % s-chunk_size); +if (parent_acb) { +QDEBUG(LOAD: acb%llu-%p + load_directly_as_one_continuous_region\n, + parent_acb-uuid, parent_acb); +} +return bdrv_aio_readv(s-fvd_data, s-data_offset + start_sec, + orig_qiov, nb_sectors, cb, opaque); +} +} + +/* Need to submit multiple requests to the lower layer. Initialize acb. 
*/ +if (!(acb = init_load_acb(parent_acb, bs, sector_num, orig_qiov, + nb_sectors, cb, opaque))) { +return NULL; +} +acb-load.num_children = nqiov; + +/* Allocate memory and create multiple requests. */ +acb-load.children = my_qemu_malloc((sizeof(CompactChildCB) + + sizeof(QEMUIOVector)) * nqiov + +
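For readers following the compact-image read path, the address translation done in the "one continuous region" case above can be sketched in isolation. This is a minimal sketch and not the FVD code: demo_translate() and DEMO_EMPTY are hypothetical names, and the real encoding of table entries (hidden behind READ_TABLE()) is not visible in this excerpt. It assumes chunk_size is non-zero and sector_num is non-negative.

```c
#include <stdint.h>

/* Hypothetical marker for a chunk with no storage allocated. */
#define DEMO_EMPTY UINT32_MAX

/*
 * Translate a virtual sector number into a sector offset inside the
 * compact data region, using a one-level lookup table indexed by chunk.
 * Returns -1 when the chunk is unmapped (such a region reads as zeros).
 */
static int64_t demo_translate(const uint32_t *table, uint64_t chunk_size,
                              int64_t sector_num)
{
    uint64_t virt_chunk = (uint64_t)sector_num / chunk_size;
    uint64_t offset_in_chunk = (uint64_t)sector_num % chunk_size;
    uint32_t phys_chunk = table[virt_chunk];

    if (phys_chunk == DEMO_EMPTY) {
        return -1;
    }
    return (int64_t)((uint64_t)phys_chunk * chunk_size + offset_in_chunk);
}
```

A request that crosses into a chunk mapped elsewhere cannot use a single translation like this, which is presumably why the patch counts contiguous regions first and splits the request into child requests.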
[Qemu-devel] [PATCH 13/26] FVD: add impl of storing data in compact image
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds the implementation of storing data in a compact image. This capability is needed for both copy-on-write (see fvd_aio_writev()) and copy-on-read (see fvd_aio_readv()). Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-store.c | 459 + block/fvd-utils.c | 65 2 files changed, 524 insertions(+), 0 deletions(-) diff --git a/block/fvd-store.c b/block/fvd-store.c index 85e45d4..fe670eb 100644 --- a/block/fvd-store.c +++ b/block/fvd-store.c @@ -11,10 +11,469 @@ * */ +static uint32_t allocate_chunk(BlockDriverState * bs); +static inline FvdAIOCB *init_store_acb(int soft_write, +QEMUIOVector * orig_qiov, BlockDriverState * bs, +int64_t sector_num, int nb_sectors, FvdAIOCB * parent_acb, +BlockDriverCompletionFunc * cb, void *opaque); +static BlockDriverAIOCB *store_data_in_compact_image(int soft_write, +struct FvdAIOCB *parent_acb, BlockDriverState * bs, +int64_t sector_num, QEMUIOVector * qiov, int nb_sectors, +BlockDriverCompletionFunc * cb, void *opaque); +static void store_data_in_compact_image_cb(void *opaque, int ret); + static inline BlockDriverAIOCB *store_data(int soft_write, FvdAIOCB * parent_acb, BlockDriverState * bs, int64_t sector_num, QEMUIOVector * orig_qiov, int nb_sectors, BlockDriverCompletionFunc * cb, void *opaque) { +BDRVFvdState *s = bs-opaque; + +TRACE_STORE_IN_FVD(store_data, sector_num, nb_sectors); + +if (!s-table) { +/* Write directly since it is not a compact image. */ +return bdrv_aio_writev(s-fvd_data, s-data_offset + sector_num, + orig_qiov, nb_sectors, cb, opaque); +} else { +return store_data_in_compact_image(soft_write, parent_acb, bs, + sector_num, orig_qiov, nb_sectors, + cb, opaque); +} +} + +/* Store data in the compact image. The argument 'soft_write' means + * the store was caused by copy-on-read or prefetching, which need not + * update metadata immediately. 
*/ +static BlockDriverAIOCB *store_data_in_compact_image(int soft_write, + FvdAIOCB * parent_acb, + BlockDriverState * bs, + int64_t sector_num, + QEMUIOVector * orig_qiov, + const int nb_sectors, + BlockDriverCompletionFunc + * cb, void *opaque) +{ +BDRVFvdState *s = bs-opaque; +FvdAIOCB *acb; +const uint32_t first_chunk = sector_num / s-chunk_size; +const uint32_t last_chunk = (sector_num + nb_sectors - 1) / s-chunk_size; +int table_dirty = false; +uint32_t chunk; +int64_t start_sec; + +/* Check if storag space is allocated. */ +for (chunk = first_chunk; chunk = last_chunk; chunk++) { +if (IS_EMPTY(s-table[chunk])) { +uint32_t id = allocate_chunk(bs); +if (IS_EMPTY(id)) { +return NULL; +} +QDEBUG (STORE: map chunk %u to %u\n, chunk, id); +id |= DIRTY_TABLE; +WRITE_TABLE(s-table[chunk], id); +table_dirty = true; +} else if (IS_DIRTY(s-table[chunk])) { +/* This is possible in several cases. 1) If a previous soft-write + * allocated the storage space but did not flush the table entry + * change to the journal and hence did not clean the dirty bit. 2) + * This is possible if a previous hard-write was canceled before + * it could write the table entry to disk. 3) Finally, this is + * also possible with two concurrent hard-writes. The first + * hard-write allocated the storage space but has not flushed the + * table entry change to the journal yet and hence the table entry + * remains dirty. In this case, the second hard-write will also + * try to flush this dirty table entry to the journal. The outcome + * is correct since they store the same metadata change in the + * journal (although twice). For this race condition, we prefer to + * have two writes to the journal rather than introducing a + * locking mechanism, because this happens rarely and those two + * writes to the journal are likely to be merged by the kernel + * into a single write since they are likely to update + * back-to-back sectors in the journal. A locking
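The allocation loop in this patch boils down to: for each chunk touched by the write, map it if it is unmapped, and keep the table entry flagged as dirty until the journal has recorded it. Below is a minimal sketch under stated assumptions. DEMO_EMPTY, DEMO_DIRTY, and the demo_* functions are hypothetical, the bit layout is made up for illustration, and the bump allocator stands in for allocate_chunk(), whose body is not part of this excerpt.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_EMPTY 0xFFFFFFFFu   /* hypothetical "no storage" marker */
#define DEMO_DIRTY 0x80000000u   /* hypothetical "not yet journaled" flag */

static bool demo_is_empty(uint32_t entry)
{
    return entry == DEMO_EMPTY;
}

static bool demo_is_dirty(uint32_t entry)
{
    return !demo_is_empty(entry) && (entry & DEMO_DIRTY) != 0;
}

/*
 * Make sure every chunk in [first, last] has storage. Newly mapped
 * entries are flagged dirty. Returns true if any entry in the range
 * still needs to be written to the journal, either because it was just
 * mapped or because an earlier write left it dirty.
 */
static bool demo_map_chunks(uint32_t *table, uint32_t first, uint32_t last,
                            uint32_t *next_free)
{
    bool table_dirty = false;
    uint32_t chunk;

    for (chunk = first; chunk <= last; chunk++) {
        if (demo_is_empty(table[chunk])) {
            table[chunk] = (*next_free)++ | DEMO_DIRTY;
            table_dirty = true;
        } else if (demo_is_dirty(table[chunk])) {
            table_dirty = true;
        }
    }
    return table_dirty;
}
```

Treating an already dirty entry as "needs journaling" mirrors the comment in the patch about two concurrent hard-writes both flushing the same entry instead of taking a lock.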
[Qemu-devel] FVD latest patches with your review comments addressed
Hi Andreas, Anthony, Stefan H., and Stefan W.,

I just posted the latest series of FVD patches to the mailing list, which addresses the review comments you previously made on FVD. Thank you for the feedback. Off the mailing list, Stefan Weil provided guidance on porting FVD to win32 and also sent me some patches, which have also been incorporated - many thanks.

This new release addresses the following review comments.

- Formatting issues like white space and empty lines are fixed.
- Code style is made consistent with what is described in CODING_STYLE.
- Non-portable header files are removed.
- Non-portable code is rewritten.
- File header is fixed with copyright and license information, and now includes more descriptive information.
- 'qemu-img update' is modified to use QEMUOptionParameter, like that in qemu-img create.
- FVD's testing tools are now part of qemu-io.
- Patches are broken into smaller, coherent pieces.
- No patch breaks the build or bisect.
- FVD has been ported to win32 on Cygwin, 32-bit Linux on i686, and 64-bit Linux on x86_64.

Your further comments and feedback are more than welcome. Thanks.

Regards,
ChunQiang (CQ) Tang
Homepage: http://www.research.ibm.com/people/c/ctang
[Qemu-devel] [PATCH 07/26] FVD: extend FVD header fvd.h to be more complete
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch makes FVD's header file fvd.h more complete, by adding type definition for BDRVFvdState, FvdAIOCB, etc. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd.h | 337 +++ 1 files changed, 337 insertions(+), 0 deletions(-) diff --git a/block/fvd.h b/block/fvd.h index f2da330..b83b7aa 100644 --- a/block/fvd.h +++ b/block/fvd.h @@ -168,4 +168,341 @@ typedef struct __attribute__ ((__packed__)) FvdHeader { } FvdHeader; typedef struct BDRVFvdState { +BlockDriverState *fvd_metadata; +BlockDriverState *fvd_data; +uint64_t virtual_disk_size; /*in bytes. */ +uint64_t bitmap_offset; /* in sectors */ +uint64_t bitmap_size;/* in bytes. */ +uint64_t data_offset;/* in sectors. Begin of real data. */ +uint64_t base_img_sectors; +uint64_t block_size; /* in sectors. */ +bool copy_on_read; +uint64_t max_outstanding_copy_on_read_data;/* in bytes. */ +uint64_t outstanding_copy_on_read_data;/* in bytes. */ +bool data_region_prepared; +QLIST_HEAD(WriteLocks, FvdAIOCB) write_locks; /* All writes. */ +QLIST_HEAD(CopyLocks, FvdAIOCB) copy_locks; /* copy-on-read and CoW. */ + +/* Keep two copies of bitmap to reduce the overhead of updating the + * on-disk bitmap, i.e., copy-on-read and prefetching do not update the + * on-disk bitmap. See Section 3.3.4 of the FVD-cow paper. */ +uint8_t *fresh_bitmap; +uint8_t *stale_bitmap; + +/ Begin: for compact image. */ +uint32_t *table;/* Mapping table stored in memory in little endian. */ +uint64_t table_size;/* in bytes. */ +uint64_t used_storage;/* in sectors. */ +uint64_t avail_storage;/* in sectors. */ +uint64_t chunk_size; /* in sectors. */ +uint64_t storage_grow_unit; /* in sectors. */ +uint64_t table_offset;/* in sectors. */ +char *add_storage_cmd; +uint32_t *leaked_chunks; +uint32_t num_leaked_chunks; +uint32_t next_avail_leaked_chunk; +uint32_t chunks_relocated;/* Affect bdrv_has_zero_init(). */ +/ Begin: for compact image. 
*/ + +/ Begin: for journal. ***/ +uint64_t journal_offset; /* in sectors. */ +uint64_t journal_size; /* in sectors. */ +uint64_t journal_epoch; +uint64_t next_journal_sector; /* in sector. */ +bool dirty_image; +bool metadata_err_prohibit_write; + +/* There are two different ways of writing metadata changes to the + * journal. If cache=writethrough, metadata changes are written to the + * journal immediately. If (cache!=writethrough||IN_QEMU_TOOL), metadata + * changes are buffered in memory (bjnl.journal_buf below), and later + * written to the journal either triggered by bdrv_aio_flush() or by a + * timeout (bjnl.clean_buf_timer below). */ +bool use_bjnl; /* 'bjnl' stands for buffered journal update. */ +union { +/* 'ujnl' stands for unbuffered journal update. */ +struct { +int active_writes; +/* Journal writes waiting for journal recycle to finish. + * See JournalCB.ujnl_next_wait4_recycle. */ +QLIST_HEAD(JournalRecycle, FvdAIOCB) wait4_recycle; +} ujnl; + +/* 'bjnl' stands for buffered journal update. */ +struct { +uint8_t *buf; +size_t buf_size; +size_t def_buf_size; +size_t buf_used; +bool buf_contains_bitmap_update; +QEMUTimer *clean_buf_timer; +bool timer_scheduled; +uint64_t clean_buf_period; +/* See JournalCB.bjnl_next_queued_buf. */ +QTAILQ_HEAD(CleanBuf, FvdAIOCB) queued_bufs; +} bjnl; +}; +/ End: for journal. / + +/ Begin: for prefetching. ***/ +struct FvdAIOCB **prefetch_acb; +int prefetch_state;/* PREFETCH_STATE_RUNNING, FINISHED, or DISABLED. */ +int num_prefetch_slots; +int num_filled_prefetch_slots; +int next_prefetch_read_slot; +bool prefetch_read_active; +bool pause_prefetch_requested; +int64_t prefetch_start_delay; /* in seconds */ +uint64_t unclaimed_prefetch_region_start; +uint64_t prefetch_read_time; /* in milliseconds. */ +uint64_t prefetch_write_time;/* in milliseconds. */ +uint64_t prefetch_data_read; /* in bytes. */ +uint64_t prefetch_data_written; /* in bytes. */ +double prefetch_read_throughput; /* in bytes/millisecond. */ +double
[Qemu-devel] [PATCH 25/26] FVD: add impl of interface bdrv_probe()
This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds FVD's implementation of the bdrv_probe() interface.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 block/fvd-misc.c |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/block/fvd-misc.c b/block/fvd-misc.c
index 61e39bb..6315218 100644
--- a/block/fvd-misc.c
+++ b/block/fvd-misc.c
@@ -163,7 +163,14 @@ static void fvd_close(BlockDriverState * bs)
 
 static int fvd_probe(const uint8_t * buf, int buf_size, const char *filename)
 {
-    return 0;
+    const FvdHeader *header = (const void *)buf;
+
+    if (buf_size >= sizeof(uint32_t) &&
+        le32_to_cpu(header->magic) == FVD_MAGIC) {
+        return 100;
+    } else {
+        return 0;
+    }
 }
 
 static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num,
-- 
1.7.0.4
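The probe logic can be tried outside QEMU with a small stand-alone sketch. This is not the FVD code: DEMO_MAGIC, load_le32(), and demo_probe() are hypothetical names, and the magic value is made up. The sketch reads the magic byte by byte, so it does not depend on host byte order or on the buffer being aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical magic value, chosen only for this example. */
#define DEMO_MAGIC 0x44564601u

/* Read a 32-bit little-endian value from an unaligned buffer. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Return a score of 100 if the buffer starts with the magic number,
 * and 0 otherwise (including when the buffer is too short to tell).
 */
static int demo_probe(const uint8_t *buf, int buf_size)
{
    if (buf_size >= (int)sizeof(uint32_t) && load_le32(buf) == DEMO_MAGIC) {
        return 100;
    }
    return 0;
}
```

One design note: the sketch casts sizeof() to int before comparing, so a negative buf_size is rejected. Comparing an int directly against a size_t converts the int to unsigned, which would let a negative size pass the length check.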
[Qemu-devel] [PATCH 05/26] FVD: add the 'qemu-img update' command
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds the 'update' command to qemu-img. It is a general interface that allows various image format specific manipulations. For example, 'qemu-img rebase' and 'qemu-img resize' can be considered as two special cases of update. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block_int.h |3 + qemu-img-cmds.hx |6 +++ qemu-img.c | 125 +++--- qemu-option.c| 79 ++ qemu-option.h|4 ++ 5 files changed, 201 insertions(+), 16 deletions(-) diff --git a/block_int.h b/block_int.h index 545ad11..8f6b6d0 100644 --- a/block_int.h +++ b/block_int.h @@ -98,6 +98,7 @@ struct BlockDriver { int (*bdrv_snapshot_load_tmp)(BlockDriverState *bs, const char *snapshot_name); int (*bdrv_get_info)(BlockDriverState *bs, BlockDriverInfo *bdi); +int (*bdrv_update)(BlockDriverState *bs, QEMUOptionParameter *options); int (*bdrv_save_vmstate)(BlockDriverState *bs, const uint8_t *buf, int64_t pos, int size); @@ -122,6 +123,8 @@ struct BlockDriver { /* List of options for creating images, terminated by name == NULL */ QEMUOptionParameter *create_options; +/* List of options for updating images, terminated by name == NULL */ +QEMUOptionParameter *update_options; /* * Returns 0 for completed check, -errno for internal errors. 
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx index 6c7176f..a7ed395 100644 --- a/qemu-img-cmds.hx +++ b/qemu-img-cmds.hx @@ -39,6 +39,12 @@ STEXI @item info [-f @var{fmt}] @var{filename} ETEXI +DEF(update, img_update, +update [-f fmt] [-o options] filename) +STEXI +@item update [-f @var{fmt}] [-o @var{options}] @var{filename} [@var{size}] +ETEXI + DEF(snapshot, img_snapshot, snapshot [-l | -a snapshot | -c snapshot | -d snapshot] filename) STEXI diff --git a/qemu-img.c b/qemu-img.c index 7e3cc4c..215e7b9 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -179,10 +179,11 @@ static int read_password(char *buf, int buf_size) } #endif -static int print_block_option_help(const char *filename, const char *fmt) +static int print_block_option_help(const char *filename, const char *fmt, + bool create_options) { BlockDriver *drv, *proto_drv; -QEMUOptionParameter *create_options = NULL; +QEMUOptionParameter *options = NULL; /* Find driver and parse its options */ drv = bdrv_find_format(fmt); @@ -197,12 +198,15 @@ static int print_block_option_help(const char *filename, const char *fmt) return 1; } -create_options = append_option_parameters(create_options, - drv-create_options); -create_options = append_option_parameters(create_options, - proto_drv-create_options); -print_option_help(create_options); -free_option_parameters(create_options); +if (create_options) { +options = append_option_parameters(options, drv-create_options); +options = append_option_parameters(options, proto_drv-create_options); +} else { +options = append_option_parameters(options, drv-update_options); +options = append_option_parameters(options, proto_drv-update_options); +} +print_option_help(options); +free_option_parameters(options); return 0; } @@ -337,7 +341,7 @@ static int img_create(int argc, char **argv) } if (options !strcmp(options, ?)) { -ret = print_block_option_help(filename, fmt); +ret = print_block_option_help(filename, fmt, true /*create*/); goto out; } @@ -631,7 +635,7 @@ static int 
img_convert(int argc, char **argv) out_filename = argv[argc - 1]; if (options !strcmp(options, ?)) { -ret = print_block_option_help(out_filename, out_fmt); +ret = print_block_option_help(out_filename, out_fmt, true /*create*/); goto out; } @@ -869,7 +873,7 @@ static int img_convert(int argc, char **argv) assume that sectors which are unallocated in the input image are present in both the output's and input's base images (no need to copy them). */ -if (out_baseimg) { +if (out_baseimg || bs[bs_i]-backing_file[0]==0) { if (!bdrv_is_allocated(bs[bs_i], sector_num - bs_offset, n, n1)) { sector_num += n1; @@ -1040,11 +1044,6 @@ static int img_info(int argc, char **argv) if (bdrv_is_encrypted(bs)) { printf(encrypted: yes\n); } -if (bdrv_get_info(bs, bdi) = 0) { -if (bdi.cluster_size != 0) { -printf(cluster_size: %d\n, bdi.cluster_size); -} -}
[Qemu-devel] [PATCH 18/26] FVD: add support for base image prefetching
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds adaptive prefetching of base image to FVD. FVD supports both copy-on-write and copy-on-read of base image. Adaptive prefetching is similar to copy-on-read except that it is initiated by the FVD driver rather than triggered by the VM's read requests. FVD's prefetching is conservative in that, if it detects resource contention, it will back off and temporarily pause prefetching. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-prefetch.c | 600 +- block/fvd-read.c |1 + qemu-io-sim.c| 13 + 3 files changed, 613 insertions(+), 1 deletions(-) diff --git a/block/fvd-prefetch.c b/block/fvd-prefetch.c index 5844aa7..b8be98c 100644 --- a/block/fvd-prefetch.c +++ b/block/fvd-prefetch.c @@ -11,7 +11,605 @@ * */ +static void prefetch_read_cb(void *opaque, int ret); +static void resume_prefetch(BlockDriverState * bs); +static void do_next_prefetch_read(BlockDriverState * bs, int64_t current_time); + void fvd_init_prefetch(void *opaque) { -/* To be implemented. 
*/ +BlockDriverState *bs = opaque; +BDRVFvdState *s = bs-opaque; +FvdAIOCB *acb; +int i; + +QDEBUG(Start prefetching\n); + +if (!s-data_region_prepared) { +init_data_region(s); +} + +s-prefetch_acb = my_qemu_malloc(sizeof(FvdAIOCB *)*s-num_prefetch_slots); + +for (i = 0; i s-num_prefetch_slots; i++) { +acb = my_qemu_aio_get(fvd_aio_pool, bs, prefetch_null_cb, NULL); +s-prefetch_acb[i] = acb; +if (!acb) { +int j; +for (j = 0; j i; j++) { +my_qemu_aio_release(s-prefetch_acb[j]); +s-prefetch_acb[j] = NULL; +} + +my_qemu_free(s-prefetch_acb); +s-prefetch_acb = NULL; +fprintf(stderr, No acb and cannot start prefetching.\n); +return; +} + +acb-type = OP_COPY; +acb-cancel_in_progress = false; +} + +s-prefetch_state = PREFETCH_STATE_RUNNING; + +for (i = 0; i s-num_prefetch_slots; i++) { +acb = s-prefetch_acb[i]; +acb-copy.buffered_sector_begin = acb-copy.buffered_sector_end = 0; +QLIST_INIT(acb-copy_lock.dependent_writes); +acb-copy_lock.next.le_prev = NULL; +acb-copy.hd_acb = NULL; +acb-sector_num = 0; +acb-nb_sectors = 0; +acb-copy.iov.iov_len = s-sectors_per_prefetch * 512; +acb-copy.buf = acb-copy.iov.iov_base = +my_qemu_blockalign(bs-backing_hd, acb-copy.iov.iov_len); +qemu_iovec_init_external(acb-copy.qiov, acb-copy.iov, 1); +} + +if (s-prefetch_timer) { +qemu_free_timer(s-prefetch_timer); +s-prefetch_timer = +qemu_new_timer(rt_clock, (QEMUTimerCB *) resume_prefetch, bs); +} + +s-pause_prefetch_requested = false; +s-unclaimed_prefetch_region_start = 0; +s-prefetch_read_throughput = -1; /* Indicate not initialized. */ +s-prefetch_write_throughput = -1; /* Indicate not initialized. 
*/ +s-prefetch_read_time = 0; +s-prefetch_write_time = 0; +s-prefetch_data_read = 0; +s-prefetch_data_written = 0; +s-next_prefetch_read_slot = 0; +s-num_filled_prefetch_slots = 0; +s-prefetch_read_active = false; + +do_next_prefetch_read(bs, qemu_get_clock(rt_clock)); +} + +static void pause_prefetch(BDRVFvdState * s) +{ +int64_t ms = 1 + (int64_t) ((rand() / ((double)RAND_MAX)) +* s-prefetch_throttle_time); +QDEBUG(Pause prefetch for % PRId64 milliseconds\n, ms); +/* When the timer expires, it goes to resume_prefetch(). */ +qemu_mod_timer(s-prefetch_timer, qemu_get_clock(rt_clock) + ms); +} + +/* Return true if every bit of freshbitmap is set to 1. */ +static bool all_data_prefetched(BDRVFvdState *s) +{ +uint64_t n = s-base_img_sectors / s-block_size / sizeof(uint64_t) / 8; +uint64_t *p = (uint64_t*)s-fresh_bitmap; +uint64_t i; + +for (i = 0; i n; i++, p++) { +if (*p != UINT64_C(0x)) { +return false; +} +} + +uint64_t sec = n * sizeof(uint64_t) * 8 * s-block_size; +while (sec s-base_img_sectors) { +if (fresh_bitmap_show_sector_in_base_img(sec, s)) { +return false; +} +sec += s-block_size; +} + +return true; +} + +static void terminate_prefetch(BlockDriverState * bs, int final_state) +{ +BDRVFvdState *s = bs-opaque; +int i; + +ASSERT(!s-prefetch_read_active s-num_filled_prefetch_slots == 0); + +for (i = 0; i s-num_prefetch_slots; i++) { +if (s-prefetch_acb) { +my_qemu_vfree(s-prefetch_acb[i]-copy.buf); +my_qemu_aio_release(s-prefetch_acb[i]); +s-prefetch_acb[i] = NULL; +} +} +my_qemu_free(s-prefetch_acb); +s-prefetch_acb = NULL; + +
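The idea behind all_data_prefetched() is a two-stage scan: compare whole 64-bit words against all-ones first, then test the leftover bits one at a time. A minimal stand-alone sketch follows. demo_all_bits_set() is a hypothetical helper, not the FVD function, and it works on a plain bit count instead of sectors and block sizes.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Return true if the first nbits bits of bitmap are all set to 1. */
static bool demo_all_bits_set(const uint8_t *bitmap, uint64_t nbits)
{
    uint64_t nwords = nbits / 64;
    uint64_t i;

    /* Fast path: check 64 bits at a time. */
    for (i = 0; i < nwords; i++) {
        uint64_t word;

        /* memcpy avoids unaligned access through a cast pointer. */
        memcpy(&word, bitmap + i * 8, sizeof(word));
        if (word != UINT64_MAX) {
            return false;
        }
    }

    /* Slow path: check the remaining bits that do not fill a word. */
    for (i = nwords * 64; i < nbits; i++) {
        if (((bitmap[i / 8] >> (i % 8)) & 1) == 0) {
            return false;
        }
    }
    return true;
}
```

Because an all-ones word is the same in any byte order, the word-at-a-time comparison does not need an endianness conversion.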
Re: [Qemu-devel] [RESENT][PATCH] HACKING: Update status of format checking
On 02/25/2011 04:20 PM, Stefan Weil wrote:
> This patch was already sent on 2011-01-24:
>
> Hopefully all functions with printf like arguments now use format checking.
>
> This was tested with default build configuration on linux and windows hosts (including some cross compilations), so chances are good that there remain few (if any) functions without format checking.
>
> Therefore the last comment in HACKING is no longer valid but misleading.
>
> Cc: Blue Swirl blauwir...@gmail.com
> Signed-off-by: Stefan Weil w...@mail.berlios.de

Applied. Thanks.

Regards,

Anthony Liguori

> ---
>  HACKING |    3 ---
>  1 files changed, 0 insertions(+), 3 deletions(-)
>
> diff --git a/HACKING b/HACKING
> index 6ba9d7e..3af53fd 100644
> --- a/HACKING
> +++ b/HACKING
> @@ -120,6 +120,3 @@ gcc's printf attribute directive in the prototype.
>  This makes it so gcc's -Wformat and -Wformat-security options can do
>  their jobs and cross-check format strings with the number and types
>  of arguments.
> -
> -Currently many functions in QEMU are not following this rule but
> -patches to add the attribute would be very much appreciated.
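For context, the rule in HACKING that this patch touches is about annotating printf-like functions so that gcc can check their callers. A minimal sketch of what such an annotation looks like is below. demo_log() and the DEMO_PRINTF macro are hypothetical and are not QEMU names; only the gcc format attribute itself is real.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Wrap gcc's format attribute so the code still builds with compilers
 * that do not support it. The two numbers are 1-based argument indexes:
 * the format string, and the first variadic argument to check.
 */
#if defined(__GNUC__)
#define DEMO_PRINTF(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define DEMO_PRINTF(fmt_idx, first_arg)
#endif

/* Hypothetical helper: the format is argument 3, checking starts at 4. */
static int demo_log(char *out, size_t out_size, const char *fmt, ...)
    DEMO_PRINTF(3, 4);

static int demo_log(char *out, size_t out_size, const char *fmt, ...)
{
    va_list ap;
    int n;

    va_start(ap, fmt);
    n = vsnprintf(out, out_size, fmt, ap);
    va_end(ap);
    return n;
}
```

With the attribute in place, a call such as demo_log(buf, sizeof(buf), "%d", "text") draws a -Wformat warning at compile time instead of misbehaving at run time.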
[Qemu-devel] [PATCH 11/26] FVD: add impl of interface bdrv_aio_writev()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds FVD's implementation of the bdrv_aio_writev() interface. It supports copy-on-write in FVD. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-bitmap.c | 150 block/fvd-journal.c |4 + block/fvd-store.c | 20 +++ block/fvd-write.c | 468 ++- block/fvd.c |4 +- block/fvd.h |1 + 6 files changed, 645 insertions(+), 2 deletions(-) create mode 100644 block/fvd-bitmap.c create mode 100644 block/fvd-store.c diff --git a/block/fvd-bitmap.c b/block/fvd-bitmap.c new file mode 100644 index 000..7e96201 --- /dev/null +++ b/block/fvd-bitmap.c @@ -0,0 +1,150 @@ +/* + * QEMU Fast Virtual Disk Format Utility Functions for Bitmap + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU LGPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +static inline bool stale_bitmap_show_sector_in_base_img(int64_t sector_num, +const BDRVFvdState * s) +{ +if (sector_num = s-base_img_sectors) { +return false; +} + +int64_t block_num = sector_num / s-block_size; +int64_t bitmap_byte_offset = block_num / 8; +uint8_t bitmap_bit_offset = block_num % 8; +uint8_t b = s-stale_bitmap[bitmap_byte_offset]; +return 0 == (int)((b bitmap_bit_offset) 0x01); +} + +static inline bool fresh_bitmap_show_sector_in_base_img(int64_t sector_num, +const BDRVFvdState * s) +{ +if (sector_num = s-base_img_sectors) { +return false; +} + +int64_t block_num = sector_num / s-block_size; +int64_t bitmap_byte_offset = block_num / 8; +uint8_t bitmap_bit_offset = block_num % 8; +uint8_t b = s-fresh_bitmap[bitmap_byte_offset]; +return 0 == (int)((b bitmap_bit_offset) 0x01); +} + +static inline void update_fresh_bitmap(int64_t sector_num, int nb_sectors, + const BDRVFvdState * s) +{ +if (sector_num = s-base_img_sectors) { +return; +} + +int64_t end = sector_num + nb_sectors; +if (end 
s-base_img_sectors) { +end = s-base_img_sectors; +} + +int64_t block_num = sector_num / s-block_size; +int64_t block_end = (end - 1) / s-block_size; + +for (; block_num = block_end; block_num++) { +int64_t bitmap_byte_offset = block_num / 8; +uint8_t bitmap_bit_offset = block_num % 8; +uint8_t mask = (uint8_t) (0x01 bitmap_bit_offset); +uint8_t b = s-fresh_bitmap[bitmap_byte_offset]; +if (!(b mask)) { +b |= mask; +s-fresh_bitmap[bitmap_byte_offset] = b; +} +} +} + +static inline bool bitmap_show_sector_in_base_img(int64_t sector_num, + const BDRVFvdState * s, + int bitmap_offset, + uint8_t * bitmap) +{ +if (sector_num = s-base_img_sectors) { +return false; +} + +int64_t block_num = sector_num / s-block_size; +int64_t bitmap_byte_offset = block_num / 8 - bitmap_offset; +uint8_t bitmap_bit_offset = block_num % 8; +uint8_t b = bitmap[bitmap_byte_offset]; +return 0 == (int)((b bitmap_bit_offset) 0x01); +} + +static inline bool stale_bitmap_need_update(FvdAIOCB * acb) +{ +BlockDriverState *bs = acb-common.bs; +BDRVFvdState *s = bs-opaque; +int64_t end = acb-sector_num + acb-nb_sectors; + +if (end s-base_img_sectors) { +end = s-base_img_sectors; +} +int64_t block_end = (end - 1) / s-block_size; +int64_t block_num = acb-sector_num / s-block_size; + +for (; block_num = block_end; block_num++) { +int64_t bitmap_byte_offset = block_num / 8; +uint8_t bitmap_bit_offset = block_num % 8; +uint8_t mask = (uint8_t) (0x01 bitmap_bit_offset); +uint8_t b = s-stale_bitmap[bitmap_byte_offset]; +if (!(b mask)) { +return true; +} +} + +return false; +} + +/* Return true if stable_bitmap needs update. 
*/ +static bool update_fresh_bitmap_and_check_stale_bitmap(FvdAIOCB * acb) +{ +BlockDriverState *bs = acb-common.bs; +BDRVFvdState *s = bs-opaque; + +if (acb-sector_num = s-base_img_sectors) { +return false; +} + +bool need_update = false; +int64_t end = acb-sector_num + acb-nb_sectors; + +if (end s-base_img_sectors) { +end = s-base_img_sectors; +} + +int64_t block_end = (end - 1) / s-block_size; +int64_t block_num = acb-sector_num / s-block_size; + +for (; block_num = block_end; block_num++) { +int64_t bitmap_byte_offset = block_num / 8; +
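The bitmap helpers in this patch all share the same arithmetic: one bit per block, a cleared bit meaning "this block still lives in the base image", and every range clamped to the size of the base image. A minimal stand-alone sketch of that arithmetic follows. The demo_* functions are hypothetical and take the sizes as plain parameters instead of reading them from BDRVFvdState.

```c
#include <stdbool.h>
#include <stdint.h>

/* A cleared bit means the block's data is still only in the base image. */
static bool demo_in_base_img(const uint8_t *bitmap, int64_t base_img_sectors,
                             int64_t block_size, int64_t sector_num)
{
    int64_t block;

    if (sector_num >= base_img_sectors) {
        return false;   /* beyond the base image: nothing to copy */
    }
    block = sector_num / block_size;
    return ((bitmap[block / 8] >> (block % 8)) & 1) == 0;
}

/* Set the bit of every block touched by [sector_num, sector_num + nb). */
static void demo_mark_copied(uint8_t *bitmap, int64_t base_img_sectors,
                             int64_t block_size, int64_t sector_num,
                             int nb_sectors)
{
    int64_t end = sector_num + nb_sectors;
    int64_t block, block_end;

    if (nb_sectors <= 0 || sector_num >= base_img_sectors) {
        return;
    }
    if (end > base_img_sectors) {
        end = base_img_sectors;   /* the bitmap only covers the base image */
    }
    block_end = (end - 1) / block_size;
    for (block = sector_num / block_size; block <= block_end; block++) {
        bitmap[block / 8] |= (uint8_t)(1u << (block % 8));
    }
}
```

Using (end - 1) / block_size for the last block keeps a range that ends exactly on a block boundary from marking the following block.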
[Qemu-devel] [PATCH 01/26] FVD: add simulated block driver 'blksim'
This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

This patch adds the 'blksim' block device driver, which is a tool to facilitate testing and debugging. blksim operates on a RAW image, but it uses neither AIO nor posix threads to perform actual I/Os. blksim functions like an event-driven disk simulator, and allows a block device driver developer to fully control the order of disk I/Os, the order of callbacks, and the return code of every I/O operation. The purpose is to extensively test a block device driver under failures and race conditions. Bugs found by blksim under rare race conditions are guaranteed to be precisely reproducible.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 Makefile.objs  |    1 +
 block/blksim.c |  757
 block/blksim.h |   35 +++
 3 files changed, 793 insertions(+), 0 deletions(-)
 create mode 100644 block/blksim.c
 create mode 100644 block/blksim.h

diff --git a/Makefile.objs b/Makefile.objs
index 9e98a66..264aab3 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -23,6 +23,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
+block-nested-y += blksim.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
 block-nested-$(CONFIG_POSIX) += raw-posix.o
 block-nested-$(CONFIG_CURL) += curl.o
diff --git a/block/blksim.c b/block/blksim.c
new file mode 100644
index 0000000..5c7ef43
--- /dev/null
+++ b/block/blksim.c
@@ -0,0 +1,757 @@
+/*
+ * QEMU Simulated Block Device to Facilitate Testing and Debugging
+ *
+ * Copyright IBM, Corp. 2010
+ *
+ * Authors:
+ *    Chunqiang Tang ct...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ * + */ + +#include block_int.h +#include osdep.h +#include qemu-option.h +#include qemu-timer.h +#include block.h +#include qemu-queue.h +#include qemu-common.h +#include block/blksim.h + +#if 1 +# define QDEBUG(format,...) do {} while (0) +#else +# define QDEBUG printf +#endif + +typedef enum +{ +SIM_NULL, +SIM_READ, +SIM_WRITE, +SIM_FLUSH, +SIM_READ_CALLBACK, +SIM_WRITE_CALLBACK, +SIM_FLUSH_CALLBACK, +SIM_TIMER +} sim_op_t; + +static void sim_aio_cancel(BlockDriverAIOCB * acb); +static int64_t sim_uuid = 0; +static int64_t current_time = 0; +static int64_t rand_time = 0; +static int interactive_print = true; +static int blksim_invoked = false; +static bool instant_qemubh = true; +struct SimAIOCB; + +/* + * Note: disk_io_return_code, set_disk_io_return_code(), and insert_task() work + * together to ensure that multiple subrequests triggered by the same + * outtermost request either succeed together or fail together. This behavior + * is required by qemu-test. Here is one example of problems caused by + * departuring from this behavior. Consider a write request that generates + * two subrequests, w1 and w2. If w1 succeeds but w2 fails, the data will not + * be written into qemu-test's truth image but the part of the data handled + * by w1 will be written into qemu-test's test image. As a result, their + * contents diverge can automated testing cannot continue. 
+ */ +static int disk_io_return_code = 0; + +typedef struct BDRVSimState +{ +int fd; +} BDRVSimState; + +typedef struct SimAIOCB +{ +BlockDriverAIOCB common; +int64_t uuid; +sim_op_t op; +int64_t sector_num; +QEMUIOVector *qiov; +int nb_sectors; +int ret; +int64_t time; +struct SimAIOCB *next; +struct SimAIOCB *prev; + +} SimAIOCB; + +static AIOPool sim_aio_pool = { +.aiocb_size = sizeof(SimAIOCB), +.cancel = sim_aio_cancel, +}; + +static SimAIOCB head = { +.uuid = -1, +.time = (int64_t) (9223372036854775807ULL), +.op = SIM_NULL, +.next = head, +.prev = head, +}; + +/* Debug a specific task.*/ +#if 0 +static inline void CHECK_TASK(int64_t uuid) +{ +if (uuid == 19LL) { +printf(CHECK_TASK pause for task % PRId64 \n, uuid); +} +} +#else +# define CHECK_TASK(acb) do { } while (0) +#endif + +/* do_io() should never fail. A failure indicates a bug in the upper layer + * block device driver, or failure in the real hardware. */ +static int do_io(BlockDriverState * bs, int64_t sector_num, uint8_t * buf, + int nb_sectors, int do_read) +{ +BDRVSimState *s = bs-opaque; +size_t size = nb_sectors * 512; +uint8_t *new_buf, *p; +int ret; + +if (interactive_print) { +printf (Do %s %s sector_num=%PRId64 nb_sectors=%d\n, +do_read ? READ : WRITE, bs-filename, +sector_num, nb_sectors); +} + +if ((ret=lseek(s-fd, sector_num * 512, SEEK_SET)) 0) { +fprintf(stderr, Error: lseek %s sector_num=%PRId64\n, +
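The excerpt stops before the task-queue code, so the following is only a guess at how an event-driven simulator like this can keep its pending tasks: a circular doubly linked list ordered by simulated time, with a sentinel head whose time is the largest possible value (as the initializer of 'head' above suggests). DemoTask and the demo_* functions are hypothetical and are not taken from blksim.c.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct DemoTask {
    int64_t time;              /* simulated time at which the task runs */
    struct DemoTask *next;
    struct DemoTask *prev;
} DemoTask;

/* The sentinel points at itself and sorts after every real task. */
static void demo_list_init(DemoTask *head)
{
    head->time = INT64_MAX;
    head->next = head;
    head->prev = head;
}

/* Insert in time order. Tasks with equal times keep insertion order. */
static void demo_insert(DemoTask *head, DemoTask *task)
{
    DemoTask *pos = head->next;

    while (pos != head && pos->time <= task->time) {
        pos = pos->next;
    }
    task->next = pos;
    task->prev = pos->prev;
    pos->prev->next = task;
    pos->prev = task;
}

/* Remove and return the earliest task, or NULL if the list is empty. */
static DemoTask *demo_pop(DemoTask *head)
{
    DemoTask *task = head->next;

    if (task == head) {
        return NULL;
    }
    task->prev->next = task->next;
    task->next->prev = task->prev;
    return task;
}
```

A deterministic order like this, with ties broken by insertion order, is one way to get the "precisely reproducible" behaviour the commit message describes, since nothing depends on real threads or real timing.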
[Qemu-devel] [PATCH 04/26] FVD: add fully automated test-vdi.sh
This patch is part of the Fast Virtual Disk (FVD) proposal.
See http://wiki.qemu.org/Features/FVD.

test-vdi.sh drives 'qemu-io --auto' to perform fully automated testing for VDI.

Signed-off-by: Chunqiang Tang ct...@us.ibm.com
---
 test-vdi.sh |   83 +++
 1 files changed, 83 insertions(+), 0 deletions(-)
 create mode 100755 test-vdi.sh

diff --git a/test-vdi.sh b/test-vdi.sh
new file mode 100755
index 0000000..b0bfe65
--- /dev/null
+++ b/test-vdi.sh
@@ -0,0 +1,83 @@
+#!/bin/bash
+
+# Drive 'qemu-io --auto' to test the VDI image format.
+#
+# Copyright IBM, Corp. 2010
+#
+# Authors:
+#    Chunqiang Tang ct...@us.ibm.com
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING.LIB file in the top-level directory.
+
+if [ "$USER" != "root" ]; then
+    echo "This command must be run by root in order to mount tmpfs."
+    exit 1
+fi
+
+QEMU_DIR=.
+QEMU_IMG=$QEMU_DIR/qemu-img
+QEMU_IO=$QEMU_DIR/qemu-io
+
+if [ ! -e $QEMU_IMG ]; then
+    echo "$QEMU_IMG does not exist."
+    exit 1;
+fi
+
+if [ ! -e $QEMU_IO ]; then
+    echo "$QEMU_IO does not exist."
+    exit 1;
+fi
+
+DATA_DIR=/var/ramdisk
+TRUTH_IMG=$DATA_DIR/truth.raw
+TEST_IMG=$DATA_DIR/test.vdi
+CMD_LOG=./test-vdi.log
+
+parallel=10
+round=1000
+fail_prob=0.1
+cancel_prob=0
+flush_prob=0
+aio_flush_prob=0
+instant_qemubh=true
+seed=$RANDOM$RANDOM
+count=0
+
+function invoke() {
+    echo "$*" >> $CMD_LOG
+    $*
+    ret=$?
+    if [ $ret -ne 0 ]; then
+        echo "Exit with error code $ret: $*"
+        exit $ret
+    fi
+}
+
+mount | grep $DATA_DIR > /dev/null
+if [ $? -ne 0 ]; then
+    echo "Create tmpfs at $DATA_DIR to store testing images."
+    if [ ! -e $DATA_DIR ]; then mkdir -p $DATA_DIR ; fi
+    invoke mount -t tmpfs none $DATA_DIR -o size=400M
+    if [ $? -ne 0 ]; then exit 1; fi
+fi
+
+/bin/rm -f $CMD_LOG
+touch $CMD_LOG
+
+while [ -t ]; do
+for io_size in 3145728; do
+    count=$[$count + 1]
+    echo "Round $count" >> $CMD_LOG
+
+    # VDI image is about 100M
+    img_size=$[(104857600 + ($RANDOM$RANDOM$RANDOM % 10485760)) / 512 * 512]
+
+    invoke /bin/rm -rf $TRUTH_IMG $TEST_IMG
+    invoke dd if=/dev/zero of=$TRUTH_IMG count=0 bs=1 seek=$img_size
+    invoke $QEMU_IMG create -f vdi $TEST_IMG $img_size
+
+    invoke $QEMU_IO --auto --seed=$seed --truth=$TRUTH_IMG --format=vdi --test=blksim:$TEST_IMG --verify_write=true --compare_before=false --compare_after=true --round=$round --parallel=$parallel --io_size=$io_size --fail_prob=$fail_prob --cancel_prob=$cancel_prob --aio_flush_prob=$aio_flush_prob --flush_prob=$flush_prob --instant_qemubh=$instant_qemubh
+
+    seed=$[$seed + 1]
+done; done
-- 
1.7.0.4
Re: [Qemu-devel] [patch 2/3] Add support for live block copy
On Wed, Feb 23, 2011 at 01:06:46PM -0600, Anthony Liguori wrote: On 02/22/2011 11:00 AM, Marcelo Tosatti wrote: Index: qemu/qerror.h === --- qemu.orig/qerror.h +++ qemu/qerror.h @@ -171,4 +171,13 @@ QError *qobject_to_qerror(const QObject #define QERR_VNC_SERVER_FAILED \ { 'class': 'VNCServerFailed', 'data': { 'target': %s } } +#define QERR_BLOCKCOPY_IN_PROGRESS \ +{ 'class': 'BlockCopyInProgress', 'data': { 'device': %s } } The caller already knows the device name by virtue of issuing the command so this is redundant. I think a better approach would be a QERR_IN_PROGRESS 'data': { 'operation': %s } For block copy, we'd say QERR_IN_PROGRESS(block copy). + +#define QERR_BLOCKCOPY_IMAGE_SIZE_DIFFERS \ +{ 'class': 'BlockCopyImageSizeDiffers', 'data': {} } + +#define QERR_MIGRATION_IN_PROGRESS \ +{ 'class': 'MigrationInProgress', 'data': {} } Then QERR_IN_PROGRESS(live migration) Can the error format change like that? What about applications that make use of it? If it can change, sure. (libvirt.git does not seem to be aware of MigrationInProgress). #endif /* QERROR_H */ Index: qemu/qmp-commands.hx === --- qemu.orig/qmp-commands.hx +++ qemu/qmp-commands.hx @@ -581,6 +581,75 @@ Example: EQMP { +.name = block_copy, +.args_type = device:s,filename:s,commit_filename:s?,inc:-i, +.params = device filename [commit_filename] [-i], +.help = live block copy device to image + \n\t\t\t optional commit filename + \n\t\t\t -i for incremental copy + (base image shared between src and destination), +.user_print = monitor_user_noop, +.mhandler.cmd_new = do_bdrv_copy, +}, + +SQMP +block-copy +--- + +Live block copy. I'm not sure copy really describes what we're doing here. Maybe migrate-block? Problem its easy to confuse migrate-block with block migration. I could not come up with a better, non-confusing name than live block copy. +Arguments: + +- device: device name (json-string) +- filename: target image filename (json-string) Is this a created image? Is this an image to create? 
A previously created image.

To future proof for blockdev, we should make this argument optional and, if it's not specified, throw an error about a missing argument. This lets us introduce an optional blockdev argument such that we can use a blockdev name.

What do you mean by blockdev?

+- "commit_filename": target commit filename (json-string, optional)

I think we should drop this.

Why? Sorry, but this can't wait for non-config persistent storage. This mistake was made in the past with irqchip, for example; let's not repeat it.

It's OK to deprecate commit_filename in favour of its location in non-config persistent storage. It's not the end of the world for a mgmt app to handle a change such as this (not saying it's not a good principle).

+- "inc": incremental disk copy (json-bool, optional)

Let's use the full name (incremental) and we need to describe in detail what the semantics of this are. Will it scan the target block device to identify identical blocks?

No, it does not attempt to identify identical blocks, yet. You are right, I'll write down a document to describe these details.

+Example:
+
+-> { "execute": "block_copy",
+     "arguments": { "device": "ide0-hd1",
+                    "filename": "/mnt/new-disk.img",
+                    "commit_filename": "/mnt/commit-new-disk.img"
+                  } }
+
+<- { "return": {} }
+
+Notes:
+
+(1) The 'query-block-copy' command should be used to check block copy progress
+    and final result (this information is provided by the 'status' member)
+(2) Boolean argument "inc" defaults to false

We should also document error semantics. What errors are expected and why?

Fair.

+EQMP
+
+    {
+        .name       = "block_copy_cancel",
+        .args_type  = "device:s",
+        .params     = "device",
+        .help       = "cancel live block copy",
+        .user_print = monitor_user_noop,
+        .mhandler.cmd_new = do_bdrv_copy_cancel,
+    },
+
+SQMP
+block_copy_cancel
+--
+
+Cancel live block copy.
+
+Arguments:
+
+- "device": device name (json-string)
+
+Example:
+
+-> { "execute": "block_copy_cancel", "arguments": { "device": "ide0-hd1" } }
+<- { "return": {} }

cancel-block-migration?
Again, conflicts with block migration from live migration.

What happens if:

- No block copy is active anymore (it's completed)
  cancel succeeds.
- A block copy was never started
  qerror_report(QERR_DEVICE_NOT_FOUND, device);
- device refers to a device that no longer exists
  qerror_report(QERR_DEVICE_NOT_FOUND, device);
- device refers to a device with no
[Qemu-devel] [PATCH 09/26] FVD: add impl of interface bdrv_create()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds FVD's implementation of the bdrv_create() interface. It supports FVD image creation. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-create.c | 702 ++- block/fvd-journal.c |5 + block/fvd.c |2 +- 3 files changed, 707 insertions(+), 2 deletions(-) diff --git a/block/fvd-create.c b/block/fvd-create.c index 5593cea..c8912aa 100644 --- a/block/fvd-create.c +++ b/block/fvd-create.c @@ -11,11 +11,711 @@ * */ +static void fvd_header_cpu_to_le(FvdHeader * header); +static inline int64_t calc_min_journal_size(int64_t table_entries); +static inline int search_empty_blocks(int fd, uint8_t * bitmap, + BlockDriverState * bs, + int64_t nb_sectors, + int32_t hole_size, + int32_t block_size); + static int fvd_create(const char *filename, QEMUOptionParameter * options) { -return -ENOTSUP; +int fd, ret = 0; +FvdHeader *header; +int64_t virtual_disk_size = DEF_PAGE_SIZE; +int32_t header_size; +const char *base_img = NULL; +const char *base_img_fmt = NULL; +const char *data_file = NULL; +const char *data_file_fmt = NULL; +int32_t hole_size = 0; +int copy_on_read = false; +int prefetch_start_delay = -1; +BlockDriverState *bs = NULL; +int bitmap_size = 0; +int64_t base_img_size = 0; +int64_t table_size = 0; +int64_t journal_size = 0; +int32_t block_size = 0; +int compact_image = false; +uint64_t max_copy_on_read = MAX_OUTSTANDING_COPY_ON_READ_DATA; +uint32_t num_prefetch_slots = NUM_PREFETCH_SLOTS; +uint64_t bytes_per_prefetch = BYTES_PER_PREFETCH; +uint64_t prefetch_throttle_time = PREFETCH_THROTTLING_TIME; +uint64_t prefetch_read_measure_time = PREFETCH_MIN_MEASURE_READ_TIME; +uint64_t prefetch_write_measure_time = PREFETCH_MIN_MEASURE_WRITE_TIME; +uint64_t prefetch_min_read_throughput = PREFETCH_MIN_READ_THROUGHPUT; +uint64_t prefetch_min_write_throughput = PREFETCH_MIN_WRITE_THROUGHPUT; +uint64_t prefetch_max_read_throughput = PREFETCH_MAX_READ_THROUGHPUT; 
+uint64_t prefetch_max_write_throughput = PREFETCH_MAX_WRITE_THROUGHPUT; + +header_size = sizeof(FvdHeader); +header_size = ROUND_UP(header_size, DEF_PAGE_SIZE); +header = my_qemu_mallocz(header_size); +header-header_size = header_size; + +/* Read out options */ +while (options options-name) { +if (!strcmp(options-name, BLOCK_OPT_SIZE)) { +virtual_disk_size = options-value.n; +} else if (!strcmp(options-name, prefetch_start_delay)) { +if (options-value.n = 0) { +prefetch_start_delay = -1; +} else { +prefetch_start_delay = options-value.n; +} +} else if (!strcmp(options-name, BLOCK_OPT_BACKING_FILE)) { +base_img = options-value.s; +} else if (!strcmp(options-name, BLOCK_OPT_BACKING_FMT)) { +base_img_fmt = options-value.s; +} else if (!strcmp(options-name, copy_on_read)) { +copy_on_read = options-value.n; +} else if (!strcmp(options-name, data_file)) { +data_file = options-value.s; +} else if (!strcmp(options-name, data_file_fmt)) { +data_file_fmt = options-value.s; +} else if (!strcmp(options-name, optimize_empty_block)) { +hole_size = options-value.n; +} else if (!strcmp(options-name, compact_image)) { +compact_image = options-value.n; +} else if (!strcmp(options-name, block_size)) { +block_size = options-value.n; +} else if (!strcmp(options-name, chunk_size)) { +header-chunk_size = options-value.n; +} else if (!strcmp(options-name, journal_size)) { +journal_size = options-value.n; +} else if (!strcmp(options-name, journal_buf_size)) { +header-journal_buf_size = options-value.n; +} else if (!strcmp(options-name, journal_clean_buf_period)) { +header-journal_clean_buf_period = options-value.n; +} else if (!strcmp(options-name, storage_grow_unit)) { +header-storage_grow_unit = options-value.n; +} else if (!strcmp(options-name, add_storage_cmd) + options-value.s) { +pstrcpy(header-add_storage_cmd, sizeof(header-add_storage_cmd), +options-value.s); +} else if (!strcmp(options-name, num_prefetch_slots) + options-value.n 0) { +num_prefetch_slots = options-value.n; +} else 
if (!strcmp(options-name, bytes_per_prefetch) + options-value.n 0) { +bytes_per_prefetch = options-value.n; +} else if
[Qemu-devel] [PATCH 19/26] FVD: add support for aio_cancel
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds the support for aio_cancel into FVD. FVD faithfully cleans up all resources upon aio_cancel. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-journal-buf.c | 16 +++ block/fvd-load.c| 24 + block/fvd-misc.c| 67 +++ block/fvd-read.c| 37 ++ block/fvd-store.c | 31 + block/fvd-write.c | 23 +++- block/fvd.c | 25 + 7 files changed, 222 insertions(+), 1 deletions(-) diff --git a/block/fvd-journal-buf.c b/block/fvd-journal-buf.c index e99a585..c6b60f9 100644 --- a/block/fvd-journal-buf.c +++ b/block/fvd-journal-buf.c @@ -360,6 +360,22 @@ use_current_buf: return s-bjnl.buf; } +static void fvd_aio_cancel_bjnl_flush(FvdAIOCB * acb) +{ +BlockDriverState *bs = acb-common.bs; +BDRVFvdState *s = bs-opaque; +QTAILQ_REMOVE(s-bjnl.queued_bufs, acb, jcb.bjnl_next_queued_buf); +my_qemu_aio_release(acb); +} + +static void fvd_aio_cancel_bjnl_buf_write(FvdAIOCB * acb) +{ +/* OP_BJNL_BUF_WRITE is never exposed to any external entity, and this + * should not be invoked. Internal cancellation of OP_BJNL_BUF_WRITE + * is handled by bjnl_sync_flush(). 
*/ +abort(); +} + static void bjnl_clean_buf_timer_cb(BlockDriverState * bs) { BDRVFvdState *s = bs-opaque; diff --git a/block/fvd-load.c b/block/fvd-load.c index 88e5fb4..9789cc5 100644 --- a/block/fvd-load.c +++ b/block/fvd-load.c @@ -188,6 +188,30 @@ static inline FvdAIOCB *init_load_acb(FvdAIOCB * parent_acb, return acb; } +static void fvd_aio_cancel_wrapper(FvdAIOCB * acb) +{ +qemu_bh_cancel(acb-wrapper.bh); +qemu_bh_delete(acb-wrapper.bh); +my_qemu_aio_release(acb); +} + +static void fvd_aio_cancel_load_compact(FvdAIOCB * acb) +{ +if (acb-load.children) { +int i; +for (i = 0; i acb-load.num_children; i++) { +if (acb-load.children[i].hd_acb) { +bdrv_aio_cancel(acb-load.children[i].hd_acb); +} +} +my_qemu_free(acb-load.children); +} +if (acb-load.one_child.hd_acb) { +bdrv_aio_cancel(acb-load.one_child.hd_acb); +} +my_qemu_aio_release(acb); +} + static inline int load_create_one_child(bool count_only, bool empty, QEMUIOVector * orig_qiov, int *iov_index, size_t *iov_left, uint8_t **iov_buf, int64_t start_sec, int sectors_in_region, diff --git a/block/fvd-misc.c b/block/fvd-misc.c index f4e1038..a42bfac 100644 --- a/block/fvd-misc.c +++ b/block/fvd-misc.c @@ -11,6 +11,73 @@ * */ +static void fvd_aio_cancel_bjnl_buf_write(FvdAIOCB * acb); +static void fvd_aio_cancel_bjnl_flush(FvdAIOCB * acb); +static void fvd_aio_cancel_read(FvdAIOCB * acb); +static void fvd_aio_cancel_write(FvdAIOCB * acb); +static void fvd_aio_cancel_copy(FvdAIOCB * acb); +static void fvd_aio_cancel_load_compact(FvdAIOCB * acb); +static void fvd_aio_cancel_store_compact(FvdAIOCB * acb); +static void fvd_aio_cancel_wrapper(FvdAIOCB * acb); +static void flush_metadata_to_disk_on_exit (BlockDriverState *bs); + +static void fvd_aio_cancel_flush(FvdAIOCB * acb) +{ +if (acb-flush.data_acb) { +bdrv_aio_cancel(acb-flush.data_acb); +} +if (acb-flush.metadata_acb) { +bdrv_aio_cancel(acb-flush.metadata_acb); +} +my_qemu_aio_release(acb); +} + +static void fvd_aio_cancel(BlockDriverAIOCB * blockacb) +{ 
+FvdAIOCB *acb = container_of(blockacb, FvdAIOCB, common); + +QDEBUG(CANCEL: acb%llu-%p\n, acb-uuid, acb); +acb-cancel_in_progress = true; + +switch (acb-type) { +case OP_READ: +fvd_aio_cancel_read(acb); +break; + +case OP_WRITE: +fvd_aio_cancel_write(acb); +break; + +case OP_COPY: +fvd_aio_cancel_copy(acb); +break; + +case OP_LOAD_COMPACT: +fvd_aio_cancel_load_compact(acb); +break; + +case OP_STORE_COMPACT: +fvd_aio_cancel_store_compact(acb); +break; + +case OP_WRAPPER: +fvd_aio_cancel_wrapper(acb); +break; + +case OP_FLUSH: +fvd_aio_cancel_flush(acb); +break; + +case OP_BJNL_BUF_WRITE: +fvd_aio_cancel_bjnl_buf_write(acb); +break; + +case OP_BJNL_FLUSH: +fvd_aio_cancel_bjnl_flush(acb); +break; +} +} + static void fvd_close(BlockDriverState * bs) { } diff --git a/block/fvd-read.c b/block/fvd-read.c index 675af9e..b18fdf2 100644 --- a/block/fvd-read.c +++ b/block/fvd-read.c @@ -502,3 +502,40 @@ static inline void calc_read_region(BDRVFvdState * s, int64_t sector_num, *p_first_sec_in_backing = first_sec_in_backing; *p_last_sec_in_backing = last_sec_in_backing; } + +static void fvd_aio_cancel_read(FvdAIOCB * acb) +{ +if
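The cancel entry point in the patch above recovers the driver-private request from the generic one with container_of() and then dispatches on the request type. A minimal standalone sketch of that pattern follows. The type and function names (CommonAIOCB, DriverAIOCB, driver_aio_cancel and the cancel_* helpers) are hypothetical stand-ins, not FVD or QEMU definitions, and the helpers only record that they ran instead of releasing real resources.

```c
#include <assert.h>
#include <stddef.h>

/* Recover a pointer to the enclosing struct from a pointer to one member. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

typedef enum { OP_READ, OP_WRITE, OP_FLUSH } OpType;

/* Hypothetical generic request, as seen by the block layer. */
typedef struct {
    int unused;
} CommonAIOCB;

/* Hypothetical driver-private request embedding the generic part. */
typedef struct {
    OpType type;
    int cancel_in_progress;
    int cancel_path;        /* records which helper ran (illustration only) */
    CommonAIOCB common;
} DriverAIOCB;

static void cancel_read(DriverAIOCB *acb)  { acb->cancel_path = 1; }
static void cancel_write(DriverAIOCB *acb) { acb->cancel_path = 2; }
static void cancel_flush(DriverAIOCB *acb) { acb->cancel_path = 3; }

/* The block layer only knows about CommonAIOCB; the driver maps it back. */
static void driver_aio_cancel(CommonAIOCB *common)
{
    DriverAIOCB *acb;

    assert(common != NULL);
    acb = container_of(common, DriverAIOCB, common);
    acb->cancel_in_progress = 1;

    switch (acb->type) {
    case OP_READ:
        cancel_read(acb);
        break;
    case OP_WRITE:
        cancel_write(acb);
        break;
    case OP_FLUSH:
        cancel_flush(acb);
        break;
    }
}
```

Keeping one helper per request type means each helper only has to know which child requests and buffers that particular type can own.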
Re: [Qemu-devel] [PATCH] vnc: fix a memory leak in threaded vnc server
On 02/25/2011 03:54 PM, Corentin Chary wrote:

VncJobQueue's buffer is intended to be used as the output buffer for all operations in this queue. Unfortunately, although vnc_async_encoding_start() is in charge of setting this buffer as the current output buffer, vnc_async_encoding_end() was not writing the changes back to VncJobQueue, resulting in a big and ugly memleak.

Signed-off-by: Corentin Chary corenti...@iksaif.net

Applied. Thanks.

Regards,
Anthony Liguori

---
I believe this is a (slightly) better patch than Bruce's one, because it reduces memory allocations by always using the same buffer.

 ui/vnc-jobs-async.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/ui/vnc-jobs-async.c b/ui/vnc-jobs-async.c
index 1d4c5e7..f596247 100644
--- a/ui/vnc-jobs-async.c
+++ b/ui/vnc-jobs-async.c
@@ -186,6 +186,8 @@ static void vnc_async_encoding_end(VncState *orig, VncState *local)
     orig->hextile = local->hextile;
     orig->zrle = local->zrle;
     orig->lossy_rect = local->lossy_rect;
+
+    queue->buffer = local->output;
 }

 static int vnc_worker_thread_loop(VncJobQueue *queue)
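The leak described in this patch comes from copying a growable buffer by value and never copying it back. The following standalone sketch illustrates that pattern under simplified assumptions. Buffer, Worker, JobQueue and the encoding_start()/encoding_end() helpers are hypothetical stand-ins for the QEMU types and functions, not their real definitions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer, loosely modelled on the one in the patch. */
typedef struct {
    size_t capacity;
    size_t offset;
    unsigned char *data;
} Buffer;

typedef struct { Buffer output; } Worker;    /* stand-in for the local VncState */
typedef struct { Buffer buffer; } JobQueue;  /* stand-in for VncJobQueue */

/* Append bytes, growing the allocation with realloc() when needed. */
static void buffer_append(Buffer *b, const void *src, size_t len)
{
    if (b->capacity - b->offset < len) {
        size_t cap = b->offset + len;
        unsigned char *p = realloc(b->data, cap);
        assert(p != NULL);
        b->data = p;
        b->capacity = cap;
    }
    memcpy(b->data + b->offset, src, len);
    b->offset += len;
}

/* Start of a job: the worker borrows the queue's buffer by struct copy. */
static void encoding_start(JobQueue *q, Worker *w)
{
    w->output = q->buffer;
    w->output.offset = 0;
}

/*
 * End of a job: hand the buffer back. Without this assignment the queue
 * keeps its old data pointer and capacity, so any memory realloc()ed
 * during the job is lost and the next job starts from the stale copy.
 */
static void encoding_end(JobQueue *q, Worker *w)
{
    q->buffer = w->output;
}
```

With the write-back in place, the second job reuses the allocation made by the first one instead of growing a fresh one each time.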
[Qemu-devel] [PATCH 02/26] FVD: extend qemu-io to do fully automated testing
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch extends qemu-io in two ways. First, it adds the 'sim' command to work with the simulated block device driver 'blksim', which allows a developer to fully control the order of disk I/Os, the order of callbacks, and the return code of every I/O operation. Second, it adds a fully automated testing mode, 'qemu-io --auto'. This mode can, e.g., simulate 1,000 threads concurrently submitting overlapping disk I/O requests to QEMU block drivers, use blksim to inject I/O errors and race conditions, and automatically verify the correctness of I/O results. This tool can run unattended to exercise an unlimited number of randomized test cases. Once it finds a bug, the bug is precisely repeatable with the help of blksim, even if it is a rare race condition bug. This makes debugging much easier. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- qemu-io-auto.c | 947 qemu-io-sim.c | 127 qemu-io.c | 50 +++- qemu-tool.c| 107 ++- 4 files changed, 1209 insertions(+), 22 deletions(-) create mode 100644 qemu-io-auto.c create mode 100644 qemu-io-sim.c diff --git a/qemu-io-auto.c b/qemu-io-auto.c new file mode 100644 index 000..73d79c7 --- /dev/null +++ b/qemu-io-auto.c @@ -0,0 +1,947 @@ +/* + * Extension of qemu-io to perform automated random tests + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +/*= + * This module implements a fully automated testing tool for block device + * drivers. It works with block/blksim.c to test race conditions by + * randomizing event timing. It is recommended to perform automated testing + * on a ramdisk or tmpfs, which stores files in memory and avoids wearing out + * the disk. Below is one example of using qemu-io to perform a fully + * automated testing. 
+ +# mount -t tmpfs none /var/tmpfs -o size=4G +# dd if=/dev/zero of=/var/tmpfs/truth.raw count=0 bs=1 seek=1G +# dd if=/dev/zero of=/var/tmpfs/zero-500M.raw count=0 bs=1 seek=500M +# qemu-img create -f qcow2 -obacking_fmt=blksim -b /var/tmpfs/zero-500M.raw \ +/var/tmpfs/test.qcow2 1G +# qemu-io --auto --seed=1 --truth=/var/tmpfs/truth.raw --format=qcow2 \ +--test=blksim:/var/tmpfs/test.qcow2 --verify_write=true \ +--compare_before=false --compare_after=true --round=10 \ +--parallel=1000 --io_size=10485760 --fail_prob=0 --cancel_prob=0 \ +--instant_qemubh=true + *= + */ + +#include qemu-timer.h +#include qemu-common.h +#include block_int.h +#include block/blksim.h + +#if 1 +# define QDEBUG(format,...) do {} while (0) +#else +# define QDEBUG printf +#endif + +#define die(format,...) \ +do { \ +fprintf (stderr, %s:%d --- , __FILE__, __LINE__); \ +fprintf (stderr, format, ##__VA_ARGS__); \ +abort(); \ +} while(0) + +typedef enum { OP_NULL = 0, OP_READ, OP_WRITE, OP_FLUSH, +OP_AIO_FLUSH } op_type_t; +const char *op_type_str[] = { NULL, READ, WRITE, FLUSH, AIO_FLUSH}; + +typedef struct CompareFullCB +{ +QEMUIOVector qiov; +struct iovec iov; +int64_t sector_num; +int nb_sectors; +int max_nb_sectors; +uint8_t *truth_buf; +} CompareFullCB; + +typedef struct RandomIO +{ +QEMUIOVector qiov; +int64_t sector_num; +int nb_sectors; +uint8_t *truth_buf; +uint8_t *test_buf; +op_type_t type; +int tester; +int64_t uuid; +int allow_cancel; +BlockDriverAIOCB *acb; +} RandomIO; + +static int fd; +static int64_t total_sectors; +static int64_t io_size = 262144; +static bool verify_write = false; +static int parallel = 1; +static int max_iov = 10; +static int64_t round = 10; +static int64_t finished_round = 0; +static RandomIO *testers = NULL; +static double fail_prob = 0; +static double cancel_prob = 0; +static double aio_flush_prob = 0; +static double flush_prob = 0; +static int64_t rand_time = 1000; +static int64_t test_uuid = 0; +static int finished_testers = 0; + +static void 
rand_io_cb(void *opaque, int ret); +static void perform_next_io(RandomIO * r); + +static void auto_test_usage(void) +{ +printf(%s --auto [--help]\n + \t[--truth=truth_img]\n + \t[--test=img_to_test]\n + \t[--seed=#d]\n + \t[--format=test_img_fmt]\n + \t[--round=#d]\n + \t[--instant_qemubh=true|false]\n + \t[--fail_prob=#f]\n + \t[--cancel_prob=#f]\n + \t[--aio_flush_prob=#f]\n + \t[--flush_prob=#f]\n + \t[--io_size=#d]\n + \t[--verify_write=[true|false]]\n + \t[--parallel=[#d]\n +
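The commit message above relies on two ideas: every random choice is derived from a seed, so a failing run can be repeated exactly, and each write is mirrored to a "truth" image that is compared with the image under test. A minimal sketch of both ideas follows, using in-memory arrays instead of disk images. All names here (Rng, rng_next, test_write, run_random_test, DISK_SIZE) are hypothetical and are not taken from qemu-io-auto.c.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DISK_SIZE 4096

/* Small deterministic generator: the same seed gives the same sequence. */
typedef struct {
    uint32_t state;
} Rng;

static uint32_t rng_next(Rng *r)
{
    r->state = r->state * 1664525u + 1013904223u;   /* linear congruential step */
    return r->state >> 8;
}

/* Hypothetical "driver under test". Here it is a plain memcpy(). */
static void test_write(uint8_t *disk, size_t off, const uint8_t *src, size_t len)
{
    memcpy(disk + off, src, len);
}

/*
 * Issue `rounds` random writes to both images and compare them afterwards.
 * Returns 0 when the image under test matches the truth image.
 */
static int run_random_test(uint32_t seed, int rounds,
                           uint8_t *truth, uint8_t *test)
{
    Rng r = { seed };
    uint8_t data[64];
    int i;

    for (i = 0; i < rounds; i++) {
        size_t len = 1 + rng_next(&r) % sizeof(data);
        size_t off = rng_next(&r) % (DISK_SIZE - len + 1);

        memset(data, (int)(rng_next(&r) & 0xff), len);
        memcpy(truth + off, data, len);       /* reference result */
        test_write(test, off, data, len);     /* code being exercised */
    }
    return memcmp(truth, test, DISK_SIZE);
}
```

Running the function twice with the same seed produces the same sequence of writes, which is the property that makes a rare failure repeatable.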
[Qemu-devel] [PATCH 06/26] FVD: skeleton of Fast Virtual Disk
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds the skeleton of the block device driver for Fast Virtual Disk (FVD). Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- Makefile.objs |2 +- block/fvd-create.c | 21 +++ block/fvd-flush.c | 24 +++ block/fvd-misc.c | 37 +++ block/fvd-open.c | 17 + block/fvd-read.c | 21 +++ block/fvd-update.c | 21 +++ block/fvd-write.c | 21 +++ block/fvd.c| 60 ++ block/fvd.h| 171 10 files changed, 394 insertions(+), 1 deletions(-) create mode 100644 block/fvd-create.c create mode 100644 block/fvd-flush.c create mode 100644 block/fvd-misc.c create mode 100644 block/fvd-open.c create mode 100644 block/fvd-read.c create mode 100644 block/fvd-update.c create mode 100644 block/fvd-write.c create mode 100644 block/fvd.c create mode 100644 block/fvd.h diff --git a/Makefile.objs b/Makefile.objs index 264aab3..9185d3e 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -23,7 +23,7 @@ block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o block-nested-y += qed-check.o block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o -block-nested-y += blksim.o +block-nested-y += blksim.o fvd.o block-nested-$(CONFIG_WIN32) += raw-win32.o block-nested-$(CONFIG_POSIX) += raw-posix.o block-nested-$(CONFIG_CURL) += curl.o diff --git a/block/fvd-create.c b/block/fvd-create.c new file mode 100644 index 000..5593cea --- /dev/null +++ b/block/fvd-create.c @@ -0,0 +1,21 @@ +/* + * QEMU Fast Virtual Disk Format bdrv_create() + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. 
+ * + */ + +static int fvd_create(const char *filename, QEMUOptionParameter * options) +{ +return -ENOTSUP; +} + +static QEMUOptionParameter fvd_create_options[] = { +{NULL} +}; diff --git a/block/fvd-flush.c b/block/fvd-flush.c new file mode 100644 index 000..34bd5cb --- /dev/null +++ b/block/fvd-flush.c @@ -0,0 +1,24 @@ +/* + * QEMU Fast Virtual Disk Format bdrv_flush() and bdrv_aio_flush() + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +static BlockDriverAIOCB *fvd_aio_flush(BlockDriverState * bs, + BlockDriverCompletionFunc * cb, + void *opaque) +{ +return NULL; +} + +static int fvd_flush(BlockDriverState * bs) +{ +return -ENOTSUP; +} diff --git a/block/fvd-misc.c b/block/fvd-misc.c new file mode 100644 index 000..f4e1038 --- /dev/null +++ b/block/fvd-misc.c @@ -0,0 +1,37 @@ +/* + * QEMU Fast Virtual Disk Format Misc Functions of BlockDriver Interface + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +static void fvd_close(BlockDriverState * bs) +{ +} + +static int fvd_probe(const uint8_t * buf, int buf_size, const char *filename) +{ +return 0; +} + +static int fvd_is_allocated(BlockDriverState * bs, int64_t sector_num, +int nb_sectors, int *pnum) +{ +return 0; +} + +static int fvd_get_info(BlockDriverState * bs, BlockDriverInfo * bdi) +{ +return -ENOTSUP; +} + +static int fvd_has_zero_init(BlockDriverState * bs) +{ +return 0; +} diff --git a/block/fvd-open.c b/block/fvd-open.c new file mode 100644 index 000..056b994 --- /dev/null +++ b/block/fvd-open.c @@ -0,0 +1,17 @@ +/* + * QEMU Fast Virtual Disk Format bdrv_file_open() + * + * Copyright IBM, Corp. 
2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +static int fvd_open(BlockDriverState * bs, const char *filename, int flags) +{ +return -ENOTSUP; +} diff --git a/block/fvd-read.c b/block/fvd-read.c new file mode 100644 index 000..b9f3ac9 --- /dev/null +++ b/block/fvd-read.c @@ -0,0 +1,21 @@ +/* + * QEMU Fast Virtual Disk Format bdrv_aio_readv() + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +static BlockDriverAIOCB *fvd_aio_readv(BlockDriverState * bs, +
[Qemu-devel] [PATCH 12/26] FVD: add impl of interface bdrv_aio_readv()
This patch is part of the Fast Virtual Disk (FVD) proposal. See http://wiki.qemu.org/Features/FVD. This patch adds FVD's implementation of the bdrv_aio_readv() interface. It supports read and copy-on-read in FVD. Signed-off-by: Chunqiang Tang ct...@us.ibm.com --- block/fvd-bitmap.c | 88 ++ block/fvd-load.c | 20 +++ block/fvd-read.c | 484 +++- block/fvd-utils.c | 44 + block/fvd.c|2 + 5 files changed, 637 insertions(+), 1 deletions(-) create mode 100644 block/fvd-load.c create mode 100644 block/fvd-utils.c diff --git a/block/fvd-bitmap.c b/block/fvd-bitmap.c index 7e96201..30e4a4b 100644 --- a/block/fvd-bitmap.c +++ b/block/fvd-bitmap.c @@ -148,3 +148,91 @@ static bool update_fresh_bitmap_and_check_stale_bitmap(FvdAIOCB * acb) return need_update; } + +/* Return true if a valid region is found. */ +static bool find_region_in_base_img(BDRVFvdState * s, int64_t * from, +int64_t * to) +{ +int64_t sec = *from; +int64_t region_end = *to; + +if (region_end s-base_img_sectors) { +region_end = s-base_img_sectors; +} + +check_next_region: +if (sec = region_end) { +return false; +} + +if (!fresh_bitmap_show_sector_in_base_img(sec, s)) { +/* Find the first sector in the base image. */ + +sec = ROUND_UP(sec + 1, s-block_size); /* Begin of next block. */ +while (1) { +if (sec = region_end) { +return false; +} +if (fresh_bitmap_show_sector_in_base_img(sec, s)) { +break; +} +sec += s-block_size; /* Begin of the next block. */ +} +} + +/* Find the end of the region in the base image. */ +int64_t first_sec = sec; +sec = ROUND_UP(sec + 1, s-block_size); /* Begin of next block. */ +while (1) { +if (sec = region_end) { +sec = region_end; +break; +} +if (!fresh_bitmap_show_sector_in_base_img(sec, s)) { +break; +} +sec += s-block_size; /* Begin of the next block. */ +} +int64_t last_sec = sec; + +/* Check conflicting writes. 
*/ +FvdAIOCB *old; +QLIST_FOREACH(old, s-write_locks, write.next_write_lock) { +int64_t old_begin = ROUND_DOWN(old-sector_num, s-block_size); +int64_t old_end = old-sector_num + old-nb_sectors; +old_end = ROUND_UP(old_end, s-block_size); +if (old_begin = first_sec first_sec old_end) { +first_sec = old_end; +} +if (old_begin last_sec last_sec = old_end) { +last_sec = old_begin; +} +} + +if (first_sec = last_sec) { +/* The region in [first_sec, sec) is fully covered. */ +goto check_next_region; +} + +/* This loop cannot be merged with the loop above. Otherwise, the logic + * would be incorrect. This loop covers the case that an old request + * spans over a subset of the region being checked. */ +QLIST_FOREACH(old, s-write_locks, write.next_write_lock) { +int64_t old_begin = ROUND_DOWN(old-sector_num, s-block_size); +if (first_sec = old_begin old_begin last_sec) { +last_sec = old_begin; +} +} + +if (first_sec = last_sec) { +/* The region in [first_sec, sec) is fully covered. */ +goto check_next_region; +} + +ASSERT(first_sec % s-block_size == 0 (last_sec % s-block_size == 0 || + last_sec == s-base_img_sectors)); + +*from = first_sec; +*to = last_sec; +return true; +} diff --git a/block/fvd-load.c b/block/fvd-load.c new file mode 100644 index 000..80ab32c --- /dev/null +++ b/block/fvd-load.c @@ -0,0 +1,20 @@ +/* + * QEMU Fast Virtual Disk Format Load Data from Compact Image + * + * Copyright IBM, Corp. 2010 + * + * Authors: + *Chunqiang Tang ct...@us.ibm.com + * + * This work is licensed under the terms of the GNU GPL, version 2 or later. + * See the COPYING.LIB file in the top-level directory. 
+ * + */ + +static inline BlockDriverAIOCB *load_data(FvdAIOCB * parent_acb, +BlockDriverState * bs, int64_t sector_num, +QEMUIOVector * orig_qiov, int nb_sectors, +BlockDriverCompletionFunc * cb, void *opaque) +{ +return NULL; +} diff --git a/block/fvd-read.c b/block/fvd-read.c index b9f3ac9..cd041e5 100644 --- a/block/fvd-read.c +++ b/block/fvd-read.c @@ -11,11 +11,493 @@ * */ +static void read_backing_for_copy_on_read_cb(void *opaque, int ret); +static void read_fvd_cb(void *opaque, int ret); +static inline void calc_read_region(BDRVFvdState * s, int64_t sector_num, +int nb_sectors, int64_t * p_first_sec_in_fvd, +int64_t * p_last_sec_in_fvd, +int64_t * p_first_sec_in_backing, +int64_t * p_last_sec_in_backing); +static
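find_region_in_base_img() in the patch above narrows a candidate sector range so that it does not overlap any block held by an in-flight write. The sketch below shows that trimming step in isolation. The comparison operators were lost in this archive copy of the patch, so the conditions here are an assumption about the intended logic, and WriteLock and trim_region() are hypothetical names rather than FVD code.

```c
#include <assert.h>
#include <stdint.h>

#define ROUND_UP(x, base)   ((((x) + (base) - 1) / (base)) * (base))
#define ROUND_DOWN(x, base) (((x) / (base)) * (base))

/* Hypothetical record of an in-flight write, in sectors. */
typedef struct {
    int64_t sector_num;
    int64_t nb_sectors;
} WriteLock;

/*
 * Shrink the half-open range [*first, *last) so that it does not overlap
 * the block-aligned extent of any lock. Returns 1 if a non-empty range
 * remains, 0 if the range is fully covered.
 */
static int trim_region(int64_t *first, int64_t *last,
                       const WriteLock *locks, int n, int64_t block_size)
{
    int i;

    /* Pass 1: a lock overlapping either end pushes that end inward. */
    for (i = 0; i < n; i++) {
        int64_t begin = ROUND_DOWN(locks[i].sector_num, block_size);
        int64_t end = ROUND_UP(locks[i].sector_num + locks[i].nb_sectors,
                               block_size);

        if (begin <= *first && *first < end) {
            *first = end;
        }
        if (begin < *last && *last <= end) {
            *last = begin;
        }
    }
    if (*first >= *last) {
        return 0;
    }

    /* Pass 2: a lock starting inside the range cuts the range short. */
    for (i = 0; i < n; i++) {
        int64_t begin = ROUND_DOWN(locks[i].sector_num, block_size);

        if (*first <= begin && begin < *last) {
            *last = begin;
        }
    }
    return *first < *last;
}
```

The second pass runs separately, as in the patch, because it depends on the final value of the range start produced by the first pass.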
[Qemu-devel] [PATCH v3] rtl8139: add vlan support
I posted v2 of these patches back in November:
http://article.gmane.org/gmane.comp.emulators.qemu/84252

Changes since v2:

insertion:
* moved insertion later in the process, to handle tso
* use qemu_sendv_packet() to insert the tag for us
* added dot1q_buf parameter to rtl8139_do_receive() to avoid some memcpy() in loopback mode. Note that the code path through that function is unchanged when dot1q_buf is NULL.

extraction:
* reduced the amount of copying by moving the "frame too short" logic after the removal of the vlan tag (as is done in e1000.c for example). Unfortunately, that logic can no longer be shared between C+ and C mode.

I've tested on the following combinations of guest and hosts:
host: x86_64, guest: x86_64
host: x86_64, guest: ppc32
host: ppc32, guest: ppc32

Testing on the x86_64 host used '-net tap' and consisted of:
* making an http transfer on the untagged interface.
* ping -s 0-1472 to another host on a vlan.
* making an scp upload to another host on a vlan.

Testing on the ppc32 host used '-net socket' connected to an x86_64 qemu-kvm running the virtio nic and consisted of:
* establishing an ssh connection between the two using an untagged interface.
* ping -s 0-1472 to the ppc32 using a vlan.
* making an scp transfer in both directions using a vlan.

All of that was successful. Nevertheless, it doesn't exercise all code paths, so care is in order.

Please note that the lack of vlan support in rtl8139 has taken a few people aback:
https://bugzilla.redhat.com/show_bug.cgi?id=516587
http://article.gmane.org/gmane.linux.network.general/14266

Thanks,
-Ben
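For readers unfamiliar with the frame layout that the insertion changes deal with, here is a small standalone sketch of 802.1Q tag insertion: the 4-byte tag (TPID 0x8100 followed by the 16-bit TCI) goes right after the two MAC addresses. The function name vlan_insert_tag() and its signature are hypothetical. The patch itself avoids this kind of copy by sending the pieces with qemu_sendv_packet().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN  6
#define VLAN_TPID     0x8100u   /* IEEE 802.1Q tag protocol identifier */
#define VLAN_TAG_LEN  4         /* 2 bytes TPID + 2 bytes TCI */

/*
 * Copy `frame` into `out` with an 802.1Q tag inserted after the
 * destination and source MAC addresses. Returns the tagged length, or 0
 * if the input is too short or the output buffer is too small.
 */
static size_t vlan_insert_tag(const uint8_t *frame, size_t len, uint16_t tci,
                              uint8_t *out, size_t out_cap)
{
    const size_t addrs = 2 * ETH_ADDR_LEN;

    assert(frame != NULL && out != NULL);
    if (len < addrs || out_cap < len + VLAN_TAG_LEN) {
        return 0;
    }
    memcpy(out, frame, addrs);                     /* dst + src MAC */
    out[addrs + 0] = (uint8_t)(VLAN_TPID >> 8);    /* tag in network order */
    out[addrs + 1] = (uint8_t)(VLAN_TPID & 0xff);
    out[addrs + 2] = (uint8_t)(tci >> 8);
    out[addrs + 3] = (uint8_t)(tci & 0xff);
    memcpy(out + addrs + VLAN_TAG_LEN, frame + addrs, len - addrs);
    return len + VLAN_TAG_LEN;
}
```

The original ethertype and payload follow the tag unchanged, which is why a receiver that does not strip the tag sees ethertype 0x8100 first.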
[Qemu-devel] [PATCH v3 1/2] rtl8139: add vlan tag insertion
Add support to the emulated hardware to insert vlan tags in packets going from the guest to the network. Signed-off-by: Benjamin Poirier benjamin.poir...@gmail.com Cc: Igor V. Kovalenko igor.v.kovale...@gmail.com Cc: Jason Wang jasow...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com --- hw/rtl8139.c | 123 +- 1 files changed, 96 insertions(+), 27 deletions(-) diff --git a/hw/rtl8139.c b/hw/rtl8139.c index a22530c..35ccd3d 100644 --- a/hw/rtl8139.c +++ b/hw/rtl8139.c @@ -47,6 +47,8 @@ * Darwin) */ +#include net/ethernet.h + #include hw.h #include pci.h #include qemu-timer.h @@ -68,6 +70,16 @@ #if defined(RTL8139_CALCULATE_RXCRC) /* For crc32 */ #include zlib.h + +static inline uLong rtl8139_crc32(uLong crc, const Bytef *buf, uInt len) +{ +return crc32(crc, buf, len); +} +#else +static inline uLong rtl8139_crc32(uLong crc, const Bytef *buf, uInt len) +{ +return 0; +} #endif #define SET_MASKED(input, mask, curr) \ @@ -77,6 +89,9 @@ #define MOD2(input, size) \ ( ( input ) ( size - 1 ) ) +#define VLAN_TCI_LEN 2 +#define VLAN_HDR_LEN (ETHER_TYPE_LEN + VLAN_TCI_LEN) + #if defined (DEBUG_RTL8139) # define DEBUG_PRINT(x) do { printf x ; } while (0) #else @@ -814,9 +829,11 @@ static int rtl8139_can_receive(VLANClientState *nc) } } -static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_t size_, int do_interrupt) +static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, +size_t buf_size, int do_interrupt, const uint8_t *dot1q_buf) { RTL8139State *s = DO_UPCAST(NICState, nc, nc)-opaque; +int size_ = buf_size + (dot1q_buf ? 
VLAN_HDR_LEN : 0); int size = size_; uint32_t packet_header = 0; @@ -935,7 +952,14 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ /* if too small buffer, then expand it */ if (size MIN_BUF_SIZE) { -memcpy(buf1, buf, size); +if (unlikely(dot1q_buf)) { +memcpy(buf1, buf, 2 * ETHER_ADDR_LEN); +memcpy(buf1 + 2 * ETHER_ADDR_LEN, dot1q_buf, VLAN_HDR_LEN); +memcpy(buf1 + 2 * ETHER_ADDR_LEN + VLAN_HDR_LEN, buf + 2 * +ETHER_ADDR_LEN, buf_size - 2 * ETHER_ADDR_LEN); +} else { +memcpy(buf1, buf, size); +} memset(buf1 + size, 0, MIN_BUF_SIZE - size); buf = buf1; size = MIN_BUF_SIZE; @@ -1022,7 +1046,21 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ target_phys_addr_t rx_addr = rtl8139_addr64(rxbufLO, rxbufHI); /* receive/copy to target memory */ -cpu_physical_memory_write( rx_addr, buf, size ); +if (unlikely(dot1q_buf)) { +cpu_physical_memory_write(rx_addr, buf, 2 * ETHER_ADDR_LEN); +val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN); +cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN, dot1q_buf, +VLAN_HDR_LEN); +val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN); +cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN + +VLAN_HDR_LEN, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 * +ETHER_ADDR_LEN); +val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 * +ETHER_ADDR_LEN); +} else { +cpu_physical_memory_write(rx_addr, buf, size); +val = rtl8139_crc32(0, buf, size); +} if (s-CpCmd CPlusRxChkSum) { @@ -1031,9 +1069,7 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ /* write checksum */ #if defined (RTL8139_CALCULATE_RXCRC) -val = cpu_to_le32(crc32(0, buf, size)); -#else -val = 0; +val = cpu_to_le32(val); #endif cpu_physical_memory_write( rx_addr+size, (uint8_t *)val, 4); @@ -1133,13 +1169,24 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, size_ rtl8139_write_buffer(s, (uint8_t *)val, 4); -rtl8139_write_buffer(s, buf, size); +/* 
receive/copy to target memory */ +if (unlikely(dot1q_buf)) { +rtl8139_write_buffer(s, buf, 2 * ETHER_ADDR_LEN); +val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN); +rtl8139_write_buffer(s, dot1q_buf, VLAN_HDR_LEN); +val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN); +rtl8139_write_buffer(s, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 * +ETHER_ADDR_LEN); +val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 * +ETHER_ADDR_LEN); +} else { +rtl8139_write_buffer(s, buf, size); +val = rtl8139_crc32(0, buf, size); +} /* write checksum */ #if defined (RTL8139_CALCULATE_RXCRC) -val = cpu_to_le32(crc32(0, buf, size)); -#else -
[Qemu-devel] [PATCH v3 2/2] rtl8139: add vlan tag extraction
Add support to the emulated hardware to extract vlan tags in packets going from the network to the guest. Signed-off-by: Benjamin Poirier benjamin.poir...@gmail.com Cc: Igor V. Kovalenko igor.v.kovale...@gmail.com Cc: Jason Wang jasow...@redhat.com Cc: Michael S. Tsirkin m...@redhat.com -- AFAIK, extraction is optional to get vlans working. The driver requests rx detagging but should not assume that it was done. Under Linux, the mac layer will catch the vlan ethertype. I only added this part for completeness (to emulate the hardware more truthfully.) --- hw/rtl8139.c | 89 +- 1 files changed, 63 insertions(+), 26 deletions(-) diff --git a/hw/rtl8139.c b/hw/rtl8139.c index 35ccd3d..f3aaebc 100644 --- a/hw/rtl8139.c +++ b/hw/rtl8139.c @@ -835,10 +835,11 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, RTL8139State *s = DO_UPCAST(NICState, nc, nc)-opaque; int size_ = buf_size + (dot1q_buf ? VLAN_HDR_LEN : 0); int size = size_; +const uint8_t *next_part; +size_t next_part_size; uint32_t packet_header = 0; -uint8_t buf1[60]; static const uint8_t broadcast_macaddr[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff }; @@ -950,21 +951,6 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, } } -/* if too small buffer, then expand it */ -if (size MIN_BUF_SIZE) { -if (unlikely(dot1q_buf)) { -memcpy(buf1, buf, 2 * ETHER_ADDR_LEN); -memcpy(buf1 + 2 * ETHER_ADDR_LEN, dot1q_buf, VLAN_HDR_LEN); -memcpy(buf1 + 2 * ETHER_ADDR_LEN + VLAN_HDR_LEN, buf + 2 * -ETHER_ADDR_LEN, buf_size - 2 * ETHER_ADDR_LEN); -} else { -memcpy(buf1, buf, size); -} -memset(buf1 + size, 0, MIN_BUF_SIZE - size); -buf = buf1; -size = MIN_BUF_SIZE; -} - if (rtl8139_cp_receiver_enabled(s)) { DEBUG_PRINT((RTL8139: in C+ Rx mode \n)); @@ -1025,6 +1011,44 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, uint32_t rx_space = rxdw0 CP_RX_BUFFER_SIZE_MASK; +/* write VLAN info to descriptor variables */ +/* next_part starts right after the vlan 
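As a rough illustration of the receive-side order described in this patch (strip the tag first, then pad a frame that has become too short), here is a standalone sketch. vlan_extract_tag() and pad_short_frame() are hypothetical helpers that work in place on a flat buffer. The patch instead tracks a next_part pointer and writes the tag information into the C+ receive descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDR_LEN   6
#define VLAN_TAG_LEN   4     /* 2 bytes TPID + 2 bytes TCI */
#define MIN_FRAME_LEN  60    /* minimum Ethernet frame length without FCS */

/*
 * Remove an 802.1Q tag in place, if one is present. Stores the TCI in
 * *tci and sets *tagged accordingly. Returns the new frame length.
 */
static size_t vlan_extract_tag(uint8_t *frame, size_t len,
                               uint16_t *tci, int *tagged)
{
    const size_t addrs = 2 * ETH_ADDR_LEN;

    assert(frame != NULL && tci != NULL && tagged != NULL);
    *tagged = 0;
    if (len < addrs + VLAN_TAG_LEN ||
        frame[addrs] != 0x81 || frame[addrs + 1] != 0x00) {
        return len;                      /* untagged: leave the frame alone */
    }
    *tci = (uint16_t)((frame[addrs + 2] << 8) | frame[addrs + 3]);
    *tagged = 1;
    memmove(frame + addrs, frame + addrs + VLAN_TAG_LEN,
            len - addrs - VLAN_TAG_LEN);
    return len - VLAN_TAG_LEN;
}

/* Zero-pad a short frame up to the minimum length, if the buffer allows. */
static size_t pad_short_frame(uint8_t *frame, size_t len, size_t cap)
{
    if (len < MIN_FRAME_LEN && cap >= MIN_FRAME_LEN) {
        memset(frame + len, 0, MIN_FRAME_LEN - len);
        return MIN_FRAME_LEN;
    }
    return len;
}
```

Padding after extraction matters because a tagged frame of exactly the minimum length drops below it once the 4-byte tag is removed.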
header (if any), at the + * ethertype for the payload */ +next_part = buf[ETHER_ADDR_LEN * 2]; +if (s-CpCmd CPlusRxVLAN (dot1q_buf || be16_to_cpup((uint16_t *) +buf[ETHER_ADDR_LEN * 2]) == ETHERTYPE_VLAN)) { +if (!dot1q_buf) { +/* the tag is in the buffer */ +dot1q_buf = buf[ETHER_ADDR_LEN * 2]; +next_part += VLAN_HDR_LEN; +} +size -= VLAN_HDR_LEN; + +rxdw1 = ~CP_RX_VLAN_TAG_MASK; +/* BE + ~le_to_cpu()~ + cpu_to_le() = BE */ +rxdw1 |= CP_RX_TAVA | le16_to_cpup((uint16_t *) +buf[ETHER_HDR_LEN]); + +DEBUG_PRINT((RTL8139: C+ Rx mode : extracted vlan tag with tci: +%u\n, be16_to_cpup((uint16_t *) buf[ETHER_HDR_LEN]))); +} else { +/* reset VLAN tag flag */ +rxdw1 = ~CP_RX_TAVA; +} +next_part_size = buf + buf_size - next_part; + +/* if too small buffer, then expand it */ +if (size MIN_BUF_SIZE) { +size_t tmp_size = MIN_BUF_SIZE - ETHER_ADDR_LEN * 2; +uint8_t *tmp = alloca(tmp_size); + +memcpy(tmp, next_part, next_part_size); +memset(tmp + next_part_size, 0, tmp_size - next_part_size); +next_part = tmp; +next_part_size = tmp_size; +size = MIN_BUF_SIZE; +} + /* TODO: scatter the packet over available receive ring descriptors space */ if (size+4 rx_space) @@ -1049,14 +1073,11 @@ static ssize_t rtl8139_do_receive(VLANClientState *nc, const uint8_t *buf, if (unlikely(dot1q_buf)) { cpu_physical_memory_write(rx_addr, buf, 2 * ETHER_ADDR_LEN); val = rtl8139_crc32(0, buf, 2 * ETHER_ADDR_LEN); -cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN, dot1q_buf, -VLAN_HDR_LEN); val = rtl8139_crc32(val, dot1q_buf, VLAN_HDR_LEN); -cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN + -VLAN_HDR_LEN, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 * -ETHER_ADDR_LEN); -val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, buf_size - 2 * -ETHER_ADDR_LEN); +cpu_physical_memory_write(rx_addr + 2 * ETHER_ADDR_LEN, next_part, +next_part_size); +val = rtl8139_crc32(val, buf + 2 * ETHER_ADDR_LEN, +next_part_size); } else { cpu_physical_memory_write(rx_addr, buf, size); val = rtl8139_crc32(0, buf, size); 
@@ -1115,9 +1136,6 @@ static ssize_t
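For readers who want to see the tag extraction idea in isolation, here is a minimal standalone sketch. It is not the code from the patch and not QEMU's API: vlan_untag() and read_be16() are hypothetical helpers invented for this illustration. The constant names mirror those used in the diff, and their values are assumed here to be the standard Ethernet / 802.1Q ones. The sketch checks for the 0x8100 ethertype right after the two MAC addresses, reads the TCI, and copies the frame out without the 4-byte tag.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed values (standard Ethernet / 802.1Q sizes), not taken from QEMU. */
#define ETHER_ADDR_LEN 6
#define ETHER_HDR_LEN  14      /* dst MAC + src MAC + ethertype */
#define VLAN_HDR_LEN   4       /* TPID (2 bytes) + TCI (2 bytes) */
#define ETHERTYPE_VLAN 0x8100

/* Read a big-endian 16-bit value without assuming alignment. */
static uint16_t read_be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/*
 * Hypothetical helper: if the frame carries an 802.1Q tag, store its TCI
 * in *tci, set *tagged to 1 and copy the frame to out with the tag removed.
 * Otherwise set *tagged to 0 and copy the frame unchanged.
 * out must have room for len bytes. Returns the number of bytes written.
 */
static size_t vlan_untag(const uint8_t *frame, size_t len,
                         uint8_t *out, uint16_t *tci, int *tagged)
{
    *tagged = 0;

    if (len >= ETHER_HDR_LEN + VLAN_HDR_LEN &&
        read_be16(&frame[2 * ETHER_ADDR_LEN]) == ETHERTYPE_VLAN) {
        /* The TCI sits right after the TPID, at offset 14. */
        *tci = read_be16(&frame[ETHER_HDR_LEN]);
        *tagged = 1;

        /* Keep both MAC addresses, then skip the 4-byte tag. */
        memcpy(out, frame, 2 * ETHER_ADDR_LEN);
        memcpy(out + 2 * ETHER_ADDR_LEN,
               frame + 2 * ETHER_ADDR_LEN + VLAN_HDR_LEN,
               len - 2 * ETHER_ADDR_LEN - VLAN_HDR_LEN);
        return len - VLAN_HDR_LEN;
    }

    memcpy(out, frame, len);
    return len;
}
```

Unlike the patch, which stores the TCI in the receive descriptor and writes the remaining bytes straight to guest memory, this sketch copies into a caller-supplied buffer to stay self-contained.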