Re: [Qemu-devel] [PATCH 03/12] VMDK: probe for mono flat image
On Wed, Jun 15, 2011 at 6:45 AM, Fam Zheng famc...@gmail.com wrote: Have you tried removing that comment line to see if VMware still recognizes the file? Yes, it recognizes iff the first option line (non-comment, non-empty) is version=1 or version=2“. (An interesting thing that it is both space sensitive and case sensitive) Cool, nice job figuring this out. We should do the same. Stefan
Re: [Qemu-devel] [PATCH RFC] target-ppc: Correctly handle translation address when bus unit ID = 0x07F
On 14.06.2011, at 23:49, Andreas Färber wrote: Am 13.06.2011 um 15:31 schrieb Alexander Graf: On 13.06.2011, at 12:13, Andreas Färber wrote: +/* Memory forced */ +ctx-raddr = ((sr 0xF) 28) | (eaddr 0x0FFF); This is exactly the same as ctx-raddr = eaddr, no? No, not that I see. The manual is explicit about this calculation: the processor [...] generates a memory access with the physical address specified by the lowest-order four bits in the segment register (SR[28–31]) concatenated with LA4–LA31. (6-63 / 6.10.4) That matches figure 6-26 on the following page, and I've added a similar comment in v2. I see no compelling reason why the lower nibble of the SR should always match the SR#, the upper nibble of eaddr. Ah, sr is not the segment register number, but the content of the segement register. Then it's ok :). Alex
Re: [Qemu-devel] [PATCH] usb-ehci: move device/vendor/class id to qdev
On 06/14/11 19:40, Michael S. Tsirkin wrote: This is on top of my pci tree. Looks good. Wanna queue this up in your PCI tree, so all devices are switched over at once when this is merged? cheers, Gerd
Re: [Qemu-devel] High speed polling
On Tue, Jun 14, 2011 at 11:32 PM, Clay Andreasen c...@cray.com wrote: I have a network device simulation that I am connecting to multiple instances of Qemu (nodes) via a shared memory queue. It works pretty well as long as all of the nodes are initiating communication but when one node is passive, it must poll to get packets. So far the fastest I have been able to get it to poll is about every 2M emulated clocks. This is with CONFIG_HIGH_RES_TIMERS and CONFIG_NO_HZ on the host. I also set MIN_TIMER_REARM_NS in qemu-timer.c to 10. Is there some way to increase the polling rate by about an order of magnitude? Without more details it's hard to say what is going on: Running an x86 guest? Are you using ./configure --enable-io-thread? It sounds like you may not be using KVM? How many vcpus are running on the host in total compared to the number of logical CPUs on the host? You haven't given details on how you are polling in the guest. Are you running a polling loop in ring 0 or is the guest running a full-blown OS and polling from userspace? Why are you polling in the first place - to minimize latency? Stefan
Re: [Qemu-devel] [PATCH 01/13] spice: Use cpu_register_physical_memory_log for dirty log enabling
Hi, Note: The addtional vga_dirty_log_start for the primary interface looked stray. qxl_write_config enabled logging for all interfaces anyway. No. qxl_write_config is hooked for the primary only. case QXL_RAM_RANGE_INDEX: -cpu_register_physical_memory(addr, size, qxl-vga.vram_offset | IO_MEM_RAM); +cpu_register_physical_memory_log(addr, size, + qxl-vga.vram_offset | IO_MEM_RAM, + 0, true); if (qxl-id == 0) { cpu_register_physical_memory_log(...) } else { cpu_register_physical_memory() } Only the primary is vga compatible and thus needs dirty logging. cheers, Gerd
Re: [Qemu-devel] [PATCH] usb-ehci: move device/vendor/class id to qdev
On Wed, Jun 15, 2011 at 08:32:50AM +0200, Gerd Hoffmann wrote: On 06/14/11 19:40, Michael S. Tsirkin wrote: This is on top of my pci tree. Looks good. Wanna queue this up in your PCI tree, so all devices are switched over at once when this is merged? cheers, Gerd Guess so, yes. Thanks for the review. -- MST
[Qemu-devel] [PATCH v2 01/13] spice: Use cpu_register_physical_memory_log for dirty log enabling
Drop outdated dirty log disable/enable around PCI remapping and register the BAR for dirty logging via cpu_register_physical_memory_log. That allows to remove all vga_dirty_log_start/stop references from qxl. CC: Gerd Hoffmann kra...@redhat.com Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Changes in v2: - don't enable logging for secondary adapter hw/qxl.c |9 +++-- 1 files changed, 3 insertions(+), 6 deletions(-) diff --git a/hw/qxl.c b/hw/qxl.c index 1906e84..01149ae 100644 --- a/hw/qxl.c +++ b/hw/qxl.c @@ -619,12 +619,10 @@ static void qxl_write_config(PCIDevice *d, uint32_t address, PCIQXLDevice *qxl = DO_UPCAST(PCIQXLDevice, pci, d); VGACommonState *vga = qxl-vga; -vga_dirty_log_stop(vga); pci_default_write_config(d, address, val, len); if (vga-map_addr qxl-pci.io_regions[0].addr == -1) { vga-map_addr = 0; } -vga_dirty_log_start(vga); } static void qxl_check_state(PCIQXLDevice *d) @@ -1037,12 +1035,11 @@ static void qxl_map(PCIDevice *pci, int region_num, qxl-io_base = addr; break; case QXL_RAM_RANGE_INDEX: -cpu_register_physical_memory(addr, size, qxl-vga.vram_offset | IO_MEM_RAM); +cpu_register_physical_memory_log(addr, size, + qxl-vga.vram_offset | IO_MEM_RAM, + 0, qxl-id == 0); qxl-vga.map_addr = addr; qxl-vga.map_end = addr + size; -if (qxl-id == 0) { -vga_dirty_log_start(qxl-vga); -} break; case QXL_ROM_RANGE_INDEX: cpu_register_physical_memory(addr, size, qxl-rom_offset | IO_MEM_ROM); -- 1.7.1
Re: [Qemu-devel] [PATCH, v2] vga: Fix type of lfb/map_addr/end.
On 2011-06-14 21:53, Richard Henderson wrote: These addresses have been passed through pci_to_cpu_addr, and thus need to be full target_phys_addr_t. Signed-off-by: Richard Henderson r...@twiddle.net Cc: Jan Kiszka jan.kis...@siemens.com --- V1-V2: lfb_addr/end also widened to guest address width. hw/vga_int.h | 10 +- 1 files changed, 5 insertions(+), 5 deletions(-) diff --git a/hw/vga_int.h b/hw/vga_int.h index d2811bd..eee91a8 100644 --- a/hw/vga_int.h +++ b/hw/vga_int.h @@ -106,13 +106,13 @@ typedef void (* vga_update_retrace_info_fn)(struct VGACommonState *s); typedef struct VGACommonState { uint8_t *vram_ptr; ram_addr_t vram_offset; +target_phys_addr_t lfb_addr; +target_phys_addr_t lfb_end; +target_phys_addr_t map_addr; +target_phys_addr_t map_end; uint32_t vram_size; -uint32_t lfb_addr; -uint32_t lfb_end; -uint32_t map_addr; -uint32_t map_end; -uint32_t lfb_vram_mapped; /* whether 0xa is mapped as ram */ uint32_t latch; +uint32_t lfb_vram_mapped; /* whether 0xa is mapped as ram */ uint8_t sr_index; uint8_t sr[256]; uint8_t gr_index; Acked-by: Jan Kiszka jan.kis...@siemens.com But one of us will have to rebase at the end of the day. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote: Yes, please. OK, I'll do it as soon I'll find time for it. On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote: For image files the block layer should be caching the device capacity (size) anyway, so you probably don't need to allocate the array, just call bdrv_get_geometry(). That might make it easier to write a self-contained function. I've tried to not cache, but it turned out that bdrv_get_geometry() calls are quite noticeable in profiler without caching. -- Дмитрий Конищев (Dmitry Konishchev) mailto:konishc...@gmail.com
Re: [Qemu-devel] [PATCH] Fix signal handling of SIG_IPI when io-thread is enabled
On 2011-06-15 07:20, Alexandre Raymond wrote: Both the signal thread (via sigwait()) and the cpu thread (via a normal signal handler) were attempting to catch SIG_IPI. Why? Ahh, because of qemu_cpu_kick_self: raise(SIG_IPI)! That should generate a per-process SIG_IPI. And that may not only affect Darwin. Looks good. Acked-by: Jan Kiszka jan.kis...@siemens.com This resulted in random freezes under Darwin. This patch separates SIG_IPI from the rest of the signals handled by the signal thread, because it is independently caught by the cpu thread. Signed-off-by: Alexandre Raymond cerb...@gmail.com --- cpus.c | 10 +- 1 files changed, 9 insertions(+), 1 deletions(-) diff --git a/cpus.c b/cpus.c index 18a1522..84ffd1c 100644 --- a/cpus.c +++ b/cpus.c @@ -394,10 +394,18 @@ static int qemu_signal_init(void) sigaddset(set, SIGUSR2); pthread_sigmask(SIG_UNBLOCK, set, NULL); +/* + * SIG_IPI must be blocked in the main thread and must not be caught + * by sigwait() in the signal thread. Otherwise, the cpu thread will + * not catch it reliably. + */ +sigemptyset(set); +sigaddset(set, SIG_IPI); +pthread_sigmask(SIG_BLOCK, set, NULL); + sigemptyset(set); sigaddset(set, SIGIO); sigaddset(set, SIGALRM); -sigaddset(set, SIG_IPI); sigaddset(set, SIGBUS); #else sigemptyset(set); -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
[Qemu-devel] [PATCH v2] linux-user: Fix the computation of the requested heap size
There were several remaining bugs in the previous implementation of do_brk(): 1. the value of new_alloc_size was one page too large when the requested brk was aligned on a host page boundary. 2. no new pages should be (re-)allocated when the requested brk is in the range of the pages that were already allocated previsouly (for the same purpose). Technically these pages are never unmapped in the current implementation. The problem/fix can be reproduced/validated with the test-suite above: #include unistd.h /* syscall(2), */ #include sys/syscall.h /* SYS_brk, */ #include stdio.h/* puts(3), */ #include stdlib.h /* exit(3), EXIT_*, */ #include stdint.h /* uint*_t, */ #include sys/mman.h /* mmap(2), MAP_*, */ #include string.h /* memset(3), */ int main() { int exit_status = EXIT_SUCCESS; uint8_t *current_brk = 0; uint8_t *initial_brk; uint8_t *new_brk; uint8_t *old_brk; int failure = 0; int i; void test_brk(int increment, int expected_result) { new_brk = (uint8_t *)syscall(SYS_brk, current_brk + increment); if ((new_brk == current_brk) == expected_result) failure = 1; current_brk = (uint8_t *)syscall(SYS_brk, 0); } void test_result() { if (!failure) puts(OK); else { puts(failure); exit_status = EXIT_FAILURE; } } void test_title(const char *title) { failure = 0; printf(%-45s : , title); fflush(stdout); } test_title(Initialization); test_brk(0, 1); initial_brk = current_brk; test_result(); test_title(Don't overlap \brk\ pages); test_brk(HOST_PAGE_SIZE, 1); test_brk(HOST_PAGE_SIZE, 1); test_result(); /* Preparation for the test Re-allocated heap is initialized. */ old_brk = current_brk - HOST_PAGE_SIZE; memset(old_brk, 0xFF, HOST_PAGE_SIZE); test_title(Don't allocate the same \brk\ page twice); test_brk(-HOST_PAGE_SIZE, 1); test_brk(HOST_PAGE_SIZE, 1); test_result(); test_title(Re-allocated \brk\ pages are initialized); for (i = 0; i HOST_PAGE_SIZE; i++) { if (old_brk[i] != 0) { printf((index = %d, value = 0x%x) , i, old_brk[i]); failure = 1; break; } } test_result(); test_title(Don't allocate \brk\ pages over \mmap\ pages); new_brk = mmap(current_brk, HOST_PAGE_SIZE / 2, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0); if (new_brk == (void *) -1) puts(unknown); else { test_brk(HOST_PAGE_SIZE, 0); test_result(); } test_title(All \brk\ pages are writable (please wait)); if (munmap(current_brk, HOST_PAGE_SIZE / 2) != 0) puts(unknown); else { while (current_brk - initial_brk 2*1024*1024*1024UL) { old_brk = current_brk; test_brk(HOST_PAGE_SIZE, -1); if (old_brk == current_brk) break; for (i = 0; i HOST_PAGE_SIZE; i++) old_brk[i] = 0xAA; } puts(OK); } test_title(Maximum size of the heap 16MB); failure = (current_brk - initial_brk) 16*1024*1024; test_result(); exit(exit_status); } Changes introduced in patch v2: * extend the brk test-suite embedded within the commit message; * heap contents have to be initialized to zero, this bug was exposed by tst-calloc.c from the GNU C library; * don't [try to] allocate a new host page if the new brk is equal to the latest allocated host page (brk_page); and * print some debug information when DEBUGF_BRK is defined. Signed-off-by: Cédric VINCENT cedric.vinc...@st.com Reviewed-by: Christophe Guillon christophe.guil...@st.com Cc: Riku Voipio riku.voi...@iki.fi --- linux-user/syscall.c | 37 + 1 files changed, 29 insertions(+), 8 deletions(-) diff --git a/linux-user/syscall.c b/linux-user/syscall.c index b975730..608924a 100644 --- a/linux-user/syscall.c +++ b/linux-user/syscall.c @@ -709,29 +709,44 @@ char *target_strerror(int err) static abi_ulong target_brk; static abi_ulong target_original_brk; +static abi_ulong brk_page; void target_set_brk(abi_ulong new_brk) { target_original_brk = target_brk = HOST_PAGE_ALIGN(new_brk); +brk_page = HOST_PAGE_ALIGN(target_brk); } +//#define DEBUGF_BRK(message, args...) do { fprintf(stderr, (message), ## args); } while (0) +#define DEBUGF_BRK(message, args...) + /* do_brk() must return target values
Re: [Qemu-devel] [PATCH v2] target-ppc: Handle memory-forced I/O controller access
On 14.06.2011, at 23:27, Andreas Färber wrote: From: Hervé Poussineau hpous...@reactos.org On at least the PowerPC 601, a direct-store (T=1) with bus unit ID 0x07F is special-cased as memory-forced I/O controller access. It is supposed to be checked immediately if T=1, bypassing all protection mechanisms and acting cache-inhibited and global. Thanks, applied to ppc-next. Alex
Re: [Qemu-devel] [PATCH] xen: avoid tracking the region 0xa0000 - 0xbffff
On 14.06.2011, at 17:24, Stefano Stabellini wrote: On Tue, 14 Jun 2011, Alexander Graf wrote: On 14.06.2011, at 13:48, Stefano Stabellini wrote: On Tue, 14 Jun 2011, Alexander Graf wrote: On 03.06.2011, at 17:56, stefano.stabell...@eu.citrix.com stefano.stabell...@eu.citrix.com wrote: From: Stefano Stabellini stefano.stabell...@eu.citrix.com Xen can only do dirty bit tracking for one memory region, so we should explicitly avoid trying to track the legacy VGA region between 0xa and 0xb, rather than trying and failing. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com --- xen-all.c |4 1 files changed, 4 insertions(+), 0 deletions(-) diff --git a/xen-all.c b/xen-all.c index 9a5c3ec..1fdc2e8 100644 --- a/xen-all.c +++ b/xen-all.c @@ -218,6 +218,10 @@ static int xen_add_to_physmap(XenIOState *state, if (get_physmapping(state, start_addr, size)) { return 0; } +/* do not try to map legacy VGA memory */ +if (start_addr = 0xa start_addr + size = 0xb) { I don't quite like the hardcoded range here. What exactly is the issue? The fact that you can only map a single region? Then do a counter and fail when it's 1. That is what we were doing before: succeeding the first time and failing from the second time on. By coincidence the second time was the range 0xa-0xb so everything worked as expected, but it wasn't obvious why. I am just trying to make sure that one year from now it will be clear just looking at the code why it works. If you don't want to map the VGA region as memory slot, why not change the actual mapping code in the cirrus adapter? Because I didn't want to introduce any ugly if (xen_enable()) in generic code, if it is that simple to catch the issue from xen specific code. Well sure, but 2 years from now yet another region will be introduced that might even be registered before the FB and everyone's puzzled again :). How about you print a warning when anyone tries to map anything after the first map? Or - as Jan suggests - implement multiple regions. If you prefer, you could even check for the VGA range as known broken and only print warnings on others. I can do that (actually we already do it) but it wouldn't change the fact that if somebody modifies hw/cirrus_vga.c:map_linear_vram to call cpu_register_physical_memory_log for the legacy range first, it would break xen, that is the problem I was trying to solve. In order to make the code more reliable, and also catch the scenario where another region is registered before the framebuffer, we could do something like this, but it is not very pretty: diff --git a/xen-all.c b/xen-all.c index 9a5c3ec..de1e724 100644 --- a/xen-all.c +++ b/xen-all.c @@ -214,6 +214,7 @@ static int xen_add_to_physmap(XenIOState *state, unsigned long i = 0; int rc = 0; XenPhysmap *physmap = NULL; +RAMBlock *block; if (get_physmapping(state, start_addr, size)) { return 0; @@ -221,7 +222,16 @@ static int xen_add_to_physmap(XenIOState *state, if (size = 0) { return -1; } +/* only add the vga vram to physmap */ Please add a comment here, explaining that Xen can only handle a single dirty log region for now, and that we want the linear framebuffer to be that region. Also, please resend with proper patch headers and I'll pull it into the xen-next tree. Alex +QLIST_FOREACH(block, ram_list.blocks, next) { +if (!strcmp(block-idstr, vga.vram) block-offset == phys_offset + start_addr 0xb) { +goto go_physmap; +} +} +return -1; +go_physmap: DPRINTF(mapping vram to %llx - %llx, from %llx\n, start_addr, start_addr + size, phys_offset); for (i = 0; i size TARGET_PAGE_BITS; i++) { unsigned long idx = (phys_offset TARGET_PAGE_BITS) + i;
Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing
On 14.06.2011, at 15:27, Stefano Stabellini wrote: On Tue, 14 Jun 2011, Alexander Graf wrote: static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id) { PCII440FXState *d = opaque; @@ -267,8 +263,17 @@ static PCIBus *i440fx_common_init(const char *device_name, d = pci_create_simple(b, 0, device_name); *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d); -piix3 = DO_UPCAST(PIIX3State, dev, - pci_create_simple_multifunction(b, -1, true, PIIX3)); +if (xen_enabled()) { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(b, -1, true, PIIX3-xen)); +pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq, +piix3, XEN_PIIX_NUM_PIRQS); But with XEN_PIIX_NUM_PIRQS it's not a piix3 anymore, no? What's the reason behind this change? It is still a piix3, but also provides non-legacy interrupt links to the IO-APIC. The four pins of each PCI device on the bus not only are routed to the normal four pirqs (programmed writing to 0x60-0x63, see above) but also they are connected to the IO-APIC directly. These additional routes can only be discovered through ACPI, so you need matching ACPI tables. We used to build the old ACPI tables like this: /* PRTA: APIC routing table (via non-legacy IOAPIC GSIs). */ printf(Name(PRTA, Package() {\n); for ( dev = 1; dev 32; dev++ ) for ( intx = 0; intx 4; intx++ ) /* INTA-D */ printf(Package(){0x%04x, %u, 0, %u},\n, dev, intx, ((dev*4+dev/8+intx)31)+16); printf(})\n); Interesting concept, but completely non-standard and very much different from real hardware. Please at least add a comment there to show readers that Xen is doing a hack which is not at all related to how the PIIX really works. Isn't this more a function of the wires on the motherboard than the PIIX specifically? i.e. this just encodes the permutation of the wires from the PCI slots into the IO-APIC input pins (bypassing the PIIX, which is only used for legacy ISA IRQs i.e. by non-APIC aware OSes)? Interrupts with PCI work slightly different. PCI devices can map (themselves or by software) to one of 4 interrupt lines: INTA, INTB, INTC, INTD. These get converted using PCI host controller specific logic to 4 interrupt lines which then go into the IO-APIC. The IO-APIC is a chip with a limited number of pins. IIRC it was 24, could be 26 though. The number of redirection entries in the IOAPIC can be discovered reading from the IOAPICVER register and it is a property of a specific model of IOAPIC. As a matter of fact Xen's emulated IOAPIC supports more pins than the most popular IOAPIC used with PIIX3. which means you're emulating hardware that never existed :). I haven't seen a single case where PCI devices have a direct link to the IO-APIC. I also have not seen any PCI host controller that exports more than 4 interrupts. Giving each PCI device its own line, on top of that more than ever could be in real hardware, is a plain hack IMHO. Actually this happens quite often: if I am not mistaken all the GSIs higher than 15 are actually the result of a direct connection between an interrupt source and the IOAPIC. I have several on my testboxes. Yes. Interrupt source meaning a wire on the board. I haven't seen any situation so far where you get direct IO-APIC connections to PCI _device_ pins. You obviously get plenty connections to PCI _bus_ pins. Also give a look at the Intel Multiprocessor Specification, section 3.6.2.3: as you can see from the diagram in Symmetric I/O Mode all the interrupts are routed through the IOAPIC directly. Did this really give you actual performance/latency/scalability gains? I still think for devices that matter, we should go with MSI rather than deriving from real hw. Not all the operating systems support MSIs, it is nice to be able to avoid interrupt sharing without recurring to MSIs. Yes and no. It's a tradeoff. If no interrupt sharing means that we emulate hardware that simply never could have existed the way we model it, I think it's a bad idea. Also this is how Xen has been working for more then 5 years in HVM mode, so this configuration is well tested and supported by most operating systems (at least all the ones we tried so far). I'm fine with Xen breaking its own neck, as long as it doesn't affect non-Xen code paths. Just be aware that I'm not a huge fan of this approach :). In any case I think it is a good idea to add a comment to better explain what we are doing, see below. commit 973bb091a967fdec37a1bc8fe30d46a483d2903d Author: Stefano Stabellini stefano.stabell...@eu.citrix.com Date: Tue May 17 12:10:36 2011 + xen: fix interrupt routing - remove i440FX-xen and i440fx_write_config_xen we don't need to intercept pci config writes to i440FX anymore; - introduce PIIX3-xen and piix3_write_config_xen
Re: [Qemu-devel] [PATCH] xen: fix interrupt routing
On 14.06.2011, at 20:18, Stefano Stabellini wrote: On Tue, 14 Jun 2011, Jan Kiszka wrote: On 2011-06-14 15:27, Stefano Stabellini wrote: On Tue, 14 Jun 2011, Alexander Graf wrote: static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id) { PCII440FXState *d = opaque; @@ -267,8 +263,17 @@ static PCIBus *i440fx_common_init(const char *device_name, d = pci_create_simple(b, 0, device_name); *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d); -piix3 = DO_UPCAST(PIIX3State, dev, - pci_create_simple_multifunction(b, -1, true, PIIX3)); +if (xen_enabled()) { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(b, -1, true, PIIX3-xen)); +pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq, +piix3, XEN_PIIX_NUM_PIRQS); But with XEN_PIIX_NUM_PIRQS it's not a piix3 anymore, no? What's the reason behind this change? It is still a piix3, but also provides non-legacy interrupt links to the IO-APIC. The four pins of each PCI device on the bus not only are routed to the normal four pirqs (programmed writing to 0x60-0x63, see above) but also they are connected to the IO-APIC directly. These additional routes can only be discovered through ACPI, so you need matching ACPI tables. We used to build the old ACPI tables like this: /* PRTA: APIC routing table (via non-legacy IOAPIC GSIs). */ printf(Name(PRTA, Package() {\n); for ( dev = 1; dev 32; dev++ ) for ( intx = 0; intx 4; intx++ ) /* INTA-D */ printf(Package(){0x%04x, %u, 0, %u},\n, dev, intx, ((dev*4+dev/8+intx)31)+16); printf(})\n); Interesting concept, but completely non-standard and very much different from real hardware. Please at least add a comment there to show readers that Xen is doing a hack which is not at all related to how the PIIX really works. Isn't this more a function of the wires on the motherboard than the PIIX specifically? i.e. this just encodes the permutation of the wires from the PCI slots into the IO-APIC input pins (bypassing the PIIX, which is only used for legacy ISA IRQs i.e. by non-APIC aware OSes)? Interrupts with PCI work slightly different. PCI devices can map (themselves or by software) to one of 4 interrupt lines: INTA, INTB, INTC, INTD. These get converted using PCI host controller specific logic to 4 interrupt lines which then go into the IO-APIC. The IO-APIC is a chip with a limited number of pins. IIRC it was 24, could be 26 though. The number of redirection entries in the IOAPIC can be discovered reading from the IOAPICVER register and it is a property of a specific model of IOAPIC. As a matter of fact Xen's emulated IOAPIC supports more pins than the most popular IOAPIC used with PIIX3. Do real IOAPICs exist with more than 24 pins? Otherwise there is the risk that OSes aren't well prepared for this oddity - specifically not when the chipset is specified to include a 24-pin IOAPIC. Linux supports up to 128 pins and as I wrote before all the other OSes we tested so far seem to react well. I haven't seen a single case where PCI devices have a direct link to the IO-APIC. I also have not seen any PCI host controller that exports more than 4 interrupts. Giving each PCI device its own line, on top of that more than ever could be in real hardware, is a plain hack IMHO. Actually this happens quite often: if I am not mistaken all the GSIs higher than 15 are actually the result of a direct connection between an interrupt source and the IOAPIC. I have several on my testboxes. Except that the interrupt source is the chipset with its PCI bridge, not individual PCI devices. That is the most common configuration but it is not the only one: I have an ACPI table that has individual PCI devices as source in some test boxes. In fact there is even an example of it in this good article about interrupt routing from the FreeBSD guys (it is the last figure): http://people.freebsd.org/~jhb/papers/bsdcan/2007/article/node5.html Figure 6 contains a portion of an example _PRT. Specifically, it includes the first entry in the table. This corresponds to the PCI interrupt for PCI bus 3, slot 7 ..ZIP... For APIC mode, the interrupt is routed to GSI 66. For this machine, ACPI assigns a base GSI of 64 to the I/O APIC with an APIC ID of 10. Thus, GSI 66 corresponds to pin 2 on that I/O APIC Unless I am missing something I don't think that interrupt is going through any PCI bridges... I'm actually not quite sure what exactly he's describing here. But if it's bypassing the bus logic, it's not a normal PCI device :). Sure, there are special case devices that also expose a PCI interface. But real PCI cards that you plug in onto the PCI bus can't bypass the interrupt logic of the bus, as the only interrupt wires they have go to the bus. And since the PCI adapters we use in PC machines in Qemu are
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
On Wed, Jun 15, 2011 at 8:38 AM, Dmitry Konishchev konishc...@gmail.com wrote: On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote: Yes, please. OK, I'll do it as soon I'll find time for it. On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote: For image files the block layer should be caching the device capacity (size) anyway, so you probably don't need to allocate the array, just call bdrv_get_geometry(). That might make it easier to write a self-contained function. I've tried to not cache, but it turned out that bdrv_get_geometry() calls are quite noticeable in profiler without caching. Why is bdrv_get_geometry() slow? Stefan
Re: [Qemu-devel] [PATCH v2 01/13] spice: Use cpu_register_physical_memory_log for dirty log enabling
On 06/15/11 09:23, Jan Kiszka wrote: Drop outdated dirty log disable/enable around PCI remapping and register the BAR for dirty logging via cpu_register_physical_memory_log. That allows to remove all vga_dirty_log_start/stop references from qxl. Acked-by: Gerd Hoffmann kra...@redhat.com cheers, Gerd
[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)
I have to add that it's with kernel 2.6.39 and qemu 0.14.1 compiled from source [root@Plugbox ~]# uname -a Linux Plugbox 2.6.39-ARCH #1 PREEMPT Tue Jun 14 15:55:01 MDT 2011 armv5tel Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux [root@Plugbox ~]# /i386/usr/bin/qemu-i386 -h qemu-i386 version 0.14.1, Copyright (c) 2003-2008 Fabrice Bellard usage: qemu-i386 [options] program [arguments...] Linux CPU emulator (compiled for i386 emulation) ... -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/739785 Title: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument) Status in QEMU: New Bug description: Good time of day everybody, I have been trying to make usermode qemu on ARM with plugapps (archlinux) with archlinux i386 chroot to work. 1. I installed arch linux in a virtuabox and created a chroot for it with mkarchroot. Transferred it to my pogo plug into /i386/ 2. I comiled qemu-i386 static and put it into /i386/usr/bin/ ./configure --static --disable-blobs --disable-system --target-list=i386-linux-user make 3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and installed it. uname -a Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux 4. Added the following options into /etc/rc.local /sbin/modprobe binfmt_misc /bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc echo ':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:' /proc/sys/fs/binfmt_misc/register 5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld- linux.so.3 is a link to that file) from /lib/ to /i386/lib/ 6.Now i chroot into /i386 and I get this: [root@Plugbox i386]# chroot . [II aI hnve ao n@P /]# pacman -Suy bash: fork: Invalid argument 7.I also downloaded linux-user-test-0.3 from qemu website and ran the test: [root@Plugbox linux-user-test-0.3]# make ./qemu-linux-user.sh [qemu-i386] ../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l dummyfile BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions: Assertion `needed != ((void *)0)' failed! make: *** [test] Error 127 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions
[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)
Same here. The workaround does not work on a DroboFS, since it was actually the default setting all along. root@DroboFS:~# cat /proc/cpu/alignment User: 74283231 System: 0 Skipped:0 Half: 0 Word: 74283231 DWord: 0 Multi: 0 User faults:2 (fixup) root@DroboFS:~# uname -a Linux DroboFS 2.6.22.18 #1 Thu Jan 20 14:29:47 PST 2011 armv5tejl GNU/Linux Tested qemu 0.14.0, but I can give 0.14.1 a try as well, if necessary. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/739785 Title: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument) Status in QEMU: New Bug description: Good time of day everybody, I have been trying to make usermode qemu on ARM with plugapps (archlinux) with archlinux i386 chroot to work. 1. I installed arch linux in a virtuabox and created a chroot for it with mkarchroot. Transferred it to my pogo plug into /i386/ 2. I comiled qemu-i386 static and put it into /i386/usr/bin/ ./configure --static --disable-blobs --disable-system --target-list=i386-linux-user make 3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and installed it. uname -a Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux 4. Added the following options into /etc/rc.local /sbin/modprobe binfmt_misc /bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc echo ':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:' /proc/sys/fs/binfmt_misc/register 5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld- linux.so.3 is a link to that file) from /lib/ to /i386/lib/ 6.Now i chroot into /i386 and I get this: [root@Plugbox i386]# chroot . [II aI hnve ao n@P /]# pacman -Suy bash: fork: Invalid argument 7.I also downloaded linux-user-test-0.3 from qemu website and ran the test: [root@Plugbox linux-user-test-0.3]# make ./qemu-linux-user.sh [qemu-i386] ../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l dummyfile BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions: Assertion `needed != ((void *)0)' failed! make: *** [test] Error 127 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions
Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
On 06/14/2011 09:10 AM, Jan Kiszka wrote: On 2011-06-13 10:45, Avi Kivity wrote: On 06/11/2011 12:23 PM, Jan Kiszka wrote: From: Jan Kiszkajan.kis...@siemens.com These FPU states are properly maintained by KVM but not yet by TCG. So far we unconditionally set them to 0 in the guest which may cause state corruptions - not only during migration. -#define CPU_SAVE_VERSION 12 +#define CPU_SAVE_VERSION 13 Incrementing the version number seems excessive - I can't imagine a real-life guest will break due to fp pointer corruption However, I don't think we have a mechanism for optional state. We discussed this during the 18th VMState Subsection Symposium and IIRC agreed to re-raise the issue when we encountered it, which appears to be now. Whatever we invent, it has to be backported as well to allow that infamous traveling back in time, migrating VMs from newer to older versions. Would that backporting be simpler if we used an unconditional subsection for the additional states? Thinking about it, a conditional subsection would work fine. Most threads will never see an fpu error, and are all initialized to a clean slate. SDM 1 8.1.9.1 says: 8.1.9.1 Fopcode Compatibility Sub-mode Beginning with the Pentium 4 and Intel Xeon processors, the IA-32 architecture provides program control over the storing of the last instruction opcode (sometimes referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR enables (set) or disables (clear) the fopcode compatibility mode. If FOP code compatibility mode is enabled, the FOP is defined as it has always been in previous IA32 implementations (always defined as the FOP of the last non-trans- parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code compatibility mode is disabled (default), FOP is only valid if the last non-transparent FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked exception. So fopcode will usually be clear. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)
OK, that's pretty conclusive. My initial theory about what's going on here can't be right -- I'll investigate further. -- You received this bug notification because you are a member of qemu- devel-ml, which is subscribed to QEMU. https://bugs.launchpad.net/bugs/739785 Title: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument) Status in QEMU: New Bug description: Good time of day everybody, I have been trying to make usermode qemu on ARM with plugapps (archlinux) with archlinux i386 chroot to work. 1. I installed arch linux in a virtuabox and created a chroot for it with mkarchroot. Transferred it to my pogo plug into /i386/ 2. I comiled qemu-i386 static and put it into /i386/usr/bin/ ./configure --static --disable-blobs --disable-system --target-list=i386-linux-user make 3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and installed it. uname -a Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux 4. Added the following options into /etc/rc.local /sbin/modprobe binfmt_misc /bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc echo ':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:' /proc/sys/fs/binfmt_misc/register 5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld- linux.so.3 is a link to that file) from /lib/ to /i386/lib/ 6.Now i chroot into /i386 and I get this: [root@Plugbox i386]# chroot . [II aI hnve ao n@P /]# pacman -Suy bash: fork: Invalid argument 7.I also downloaded linux-user-test-0.3 from qemu website and ran the test: [root@Plugbox linux-user-test-0.3]# make ./qemu-linux-user.sh [qemu-i386] ../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l dummyfile BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions: Assertion `needed != ((void *)0)' failed! make: *** [test] Error 127 To manage notifications about this bug go to: https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
On Wed, Jun 15, 2011 at 12:39 PM, Stefan Hajnoczi stefa...@gmail.com wrote: Why is bdrv_get_geometry() slow? Mmm.. Frankly, I haven't looked so deep, but it is going to be slow at least for raw images due to using lseek(). -- Dmitry Konishchev mailto:konishc...@gmail.com
[Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
This patch optimizes 'qemu-img convert' operation for volumes which are almost fully unallocated. Here are the results of simple tests: We have a snapshot of a volume: $ qemu-img info snapshot.qcow2 image: snapshot.qcow2 file format: qcow2 virtual size: 5.0G (5372805120 bytes) disk size: 4.0G cluster_size: 65536 Create a volume from the snapshot and use it a little: $ qemu-img create -f qcow2 -o backing_file=snapshot.qcow2 volume.qcow2 For volumes which are almost fully allocated we have a little regression: $ time qemu-img convert -O qcow2 volume.qcow2 volume_snapshot.qcow2 real 2m43.864s user 0m9.257s sys 0m40.559s $ time qemu-img-patched convert -O qcow2 volume.qcow2 volume_snapshot.qcow2 real 2m46.899s user 0m9.749s sys 0m40.471s But now create a volume which is almost fully unallocated: $ qemu-img create -f qcow2 -o backing_file=snapshot.qcow2 volume.qcow2 1T And now we have more than twice decreased CPU consumption: $ time qemu-img convert -O qcow2 volume.qcow2 volume_snapshot.qcow2 real 6m40.985s user 4m13.832s sys 0m33.738s $ time qemu-img-patched convert -O qcow2 volume.qcow2 volume_snapshot.qcow2 real 4m28.448s user 1m43.882s sys 0m33.894s Signed-off-by: Dmitry Konishchev konishc...@gmail.com --- qemu-img.c | 184 ++- 1 files changed, 143 insertions(+), 41 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 4f162d1..7f3d853 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -27,6 +27,7 @@ #include osdep.h #include sysemu.h #include block_int.h +#include block.h #include stdio.h #ifdef _WIN32 @@ -586,19 +587,95 @@ static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n, return res; } +/* + * Copies sectors from one image to another. + * Writes only non-zero bytes to the output image. + * + * Returns 0 on success, -1 on error. + */ +static int copy_allocated_sectors( +BlockDriverState *out_bs, BlockDriverState *bs, int n, +int64_t sector_num, int64_t bs_offset, const uint64_t *bs_geometry, uint8_t *buf) +{ +BlockDriverState *cur_bs; +uint64_t cur_sectors; +uint64_t bs_sector; +int backing_depth; +int allocated_num; +int sector_found; +int cur_n; + +while (n 0) { +/* Look for the sectors in the source image and if they are not + allocated - sequentially in all its backing images. */ + +cur_bs = bs; +bs_sector = sector_num - bs_offset; +backing_depth = 0; +sector_found = 0; + +do { +cur_sectors = bs_geometry[backing_depth++]; + +if (bs_sector = cur_sectors) { +continue; +} + +if (bs_sector + n = cur_sectors) { +cur_n = n; +} else { +cur_n = cur_sectors - bs_sector; +} + +if (bdrv_is_allocated(cur_bs, bs_sector, cur_n, allocated_num)) { +const uint8_t *cur_buf = buf; +sector_found = 1; + +if (bdrv_read(cur_bs, bs_sector, buf, allocated_num) 0) { +error_report(error while reading); +return -1; +} + +while (allocated_num 0) { +if (is_allocated_sectors(cur_buf, allocated_num, cur_n)) { +if (bdrv_write(out_bs, sector_num, cur_buf, cur_n) 0) { +error_report(error while writing); +return -1; +} +} + +n -= cur_n; +sector_num += cur_n; +allocated_num -= cur_n; +cur_buf += cur_n * BDRV_SECTOR_SIZE; +} + +break; +} +} while(( cur_bs = cur_bs-backing_hd )); + +if (!sector_found) { +sector_num++; +n--; +} +} + +return 0; +} + #define IO_BUF_SIZE (2 * 1024 * 1024) static int img_convert(int argc, char **argv) { -int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors; +int c, ret = 0, n, cur_n, bs_n, bs_i, compress, cluster_size, cluster_sectors; int progress = 0; const char *fmt, *out_fmt, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; int64_t total_sectors, nb_sectors, sector_num, bs_offset; uint64_t bs_sectors; +uint64_t *bs_geometry = NULL; uint8_t * buf = NULL; -const uint8_t *buf1; BlockDriverInfo bdi; QEMUOptionParameter *param = NULL, *create_options = NULL; QEMUOptionParameter *out_baseimg_param; @@ -874,14 +951,20 @@ static int img_convert(int argc, char **argv) /* signal EOF to align */ bdrv_write_compressed(out_bs, 0, NULL, 0); } else { +int bs_i_prev = -1; +float progress = 100; +BlockDriverState *cur_bs; int
Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
On 2011-06-15 11:10, Avi Kivity wrote: On 06/14/2011 09:10 AM, Jan Kiszka wrote: On 2011-06-13 10:45, Avi Kivity wrote: On 06/11/2011 12:23 PM, Jan Kiszka wrote: From: Jan Kiszkajan.kis...@siemens.com These FPU states are properly maintained by KVM but not yet by TCG. So far we unconditionally set them to 0 in the guest which may cause state corruptions - not only during migration. -#define CPU_SAVE_VERSION 12 +#define CPU_SAVE_VERSION 13 Incrementing the version number seems excessive - I can't imagine a real-life guest will break due to fp pointer corruption However, I don't think we have a mechanism for optional state. We discussed this during the 18th VMState Subsection Symposium and IIRC agreed to re-raise the issue when we encountered it, which appears to be now. Whatever we invent, it has to be backported as well to allow that infamous traveling back in time, migrating VMs from newer to older versions. Would that backporting be simpler if we used an unconditional subsection for the additional states? Thinking about it, a conditional subsection would work fine. Most threads will never see an fpu error, and are all initialized to a clean slate. SDM 1 8.1.9.1 says: 8.1.9.1 Fopcode Compatibility Sub-mode Beginning with the Pentium 4 and Intel Xeon processors, the IA-32 architecture provides program control over the storing of the last instruction opcode (sometimes referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR enables (set) or disables (clear) the fopcode compatibility mode. If FOP code compatibility mode is enabled, the FOP is defined as it has always been in previous IA32 implementations (always defined as the FOP of the last non-trans- parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code compatibility mode is disabled (default), FOP is only valid if the last non-transparent FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked exception. So fopcode will usually be clear. OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But if it's off, how to test for that other condition last non-transparent FP instruction ... had an unmasked exception from the host? Jan signature.asc Description: OpenPGP digital signature
[Qemu-devel] [PATCH] Allow nested qemu_bh_poll() after BH deletion
Without this, qemu segfaults when a BH handler first deletes its BH and then calls another function which involves a nested qemu_bh_poll() call. This can be reproduced by generating an I/O error (e.g. with blkdebug) on an IDE device and using rerror/werror=stop to stop the VM. When continuing the VM, qemu segfaults. Signed-off-by: Kevin Wolf kw...@redhat.com --- async.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/async.c b/async.c index 57ac3a8..fd313df 100644 --- a/async.c +++ b/async.c @@ -137,11 +137,12 @@ QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque) int qemu_bh_poll(void) { -QEMUBH *bh, **bhp; +QEMUBH *bh, **bhp, *next; int ret; ret = 0; -for (bh = async_context-first_bh; bh; bh = bh-next) { +for (bh = async_context-first_bh; bh; bh = next) { +next = bh-next; if (!bh-deleted bh-scheduled) { bh-scheduled = 0; if (!bh-idle) -- 1.7.5.2
Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
On 2011-06-15 12:20, Jan Kiszka wrote: On 2011-06-15 11:10, Avi Kivity wrote: On 06/14/2011 09:10 AM, Jan Kiszka wrote: On 2011-06-13 10:45, Avi Kivity wrote: On 06/11/2011 12:23 PM, Jan Kiszka wrote: From: Jan Kiszkajan.kis...@siemens.com These FPU states are properly maintained by KVM but not yet by TCG. So far we unconditionally set them to 0 in the guest which may cause state corruptions - not only during migration. -#define CPU_SAVE_VERSION 12 +#define CPU_SAVE_VERSION 13 Incrementing the version number seems excessive - I can't imagine a real-life guest will break due to fp pointer corruption However, I don't think we have a mechanism for optional state. We discussed this during the 18th VMState Subsection Symposium and IIRC agreed to re-raise the issue when we encountered it, which appears to be now. Whatever we invent, it has to be backported as well to allow that infamous traveling back in time, migrating VMs from newer to older versions. Would that backporting be simpler if we used an unconditional subsection for the additional states? Thinking about it, a conditional subsection would work fine. Most threads will never see an fpu error, and are all initialized to a clean slate. SDM 1 8.1.9.1 says: 8.1.9.1 Fopcode Compatibility Sub-mode Beginning with the Pentium 4 and Intel Xeon processors, the IA-32 architecture provides program control over the storing of the last instruction opcode (sometimes referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR enables (set) or disables (clear) the fopcode compatibility mode. If FOP code compatibility mode is enabled, the FOP is defined as it has always been in previous IA32 implementations (always defined as the FOP of the last non-trans- parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code compatibility mode is disabled (default), FOP is only valid if the last non-transparent FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked exception. So fopcode will usually be clear. OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But if it's off, how to test for that other condition last non-transparent FP instruction ... had an unmasked exception from the host? I briefly thought about status.ES == 1. But the guest may clear the flag in its exception handler before reading opcode etc. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH 00/13] QED image streaming
Hello, hopefully just a quick question... On Tuesday 14 June 2011 20:18:18 Stefan Hajnoczi wrote: This patch series adds image streaming support for QED image files. Other image formats can also be supported in the future. Image streaming populates the file in the background while the guest is running. This makes it possible to start the guest before its image file has been fully provisioned. ... When image streaming is enabled, the unallocated regions of the image file are populated with the data from the backing file. This occurs in the background and the guest can perform regular I/O in the meantime. Once the entire backing file has been streamed, the image no longer requires a backing file and will drop its reference. Does this also work for intermediate images, that is - master image on a NFS share, - local copy on a local drive using CoR-streaming from master, - local instance based on local copy using CoW. We normally have mostly independent servers without a shared storage, expect a global NFS share for the master images. Currently we have to copy the master image to the local server first, before we than can create lots of instances from then, which have few changed relative to the master, so we don't want the copying to happen there. If haven't looked at QED yet, so thanks in advance for your answer. Sincerely Philipp Hahn -- Philipp Hahn Open Source Software Engineer h...@univention.de Univention GmbHLinux for Your Businessfon: +49 421 22 232- 0 Mary-Somerville-Str.1 D-28359 Bremen fax: +49 421 22 232-99 http://www.univention.de/ signature.asc Description: This is a digitally signed message part.
[Qemu-devel] [RFC] Specifying storage devices for live migration
Hello all, Currently there is no way to choose storage devices to be migrated. There was some discussion about making the monitor API a bit more flexible, but the code was never written: http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html So today users can choose only: 1) No storage migrated 2) All disks migrated with fully copy 3) All disks migrated with incremental I'd like to write the code to allow users to choose which disks to migrate, and how. What API would you recommend? What would break the least amount of utilities? Thanks, Avishay
Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
On 06/15/2011 01:20 PM, Jan Kiszka wrote: So fopcode will usually be clear. OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But if it's off, how to test for that other condition last non-transparent FP instruction ... had an unmasked exception from the host? We save fopcode unconditionally. But if IA32_MISC_ENABLE_MSR[2]=0, then fopcode will be zero, and we can skip the subsection (if the data and instruction pointers are also zero, which they will be). If it isn't zero, there's still a good chance fopcode will be zero (64-bit userspace, thread that hasn't used the fpu since the last context switch, last opcode happened to be zero). -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
On 2011-06-15 13:26, Avi Kivity wrote: On 06/15/2011 01:20 PM, Jan Kiszka wrote: So fopcode will usually be clear. OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But if it's off, how to test for that other condition last non-transparent FP instruction ... had an unmasked exception from the host? We save fopcode unconditionally. But if IA32_MISC_ENABLE_MSR[2]=0, then fopcode will be zero, and we can skip the subsection (if the data and instruction pointers are also zero, which they will be). If it isn't zero, there's still a good chance fopcode will be zero (64-bit userspace, thread that hasn't used the fpu since the last context switch, last opcode happened to be zero). I do not yet find if fopcode is invalid, it is zero, just as IP and DP in the spec. What clears them reliably? Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
On 06/15/2011 02:32 PM, Jan Kiszka wrote: If it isn't zero, there's still a good chance fopcode will be zero (64-bit userspace, thread that hasn't used the fpu since the last context switch, last opcode happened to be zero). I do not yet find if fopcode is invalid, it is zero, just as IP and DP in the spec. What clears them reliably? FNINIT -- error compiling committee.c: too many arguments to function
Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
On 2011-06-15 13:33, Avi Kivity wrote: On 06/15/2011 02:32 PM, Jan Kiszka wrote: If it isn't zero, there's still a good chance fopcode will be zero (64-bit userspace, thread that hasn't used the fpu since the last context switch, last opcode happened to be zero). I do not yet find if fopcode is invalid, it is zero, just as IP and DP in the spec. What clears them reliably? FNINIT OK, I see. So we simply check for all fields being zero and skip the section in that case. The MSR doesn't actually to us here. Will write v2. Jan -- Siemens AG, Corporate Technology, CT T DE IT 1 Corporate Competence Center Embedded Linux
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
On Wed, Jun 15, 2011 at 10:50 AM, Dmitry Konishchev konishc...@gmail.com wrote: On Wed, Jun 15, 2011 at 12:39 PM, Stefan Hajnoczi stefa...@gmail.com wrote: Why is bdrv_get_geometry() slow? Mmm.. Frankly, I haven't looked so deep, but it is going to be slow at least for raw images due to using lseek(). We need to fully understand performance before applying optimizations on top. Otherwise it is possible to paper over a problem while leaving the root cause unsolved. Avoiding lseek(2) is very important, not just for qemu-img but also for running VMs. lseek(2) should only be invoked if the image is growable/removable. When I run a VM from a virtio-blk raw image I see no lseek(2) calls. On the host: strace -p $pid_of_qemu -f Inside the guest: dd if=/dev/vda of=/dev/null iflag=direct I see pread(2) from the posix-aio-compat.c worker threads but no lseek(2). The total_sectors cached value is being used. Does strace(1) show lseek(2) on your host? Stefan
Re: [Qemu-devel] [PATCH] Allow nested qemu_bh_poll() after BH deletion
On Wed, Jun 15, 2011 at 11:33 AM, Kevin Wolf kw...@redhat.com wrote: Without this, qemu segfaults when a BH handler first deletes its BH and then calls another function which involves a nested qemu_bh_poll() call. This can be reproduced by generating an I/O error (e.g. with blkdebug) on an IDE device and using rerror/werror=stop to stop the VM. When continuing the VM, qemu segfaults. Signed-off-by: Kevin Wolf kw...@redhat.com --- async.c | 5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) Almost worth switching to qemu-queue.h and FOREACH_SAFE(). Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Re: [Qemu-devel] [PATCH 00/13] QED image streaming
On Wed, Jun 15, 2011 at 11:46 AM, Philipp Hahn h...@univention.de wrote: On Tuesday 14 June 2011 20:18:18 Stefan Hajnoczi wrote: This patch series adds image streaming support for QED image files. Other image formats can also be supported in the future. Image streaming populates the file in the background while the guest is running. This makes it possible to start the guest before its image file has been fully provisioned. ... When image streaming is enabled, the unallocated regions of the image file are populated with the data from the backing file. This occurs in the background and the guest can perform regular I/O in the meantime. Once the entire backing file has been streamed, the image no longer requires a backing file and will drop its reference. Does this also work for intermediate images, that is - master image on a NFS share, - local copy on a local drive using CoR-streaming from master, - local instance based on local copy using CoW. We normally have mostly independent servers without a shared storage, expect a global NFS share for the master images. Currently we have to copy the master image to the local server first, before we than can create lots of instances from then, which have few changed relative to the master, so we don't want the copying to happen there. No, unfortunately adding another level of backing files prevents use of streaming today. The reason is because backing files are opened read-only. This ensures that multiple VMs running concurrently will not change the backing file. Image formats are typically not safe for concurrent updates from multiple applications just like regular file systems cannot be accessed from multiple hosts simultaneously. You'd need a cluster file system for that, and similarly we need multiple writers support for image formats or some other solution. We'd need a solution that makes it possible to update the local copy of the master image while multiple VMs are running. For example qemu-nbd serving the local copy of the master image and each VM image using the backing file over NBD. That way all accesses to the local copy of the master image go through qemu-nbd and streaming can be performed safely. I haven't tested this approach, it might require some improvements to qemu-nbd and I expect the performance would need work too. Thanks for raising this use case, I think it's an interesting one because you are saving disk space by keeping a local master copy. Stefan
[Qemu-devel] [PATCH v2 3/3] vdi: Avoid direct AIO callback
bdrv_aio_* must not call the callback before returning to its caller. In vdi, this could happen in some error cases. This starts the real requests processing in a BH to avoid this situation. Signed-off-by: Kevin Wolf kw...@redhat.com --- v2: - Remove direct vdi_aio_read_cb() call block/vdi.c | 41 - 1 files changed, 36 insertions(+), 5 deletions(-) diff --git a/block/vdi.c b/block/vdi.c index 4c9e201..261cf9b 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -152,6 +152,7 @@ typedef struct { /* Buffer for new allocated block. */ void *block_buffer; void *orig_buf; +bool is_write; int header_modified; BlockDriverAIOCB *hd_aiocb; struct iovec hd_iov; @@ -504,6 +505,8 @@ static VdiAIOCB *vdi_aio_setup(BlockDriverState *bs, int64_t sector_num, acb-hd_aiocb = NULL; acb-sector_num = sector_num; acb-qiov = qiov; +acb-is_write = is_write; + if (qiov-niov 1) { acb-buf = qemu_blockalign(bs, qiov-size); acb-orig_buf = acb-buf; @@ -542,14 +545,20 @@ static int vdi_schedule_bh(QEMUBHFunc *cb, VdiAIOCB *acb) } static void vdi_aio_read_cb(void *opaque, int ret); +static void vdi_aio_write_cb(void *opaque, int ret); -static void vdi_aio_read_bh(void *opaque) +static void vdi_aio_rw_bh(void *opaque) { VdiAIOCB *acb = opaque; logout(\n); qemu_bh_delete(acb-bh); acb-bh = NULL; -vdi_aio_read_cb(opaque, 0); + +if (acb-is_write) { +vdi_aio_write_cb(opaque, 0); +} else { +vdi_aio_read_cb(opaque, 0); +} } static void vdi_aio_read_cb(void *opaque, int ret) @@ -597,7 +606,7 @@ static void vdi_aio_read_cb(void *opaque, int ret) if (bmap_entry == VDI_UNALLOCATED) { /* Block not allocated, return zeros, no need to wait. */ memset(acb-buf, 0, n_sectors * SECTOR_SIZE); -ret = vdi_schedule_bh(vdi_aio_read_bh, acb); +ret = vdi_schedule_bh(vdi_aio_rw_bh, acb); if (ret 0) { goto done; } @@ -630,12 +639,23 @@ static BlockDriverAIOCB *vdi_aio_readv(BlockDriverState *bs, BlockDriverCompletionFunc *cb, void *opaque) { VdiAIOCB *acb; +int ret; + logout(\n); acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0); if (!acb) { return NULL; } -vdi_aio_read_cb(acb, 0); + +ret = vdi_schedule_bh(vdi_aio_rw_bh, acb); +if (ret 0) { +if (acb-qiov-niov 1) { +qemu_vfree(acb-orig_buf); +} +qemu_aio_release(acb); +return NULL; +} + return acb-common; } @@ -789,12 +809,23 @@ static BlockDriverAIOCB *vdi_aio_writev(BlockDriverState *bs, BlockDriverCompletionFunc *cb, void *opaque) { VdiAIOCB *acb; +int ret; + logout(\n); acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1); if (!acb) { return NULL; } -vdi_aio_write_cb(acb, 0); + +ret = vdi_schedule_bh(vdi_aio_rw_bh, acb); +if (ret 0) { +if (acb-qiov-niov 1) { +qemu_vfree(acb-orig_buf); +} +qemu_aio_release(acb); +return NULL; +} + return acb-common; } -- 1.7.5.2
Re: [Qemu-devel] [RFC] Specifying storage devices for live migration
On Wed, Jun 15, 2011 at 12:06 PM, Avishay Traeger avis...@il.ibm.com wrote: Hello all, Currently there is no way to choose storage devices to be migrated. There was some discussion about making the monitor API a bit more flexible, but the code was never written: http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html Hmm...2009. Have you seen this recent thread? http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg00733.html Are you planning to extend the block migration API and fix known issues with block migration? Stefan
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
On Wed, Jun 15, 2011 at 4:02 PM, Stefan Hajnoczi stefa...@gmail.com wrote: We need to fully understand performance before applying optimizations on top. Otherwise it is possible to paper over a problem while leaving the root cause unsolved. Avoiding lseek(2) is very important, not just for qemu-img but also for running VMs. lseek(2) should only be invoked if the image is growable/removable. When I run a VM from a virtio-blk raw image I see no lseek(2) calls. On the host: strace -p $pid_of_qemu -f Inside the guest: dd if=/dev/vda of=/dev/null iflag=direct I see pread(2) from the posix-aio-compat.c worker threads but no lseek(2). The total_sectors cached value is being used. Does strace(1) show lseek(2) on your host? No, I don't see lseek() in the strace output. But in the other hand I see that bdrv_get_geometry() uses bdrv_getlength() which is implemented using lseek() in block/raw-posix.c... May be I'am mistaken about lseek(), but I get 9% slower version if disable caching. -- Дмитрий Конищев (Dmitry Konishchev) mailto:konishc...@gmail.com
[Qemu-devel] [PATCH v2][uq/master] kvm: x86: Save/restore FPU OP, IP and DP
These FPU states are properly maintained by KVM but not yet by TCG. So far we unconditionally set them to 0 in the guest which may cause state corruptions, though not with modern guests. To avoid breaking backward migration, use a conditional subsection that is only written if any of the three fields is non-zero. The guest's FNINIT clears them frequently, and cleared IA32_MISC_ENABLE MSR[2] reduces the probability of non-zero values further so that this subsection is not expected to restrict migration in any common scenario. Signed-off-by: Jan Kiszka jan.kis...@siemens.com --- Changes in v2: - use conditional subsection target-i386/cpu.h |4 target-i386/kvm.c | 20 +++- target-i386/machine.c | 23 +++ 3 files changed, 42 insertions(+), 5 deletions(-) diff --git a/target-i386/cpu.h b/target-i386/cpu.h index 9c3340d..cdf68ff 100644 --- a/target-i386/cpu.h +++ b/target-i386/cpu.h @@ -641,6 +641,10 @@ typedef struct CPUX86State { uint16_t fpuc; uint8_t fptags[8]; /* 0 = valid, 1 = empty */ FPReg fpregs[8]; +/* KVM-only so far */ +uint16_t fpop; +uint64_t fpip; +uint64_t fpdp; /* emulator internal variables */ float_status fp_status; diff --git a/target-i386/kvm.c b/target-i386/kvm.c index 5ebb054..938e0a3 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -718,6 +718,9 @@ static int kvm_put_fpu(CPUState *env) fpu.fsw = env-fpus ~(7 11); fpu.fsw |= (env-fpstt 7) 11; fpu.fcw = env-fpuc; +fpu.last_opcode = env-fpop; +fpu.last_ip = env-fpip; +fpu.last_dp = env-fpdp; for (i = 0; i 8; ++i) { fpu.ftwx |= (!env-fptags[i]) i; } @@ -740,7 +743,7 @@ static int kvm_put_xsave(CPUState *env) { int i, r; struct kvm_xsave* xsave; -uint16_t cwd, swd, twd, fop; +uint16_t cwd, swd, twd; if (!kvm_has_xsave()) { return kvm_put_fpu(env); @@ -748,7 +751,7 @@ static int kvm_put_xsave(CPUState *env) xsave = qemu_memalign(4096, sizeof(struct kvm_xsave)); memset(xsave, 0, sizeof(struct kvm_xsave)); -cwd = swd = twd = fop = 0; +cwd = swd = twd = 0; swd = env-fpus ~(7 11); swd |= (env-fpstt 7) 11; cwd = env-fpuc; @@ -756,7 +759,9 @@ static int kvm_put_xsave(CPUState *env) twd |= (!env-fptags[i]) i; } xsave-region[0] = (uint32_t)(swd 16) + cwd; -xsave-region[1] = (uint32_t)(fop 16) + twd; +xsave-region[1] = (uint32_t)(env-fpop 16) + twd; +memcpy(xsave-region[XSAVE_CWD_RIP], env-fpip, sizeof(env-fpip)); +memcpy(xsave-region[XSAVE_CWD_RDP], env-fpdp, sizeof(env-fpdp)); memcpy(xsave-region[XSAVE_ST_SPACE], env-fpregs, sizeof env-fpregs); memcpy(xsave-region[XSAVE_XMM_SPACE], env-xmm_regs, @@ -921,6 +926,9 @@ static int kvm_get_fpu(CPUState *env) env-fpstt = (fpu.fsw 11) 7; env-fpus = fpu.fsw; env-fpuc = fpu.fcw; +env-fpop = fpu.last_opcode; +env-fpip = fpu.last_ip; +env-fpdp = fpu.last_dp; for (i = 0; i 8; ++i) { env-fptags[i] = !((fpu.ftwx i) 1); } @@ -935,7 +943,7 @@ static int kvm_get_xsave(CPUState *env) { struct kvm_xsave* xsave; int ret, i; -uint16_t cwd, swd, twd, fop; +uint16_t cwd, swd, twd; if (!kvm_has_xsave()) { return kvm_get_fpu(env); @@ -951,13 +959,15 @@ static int kvm_get_xsave(CPUState *env) cwd = (uint16_t)xsave-region[0]; swd = (uint16_t)(xsave-region[0] 16); twd = (uint16_t)xsave-region[1]; -fop = (uint16_t)(xsave-region[1] 16); +env-fpop = (uint16_t)(xsave-region[1] 16); env-fpstt = (swd 11) 7; env-fpus = swd; env-fpuc = cwd; for (i = 0; i 8; ++i) { env-fptags[i] = !((twd i) 1); } +memcpy(env-fpip, xsave-region[XSAVE_CWD_RIP], sizeof(env-fpip)); +memcpy(env-fpdp, xsave-region[XSAVE_CWD_RDP], sizeof(env-fpdp)); env-mxcsr = xsave-region[XSAVE_MXCSR]; memcpy(env-fpregs, xsave-region[XSAVE_ST_SPACE], sizeof env-fpregs); diff --git a/target-i386/machine.c b/target-i386/machine.c index bbeae88..d22a731 100644 --- a/target-i386/machine.c +++ b/target-i386/machine.c @@ -290,6 +290,26 @@ static const VMStateDescription vmstate_async_pf_msr = { } }; +static bool fpop_ip_dp_needed(void *opaque) +{ +CPUState *env = opaque; + +return env-fpop != 0 || env-fpip != 0 || env-fpdp != 0; +} + +static const VMStateDescription vmstate_fpop_ip_dp = { +.name = cpu/fpop_ip_dp, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_UINT16_V(fpop, CPUState, 13), +VMSTATE_UINT64_V(fpip, CPUState, 13), +VMSTATE_UINT64_V(fpdp, CPUState, 13), +VMSTATE_END_OF_LIST() +} +}; + static const VMStateDescription vmstate_cpu = { .name = cpu, .version_id = CPU_SAVE_VERSION, @@ -398,6 +418,9 @@ static const VMStateDescription vmstate_cpu
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
On Wed, Jun 15, 2011 at 2:14 PM, Dmitry Konishchev konishc...@gmail.com wrote: On Wed, Jun 15, 2011 at 4:02 PM, Stefan Hajnoczi stefa...@gmail.com wrote: We need to fully understand performance before applying optimizations on top. Otherwise it is possible to paper over a problem while leaving the root cause unsolved. Avoiding lseek(2) is very important, not just for qemu-img but also for running VMs. lseek(2) should only be invoked if the image is growable/removable. When I run a VM from a virtio-blk raw image I see no lseek(2) calls. On the host: strace -p $pid_of_qemu -f Inside the guest: dd if=/dev/vda of=/dev/null iflag=direct I see pread(2) from the posix-aio-compat.c worker threads but no lseek(2). The total_sectors cached value is being used. Does strace(1) show lseek(2) on your host? No, I don't see lseek() in the strace output. But in the other hand I see that bdrv_get_geometry() uses bdrv_getlength() which is implemented using lseek() in block/raw-posix.c... May be I'am mistaken about lseek(), but I get 9% slower version if disable caching. disable caching? Stefan
Re: [Qemu-devel] [PATCH] error framework: Fix compilation for w32/w64
On Mon, 13 Jun 2011 23:01:53 +0200 Stefan Weil w...@mail.berlios.de wrote: The declaration of function error_set() should use macro GCC_FMT_ATTR instead of gcc's format printf attribute. For w32/w64, both declarations are different and GCC_FMT_ATTR is needed. Compilation for w64 even failed with the original code because mingw64 defines a macro for printf. GCC_FMT_ATTR requires qemu-common.h, so add it in error.c (it's also included by error_int.h but too late). Remove assert.h which is included by qemu-common.h. Cc: Luiz Capitulino lcapitul...@redhat.com Cc: Anthony Liguori aligu...@us.ibm.com Signed-off-by: Stefan Weil w...@mail.berlios.de Applied to the monitor queue, thanks. --- error.c |3 ++- error.h |3 +-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/error.c b/error.c index 867eec2..74d7398 100644 --- a/error.c +++ b/error.c @@ -9,11 +9,12 @@ * This work is licensed under the terms of the GNU LGPL, version 2. See * the COPYING.LIB file in the top-level directory. */ + +#include qemu-common.h #include error.h #include error_int.h #include qemu-objects.h #include qerror.h -#include assert.h struct Error { diff --git a/error.h b/error.h index 003c855..0f92a6f 100644 --- a/error.h +++ b/error.h @@ -25,8 +25,7 @@ typedef struct Error Error; * Currently, qerror.h defines these error formats. This function is not * meant to be used outside of QEMU. */ -void error_set(Error **err, const char *fmt, ...) -__attribute__((format(printf, 2, 3))); +void error_set(Error **err, const char *fmt, ...) GCC_FMT_ATTR(2, 3); /** * Returns true if an indirect pointer to an error is pointing to a valid
[Qemu-devel] [PATCH v2] ide: Clear error_status after restarting flush
Clearing the error status flag was missing for restarting flushes. Now that the error status is separate from the BM status register, we can simply set it to 0 after restarting the request. This ensures that we never forget to clear a bit. Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/pci.c | 18 +++--- 1 files changed, 11 insertions(+), 7 deletions(-) diff --git a/hw/ide/pci.c b/hw/ide/pci.c index c940375..9f3050a 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -189,6 +189,7 @@ static void bmdma_restart_bh(void *opaque) BMDMAState *bm = opaque; IDEBus *bus = bm-bus; int is_read; +int error_status; qemu_bh_delete(bm-bh); bm-bh = NULL; @@ -199,22 +200,25 @@ static void bmdma_restart_bh(void *opaque) is_read = !!(bus-error_status BM_STATUS_RETRY_READ); -if (bus-error_status BM_STATUS_DMA_RETRY) { -if (bus-error_status BM_STATUS_RETRY_TRIM) { -bus-error_status = ~BM_STATUS_RETRY_TRIM; +/* The error status must be cleared before resubmitting the request: The + * request may fail again, and this case can only be distinguished if the + * called function can set a new error status. */ +error_status = bus-error_status; +bus-error_status = 0; + +if (error_status BM_STATUS_DMA_RETRY) { +if (error_status BM_STATUS_RETRY_TRIM) { bmdma_restart_dma(bm, IDE_DMA_TRIM); } else { -bus-error_status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ); bmdma_restart_dma(bm, is_read ? IDE_DMA_READ : IDE_DMA_WRITE); } -} else if (bus-error_status BM_STATUS_PIO_RETRY) { -bus-error_status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ); +} else if (error_status BM_STATUS_PIO_RETRY) { if (is_read) { ide_sector_read(bmdma_active_if(bm)); } else { ide_sector_write(bmdma_active_if(bm)); } -} else if (bus-error_status BM_STATUS_RETRY_FLUSH) { +} else if (error_status BM_STATUS_RETRY_FLUSH) { ide_flush_cache(bmdma_active_if(bm)); } } -- 1.7.5.2
[Qemu-devel] [PATCH] qemu-img: Add cache command line option
qemu-img currently writes disk images using writeback and filling up the cache buffers which are then flushed by the kernel preventing other processes from accessing the storage. This is particularly bad in cluster environments where time-based algorithms might be in place and accessing the storage within certain timeouts is critical. This patch adds the option to choose a cache method when writing disk images. Signed-off-by: Federico Simoncelli fsimo...@redhat.com --- qemu-img.c | 81 ++- 1 files changed, 68 insertions(+), 13 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 4f162d1..7609db5 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -40,6 +40,7 @@ typedef struct img_cmd_t { /* Default to cache=writeback as data integrity is not important for qemu-tcg. */ #define BDRV_O_FLAGS BDRV_O_CACHE_WB +#define BDRV_DEFAULT_CACHE writeback static void format_print(void *opaque, const char *name) { @@ -64,6 +65,8 @@ static void help(void) Command parameters:\n 'filename' is a disk image filename\n 'fmt' is the disk image format. It is guessed automatically in most cases\n + 'cache' is the cache mode used to write the output disk image, the valid\n + options are: 'none', 'writeback' (default), 'writethrough' and 'unsafe'\n 'size' is the disk image size in bytes. Optional suffixes\n 'k' or 'K' (kilobyte, 1024), 'M' (megabyte, 1024k), 'G' (gigabyte, 1024M)\n and T (terabyte, 1024G) are supported. 'b' is ignored.\n @@ -180,6 +183,27 @@ static int read_password(char *buf, int buf_size) } #endif +static int set_cache_flag(const char *mode, int *flags) +{ +*flags = ~BDRV_O_CACHE_MASK; + +if (!strcmp(mode, none) || !strcmp(mode, off)) { +*flags |= BDRV_O_CACHE_WB; +*flags |= BDRV_O_NOCACHE; +} else if (!strcmp(mode, writeback)) { +*flags |= BDRV_O_CACHE_WB; +} else if (!strcmp(mode, unsafe)) { +*flags |= BDRV_O_CACHE_WB; +*flags |= BDRV_O_NO_FLUSH; +} else if (!strcmp(mode, writethrough)) { +/* this is the default */ +} else { +return -1; +} + +return 0; +} + static int print_block_option_help(const char *filename, const char *fmt) { BlockDriver *drv, *proto_drv; @@ -441,13 +465,14 @@ static int img_check(int argc, char **argv) static int img_commit(int argc, char **argv) { -int c, ret; -const char *filename, *fmt; +int c, ret, flags; +const char *filename, *fmt, *cache; BlockDriverState *bs; fmt = NULL; +cache = BDRV_DEFAULT_CACHE; for(;;) { -c = getopt(argc, argv, f:h); +c = getopt(argc, argv, f:ht:); if (c == -1) { break; } @@ -459,6 +484,9 @@ static int img_commit(int argc, char **argv) case 'f': fmt = optarg; break; +case 't': +cache = optarg; +break; } } if (optind = argc) { @@ -466,7 +494,14 @@ static int img_commit(int argc, char **argv) } filename = argv[optind++]; -bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR); +flags = BDRV_O_RDWR; +ret = set_cache_flag(cache, flags); +if (ret 0) { +error_report(Invalid cache option: %s\n, cache); +return -1; +} + +bs = bdrv_new_open(filename, fmt, flags); if (!bs) { return 1; } @@ -591,8 +626,8 @@ static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n, static int img_convert(int argc, char **argv) { int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors; -int progress = 0; -const char *fmt, *out_fmt, *out_baseimg, *out_filename; +int progress = 0, flags; +const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; int64_t total_sectors, nb_sectors, sector_num, bs_offset; @@ -608,10 +643,11 @@ static int img_convert(int argc, char **argv) fmt = NULL; out_fmt = raw; +cache = BDRV_DEFAULT_CACHE; out_baseimg = NULL; compress = 0; for(;;) { -c = getopt(argc, argv, f:O:B:s:hce6o:p); +c = getopt(argc, argv, f:O:B:s:hce6o:pt:); if (c == -1) { break; } @@ -649,6 +685,9 @@ static int img_convert(int argc, char **argv) case 'p': progress = 1; break; +case 't': +cache = optarg; +break; } } @@ -779,8 +818,14 @@ static int img_convert(int argc, char **argv) goto out; } -out_bs = bdrv_new_open(out_filename, out_fmt, -BDRV_O_FLAGS | BDRV_O_RDWR | BDRV_O_NO_FLUSH); +flags = BDRV_O_RDWR; +ret = set_cache_flag(cache, flags); +if (ret 0) { +error_report(Invalid cache option: %s\n, cache); +
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
On Wed, Jun 15, 2011 at 5:33 PM, Stefan Hajnoczi stefa...@gmail.com wrote: disable caching? Image geometry caching. I meant If I call bdrv_get_geometry() every time I need image geometry instead of obtaining it from bs_geometry variable. -- Дмитрий Конищев (Dmitry Konishchev) mailto:konishc...@gmail.com
Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()
2011/6/15 Dmitry Konishchev konishc...@gmail.com: On Wed, Jun 15, 2011 at 5:33 PM, Stefan Hajnoczi stefa...@gmail.com wrote: disable caching? Image geometry caching. I meant If I call bdrv_get_geometry() every time I need image geometry instead of obtaining it from bs_geometry variable. Haha, sorry. Too much caching: -drive cache=none, total_sectors cached value, bdrv_get_geometry() cached values :). Anyway, bdrv_getlength() will return the total_sectors value instead of calling into raw-posix.c .bdrv_getlength(). That's why it should be cheap. Stefan
[Qemu-devel] [PATCHv2] qemu-img: Add cache command line option
qemu-img currently writes disk images using writeback and filling up the cache buffers which are then flushed by the kernel preventing other processes from accessing the storage. This is particularly bad in cluster environments where time-based algorithms might be in place and accessing the storage within certain timeouts is critical. This patch adds the option to choose a cache method when writing disk images. Signed-off-by: Federico Simoncelli fsimo...@redhat.com --- qemu-img.c | 80 ++- 1 files changed, 67 insertions(+), 13 deletions(-) diff --git a/qemu-img.c b/qemu-img.c index 4f162d1..c2a106e 100644 --- a/qemu-img.c +++ b/qemu-img.c @@ -40,6 +40,7 @@ typedef struct img_cmd_t { /* Default to cache=writeback as data integrity is not important for qemu-tcg. */ #define BDRV_O_FLAGS BDRV_O_CACHE_WB +#define BDRV_DEFAULT_CACHE writeback static void format_print(void *opaque, const char *name) { @@ -64,6 +65,8 @@ static void help(void) Command parameters:\n 'filename' is a disk image filename\n 'fmt' is the disk image format. It is guessed automatically in most cases\n + 'cache' is the cache mode used to write the output disk image, the valid\n + options are: 'none', 'writeback' (default), 'writethrough' and 'unsafe'\n 'size' is the disk image size in bytes. Optional suffixes\n 'k' or 'K' (kilobyte, 1024), 'M' (megabyte, 1024k), 'G' (gigabyte, 1024M)\n and T (terabyte, 1024G) are supported. 'b' is ignored.\n @@ -180,6 +183,27 @@ static int read_password(char *buf, int buf_size) } #endif +static int set_cache_flag(const char *mode, int *flags) +{ +*flags = ~BDRV_O_CACHE_MASK; + +if (!strcmp(mode, none) || !strcmp(mode, off)) { +*flags |= BDRV_O_CACHE_WB; +*flags |= BDRV_O_NOCACHE; +} else if (!strcmp(mode, writeback)) { +*flags |= BDRV_O_CACHE_WB; +} else if (!strcmp(mode, unsafe)) { +*flags |= BDRV_O_CACHE_WB; +*flags |= BDRV_O_NO_FLUSH; +} else if (!strcmp(mode, writethrough)) { +/* this is the default */ +} else { +return -1; +} + +return 0; +} + static int print_block_option_help(const char *filename, const char *fmt) { BlockDriver *drv, *proto_drv; @@ -441,13 +465,14 @@ static int img_check(int argc, char **argv) static int img_commit(int argc, char **argv) { -int c, ret; -const char *filename, *fmt; +int c, ret, flags; +const char *filename, *fmt, *cache; BlockDriverState *bs; fmt = NULL; +cache = BDRV_DEFAULT_CACHE; for(;;) { -c = getopt(argc, argv, f:h); +c = getopt(argc, argv, f:ht:); if (c == -1) { break; } @@ -459,6 +484,9 @@ static int img_commit(int argc, char **argv) case 'f': fmt = optarg; break; +case 't': +cache = optarg; +break; } } if (optind = argc) { @@ -466,7 +494,14 @@ static int img_commit(int argc, char **argv) } filename = argv[optind++]; -bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR); +flags = BDRV_O_RDWR; +ret = set_cache_flag(cache, flags); +if (ret 0) { +error_report(Invalid cache option: %s\n, cache); +return -1; +} + +bs = bdrv_new_open(filename, fmt, flags); if (!bs) { return 1; } @@ -591,8 +626,8 @@ static int compare_sectors(const uint8_t *buf1, const uint8_t *buf2, int n, static int img_convert(int argc, char **argv) { int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors; -int progress = 0; -const char *fmt, *out_fmt, *out_baseimg, *out_filename; +int progress = 0, flags; +const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename; BlockDriver *drv, *proto_drv; BlockDriverState **bs = NULL, *out_bs = NULL; int64_t total_sectors, nb_sectors, sector_num, bs_offset; @@ -608,10 +643,11 @@ static int img_convert(int argc, char **argv) fmt = NULL; out_fmt = raw; +cache = BDRV_DEFAULT_CACHE; out_baseimg = NULL; compress = 0; for(;;) { -c = getopt(argc, argv, f:O:B:s:hce6o:p); +c = getopt(argc, argv, f:O:B:s:hce6o:pt:); if (c == -1) { break; } @@ -649,6 +685,9 @@ static int img_convert(int argc, char **argv) case 'p': progress = 1; break; +case 't': +cache = optarg; +break; } } @@ -779,8 +818,14 @@ static int img_convert(int argc, char **argv) goto out; } -out_bs = bdrv_new_open(out_filename, out_fmt, -BDRV_O_FLAGS | BDRV_O_RDWR | BDRV_O_NO_FLUSH); +flags = BDRV_O_RDWR; +ret = set_cache_flag(cache, flags); +if (ret 0) { +error_report(Invalid cache option: %s\n, cache); +
[Qemu-devel] [PATCH 04/14] vdi: Avoid direct AIO callback
bdrv_aio_* must not call the callback before returning to its caller. In vdi, this could happen in some error cases. This starts the real requests processing in a BH to avoid this situation. Signed-off-by: Kevin Wolf kw...@redhat.com --- block/vdi.c | 41 - 1 files changed, 36 insertions(+), 5 deletions(-) diff --git a/block/vdi.c b/block/vdi.c index 4c9e201..261cf9b 100644 --- a/block/vdi.c +++ b/block/vdi.c @@ -152,6 +152,7 @@ typedef struct { /* Buffer for new allocated block. */ void *block_buffer; void *orig_buf; +bool is_write; int header_modified; BlockDriverAIOCB *hd_aiocb; struct iovec hd_iov; @@ -504,6 +505,8 @@ static VdiAIOCB *vdi_aio_setup(BlockDriverState *bs, int64_t sector_num, acb-hd_aiocb = NULL; acb-sector_num = sector_num; acb-qiov = qiov; +acb-is_write = is_write; + if (qiov-niov 1) { acb-buf = qemu_blockalign(bs, qiov-size); acb-orig_buf = acb-buf; @@ -542,14 +545,20 @@ static int vdi_schedule_bh(QEMUBHFunc *cb, VdiAIOCB *acb) } static void vdi_aio_read_cb(void *opaque, int ret); +static void vdi_aio_write_cb(void *opaque, int ret); -static void vdi_aio_read_bh(void *opaque) +static void vdi_aio_rw_bh(void *opaque) { VdiAIOCB *acb = opaque; logout(\n); qemu_bh_delete(acb-bh); acb-bh = NULL; -vdi_aio_read_cb(opaque, 0); + +if (acb-is_write) { +vdi_aio_write_cb(opaque, 0); +} else { +vdi_aio_read_cb(opaque, 0); +} } static void vdi_aio_read_cb(void *opaque, int ret) @@ -597,7 +606,7 @@ static void vdi_aio_read_cb(void *opaque, int ret) if (bmap_entry == VDI_UNALLOCATED) { /* Block not allocated, return zeros, no need to wait. */ memset(acb-buf, 0, n_sectors * SECTOR_SIZE); -ret = vdi_schedule_bh(vdi_aio_read_bh, acb); +ret = vdi_schedule_bh(vdi_aio_rw_bh, acb); if (ret 0) { goto done; } @@ -630,12 +639,23 @@ static BlockDriverAIOCB *vdi_aio_readv(BlockDriverState *bs, BlockDriverCompletionFunc *cb, void *opaque) { VdiAIOCB *acb; +int ret; + logout(\n); acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0); if (!acb) { return NULL; } -vdi_aio_read_cb(acb, 0); + +ret = vdi_schedule_bh(vdi_aio_rw_bh, acb); +if (ret 0) { +if (acb-qiov-niov 1) { +qemu_vfree(acb-orig_buf); +} +qemu_aio_release(acb); +return NULL; +} + return acb-common; } @@ -789,12 +809,23 @@ static BlockDriverAIOCB *vdi_aio_writev(BlockDriverState *bs, BlockDriverCompletionFunc *cb, void *opaque) { VdiAIOCB *acb; +int ret; + logout(\n); acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1); if (!acb) { return NULL; } -vdi_aio_write_cb(acb, 0); + +ret = vdi_schedule_bh(vdi_aio_rw_bh, acb); +if (ret 0) { +if (acb-qiov-niov 1) { +qemu_vfree(acb-orig_buf); +} +qemu_aio_release(acb); +return NULL; +} + return acb-common; } -- 1.7.5.2
[Qemu-devel] [PATCH 09/14] ide: Add forgotten VMSTATE_END_OF_LIST in subsection
Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/core.c |1 + 1 files changed, 1 insertions(+), 0 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index e5def8b..399b74c 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1864,6 +1864,7 @@ const VMStateDescription vmstate_ide_atapi_gesn_state = { .fields = (VMStateField []) { VMSTATE_BOOL(events.new_media, IDEState), VMSTATE_BOOL(events.eject_request, IDEState), +VMSTATE_END_OF_LIST() } }; -- 1.7.5.2
[Qemu-devel] [PATCH 06/14] qcow2: Fix in-flight list after qcow2_cache_put failure
If qcow2_cache_put returns an error during cluster allocation and the allocation fails, it must be removed from the list of in-flight allocations. Otherwise we'd get a loop in the list when the ACB is used for the next allocation. Luckily, this qcow2_cache_put shouldn't fail anyway because the L2 table is only read, so that qcow2_cache_put doesn't even involve I/O. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Christoph Hellwig h...@lst.de --- block/qcow2-cluster.c | 12 1 files changed, 8 insertions(+), 4 deletions(-) diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c index c9e7bbd..882f50a 100644 --- a/block/qcow2-cluster.c +++ b/block/qcow2-cluster.c @@ -796,8 +796,8 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset, m-depends_on = old_alloc; m-nb_clusters = 0; *num = 0; -ret = 0; -goto fail; + +goto out_wait_dependency; } } } @@ -812,7 +812,6 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset, cluster_offset = qcow2_alloc_clusters(bs, nb_clusters * s-cluster_size); if (cluster_offset 0) { -QLIST_REMOVE(m, next_in_flight); ret = cluster_offset; goto fail; } @@ -825,7 +824,7 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, uint64_t offset, out: ret = qcow2_cache_put(bs, s-l2_table_cache, (void**) l2_table); if (ret 0) { -return ret; +goto fail_put; } m-nb_available = MIN(nb_clusters (s-cluster_bits - 9), n_end); @@ -835,8 +834,13 @@ out: return 0; +out_wait_dependency: +return qcow2_cache_put(bs, s-l2_table_cache, (void**) l2_table); + fail: qcow2_cache_put(bs, s-l2_table_cache, (void**) l2_table); +fail_put: +QLIST_REMOVE(m, next_in_flight); return ret; } -- 1.7.5.2
[Qemu-devel] [PATCH 01/14] block/rbd: Remove unused local variable
From: Stefan Weil w...@mail.berlios.de Variable 'snap' is assigned a value that is never used. Remove snap and the related code. Cc: Christian Brunner c...@muc.de Cc: Josh Durgin josh.dur...@dreamhost.com Cc: Kevin Wolf kw...@redhat.com Signed-off-by: Stefan Weil w...@mail.berlios.de Reviewed-by: Josh Durgin josh.dur...@dreamhost.com Signed-off-by: Kevin Wolf kw...@redhat.com --- block/rbd.c |4 1 files changed, 0 insertions(+), 4 deletions(-) diff --git a/block/rbd.c b/block/rbd.c index bdc448a..d5659cd 100644 --- a/block/rbd.c +++ b/block/rbd.c @@ -227,7 +227,6 @@ static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options) char name[RBD_MAX_IMAGE_NAME_SIZE]; char snap_buf[RBD_MAX_SNAP_NAME_SIZE]; char conf[RBD_MAX_CONF_SIZE]; -char *snap = NULL; rados_t cluster; rados_ioctx_t io_ctx; int ret; @@ -238,9 +237,6 @@ static int qemu_rbd_create(const char *filename, QEMUOptionParameter *options) conf, sizeof(conf)) 0) { return -EINVAL; } -if (snap_buf[0] != '\0') { -snap = snap_buf; -} /* Read out options */ while (options options-name) { -- 1.7.5.2
[Qemu-devel] [PATCH 02/14] qcow2: Avoid direct AIO callback
bdrv_aio_* must not call the callback before returning to its caller. In qcow2, this could happen in some error cases. This starts the real requests processing in a BH to avoid this situation. Signed-off-by: Kevin Wolf kw...@redhat.com --- block/qcow2.c | 39 ++- 1 files changed, 30 insertions(+), 9 deletions(-) diff --git a/block/qcow2.c b/block/qcow2.c index 8451ded..2c51e7c 100644 --- a/block/qcow2.c +++ b/block/qcow2.c @@ -378,6 +378,7 @@ typedef struct QCowAIOCB { uint64_t bytes_done; uint64_t cluster_offset; uint8_t *cluster_data; +bool is_write; BlockDriverAIOCB *hd_aiocb; QEMUIOVector hd_qiov; QEMUBH *bh; @@ -399,12 +400,19 @@ static AIOPool qcow2_aio_pool = { }; static void qcow2_aio_read_cb(void *opaque, int ret); -static void qcow2_aio_read_bh(void *opaque) +static void qcow2_aio_write_cb(void *opaque, int ret); + +static void qcow2_aio_rw_bh(void *opaque) { QCowAIOCB *acb = opaque; qemu_bh_delete(acb-bh); acb-bh = NULL; -qcow2_aio_read_cb(opaque, 0); + +if (acb-is_write) { +qcow2_aio_write_cb(opaque, 0); +} else { +qcow2_aio_read_cb(opaque, 0); +} } static int qcow2_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb) @@ -493,14 +501,14 @@ static void qcow2_aio_read_cb(void *opaque, int ret) goto done; } } else { -ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb); +ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb); if (ret 0) goto done; } } else { /* Note: in this case, no need to wait */ qemu_iovec_memset(acb-hd_qiov, 0, 512 * acb-cur_nr_sectors); -ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb); +ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb); if (ret 0) goto done; } @@ -515,7 +523,7 @@ static void qcow2_aio_read_cb(void *opaque, int ret) s-cluster_cache + index_in_cluster * 512, 512 * acb-cur_nr_sectors); -ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb); +ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb); if (ret 0) goto done; } else { @@ -572,6 +580,7 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, int64_t sector_num, acb-hd_aiocb = NULL; acb-sector_num = sector_num; acb-qiov = qiov; +acb-is_write = is_write; qemu_iovec_init(acb-hd_qiov, qiov-niov); @@ -591,17 +600,22 @@ static BlockDriverAIOCB *qcow2_aio_readv(BlockDriverState *bs, void *opaque) { QCowAIOCB *acb; +int ret; acb = qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0); if (!acb) return NULL; -qcow2_aio_read_cb(acb, 0); +ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb); +if (ret 0) { +qemu_iovec_destroy(acb-hd_qiov); +qemu_aio_release(acb); +return NULL; +} + return acb-common; } -static void qcow2_aio_write_cb(void *opaque, int ret); - static void run_dependent_requests(QCowL2Meta *m) { QCowAIOCB *req; @@ -724,6 +738,7 @@ static BlockDriverAIOCB *qcow2_aio_writev(BlockDriverState *bs, { BDRVQcowState *s = bs-opaque; QCowAIOCB *acb; +int ret; s-cluster_cache_offset = -1; /* disable compressed cache */ @@ -731,7 +746,13 @@ static BlockDriverAIOCB *qcow2_aio_writev(BlockDriverState *bs, if (!acb) return NULL; -qcow2_aio_write_cb(acb, 0); +ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb); +if (ret 0) { +qemu_iovec_destroy(acb-hd_qiov); +qemu_aio_release(acb); +return NULL; +} + return acb-common; } -- 1.7.5.2
[Qemu-devel] [PATCH 03/14] qcow: Avoid direct AIO callback
bdrv_aio_* must not call the callback before returning to its caller. In qcow, this could happen in some error cases. This starts the real requests processing in a BH to avoid this situation. Signed-off-by: Kevin Wolf kw...@redhat.com --- block/qcow.c | 58 -- 1 files changed, 56 insertions(+), 2 deletions(-) diff --git a/block/qcow.c b/block/qcow.c index a26c886..227b104 100644 --- a/block/qcow.c +++ b/block/qcow.c @@ -496,6 +496,8 @@ typedef struct QCowAIOCB { uint64_t cluster_offset; uint8_t *cluster_data; struct iovec hd_iov; +bool is_write; +QEMUBH *bh; QEMUIOVector hd_qiov; BlockDriverAIOCB *hd_aiocb; } QCowAIOCB; @@ -525,6 +527,8 @@ static QCowAIOCB *qcow_aio_setup(BlockDriverState *bs, acb-hd_aiocb = NULL; acb-sector_num = sector_num; acb-qiov = qiov; +acb-is_write = is_write; + if (qiov-niov 1) { acb-buf = acb-orig_buf = qemu_blockalign(bs, qiov-size); if (is_write) @@ -538,6 +542,38 @@ static QCowAIOCB *qcow_aio_setup(BlockDriverState *bs, return acb; } +static void qcow_aio_read_cb(void *opaque, int ret); +static void qcow_aio_write_cb(void *opaque, int ret); + +static void qcow_aio_rw_bh(void *opaque) +{ +QCowAIOCB *acb = opaque; +qemu_bh_delete(acb-bh); +acb-bh = NULL; + +if (acb-is_write) { +qcow_aio_write_cb(opaque, 0); +} else { +qcow_aio_read_cb(opaque, 0); +} +} + +static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb) +{ +if (acb-bh) { +return -EIO; +} + +acb-bh = qemu_bh_new(cb, acb); +if (!acb-bh) { +return -EIO; +} + +qemu_bh_schedule(acb-bh); + +return 0; +} + static void qcow_aio_read_cb(void *opaque, int ret) { QCowAIOCB *acb = opaque; @@ -640,12 +676,21 @@ static BlockDriverAIOCB *qcow_aio_readv(BlockDriverState *bs, BlockDriverCompletionFunc *cb, void *opaque) { QCowAIOCB *acb; +int ret; acb = qcow_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0); if (!acb) return NULL; -qcow_aio_read_cb(acb, 0); +ret = qcow_schedule_bh(qcow_aio_rw_bh, acb); +if (ret 0) { +if (acb-qiov-niov 1) { +qemu_vfree(acb-orig_buf); +} +qemu_aio_release(acb); +return NULL; +} + return acb-common; } @@ -725,6 +770,7 @@ static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs, { BDRVQcowState *s = bs-opaque; QCowAIOCB *acb; +int ret; s-cluster_cache_offset = -1; /* disable compressed cache */ @@ -733,7 +779,15 @@ static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState *bs, return NULL; -qcow_aio_write_cb(acb, 0); +ret = qcow_schedule_bh(qcow_aio_rw_bh, acb); +if (ret 0) { +if (acb-qiov-niov 1) { +qemu_vfree(acb-orig_buf); +} +qemu_aio_release(acb); +return NULL; +} + return acb-common; } -- 1.7.5.2
[Qemu-devel] [PATCH 05/14] Replaced tabs with spaces in block.h and block_int.h
From: Devin Nakamura devin...@gmail.com Signed-off-by: Devin Nakamura devin...@gmail.com Signed-off-by: Kevin Wolf kw...@redhat.com --- block.h |6 +++--- block_int.h |4 ++-- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/block.h b/block.h index da7d39c..859d1d9 100644 --- a/block.h +++ b/block.h @@ -110,7 +110,7 @@ int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res); typedef struct BlockDriverAIOCB BlockDriverAIOCB; typedef void BlockDriverCompletionFunc(void *opaque, int ret); typedef void BlockDriverDirtyHandler(BlockDriverState *bs, int64_t sector, -int sector_num); + int sector_num); BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *iov, int nb_sectors, BlockDriverCompletionFunc *cb, void *opaque); @@ -118,7 +118,7 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, int64_t sector_num, QEMUIOVector *iov, int nb_sectors, BlockDriverCompletionFunc *cb, void *opaque); BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs, -BlockDriverCompletionFunc *cb, void *opaque); + BlockDriverCompletionFunc *cb, void *opaque); void bdrv_aio_cancel(BlockDriverAIOCB *acb); typedef struct BlockRequest { @@ -150,7 +150,7 @@ void bdrv_close_all(void); int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors); int bdrv_has_zero_init(BlockDriverState *bs); int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors, - int *pnum); + int *pnum); #define BIOS_ATA_TRANSLATION_AUTO 0 #define BIOS_ATA_TRANSLATION_NONE 1 diff --git a/block_int.h b/block_int.h index fa91337..1e265d2 100644 --- a/block_int.h +++ b/block_int.h @@ -203,8 +203,8 @@ struct BlockDriverState { void *private; }; -#define CHANGE_MEDIA 0x01 -#define CHANGE_SIZE0x02 +#define CHANGE_MEDIA0x01 +#define CHANGE_SIZE 0x02 struct BlockDriverAIOCB { AIOPool *pool; -- 1.7.5.2
[Qemu-devel] [PULL 00/14] Block patches
The following changes since commit 0b862cedf36d927818c50584ddd611b0370673df: configure: Detect and don't try to use older libcurl (2011-06-13 21:16:27 +0200) are available in the git repository at: git://repo.or.cz/qemu/kevin.git for-anthony Christoph Hellwig (3): make dma_bdrv_io available to drivers ide: allow other dma comands than read and write ide: add TRIM support Devin Nakamura (1): Replaced tabs with spaces in block.h and block_int.h Kevin Wolf (9): qcow2: Avoid direct AIO callback qcow: Avoid direct AIO callback vdi: Avoid direct AIO callback qcow2: Fix in-flight list after qcow2_cache_put failure ide: Split error status from status register ide: Fix ide_drive_pio_state_needed() ide: Add forgotten VMSTATE_END_OF_LIST in subsection ide: Clear error_status after restarting flush Allow nested qemu_bh_poll() after BH deletion Stefan Weil (1): block/rbd: Remove unused local variable async.c |5 +- block.h |6 +- block/qcow.c | 58 ++- block/qcow2-cluster.c | 12 +++- block/qcow2.c | 39 ++--- block/rbd.c |4 - block/vdi.c | 41 +++-- block_int.h |4 +- dma-helpers.c | 23 +++ dma.h |8 +++ hw/ide/core.c | 154 - hw/ide/internal.h | 32 ++- hw/ide/macio.c| 13 - hw/ide/pci.c | 88 +--- hw/ide/pci.h |4 + hw/ide/qdev.c |5 ++ 16 files changed, 423 insertions(+), 73 deletions(-)
[Qemu-devel] [PATCH 08/14] ide: Fix ide_drive_pio_state_needed()
When a failed PIO request caused the VM to stop, we still need to transfer the PIO state even though DRQ=0 at this point. Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/core.c |3 ++- 1 files changed, 2 insertions(+), 1 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index da250ac..e5def8b 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -1837,7 +1837,8 @@ static bool ide_drive_pio_state_needed(void *opaque) { IDEState *s = opaque; -return (s-status DRQ_STAT) != 0; +return ((s-status DRQ_STAT) != 0) +|| (s-bus-error_status BM_STATUS_PIO_RETRY); } static bool ide_atapi_gesn_needed(void *opaque) -- 1.7.5.2
[Qemu-devel] [PATCH 11/14] ide: allow other dma comands than read and write
From: Christoph Hellwig h...@lst.de Replace the is_read flag with a dma_cmd flag to allow the dma and restart logic to handler other commands like TRIM. Signed-off-by: Christoph Hellwig h...@lst.de Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/core.c | 25 ++--- hw/ide/internal.h | 10 +- hw/ide/macio.c|9 +++-- hw/ide/pci.c |6 +++--- 4 files changed, 33 insertions(+), 17 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 399b74c..14bda82 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -472,7 +472,7 @@ handle_rw_error: if (ret 0) { int op = BM_STATUS_DMA_RETRY; -if (s-is_read) +if (s-dma_cmd == IDE_DMA_READ) op |= BM_STATUS_RETRY_READ; if (ide_handle_rw_error(s, -ret, op)) { return; @@ -482,7 +482,7 @@ handle_rw_error: n = s-io_buffer_size 9; sector_num = ide_get_sector(s); if (n 0) { -dma_buf_commit(s, s-is_read); +dma_buf_commit(s, ide_cmd_is_read(s)); sector_num += n; ide_set_sector(s, sector_num); s-nsector -= n; @@ -499,23 +499,26 @@ handle_rw_error: n = s-nsector; s-io_buffer_index = 0; s-io_buffer_size = n * 512; -if (s-bus-dma-ops-prepare_buf(s-bus-dma, s-is_read) == 0) { +if (s-bus-dma-ops-prepare_buf(s-bus-dma, ide_cmd_is_read(s)) == 0) { /* The PRDs were too short. Reset the Active bit, but don't raise an * interrupt. */ goto eot; } #ifdef DEBUG_AIO -printf(ide_dma_cb: sector_num=% PRId64 n=%d, is_read=%d\n, - sector_num, n, s-is_read); +printf(ide_dma_cb: sector_num=% PRId64 n=%d, cmd_cmd=%d\n, + sector_num, n, s-dma_cmd); #endif -if (s-is_read) { +switch (s-dma_cmd) { +case IDE_DMA_READ: s-bus-dma-aiocb = dma_bdrv_read(s-bs, s-sg, sector_num, ide_dma_cb, s); -} else { +break; +case IDE_DMA_WRITE: s-bus-dma-aiocb = dma_bdrv_write(s-bs, s-sg, sector_num, ide_dma_cb, s); +break; } if (!s-bus-dma-aiocb) { @@ -528,12 +531,12 @@ eot: ide_set_inactive(s); } -static void ide_sector_start_dma(IDEState *s, int is_read) +static void ide_sector_start_dma(IDEState *s, enum ide_dma_cmd dma_cmd) { s-status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT; s-io_buffer_index = 0; s-io_buffer_size = 0; -s-is_read = is_read; +s-dma_cmd = dma_cmd; s-bus-dma-ops-start_dma(s-bus-dma, s, ide_dma_cb); } @@ -916,7 +919,7 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val) if (!s-bs) goto abort_cmd; ide_cmd_lba48_transform(s, lba48); -ide_sector_start_dma(s, 1); +ide_sector_start_dma(s, IDE_DMA_READ); break; case WIN_WRITEDMA_EXT: lba48 = 1; @@ -925,7 +928,7 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val) if (!s-bs) goto abort_cmd; ide_cmd_lba48_transform(s, lba48); -ide_sector_start_dma(s, 0); +ide_sector_start_dma(s, IDE_DMA_WRITE); s-media_changed = 1; break; case WIN_READ_NATIVE_MAX_EXT: diff --git a/hw/ide/internal.h b/hw/ide/internal.h index 8d18cc3..ea3edf5 100644 --- a/hw/ide/internal.h +++ b/hw/ide/internal.h @@ -379,6 +379,14 @@ struct unreported_events { bool new_media; }; +enum ide_dma_cmd { +IDE_DMA_READ, +IDE_DMA_WRITE, +}; + +#define ide_cmd_is_read(s) \ + ((s)-dma_cmd == IDE_DMA_READ) + /* NOTE: IDEState represents in fact one drive */ struct IDEState { IDEBus *bus; @@ -446,7 +454,7 @@ struct IDEState { uint32_t mdata_size; uint8_t *mdata_storage; int media_changed; -int is_read; +enum ide_dma_cmd dma_cmd; /* SMART */ uint8_t smart_enabled; uint8_t smart_autosave; diff --git a/hw/ide/macio.c b/hw/ide/macio.c index 7107f6b..099b8cb 100644 --- a/hw/ide/macio.c +++ b/hw/ide/macio.c @@ -145,12 +145,17 @@ static void pmac_ide_transfer_cb(void *opaque, int ret) io-addr += io-len; io-len = 0; -if (s-is_read) +switch (s-dma_cmd) { +case IDE_DMA_READ: m-aiocb = dma_bdrv_read(s-bs, s-sg, sector_num, pmac_ide_transfer_cb, io); -else +break; +case IDE_DMA_WRITE: m-aiocb = dma_bdrv_write(s-bs, s-sg, sector_num, pmac_ide_transfer_cb, io); +break; +} + if (!m-aiocb) pmac_ide_transfer_cb(io, -1); } diff --git a/hw/ide/pci.c b/hw/ide/pci.c index 7fa32bd..80c5794 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -169,7 +169,7 @@ static int bmdma_set_inactive(IDEDMA *dma) return 0; } -static void bmdma_restart_dma(BMDMAState *bm, int is_read) +static void bmdma_restart_dma(BMDMAState *bm, enum ide_dma_cmd dma_cmd) { IDEState *s = bmdma_active_if(bm); @@ -177,7 +177,7 @@
[Qemu-devel] [PATCH 07/14] ide: Split error status from status register
When adding the werror=stop mode, some flags were added to s-status which are used to determine what kind of operation should be restarted when the VM is continued. Unfortunately, it turns out that s-status is in fact a device register and as such is visible to the guest (some of the abused bits are even writable for the guest). For migration we keep on using the old VMState field (renamed to migration_compat_status) if the status register doesn't use any of the previously abused bits. If it does, we use a subsection with a clean copy of the status register. The error status is always sent in a subsection if there is any error. It can't use the old field because errors happen even without PCI. Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/core.c | 28 +++- hw/ide/internal.h |8 ++ hw/ide/pci.c | 73 +++- hw/ide/pci.h |4 +++ 4 files changed, 105 insertions(+), 8 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 95beb17..da250ac 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -446,7 +446,7 @@ static int ide_handle_rw_error(IDEState *s, int error, int op) if ((error == ENOSPC action == BLOCK_ERR_STOP_ENOSPC) || action == BLOCK_ERR_STOP_ANY) { s-bus-dma-ops-set_unit(s-bus-dma, s-unit); -s-bus-dma-ops-add_status(s-bus-dma, op); +s-bus-error_status = op; bdrv_mon_event(s-bs, BDRV_ACTION_STOP, is_read); vm_stop(VMSTOP_DISKFULL); } else { @@ -1847,6 +1847,13 @@ static bool ide_atapi_gesn_needed(void *opaque) return s-events.new_media || s-events.eject_request; } +static bool ide_error_needed(void *opaque) +{ +IDEBus *bus = opaque; + +return (bus-error_status != 0); +} + /* Fields for GET_EVENT_STATUS_NOTIFICATION ATAPI command */ const VMStateDescription vmstate_ide_atapi_gesn_state = { .name =ide_drive/atapi/gesn_state, @@ -1921,6 +1928,17 @@ const VMStateDescription vmstate_ide_drive = { } }; +const VMStateDescription vmstate_ide_error_status = { +.name =ide_bus/error, +.version_id = 1, +.minimum_version_id = 1, +.minimum_version_id_old = 1, +.fields = (VMStateField []) { +VMSTATE_INT32(error_status, IDEBus), +VMSTATE_END_OF_LIST() +} +}; + const VMStateDescription vmstate_ide_bus = { .name = ide_bus, .version_id = 1, @@ -1930,6 +1948,14 @@ const VMStateDescription vmstate_ide_bus = { VMSTATE_UINT8(cmd, IDEBus), VMSTATE_UINT8(unit, IDEBus), VMSTATE_END_OF_LIST() +}, +.subsections = (VMStateSubsection []) { +{ +.vmsd = vmstate_ide_error_status, +.needed = ide_error_needed, +}, { +/* empty */ +} } }; diff --git a/hw/ide/internal.h b/hw/ide/internal.h index c2b35ec..8d18cc3 100644 --- a/hw/ide/internal.h +++ b/hw/ide/internal.h @@ -486,6 +486,8 @@ struct IDEBus { uint8_t unit; uint8_t cmd; qemu_irq irq; + +int error_status; }; struct IDEDevice { @@ -505,11 +507,17 @@ struct IDEDeviceInfo { #define BM_STATUS_DMAING 0x01 #define BM_STATUS_ERROR 0x02 #define BM_STATUS_INT0x04 + +/* FIXME These are not status register bits */ #define BM_STATUS_DMA_RETRY 0x08 #define BM_STATUS_PIO_RETRY 0x10 #define BM_STATUS_RETRY_READ 0x20 #define BM_STATUS_RETRY_FLUSH 0x40 +#define BM_MIGRATION_COMPAT_STATUS_BITS \ +(BM_STATUS_DMA_RETRY | BM_STATUS_PIO_RETRY | \ +BM_STATUS_RETRY_READ | BM_STATUS_RETRY_FLUSH) + #define BM_CMD_START 0x01 #define BM_CMD_READ 0x08 diff --git a/hw/ide/pci.c b/hw/ide/pci.c index a4726ad..7fa32bd 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -183,27 +183,33 @@ static void bmdma_restart_dma(BMDMAState *bm, int is_read) bmdma_start_dma(bm-dma, s, bm-dma_cb); } +/* TODO This should be common IDE code */ static void bmdma_restart_bh(void *opaque) { BMDMAState *bm = opaque; +IDEBus *bus = bm-bus; int is_read; qemu_bh_delete(bm-bh); bm-bh = NULL; -is_read = !!(bm-status BM_STATUS_RETRY_READ); +if (bm-unit == (uint8_t) -1) { +return; +} -if (bm-status BM_STATUS_DMA_RETRY) { -bm-status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ); +is_read = !!(bus-error_status BM_STATUS_RETRY_READ); + +if (bus-error_status BM_STATUS_DMA_RETRY) { +bus-error_status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ); bmdma_restart_dma(bm, is_read); -} else if (bm-status BM_STATUS_PIO_RETRY) { -bm-status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ); +} else if (bus-error_status BM_STATUS_PIO_RETRY) { +bus-error_status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ); if (is_read) { ide_sector_read(bmdma_active_if(bm)); } else { ide_sector_write(bmdma_active_if(bm)); } -} else if (bm-status BM_STATUS_RETRY_FLUSH)
Re: [Qemu-devel] [PATCH] Fix signal handling of SIG_IPI when io-thread is enabled
Hi Jan, Why? Ahh, because of qemu_cpu_kick_self: raise(SIG_IPI)! That should generate a per-process SIG_IPI. And that may not only affect Darwin. Looks good. Actually, with io-thread enabled, it goes through qemu_cpu_kick_self() - qemu_cpu_kick_thread() - pthread_kill(..., SIG_IPI). I think the problem is with sigwait(). It doesn't state so in the Linux or Darwin man pages, but on Solaris, it says : All signals identified by the set argument must be blocked on all threads, including the calling thread; otherwise, sigwait() might not work correctly, which might correspond to the issue I've been witnessing (ie: sigwait() unblocking once in a while on a SIGUSR1 (SIG_IPI) in the event thread). In any case, I don't think it should attempt to catch this signal at all since the cpu thread is already catching it. Alexandre
[Qemu-devel] [PATCH 13/14] ide: Clear error_status after restarting flush
Clearing the error status flag was missing for restarting flushes. Now that the error status is separate from the BM status register, we can simply set it to 0 after restarting the request. This ensures that we never forget to clear a bit. Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/pci.c | 18 +++--- 1 files changed, 11 insertions(+), 7 deletions(-) diff --git a/hw/ide/pci.c b/hw/ide/pci.c index c940375..9f3050a 100644 --- a/hw/ide/pci.c +++ b/hw/ide/pci.c @@ -189,6 +189,7 @@ static void bmdma_restart_bh(void *opaque) BMDMAState *bm = opaque; IDEBus *bus = bm-bus; int is_read; +int error_status; qemu_bh_delete(bm-bh); bm-bh = NULL; @@ -199,22 +200,25 @@ static void bmdma_restart_bh(void *opaque) is_read = !!(bus-error_status BM_STATUS_RETRY_READ); -if (bus-error_status BM_STATUS_DMA_RETRY) { -if (bus-error_status BM_STATUS_RETRY_TRIM) { -bus-error_status = ~BM_STATUS_RETRY_TRIM; +/* The error status must be cleared before resubmitting the request: The + * request may fail again, and this case can only be distinguished if the + * called function can set a new error status. */ +error_status = bus-error_status; +bus-error_status = 0; + +if (error_status BM_STATUS_DMA_RETRY) { +if (error_status BM_STATUS_RETRY_TRIM) { bmdma_restart_dma(bm, IDE_DMA_TRIM); } else { -bus-error_status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ); bmdma_restart_dma(bm, is_read ? IDE_DMA_READ : IDE_DMA_WRITE); } -} else if (bus-error_status BM_STATUS_PIO_RETRY) { -bus-error_status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ); +} else if (error_status BM_STATUS_PIO_RETRY) { if (is_read) { ide_sector_read(bmdma_active_if(bm)); } else { ide_sector_write(bmdma_active_if(bm)); } -} else if (bus-error_status BM_STATUS_RETRY_FLUSH) { +} else if (error_status BM_STATUS_RETRY_FLUSH) { ide_flush_cache(bmdma_active_if(bm)); } } -- 1.7.5.2
[Qemu-devel] [PATCH 10/14] make dma_bdrv_io available to drivers
From: Christoph Hellwig h...@lst.de Make dma_bdrv_io available for drivers, and pass an explicit I/O function instead of hardcoding bdrv_aio_readv/bdrv_aio_writev. This is required to implement non-READ/WRITE dma commands in the ide driver, e.g. the upcoming TRIM support. Signed-off-by: Christoph Hellwig h...@lst.de Signed-off-by: Kevin Wolf kw...@redhat.com --- dma-helpers.c | 23 ++- dma.h |8 2 files changed, 18 insertions(+), 13 deletions(-) diff --git a/dma-helpers.c b/dma-helpers.c index 712ed89..ba7f897 100644 --- a/dma-helpers.c +++ b/dma-helpers.c @@ -47,6 +47,7 @@ typedef struct { target_phys_addr_t sg_cur_byte; QEMUIOVector iov; QEMUBH *bh; +DMAIOFunc *io_func; } DMAAIOCB; static void dma_bdrv_cb(void *opaque, int ret); @@ -116,13 +117,8 @@ static void dma_bdrv_cb(void *opaque, int ret) return; } -if (dbs-is_write) { -dbs-acb = bdrv_aio_writev(dbs-bs, dbs-sector_num, dbs-iov, - dbs-iov.size / 512, dma_bdrv_cb, dbs); -} else { -dbs-acb = bdrv_aio_readv(dbs-bs, dbs-sector_num, dbs-iov, - dbs-iov.size / 512, dma_bdrv_cb, dbs); -} +dbs-acb = dbs-io_func(dbs-bs, dbs-sector_num, dbs-iov, +dbs-iov.size / 512, dma_bdrv_cb, dbs); if (!dbs-acb) { dma_bdrv_unmap(dbs); qemu_iovec_destroy(dbs-iov); @@ -144,12 +140,12 @@ static AIOPool dma_aio_pool = { .cancel = dma_aio_cancel, }; -static BlockDriverAIOCB *dma_bdrv_io( +BlockDriverAIOCB *dma_bdrv_io( BlockDriverState *bs, QEMUSGList *sg, uint64_t sector_num, -BlockDriverCompletionFunc *cb, void *opaque, -int is_write) +DMAIOFunc *io_func, BlockDriverCompletionFunc *cb, +void *opaque, int is_write) { -DMAAIOCB *dbs = qemu_aio_get(dma_aio_pool, bs, cb, opaque); +DMAAIOCB *dbs = qemu_aio_get(dma_aio_pool, bs, cb, opaque); dbs-acb = NULL; dbs-bs = bs; @@ -158,6 +154,7 @@ static BlockDriverAIOCB *dma_bdrv_io( dbs-sg_cur_index = 0; dbs-sg_cur_byte = 0; dbs-is_write = is_write; +dbs-io_func = io_func; dbs-bh = NULL; qemu_iovec_init(dbs-iov, sg-nsg); dma_bdrv_cb(dbs, 0); @@ -173,12 +170,12 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs, QEMUSGList *sg, uint64_t sector, void (*cb)(void *opaque, int ret), void *opaque) { -return dma_bdrv_io(bs, sg, sector, cb, opaque, 0); +return dma_bdrv_io(bs, sg, sector, bdrv_aio_readv, cb, opaque, 0); } BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs, QEMUSGList *sg, uint64_t sector, void (*cb)(void *opaque, int ret), void *opaque) { -return dma_bdrv_io(bs, sg, sector, cb, opaque, 1); +return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, 1); } diff --git a/dma.h b/dma.h index f3bb275..3d8324b 100644 --- a/dma.h +++ b/dma.h @@ -32,6 +32,14 @@ void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t base, target_phys_addr_t len); void qemu_sglist_destroy(QEMUSGList *qsg); +typedef BlockDriverAIOCB *DMAIOFunc(BlockDriverState *bs, int64_t sector_num, + QEMUIOVector *iov, int nb_sectors, + BlockDriverCompletionFunc *cb, void *opaque); + +BlockDriverAIOCB *dma_bdrv_io(BlockDriverState *bs, + QEMUSGList *sg, uint64_t sector_num, + DMAIOFunc *io_func, BlockDriverCompletionFunc *cb, + void *opaque, int is_write); BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs, QEMUSGList *sg, uint64_t sector, BlockDriverCompletionFunc *cb, void *opaque); -- 1.7.5.2
Re: [Qemu-devel] [PULL 00/14] Block patches
On 06/15/2011 09:02 AM, Kevin Wolf wrote: The following changes since commit 0b862cedf36d927818c50584ddd611b0370673df: configure: Detect and don't try to use older libcurl (2011-06-13 21:16:27 +0200) are available in the git repository at: git://repo.or.cz/qemu/kevin.git for-anthony Pulled. Thanks. Regards, Anthony Liguori Christoph Hellwig (3): make dma_bdrv_io available to drivers ide: allow other dma comands than read and write ide: add TRIM support Devin Nakamura (1): Replaced tabs with spaces in block.h and block_int.h Kevin Wolf (9): qcow2: Avoid direct AIO callback qcow: Avoid direct AIO callback vdi: Avoid direct AIO callback qcow2: Fix in-flight list after qcow2_cache_put failure ide: Split error status from status register ide: Fix ide_drive_pio_state_needed() ide: Add forgotten VMSTATE_END_OF_LIST in subsection ide: Clear error_status after restarting flush Allow nested qemu_bh_poll() after BH deletion Stefan Weil (1): block/rbd: Remove unused local variable async.c |5 +- block.h |6 +- block/qcow.c | 58 ++- block/qcow2-cluster.c | 12 +++- block/qcow2.c | 39 ++--- block/rbd.c |4 - block/vdi.c | 41 +++-- block_int.h |4 +- dma-helpers.c | 23 +++ dma.h |8 +++ hw/ide/core.c | 154 - hw/ide/internal.h | 32 ++- hw/ide/macio.c| 13 - hw/ide/pci.c | 88 +--- hw/ide/pci.h |4 + hw/ide/qdev.c |5 ++ 16 files changed, 423 insertions(+), 73 deletions(-)
[Qemu-devel] [PATCH 12/14] ide: add TRIM support
From: Christoph Hellwig h...@lst.de Add support for TRIM sub function of the data set management command, and wire it up to the qemu discard infrastructure. Signed-off-by: Christoph Hellwig h...@lst.de Signed-off-by: Kevin Wolf kw...@redhat.com --- hw/ide/core.c | 97 +++- hw/ide/internal.h | 14 +++- hw/ide/macio.c|4 ++ hw/ide/pci.c |9 - hw/ide/qdev.c |5 +++ 5 files changed, 124 insertions(+), 5 deletions(-) diff --git a/hw/ide/core.c b/hw/ide/core.c index 14bda82..ca17a43 100644 --- a/hw/ide/core.c +++ b/hw/ide/core.c @@ -78,7 +78,7 @@ static void ide_identify(IDEState *s) { uint16_t *p; unsigned int oldsize; -IDEDevice *dev; +IDEDevice *dev = s-unit ? s-bus-slave : s-bus-master; if (s-identify_set) { memcpy(s-io_buffer, s-identify_data, sizeof(s-identify_data)); @@ -124,6 +124,9 @@ static void ide_identify(IDEState *s) put_le16(p + 66, 120); put_le16(p + 67, 120); put_le16(p + 68, 120); +if (dev dev-conf.discard_granularity) { +put_le16(p + 69, (1 14)); /* determinate TRIM behavior */ +} if (s-ncq_queues) { put_le16(p + 75, s-ncq_queues - 1); @@ -154,9 +157,12 @@ static void ide_identify(IDEState *s) put_le16(p + 101, s-nb_sectors 16); put_le16(p + 102, s-nb_sectors 32); put_le16(p + 103, s-nb_sectors 48); -dev = s-unit ? s-bus-slave : s-bus-master; + if (dev dev-conf.physical_block_size) put_le16(p + 106, 0x6000 | get_physical_block_exp(dev-conf)); +if (dev dev-conf.discard_granularity) { +put_le16(p + 169, 1); /* TRIM support */ +} memcpy(s-identify_data, p, sizeof(s-identify_data)); s-identify_set = 1; @@ -299,6 +305,74 @@ static void ide_set_signature(IDEState *s) } } +typedef struct TrimAIOCB { +BlockDriverAIOCB common; +QEMUBH *bh; +int ret; +} TrimAIOCB; + +static void trim_aio_cancel(BlockDriverAIOCB *acb) +{ +TrimAIOCB *iocb = container_of(acb, TrimAIOCB, common); + +qemu_bh_delete(iocb-bh); +iocb-bh = NULL; +qemu_aio_release(iocb); +} + +static AIOPool trim_aio_pool = { +.aiocb_size = sizeof(TrimAIOCB), +.cancel = trim_aio_cancel, +}; + +static void ide_trim_bh_cb(void *opaque) +{ +TrimAIOCB *iocb = opaque; + +iocb-common.cb(iocb-common.opaque, iocb-ret); + +qemu_bh_delete(iocb-bh); +iocb-bh = NULL; + +qemu_aio_release(iocb); +} + +BlockDriverAIOCB *ide_issue_trim(BlockDriverState *bs, +int64_t sector_num, QEMUIOVector *qiov, int nb_sectors, +BlockDriverCompletionFunc *cb, void *opaque) +{ +TrimAIOCB *iocb; +int i, j, ret; + +iocb = qemu_aio_get(trim_aio_pool, bs, cb, opaque); +iocb-bh = qemu_bh_new(ide_trim_bh_cb, iocb); +iocb-ret = 0; + +for (j = 0; j qiov-niov; j++) { +uint64_t *buffer = qiov-iov[j].iov_base; + +for (i = 0; i qiov-iov[j].iov_len / 8; i++) { +/* 6-byte LBA + 2-byte range per entry */ +uint64_t entry = le64_to_cpu(buffer[i]); +uint64_t sector = entry 0xULL; +uint16_t count = entry 48; + +if (count == 0) { +break; +} + +ret = bdrv_discard(bs, sector, count); +if (!iocb-ret) { +iocb-ret = ret; +} +} +} + +qemu_bh_schedule(iocb-bh); + +return iocb-common; +} + static inline void ide_abort_command(IDEState *s) { s-status = READY_STAT | ERR_STAT; @@ -474,6 +548,9 @@ handle_rw_error: if (s-dma_cmd == IDE_DMA_READ) op |= BM_STATUS_RETRY_READ; +else if (s-dma_cmd == IDE_DMA_TRIM) +op |= BM_STATUS_RETRY_TRIM; + if (ide_handle_rw_error(s, -ret, op)) { return; } @@ -519,6 +596,10 @@ handle_rw_error: s-bus-dma-aiocb = dma_bdrv_write(s-bs, s-sg, sector_num, ide_dma_cb, s); break; +case IDE_DMA_TRIM: +s-bus-dma-aiocb = dma_bdrv_io(s-bs, s-sg, sector_num, + ide_issue_trim, ide_dma_cb, s, 1); +break; } if (!s-bus-dma-aiocb) { @@ -818,6 +899,18 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val) return; switch(val) { +case WIN_DSM: +switch (s-feature) { +case DSM_TRIM: +if (!s-bs) { +goto abort_cmd; +} +ide_sector_start_dma(s, IDE_DMA_TRIM); +break; +default: +goto abort_cmd; +} +break; case WIN_IDENTIFY: if (s-bs s-drive_kind != IDE_CD) { if (s-drive_kind != IDE_CFATA) diff --git a/hw/ide/internal.h b/hw/ide/internal.h index ea3edf5..02e805f 100644 --- a/hw/ide/internal.h +++ b/hw/ide/internal.h @@ -62,7 +62,11 @@ typedef struct IDEDMAOps IDEDMAOps; */ #define
[Qemu-devel] [PATCH 14/14] Allow nested qemu_bh_poll() after BH deletion
Without this, qemu segfaults when a BH handler first deletes its BH and then calls another function which involves a nested qemu_bh_poll() call. This can be reproduced by generating an I/O error (e.g. with blkdebug) on an IDE device and using rerror/werror=stop to stop the VM. When continuing the VM, qemu segfaults. Signed-off-by: Kevin Wolf kw...@redhat.com Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- async.c |5 +++-- 1 files changed, 3 insertions(+), 2 deletions(-) diff --git a/async.c b/async.c index 57ac3a8..fd313df 100644 --- a/async.c +++ b/async.c @@ -137,11 +137,12 @@ QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque) int qemu_bh_poll(void) { -QEMUBH *bh, **bhp; +QEMUBH *bh, **bhp, *next; int ret; ret = 0; -for (bh = async_context-first_bh; bh; bh = bh-next) { +for (bh = async_context-first_bh; bh; bh = next) { +next = bh-next; if (!bh-deleted bh-scheduled) { bh-scheduled = 0; if (!bh-idle) -- 1.7.5.2
Re: [Qemu-devel] [PATCH 33/34] hw/usb-ohci.c: Implement remote wakeup
On 14 June 2011 12:05, Gerd Hoffmann kra...@redhat.com wrote: From: Peter Maydell peter.mayd...@linaro.org Implement the wakeup callback in the OHCI USBPortOps, so that when a downstream device wakes up it correctly causes the OHCI controller to come out of suspend. Just to let you know, this patch turns out to not be quite right: * if the port is suspended and the controller is not, we need to raise OHCI_INTR_RHSC * if the controller is suspended we need to put it into the resume state * the controller can be suspended when the port is not, so the check on s-ctl musn't be inside the port-ctrl if(). Sorry for the half-baked patch. I'll send a fixed version shortly. -- PMM
Re: [Qemu-devel] [PATCH 0/2] [PULL] qemu-kvm.git uq/master queue
On 06/01/2011 12:31 PM, Marcelo Tosatti wrote: The following changes since commit 578c7b2ca8ee9e97fa8693b1a83d517e8e3f962e: audio: fix integer overflow expression (2011-06-01 00:14:07 +0400) Pulled. Thanks. Regards, Anthony Liguori are available in the git repository at: git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master Yang, Wei Y (1): kvm: Enable CPU SMEP feature brill...@viatech.com.cn (1): kvm: Add CPUID support for VIA CPU target-i386/cpu.h |9 ++- target-i386/cpuid.c | 66 +- target-i386/kvm.c | 15 +++ 3 files changed, 87 insertions(+), 3 deletions(-)
Re: [Qemu-devel] [PATCH 1/2] Allow silent system resets
On Tue, 14 Jun 2011 18:29:43 +0200 Jan Kiszka jan.kis...@siemens.com wrote: This allows qemu_system_reset to be issued silently for internal purposes, ie. without sending out a monitor event. Convert the system reset after startup to the silent mode. Signed-off-by: Jan Kiszka jan.kis...@siemens.com Only this patch is monitor related, but I've applied the two in the monitor branch. Thanks. --- sysemu.h |5 - vl.c | 10 ++ xen-all.c |2 +- 3 files changed, 11 insertions(+), 6 deletions(-) diff --git a/sysemu.h b/sysemu.h index 7e70daa..d3013f5 100644 --- a/sysemu.h +++ b/sysemu.h @@ -34,6 +34,9 @@ void qemu_del_vm_change_state_handler(VMChangeStateEntry *e); #define VMSTOP_LOADVM7 #define VMSTOP_MIGRATE 8 +#define VMRESET_SILENT false +#define VMRESET_REPORT true + void vm_start(void); void vm_stop(int reason); @@ -50,7 +53,7 @@ int qemu_powerdown_requested(void); void qemu_system_killed(int signal, pid_t pid); void qemu_kill_report(void); extern qemu_irq qemu_system_powerdown; -void qemu_system_reset(void); +void qemu_system_reset(bool report); void qemu_add_exit_notifier(Notifier *notify); void qemu_remove_exit_notifier(Notifier *notify); diff --git a/vl.c b/vl.c index d7f905d..91380d2 100644 --- a/vl.c +++ b/vl.c @@ -1249,7 +1249,7 @@ void qemu_unregister_reset(QEMUResetHandler *func, void *opaque) } } -void qemu_system_reset(void) +void qemu_system_reset(bool report) { QEMUResetEntry *re, *nre; @@ -1257,7 +1257,9 @@ void qemu_system_reset(void) QTAILQ_FOREACH_SAFE(re, reset_handlers, entry, nre) { re-func(re-opaque); } -monitor_protocol_event(QEVENT_RESET, NULL); +if (report) { +monitor_protocol_event(QEVENT_RESET, NULL); +} cpu_synchronize_all_post_reset(); } @@ -1399,7 +1401,7 @@ static void main_loop(void) if (qemu_reset_requested()) { pause_all_vcpus(); cpu_synchronize_all_states(); -qemu_system_reset(); +qemu_system_reset(VMRESET_REPORT); resume_all_vcpus(); } if (qemu_powerdown_requested()) { @@ -3272,7 +3274,7 @@ int main(int argc, char **argv, char **envp) qemu_register_reset(qbus_reset_all_fn, sysbus_get_default()); qemu_run_machine_init_done_notifiers(); -qemu_system_reset(); +qemu_system_reset(VMRESET_SILENT); if (loadvm) { if (load_vmstate(loadvm) 0) { autostart = 0; diff --git a/xen-all.c b/xen-all.c index 0eac202..41fd98a 100644 --- a/xen-all.c +++ b/xen-all.c @@ -452,7 +452,7 @@ static void cpu_handle_ioreq(void *opaque) destroy_hvm_domain(); } if (qemu_reset_requested_get()) { -qemu_system_reset(); +qemu_system_reset(VMRESET_REPORT); } }
Re: [Qemu-devel] [PULL] libcacard libtoolized + usb-ccid fix
On 05/31/2011 11:42 AM, Alon Levy wrote: Hi, This pull request includes the libcacard.la library target and a memory leak fix by Markus. Please pull. The following changes since commit b1d7d2b93a1d6b2d2848b616cc35acdf521c923c: Merge remote-tracking branch 'stefanha/trivial-patches' into staging (2011-05-31 08:23:11 -0500) are available in the git repository at: git://anongit.freedesktop.org/~alon/qemu pull-libcacard-1 Pulled. Thanks. Regards, Anthony Liguori Alon Levy (2): configure: add libdir and --libdir libcacard: add libcacard.la target Markus Armbruster (1): usb-ccid: Plug memory leak on qdev exit() Makefile | 20 +++- Makefile.objs |8 configure | 17 - hw/usb-ccid.c | 28 libcacard/Makefile | 32 rules.mak |8 6 files changed, 87 insertions(+), 26 deletions(-)
Re: [Qemu-devel] [PULL] usb patch queue
On 06/14/2011 06:05 AM, Gerd Hoffmann wrote: Hi, The USB patch queue has been rebased, got a minor fix (wrong comment in patch #8, spotted by David Ahern) and three new patches. I'm just posting the three new patches to avoid spamming the list with 30 identical patches ... please pull, Gerd Pulled. Thanks. Regards, Anthony Liguori The following changes since commit 0b862cedf36d927818c50584ddd611b0370673df: configure: Detect and don't try to use older libcurl (2011-06-13 21:16:27 +0200) are available in the git repository at: git://git.kraxel.org/qemu usb.16 Brad Hards (3): usb: Add defines for USB Serial Bus Release Number register usb: Use defines for serial bus release number register for UHCI usb: Use defines for serial bus release number register for EHCI Gerd Hoffmann (18): usb-linux: catch ENODEV in more places. usb-ehci: trace mmio and usbsts usb-ehci: trace state machine changes usb-ehci: trace port state usb-ehci: improve mmio tracing usb-ehci: trace buffer copy usb-ehci: add queue data struct usb-ehci: multiqueue support usb-ehci: fix offset writeback in ehci_buffer_rw usb-ehci: fix error handling. usb: cancel async packets on unplug usb-ehci: drop EXECUTING checks. usb-ehci: itd handling fixes. usb-ehci: split trace calls to handle arg count limits usb: documentation update usb-linux: only cleanup in host_close when host_open was successful. usb: don't call usb_host_device_open from vl.c usb-uhci: fix expire time initialization. Hans de Goede (9): ehci: fix a number of unused-but-set-variable warnings (new with gcc-4.6) usb-linux: Get speed from sysfs rather then from the connectinfo ioctl usb-linux: Teach about super speed usb-linux: Don't do perror when errno is not set usb-linux: Ensure devep != 0 usb-linux: Don't try to open the same device twice usb-linux: Enlarge buffer for descriptors to 8192 bytes usb-bus: Add knowledge of USB_SPEED_SUPER to usb_speed helper usb-bus: Don't detach non attached devices on device exit Kevin O'Connor (2): Fix USB mouse Set_Protocol behavior The USB tablet should not claim boot protocol support. Peter Maydell (2): hw/usb-ohci.c: Ignore writes to HcPeriodCurrentED register hw/usb-ohci.c: Implement remote wakeup docs/usb2.txt | 85 hw/milkymist-softusb.c | 10 +- hw/usb-bus.c | 10 +- hw/usb-ehci.c | 1198 hw/usb-hid.c |5 +- hw/usb-musb.c | 23 +- hw/usb-ohci.c | 37 ++- hw/usb-uhci.c | 32 ++- hw/usb.h | 14 +- trace-events | 20 + usb-linux.c| 96 +++-- vl.c |6 +- 12 files changed, 990 insertions(+), 546 deletions(-)
Re: [Qemu-devel] [PATCH v3 04/21] qapi: add QAPI visitor core
On Mon, 13 Jun 2011 21:31:09 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: Base definitions/includes for Visiter interface used by generated visiter/marshalling code. Includes a GenericList type. Our lists require an embedded element. Since these types are generated, if you want to use them in a different type of data structure, there's no easy way to add another embedded element. The solution is to have non-embedded lists and that what this is. Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com --- Makefile.objs |6 +++ qapi/qapi-types-core.h | 21 ++ qapi/qapi-visit-core.c | 101 qapi/qapi-visit-core.h | 68 4 files changed, 196 insertions(+), 0 deletions(-) create mode 100644 qapi/qapi-types-core.h create mode 100644 qapi/qapi-visit-core.c create mode 100644 qapi/qapi-visit-core.h diff --git a/Makefile.objs b/Makefile.objs index a7807e8..68d7b5a 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -364,6 +364,12 @@ endif libcacard-y = cac.o event.o vcard.o vreader.o vcard_emul_nss.o vcard_emul_type.o card_7816.o +## +# qapi + +qapi-nested-y = qapi-visit-core.o +qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y)) To build this I have to add qapi-obj-y to common-obj-y and add the qapi directory to the DIR variable in configure (so that it builds in a different directory). I saw that you send the configure change in a different patch, but it has to be done here. + vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS) vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS) diff --git a/qapi/qapi-types-core.h b/qapi/qapi-types-core.h new file mode 100644 index 000..de733ab --- /dev/null +++ b/qapi/qapi-types-core.h @@ -0,0 +1,21 @@ +/* + * Core Definitions for QAPI-generated Types + * + * Copyright IBM, Corp. 2011 + * + * Authors: + * Anthony Liguori aligu...@us.ibm.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#ifndef QAPI_TYPES_CORE_H +#define QAPI_TYPES_CORE_H + +#include stdbool.h +#include stdint.h +#include error.h + +#endif diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c new file mode 100644 index 000..948818e --- /dev/null +++ b/qapi/qapi-visit-core.c @@ -0,0 +1,101 @@ +#include qapi/qapi-visit-core.h No license info. + +void visit_start_handle(Visitor *v, void **obj, const char *kind, const char *name, Error **errp) +{ +if (!error_is_set(errp) v-start_handle) { +v-start_handle(v, obj, kind, name, errp); +} +} + +void visit_end_handle(Visitor *v, Error **errp) +{ +if (!error_is_set(errp) v-end_handle) { +v-end_handle(v, errp); +} +} + +void visit_start_struct(Visitor *v, void **obj, const char *kind, const char *name, size_t size, Error **errp) +{ +if (!error_is_set(errp)) { +v-start_struct(v, obj, kind, name, size, errp); +} +} + +void visit_end_struct(Visitor *v, Error **errp) +{ +if (!error_is_set(errp)) { +v-end_struct(v, errp); +} +} + +void visit_start_list(Visitor *v, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-start_list(v, name, errp); +} +} + +GenericList *visit_next_list(Visitor *v, GenericList **list, Error **errp) +{ +if (!error_is_set(errp)) { +return v-next_list(v, list, errp); +} + +return 0; +} + +void visit_end_list(Visitor *v, Error **errp) +{ +if (!error_is_set(errp)) { +v-end_list(v, errp); +} +} + +void visit_start_optional(Visitor *v, bool *present, const char *name, Error **errp) +{ +if (!error_is_set(errp) v-start_optional) { +v-start_optional(v, present, name, errp); +} +} + +void visit_end_optional(Visitor *v, Error **errp) +{ +if (!error_is_set(errp) v-end_optional) { +v-end_optional(v, errp); +} +} + +void visit_type_enum(Visitor *v, int *obj, const char *kind, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_enum(v, obj, kind, name, errp); +} +} + +void visit_type_int(Visitor *v, int64_t *obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_int(v, obj, name, errp); +} +} + +void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_bool(v, obj, name, errp); +} +} + +void visit_type_str(Visitor *v, char **obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_str(v, obj, name, errp); +} +} + +void visit_type_number(Visitor *v, double *obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_number(v, obj, name,
Re: [Qemu-devel] [RFC] Specifying storage devices for live migration
Stefan Hajnoczi stefa...@gmail.com wrote on 15/06/2011 15:49:12: Hello all, Currently there is no way to choose storage devices to be migrated. There was some discussion about making the monitor API a bit more flexible, but the code was never written: http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html Hmm...2009. Have you seen this recent thread? http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg00733.html Are you planning to extend the block migration API and fix known issues with block migration? Stefan This feature (selectively migrating block devices during live migration) is missing for us right now. I know it's an old thread, but as far as I know, this hasn't been fixed. In the more recent thread I see that there are thoughts to re-work this code (pre-copy vs. post-copy), but doesn't the interface need to be fixed anyway? Where are the known issues with block migration documented? Thanks, Avishay
Re: [Qemu-devel] [PATCH v3 04/21] qapi: add QAPI visitor core
On 06/15/2011 09:33 AM, Luiz Capitulino wrote: On Mon, 13 Jun 2011 21:31:09 -0500 Michael Rothmdr...@linux.vnet.ibm.com wrote: Base definitions/includes for Visiter interface used by generated visiter/marshalling code. Includes a GenericList type. Our lists require an embedded element. Since these types are generated, if you want to use them in a different type of data structure, there's no easy way to add another embedded element. The solution is to have non-embedded lists and that what this is. Signed-off-by: Michael Rothmdr...@linux.vnet.ibm.com --- Makefile.objs |6 +++ qapi/qapi-types-core.h | 21 ++ qapi/qapi-visit-core.c | 101 qapi/qapi-visit-core.h | 68 4 files changed, 196 insertions(+), 0 deletions(-) create mode 100644 qapi/qapi-types-core.h create mode 100644 qapi/qapi-visit-core.c create mode 100644 qapi/qapi-visit-core.h diff --git a/Makefile.objs b/Makefile.objs index a7807e8..68d7b5a 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -364,6 +364,12 @@ endif libcacard-y = cac.o event.o vcard.o vreader.o vcard_emul_nss.o vcard_emul_type.o card_7816.o +## +# qapi + +qapi-nested-y = qapi-visit-core.o +qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y)) To build this I have to add qapi-obj-y to common-obj-y and add the qapi directory to the DIR variable in configure (so that it builds in a different directory). Yah, nobody actually uses these right now except the unit tests/guest agent so for test builds you have to manually add qapi-obj-y to another target like qemu-img. I'll move the configure change to this patch in the next run. I saw that you send the configure change in a different patch, but it has to be done here. + vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS) vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS) diff --git a/qapi/qapi-types-core.h b/qapi/qapi-types-core.h new file mode 100644 index 000..de733ab --- /dev/null +++ b/qapi/qapi-types-core.h @@ -0,0 +1,21 @@ +/* + * Core Definitions for QAPI-generated Types + * + * Copyright IBM, Corp. 2011 + * + * Authors: + * Anthony Liguorialigu...@us.ibm.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#ifndef QAPI_TYPES_CORE_H +#define QAPI_TYPES_CORE_H + +#includestdbool.h +#includestdint.h +#include error.h + +#endif diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c new file mode 100644 index 000..948818e --- /dev/null +++ b/qapi/qapi-visit-core.c @@ -0,0 +1,101 @@ +#include qapi/qapi-visit-core.h No license info. Doh! Something about the term copyright header frequently causes me to disregard non-.h files. These will all get fixed in the next run. + +void visit_start_handle(Visitor *v, void **obj, const char *kind, const char *name, Error **errp) +{ +if (!error_is_set(errp) v-start_handle) { +v-start_handle(v, obj, kind, name, errp); +} +} + +void visit_end_handle(Visitor *v, Error **errp) +{ +if (!error_is_set(errp) v-end_handle) { +v-end_handle(v, errp); +} +} + +void visit_start_struct(Visitor *v, void **obj, const char *kind, const char *name, size_t size, Error **errp) +{ +if (!error_is_set(errp)) { +v-start_struct(v, obj, kind, name, size, errp); +} +} + +void visit_end_struct(Visitor *v, Error **errp) +{ +if (!error_is_set(errp)) { +v-end_struct(v, errp); +} +} + +void visit_start_list(Visitor *v, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-start_list(v, name, errp); +} +} + +GenericList *visit_next_list(Visitor *v, GenericList **list, Error **errp) +{ +if (!error_is_set(errp)) { +return v-next_list(v, list, errp); +} + +return 0; +} + +void visit_end_list(Visitor *v, Error **errp) +{ +if (!error_is_set(errp)) { +v-end_list(v, errp); +} +} + +void visit_start_optional(Visitor *v, bool *present, const char *name, Error **errp) +{ +if (!error_is_set(errp) v-start_optional) { +v-start_optional(v, present, name, errp); +} +} + +void visit_end_optional(Visitor *v, Error **errp) +{ +if (!error_is_set(errp) v-end_optional) { +v-end_optional(v, errp); +} +} + +void visit_type_enum(Visitor *v, int *obj, const char *kind, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_enum(v, obj, kind, name, errp); +} +} + +void visit_type_int(Visitor *v, int64_t *obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_int(v, obj, name, errp); +} +} + +void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp) +{ +if (!error_is_set(errp)) { +v-type_bool(v, obj, name, errp); +} +} + +void visit_type_str(Visitor *v, char **obj, const char *name, Error
Re: [Qemu-devel] [PATCH v3 05/21] qapi: add QMP input visitor
On Mon, 13 Jun 2011 21:31:10 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: A type of Visiter class that is used to walk a qobject's structure and assign each entry to the corresponding native C type. Command marshaling function will use this to pull out QMP command parameters recieved over the wire and pass them as native arguments to the corresponding C functions. Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com --- Makefile.objs|2 +- qapi/qmp-input-visitor.c | 251 ++ qapi/qmp-input-visitor.h | 27 + qerror.h |3 + 4 files changed, 282 insertions(+), 1 deletions(-) create mode 100644 qapi/qmp-input-visitor.c create mode 100644 qapi/qmp-input-visitor.h diff --git a/Makefile.objs b/Makefile.objs index 68d7b5a..2eb90b8 100644 --- a/Makefile.objs +++ b/Makefile.objs @@ -367,7 +367,7 @@ libcacard-y = cac.o event.o vcard.o vreader.o vcard_emul_nss.o vcard_emul_type.o ## # qapi -qapi-nested-y = qapi-visit-core.o +qapi-nested-y = qapi-visit-core.o qmp-input-visitor.o qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y)) vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS) diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c new file mode 100644 index 000..9344d37 --- /dev/null +++ b/qapi/qmp-input-visitor.c No license info. It seems to be missing in more files. @@ -0,0 +1,251 @@ +#include qmp-input-visitor.h +#include qemu-queue.h +#include qemu-common.h +#include qemu-objects.h +#include qerror.h + +#define QIV_STACK_SIZE 1024 + +typedef struct StackObject +{ +QObject *obj; +const QListEntry *entry; +} StackObject; + +struct QmpInputVisitor +{ +Visitor visitor; +QObject *obj; +StackObject stack[QIV_STACK_SIZE]; +int nb_stack; +}; + +static QmpInputVisitor *to_qiv(Visitor *v) +{ +return container_of(v, QmpInputVisitor, visitor); +} + +static QObject *qmp_input_get_object(QmpInputVisitor *qiv, const char *name) +{ +QObject *qobj; + +if (qiv-nb_stack == 0) { +qobj = qiv-obj; +} else { +qobj = qiv-stack[qiv-nb_stack - 1].obj; +} + +if (name qobject_type(qobj) == QTYPE_QDICT) { +return qdict_get(qobject_to_qdict(qobj), name); +} else if (qiv-nb_stack 0 qobject_type(qobj) == QTYPE_QLIST) { +return qlist_entry_obj(qiv-stack[qiv-nb_stack - 1].entry); +} + +return qobj; +} + +static void qmp_input_push(QmpInputVisitor *qiv, QObject *obj, Error **errp) +{ +qiv-stack[qiv-nb_stack].obj = obj; +if (qobject_type(obj) == QTYPE_QLIST) { +qiv-stack[qiv-nb_stack].entry = qlist_first(qobject_to_qlist(obj)); +} +qiv-nb_stack++; + +if (qiv-nb_stack = QIV_STACK_SIZE) { +error_set(errp, QERR_QAPI_VISITOR_STACK_OVERRUN); QAPI is something internal to qemu, the error name and description should be more generic like not enough memory or not enough space. +return; +} +} + +static void qmp_input_pop(QmpInputVisitor *qiv, Error **errp) +{ +qiv-nb_stack--; +if (qiv-nb_stack 0) { +error_set(errp, QERR_QAPI_VISITOR_STACK_OVERRUN); +return; +} +} + +static void qmp_input_start_struct(Visitor *v, void **obj, const char *kind, const char *name, size_t size, Error **errp) +{ +QmpInputVisitor *qiv = to_qiv(v); +QObject *qobj = qmp_input_get_object(qiv, name); + +if (!qobj || qobject_type(qobj) != QTYPE_QDICT) { +error_set(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : null, QDict); +return; +} + +qmp_input_push(qiv, qobj, errp); +if (error_is_set(errp)) { +return; +} + +if (obj) { +*obj = qemu_mallocz(size); +} +} + +static void qmp_input_end_struct(Visitor *v, Error **errp) +{ +QmpInputVisitor *qiv = to_qiv(v); + +qmp_input_pop(qiv, errp); +} + +static void qmp_input_start_list(Visitor *v, const char *name, Error **errp) +{ +QmpInputVisitor *qiv = to_qiv(v); +QObject *qobj = qmp_input_get_object(qiv, name); + +if (!qobj || qobject_type(qobj) != QTYPE_QLIST) { +error_set(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : null, list); +return; +} + +qmp_input_push(qiv, qobj, errp); +} + +static GenericList *qmp_input_next_list(Visitor *v, GenericList **list, Error **errp) +{ +QmpInputVisitor *qiv = to_qiv(v); +GenericList *entry; +StackObject *so = qiv-stack[qiv-nb_stack - 1]; + +if (so-entry == NULL) { +return NULL; +} + +entry = qemu_mallocz(sizeof(*entry)); +if (*list) { +so-entry = qlist_next(so-entry); +if (so-entry == NULL) { +qemu_free(entry); +return NULL; +} +(*list)-next = entry; +} +*list = entry; + + +
Re: [Qemu-devel] [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability
On Tue, Jun 14, 2011 at 9:12 AM, M. Mohan Kumar mo...@in.ibm.com wrote: [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability In passthrough security model, following a symbolic link in the server side could result in TOCTTOU vulnerability. Use clone system call to create a thread which runs in chrooted environment. All passthrough model file operations are done from this thread to avoid TOCTTOU vulnerability. Signed-off-by: Venkateswararao Jujjuri jv...@linux.vnet.ibm.com Signed-off-by: M. Mohan Kumar mo...@in.ibm.com --- fsdev/file-op-9p.h | 1 + hw/9pfs/virtio-9p-coth.c | 105 +-- hw/9pfs/virtio-9p-coth.h | 13 +- hw/9pfs/virtio-9p-device.c | 7 +++- hw/9pfs/virtio-9p.h | 6 ++- 5 files changed, 124 insertions(+), 8 deletions(-) This patch isn't against upstream virtio-9p. Please post a link to a repo or more information. diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h index 5d088d4..c54f6bf 100644 --- a/fsdev/file-op-9p.h +++ b/fsdev/file-op-9p.h @@ -60,6 +60,7 @@ typedef struct FsContext SecModel fs_sm; uid_t uid; int open_flags; + int enable_chroot; Please use bool. Using int is like using void*, it throws away information. struct xattr_operations **xops; /* fs driver specific data */ void *private; diff --git a/hw/9pfs/virtio-9p-coth.c b/hw/9pfs/virtio-9p-coth.c index e61b656..aa71a83 100644 --- a/hw/9pfs/virtio-9p-coth.c +++ b/hw/9pfs/virtio-9p-coth.c @@ -17,6 +17,8 @@ #include qemu-thread.h #include qemu-coroutine.h #include virtio-9p-coth.h +#include sys/syscall.h Do you need this header? Please include system headers first, then QEMU headers. +#include qemu-error.h /* v9fs glib thread pool */ static V9fsThPool v9fs_pool; @@ -56,7 +58,96 @@ static void v9fs_thread_routine(gpointer data, gpointer user_data) } while (len == -1 errno == EINTR); } -int v9fs_init_worker_threads(void) +static int v9fs_chroot_function(void *arg) +{ + GError *err; + V9fsChrootState *csp = arg; + FsContext *fs_ctx = csp-fs_ctx; + V9fsThPool *p = v9fs_pool; + int notifier_fds[2]; + Must acquire cs-mutex before calling any QEMU functions here - otherwise this thread runs concurrently with QEMU threads and could lead to race conditions. + if (chroot(fs_ctx-fs_root) 0) { + error_report(chroot); + goto error; + } + + if (qemu_pipe(notifier_fds) == -1) { + return -1; + } + p-pool = g_thread_pool_new(v9fs_thread_routine, p, 10, TRUE, err); + if (!p-pool) { + error_report(g_thread_pool_new); + goto error; + } + + p-completed = g_async_queue_new(); + if (!p-completed) { + /* + * We are going to terminate. + * So don't worry about cleanup + */ + goto error; + } + p-rfd = notifier_fds[0]; + p-wfd = notifier_fds[1]; + + fcntl(p-rfd, F_SETFL, O_NONBLOCK); + fcntl(p-wfd, F_SETFL, O_NONBLOCK); + + qemu_set_fd_handler(p-rfd, v9fs_qemu_process_req_done, NULL, NULL); These should be in a common function that v9fs_init_worker_threads() can call instead of copy-paste. + + /* Signal parent thread that thread pools are created */ + g_cond_signal(csp-cond); Now cs-mutex can be released and the QEMU thread can continue. + g_mutex_lock(csp-mutex_term); + /* TODO: Wake up cs-terminate during 9p hotplug support */ + g_cond_wait(csp-terminate, csp-mutex); + g_mutex_unlock(csp-mutex_term); + g_mutex_free(csp-mutex); + g_cond_free(csp-cond); + g_mutex_free(csp-mutex_term); + g_cond_free(csp-terminate); + qemu_free(csp-stack); There goes my stack! It's only safe to free this in the QEMU thread that signalled, but you need a way to join this thread. Did you try syscall(SYS_exit, 0) instead? I think this would make cleanup much easier and you wouldn't have to juggle around many heap allocated resources. + qemu_free(csp); + return 0; +error: + p-pool = NULL; + g_cond_signal(csp-cond); + return 0; +} + +static int v9fs_clone_chroot_th(FsContext *fs_ctx) +{ + pid_t pid; + V9fsChrootState *cs; + + cs = qemu_malloc(sizeof(*cs)); + cs-stack = qemu_malloc(THREAD_STACK); + cs-mutex = g_mutex_new(); + cs-cond = g_cond_new(); + cs-mutex_term = g_mutex_new(); + cs-terminate = g_cond_new(); + cs-fs_ctx = fs_ctx; + + g_mutex_lock(cs-mutex); + pid = clone(v9fs_chroot_function, cs-stack + THREAD_STACK, CLONE_FILES | + CLONE_SIGHAND | CLONE_VM | CLONE_THREAD, cs); I'm a little concerned that thread-local storage is going to be broken and glib will act weird, but I don't know the implementation details. + if (pid == -1) { + error_report(clone); + g_mutex_unlock(cs-mutex); + g_mutex_free(cs-mutex); +
Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing
On Wed, 15 Jun 2011, Alexander Graf wrote: commit 973bb091a967fdec37a1bc8fe30d46a483d2903d Author: Stefano Stabellini stefano.stabell...@eu.citrix.com Date: Tue May 17 12:10:36 2011 + xen: fix interrupt routing - remove i440FX-xen and i440fx_write_config_xen we don't need to intercept pci config writes to i440FX anymore; - introduce PIIX3-xen and piix3_write_config_xen we do need to intercept pci config write to the PCI-ISA bridge to update the PCI link routing; - set the number of PIIX3-xen interrupts line to 128; I still find it unpretty and I'm pretty sure it's completely different from real hardware, but since Xen code is your call and this doesn't affect non-Xen workloads, I won't block it, unless someone else is very much opposed to it. Please resend as proper patch. Thanks, I'll do that.
Re: [Qemu-devel] [PATCH 10/10] linux-user: remove unused variables
On 14.06.2011, at 19:36, Michael S. Tsirkin wrote: Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-user/flatload.c | 10 ++ linux-user/linuxload.c | 25 + linux-user/main.c |6 +++--- linux-user/signal.c|5 - linux-user/syscall.c |6 -- 5 files changed, 10 insertions(+), 42 deletions(-) diff --git a/linux-user/flatload.c b/linux-user/flatload.c index cd7af7c..2933c5f 100644 --- a/linux-user/flatload.c +++ b/linux-user/flatload.c @@ -379,12 +379,14 @@ static int load_flat_file(struct linux_binprm * bprm, abi_long result; abi_ulong realdatastart = 0; abi_ulong text_len, data_len, bss_len, stack_len, flags; -abi_ulong memp = 0; /* for finding the brk area */ abi_ulong extra; abi_ulong reloc = 0, rp; int i, rev, relocs = 0; abi_ulong fpos; -abi_ulong start_code, end_code; +abi_ulong start_code; +#ifdef DEBUG +abi_ulong end_code; +#endif abi_ulong indx_len; hdr = ((struct flat_hdr *) bprm-buf);/* exec-header */ @@ -491,7 +493,6 @@ static int load_flat_file(struct linux_binprm * bprm, } reloc = datapos + (ntohl(hdr-reloc_start) - text_len); -memp = realdatastart; } else { @@ -506,7 +507,6 @@ static int load_flat_file(struct linux_binprm * bprm, realdatastart = textpos + ntohl(hdr-data_start); datapos = realdatastart + indx_len; reloc = (textpos + ntohl(hdr-reloc_start) + indx_len); -memp = textpos; #ifdef CONFIG_BINFMT_ZFLAT #error code needs checking @@ -552,7 +552,9 @@ static int load_flat_file(struct linux_binprm * bprm, /* The main program needs a little extra setup in the task structure */ start_code = textpos + sizeof (struct flat_hdr); +#ifdef DEBUG end_code = textpos + text_len; +#endif DBG_FLT(%s %s: TEXT=%x-%x DATA=%x-%x BSS=%x-%x\n, id ? Lib : Load, bprm-filename, diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c index ac8c486..62ebc7e 100644 --- a/linux-user/linuxload.c +++ b/linux-user/linuxload.c @@ -26,22 +26,6 @@ abi_long memcpy_to_target(abi_ulong dest, const void *src, return 0; } -static int in_group_p(gid_t g) -{ -/* return TRUE if we're in the specified group, FALSE otherwise */ -int ngroup; -int i; -gid_tgrouplist[NGROUPS]; - -ngroup = getgroups(NGROUPS, grouplist); -for(i = 0; i ngroup; i++) { - if(grouplist[i] == g) { - return 1; - } -} -return 0; -} - static int count(char ** vec) { int i; @@ -57,7 +41,7 @@ static int prepare_binprm(struct linux_binprm *bprm) { struct stat st; int mode; -int retval, id_change; +int retval; if(fstat(bprm-fd, st) 0) { return(-errno); @@ -73,14 +57,10 @@ static int prepare_binprm(struct linux_binprm *bprm) bprm-e_uid = geteuid(); bprm-e_gid = getegid(); -id_change = 0; /* Set-uid? */ if(mode S_ISUID) { bprm-e_uid = st.st_uid; - if(bprm-e_uid != geteuid()) { - id_change = 1; - } } /* Set-gid? */ @@ -91,9 +71,6 @@ static int prepare_binprm(struct linux_binprm *bprm) */ if ((mode (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) { bprm-e_gid = st.st_gid; - if (!in_group_p(bprm-e_gid)) { - id_change = 1; - } } retval = read(bprm-fd, bprm-buf, BPRM_BUF_SIZE); diff --git a/linux-user/main.c b/linux-user/main.c index 04da0a4..9b995e5 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -2053,15 +2053,15 @@ void cpu_loop(CPUMIPSState *env) } else { int nb_args; abi_ulong sp_reg; -abi_ulong arg5 = 0, arg6 = 0, arg7 = 0, arg8 = 0; +abi_ulong arg5 = 0, arg6 = 0; nb_args = mips_syscall_args[syscall_num]; sp_reg = env-active_tc.gpr[29]; switch (nb_args) { /* these arguments are taken from the stack */ /* FIXME - what to do if get_user() fails? */ -case 8: get_user_ual(arg8, sp_reg + 28); -case 7: get_user_ual(arg7, sp_reg + 24); +case 8: /* get_user_ual(arg8, sp_reg + 28); */ +case 7: /* get_user_ual(arg7, sp_reg + 24); */ I'd prefer to see these and the respective variable definitions #if 0'd with a comment, stating that they're currently unused. case 6: get_user_ual(arg6, sp_reg + 20); case 5: get_user_ual(arg5, sp_reg + 16); default: diff --git a/linux-user/signal.c b/linux-user/signal.c index 11b25be..685ae61 100644 --- a/linux-user/signal.c +++ b/linux-user/signal.c @@ -2080,7 +2080,6 @@ long do_sigreturn(CPUState *env) uint32_t up_psr, pc, npc;
Re: [Qemu-devel] [PATCH 06/10] kvm: remove unused variables
Am 14.06.2011 19:36, schrieb Michael S. Tsirkin: Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio-pci.h |8 +--- target-i386/kvm.c |3 +-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h index a4b5fd3..b518917 100644 --- a/hw/virtio-pci.h +++ b/hw/virtio-pci.h @@ -37,7 +37,9 @@ typedef struct { bool ioeventfd_started; } VirtIOPCIProxy; -extern void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, -uint16_t vendor, uint16_t device, -uint16_t class_code, uint8_t pif); +void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev); + +/* Virtio ABI version, if we increment this, we break the guest driver. */ +#define VIRTIO_PCI_ABI_VERSION 0 + #endif Is this hunk there intentionally? Kevin diff --git a/target-i386/kvm.c b/target-i386/kvm.c index faedc6c..58a70bc 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -970,7 +970,7 @@ static int kvm_get_xsave(CPUState *env) #ifdef KVM_CAP_XSAVE struct kvm_xsave* xsave; int ret, i; -uint16_t cwd, swd, twd, fop; +uint16_t cwd, swd, twd; if (!kvm_has_xsave()) { return kvm_get_fpu(env); @@ -986,7 +986,6 @@ static int kvm_get_xsave(CPUState *env) cwd = (uint16_t)xsave-region[0]; swd = (uint16_t)(xsave-region[0] 16); twd = (uint16_t)xsave-region[1]; -fop = (uint16_t)(xsave-region[1] 16); env-fpstt = (swd 11) 7; env-fpus = swd; env-fpuc = cwd;
Re: [Qemu-devel] [PATCH 03/10] usb-ehci: remove unused variables
On 06/14/11 19:35, Michael S. Tsirkin wrote: Signed-off-by: Michael S. Tsirkinm...@redhat.com usb patch queue has a simliar fix already. I'd suggest to drop this to reduce merge conflicts. cheers, Gerd
Re: [Qemu-devel] [PATCH 06/10] kvm: remove unused variables
On Wed, Jun 15, 2011 at 09:38:39AM +0200, Kevin Wolf wrote: Am 14.06.2011 19:36, schrieb Michael S. Tsirkin: Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/virtio-pci.h |8 +--- target-i386/kvm.c |3 +-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h index a4b5fd3..b518917 100644 --- a/hw/virtio-pci.h +++ b/hw/virtio-pci.h @@ -37,7 +37,9 @@ typedef struct { bool ioeventfd_started; } VirtIOPCIProxy; -extern void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, -uint16_t vendor, uint16_t device, -uint16_t class_code, uint8_t pif); +void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev); + +/* Virtio ABI version, if we increment this, we break the guest driver. */ +#define VIRTIO_PCI_ABI_VERSION 0 + #endif Is this hunk there intentionally? Kevin Sorry, this belongs in another patch. Thanks for pointing this out. Otherwise ack? diff --git a/target-i386/kvm.c b/target-i386/kvm.c index faedc6c..58a70bc 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -970,7 +970,7 @@ static int kvm_get_xsave(CPUState *env) #ifdef KVM_CAP_XSAVE struct kvm_xsave* xsave; int ret, i; -uint16_t cwd, swd, twd, fop; +uint16_t cwd, swd, twd; if (!kvm_has_xsave()) { return kvm_get_fpu(env); @@ -986,7 +986,6 @@ static int kvm_get_xsave(CPUState *env) cwd = (uint16_t)xsave-region[0]; swd = (uint16_t)(xsave-region[0] 16); twd = (uint16_t)xsave-region[1]; -fop = (uint16_t)(xsave-region[1] 16); env-fpstt = (swd 11) 7; env-fpus = swd; env-fpuc = cwd;
Re: [Qemu-devel] [PATCH 06/10] kvm: remove unused variables
On 06/14/2011 12:36 PM, Michael S. Tsirkin wrote: Signed-off-by: Michael S. Tsirkinm...@redhat.com --- hw/virtio-pci.h |8 +--- target-i386/kvm.c |3 +-- 2 files changed, 6 insertions(+), 5 deletions(-) diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h index a4b5fd3..b518917 100644 --- a/hw/virtio-pci.h +++ b/hw/virtio-pci.h @@ -37,7 +37,9 @@ typedef struct { bool ioeventfd_started; } VirtIOPCIProxy; -extern void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev, -uint16_t vendor, uint16_t device, -uint16_t class_code, uint8_t pif); +void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev); Hello MST, The above change shows why we should be including a header, rather than having private prototypes (w/bitrot). + +/* Virtio ABI version, if we increment this, we break the guest driver. */ +#define VIRTIO_PCI_ABI_VERSION 0 + This change seems unrelated. Thanks Chris #endif diff --git a/target-i386/kvm.c b/target-i386/kvm.c index faedc6c..58a70bc 100644 --- a/target-i386/kvm.c +++ b/target-i386/kvm.c @@ -970,7 +970,7 @@ static int kvm_get_xsave(CPUState *env) #ifdef KVM_CAP_XSAVE struct kvm_xsave* xsave; int ret, i; -uint16_t cwd, swd, twd, fop; +uint16_t cwd, swd, twd; if (!kvm_has_xsave()) { return kvm_get_fpu(env); @@ -986,7 +986,6 @@ static int kvm_get_xsave(CPUState *env) cwd = (uint16_t)xsave-region[0]; swd = (uint16_t)(xsave-region[0] 16); twd = (uint16_t)xsave-region[1]; -fop = (uint16_t)(xsave-region[1] 16); env-fpstt = (swd 11) 7; env-fpus = swd; env-fpuc = cwd;
Re: [Qemu-devel] [PATCH 10/10] linux-user: remove unused variables
On Wed, Jun 15, 2011 at 10:35:19AM +0200, Alexander Graf wrote: On 14.06.2011, at 19:36, Michael S. Tsirkin wrote: Signed-off-by: Michael S. Tsirkin m...@redhat.com --- linux-user/flatload.c | 10 ++ linux-user/linuxload.c | 25 + linux-user/main.c |6 +++--- linux-user/signal.c|5 - linux-user/syscall.c |6 -- 5 files changed, 10 insertions(+), 42 deletions(-) diff --git a/linux-user/flatload.c b/linux-user/flatload.c index cd7af7c..2933c5f 100644 --- a/linux-user/flatload.c +++ b/linux-user/flatload.c @@ -379,12 +379,14 @@ static int load_flat_file(struct linux_binprm * bprm, abi_long result; abi_ulong realdatastart = 0; abi_ulong text_len, data_len, bss_len, stack_len, flags; -abi_ulong memp = 0; /* for finding the brk area */ abi_ulong extra; abi_ulong reloc = 0, rp; int i, rev, relocs = 0; abi_ulong fpos; -abi_ulong start_code, end_code; +abi_ulong start_code; +#ifdef DEBUG +abi_ulong end_code; +#endif abi_ulong indx_len; hdr = ((struct flat_hdr *) bprm-buf); /* exec-header */ @@ -491,7 +493,6 @@ static int load_flat_file(struct linux_binprm * bprm, } reloc = datapos + (ntohl(hdr-reloc_start) - text_len); -memp = realdatastart; } else { @@ -506,7 +507,6 @@ static int load_flat_file(struct linux_binprm * bprm, realdatastart = textpos + ntohl(hdr-data_start); datapos = realdatastart + indx_len; reloc = (textpos + ntohl(hdr-reloc_start) + indx_len); -memp = textpos; #ifdef CONFIG_BINFMT_ZFLAT #error code needs checking @@ -552,7 +552,9 @@ static int load_flat_file(struct linux_binprm * bprm, /* The main program needs a little extra setup in the task structure */ start_code = textpos + sizeof (struct flat_hdr); +#ifdef DEBUG end_code = textpos + text_len; +#endif DBG_FLT(%s %s: TEXT=%x-%x DATA=%x-%x BSS=%x-%x\n, id ? Lib : Load, bprm-filename, diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c index ac8c486..62ebc7e 100644 --- a/linux-user/linuxload.c +++ b/linux-user/linuxload.c @@ -26,22 +26,6 @@ abi_long memcpy_to_target(abi_ulong dest, const void *src, return 0; } -static int in_group_p(gid_t g) -{ -/* return TRUE if we're in the specified group, FALSE otherwise */ -intngroup; -inti; -gid_t grouplist[NGROUPS]; - -ngroup = getgroups(NGROUPS, grouplist); -for(i = 0; i ngroup; i++) { - if(grouplist[i] == g) { - return 1; - } -} -return 0; -} - static int count(char ** vec) { int i; @@ -57,7 +41,7 @@ static int prepare_binprm(struct linux_binprm *bprm) { struct stat st; int mode; -int retval, id_change; +int retval; if(fstat(bprm-fd, st) 0) { return(-errno); @@ -73,14 +57,10 @@ static int prepare_binprm(struct linux_binprm *bprm) bprm-e_uid = geteuid(); bprm-e_gid = getegid(); -id_change = 0; /* Set-uid? */ if(mode S_ISUID) { bprm-e_uid = st.st_uid; - if(bprm-e_uid != geteuid()) { - id_change = 1; - } } /* Set-gid? */ @@ -91,9 +71,6 @@ static int prepare_binprm(struct linux_binprm *bprm) */ if ((mode (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) { bprm-e_gid = st.st_gid; - if (!in_group_p(bprm-e_gid)) { - id_change = 1; - } } retval = read(bprm-fd, bprm-buf, BPRM_BUF_SIZE); diff --git a/linux-user/main.c b/linux-user/main.c index 04da0a4..9b995e5 100644 --- a/linux-user/main.c +++ b/linux-user/main.c @@ -2053,15 +2053,15 @@ void cpu_loop(CPUMIPSState *env) } else { int nb_args; abi_ulong sp_reg; -abi_ulong arg5 = 0, arg6 = 0, arg7 = 0, arg8 = 0; +abi_ulong arg5 = 0, arg6 = 0; nb_args = mips_syscall_args[syscall_num]; sp_reg = env-active_tc.gpr[29]; switch (nb_args) { /* these arguments are taken from the stack */ /* FIXME - what to do if get_user() fails? */ -case 8: get_user_ual(arg8, sp_reg + 28); -case 7: get_user_ual(arg7, sp_reg + 24); +case 8: /* get_user_ual(arg8, sp_reg + 28); */ +case 7: /* get_user_ual(arg7, sp_reg + 24); */ I'd prefer to see these and the respective variable definitions #if 0'd with a comment, stating that they're currently unused. case 6: get_user_ual(arg6, sp_reg + 20); case 5: get_user_ual(arg5, sp_reg + 16); default: diff --git a/linux-user/signal.c b/linux-user/signal.c index
Re: [Qemu-devel] [PATCH 04/10] lsi53c895a: remove unused variables
On Tue, Jun 14, 2011 at 08:35:44PM +0300, Michael S. Tsirkin wrote: Signed-off-by: Michael S. Tsirkin m...@redhat.com --- hw/lsi53c895a.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) This one is already in the trivial-patches tree. Stefan
Re: [Qemu-devel] [patch 6/7] QEMU live block copy (update)
On Tue, Jun 07, 2011 at 12:15:55PM +0200, Jiri Denemark wrote: On Mon, Jun 06, 2011 at 14:03:44 -0300, Marcelo Tosatti wrote: ... + return:[ +{device:ide1-hd0, +status:active, +info:{ + percentage:23, +} +}, What about using the same form of progress reporting as used by query-migrate? That is, instead of info:{ percentage:23 } the following would be reported: disk:{ transferred:123, remaining:123, total:246 } One can trivially compute percentage from that but it's impossible to get this kind of data when only percentage is reported. And total can even change in time if needed (just like it changes during migration). Done. +{device:ide1-hd1, + status:failed +} Is there any way we can get the exact error which made it fail, such as EPERM or ENOSPC? The errors that can generate failed here are internal to QEMU, so i don't think exporting them to management application makes sense. This are errors such as EIO from file system. Error processing should be done in the block_copy command, that is where checks of the destination image are performed.
Re: [Qemu-devel] [PATCH 10/10] linux-user: remove unused variables
On 06/15/2011 01:35 AM, Alexander Graf wrote: -abi_ulong arg5 = 0, arg6 = 0, arg7 = 0, arg8 = 0; +abi_ulong arg5 = 0, arg6 = 0; nb_args = mips_syscall_args[syscall_num]; sp_reg = env-active_tc.gpr[29]; switch (nb_args) { /* these arguments are taken from the stack */ /* FIXME - what to do if get_user() fails? */ -case 8: get_user_ual(arg8, sp_reg + 28); -case 7: get_user_ual(arg7, sp_reg + 24); +case 8: /* get_user_ual(arg8, sp_reg + 28); */ +case 7: /* get_user_ual(arg7, sp_reg + 24); */ I'd prefer to see these and the respective variable definitions #if 0'd with a comment, stating that they're currently unused. I'd prefer not to see if 0 code. Better, I think, to mark the variables as __attribute__((unused)) with that same comment. @@ -7058,18 +7056,14 @@ abi_long do_syscall(void *cpu_env, int num, abi_long arg1, case TARGET_NR_osf_sigprocmask: { abi_ulong mask; -int how = arg1; sigset_t set, oldset; switch(arg1) { case TARGET_SIG_BLOCK: -how = SIG_BLOCK; break; case TARGET_SIG_UNBLOCK: -how = SIG_UNBLOCK; break; case TARGET_SIG_SETMASK: -how = SIG_SETMASK; why go through the effort of setting how and then not using it? I'm pretty sure this is a bug as well. A few lines down is the following code: sigprocmask(arg1, set, oldset); which in TARGET_NR_sigprocmask would be: ret = get_errno(sigprocmask(how, set, oldset)); So we end up sending guest masks to the host. Richard, this is Alpha specific code. Mind to double-check? I remember fixing this before. Perhaps it was in a patch tree that never got pulled... r~
Re: [Qemu-devel] [PATCH] xen: avoid tracking the region 0xa0000 - 0xbffff
On Wed, 15 Jun 2011, Alexander Graf wrote: Please add a comment here, explaining that Xen can only handle a single dirty log region for now, and that we want the linear framebuffer to be that region. Also, please resend with proper patch headers and I'll pull it into the xen-next tree. OK, I'll do that.
[Qemu-devel] [PATCH] xen: only track the linear framebuffer
Xen can only do dirty bit tracking for one memory region, so we should explicitly avoid trying to track anything but the vga vram region. Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com diff --git a/xen-all.c b/xen-all.c index 9a5c3ec..fa1d2e1 100644 --- a/xen-all.c +++ b/xen-all.c @@ -214,6 +214,7 @@ static int xen_add_to_physmap(XenIOState *state, unsigned long i = 0; int rc = 0; XenPhysmap *physmap = NULL; +RAMBlock *block; if (get_physmapping(state, start_addr, size)) { return 0; @@ -221,7 +222,19 @@ static int xen_add_to_physmap(XenIOState *state, if (size = 0) { return -1; } +/* Xen can only handle a single dirty log region for now and we want + * the linear framebuffer to be that region. + * Avoid tracking any regions that is not videoram and avoid tracking + * the legacy vga region. */ +QLIST_FOREACH(block, ram_list.blocks, next) { +if (!strcmp(block-idstr, vga.vram) block-offset == phys_offset + start_addr 0xb) { +goto go_physmap; +} +} +return -1; +go_physmap: DPRINTF(mapping vram to %llx - %llx, from %llx\n, start_addr, start_addr + size, phys_offset); for (i = 0; i size TARGET_PAGE_BITS; i++) { unsigned long idx = (phys_offset TARGET_PAGE_BITS) + i;
[Qemu-devel] [PATCH 1/2] lsi: Fix unused-but-set-variable warning
From: Christophe Fergeau cferg...@redhat.com This warning is new in gcc 4.6. Signed-off-by: Christophe Fergeau cferg...@redhat.com Acked-by: Paolo Bonzini pbonz...@redhat.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- hw/lsi53c895a.c |2 -- 1 files changed, 0 insertions(+), 2 deletions(-) diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c index 83084b6..90c6cbc 100644 --- a/hw/lsi53c895a.c +++ b/hw/lsi53c895a.c @@ -889,7 +889,6 @@ static void lsi_do_msgout(LSIState *s) uint8_t msg; int len; uint32_t current_tag; -SCSIDevice *current_dev; lsi_request *current_req, *p, *p_next; int id; @@ -901,7 +900,6 @@ static void lsi_do_msgout(LSIState *s) current_req = lsi_find_by_tag(s, current_tag); } id = (current_tag 8) 0xf; -current_dev = s-bus.devs[id]; DPRINTF(MSG out len=%d\n, s-dbc); while (s-dbc) { -- 1.7.5.3
Re: [Qemu-devel] [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability
On Tue, Jun 14, 2011 at 9:12 AM, M. Mohan Kumar mo...@in.ibm.com wrote: [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability In passthrough security model, following a symbolic link in the server side could result in TOCTTOU vulnerability. Use clone system call to create a thread which runs in chrooted environment. All passthrough model file operations are done from this thread to avoid TOCTTOU vulnerability. How will chroot(2) work when QEMU runs as non-root (i.e. secure production environments)? Stefan
[Qemu-devel] [RFC] qxl: set mm_time in vga update
This fixes a problem where on windows 7 startup phase, before the qxl driver is loaded, the drawables are sufficiently large and video like to trigger a stream, but the lack of a filled mm time field triggers a warning in spice-gtk. --- ui/spice-display.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/ui/spice-display.c b/ui/spice-display.c index 15f0704..f73 100644 --- a/ui/spice-display.c +++ b/ui/spice-display.c @@ -70,6 +70,7 @@ static SimpleSpiceUpdate *qemu_spice_create_update(SimpleSpiceDisplay *ssd) QXLCommand *cmd; uint8_t *src, *dst; int by, bw, bh; +struct timespec time_space; if (qemu_spice_rect_is_empty(ssd-dirty)) { return NULL; @@ -96,6 +97,10 @@ static SimpleSpiceUpdate *qemu_spice_create_update(SimpleSpiceDisplay *ssd) drawable-surfaces_dest[0] = -1; drawable-surfaces_dest[1] = -1; drawable-surfaces_dest[2] = -1; +clock_gettime(CLOCK_MONOTONIC, time_space); +/* time in milliseconds from epoch. */ +drawable-mm_time = time_space.tv_sec * 1000 + + time_space.tv_nsec / 1000 / 1000; drawable-u.copy.rop_descriptor = SPICE_ROPD_OP_PUT; drawable-u.copy.src_bitmap = (intptr_t)image; -- 1.7.5.2
Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing
On 06/15/2011 11:24 AM, Alexander Graf wrote: I'm actually not quite sure what exactly he's describing here. But if it's bypassing the bus logic, it's not a normal PCI device :). Sure, there are special case devices that also expose a PCI interface. But real PCI cards that you plug in onto the PCI bus can't bypass the interrupt logic of the bus, as the only interrupt wires they have go to the bus. And since the PCI adapters we use in PC machines in Qemu are all non-special, guests can possibly choke on this. There actually is a special device in qemu - acpi power management is configured as a PCI device, but its interrupt is hard-wired to gsi 9 and is edge-triggered (so it can't share the irq line). I other devices that are special in this regard to also be part of the chipset, not devices you can plug into arbitrary slots. -- error compiling committee.c: too many arguments to function
[Qemu-devel] [PATCH 2/2] Fix typo in cpus.c
From: Alexandre Raymond cerb...@gmail.com filed - failed Signed-off-by: Alexandre Raymond cerb...@gmail.com Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com --- cpus.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/cpus.c b/cpus.c index 1fc34b7..4ab76f0 100644 --- a/cpus.c +++ b/cpus.c @@ -297,7 +297,7 @@ static void qemu_event_increment(void) /* EAGAIN is fine, a read must be pending. */ if (ret 0 errno != EAGAIN) { -fprintf(stderr, qemu_event_increment: write() filed: %s\n, +fprintf(stderr, qemu_event_increment: write() failed: %s\n, strerror(errno)); exit (1); } -- 1.7.5.3
[Qemu-devel] [patch 0/4] live block copy (v5)
v5: - fix several leaks in block-copy and blkmirror - change progress status reporting - support migration - fix buffer overwrite bug in blkmirror v4: - add block-copy documentation - block-copy style fixes - fix blkmirror_cancel method - blkmirror style fixes - add escaping to blkmirror pathnames - add blkmirror documentation v3: - replace commit file with mirrored writes - address comments from round 2 v2: - use reference counting to be safe against device hotplug / bdrv_truncate - add comment about usage of timer
[Qemu-devel] [PULL 0/2] Trivial patches for June 9 to June 15 2011
The following changes since commit 71f34ad05359d7fa97996562d904979281ddc7f5: Merge remote-tracking branch 'alon/pull-libcacard-1' into staging (2011-06-15 09:03:49 -0500) are available in the git repository at: ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches Alexandre Raymond (1): Fix typo in cpus.c Christophe Fergeau (1): lsi: Fix unused-but-set-variable warning cpus.c |2 +- hw/lsi53c895a.c |2 -- 2 files changed, 1 insertions(+), 3 deletions(-) -- 1.7.5.3
[Qemu-devel] [PATCH v2] xen: fix interrupt routing
Compared to the last version I only added a comment to the code. - remove i440FX-xen and i440fx_write_config_xen we don't need to intercept pci config writes to i440FX anymore; - introduce PIIX3-xen and piix3_write_config_xen we do need to intercept pci config write to the PCI-ISA bridge to update the PCI link routing; - set the number of PIIX3-xen interrupts line to 128; Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com diff --git a/hw/pc.h b/hw/pc.h index 0dcbee7..6d5730b 100644 --- a/hw/pc.h +++ b/hw/pc.h @@ -176,7 +176,6 @@ struct PCII440FXState; typedef struct PCII440FXState PCII440FXState; PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, qemu_irq *pic, ram_addr_t ram_size); -PCIBus *i440fx_xen_init(PCII440FXState **pi440fx_state, int *piix3_devfn, qemu_irq *pic, ram_addr_t ram_size); void i440fx_init_memory_mappings(PCII440FXState *d); /* piix4.c */ diff --git a/hw/pc_piix.c b/hw/pc_piix.c index 9a22a8a..ba198de 100644 --- a/hw/pc_piix.c +++ b/hw/pc_piix.c @@ -124,11 +124,7 @@ static void pc_init1(ram_addr_t ram_size, isa_irq = qemu_allocate_irqs(isa_irq_handler, isa_irq_state, 24); if (pci_enabled) { -if (!xen_enabled()) { -pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_irq, ram_size); -} else { -pci_bus = i440fx_xen_init(i440fx_state, piix3_devfn, isa_irq, ram_size); -} +pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_irq, ram_size); } else { pci_bus = NULL; i440fx_state = NULL; diff --git a/hw/piix_pci.c b/hw/piix_pci.c index 7f1c4cc..1e29ec4 100644 --- a/hw/piix_pci.c +++ b/hw/piix_pci.c @@ -40,6 +40,7 @@ typedef PCIHostState I440FXState; #define PIIX_NUM_PIC_IRQS 16 /* i8259 * 2 */ #define PIIX_NUM_PIRQS 4ULL/* PIRQ[A-D] */ +#define XEN_PIIX_NUM_PIRQS 128ULL #define PIIX_PIRQC 0x60 typedef struct PIIX3State { @@ -78,6 +79,8 @@ struct PCII440FXState { #define I440FX_SMRAM0x72 static void piix3_set_irq(void *opaque, int pirq, int level); +static void piix3_write_config_xen(PCIDevice *dev, + uint32_t address, uint32_t val, int len); /* return the global irq number corresponding to a given device irq pin. We could also use the bus number to have a more precise @@ -173,13 +176,6 @@ static void i440fx_write_config(PCIDevice *dev, } } -static void i440fx_write_config_xen(PCIDevice *dev, -uint32_t address, uint32_t val, int len) -{ -xen_piix_pci_write_config_client(address, val, len); -i440fx_write_config(dev, address, val, len); -} - static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id) { PCII440FXState *d = opaque; @@ -267,8 +263,21 @@ static PCIBus *i440fx_common_init(const char *device_name, d = pci_create_simple(b, 0, device_name); *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d); -piix3 = DO_UPCAST(PIIX3State, dev, - pci_create_simple_multifunction(b, -1, true, PIIX3)); +/* Xen supports additional interrupt routes from the PCI devices to + * the IOAPIC: the four pins of each PCI device on the bus are also + * connected to the IOAPIC directly. + * These additional routes can be discovered through ACPI. */ +if (xen_enabled()) { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(b, -1, true, PIIX3-xen)); +pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq, +piix3, XEN_PIIX_NUM_PIRQS); +} else { +piix3 = DO_UPCAST(PIIX3State, dev, +pci_create_simple_multifunction(b, -1, true, PIIX3)); +pci_bus_irqs(b, piix3_set_irq, pci_slot_get_pirq, piix3, +PIIX_NUM_PIRQS); +} piix3-pic = pic; (*pi440fx_state)-piix3 = piix3; @@ -289,21 +298,6 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix3_devfn, PCIBus *b; b = i440fx_common_init(i440FX, pi440fx_state, piix3_devfn, pic, ram_size); -pci_bus_irqs(b, piix3_set_irq, pci_slot_get_pirq, (*pi440fx_state)-piix3, - PIIX_NUM_PIRQS); - -return b; -} - -PCIBus *i440fx_xen_init(PCII440FXState **pi440fx_state, int *piix3_devfn, -qemu_irq *pic, ram_addr_t ram_size) -{ -PCIBus *b; - -b = i440fx_common_init(i440FX-xen, pi440fx_state, piix3_devfn, pic, ram_size); -pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq, - (*pi440fx_state)-piix3, PIIX_NUM_PIRQS); - return b; } @@ -365,6 +359,13 @@ static void piix3_write_config(PCIDevice *dev, } } +static void piix3_write_config_xen(PCIDevice *dev, + uint32_t address, uint32_t val, int len) +{ +xen_piix_pci_write_config_client(address, val, len); +piix3_write_config(dev, address, val, len); +} + static void piix3_reset(void *opaque) { PIIX3State *d =
Re: [Qemu-devel] [RFC] Specifying storage devices for live migration
On Wed, Jun 15, 2011 at 02:22:22PM -0300, Marcelo Tosatti wrote: On Wed, Jun 15, 2011 at 02:06:21PM +0300, Avishay Traeger wrote: Hello all, Currently there is no way to choose storage devices to be migrated. There was some discussion about making the monitor API a bit more flexible, but the code was never written: http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html So today users can choose only: 1) No storage migrated 2) All disks migrated with fully copy 3) All disks migrated with incremental I'd like to write the code to allow users to choose which disks to migrate, and how. What API would you recommend? What would break the least amount of utilities? I'm unaware of any application that makes use of the -b flag, so if thats true breaking the changing the current syntax is fine. The one Liran suggests looks alright. We also need the same feature, BTW.
Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing
Am 15.06.2011 um 18:34 schrieb Avi Kivity a...@redhat.com: On 06/15/2011 11:24 AM, Alexander Graf wrote: I'm actually not quite sure what exactly he's describing here. But if it's bypassing the bus logic, it's not a normal PCI device :). Sure, there are special case devices that also expose a PCI interface. But real PCI cards that you plug in onto the PCI bus can't bypass the interrupt logic of the bus, as the only interrupt wires they have go to the bus. And since the PCI adapters we use in PC machines in Qemu are all non-special, guests can possibly choke on this. There actually is a special device in qemu - acpi power management is configured as a PCI device, but its interrupt is hard-wired to gsi 9 and is edge-triggered (so it can't share the irq line). I other devices that are special in this regard to also be part of the chipset, not devices you can plug into arbitrary slots. Sure, platform devices can do that. Real PCI cards can not. Have you ever seen an e1000 with direct mapped interrupt lines? :) I admit though that we also emulate platform devices that happen to expose themselves on the PCI bus. It's not common though and I wouldn't expect every OS/driver to be happy about it. Alex
Re: [Qemu-devel] [PATCH] Command line support for altering the log file location
Thanks, applied. On Wed, Jun 8, 2011 at 5:32 AM, Matthew Fernandez matthew.fernan...@gmail.com wrote: Add command line support for logging to a location other than /tmp/qemu.log. With logging enabled (command line option -d), the log is written to the hard-coded path /tmp/qemu.log. This patch adds support for writing the log to a different location by passing the -D option. Signed-off-by: Matthew Fernandez matthew.fernan...@gmail.com Thanks Kevin and Blue Swirl for the feedback. Amended patch is below. On 7 June 2011 18:49, Kevin Wolf kw...@redhat.com wrote: Am 05.06.2011 02:46, schrieb Matthew Fernandez: So, to clarify, all text above the '' is included as the commit message and the sign off line should be in this section. All text below the '' before the first diff is treated as comments that are not to be committed. Is this correct? If so I'll send through an ammended patch with these changes. Yes, this is correct. Kevin On 4 June 2011 20:18, Blue Swirl blauwir...@gmail.com wrote: On Tue, May 31, 2011 at 9:20 AM, Matthew Fernandez matthew.fernan...@gmail.com wrote: Hi, The included patch adds command line support for logging to a location other than /tmp/qemu.log. The diff is relative to commit 2eb9f241824d000fcd90bd7f4b49e40b88e62975. Please let me know if anything needs to be cleaned up or changed. Anthony, I'm not sure who should be responsible for reviewing/accepting this, but I've CCed you as it touches vl.c. The patch looks OK, however the above text (which git-am would use as the commit message) needs to be changed. How about something like this: Add command line support for logging to a location other than /tmp/qemu.log. If you want to add comments which should not go to the commit message, please put them below the line with ''. Thanks, Matthew Signed-off-by: Matthew Fernandez matthew.fernan...@gmail.com Also this needs to be above the '' line. diff --git a/bsd-user/main.c b/bsd-user/main.c index 0c3fca1..0af8a7e 100644 --- a/bsd-user/main.c +++ b/bsd-user/main.c @@ -690,7 +690,8 @@ static void usage(void) -bsd type select emulated BSD type FreeBSD/NetBSD/OpenBSD (default)\n \n Debug options:\n - -d options activate log (logfile=%s)\n + -d options activate log (default logfile=%s)\n + -D logfile override default logfile location\n -p pagesize set the host page size to 'pagesize'\n -singlestep always run in singlestep mode\n -strace log system calls\n @@ -731,6 +732,8 @@ int main(int argc, char **argv) { const char *filename; const char *cpu_model; + const char *log_file = DEBUG_LOGFILE; + const char *log_mask = NULL; struct target_pt_regs regs1, *regs = regs1; struct image_info info1, *info = info1; TaskState ts1, *ts = ts1; @@ -745,9 +748,6 @@ int main(int argc, char **argv) if (argc = 1) usage(); - /* init debug */ - cpu_set_log_filename(DEBUG_LOGFILE); - if ((envlist = envlist_create()) == NULL) { (void) fprintf(stderr, Unable to allocate envlist\n); exit(1); @@ -775,22 +775,15 @@ int main(int argc, char **argv) if (!strcmp(r, -)) { break; } else if (!strcmp(r, d)) { - int mask; - const CPULogItem *item; - - if (optind = argc) + if (optind = argc) { break; - - r = argv[optind++]; - mask = cpu_str_to_log_mask(r); - if (!mask) { - printf(Log items (comma separated):\n); - for(item = cpu_log_items; item-mask != 0; item++) { - printf(%-10s %s\n, item-name, item-help); - } - exit(1); } - cpu_set_log(mask); + log_mask = argv[optind++]; + } else if (!strcmp(r, D)) { + if (optind = argc) { + break; + } + log_file = argv[optind++]; } else if (!strcmp(r, E)) { r = argv[optind++]; if (envlist_setenv(envlist, r) != 0) @@ -867,6 +860,23 @@ int main(int argc, char **argv) usage(); filename = argv[optind]; + /* init debug */ + cpu_set_log_filename(log_file); + if (log_mask) { + int mask; + const CPULogItem *item; + + mask = cpu_str_to_log_mask(r); + if (!mask) { + printf(Log items (comma separated):\n); + for (item = cpu_log_items; item-mask != 0; item++) { + printf(%-10s %s\n, item-name, item-help); + } + exit(1); + } + cpu_set_log(mask); + } + /* Zero out regs */ memset(regs, 0, sizeof(struct target_pt_regs)); diff --git a/cpus.c b/cpus.c index 1fc34b7..17e96b5 100644 --- a/cpus.c +++ b/cpus.c @@ -1142,6
Re: [Qemu-devel] [RFC] Specifying storage devices for live migration
On Wed, Jun 15, 2011 at 02:06:21PM +0300, Avishay Traeger wrote: Hello all, Currently there is no way to choose storage devices to be migrated. There was some discussion about making the monitor API a bit more flexible, but the code was never written: http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html So today users can choose only: 1) No storage migrated 2) All disks migrated with fully copy 3) All disks migrated with incremental I'd like to write the code to allow users to choose which disks to migrate, and how. What API would you recommend? What would break the least amount of utilities? I'm unaware of any application that makes use of the -b flag, so if thats true breaking the changing the current syntax is fine. The one Liran suggests looks alright.
Re: [Qemu-devel] [PATCH v3 05/21] qapi: add QMP input visitor
On Wed, 15 Jun 2011 11:56:03 -0500 Michael Roth mdr...@linux.vnet.ibm.com wrote: On 06/15/2011 11:11 AM, Luiz Capitulino wrote: On Mon, 13 Jun 2011 21:31:10 -0500 Michael Rothmdr...@linux.vnet.ibm.com wrote: snip +Visitor *qmp_input_get_visitor(QmpInputVisitor *v) +{ +returnv-visitor; +} + +void qmp_input_visitor_cleanup(QmpInputVisitor *v) +{ You're not decrementing the reference to v-obj. Actually, we should also increment the reference to it before storing or document that the reference ownership is transferred to this function. Ownership never gets transferred to the visitor in the dispatch path; the caller is supposed to free it. I thought there was some manipulation elsewhere that prevented us from using a const argument to qmp_input_visitor_new() instead, but it looks like we can. Otherwise, I'd prefer to do an inc+dec to make this more explicit. Yeah, I prefer that too. +qemu_free(v); +} + +QmpInputVisitor *qmp_input_visitor_new(QObject *obj) +{ +QmpInputVisitor *v; + +v = qemu_mallocz(sizeof(*v)); + +v-visitor.start_struct = qmp_input_start_struct; +v-visitor.end_struct = qmp_input_end_struct; +v-visitor.start_list = qmp_input_start_list; +v-visitor.next_list = qmp_input_next_list; +v-visitor.end_list = qmp_input_end_list; +v-visitor.type_enum = qmp_input_type_enum; +v-visitor.type_int = qmp_input_type_int; +v-visitor.type_bool = qmp_input_type_bool; +v-visitor.type_str = qmp_input_type_str; +v-visitor.type_number = qmp_input_type_number; +v-visitor.start_optional = qmp_input_start_optional; +v-visitor.end_optional = qmp_input_end_optional; + +v-obj = obj; + +return v; +} diff --git a/qapi/qmp-input-visitor.h b/qapi/qmp-input-visitor.h new file mode 100644 index 000..3f798f0 --- /dev/null +++ b/qapi/qmp-input-visitor.h @@ -0,0 +1,27 @@ +/* + * Input Visitor + * + * Copyright IBM, Corp. 2011 + * + * Authors: + * Anthony Liguorialigu...@us.ibm.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#ifndef QMP_INPUT_VISITOR_H +#define QMP_INPUT_VISITOR_H + +#include qapi-visit-core.h +#include qobject.h + +typedef struct QmpInputVisitor QmpInputVisitor; + +QmpInputVisitor *qmp_input_visitor_new(QObject *obj); +void qmp_input_visitor_cleanup(QmpInputVisitor *v); + +Visitor *qmp_input_get_visitor(QmpInputVisitor *v); + +#endif diff --git a/qerror.h b/qerror.h index 16c830d..7a89a50 100644 --- a/qerror.h +++ b/qerror.h @@ -124,6 +124,9 @@ QError *qobject_to_qerror(const QObject *obj); #define QERR_JSON_PARSE_ERROR \ { 'class': 'JSONParseError', 'data': { 'message': %s } } +#define QERR_QAPI_VISITOR_STACK_OVERRUN \ +{ 'class': 'QAPIVisitorStackOverrun', 'data': {} } + #define QERR_KVM_MISSING_CAP \ { 'class': 'KVMMissingCap', 'data': { 'capability': %s, 'feature': %s } }
Re: [Qemu-devel] [PATCH v3 05/21] qapi: add QMP input visitor
On 06/15/2011 11:11 AM, Luiz Capitulino wrote: On Mon, 13 Jun 2011 21:31:10 -0500 Michael Rothmdr...@linux.vnet.ibm.com wrote: snip +Visitor *qmp_input_get_visitor(QmpInputVisitor *v) +{ +returnv-visitor; +} + +void qmp_input_visitor_cleanup(QmpInputVisitor *v) +{ You're not decrementing the reference to v-obj. Actually, we should also increment the reference to it before storing or document that the reference ownership is transferred to this function. Ownership never gets transferred to the visitor in the dispatch path; the caller is supposed to free it. I thought there was some manipulation elsewhere that prevented us from using a const argument to qmp_input_visitor_new() instead, but it looks like we can. Otherwise, I'd prefer to do an inc+dec to make this more explicit. +qemu_free(v); +} + +QmpInputVisitor *qmp_input_visitor_new(QObject *obj) +{ +QmpInputVisitor *v; + +v = qemu_mallocz(sizeof(*v)); + +v-visitor.start_struct = qmp_input_start_struct; +v-visitor.end_struct = qmp_input_end_struct; +v-visitor.start_list = qmp_input_start_list; +v-visitor.next_list = qmp_input_next_list; +v-visitor.end_list = qmp_input_end_list; +v-visitor.type_enum = qmp_input_type_enum; +v-visitor.type_int = qmp_input_type_int; +v-visitor.type_bool = qmp_input_type_bool; +v-visitor.type_str = qmp_input_type_str; +v-visitor.type_number = qmp_input_type_number; +v-visitor.start_optional = qmp_input_start_optional; +v-visitor.end_optional = qmp_input_end_optional; + +v-obj = obj; + +return v; +} diff --git a/qapi/qmp-input-visitor.h b/qapi/qmp-input-visitor.h new file mode 100644 index 000..3f798f0 --- /dev/null +++ b/qapi/qmp-input-visitor.h @@ -0,0 +1,27 @@ +/* + * Input Visitor + * + * Copyright IBM, Corp. 2011 + * + * Authors: + * Anthony Liguorialigu...@us.ibm.com + * + * This work is licensed under the terms of the GNU LGPL, version 2.1 or later. + * See the COPYING.LIB file in the top-level directory. + * + */ + +#ifndef QMP_INPUT_VISITOR_H +#define QMP_INPUT_VISITOR_H + +#include qapi-visit-core.h +#include qobject.h + +typedef struct QmpInputVisitor QmpInputVisitor; + +QmpInputVisitor *qmp_input_visitor_new(QObject *obj); +void qmp_input_visitor_cleanup(QmpInputVisitor *v); + +Visitor *qmp_input_get_visitor(QmpInputVisitor *v); + +#endif diff --git a/qerror.h b/qerror.h index 16c830d..7a89a50 100644 --- a/qerror.h +++ b/qerror.h @@ -124,6 +124,9 @@ QError *qobject_to_qerror(const QObject *obj); #define QERR_JSON_PARSE_ERROR \ { 'class': 'JSONParseError', 'data': { 'message': %s } } +#define QERR_QAPI_VISITOR_STACK_OVERRUN \ +{ 'class': 'QAPIVisitorStackOverrun', 'data': {} } + #define QERR_KVM_MISSING_CAP \ { 'class': 'KVMMissingCap', 'data': { 'capability': %s, 'feature': %s } }
[Qemu-devel] [PATCH] Added legacy 9P2000 support back into QEMU
It was tested with v9fs in the guest by using: sudo mount -t 9p -o trans=virtio,version=9p2000 v_boot /pmnt Signed-off-by: William K. Bittner wkbit...@us.ibm.com diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c index b5fc52b..febfba2 100644 --- a/hw/9pfs/virtio-9p.c +++ b/hw/9pfs/virtio-9p.c @@ -1186,6 +1186,8 @@ static void v9fs_version(V9fsState *s, V9fsPDU *pdu) s-proto_version = V9FS_PROTO_2000U; } else if (!strcmp(version.data, 9P2000.L)) { s-proto_version = V9FS_PROTO_2000L; +} else if (!strcmp(version.data, 9P2000)) { +s-proto_version = V9FS_PROTO_2000; } else { v9fs_string_sprintf(version, unknown); } diff --git a/hw/9pfs/virtio-9p.h b/hw/9pfs/virtio-9p.h index 622928f..c5b5229 100644 --- a/hw/9pfs/virtio-9p.h +++ b/hw/9pfs/virtio-9p.h @@ -94,6 +94,7 @@ enum { }; enum p9_proto_version { +V9FS_PROTO_2000 = 0x00, V9FS_PROTO_2000U = 0x01, V9FS_PROTO_2000L = 0x02, }; -- 1.7.4.1