Re: [Qemu-devel] [PATCH 03/12] VMDK: probe for mono flat image

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 6:45 AM, Fam Zheng famc...@gmail.com wrote:
 Have you tried removing that comment line to see if VMware still
 recognizes the file?


 Yes, it recognizes iff the first option line (non-comment, non-empty)
 is version=1 or version=2“.

 (An interesting thing that it is both space sensitive and case sensitive)

Cool, nice job figuring this out.  We should do the same.

Stefan



Re: [Qemu-devel] [PATCH RFC] target-ppc: Correctly handle translation address when bus unit ID = 0x07F

2011-06-15 Thread Alexander Graf

On 14.06.2011, at 23:49, Andreas Färber wrote:

 Am 13.06.2011 um 15:31 schrieb Alexander Graf:
 
 On 13.06.2011, at 12:13, Andreas Färber wrote:
 
 +/* Memory forced */
 +ctx-raddr = ((sr  0xF)  28) | (eaddr  0x0FFF);
 
 This is exactly the same as ctx-raddr = eaddr, no?
 
 No, not that I see. The manual is explicit about this calculation:
 
 the processor [...] generates a memory access
 with the physical address specified by the lowest-order four bits in the 
 segment register
 (SR[28–31]) concatenated with LA4–LA31. (6-63 / 6.10.4)
 
 That matches figure 6-26 on the following page, and I've added a similar 
 comment in v2.
 
 I see no compelling reason why the lower nibble of the SR should always match 
 the SR#, the upper nibble of eaddr.

Ah, sr is not the segment register number, but the content of the segement 
register. Then it's ok :).


Alex




Re: [Qemu-devel] [PATCH] usb-ehci: move device/vendor/class id to qdev

2011-06-15 Thread Gerd Hoffmann

On 06/14/11 19:40, Michael S. Tsirkin wrote:

This is on top of my pci tree.


Looks good.  Wanna queue this up in your PCI tree, so all devices are 
switched over at once when this is merged?


cheers,
  Gerd




Re: [Qemu-devel] High speed polling

2011-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2011 at 11:32 PM, Clay Andreasen c...@cray.com wrote:
 I have a network device simulation that I am connecting to multiple
 instances of Qemu (nodes) via a shared memory queue.  It works pretty well
 as
 long as all of the nodes are initiating communication but when one node is
 passive, it must poll to get packets.  So far the fastest I have been able
 to
 get it to poll is about every 2M emulated clocks.
 This is with CONFIG_HIGH_RES_TIMERS and CONFIG_NO_HZ on the host.
 I also set MIN_TIMER_REARM_NS in qemu-timer.c to 10.
 Is there some way to increase the polling rate by about an order of
 magnitude?

Without more details it's hard to say what is going on:

Running an x86 guest?  Are you using ./configure --enable-io-thread?
It sounds like you may not be using KVM?  How many vcpus are running
on the host in total compared to the number of logical CPUs on the
host?

You haven't given details on how you are polling in the guest.  Are
you running a polling loop in ring 0 or is the guest running a
full-blown OS and polling from userspace?

Why are you polling in the first place - to minimize latency?

Stefan



Re: [Qemu-devel] [PATCH 01/13] spice: Use cpu_register_physical_memory_log for dirty log enabling

2011-06-15 Thread Gerd Hoffmann

  Hi,


Note: The addtional vga_dirty_log_start for the primary interface looked
stray. qxl_write_config enabled logging for all interfaces anyway.


No.  qxl_write_config is hooked for the primary only.


  case QXL_RAM_RANGE_INDEX:
-cpu_register_physical_memory(addr, size, qxl-vga.vram_offset | 
IO_MEM_RAM);
+cpu_register_physical_memory_log(addr, size,
+ qxl-vga.vram_offset | IO_MEM_RAM,
+ 0, true);


if (qxl-id == 0) {
cpu_register_physical_memory_log(...)
} else {
cpu_register_physical_memory()
}

Only the primary is vga compatible and thus needs dirty logging.

cheers,
  Gerd




Re: [Qemu-devel] [PATCH] usb-ehci: move device/vendor/class id to qdev

2011-06-15 Thread Michael S. Tsirkin
On Wed, Jun 15, 2011 at 08:32:50AM +0200, Gerd Hoffmann wrote:
 On 06/14/11 19:40, Michael S. Tsirkin wrote:
 This is on top of my pci tree.
 
 Looks good.  Wanna queue this up in your PCI tree, so all devices
 are switched over at once when this is merged?
 
 cheers,
   Gerd

Guess so, yes.
Thanks for the review.

-- 
MST



[Qemu-devel] [PATCH v2 01/13] spice: Use cpu_register_physical_memory_log for dirty log enabling

2011-06-15 Thread Jan Kiszka
Drop outdated dirty log disable/enable around PCI remapping and register
the BAR for dirty logging via cpu_register_physical_memory_log. That
allows to remove all vga_dirty_log_start/stop references from qxl.

CC: Gerd Hoffmann kra...@redhat.com
Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Changes in v2:
 - don't enable logging for secondary adapter

 hw/qxl.c |9 +++--
 1 files changed, 3 insertions(+), 6 deletions(-)

diff --git a/hw/qxl.c b/hw/qxl.c
index 1906e84..01149ae 100644
--- a/hw/qxl.c
+++ b/hw/qxl.c
@@ -619,12 +619,10 @@ static void qxl_write_config(PCIDevice *d, uint32_t 
address,
 PCIQXLDevice *qxl = DO_UPCAST(PCIQXLDevice, pci, d);
 VGACommonState *vga = qxl-vga;
 
-vga_dirty_log_stop(vga);
 pci_default_write_config(d, address, val, len);
 if (vga-map_addr  qxl-pci.io_regions[0].addr == -1) {
 vga-map_addr = 0;
 }
-vga_dirty_log_start(vga);
 }
 
 static void qxl_check_state(PCIQXLDevice *d)
@@ -1037,12 +1035,11 @@ static void qxl_map(PCIDevice *pci, int region_num,
 qxl-io_base = addr;
 break;
 case QXL_RAM_RANGE_INDEX:
-cpu_register_physical_memory(addr, size, qxl-vga.vram_offset | 
IO_MEM_RAM);
+cpu_register_physical_memory_log(addr, size,
+ qxl-vga.vram_offset | IO_MEM_RAM,
+ 0, qxl-id == 0);
 qxl-vga.map_addr = addr;
 qxl-vga.map_end = addr + size;
-if (qxl-id == 0) {
-vga_dirty_log_start(qxl-vga);
-}
 break;
 case QXL_ROM_RANGE_INDEX:
 cpu_register_physical_memory(addr, size, qxl-rom_offset | IO_MEM_ROM);
-- 
1.7.1



Re: [Qemu-devel] [PATCH, v2] vga: Fix type of lfb/map_addr/end.

2011-06-15 Thread Jan Kiszka
On 2011-06-14 21:53, Richard Henderson wrote:
 These addresses have been passed through pci_to_cpu_addr,
 and thus need to be full target_phys_addr_t.
 
 Signed-off-by: Richard Henderson r...@twiddle.net
 Cc: Jan Kiszka jan.kis...@siemens.com
 ---
 
  V1-V2:
   lfb_addr/end also widened to guest address width.
 
  hw/vga_int.h |   10 +-
  1 files changed, 5 insertions(+), 5 deletions(-)
 
 diff --git a/hw/vga_int.h b/hw/vga_int.h
 index d2811bd..eee91a8 100644
 --- a/hw/vga_int.h
 +++ b/hw/vga_int.h
 @@ -106,13 +106,13 @@ typedef void (* vga_update_retrace_info_fn)(struct 
 VGACommonState *s);
  typedef struct VGACommonState {
  uint8_t *vram_ptr;
  ram_addr_t vram_offset;
 +target_phys_addr_t lfb_addr;
 +target_phys_addr_t lfb_end;
 +target_phys_addr_t map_addr;
 +target_phys_addr_t map_end;
  uint32_t vram_size;
 -uint32_t lfb_addr;
 -uint32_t lfb_end;
 -uint32_t map_addr;
 -uint32_t map_end;
 -uint32_t lfb_vram_mapped; /* whether 0xa is mapped as ram */
  uint32_t latch;
 +uint32_t lfb_vram_mapped; /* whether 0xa is mapped as ram */
  uint8_t sr_index;
  uint8_t sr[256];
  uint8_t gr_index;

Acked-by: Jan Kiszka jan.kis...@siemens.com

But one of us will have to rebase at the end of the day.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Dmitry Konishchev
On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 Yes, please.

OK, I'll do it as soon I'll find time for it.


On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 For image files the block layer should be caching the device capacity (size)
 anyway, so you probably don't need to allocate the array, just call
 bdrv_get_geometry().  That might make it easier to write a self-contained
 function.

I've tried to not cache, but it turned out that bdrv_get_geometry()
calls are quite noticeable in profiler without caching.

-- 
Дмитрий Конищев (Dmitry Konishchev)
mailto:konishc...@gmail.com



Re: [Qemu-devel] [PATCH] Fix signal handling of SIG_IPI when io-thread is enabled

2011-06-15 Thread Jan Kiszka
On 2011-06-15 07:20, Alexandre Raymond wrote:
 Both the signal thread (via sigwait()) and the cpu thread (via
 a normal signal handler) were attempting to catch SIG_IPI.

Why? Ahh, because of qemu_cpu_kick_self: raise(SIG_IPI)! That should
generate a per-process SIG_IPI. And that may not only affect Darwin.
Looks good.

Acked-by: Jan Kiszka jan.kis...@siemens.com

 
 This resulted in random freezes under Darwin.
 
 This patch separates SIG_IPI from the rest of the signals handled
 by the signal thread, because it is independently caught by the cpu
 thread.
 
 Signed-off-by: Alexandre Raymond cerb...@gmail.com
 ---
  cpus.c |   10 +-
  1 files changed, 9 insertions(+), 1 deletions(-)
 
 diff --git a/cpus.c b/cpus.c
 index 18a1522..84ffd1c 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -394,10 +394,18 @@ static int qemu_signal_init(void)
  sigaddset(set, SIGUSR2);
  pthread_sigmask(SIG_UNBLOCK, set, NULL);
  
 +/*
 + * SIG_IPI must be blocked in the main thread and must not be caught
 + * by sigwait() in the signal thread. Otherwise, the cpu thread will
 + * not catch it reliably.
 + */
 +sigemptyset(set);
 +sigaddset(set, SIG_IPI);
 +pthread_sigmask(SIG_BLOCK, set, NULL);
 +
  sigemptyset(set);
  sigaddset(set, SIGIO);
  sigaddset(set, SIGALRM);
 -sigaddset(set, SIG_IPI);
  sigaddset(set, SIGBUS);
  #else
  sigemptyset(set);

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



[Qemu-devel] [PATCH v2] linux-user: Fix the computation of the requested heap size

2011-06-15 Thread Cédric VINCENT
There were several remaining bugs in the previous implementation of
do_brk():

1. the value of new_alloc_size was one page too large when the
   requested brk was aligned on a host page boundary.

2. no new pages should be (re-)allocated when the requested brk is
   in the range of the pages that were already allocated
   previsouly (for the same purpose).  Technically these pages are
   never unmapped in the current implementation.

The problem/fix can be reproduced/validated with the test-suite above:

#include unistd.h   /* syscall(2),  */
#include sys/syscall.h  /* SYS_brk, */
#include stdio.h/* puts(3), */
#include stdlib.h   /* exit(3), EXIT_*, */
#include stdint.h   /* uint*_t, */
#include sys/mman.h /* mmap(2), MAP_*,  */
#include string.h   /* memset(3), */

int main()
{
int exit_status = EXIT_SUCCESS;
uint8_t *current_brk = 0;
uint8_t *initial_brk;
uint8_t *new_brk;
uint8_t *old_brk;
int failure = 0;
int i;

void test_brk(int increment, int expected_result) {
new_brk = (uint8_t *)syscall(SYS_brk, current_brk + increment);
if ((new_brk == current_brk) == expected_result)
failure = 1;
current_brk = (uint8_t *)syscall(SYS_brk, 0);
}

void test_result() {
if (!failure)
puts(OK);
else {
puts(failure);
exit_status = EXIT_FAILURE;
}
}

void test_title(const char *title) {
failure = 0;
printf(%-45s : , title);
fflush(stdout);
}

test_title(Initialization);
test_brk(0, 1);
initial_brk = current_brk;
test_result();

test_title(Don't overlap \brk\ pages);
test_brk(HOST_PAGE_SIZE, 1);
test_brk(HOST_PAGE_SIZE, 1);
test_result();

/* Preparation for the test Re-allocated heap is initialized.  */
old_brk = current_brk - HOST_PAGE_SIZE;
memset(old_brk, 0xFF, HOST_PAGE_SIZE);

test_title(Don't allocate the same \brk\ page twice);
test_brk(-HOST_PAGE_SIZE, 1);
test_brk(HOST_PAGE_SIZE, 1);
test_result();

test_title(Re-allocated \brk\ pages are initialized);
for (i = 0; i  HOST_PAGE_SIZE; i++) {
if (old_brk[i] != 0) {
printf((index = %d, value = 0x%x) , i, old_brk[i]);
failure = 1;
break;
}
}
test_result();

test_title(Don't allocate \brk\ pages over \mmap\ pages);
new_brk = mmap(current_brk, HOST_PAGE_SIZE / 2, PROT_READ, MAP_PRIVATE 
| MAP_ANONYMOUS | MAP_FIXED, -1, 0);
if (new_brk == (void *) -1)
puts(unknown);
else {
test_brk(HOST_PAGE_SIZE, 0);
test_result();
}

test_title(All \brk\ pages are writable (please wait));
if (munmap(current_brk, HOST_PAGE_SIZE / 2) != 0)
puts(unknown);
else {
while (current_brk - initial_brk  2*1024*1024*1024UL) {
old_brk = current_brk;

test_brk(HOST_PAGE_SIZE, -1);
if (old_brk == current_brk)
break;

for (i = 0; i  HOST_PAGE_SIZE; i++)
old_brk[i] = 0xAA;
}
puts(OK);
}

test_title(Maximum size of the heap  16MB);
failure = (current_brk - initial_brk)  16*1024*1024;
test_result();

exit(exit_status);
}

Changes introduced in patch v2:

* extend the brk test-suite embedded within the commit message;

* heap contents have to be initialized to zero, this bug was
  exposed by tst-calloc.c from the GNU C library;

* don't [try to] allocate a new host page if the new brk is
  equal to the latest allocated host page (brk_page); and

* print some debug information when DEBUGF_BRK is defined.

Signed-off-by: Cédric VINCENT cedric.vinc...@st.com
Reviewed-by: Christophe Guillon christophe.guil...@st.com
Cc: Riku Voipio riku.voi...@iki.fi
---
 linux-user/syscall.c |   37 +
 1 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index b975730..608924a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -709,29 +709,44 @@ char *target_strerror(int err)
 
 static abi_ulong target_brk;
 static abi_ulong target_original_brk;
+static abi_ulong brk_page;
 
 void target_set_brk(abi_ulong new_brk)
 {
 target_original_brk = target_brk = HOST_PAGE_ALIGN(new_brk);
+brk_page = HOST_PAGE_ALIGN(target_brk);
 }
 
+//#define DEBUGF_BRK(message, args...) do { fprintf(stderr, (message), ## 
args); } while (0)
+#define DEBUGF_BRK(message, args...)
+
 /* do_brk() must return target values 

Re: [Qemu-devel] [PATCH v2] target-ppc: Handle memory-forced I/O controller access

2011-06-15 Thread Alexander Graf

On 14.06.2011, at 23:27, Andreas Färber wrote:

 From: Hervé Poussineau hpous...@reactos.org
 
 On at least the PowerPC 601, a direct-store (T=1) with bus unit ID 0x07F
 is special-cased as memory-forced I/O controller access. It is supposed
 to be checked immediately if T=1, bypassing all protection mechanisms
 and acting cache-inhibited and global.

Thanks, applied to ppc-next.


Alex




Re: [Qemu-devel] [PATCH] xen: avoid tracking the region 0xa0000 - 0xbffff

2011-06-15 Thread Alexander Graf

On 14.06.2011, at 17:24, Stefano Stabellini wrote:

 On Tue, 14 Jun 2011, Alexander Graf wrote:
 On 14.06.2011, at 13:48, Stefano Stabellini wrote:
 
 On Tue, 14 Jun 2011, Alexander Graf wrote:
 On 03.06.2011, at 17:56, stefano.stabell...@eu.citrix.com 
 stefano.stabell...@eu.citrix.com wrote:
 
 From: Stefano Stabellini stefano.stabell...@eu.citrix.com
 
 Xen can only do dirty bit tracking for one memory region, so we should
 explicitly avoid trying to track the legacy VGA region between 0xa
 and 0xb, rather than trying and failing.
 
 Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 ---
 xen-all.c |4 
 1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/xen-all.c b/xen-all.c
 index 9a5c3ec..1fdc2e8 100644
 --- a/xen-all.c
 +++ b/xen-all.c
 @@ -218,6 +218,10 @@ static int xen_add_to_physmap(XenIOState *state,
   if (get_physmapping(state, start_addr, size)) {
   return 0;
   }
 +/* do not try to map legacy VGA memory */
 +if (start_addr = 0xa  start_addr + size = 0xb) {
 
 I don't quite like the hardcoded range here. What exactly is the issue? 
 The fact that you can only map a single region? Then do a counter and fail 
 when it's  1. 
 
 That is what we were doing before: succeeding the first time and
 failing from the second time on.
 By coincidence the second time was the range 0xa-0xb so
 everything worked as expected, but it wasn't obvious why.
 I am just trying to make sure that one year from now it will be clear
 just looking at the code why it works.
 
 
 If you don't want to map the VGA region as memory slot, why not change the 
 actual mapping code in the cirrus adapter?
 
 Because I didn't want to introduce any ugly if (xen_enable()) in generic
 code, if it is that simple to catch the issue from xen specific code.
 
 Well sure, but 2 years from now yet another region will be introduced that 
 might even be registered before the FB and everyone's puzzled again :). How 
 about you print a warning when anyone tries to map anything after the first 
 map? Or - as Jan suggests - implement multiple regions.
 
 If you prefer, you could even check for the VGA range as known broken and 
 only print warnings on others.
 
 I can do that (actually we already do it) but it wouldn't change the
 fact that if somebody modifies hw/cirrus_vga.c:map_linear_vram to call
 cpu_register_physical_memory_log for the legacy range first, it would
 break xen, that is the problem I was trying to solve.
 
 In order to make the code more reliable, and also catch the scenario
 where another region is registered before the framebuffer, we could do
 something like this, but it is not very pretty:
 
 
 
 diff --git a/xen-all.c b/xen-all.c
 index 9a5c3ec..de1e724 100644
 --- a/xen-all.c
 +++ b/xen-all.c
 @@ -214,6 +214,7 @@ static int xen_add_to_physmap(XenIOState *state,
 unsigned long i = 0;
 int rc = 0;
 XenPhysmap *physmap = NULL;
 +RAMBlock *block;
 
 if (get_physmapping(state, start_addr, size)) {
 return 0;
 @@ -221,7 +222,16 @@ static int xen_add_to_physmap(XenIOState *state,
 if (size = 0) {
 return -1;
 }
 +/* only add the vga vram to physmap */

Please add a comment here, explaining that Xen can only handle a single dirty 
log region for now, and that we want the linear framebuffer to be that region. 
Also, please resend with proper patch headers and I'll pull it into the 
xen-next tree.


Alex

 +QLIST_FOREACH(block, ram_list.blocks, next) {
 +if (!strcmp(block-idstr, vga.vram)  block-offset == phys_offset
 + start_addr  0xb) {
 +goto go_physmap;
 +}
 +}
 +return -1;
 
 +go_physmap:
 DPRINTF(mapping vram to %llx - %llx, from %llx\n, start_addr, 
 start_addr + size, phys_offset);
 for (i = 0; i  size  TARGET_PAGE_BITS; i++) {
 unsigned long idx = (phys_offset  TARGET_PAGE_BITS) + i;




Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing

2011-06-15 Thread Alexander Graf

On 14.06.2011, at 15:27, Stefano Stabellini wrote:

 On Tue, 14 Jun 2011, Alexander Graf wrote:
 static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
 {
  PCII440FXState *d = opaque;
 @@ -267,8 +263,17 @@ static PCIBus *i440fx_common_init(const char 
 *device_name,
  d = pci_create_simple(b, 0, device_name);
  *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d);
 
 -piix3 = DO_UPCAST(PIIX3State, dev,
 -  pci_create_simple_multifunction(b, -1, true, 
 PIIX3));
 +if (xen_enabled()) {
 +piix3 = DO_UPCAST(PIIX3State, dev,
 +pci_create_simple_multifunction(b, -1, true, 
 PIIX3-xen));
 +pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
 +piix3, XEN_PIIX_NUM_PIRQS);
 
 But with XEN_PIIX_NUM_PIRQS it's not a piix3 anymore, no? What's the 
 reason behind this change?
 
 It is still a piix3, but also provides non-legacy interrupt links to the
 IO-APIC.
 The four pins of each PCI device on the bus not only are routed to the
 normal four pirqs (programmed writing to 0x60-0x63, see above) but also
 they are connected to the IO-APIC directly.
 These additional routes can only be discovered through ACPI, so you need
 matching ACPI tables. We used to build the old ACPI tables like this:
 
 /* PRTA: APIC routing table (via non-legacy IOAPIC GSIs). */
 printf(Name(PRTA, Package() {\n);
 for ( dev = 1; dev  32; dev++ )
  for ( intx = 0; intx  4; intx++ ) /* INTA-D */
  printf(Package(){0x%04x, %u, 0, %u},\n,
 dev, intx, ((dev*4+dev/8+intx)31)+16);
 printf(})\n);
 
 
 Interesting concept, but completely non-standard and very much
 different from real hardware. Please at least add a comment there to
 show readers that Xen is doing a hack which is not at all related to
 how the PIIX really works.
 
 Isn't this more a function of the wires on the motherboard than the
 PIIX specifically? i.e. this just encodes the permutation of the wires
 from the PCI slots into the IO-APIC input pins (bypassing the PIIX,
 which is only used for legacy ISA IRQs i.e. by non-APIC aware OSes)?
 
 Interrupts with PCI work slightly different. PCI devices can map (themselves 
 or by software) to one of 4 interrupt lines: INTA, INTB, INTC, INTD. These 
 get converted using PCI host controller specific logic to 4 interrupt lines 
 which then go into the IO-APIC.
 
 The IO-APIC is a chip with a limited number of pins. IIRC it was 24, could 
 be 26 though.
 
 The number of redirection entries in the IOAPIC can be discovered
 reading from the IOAPICVER register and it is a property of a specific
 model of IOAPIC. As a matter of fact Xen's emulated IOAPIC supports more
 pins than the most popular IOAPIC used with PIIX3.

which means you're emulating hardware that never existed :).

 
 
 I haven't seen a single case where PCI devices have a direct link to the 
 IO-APIC. I also have not seen any PCI host controller that exports more than 
 4 interrupts. Giving each PCI device its own line, on top of that more than 
 ever could be in real hardware, is a plain hack IMHO.
 
 Actually this happens quite often: if I am not mistaken all the GSIs
 higher than 15 are actually the result of a direct connection between
 an interrupt source and the IOAPIC. I have several on my testboxes.

Yes. Interrupt source meaning a wire on the board. I haven't seen any 
situation so far where you get direct IO-APIC connections to PCI _device_ pins. 
You obviously get plenty connections to PCI _bus_ pins.

 Also give a look at the Intel Multiprocessor Specification, section
 3.6.2.3: as you can see from the diagram in Symmetric I/O Mode all the
 interrupts are routed through the IOAPIC directly.
 
 
 Did this really give you actual performance/latency/scalability gains? I 
 still think for devices that matter, we should go with MSI rather than 
 deriving from real hw.
 
 
 Not all the operating systems support MSIs, it is nice to be able to
 avoid interrupt sharing without recurring to MSIs.

Yes and no. It's a tradeoff. If no interrupt sharing means that we emulate 
hardware that simply never could have existed the way we model it, I think it's 
a bad idea.

 Also this is how Xen has been working for more then 5 years in HVM mode,
 so this configuration is well tested and supported by most operating
 systems (at least all the ones we tried so far).

I'm fine with Xen breaking its own neck, as long as it doesn't affect non-Xen 
code paths. Just be aware that I'm not a huge fan of this approach :).

 In any case I think it is a good idea to add a comment to better explain
 what we are doing, see below.
 
 
 
 commit 973bb091a967fdec37a1bc8fe30d46a483d2903d
 Author: Stefano Stabellini stefano.stabell...@eu.citrix.com
 Date:   Tue May 17 12:10:36 2011 +
 
xen: fix interrupt routing
 
- remove i440FX-xen and i440fx_write_config_xen
we don't need to intercept pci config writes to i440FX anymore;
 
- introduce PIIX3-xen and piix3_write_config_xen
   

Re: [Qemu-devel] [PATCH] xen: fix interrupt routing

2011-06-15 Thread Alexander Graf

On 14.06.2011, at 20:18, Stefano Stabellini wrote:

 On Tue, 14 Jun 2011, Jan Kiszka wrote:
 On 2011-06-14 15:27, Stefano Stabellini wrote:
 On Tue, 14 Jun 2011, Alexander Graf wrote:
 static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
 {
  PCII440FXState *d = opaque;
 @@ -267,8 +263,17 @@ static PCIBus *i440fx_common_init(const char 
 *device_name,
  d = pci_create_simple(b, 0, device_name);
  *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d);
 
 -piix3 = DO_UPCAST(PIIX3State, dev,
 -  pci_create_simple_multifunction(b, -1, true, 
 PIIX3));
 +if (xen_enabled()) {
 +piix3 = DO_UPCAST(PIIX3State, dev,
 +pci_create_simple_multifunction(b, -1, true, 
 PIIX3-xen));
 +pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
 +piix3, XEN_PIIX_NUM_PIRQS);
 
 But with XEN_PIIX_NUM_PIRQS it's not a piix3 anymore, no? What's the 
 reason behind this change?
 
 It is still a piix3, but also provides non-legacy interrupt links to the
 IO-APIC.
 The four pins of each PCI device on the bus not only are routed to the
 normal four pirqs (programmed writing to 0x60-0x63, see above) but also
 they are connected to the IO-APIC directly.
 These additional routes can only be discovered through ACPI, so you need
 matching ACPI tables. We used to build the old ACPI tables like this:
 
 /* PRTA: APIC routing table (via non-legacy IOAPIC GSIs). */
 printf(Name(PRTA, Package() {\n);
 for ( dev = 1; dev  32; dev++ )
  for ( intx = 0; intx  4; intx++ ) /* INTA-D */
  printf(Package(){0x%04x, %u, 0, %u},\n,
 dev, intx, ((dev*4+dev/8+intx)31)+16);
 printf(})\n);
 
 
 Interesting concept, but completely non-standard and very much
 different from real hardware. Please at least add a comment there to
 show readers that Xen is doing a hack which is not at all related to
 how the PIIX really works.
 
 Isn't this more a function of the wires on the motherboard than the
 PIIX specifically? i.e. this just encodes the permutation of the wires
 from the PCI slots into the IO-APIC input pins (bypassing the PIIX,
 which is only used for legacy ISA IRQs i.e. by non-APIC aware OSes)?
 
 Interrupts with PCI work slightly different. PCI devices can map 
 (themselves or by software) to one of 4 interrupt lines: INTA, INTB, INTC, 
 INTD. These get converted using PCI host controller specific logic to 4 
 interrupt lines which then go into the IO-APIC.
 
 The IO-APIC is a chip with a limited number of pins. IIRC it was 24, could 
 be 26 though.
 
 The number of redirection entries in the IOAPIC can be discovered
 reading from the IOAPICVER register and it is a property of a specific
 model of IOAPIC. As a matter of fact Xen's emulated IOAPIC supports more
 pins than the most popular IOAPIC used with PIIX3.
 
 Do real IOAPICs exist with more than 24 pins? Otherwise there is the
 risk that OSes aren't well prepared for this oddity - specifically not
 when the chipset is specified to include a 24-pin IOAPIC.
 
 Linux supports up to 128 pins and as I wrote before all the other OSes
 we tested so far seem to react well.
 
 
 I haven't seen a single case where PCI devices have a direct link to the 
 IO-APIC. I also have not seen any PCI host controller that exports more 
 than 4 interrupts. Giving each PCI device its own line, on top of that 
 more than ever could be in real hardware, is a plain hack IMHO.
 
 Actually this happens quite often: if I am not mistaken all the GSIs
 higher than 15 are actually the result of a direct connection between
 an interrupt source and the IOAPIC. I have several on my testboxes.
 
 Except that the interrupt source is the chipset with its PCI bridge, not
 individual PCI devices.
 
 That is the most common configuration but it is not the only one: I have
 an ACPI table that has individual PCI devices as source in some test
 boxes.
 In fact there is even an example of it in this good article about
 interrupt routing from the FreeBSD guys (it is the last figure):
 
 http://people.freebsd.org/~jhb/papers/bsdcan/2007/article/node5.html
 
 Figure 6 contains a portion of an example _PRT. Specifically, it
 includes the first entry in the table. This corresponds to the PCI
 interrupt for PCI bus 3, slot 7
 
 ..ZIP...
 
 For APIC mode, the interrupt is routed to GSI 66.  For this machine,
 ACPI assigns a base GSI of 64 to the I/O APIC with an APIC ID of 10.
 Thus, GSI 66 corresponds to pin 2 on that I/O APIC
 
 Unless I am missing something I don't think that interrupt is going
 through any PCI bridges...

I'm actually not quite sure what exactly he's describing here. But if it's 
bypassing the bus logic, it's not a normal PCI device :). Sure, there are 
special case devices that also expose a PCI interface. But real PCI cards that 
you plug in onto the PCI bus can't bypass the interrupt logic of the bus, as 
the only interrupt wires they have go to the bus. And since the PCI adapters we 
use in PC machines in Qemu are 

Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 8:38 AM, Dmitry Konishchev konishc...@gmail.com wrote:
 On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 Yes, please.

 OK, I'll do it as soon I'll find time for it.


 On Tue, Jun 14, 2011 at 7:58 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 For image files the block layer should be caching the device capacity (size)
 anyway, so you probably don't need to allocate the array, just call
 bdrv_get_geometry().  That might make it easier to write a self-contained
 function.

 I've tried to not cache, but it turned out that bdrv_get_geometry()
 calls are quite noticeable in profiler without caching.

Why is bdrv_get_geometry() slow?

Stefan



Re: [Qemu-devel] [PATCH v2 01/13] spice: Use cpu_register_physical_memory_log for dirty log enabling

2011-06-15 Thread Gerd Hoffmann

On 06/15/11 09:23, Jan Kiszka wrote:

Drop outdated dirty log disable/enable around PCI remapping and register
the BAR for dirty logging via cpu_register_physical_memory_log. That
allows to remove all vga_dirty_log_start/stop references from qxl.


Acked-by: Gerd Hoffmann kra...@redhat.com

cheers,
  Gerd



[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

2011-06-15 Thread moonman
I have to add that it's with kernel 2.6.39 and qemu 0.14.1 compiled from
source

[root@Plugbox ~]# uname -a
Linux Plugbox 2.6.39-ARCH #1 PREEMPT Tue Jun 14 15:55:01 MDT 2011 armv5tel 
Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux
[root@Plugbox ~]# /i386/usr/bin/qemu-i386 -h
qemu-i386 version 0.14.1, Copyright (c) 2003-2008 Fabrice Bellard
usage: qemu-i386 [options] program [arguments...]
Linux CPU emulator (compiled for i386 emulation)
...

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/739785

Title:
  qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

Status in QEMU:
  New

Bug description:
  Good time of day everybody,

  I have been trying to make usermode qemu on ARM with plugapps
  (archlinux) with archlinux i386 chroot to work.

  1. I installed arch linux in a virtuabox and created a chroot for it with 
mkarchroot. Transferred it to my pogo plug into /i386/
  2. I comiled qemu-i386 static and put it into /i386/usr/bin/
  ./configure --static --disable-blobs --disable-system 
--target-list=i386-linux-user
  make

  3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and 
installed it.
  uname -a
  Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel 
Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux

  4. Added the following options into /etc/rc.local
  /sbin/modprobe binfmt_misc
  /bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
  echo 
':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:'
 /proc/sys/fs/binfmt_misc/register

  5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld-
  linux.so.3 is a link to that file) from /lib/ to /i386/lib/

  6.Now i chroot into /i386 and I get this:
  [root@Plugbox i386]# chroot .
  [II aI hnve ao n@P /]# pacman -Suy
  bash: fork: Invalid argument

  7.I also downloaded linux-user-test-0.3 from qemu website and ran the test:
  [root@Plugbox linux-user-test-0.3]# make
  ./qemu-linux-user.sh
  [qemu-i386]
  ../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l 
dummyfile
  BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions: 
Assertion `needed != ((void *)0)' failed!
  make: *** [test] Error 127

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions



[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

2011-06-15 Thread Ricardo Padilha
Same here. The workaround does not work on a DroboFS, since it was
actually the default setting all along.

root@DroboFS:~# cat /proc/cpu/alignment 
User:   74283231
System: 0
Skipped:0
Half:   0
Word:   74283231
DWord:  0
Multi:  0
User faults:2 (fixup)

root@DroboFS:~# uname -a

Linux DroboFS 2.6.22.18 #1 Thu Jan 20 14:29:47 PST 2011 armv5tejl GNU/Linux

Tested qemu 0.14.0, but I can give 0.14.1 a try as well, if necessary.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/739785

Title:
  qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

Status in QEMU:
  New

Bug description:
  Good time of day everybody,

  I have been trying to make usermode qemu on ARM with plugapps
  (archlinux) with archlinux i386 chroot to work.

  1. I installed arch linux in a virtuabox and created a chroot for it with 
mkarchroot. Transferred it to my pogo plug into /i386/
  2. I comiled qemu-i386 static and put it into /i386/usr/bin/
  ./configure --static --disable-blobs --disable-system 
--target-list=i386-linux-user
  make

  3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and 
installed it.
  uname -a
  Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel 
Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux

  4. Added the following options into /etc/rc.local
  /sbin/modprobe binfmt_misc
  /bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
  echo 
':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:'
 /proc/sys/fs/binfmt_misc/register

  5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld-
  linux.so.3 is a link to that file) from /lib/ to /i386/lib/

  6.Now i chroot into /i386 and I get this:
  [root@Plugbox i386]# chroot .
  [II aI hnve ao n@P /]# pacman -Suy
  bash: fork: Invalid argument

  7.I also downloaded linux-user-test-0.3 from qemu website and ran the test:
  [root@Plugbox linux-user-test-0.3]# make
  ./qemu-linux-user.sh
  [qemu-i386]
  ../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l 
dummyfile
  BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions: 
Assertion `needed != ((void *)0)' failed!
  make: *** [test] Error 127

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions



Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Avi Kivity

On 06/14/2011 09:10 AM, Jan Kiszka wrote:

On 2011-06-13 10:45, Avi Kivity wrote:
  On 06/11/2011 12:23 PM, Jan Kiszka wrote:
  From: Jan Kiszkajan.kis...@siemens.com

  These FPU states are properly maintained by KVM but not yet by TCG. So
  far we unconditionally set them to 0 in the guest which may cause
  state corruptions - not only during migration.


  -#define CPU_SAVE_VERSION 12
  +#define CPU_SAVE_VERSION 13


  Incrementing the version number seems excessive - I can't imagine a
  real-life guest will break due to fp pointer corruption

  However, I don't think we have a mechanism for optional state.  We
  discussed this during the 18th VMState Subsection Symposium and IIRC
  agreed to re-raise the issue when we encountered it, which appears to be
  now.


Whatever we invent, it has to be backported as well to allow that
infamous traveling back in time, migrating VMs from newer to older versions.

Would that backporting be simpler if we used an unconditional subsection
for the additional states?


Thinking about it, a conditional subsection would work fine.  Most 
threads will never see an fpu error, and are all initialized to a clean 
slate.


SDM 1 8.1.9.1 says:


8.1.9.1 Fopcode Compatibility Sub-mode
Beginning with the Pentium 4 and Intel Xeon processors, the IA-32 
architecture
provides program control over the storing of the last instruction 
opcode (sometimes
referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR 
enables (set)

or disables (clear) the fopcode compatibility mode.
If FOP code compatibility mode is enabled, the FOP is defined as it 
has always been
in previous IA32 implementations (always defined as the FOP of the 
last non-trans-

parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code
compatibility mode is disabled (default), FOP is only valid if the 
last non-transparent
FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked 
exception.


So fopcode will usually be clear.

--
error compiling committee.c: too many arguments to function




[Qemu-devel] [Bug 739785] Re: qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

2011-06-15 Thread Peter Maydell
OK, that's pretty conclusive. My initial theory about what's going on
here can't be right -- I'll investigate further.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/739785

Title:
  qemu-i386 user mode on ARMv5 host fails (bash: fork: Invalid argument)

Status in QEMU:
  New

Bug description:
  Good time of day everybody,

  I have been trying to make usermode qemu on ARM with plugapps
  (archlinux) with archlinux i386 chroot to work.

  1. I installed arch linux in a virtuabox and created a chroot for it with 
mkarchroot. Transferred it to my pogo plug into /i386/
  2. I comiled qemu-i386 static and put it into /i386/usr/bin/
  ./configure --static --disable-blobs --disable-system 
--target-list=i386-linux-user
  make

  3. I also compiled linux kernel 2.6.38 with CONFIG_BINFMT_MISC=y and 
installed it.
  uname -a
  Linux Plugbox 2.6.38 #4 PREEMPT Fri Mar 18 22:19:10 CDT 2011 armv5tel 
Feroceon 88FR131 rev 1 (v5l) Marvell SheevaPlug Reference Board GNU/Linux

  4. Added the following options into /etc/rc.local
  /sbin/modprobe binfmt_misc
  /bin/mount binfmt_misc -t binfmt_misc /proc/sys/fs/binfmt_misc
  echo 
':qemu-i386:M::\x7fELF\x01\x01\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x02\x00\x03\x00:\xff\xff\xff\xff\xff\xfe\xfe\xff\xff\xff\xff\xff\xff\xff\xff\xff\xfb\xff\xff\xff:/usr/bin/qemu-i386:'
 /proc/sys/fs/binfmt_misc/register

  5. Also copied ld-linux.so.3 (actually ld-2.13.so because ld-
  linux.so.3 is a link to that file) from /lib/ to /i386/lib/

  6.Now i chroot into /i386 and I get this:
  [root@Plugbox i386]# chroot .
  [II aI hnve ao n@P /]# pacman -Suy
  bash: fork: Invalid argument

  7.I also downloaded linux-user-test-0.3 from qemu website and ran the test:
  [root@Plugbox linux-user-test-0.3]# make
  ./qemu-linux-user.sh
  [qemu-i386]
  ../qemu-0.14.0/i386-linux-user/qemu-i386 -L ./gnemul/qemu-i386 i386/ls -l 
dummyfile
  BUG IN DYNAMIC LINKER ld.so: dl-version.c: 210: _dl_check_map_versions: 
Assertion `needed != ((void *)0)' failed!
  make: *** [test] Error 127

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/739785/+subscriptions



Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Dmitry Konishchev
On Wed, Jun 15, 2011 at 12:39 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 Why is bdrv_get_geometry() slow?

Mmm.. Frankly, I haven't looked so deep, but it is going to be slow at
least for raw images due to using lseek().

-- 
Dmitry Konishchev
mailto:konishc...@gmail.com



[Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Dmitry Konishchev
This patch optimizes 'qemu-img convert' operation for volumes which are
almost fully unallocated. Here are the results of simple tests:

We have a snapshot of a volume:
$ qemu-img info snapshot.qcow2
image: snapshot.qcow2
file format: qcow2
virtual size: 5.0G (5372805120 bytes)
disk size: 4.0G
cluster_size: 65536

Create a volume from the snapshot and use it a little:
$ qemu-img create -f qcow2 -o backing_file=snapshot.qcow2 volume.qcow2

For volumes which are almost fully allocated we have a little regression:
$ time qemu-img convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real  2m43.864s
user  0m9.257s
sys   0m40.559s
$ time qemu-img-patched convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real  2m46.899s
user  0m9.749s
sys   0m40.471s

But now create a volume which is almost fully unallocated:
$ qemu-img create -f qcow2 -o backing_file=snapshot.qcow2 volume.qcow2 1T

And now we have more than twice decreased CPU consumption:
$ time qemu-img convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real  6m40.985s
user  4m13.832s
sys   0m33.738s
$ time qemu-img-patched convert -O qcow2 volume.qcow2 volume_snapshot.qcow2
real  4m28.448s
user  1m43.882s
sys   0m33.894s

Signed-off-by: Dmitry Konishchev konishc...@gmail.com
---
 qemu-img.c |  184 ++-
 1 files changed, 143 insertions(+), 41 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4f162d1..7f3d853 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -27,6 +27,7 @@
 #include osdep.h
 #include sysemu.h
 #include block_int.h
+#include block.h
 #include stdio.h
 
 #ifdef _WIN32
@@ -586,19 +587,95 @@ static int compare_sectors(const uint8_t *buf1, const 
uint8_t *buf2, int n,
 return res;
 }
 
+/*
+ * Copies sectors from one image to another.
+ * Writes only non-zero bytes to the output image.
+ *
+ * Returns 0 on success, -1 on error.
+ */
+static int copy_allocated_sectors(
+BlockDriverState *out_bs, BlockDriverState *bs, int n,
+int64_t sector_num, int64_t bs_offset, const uint64_t *bs_geometry, 
uint8_t *buf)
+{
+BlockDriverState *cur_bs;
+uint64_t cur_sectors;
+uint64_t bs_sector;
+int backing_depth;
+int allocated_num;
+int sector_found;
+int cur_n;
+
+while (n  0) {
+/* Look for the sectors in the source image and if they are not
+   allocated - sequentially in all its backing images. */
+
+cur_bs = bs;
+bs_sector = sector_num - bs_offset;
+backing_depth = 0;
+sector_found = 0;
+
+do {
+cur_sectors = bs_geometry[backing_depth++];
+
+if (bs_sector = cur_sectors) {
+continue;
+}
+
+if (bs_sector + n = cur_sectors) {
+cur_n = n;
+} else {
+cur_n = cur_sectors - bs_sector;
+}
+
+if (bdrv_is_allocated(cur_bs, bs_sector, cur_n, allocated_num)) {
+const uint8_t *cur_buf = buf;
+sector_found = 1;
+
+if (bdrv_read(cur_bs, bs_sector, buf, allocated_num)  0) {
+error_report(error while reading);
+return -1;
+}
+
+while (allocated_num  0) {
+if (is_allocated_sectors(cur_buf, allocated_num, cur_n)) {
+if (bdrv_write(out_bs, sector_num, cur_buf, cur_n)  
0) {
+error_report(error while writing);
+return -1;
+}
+}
+
+n -= cur_n;
+sector_num += cur_n;
+allocated_num -= cur_n;
+cur_buf += cur_n * BDRV_SECTOR_SIZE;
+}
+
+break;
+}
+} while(( cur_bs = cur_bs-backing_hd ));
+
+if (!sector_found) {
+sector_num++;
+n--;
+}
+}
+
+return 0;
+}
+
 #define IO_BUF_SIZE (2 * 1024 * 1024)
 
 static int img_convert(int argc, char **argv)
 {
-int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors;
+int c, ret = 0, n, cur_n, bs_n, bs_i, compress, cluster_size, 
cluster_sectors;
 int progress = 0;
 const char *fmt, *out_fmt, *out_baseimg, *out_filename;
 BlockDriver *drv, *proto_drv;
 BlockDriverState **bs = NULL, *out_bs = NULL;
 int64_t total_sectors, nb_sectors, sector_num, bs_offset;
 uint64_t bs_sectors;
+uint64_t *bs_geometry = NULL;
 uint8_t * buf = NULL;
-const uint8_t *buf1;
 BlockDriverInfo bdi;
 QEMUOptionParameter *param = NULL, *create_options = NULL;
 QEMUOptionParameter *out_baseimg_param;
@@ -874,14 +951,20 @@ static int img_convert(int argc, char **argv)
 /* signal EOF to align */
 bdrv_write_compressed(out_bs, 0, NULL, 0);
 } else {
+int bs_i_prev = -1;
+float progress = 100;
+BlockDriverState *cur_bs;
 int 

Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Jan Kiszka
On 2011-06-15 11:10, Avi Kivity wrote:
 On 06/14/2011 09:10 AM, Jan Kiszka wrote:
 On 2011-06-13 10:45, Avi Kivity wrote:
   On 06/11/2011 12:23 PM, Jan Kiszka wrote:
   From: Jan Kiszkajan.kis...@siemens.com
 
   These FPU states are properly maintained by KVM but not yet by
 TCG. So
   far we unconditionally set them to 0 in the guest which may cause
   state corruptions - not only during migration.
 
 
   -#define CPU_SAVE_VERSION 12
   +#define CPU_SAVE_VERSION 13
 
 
   Incrementing the version number seems excessive - I can't imagine a
   real-life guest will break due to fp pointer corruption
 
   However, I don't think we have a mechanism for optional state.  We
   discussed this during the 18th VMState Subsection Symposium and IIRC
   agreed to re-raise the issue when we encountered it, which appears
 to be
   now.
 

 Whatever we invent, it has to be backported as well to allow that
 infamous traveling back in time, migrating VMs from newer to older
 versions.

 Would that backporting be simpler if we used an unconditional subsection
 for the additional states?
 
 Thinking about it, a conditional subsection would work fine.  Most
 threads will never see an fpu error, and are all initialized to a clean
 slate.
 
 SDM 1 8.1.9.1 says:
 
 8.1.9.1 Fopcode Compatibility Sub-mode
 Beginning with the Pentium 4 and Intel Xeon processors, the IA-32
 architecture
 provides program control over the storing of the last instruction
 opcode (sometimes
 referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR
 enables (set)
 or disables (clear) the fopcode compatibility mode.
 If FOP code compatibility mode is enabled, the FOP is defined as it
 has always been
 in previous IA32 implementations (always defined as the FOP of the
 last non-trans-
 parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code
 compatibility mode is disabled (default), FOP is only valid if the
 last non-transparent
 FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked
 exception.
 
 So fopcode will usually be clear.
 

OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But
if it's off, how to test for that other condition last non-transparent
FP instruction ... had an unmasked exception from the host?

Jan



signature.asc
Description: OpenPGP digital signature


[Qemu-devel] [PATCH] Allow nested qemu_bh_poll() after BH deletion

2011-06-15 Thread Kevin Wolf
Without this, qemu segfaults when a BH handler first deletes its BH and
then calls another function which involves a nested qemu_bh_poll() call.

This can be reproduced by generating an I/O error (e.g. with blkdebug) on
an IDE device and using rerror/werror=stop to stop the VM. When continuing
the VM, qemu segfaults.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 async.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/async.c b/async.c
index 57ac3a8..fd313df 100644
--- a/async.c
+++ b/async.c
@@ -137,11 +137,12 @@ QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
 
 int qemu_bh_poll(void)
 {
-QEMUBH *bh, **bhp;
+QEMUBH *bh, **bhp, *next;
 int ret;
 
 ret = 0;
-for (bh = async_context-first_bh; bh; bh = bh-next) {
+for (bh = async_context-first_bh; bh; bh = next) {
+next = bh-next;
 if (!bh-deleted  bh-scheduled) {
 bh-scheduled = 0;
 if (!bh-idle)
-- 
1.7.5.2




Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Jan Kiszka
On 2011-06-15 12:20, Jan Kiszka wrote:
 On 2011-06-15 11:10, Avi Kivity wrote:
 On 06/14/2011 09:10 AM, Jan Kiszka wrote:
 On 2011-06-13 10:45, Avi Kivity wrote:
  On 06/11/2011 12:23 PM, Jan Kiszka wrote:
  From: Jan Kiszkajan.kis...@siemens.com

  These FPU states are properly maintained by KVM but not yet by
 TCG. So
  far we unconditionally set them to 0 in the guest which may cause
  state corruptions - not only during migration.


  -#define CPU_SAVE_VERSION 12
  +#define CPU_SAVE_VERSION 13


  Incrementing the version number seems excessive - I can't imagine a
  real-life guest will break due to fp pointer corruption

  However, I don't think we have a mechanism for optional state.  We
  discussed this during the 18th VMState Subsection Symposium and IIRC
  agreed to re-raise the issue when we encountered it, which appears
 to be
  now.


 Whatever we invent, it has to be backported as well to allow that
 infamous traveling back in time, migrating VMs from newer to older
 versions.

 Would that backporting be simpler if we used an unconditional subsection
 for the additional states?

 Thinking about it, a conditional subsection would work fine.  Most
 threads will never see an fpu error, and are all initialized to a clean
 slate.

 SDM 1 8.1.9.1 says:

 8.1.9.1 Fopcode Compatibility Sub-mode
 Beginning with the Pentium 4 and Intel Xeon processors, the IA-32
 architecture
 provides program control over the storing of the last instruction
 opcode (sometimes
 referred to as the fopcode). Here, bit 2 of the IA32_MISC_ENABLE MSR
 enables (set)
 or disables (clear) the fopcode compatibility mode.
 If FOP code compatibility mode is enabled, the FOP is defined as it
 has always been
 in previous IA32 implementations (always defined as the FOP of the
 last non-trans-
 parent FP instruction executed before a FSAVE/FSTENV/FXSAVE). If FOP code
 compatibility mode is disabled (default), FOP is only valid if the
 last non-transparent
 FP instruction executed before a FSAVE/FSTENV/FXSAVE had an unmasked
 exception.

 So fopcode will usually be clear.

 
 OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But
 if it's off, how to test for that other condition last non-transparent
 FP instruction ... had an unmasked exception from the host?

I briefly thought about status.ES == 1. But the guest may clear the flag
in its exception handler before reading opcode etc.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH 00/13] QED image streaming

2011-06-15 Thread Philipp Hahn
Hello,

hopefully just a quick question...

On Tuesday 14 June 2011 20:18:18 Stefan Hajnoczi wrote:
 This patch series adds image streaming support for QED image files.  Other
 image formats can also be supported in the future.

 Image streaming populates the file in the background while the guest is
 running.  This makes it possible to start the guest before its image file
 has been fully provisioned.
...
 When image streaming is enabled, the unallocated regions of the image file
 are populated with the data from the backing file.  This occurs in the
 background and the guest can perform regular I/O in the meantime.  Once the
 entire backing file has been streamed, the image no longer requires a
 backing file and will drop its reference.

Does this also work for intermediate images, that is
- master image on a NFS share,
- local copy on a local drive using CoR-streaming from master,
- local instance based on local copy using CoW.

We normally have mostly independent servers without a shared storage, expect a 
global NFS share for the master images. Currently we have to copy the master 
image to the local server first, before we than can create lots of instances 
from then, which have few changed relative to the master, so we don't want 
the copying to happen there.

If haven't looked at QED yet, so thanks in advance for your answer.

Sincerely
Philipp Hahn
-- 
Philipp Hahn   Open Source Software Engineer  h...@univention.de
Univention GmbHLinux for Your Businessfon: +49 421 22 232- 0
Mary-Somerville-Str.1  D-28359 Bremen fax: +49 421 22 232-99
   http://www.univention.de/


signature.asc
Description: This is a digitally signed message part.


[Qemu-devel] [RFC] Specifying storage devices for live migration

2011-06-15 Thread Avishay Traeger

Hello all,
Currently there is no way to choose storage devices to be migrated.  There
was some discussion about making the monitor API a bit more flexible, but
the code was never written:
http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html

So today users can choose only:
1) No storage migrated
2) All disks migrated with fully copy
3) All disks migrated with incremental

I'd like to write the code to allow users to choose which disks to migrate,
and how.  What API would you recommend?  What would break the least amount
of utilities?

Thanks,
Avishay




Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Avi Kivity

On 06/15/2011 01:20 PM, Jan Kiszka wrote:


  So fopcode will usually be clear.


OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But
if it's off, how to test for that other condition last non-transparent
FP instruction ... had an unmasked exception from the host?



We save fopcode unconditionally.  But if IA32_MISC_ENABLE_MSR[2]=0, then 
fopcode will be zero, and we can skip the subsection (if the data and 
instruction pointers are also zero, which they will be).


If it isn't zero, there's still a good chance fopcode will be zero 
(64-bit userspace, thread that hasn't used the fpu since the last 
context switch, last opcode happened to be zero).


--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Jan Kiszka
On 2011-06-15 13:26, Avi Kivity wrote:
 On 06/15/2011 01:20 PM, Jan Kiszka wrote:
 
   So fopcode will usually be clear.
 

 OK. So if bit 2 of IA32_MISC_ENABLE MSR, we must save that fields. But
 if it's off, how to test for that other condition last non-transparent
 FP instruction ... had an unmasked exception from the host?

 
 We save fopcode unconditionally.  But if IA32_MISC_ENABLE_MSR[2]=0, then
 fopcode will be zero, and we can skip the subsection (if the data and
 instruction pointers are also zero, which they will be).
 
 If it isn't zero, there's still a good chance fopcode will be zero
 (64-bit userspace, thread that hasn't used the fpu since the last
 context switch, last opcode happened to be zero).

I do not yet find if fopcode is invalid, it is zero, just as IP and DP
in the spec. What clears them reliably?

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Avi Kivity

On 06/15/2011 02:32 PM, Jan Kiszka wrote:


  If it isn't zero, there's still a good chance fopcode will be zero
  (64-bit userspace, thread that hasn't used the fpu since the last
  context switch, last opcode happened to be zero).

I do not yet find if fopcode is invalid, it is zero, just as IP and DP
in the spec. What clears them reliably?


FNINIT

--
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [PATCH][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Jan Kiszka
On 2011-06-15 13:33, Avi Kivity wrote:
 On 06/15/2011 02:32 PM, Jan Kiszka wrote:

  If it isn't zero, there's still a good chance fopcode will be zero
  (64-bit userspace, thread that hasn't used the fpu since the last
  context switch, last opcode happened to be zero).

 I do not yet find if fopcode is invalid, it is zero, just as IP and DP
 in the spec. What clears them reliably?
 
 FNINIT

OK, I see. So we simply check for all fields being zero and skip the
section in that case. The MSR doesn't actually to us here.

Will write v2.

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux



Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 10:50 AM, Dmitry Konishchev
konishc...@gmail.com wrote:
 On Wed, Jun 15, 2011 at 12:39 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 Why is bdrv_get_geometry() slow?

 Mmm.. Frankly, I haven't looked so deep, but it is going to be slow at
 least for raw images due to using lseek().

We need to fully understand performance before applying optimizations
on top.  Otherwise it is possible to paper over a problem while
leaving the root cause unsolved.  Avoiding lseek(2) is very important,
not just for qemu-img but also for running VMs.  lseek(2) should only
be invoked if the image is growable/removable.

When I run a VM from a virtio-blk raw image I see no lseek(2) calls.

On the host:
strace -p $pid_of_qemu -f

Inside the guest:
dd if=/dev/vda of=/dev/null iflag=direct

I see pread(2) from the posix-aio-compat.c worker threads but no
lseek(2).  The total_sectors cached value is being used.

Does strace(1) show lseek(2) on your host?

Stefan



Re: [Qemu-devel] [PATCH] Allow nested qemu_bh_poll() after BH deletion

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 11:33 AM, Kevin Wolf kw...@redhat.com wrote:
 Without this, qemu segfaults when a BH handler first deletes its BH and
 then calls another function which involves a nested qemu_bh_poll() call.

 This can be reproduced by generating an I/O error (e.g. with blkdebug) on
 an IDE device and using rerror/werror=stop to stop the VM. When continuing
 the VM, qemu segfaults.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  async.c |    5 +++--
  1 files changed, 3 insertions(+), 2 deletions(-)

Almost worth switching to qemu-queue.h and FOREACH_SAFE().

Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com



Re: [Qemu-devel] [PATCH 00/13] QED image streaming

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 11:46 AM, Philipp Hahn h...@univention.de wrote:
 On Tuesday 14 June 2011 20:18:18 Stefan Hajnoczi wrote:
 This patch series adds image streaming support for QED image files.  Other
 image formats can also be supported in the future.

 Image streaming populates the file in the background while the guest is
 running.  This makes it possible to start the guest before its image file
 has been fully provisioned.
 ...
 When image streaming is enabled, the unallocated regions of the image file
 are populated with the data from the backing file.  This occurs in the
 background and the guest can perform regular I/O in the meantime.  Once the
 entire backing file has been streamed, the image no longer requires a
 backing file and will drop its reference.

 Does this also work for intermediate images, that is
 - master image on a NFS share,
 - local copy on a local drive using CoR-streaming from master,
 - local instance based on local copy using CoW.

 We normally have mostly independent servers without a shared storage, expect a
 global NFS share for the master images. Currently we have to copy the master
 image to the local server first, before we than can create lots of instances
 from then, which have few changed relative to the master, so we don't want
 the copying to happen there.

No, unfortunately adding another level of backing files prevents use
of streaming today.  The reason is because backing files are opened
read-only.  This ensures that multiple VMs running concurrently will
not change the backing file.  Image formats are typically not safe for
concurrent updates from multiple applications just like regular file
systems cannot be accessed from multiple hosts simultaneously.  You'd
need a cluster file system for that, and similarly we need multiple
writers support for image formats or some other solution.

We'd need a solution that makes it possible to update the local copy
of the master image while multiple VMs are running.  For example
qemu-nbd serving the local copy of the master image and each VM image
using the backing file over NBD.  That way all accesses to the local
copy of the master image go through qemu-nbd and streaming can be
performed safely.  I haven't tested this approach, it might require
some improvements to qemu-nbd and I expect the performance would need
work too.

Thanks for raising this use case, I think it's an interesting one
because you are saving disk space by keeping a local master copy.

Stefan



[Qemu-devel] [PATCH v2 3/3] vdi: Avoid direct AIO callback

2011-06-15 Thread Kevin Wolf
bdrv_aio_* must not call the callback before returning to its caller. In vdi,
this could happen in some error cases. This starts the real requests processing
in a BH to avoid this situation.

Signed-off-by: Kevin Wolf kw...@redhat.com
---

v2:
- Remove direct vdi_aio_read_cb() call

 block/vdi.c |   41 -
 1 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 4c9e201..261cf9b 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -152,6 +152,7 @@ typedef struct {
 /* Buffer for new allocated block. */
 void *block_buffer;
 void *orig_buf;
+bool is_write;
 int header_modified;
 BlockDriverAIOCB *hd_aiocb;
 struct iovec hd_iov;
@@ -504,6 +505,8 @@ static VdiAIOCB *vdi_aio_setup(BlockDriverState *bs, 
int64_t sector_num,
 acb-hd_aiocb = NULL;
 acb-sector_num = sector_num;
 acb-qiov = qiov;
+acb-is_write = is_write;
+
 if (qiov-niov  1) {
 acb-buf = qemu_blockalign(bs, qiov-size);
 acb-orig_buf = acb-buf;
@@ -542,14 +545,20 @@ static int vdi_schedule_bh(QEMUBHFunc *cb, VdiAIOCB *acb)
 }
 
 static void vdi_aio_read_cb(void *opaque, int ret);
+static void vdi_aio_write_cb(void *opaque, int ret);
 
-static void vdi_aio_read_bh(void *opaque)
+static void vdi_aio_rw_bh(void *opaque)
 {
 VdiAIOCB *acb = opaque;
 logout(\n);
 qemu_bh_delete(acb-bh);
 acb-bh = NULL;
-vdi_aio_read_cb(opaque, 0);
+
+if (acb-is_write) {
+vdi_aio_write_cb(opaque, 0);
+} else {
+vdi_aio_read_cb(opaque, 0);
+}
 }
 
 static void vdi_aio_read_cb(void *opaque, int ret)
@@ -597,7 +606,7 @@ static void vdi_aio_read_cb(void *opaque, int ret)
 if (bmap_entry == VDI_UNALLOCATED) {
 /* Block not allocated, return zeros, no need to wait. */
 memset(acb-buf, 0, n_sectors * SECTOR_SIZE);
-ret = vdi_schedule_bh(vdi_aio_read_bh, acb);
+ret = vdi_schedule_bh(vdi_aio_rw_bh, acb);
 if (ret  0) {
 goto done;
 }
@@ -630,12 +639,23 @@ static BlockDriverAIOCB *vdi_aio_readv(BlockDriverState 
*bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
 VdiAIOCB *acb;
+int ret;
+
 logout(\n);
 acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
 if (!acb) {
 return NULL;
 }
-vdi_aio_read_cb(acb, 0);
+
+ret = vdi_schedule_bh(vdi_aio_rw_bh, acb);
+if (ret  0) {
+if (acb-qiov-niov  1) {
+qemu_vfree(acb-orig_buf);
+}
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
@@ -789,12 +809,23 @@ static BlockDriverAIOCB *vdi_aio_writev(BlockDriverState 
*bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
 VdiAIOCB *acb;
+int ret;
+
 logout(\n);
 acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
 if (!acb) {
 return NULL;
 }
-vdi_aio_write_cb(acb, 0);
+
+ret = vdi_schedule_bh(vdi_aio_rw_bh, acb);
+if (ret  0) {
+if (acb-qiov-niov  1) {
+qemu_vfree(acb-orig_buf);
+}
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
-- 
1.7.5.2




Re: [Qemu-devel] [RFC] Specifying storage devices for live migration

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 12:06 PM, Avishay Traeger avis...@il.ibm.com wrote:

 Hello all,
 Currently there is no way to choose storage devices to be migrated.  There
 was some discussion about making the monitor API a bit more flexible, but
 the code was never written:
 http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html

Hmm...2009.  Have you seen this recent thread?

http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg00733.html

Are you planning to extend the block migration API and fix known
issues with block migration?

Stefan



Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Dmitry Konishchev
On Wed, Jun 15, 2011 at 4:02 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 We need to fully understand performance before applying optimizations
 on top.  Otherwise it is possible to paper over a problem while
 leaving the root cause unsolved.  Avoiding lseek(2) is very important,
 not just for qemu-img but also for running VMs.  lseek(2) should only
 be invoked if the image is growable/removable.

 When I run a VM from a virtio-blk raw image I see no lseek(2) calls.

 On the host:
 strace -p $pid_of_qemu -f

 Inside the guest:
 dd if=/dev/vda of=/dev/null iflag=direct

 I see pread(2) from the posix-aio-compat.c worker threads but no
 lseek(2).  The total_sectors cached value is being used.

 Does strace(1) show lseek(2) on your host?

No, I don't see lseek() in the strace output. But in the other hand I
see that bdrv_get_geometry() uses bdrv_getlength() which is
implemented using lseek() in block/raw-posix.c...
May be I'am mistaken about lseek(), but I get 9% slower version if
disable caching.

-- 
Дмитрий Конищев (Dmitry Konishchev)
mailto:konishc...@gmail.com



[Qemu-devel] [PATCH v2][uq/master] kvm: x86: Save/restore FPU OP, IP and DP

2011-06-15 Thread Jan Kiszka
These FPU states are properly maintained by KVM but not yet by TCG. So
far we unconditionally set them to 0 in the guest which may cause
state corruptions, though not with modern guests.

To avoid breaking backward migration, use a conditional subsection that
is only written if any of the three fields is non-zero. The guest's
FNINIT clears them frequently, and cleared IA32_MISC_ENABLE MSR[2]
reduces the probability of non-zero values further so that this
subsection is not expected to restrict migration in any common scenario.

Signed-off-by: Jan Kiszka jan.kis...@siemens.com
---

Changes in v2:
 - use conditional subsection

 target-i386/cpu.h |4 
 target-i386/kvm.c |   20 +++-
 target-i386/machine.c |   23 +++
 3 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 9c3340d..cdf68ff 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -641,6 +641,10 @@ typedef struct CPUX86State {
 uint16_t fpuc;
 uint8_t fptags[8];   /* 0 = valid, 1 = empty */
 FPReg fpregs[8];
+/* KVM-only so far */
+uint16_t fpop;
+uint64_t fpip;
+uint64_t fpdp;
 
 /* emulator internal variables */
 float_status fp_status;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index 5ebb054..938e0a3 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -718,6 +718,9 @@ static int kvm_put_fpu(CPUState *env)
 fpu.fsw = env-fpus  ~(7  11);
 fpu.fsw |= (env-fpstt  7)  11;
 fpu.fcw = env-fpuc;
+fpu.last_opcode = env-fpop;
+fpu.last_ip = env-fpip;
+fpu.last_dp = env-fpdp;
 for (i = 0; i  8; ++i) {
 fpu.ftwx |= (!env-fptags[i])  i;
 }
@@ -740,7 +743,7 @@ static int kvm_put_xsave(CPUState *env)
 {
 int i, r;
 struct kvm_xsave* xsave;
-uint16_t cwd, swd, twd, fop;
+uint16_t cwd, swd, twd;
 
 if (!kvm_has_xsave()) {
 return kvm_put_fpu(env);
@@ -748,7 +751,7 @@ static int kvm_put_xsave(CPUState *env)
 
 xsave = qemu_memalign(4096, sizeof(struct kvm_xsave));
 memset(xsave, 0, sizeof(struct kvm_xsave));
-cwd = swd = twd = fop = 0;
+cwd = swd = twd = 0;
 swd = env-fpus  ~(7  11);
 swd |= (env-fpstt  7)  11;
 cwd = env-fpuc;
@@ -756,7 +759,9 @@ static int kvm_put_xsave(CPUState *env)
 twd |= (!env-fptags[i])  i;
 }
 xsave-region[0] = (uint32_t)(swd  16) + cwd;
-xsave-region[1] = (uint32_t)(fop  16) + twd;
+xsave-region[1] = (uint32_t)(env-fpop  16) + twd;
+memcpy(xsave-region[XSAVE_CWD_RIP], env-fpip, sizeof(env-fpip));
+memcpy(xsave-region[XSAVE_CWD_RDP], env-fpdp, sizeof(env-fpdp));
 memcpy(xsave-region[XSAVE_ST_SPACE], env-fpregs,
 sizeof env-fpregs);
 memcpy(xsave-region[XSAVE_XMM_SPACE], env-xmm_regs,
@@ -921,6 +926,9 @@ static int kvm_get_fpu(CPUState *env)
 env-fpstt = (fpu.fsw  11)  7;
 env-fpus = fpu.fsw;
 env-fpuc = fpu.fcw;
+env-fpop = fpu.last_opcode;
+env-fpip = fpu.last_ip;
+env-fpdp = fpu.last_dp;
 for (i = 0; i  8; ++i) {
 env-fptags[i] = !((fpu.ftwx  i)  1);
 }
@@ -935,7 +943,7 @@ static int kvm_get_xsave(CPUState *env)
 {
 struct kvm_xsave* xsave;
 int ret, i;
-uint16_t cwd, swd, twd, fop;
+uint16_t cwd, swd, twd;
 
 if (!kvm_has_xsave()) {
 return kvm_get_fpu(env);
@@ -951,13 +959,15 @@ static int kvm_get_xsave(CPUState *env)
 cwd = (uint16_t)xsave-region[0];
 swd = (uint16_t)(xsave-region[0]  16);
 twd = (uint16_t)xsave-region[1];
-fop = (uint16_t)(xsave-region[1]  16);
+env-fpop = (uint16_t)(xsave-region[1]  16);
 env-fpstt = (swd  11)  7;
 env-fpus = swd;
 env-fpuc = cwd;
 for (i = 0; i  8; ++i) {
 env-fptags[i] = !((twd  i)  1);
 }
+memcpy(env-fpip, xsave-region[XSAVE_CWD_RIP], sizeof(env-fpip));
+memcpy(env-fpdp, xsave-region[XSAVE_CWD_RDP], sizeof(env-fpdp));
 env-mxcsr = xsave-region[XSAVE_MXCSR];
 memcpy(env-fpregs, xsave-region[XSAVE_ST_SPACE],
 sizeof env-fpregs);
diff --git a/target-i386/machine.c b/target-i386/machine.c
index bbeae88..d22a731 100644
--- a/target-i386/machine.c
+++ b/target-i386/machine.c
@@ -290,6 +290,26 @@ static const VMStateDescription vmstate_async_pf_msr = {
 }
 };
 
+static bool fpop_ip_dp_needed(void *opaque)
+{
+CPUState *env = opaque;
+
+return env-fpop != 0 || env-fpip != 0 || env-fpdp != 0;
+}
+
+static const VMStateDescription vmstate_fpop_ip_dp = {
+.name = cpu/fpop_ip_dp,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields  = (VMStateField []) {
+VMSTATE_UINT16_V(fpop, CPUState, 13),
+VMSTATE_UINT64_V(fpip, CPUState, 13),
+VMSTATE_UINT64_V(fpdp, CPUState, 13),
+VMSTATE_END_OF_LIST()
+}
+};
+
 static const VMStateDescription vmstate_cpu = {
 .name = cpu,
 .version_id = CPU_SAVE_VERSION,
@@ -398,6 +418,9 @@ static const VMStateDescription vmstate_cpu 

Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Stefan Hajnoczi
On Wed, Jun 15, 2011 at 2:14 PM, Dmitry Konishchev konishc...@gmail.com wrote:
 On Wed, Jun 15, 2011 at 4:02 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 We need to fully understand performance before applying optimizations
 on top.  Otherwise it is possible to paper over a problem while
 leaving the root cause unsolved.  Avoiding lseek(2) is very important,
 not just for qemu-img but also for running VMs.  lseek(2) should only
 be invoked if the image is growable/removable.

 When I run a VM from a virtio-blk raw image I see no lseek(2) calls.

 On the host:
 strace -p $pid_of_qemu -f

 Inside the guest:
 dd if=/dev/vda of=/dev/null iflag=direct

 I see pread(2) from the posix-aio-compat.c worker threads but no
 lseek(2).  The total_sectors cached value is being used.

 Does strace(1) show lseek(2) on your host?

 No, I don't see lseek() in the strace output. But in the other hand I
 see that bdrv_get_geometry() uses bdrv_getlength() which is
 implemented using lseek() in block/raw-posix.c...
 May be I'am mistaken about lseek(), but I get 9% slower version if
 disable caching.

disable caching?

Stefan



Re: [Qemu-devel] [PATCH] error framework: Fix compilation for w32/w64

2011-06-15 Thread Luiz Capitulino
On Mon, 13 Jun 2011 23:01:53 +0200
Stefan Weil w...@mail.berlios.de wrote:

 The declaration of function error_set() should use macro GCC_FMT_ATTR
 instead of gcc's format printf attribute.
 
 For w32/w64, both declarations are different and GCC_FMT_ATTR is needed.
 Compilation for w64 even failed with the original code because mingw64
 defines a macro for printf.
 
 GCC_FMT_ATTR requires qemu-common.h, so add it in error.c
 (it's also included by error_int.h but too late).
 
 Remove assert.h which is included by qemu-common.h.
 
 Cc: Luiz Capitulino lcapitul...@redhat.com
 Cc: Anthony Liguori aligu...@us.ibm.com
 Signed-off-by: Stefan Weil w...@mail.berlios.de

Applied to the monitor queue, thanks.

 ---
  error.c |3 ++-
  error.h |3 +--
  2 files changed, 3 insertions(+), 3 deletions(-)
 
 diff --git a/error.c b/error.c
 index 867eec2..74d7398 100644
 --- a/error.c
 +++ b/error.c
 @@ -9,11 +9,12 @@
   * This work is licensed under the terms of the GNU LGPL, version 2.  See
   * the COPYING.LIB file in the top-level directory.
   */
 +
 +#include qemu-common.h
  #include error.h
  #include error_int.h
  #include qemu-objects.h
  #include qerror.h
 -#include assert.h
  
  struct Error
  {
 diff --git a/error.h b/error.h
 index 003c855..0f92a6f 100644
 --- a/error.h
 +++ b/error.h
 @@ -25,8 +25,7 @@ typedef struct Error Error;
   * Currently, qerror.h defines these error formats.  This function is not
   * meant to be used outside of QEMU.
   */
 -void error_set(Error **err, const char *fmt, ...)
 -__attribute__((format(printf, 2, 3)));
 +void error_set(Error **err, const char *fmt, ...) GCC_FMT_ATTR(2, 3);
  
  /**
   * Returns true if an indirect pointer to an error is pointing to a valid




[Qemu-devel] [PATCH v2] ide: Clear error_status after restarting flush

2011-06-15 Thread Kevin Wolf
Clearing the error status flag was missing for restarting flushes. Now that the
error status is separate from the BM status register, we can simply set it to 0
after restarting the request. This ensures that we never forget to clear a bit.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 hw/ide/pci.c |   18 +++---
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index c940375..9f3050a 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -189,6 +189,7 @@ static void bmdma_restart_bh(void *opaque)
 BMDMAState *bm = opaque;
 IDEBus *bus = bm-bus;
 int is_read;
+int error_status;
 
 qemu_bh_delete(bm-bh);
 bm-bh = NULL;
@@ -199,22 +200,25 @@ static void bmdma_restart_bh(void *opaque)
 
 is_read = !!(bus-error_status  BM_STATUS_RETRY_READ);
 
-if (bus-error_status  BM_STATUS_DMA_RETRY) {
-if (bus-error_status  BM_STATUS_RETRY_TRIM) {
-bus-error_status = ~BM_STATUS_RETRY_TRIM;
+/* The error status must be cleared before resubmitting the request: The
+ * request may fail again, and this case can only be distinguished if the
+ * called function can set a new error status. */
+error_status = bus-error_status;
+bus-error_status = 0;
+
+if (error_status  BM_STATUS_DMA_RETRY) {
+if (error_status  BM_STATUS_RETRY_TRIM) {
 bmdma_restart_dma(bm, IDE_DMA_TRIM);
 } else {
-bus-error_status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ);
 bmdma_restart_dma(bm, is_read ? IDE_DMA_READ : IDE_DMA_WRITE);
 }
-} else if (bus-error_status  BM_STATUS_PIO_RETRY) {
-bus-error_status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ);
+} else if (error_status  BM_STATUS_PIO_RETRY) {
 if (is_read) {
 ide_sector_read(bmdma_active_if(bm));
 } else {
 ide_sector_write(bmdma_active_if(bm));
 }
-} else if (bus-error_status  BM_STATUS_RETRY_FLUSH) {
+} else if (error_status  BM_STATUS_RETRY_FLUSH) {
 ide_flush_cache(bmdma_active_if(bm));
 }
 }
-- 
1.7.5.2




[Qemu-devel] [PATCH] qemu-img: Add cache command line option

2011-06-15 Thread Federico Simoncelli
qemu-img currently writes disk images using writeback and filling
up the cache buffers which are then flushed by the kernel preventing
other processes from accessing the storage.
This is particularly bad in cluster environments where time-based
algorithms might be in place and accessing the storage within
certain timeouts is critical.
This patch adds the option to choose a cache method when writing
disk images.

Signed-off-by: Federico Simoncelli fsimo...@redhat.com
---
 qemu-img.c |   81 ++-
 1 files changed, 68 insertions(+), 13 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4f162d1..7609db5 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -40,6 +40,7 @@ typedef struct img_cmd_t {
 
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. 
*/
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
+#define BDRV_DEFAULT_CACHE writeback
 
 static void format_print(void *opaque, const char *name)
 {
@@ -64,6 +65,8 @@ static void help(void)
Command parameters:\n
  'filename' is a disk image filename\n
  'fmt' is the disk image format. It is guessed automatically in 
most cases\n
+ 'cache' is the cache mode used to write the output disk image, 
the valid\n
+   options are: 'none', 'writeback' (default), 'writethrough' and 
'unsafe'\n
  'size' is the disk image size in bytes. Optional suffixes\n
'k' or 'K' (kilobyte, 1024), 'M' (megabyte, 1024k), 'G' 
(gigabyte, 1024M)\n
and T (terabyte, 1024G) are supported. 'b' is ignored.\n
@@ -180,6 +183,27 @@ static int read_password(char *buf, int buf_size)
 }
 #endif
 
+static int set_cache_flag(const char *mode, int *flags)
+{
+*flags = ~BDRV_O_CACHE_MASK;
+
+if (!strcmp(mode, none) || !strcmp(mode, off)) {
+*flags |= BDRV_O_CACHE_WB;
+*flags |= BDRV_O_NOCACHE;
+} else if (!strcmp(mode, writeback)) {
+*flags |= BDRV_O_CACHE_WB;
+} else if (!strcmp(mode, unsafe)) {
+*flags |= BDRV_O_CACHE_WB;
+*flags |= BDRV_O_NO_FLUSH;
+} else if (!strcmp(mode, writethrough)) {
+/* this is the default */
+} else {
+return -1;
+}
+
+return 0;
+}
+
 static int print_block_option_help(const char *filename, const char *fmt)
 {
 BlockDriver *drv, *proto_drv;
@@ -441,13 +465,14 @@ static int img_check(int argc, char **argv)
 
 static int img_commit(int argc, char **argv)
 {
-int c, ret;
-const char *filename, *fmt;
+int c, ret, flags;
+const char *filename, *fmt, *cache;
 BlockDriverState *bs;
 
 fmt = NULL;
+cache = BDRV_DEFAULT_CACHE;
 for(;;) {
-c = getopt(argc, argv, f:h);
+c = getopt(argc, argv, f:ht:);
 if (c == -1) {
 break;
 }
@@ -459,6 +484,9 @@ static int img_commit(int argc, char **argv)
 case 'f':
 fmt = optarg;
 break;
+case 't':
+cache = optarg;
+break;
 }
 }
 if (optind = argc) {
@@ -466,7 +494,14 @@ static int img_commit(int argc, char **argv)
 }
 filename = argv[optind++];
 
-bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR);
+flags = BDRV_O_RDWR;
+ret = set_cache_flag(cache, flags);
+if (ret  0) {
+error_report(Invalid cache option: %s\n, cache);
+return -1;
+}
+
+bs = bdrv_new_open(filename, fmt, flags);
 if (!bs) {
 return 1;
 }
@@ -591,8 +626,8 @@ static int compare_sectors(const uint8_t *buf1, const 
uint8_t *buf2, int n,
 static int img_convert(int argc, char **argv)
 {
 int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors;
-int progress = 0;
-const char *fmt, *out_fmt, *out_baseimg, *out_filename;
+int progress = 0, flags;
+const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
 BlockDriver *drv, *proto_drv;
 BlockDriverState **bs = NULL, *out_bs = NULL;
 int64_t total_sectors, nb_sectors, sector_num, bs_offset;
@@ -608,10 +643,11 @@ static int img_convert(int argc, char **argv)
 
 fmt = NULL;
 out_fmt = raw;
+cache = BDRV_DEFAULT_CACHE;
 out_baseimg = NULL;
 compress = 0;
 for(;;) {
-c = getopt(argc, argv, f:O:B:s:hce6o:p);
+c = getopt(argc, argv, f:O:B:s:hce6o:pt:);
 if (c == -1) {
 break;
 }
@@ -649,6 +685,9 @@ static int img_convert(int argc, char **argv)
 case 'p':
 progress = 1;
 break;
+case 't':
+cache = optarg;
+break;
 }
 }
 
@@ -779,8 +818,14 @@ static int img_convert(int argc, char **argv)
 goto out;
 }
 
-out_bs = bdrv_new_open(out_filename, out_fmt,
-BDRV_O_FLAGS | BDRV_O_RDWR | BDRV_O_NO_FLUSH);
+flags = BDRV_O_RDWR;
+ret = set_cache_flag(cache, flags);
+if (ret  0) {
+error_report(Invalid cache option: %s\n, cache);
+

Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Dmitry Konishchev
On Wed, Jun 15, 2011 at 5:33 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 disable caching?

Image geometry caching. I meant If I call bdrv_get_geometry() every
time I need image geometry instead of obtaining it from bs_geometry
variable.

-- 
Дмитрий Конищев (Dmitry Konishchev)
mailto:konishc...@gmail.com



Re: [Qemu-devel] [PATCH] CPU consumption optimization of 'qemu-img convert' using bdrv_is_allocated()

2011-06-15 Thread Stefan Hajnoczi
2011/6/15 Dmitry Konishchev konishc...@gmail.com:
 On Wed, Jun 15, 2011 at 5:33 PM, Stefan Hajnoczi stefa...@gmail.com wrote:
 disable caching?

 Image geometry caching. I meant If I call bdrv_get_geometry() every
 time I need image geometry instead of obtaining it from bs_geometry
 variable.

Haha, sorry.  Too much caching: -drive cache=none, total_sectors
cached value, bdrv_get_geometry() cached values :).

Anyway, bdrv_getlength() will return the total_sectors value instead
of calling into raw-posix.c .bdrv_getlength().  That's why it should
be cheap.

Stefan



[Qemu-devel] [PATCHv2] qemu-img: Add cache command line option

2011-06-15 Thread Federico Simoncelli
qemu-img currently writes disk images using writeback and filling
up the cache buffers which are then flushed by the kernel preventing
other processes from accessing the storage.
This is particularly bad in cluster environments where time-based
algorithms might be in place and accessing the storage within
certain timeouts is critical.
This patch adds the option to choose a cache method when writing
disk images.

Signed-off-by: Federico Simoncelli fsimo...@redhat.com
---
 qemu-img.c |   80 ++-
 1 files changed, 67 insertions(+), 13 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index 4f162d1..c2a106e 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -40,6 +40,7 @@ typedef struct img_cmd_t {
 
 /* Default to cache=writeback as data integrity is not important for qemu-tcg. 
*/
 #define BDRV_O_FLAGS BDRV_O_CACHE_WB
+#define BDRV_DEFAULT_CACHE writeback
 
 static void format_print(void *opaque, const char *name)
 {
@@ -64,6 +65,8 @@ static void help(void)
Command parameters:\n
  'filename' is a disk image filename\n
  'fmt' is the disk image format. It is guessed automatically in 
most cases\n
+ 'cache' is the cache mode used to write the output disk image, 
the valid\n
+   options are: 'none', 'writeback' (default), 'writethrough' and 
'unsafe'\n
  'size' is the disk image size in bytes. Optional suffixes\n
'k' or 'K' (kilobyte, 1024), 'M' (megabyte, 1024k), 'G' 
(gigabyte, 1024M)\n
and T (terabyte, 1024G) are supported. 'b' is ignored.\n
@@ -180,6 +183,27 @@ static int read_password(char *buf, int buf_size)
 }
 #endif
 
+static int set_cache_flag(const char *mode, int *flags)
+{
+*flags = ~BDRV_O_CACHE_MASK;
+
+if (!strcmp(mode, none) || !strcmp(mode, off)) {
+*flags |= BDRV_O_CACHE_WB;
+*flags |= BDRV_O_NOCACHE;
+} else if (!strcmp(mode, writeback)) {
+*flags |= BDRV_O_CACHE_WB;
+} else if (!strcmp(mode, unsafe)) {
+*flags |= BDRV_O_CACHE_WB;
+*flags |= BDRV_O_NO_FLUSH;
+} else if (!strcmp(mode, writethrough)) {
+/* this is the default */
+} else {
+return -1;
+}
+
+return 0;
+}
+
 static int print_block_option_help(const char *filename, const char *fmt)
 {
 BlockDriver *drv, *proto_drv;
@@ -441,13 +465,14 @@ static int img_check(int argc, char **argv)
 
 static int img_commit(int argc, char **argv)
 {
-int c, ret;
-const char *filename, *fmt;
+int c, ret, flags;
+const char *filename, *fmt, *cache;
 BlockDriverState *bs;
 
 fmt = NULL;
+cache = BDRV_DEFAULT_CACHE;
 for(;;) {
-c = getopt(argc, argv, f:h);
+c = getopt(argc, argv, f:ht:);
 if (c == -1) {
 break;
 }
@@ -459,6 +484,9 @@ static int img_commit(int argc, char **argv)
 case 'f':
 fmt = optarg;
 break;
+case 't':
+cache = optarg;
+break;
 }
 }
 if (optind = argc) {
@@ -466,7 +494,14 @@ static int img_commit(int argc, char **argv)
 }
 filename = argv[optind++];
 
-bs = bdrv_new_open(filename, fmt, BDRV_O_FLAGS | BDRV_O_RDWR);
+flags = BDRV_O_RDWR;
+ret = set_cache_flag(cache, flags);
+if (ret  0) {
+error_report(Invalid cache option: %s\n, cache);
+return -1;
+}
+
+bs = bdrv_new_open(filename, fmt, flags);
 if (!bs) {
 return 1;
 }
@@ -591,8 +626,8 @@ static int compare_sectors(const uint8_t *buf1, const 
uint8_t *buf2, int n,
 static int img_convert(int argc, char **argv)
 {
 int c, ret = 0, n, n1, bs_n, bs_i, compress, cluster_size, cluster_sectors;
-int progress = 0;
-const char *fmt, *out_fmt, *out_baseimg, *out_filename;
+int progress = 0, flags;
+const char *fmt, *out_fmt, *cache, *out_baseimg, *out_filename;
 BlockDriver *drv, *proto_drv;
 BlockDriverState **bs = NULL, *out_bs = NULL;
 int64_t total_sectors, nb_sectors, sector_num, bs_offset;
@@ -608,10 +643,11 @@ static int img_convert(int argc, char **argv)
 
 fmt = NULL;
 out_fmt = raw;
+cache = BDRV_DEFAULT_CACHE;
 out_baseimg = NULL;
 compress = 0;
 for(;;) {
-c = getopt(argc, argv, f:O:B:s:hce6o:p);
+c = getopt(argc, argv, f:O:B:s:hce6o:pt:);
 if (c == -1) {
 break;
 }
@@ -649,6 +685,9 @@ static int img_convert(int argc, char **argv)
 case 'p':
 progress = 1;
 break;
+case 't':
+cache = optarg;
+break;
 }
 }
 
@@ -779,8 +818,14 @@ static int img_convert(int argc, char **argv)
 goto out;
 }
 
-out_bs = bdrv_new_open(out_filename, out_fmt,
-BDRV_O_FLAGS | BDRV_O_RDWR | BDRV_O_NO_FLUSH);
+flags = BDRV_O_RDWR;
+ret = set_cache_flag(cache, flags);
+if (ret  0) {
+error_report(Invalid cache option: %s\n, cache);
+

[Qemu-devel] [PATCH 04/14] vdi: Avoid direct AIO callback

2011-06-15 Thread Kevin Wolf
bdrv_aio_* must not call the callback before returning to its caller. In vdi,
this could happen in some error cases. This starts the real requests processing
in a BH to avoid this situation.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/vdi.c |   41 -
 1 files changed, 36 insertions(+), 5 deletions(-)

diff --git a/block/vdi.c b/block/vdi.c
index 4c9e201..261cf9b 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -152,6 +152,7 @@ typedef struct {
 /* Buffer for new allocated block. */
 void *block_buffer;
 void *orig_buf;
+bool is_write;
 int header_modified;
 BlockDriverAIOCB *hd_aiocb;
 struct iovec hd_iov;
@@ -504,6 +505,8 @@ static VdiAIOCB *vdi_aio_setup(BlockDriverState *bs, 
int64_t sector_num,
 acb-hd_aiocb = NULL;
 acb-sector_num = sector_num;
 acb-qiov = qiov;
+acb-is_write = is_write;
+
 if (qiov-niov  1) {
 acb-buf = qemu_blockalign(bs, qiov-size);
 acb-orig_buf = acb-buf;
@@ -542,14 +545,20 @@ static int vdi_schedule_bh(QEMUBHFunc *cb, VdiAIOCB *acb)
 }
 
 static void vdi_aio_read_cb(void *opaque, int ret);
+static void vdi_aio_write_cb(void *opaque, int ret);
 
-static void vdi_aio_read_bh(void *opaque)
+static void vdi_aio_rw_bh(void *opaque)
 {
 VdiAIOCB *acb = opaque;
 logout(\n);
 qemu_bh_delete(acb-bh);
 acb-bh = NULL;
-vdi_aio_read_cb(opaque, 0);
+
+if (acb-is_write) {
+vdi_aio_write_cb(opaque, 0);
+} else {
+vdi_aio_read_cb(opaque, 0);
+}
 }
 
 static void vdi_aio_read_cb(void *opaque, int ret)
@@ -597,7 +606,7 @@ static void vdi_aio_read_cb(void *opaque, int ret)
 if (bmap_entry == VDI_UNALLOCATED) {
 /* Block not allocated, return zeros, no need to wait. */
 memset(acb-buf, 0, n_sectors * SECTOR_SIZE);
-ret = vdi_schedule_bh(vdi_aio_read_bh, acb);
+ret = vdi_schedule_bh(vdi_aio_rw_bh, acb);
 if (ret  0) {
 goto done;
 }
@@ -630,12 +639,23 @@ static BlockDriverAIOCB *vdi_aio_readv(BlockDriverState 
*bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
 VdiAIOCB *acb;
+int ret;
+
 logout(\n);
 acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
 if (!acb) {
 return NULL;
 }
-vdi_aio_read_cb(acb, 0);
+
+ret = vdi_schedule_bh(vdi_aio_rw_bh, acb);
+if (ret  0) {
+if (acb-qiov-niov  1) {
+qemu_vfree(acb-orig_buf);
+}
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
@@ -789,12 +809,23 @@ static BlockDriverAIOCB *vdi_aio_writev(BlockDriverState 
*bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
 VdiAIOCB *acb;
+int ret;
+
 logout(\n);
 acb = vdi_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 1);
 if (!acb) {
 return NULL;
 }
-vdi_aio_write_cb(acb, 0);
+
+ret = vdi_schedule_bh(vdi_aio_rw_bh, acb);
+if (ret  0) {
+if (acb-qiov-niov  1) {
+qemu_vfree(acb-orig_buf);
+}
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
-- 
1.7.5.2




[Qemu-devel] [PATCH 09/14] ide: Add forgotten VMSTATE_END_OF_LIST in subsection

2011-06-15 Thread Kevin Wolf
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 hw/ide/core.c |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index e5def8b..399b74c 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1864,6 +1864,7 @@ const VMStateDescription vmstate_ide_atapi_gesn_state = {
 .fields = (VMStateField []) {
 VMSTATE_BOOL(events.new_media, IDEState),
 VMSTATE_BOOL(events.eject_request, IDEState),
+VMSTATE_END_OF_LIST()
 }
 };
 
-- 
1.7.5.2




[Qemu-devel] [PATCH 06/14] qcow2: Fix in-flight list after qcow2_cache_put failure

2011-06-15 Thread Kevin Wolf
If qcow2_cache_put returns an error during cluster allocation and the
allocation fails, it must be removed from the list of in-flight allocations.
Otherwise we'd get a loop in the list when the ACB is used for the next
allocation.

Luckily, this qcow2_cache_put shouldn't fail anyway because the L2 table is
only read, so that qcow2_cache_put doesn't even involve I/O.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Christoph Hellwig h...@lst.de
---
 block/qcow2-cluster.c |   12 
 1 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/block/qcow2-cluster.c b/block/qcow2-cluster.c
index c9e7bbd..882f50a 100644
--- a/block/qcow2-cluster.c
+++ b/block/qcow2-cluster.c
@@ -796,8 +796,8 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 m-depends_on = old_alloc;
 m-nb_clusters = 0;
 *num = 0;
-ret = 0;
-goto fail;
+
+goto out_wait_dependency;
 }
 }
 }
@@ -812,7 +812,6 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 
 cluster_offset = qcow2_alloc_clusters(bs, nb_clusters * s-cluster_size);
 if (cluster_offset  0) {
-QLIST_REMOVE(m, next_in_flight);
 ret = cluster_offset;
 goto fail;
 }
@@ -825,7 +824,7 @@ int qcow2_alloc_cluster_offset(BlockDriverState *bs, 
uint64_t offset,
 out:
 ret = qcow2_cache_put(bs, s-l2_table_cache, (void**) l2_table);
 if (ret  0) {
-return ret;
+goto fail_put;
 }
 
 m-nb_available = MIN(nb_clusters  (s-cluster_bits - 9), n_end);
@@ -835,8 +834,13 @@ out:
 
 return 0;
 
+out_wait_dependency:
+return qcow2_cache_put(bs, s-l2_table_cache, (void**) l2_table);
+
 fail:
 qcow2_cache_put(bs, s-l2_table_cache, (void**) l2_table);
+fail_put:
+QLIST_REMOVE(m, next_in_flight);
 return ret;
 }
 
-- 
1.7.5.2




[Qemu-devel] [PATCH 01/14] block/rbd: Remove unused local variable

2011-06-15 Thread Kevin Wolf
From: Stefan Weil w...@mail.berlios.de

Variable 'snap' is assigned a value that is never used.
Remove snap and the related code.

Cc: Christian Brunner c...@muc.de
Cc: Josh Durgin josh.dur...@dreamhost.com
Cc: Kevin Wolf kw...@redhat.com
Signed-off-by: Stefan Weil w...@mail.berlios.de
Reviewed-by: Josh Durgin josh.dur...@dreamhost.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/rbd.c |4 
 1 files changed, 0 insertions(+), 4 deletions(-)

diff --git a/block/rbd.c b/block/rbd.c
index bdc448a..d5659cd 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -227,7 +227,6 @@ static int qemu_rbd_create(const char *filename, 
QEMUOptionParameter *options)
 char name[RBD_MAX_IMAGE_NAME_SIZE];
 char snap_buf[RBD_MAX_SNAP_NAME_SIZE];
 char conf[RBD_MAX_CONF_SIZE];
-char *snap = NULL;
 rados_t cluster;
 rados_ioctx_t io_ctx;
 int ret;
@@ -238,9 +237,6 @@ static int qemu_rbd_create(const char *filename, 
QEMUOptionParameter *options)
conf, sizeof(conf))  0) {
 return -EINVAL;
 }
-if (snap_buf[0] != '\0') {
-snap = snap_buf;
-}
 
 /* Read out options */
 while (options  options-name) {
-- 
1.7.5.2




[Qemu-devel] [PATCH 02/14] qcow2: Avoid direct AIO callback

2011-06-15 Thread Kevin Wolf
bdrv_aio_* must not call the callback before returning to its caller. In qcow2,
this could happen in some error cases. This starts the real requests processing
in a BH to avoid this situation.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/qcow2.c |   39 ++-
 1 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 8451ded..2c51e7c 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -378,6 +378,7 @@ typedef struct QCowAIOCB {
 uint64_t bytes_done;
 uint64_t cluster_offset;
 uint8_t *cluster_data;
+bool is_write;
 BlockDriverAIOCB *hd_aiocb;
 QEMUIOVector hd_qiov;
 QEMUBH *bh;
@@ -399,12 +400,19 @@ static AIOPool qcow2_aio_pool = {
 };
 
 static void qcow2_aio_read_cb(void *opaque, int ret);
-static void qcow2_aio_read_bh(void *opaque)
+static void qcow2_aio_write_cb(void *opaque, int ret);
+
+static void qcow2_aio_rw_bh(void *opaque)
 {
 QCowAIOCB *acb = opaque;
 qemu_bh_delete(acb-bh);
 acb-bh = NULL;
-qcow2_aio_read_cb(opaque, 0);
+
+if (acb-is_write) {
+qcow2_aio_write_cb(opaque, 0);
+} else {
+qcow2_aio_read_cb(opaque, 0);
+}
 }
 
 static int qcow2_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
@@ -493,14 +501,14 @@ static void qcow2_aio_read_cb(void *opaque, int ret)
 goto done;
 }
 } else {
-ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
+ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb);
 if (ret  0)
 goto done;
 }
 } else {
 /* Note: in this case, no need to wait */
 qemu_iovec_memset(acb-hd_qiov, 0, 512 * acb-cur_nr_sectors);
-ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
+ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb);
 if (ret  0)
 goto done;
 }
@@ -515,7 +523,7 @@ static void qcow2_aio_read_cb(void *opaque, int ret)
 s-cluster_cache + index_in_cluster * 512,
 512 * acb-cur_nr_sectors);
 
-ret = qcow2_schedule_bh(qcow2_aio_read_bh, acb);
+ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb);
 if (ret  0)
 goto done;
 } else {
@@ -572,6 +580,7 @@ static QCowAIOCB *qcow2_aio_setup(BlockDriverState *bs, 
int64_t sector_num,
 acb-hd_aiocb = NULL;
 acb-sector_num = sector_num;
 acb-qiov = qiov;
+acb-is_write = is_write;
 
 qemu_iovec_init(acb-hd_qiov, qiov-niov);
 
@@ -591,17 +600,22 @@ static BlockDriverAIOCB *qcow2_aio_readv(BlockDriverState 
*bs,
  void *opaque)
 {
 QCowAIOCB *acb;
+int ret;
 
 acb = qcow2_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
 if (!acb)
 return NULL;
 
-qcow2_aio_read_cb(acb, 0);
+ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb);
+if (ret  0) {
+qemu_iovec_destroy(acb-hd_qiov);
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
-static void qcow2_aio_write_cb(void *opaque, int ret);
-
 static void run_dependent_requests(QCowL2Meta *m)
 {
 QCowAIOCB *req;
@@ -724,6 +738,7 @@ static BlockDriverAIOCB *qcow2_aio_writev(BlockDriverState 
*bs,
 {
 BDRVQcowState *s = bs-opaque;
 QCowAIOCB *acb;
+int ret;
 
 s-cluster_cache_offset = -1; /* disable compressed cache */
 
@@ -731,7 +746,13 @@ static BlockDriverAIOCB *qcow2_aio_writev(BlockDriverState 
*bs,
 if (!acb)
 return NULL;
 
-qcow2_aio_write_cb(acb, 0);
+ret = qcow2_schedule_bh(qcow2_aio_rw_bh, acb);
+if (ret  0) {
+qemu_iovec_destroy(acb-hd_qiov);
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
-- 
1.7.5.2




[Qemu-devel] [PATCH 03/14] qcow: Avoid direct AIO callback

2011-06-15 Thread Kevin Wolf
bdrv_aio_* must not call the callback before returning to its caller. In qcow,
this could happen in some error cases. This starts the real requests processing
in a BH to avoid this situation.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block/qcow.c |   58 --
 1 files changed, 56 insertions(+), 2 deletions(-)

diff --git a/block/qcow.c b/block/qcow.c
index a26c886..227b104 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -496,6 +496,8 @@ typedef struct QCowAIOCB {
 uint64_t cluster_offset;
 uint8_t *cluster_data;
 struct iovec hd_iov;
+bool is_write;
+QEMUBH *bh;
 QEMUIOVector hd_qiov;
 BlockDriverAIOCB *hd_aiocb;
 } QCowAIOCB;
@@ -525,6 +527,8 @@ static QCowAIOCB *qcow_aio_setup(BlockDriverState *bs,
 acb-hd_aiocb = NULL;
 acb-sector_num = sector_num;
 acb-qiov = qiov;
+acb-is_write = is_write;
+
 if (qiov-niov  1) {
 acb-buf = acb-orig_buf = qemu_blockalign(bs, qiov-size);
 if (is_write)
@@ -538,6 +542,38 @@ static QCowAIOCB *qcow_aio_setup(BlockDriverState *bs,
 return acb;
 }
 
+static void qcow_aio_read_cb(void *opaque, int ret);
+static void qcow_aio_write_cb(void *opaque, int ret);
+
+static void qcow_aio_rw_bh(void *opaque)
+{
+QCowAIOCB *acb = opaque;
+qemu_bh_delete(acb-bh);
+acb-bh = NULL;
+
+if (acb-is_write) {
+qcow_aio_write_cb(opaque, 0);
+} else {
+qcow_aio_read_cb(opaque, 0);
+}
+}
+
+static int qcow_schedule_bh(QEMUBHFunc *cb, QCowAIOCB *acb)
+{
+if (acb-bh) {
+return -EIO;
+}
+
+acb-bh = qemu_bh_new(cb, acb);
+if (!acb-bh) {
+return -EIO;
+}
+
+qemu_bh_schedule(acb-bh);
+
+return 0;
+}
+
 static void qcow_aio_read_cb(void *opaque, int ret)
 {
 QCowAIOCB *acb = opaque;
@@ -640,12 +676,21 @@ static BlockDriverAIOCB *qcow_aio_readv(BlockDriverState 
*bs,
 BlockDriverCompletionFunc *cb, void *opaque)
 {
 QCowAIOCB *acb;
+int ret;
 
 acb = qcow_aio_setup(bs, sector_num, qiov, nb_sectors, cb, opaque, 0);
 if (!acb)
 return NULL;
 
-qcow_aio_read_cb(acb, 0);
+ret = qcow_schedule_bh(qcow_aio_rw_bh, acb);
+if (ret  0) {
+if (acb-qiov-niov  1) {
+qemu_vfree(acb-orig_buf);
+}
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
@@ -725,6 +770,7 @@ static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState 
*bs,
 {
 BDRVQcowState *s = bs-opaque;
 QCowAIOCB *acb;
+int ret;
 
 s-cluster_cache_offset = -1; /* disable compressed cache */
 
@@ -733,7 +779,15 @@ static BlockDriverAIOCB *qcow_aio_writev(BlockDriverState 
*bs,
 return NULL;
 
 
-qcow_aio_write_cb(acb, 0);
+ret = qcow_schedule_bh(qcow_aio_rw_bh, acb);
+if (ret  0) {
+if (acb-qiov-niov  1) {
+qemu_vfree(acb-orig_buf);
+}
+qemu_aio_release(acb);
+return NULL;
+}
+
 return acb-common;
 }
 
-- 
1.7.5.2




[Qemu-devel] [PATCH 05/14] Replaced tabs with spaces in block.h and block_int.h

2011-06-15 Thread Kevin Wolf
From: Devin Nakamura devin...@gmail.com

Signed-off-by: Devin Nakamura devin...@gmail.com
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 block.h |6 +++---
 block_int.h |4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block.h b/block.h
index da7d39c..859d1d9 100644
--- a/block.h
+++ b/block.h
@@ -110,7 +110,7 @@ int bdrv_check(BlockDriverState *bs, BdrvCheckResult *res);
 typedef struct BlockDriverAIOCB BlockDriverAIOCB;
 typedef void BlockDriverCompletionFunc(void *opaque, int ret);
 typedef void BlockDriverDirtyHandler(BlockDriverState *bs, int64_t sector,
-int sector_num);
+ int sector_num);
 BlockDriverAIOCB *bdrv_aio_readv(BlockDriverState *bs, int64_t sector_num,
  QEMUIOVector *iov, int nb_sectors,
  BlockDriverCompletionFunc *cb, void *opaque);
@@ -118,7 +118,7 @@ BlockDriverAIOCB *bdrv_aio_writev(BlockDriverState *bs, 
int64_t sector_num,
   QEMUIOVector *iov, int nb_sectors,
   BlockDriverCompletionFunc *cb, void *opaque);
 BlockDriverAIOCB *bdrv_aio_flush(BlockDriverState *bs,
-BlockDriverCompletionFunc *cb, void *opaque);
+ BlockDriverCompletionFunc *cb, void *opaque);
 void bdrv_aio_cancel(BlockDriverAIOCB *acb);
 
 typedef struct BlockRequest {
@@ -150,7 +150,7 @@ void bdrv_close_all(void);
 int bdrv_discard(BlockDriverState *bs, int64_t sector_num, int nb_sectors);
 int bdrv_has_zero_init(BlockDriverState *bs);
 int bdrv_is_allocated(BlockDriverState *bs, int64_t sector_num, int nb_sectors,
-   int *pnum);
+  int *pnum);
 
 #define BIOS_ATA_TRANSLATION_AUTO   0
 #define BIOS_ATA_TRANSLATION_NONE   1
diff --git a/block_int.h b/block_int.h
index fa91337..1e265d2 100644
--- a/block_int.h
+++ b/block_int.h
@@ -203,8 +203,8 @@ struct BlockDriverState {
 void *private;
 };
 
-#define CHANGE_MEDIA   0x01
-#define CHANGE_SIZE0x02
+#define CHANGE_MEDIA0x01
+#define CHANGE_SIZE 0x02
 
 struct BlockDriverAIOCB {
 AIOPool *pool;
-- 
1.7.5.2




[Qemu-devel] [PULL 00/14] Block patches

2011-06-15 Thread Kevin Wolf
The following changes since commit 0b862cedf36d927818c50584ddd611b0370673df:

  configure: Detect and don't try to use older libcurl (2011-06-13 21:16:27 
+0200)

are available in the git repository at:
  git://repo.or.cz/qemu/kevin.git for-anthony

Christoph Hellwig (3):
  make dma_bdrv_io available to drivers
  ide: allow other dma comands than read and write
  ide: add TRIM support

Devin Nakamura (1):
  Replaced tabs with spaces in block.h and block_int.h

Kevin Wolf (9):
  qcow2: Avoid direct AIO callback
  qcow: Avoid direct AIO callback
  vdi: Avoid direct AIO callback
  qcow2: Fix in-flight list after qcow2_cache_put failure
  ide: Split error status from status register
  ide: Fix ide_drive_pio_state_needed()
  ide: Add forgotten VMSTATE_END_OF_LIST in subsection
  ide: Clear error_status after restarting flush
  Allow nested qemu_bh_poll() after BH deletion

Stefan Weil (1):
  block/rbd: Remove unused local variable

 async.c   |5 +-
 block.h   |6 +-
 block/qcow.c  |   58 ++-
 block/qcow2-cluster.c |   12 +++-
 block/qcow2.c |   39 ++---
 block/rbd.c   |4 -
 block/vdi.c   |   41 +++--
 block_int.h   |4 +-
 dma-helpers.c |   23 +++
 dma.h |8 +++
 hw/ide/core.c |  154 -
 hw/ide/internal.h |   32 ++-
 hw/ide/macio.c|   13 -
 hw/ide/pci.c  |   88 +---
 hw/ide/pci.h  |4 +
 hw/ide/qdev.c |5 ++
 16 files changed, 423 insertions(+), 73 deletions(-)



[Qemu-devel] [PATCH 08/14] ide: Fix ide_drive_pio_state_needed()

2011-06-15 Thread Kevin Wolf
When a failed PIO request caused the VM to stop, we still need to transfer the
PIO state even though DRQ=0 at this point.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 hw/ide/core.c |3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index da250ac..e5def8b 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -1837,7 +1837,8 @@ static bool ide_drive_pio_state_needed(void *opaque)
 {
 IDEState *s = opaque;
 
-return (s-status  DRQ_STAT) != 0;
+return ((s-status  DRQ_STAT) != 0)
+|| (s-bus-error_status  BM_STATUS_PIO_RETRY);
 }
 
 static bool ide_atapi_gesn_needed(void *opaque)
-- 
1.7.5.2




[Qemu-devel] [PATCH 11/14] ide: allow other dma comands than read and write

2011-06-15 Thread Kevin Wolf
From: Christoph Hellwig h...@lst.de

Replace the is_read flag with a dma_cmd flag to allow the dma and
restart logic to handler other commands like TRIM.

Signed-off-by: Christoph Hellwig h...@lst.de
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 hw/ide/core.c |   25 ++---
 hw/ide/internal.h |   10 +-
 hw/ide/macio.c|9 +++--
 hw/ide/pci.c  |6 +++---
 4 files changed, 33 insertions(+), 17 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 399b74c..14bda82 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -472,7 +472,7 @@ handle_rw_error:
 if (ret  0) {
 int op = BM_STATUS_DMA_RETRY;
 
-if (s-is_read)
+if (s-dma_cmd == IDE_DMA_READ)
 op |= BM_STATUS_RETRY_READ;
 if (ide_handle_rw_error(s, -ret, op)) {
 return;
@@ -482,7 +482,7 @@ handle_rw_error:
 n = s-io_buffer_size  9;
 sector_num = ide_get_sector(s);
 if (n  0) {
-dma_buf_commit(s, s-is_read);
+dma_buf_commit(s, ide_cmd_is_read(s));
 sector_num += n;
 ide_set_sector(s, sector_num);
 s-nsector -= n;
@@ -499,23 +499,26 @@ handle_rw_error:
 n = s-nsector;
 s-io_buffer_index = 0;
 s-io_buffer_size = n * 512;
-if (s-bus-dma-ops-prepare_buf(s-bus-dma, s-is_read) == 0) {
+if (s-bus-dma-ops-prepare_buf(s-bus-dma, ide_cmd_is_read(s)) == 0) {
 /* The PRDs were too short. Reset the Active bit, but don't raise an
  * interrupt. */
 goto eot;
 }
 
 #ifdef DEBUG_AIO
-printf(ide_dma_cb: sector_num=% PRId64  n=%d, is_read=%d\n,
-   sector_num, n, s-is_read);
+printf(ide_dma_cb: sector_num=% PRId64  n=%d, cmd_cmd=%d\n,
+   sector_num, n, s-dma_cmd);
 #endif
 
-if (s-is_read) {
+switch (s-dma_cmd) {
+case IDE_DMA_READ:
 s-bus-dma-aiocb = dma_bdrv_read(s-bs, s-sg, sector_num,
ide_dma_cb, s);
-} else {
+break;
+case IDE_DMA_WRITE:
 s-bus-dma-aiocb = dma_bdrv_write(s-bs, s-sg, sector_num,
 ide_dma_cb, s);
+break;
 }
 
 if (!s-bus-dma-aiocb) {
@@ -528,12 +531,12 @@ eot:
ide_set_inactive(s);
 }
 
-static void ide_sector_start_dma(IDEState *s, int is_read)
+static void ide_sector_start_dma(IDEState *s, enum ide_dma_cmd dma_cmd)
 {
 s-status = READY_STAT | SEEK_STAT | DRQ_STAT | BUSY_STAT;
 s-io_buffer_index = 0;
 s-io_buffer_size = 0;
-s-is_read = is_read;
+s-dma_cmd = dma_cmd;
 s-bus-dma-ops-start_dma(s-bus-dma, s, ide_dma_cb);
 }
 
@@ -916,7 +919,7 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
 if (!s-bs)
 goto abort_cmd;
ide_cmd_lba48_transform(s, lba48);
-ide_sector_start_dma(s, 1);
+ide_sector_start_dma(s, IDE_DMA_READ);
 break;
case WIN_WRITEDMA_EXT:
lba48 = 1;
@@ -925,7 +928,7 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
 if (!s-bs)
 goto abort_cmd;
ide_cmd_lba48_transform(s, lba48);
-ide_sector_start_dma(s, 0);
+ide_sector_start_dma(s, IDE_DMA_WRITE);
 s-media_changed = 1;
 break;
 case WIN_READ_NATIVE_MAX_EXT:
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index 8d18cc3..ea3edf5 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -379,6 +379,14 @@ struct unreported_events {
 bool new_media;
 };
 
+enum ide_dma_cmd {
+IDE_DMA_READ,
+IDE_DMA_WRITE,
+};
+
+#define ide_cmd_is_read(s) \
+   ((s)-dma_cmd == IDE_DMA_READ)
+
 /* NOTE: IDEState represents in fact one drive */
 struct IDEState {
 IDEBus *bus;
@@ -446,7 +454,7 @@ struct IDEState {
 uint32_t mdata_size;
 uint8_t *mdata_storage;
 int media_changed;
-int is_read;
+enum ide_dma_cmd dma_cmd;
 /* SMART */
 uint8_t smart_enabled;
 uint8_t smart_autosave;
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index 7107f6b..099b8cb 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -145,12 +145,17 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
 io-addr += io-len;
 io-len = 0;
 
-if (s-is_read)
+switch (s-dma_cmd) {
+case IDE_DMA_READ:
 m-aiocb = dma_bdrv_read(s-bs, s-sg, sector_num,
 pmac_ide_transfer_cb, io);
-else
+break;
+case IDE_DMA_WRITE:
 m-aiocb = dma_bdrv_write(s-bs, s-sg, sector_num,
  pmac_ide_transfer_cb, io);
+break;
+}
+
 if (!m-aiocb)
 pmac_ide_transfer_cb(io, -1);
 }
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 7fa32bd..80c5794 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -169,7 +169,7 @@ static int bmdma_set_inactive(IDEDMA *dma)
 return 0;
 }
 
-static void bmdma_restart_dma(BMDMAState *bm, int is_read)
+static void bmdma_restart_dma(BMDMAState *bm, enum ide_dma_cmd dma_cmd)
 {
 IDEState *s = bmdma_active_if(bm);
 
@@ -177,7 +177,7 @@ 

[Qemu-devel] [PATCH 07/14] ide: Split error status from status register

2011-06-15 Thread Kevin Wolf
When adding the werror=stop mode, some flags were added to s-status
which are used to determine what kind of operation should be restarted
when the VM is continued.

Unfortunately, it turns out that s-status is in fact a device register
and as such is visible to the guest (some of the abused bits are even
writable for the guest).

For migration we keep on using the old VMState field (renamed to
migration_compat_status) if the status register doesn't use any of the
previously abused bits. If it does, we use a subsection with a clean copy of
the status register.

The error status is always sent in a subsection if there is any error. It can't
use the old field because errors happen even without PCI.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 hw/ide/core.c |   28 +++-
 hw/ide/internal.h |8 ++
 hw/ide/pci.c  |   73 +++-
 hw/ide/pci.h  |4 +++
 4 files changed, 105 insertions(+), 8 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 95beb17..da250ac 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -446,7 +446,7 @@ static int ide_handle_rw_error(IDEState *s, int error, int 
op)
 if ((error == ENOSPC  action == BLOCK_ERR_STOP_ENOSPC)
 || action == BLOCK_ERR_STOP_ANY) {
 s-bus-dma-ops-set_unit(s-bus-dma, s-unit);
-s-bus-dma-ops-add_status(s-bus-dma, op);
+s-bus-error_status = op;
 bdrv_mon_event(s-bs, BDRV_ACTION_STOP, is_read);
 vm_stop(VMSTOP_DISKFULL);
 } else {
@@ -1847,6 +1847,13 @@ static bool ide_atapi_gesn_needed(void *opaque)
 return s-events.new_media || s-events.eject_request;
 }
 
+static bool ide_error_needed(void *opaque)
+{
+IDEBus *bus = opaque;
+
+return (bus-error_status != 0);
+}
+
 /* Fields for GET_EVENT_STATUS_NOTIFICATION ATAPI command */
 const VMStateDescription vmstate_ide_atapi_gesn_state = {
 .name =ide_drive/atapi/gesn_state,
@@ -1921,6 +1928,17 @@ const VMStateDescription vmstate_ide_drive = {
 }
 };
 
+const VMStateDescription vmstate_ide_error_status = {
+.name =ide_bus/error,
+.version_id = 1,
+.minimum_version_id = 1,
+.minimum_version_id_old = 1,
+.fields = (VMStateField []) {
+VMSTATE_INT32(error_status, IDEBus),
+VMSTATE_END_OF_LIST()
+}
+};
+
 const VMStateDescription vmstate_ide_bus = {
 .name = ide_bus,
 .version_id = 1,
@@ -1930,6 +1948,14 @@ const VMStateDescription vmstate_ide_bus = {
 VMSTATE_UINT8(cmd, IDEBus),
 VMSTATE_UINT8(unit, IDEBus),
 VMSTATE_END_OF_LIST()
+},
+.subsections = (VMStateSubsection []) {
+{
+.vmsd = vmstate_ide_error_status,
+.needed = ide_error_needed,
+}, {
+/* empty */
+}
 }
 };
 
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index c2b35ec..8d18cc3 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -486,6 +486,8 @@ struct IDEBus {
 uint8_t unit;
 uint8_t cmd;
 qemu_irq irq;
+
+int error_status;
 };
 
 struct IDEDevice {
@@ -505,11 +507,17 @@ struct IDEDeviceInfo {
 #define BM_STATUS_DMAING 0x01
 #define BM_STATUS_ERROR  0x02
 #define BM_STATUS_INT0x04
+
+/* FIXME These are not status register bits */
 #define BM_STATUS_DMA_RETRY  0x08
 #define BM_STATUS_PIO_RETRY  0x10
 #define BM_STATUS_RETRY_READ  0x20
 #define BM_STATUS_RETRY_FLUSH 0x40
 
+#define BM_MIGRATION_COMPAT_STATUS_BITS \
+(BM_STATUS_DMA_RETRY | BM_STATUS_PIO_RETRY | \
+BM_STATUS_RETRY_READ | BM_STATUS_RETRY_FLUSH)
+
 #define BM_CMD_START 0x01
 #define BM_CMD_READ  0x08
 
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index a4726ad..7fa32bd 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -183,27 +183,33 @@ static void bmdma_restart_dma(BMDMAState *bm, int is_read)
 bmdma_start_dma(bm-dma, s, bm-dma_cb);
 }
 
+/* TODO This should be common IDE code */
 static void bmdma_restart_bh(void *opaque)
 {
 BMDMAState *bm = opaque;
+IDEBus *bus = bm-bus;
 int is_read;
 
 qemu_bh_delete(bm-bh);
 bm-bh = NULL;
 
-is_read = !!(bm-status  BM_STATUS_RETRY_READ);
+if (bm-unit == (uint8_t) -1) {
+return;
+}
 
-if (bm-status  BM_STATUS_DMA_RETRY) {
-bm-status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ);
+is_read = !!(bus-error_status  BM_STATUS_RETRY_READ);
+
+if (bus-error_status  BM_STATUS_DMA_RETRY) {
+bus-error_status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ);
 bmdma_restart_dma(bm, is_read);
-} else if (bm-status  BM_STATUS_PIO_RETRY) {
-bm-status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ);
+} else if (bus-error_status  BM_STATUS_PIO_RETRY) {
+bus-error_status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ);
 if (is_read) {
 ide_sector_read(bmdma_active_if(bm));
 } else {
 ide_sector_write(bmdma_active_if(bm));
 }
-} else if (bm-status  BM_STATUS_RETRY_FLUSH) 

Re: [Qemu-devel] [PATCH] Fix signal handling of SIG_IPI when io-thread is enabled

2011-06-15 Thread Alexandre Raymond
Hi Jan,

 Why? Ahh, because of qemu_cpu_kick_self: raise(SIG_IPI)! That should
 generate a per-process SIG_IPI. And that may not only affect Darwin.
 Looks good.

Actually, with io-thread enabled, it goes through qemu_cpu_kick_self()
- qemu_cpu_kick_thread() - pthread_kill(..., SIG_IPI).

I think the problem is with sigwait(). It doesn't state so in the
Linux or Darwin man pages, but on Solaris, it says : All signals
identified by the set argument must be blocked on all threads,
including the calling thread; otherwise, sigwait() might not work
correctly, which might correspond to the issue I've been witnessing
(ie: sigwait() unblocking once in a while on a SIGUSR1 (SIG_IPI) in
the event thread).

In any case, I don't think it should attempt to catch this signal at
all since the cpu thread is already catching it.

Alexandre



[Qemu-devel] [PATCH 13/14] ide: Clear error_status after restarting flush

2011-06-15 Thread Kevin Wolf
Clearing the error status flag was missing for restarting flushes. Now that the
error status is separate from the BM status register, we can simply set it to 0
after restarting the request. This ensures that we never forget to clear a bit.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 hw/ide/pci.c |   18 +++---
 1 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index c940375..9f3050a 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -189,6 +189,7 @@ static void bmdma_restart_bh(void *opaque)
 BMDMAState *bm = opaque;
 IDEBus *bus = bm-bus;
 int is_read;
+int error_status;
 
 qemu_bh_delete(bm-bh);
 bm-bh = NULL;
@@ -199,22 +200,25 @@ static void bmdma_restart_bh(void *opaque)
 
 is_read = !!(bus-error_status  BM_STATUS_RETRY_READ);
 
-if (bus-error_status  BM_STATUS_DMA_RETRY) {
-if (bus-error_status  BM_STATUS_RETRY_TRIM) {
-bus-error_status = ~BM_STATUS_RETRY_TRIM;
+/* The error status must be cleared before resubmitting the request: The
+ * request may fail again, and this case can only be distinguished if the
+ * called function can set a new error status. */
+error_status = bus-error_status;
+bus-error_status = 0;
+
+if (error_status  BM_STATUS_DMA_RETRY) {
+if (error_status  BM_STATUS_RETRY_TRIM) {
 bmdma_restart_dma(bm, IDE_DMA_TRIM);
 } else {
-bus-error_status = ~(BM_STATUS_DMA_RETRY | BM_STATUS_RETRY_READ);
 bmdma_restart_dma(bm, is_read ? IDE_DMA_READ : IDE_DMA_WRITE);
 }
-} else if (bus-error_status  BM_STATUS_PIO_RETRY) {
-bus-error_status = ~(BM_STATUS_PIO_RETRY | BM_STATUS_RETRY_READ);
+} else if (error_status  BM_STATUS_PIO_RETRY) {
 if (is_read) {
 ide_sector_read(bmdma_active_if(bm));
 } else {
 ide_sector_write(bmdma_active_if(bm));
 }
-} else if (bus-error_status  BM_STATUS_RETRY_FLUSH) {
+} else if (error_status  BM_STATUS_RETRY_FLUSH) {
 ide_flush_cache(bmdma_active_if(bm));
 }
 }
-- 
1.7.5.2




[Qemu-devel] [PATCH 10/14] make dma_bdrv_io available to drivers

2011-06-15 Thread Kevin Wolf
From: Christoph Hellwig h...@lst.de

Make dma_bdrv_io available for drivers, and pass an explicit I/O function
instead of hardcoding bdrv_aio_readv/bdrv_aio_writev.  This is required
to implement non-READ/WRITE dma commands in the ide driver, e.g. the
upcoming TRIM support.

Signed-off-by: Christoph Hellwig h...@lst.de
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 dma-helpers.c |   23 ++-
 dma.h |8 
 2 files changed, 18 insertions(+), 13 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 712ed89..ba7f897 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -47,6 +47,7 @@ typedef struct {
 target_phys_addr_t sg_cur_byte;
 QEMUIOVector iov;
 QEMUBH *bh;
+DMAIOFunc *io_func;
 } DMAAIOCB;
 
 static void dma_bdrv_cb(void *opaque, int ret);
@@ -116,13 +117,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
 return;
 }
 
-if (dbs-is_write) {
-dbs-acb = bdrv_aio_writev(dbs-bs, dbs-sector_num, dbs-iov,
-   dbs-iov.size / 512, dma_bdrv_cb, dbs);
-} else {
-dbs-acb = bdrv_aio_readv(dbs-bs, dbs-sector_num, dbs-iov,
-  dbs-iov.size / 512, dma_bdrv_cb, dbs);
-}
+dbs-acb = dbs-io_func(dbs-bs, dbs-sector_num, dbs-iov,
+dbs-iov.size / 512, dma_bdrv_cb, dbs);
 if (!dbs-acb) {
 dma_bdrv_unmap(dbs);
 qemu_iovec_destroy(dbs-iov);
@@ -144,12 +140,12 @@ static AIOPool dma_aio_pool = {
 .cancel = dma_aio_cancel,
 };
 
-static BlockDriverAIOCB *dma_bdrv_io(
+BlockDriverAIOCB *dma_bdrv_io(
 BlockDriverState *bs, QEMUSGList *sg, uint64_t sector_num,
-BlockDriverCompletionFunc *cb, void *opaque,
-int is_write)
+DMAIOFunc *io_func, BlockDriverCompletionFunc *cb,
+void *opaque, int is_write)
 {
-DMAAIOCB *dbs =  qemu_aio_get(dma_aio_pool, bs, cb, opaque);
+DMAAIOCB *dbs = qemu_aio_get(dma_aio_pool, bs, cb, opaque);
 
 dbs-acb = NULL;
 dbs-bs = bs;
@@ -158,6 +154,7 @@ static BlockDriverAIOCB *dma_bdrv_io(
 dbs-sg_cur_index = 0;
 dbs-sg_cur_byte = 0;
 dbs-is_write = is_write;
+dbs-io_func = io_func;
 dbs-bh = NULL;
 qemu_iovec_init(dbs-iov, sg-nsg);
 dma_bdrv_cb(dbs, 0);
@@ -173,12 +170,12 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
 QEMUSGList *sg, uint64_t sector,
 void (*cb)(void *opaque, int ret), void 
*opaque)
 {
-return dma_bdrv_io(bs, sg, sector, cb, opaque, 0);
+return dma_bdrv_io(bs, sg, sector, bdrv_aio_readv, cb, opaque, 0);
 }
 
 BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
  QEMUSGList *sg, uint64_t sector,
  void (*cb)(void *opaque, int ret), void 
*opaque)
 {
-return dma_bdrv_io(bs, sg, sector, cb, opaque, 1);
+return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, 1);
 }
diff --git a/dma.h b/dma.h
index f3bb275..3d8324b 100644
--- a/dma.h
+++ b/dma.h
@@ -32,6 +32,14 @@ void qemu_sglist_add(QEMUSGList *qsg, target_phys_addr_t 
base,
  target_phys_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 
+typedef BlockDriverAIOCB *DMAIOFunc(BlockDriverState *bs, int64_t sector_num,
+ QEMUIOVector *iov, int nb_sectors,
+ BlockDriverCompletionFunc *cb, void *opaque);
+
+BlockDriverAIOCB *dma_bdrv_io(BlockDriverState *bs,
+  QEMUSGList *sg, uint64_t sector_num,
+  DMAIOFunc *io_func, BlockDriverCompletionFunc 
*cb,
+  void *opaque, int is_write);
 BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
 QEMUSGList *sg, uint64_t sector,
 BlockDriverCompletionFunc *cb, void *opaque);
-- 
1.7.5.2




Re: [Qemu-devel] [PULL 00/14] Block patches

2011-06-15 Thread Anthony Liguori

On 06/15/2011 09:02 AM, Kevin Wolf wrote:

The following changes since commit 0b862cedf36d927818c50584ddd611b0370673df:

   configure: Detect and don't try to use older libcurl (2011-06-13 21:16:27 
+0200)

are available in the git repository at:
   git://repo.or.cz/qemu/kevin.git for-anthony


Pulled.  Thanks.

Regards,

Anthony Liguori


Christoph Hellwig (3):
   make dma_bdrv_io available to drivers
   ide: allow other dma comands than read and write
   ide: add TRIM support

Devin Nakamura (1):
   Replaced tabs with spaces in block.h and block_int.h

Kevin Wolf (9):
   qcow2: Avoid direct AIO callback
   qcow: Avoid direct AIO callback
   vdi: Avoid direct AIO callback
   qcow2: Fix in-flight list after qcow2_cache_put failure
   ide: Split error status from status register
   ide: Fix ide_drive_pio_state_needed()
   ide: Add forgotten VMSTATE_END_OF_LIST in subsection
   ide: Clear error_status after restarting flush
   Allow nested qemu_bh_poll() after BH deletion

Stefan Weil (1):
   block/rbd: Remove unused local variable

  async.c   |5 +-
  block.h   |6 +-
  block/qcow.c  |   58 ++-
  block/qcow2-cluster.c |   12 +++-
  block/qcow2.c |   39 ++---
  block/rbd.c   |4 -
  block/vdi.c   |   41 +++--
  block_int.h   |4 +-
  dma-helpers.c |   23 +++
  dma.h |8 +++
  hw/ide/core.c |  154 -
  hw/ide/internal.h |   32 ++-
  hw/ide/macio.c|   13 -
  hw/ide/pci.c  |   88 +---
  hw/ide/pci.h  |4 +
  hw/ide/qdev.c |5 ++
  16 files changed, 423 insertions(+), 73 deletions(-)





[Qemu-devel] [PATCH 12/14] ide: add TRIM support

2011-06-15 Thread Kevin Wolf
From: Christoph Hellwig h...@lst.de

Add support for TRIM sub function of the data set management command,
and wire it up to the qemu discard infrastructure.

Signed-off-by: Christoph Hellwig h...@lst.de
Signed-off-by: Kevin Wolf kw...@redhat.com
---
 hw/ide/core.c |   97 +++-
 hw/ide/internal.h |   14 +++-
 hw/ide/macio.c|4 ++
 hw/ide/pci.c  |9 -
 hw/ide/qdev.c |5 +++
 5 files changed, 124 insertions(+), 5 deletions(-)

diff --git a/hw/ide/core.c b/hw/ide/core.c
index 14bda82..ca17a43 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -78,7 +78,7 @@ static void ide_identify(IDEState *s)
 {
 uint16_t *p;
 unsigned int oldsize;
-IDEDevice *dev;
+IDEDevice *dev = s-unit ? s-bus-slave : s-bus-master;
 
 if (s-identify_set) {
memcpy(s-io_buffer, s-identify_data, sizeof(s-identify_data));
@@ -124,6 +124,9 @@ static void ide_identify(IDEState *s)
 put_le16(p + 66, 120);
 put_le16(p + 67, 120);
 put_le16(p + 68, 120);
+if (dev  dev-conf.discard_granularity) {
+put_le16(p + 69, (1  14)); /* determinate TRIM behavior */
+}
 
 if (s-ncq_queues) {
 put_le16(p + 75, s-ncq_queues - 1);
@@ -154,9 +157,12 @@ static void ide_identify(IDEState *s)
 put_le16(p + 101, s-nb_sectors  16);
 put_le16(p + 102, s-nb_sectors  32);
 put_le16(p + 103, s-nb_sectors  48);
-dev = s-unit ? s-bus-slave : s-bus-master;
+
 if (dev  dev-conf.physical_block_size)
 put_le16(p + 106, 0x6000 | get_physical_block_exp(dev-conf));
+if (dev  dev-conf.discard_granularity) {
+put_le16(p + 169, 1); /* TRIM support */
+}
 
 memcpy(s-identify_data, p, sizeof(s-identify_data));
 s-identify_set = 1;
@@ -299,6 +305,74 @@ static void ide_set_signature(IDEState *s)
 }
 }
 
+typedef struct TrimAIOCB {
+BlockDriverAIOCB common;
+QEMUBH *bh;
+int ret;
+} TrimAIOCB;
+
+static void trim_aio_cancel(BlockDriverAIOCB *acb)
+{
+TrimAIOCB *iocb = container_of(acb, TrimAIOCB, common);
+
+qemu_bh_delete(iocb-bh);
+iocb-bh = NULL;
+qemu_aio_release(iocb);
+}
+
+static AIOPool trim_aio_pool = {
+.aiocb_size = sizeof(TrimAIOCB),
+.cancel = trim_aio_cancel,
+};
+
+static void ide_trim_bh_cb(void *opaque)
+{
+TrimAIOCB *iocb = opaque;
+
+iocb-common.cb(iocb-common.opaque, iocb-ret);
+
+qemu_bh_delete(iocb-bh);
+iocb-bh = NULL;
+
+qemu_aio_release(iocb);
+}
+
+BlockDriverAIOCB *ide_issue_trim(BlockDriverState *bs,
+int64_t sector_num, QEMUIOVector *qiov, int nb_sectors,
+BlockDriverCompletionFunc *cb, void *opaque)
+{
+TrimAIOCB *iocb;
+int i, j, ret;
+
+iocb = qemu_aio_get(trim_aio_pool, bs, cb, opaque);
+iocb-bh = qemu_bh_new(ide_trim_bh_cb, iocb);
+iocb-ret = 0;
+
+for (j = 0; j  qiov-niov; j++) {
+uint64_t *buffer = qiov-iov[j].iov_base;
+
+for (i = 0; i  qiov-iov[j].iov_len / 8; i++) {
+/* 6-byte LBA + 2-byte range per entry */
+uint64_t entry = le64_to_cpu(buffer[i]);
+uint64_t sector = entry  0xULL;
+uint16_t count = entry  48;
+
+if (count == 0) {
+break;
+}
+
+ret = bdrv_discard(bs, sector, count);
+if (!iocb-ret) {
+iocb-ret = ret;
+}
+}
+}
+
+qemu_bh_schedule(iocb-bh);
+
+return iocb-common;
+}
+
 static inline void ide_abort_command(IDEState *s)
 {
 s-status = READY_STAT | ERR_STAT;
@@ -474,6 +548,9 @@ handle_rw_error:
 
 if (s-dma_cmd == IDE_DMA_READ)
 op |= BM_STATUS_RETRY_READ;
+else if (s-dma_cmd == IDE_DMA_TRIM)
+op |= BM_STATUS_RETRY_TRIM;
+
 if (ide_handle_rw_error(s, -ret, op)) {
 return;
 }
@@ -519,6 +596,10 @@ handle_rw_error:
 s-bus-dma-aiocb = dma_bdrv_write(s-bs, s-sg, sector_num,
 ide_dma_cb, s);
 break;
+case IDE_DMA_TRIM:
+s-bus-dma-aiocb = dma_bdrv_io(s-bs, s-sg, sector_num,
+ ide_issue_trim, ide_dma_cb, s, 1);
+break;
 }
 
 if (!s-bus-dma-aiocb) {
@@ -818,6 +899,18 @@ void ide_exec_cmd(IDEBus *bus, uint32_t val)
 return;
 
 switch(val) {
+case WIN_DSM:
+switch (s-feature) {
+case DSM_TRIM:
+if (!s-bs) {
+goto abort_cmd;
+}
+ide_sector_start_dma(s, IDE_DMA_TRIM);
+break;
+default:
+goto abort_cmd;
+}
+break;
 case WIN_IDENTIFY:
 if (s-bs  s-drive_kind != IDE_CD) {
 if (s-drive_kind != IDE_CFATA)
diff --git a/hw/ide/internal.h b/hw/ide/internal.h
index ea3edf5..02e805f 100644
--- a/hw/ide/internal.h
+++ b/hw/ide/internal.h
@@ -62,7 +62,11 @@ typedef struct IDEDMAOps IDEDMAOps;
  */
 #define 

[Qemu-devel] [PATCH 14/14] Allow nested qemu_bh_poll() after BH deletion

2011-06-15 Thread Kevin Wolf
Without this, qemu segfaults when a BH handler first deletes its BH and
then calls another function which involves a nested qemu_bh_poll() call.

This can be reproduced by generating an I/O error (e.g. with blkdebug) on
an IDE device and using rerror/werror=stop to stop the VM. When continuing
the VM, qemu segfaults.

Signed-off-by: Kevin Wolf kw...@redhat.com
Reviewed-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 async.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/async.c b/async.c
index 57ac3a8..fd313df 100644
--- a/async.c
+++ b/async.c
@@ -137,11 +137,12 @@ QEMUBH *qemu_bh_new(QEMUBHFunc *cb, void *opaque)
 
 int qemu_bh_poll(void)
 {
-QEMUBH *bh, **bhp;
+QEMUBH *bh, **bhp, *next;
 int ret;
 
 ret = 0;
-for (bh = async_context-first_bh; bh; bh = bh-next) {
+for (bh = async_context-first_bh; bh; bh = next) {
+next = bh-next;
 if (!bh-deleted  bh-scheduled) {
 bh-scheduled = 0;
 if (!bh-idle)
-- 
1.7.5.2




Re: [Qemu-devel] [PATCH 33/34] hw/usb-ohci.c: Implement remote wakeup

2011-06-15 Thread Peter Maydell
On 14 June 2011 12:05, Gerd Hoffmann kra...@redhat.com wrote:
 From: Peter Maydell peter.mayd...@linaro.org

 Implement the wakeup callback in the OHCI USBPortOps, so that when
 a downstream device wakes up it correctly causes the OHCI controller
 to come out of suspend.

Just to let you know, this patch turns out to not be quite right:
 * if the port is suspended and the controller is not, we need to
   raise OHCI_INTR_RHSC
 * if the controller is suspended we need to put it into the
   resume state
 * the controller can be suspended when the port is not, so the
   check on s-ctl musn't be inside the port-ctrl if().

Sorry for the half-baked patch. I'll send a fixed version shortly.

-- PMM



Re: [Qemu-devel] [PATCH 0/2] [PULL] qemu-kvm.git uq/master queue

2011-06-15 Thread Anthony Liguori

On 06/01/2011 12:31 PM, Marcelo Tosatti wrote:

The following changes since commit 578c7b2ca8ee9e97fa8693b1a83d517e8e3f962e:

   audio: fix integer overflow expression (2011-06-01 00:14:07 +0400)


Pulled.  Thanks.

Regards,

Anthony Liguori



are available in the git repository at:
   git://git.kernel.org/pub/scm/virt/kvm/qemu-kvm.git uq/master

Yang, Wei Y (1):
   kvm: Enable CPU SMEP feature

brill...@viatech.com.cn (1):
   kvm: Add CPUID support for VIA CPU

  target-i386/cpu.h   |9 ++-
  target-i386/cpuid.c |   66 +-
  target-i386/kvm.c   |   15 +++
  3 files changed, 87 insertions(+), 3 deletions(-)







Re: [Qemu-devel] [PATCH 1/2] Allow silent system resets

2011-06-15 Thread Luiz Capitulino
On Tue, 14 Jun 2011 18:29:43 +0200
Jan Kiszka jan.kis...@siemens.com wrote:

 This allows qemu_system_reset to be issued silently for internal
 purposes, ie. without sending out a monitor event. Convert the system
 reset after startup to the silent mode.
 
 Signed-off-by: Jan Kiszka jan.kis...@siemens.com

Only this patch is monitor related, but I've applied the two in the
monitor branch.

Thanks.

 ---
  sysemu.h  |5 -
  vl.c  |   10 ++
  xen-all.c |2 +-
  3 files changed, 11 insertions(+), 6 deletions(-)
 
 diff --git a/sysemu.h b/sysemu.h
 index 7e70daa..d3013f5 100644
 --- a/sysemu.h
 +++ b/sysemu.h
 @@ -34,6 +34,9 @@ void qemu_del_vm_change_state_handler(VMChangeStateEntry 
 *e);
  #define VMSTOP_LOADVM7
  #define VMSTOP_MIGRATE   8
  
 +#define VMRESET_SILENT   false
 +#define VMRESET_REPORT   true
 +
  void vm_start(void);
  void vm_stop(int reason);
  
 @@ -50,7 +53,7 @@ int qemu_powerdown_requested(void);
  void qemu_system_killed(int signal, pid_t pid);
  void qemu_kill_report(void);
  extern qemu_irq qemu_system_powerdown;
 -void qemu_system_reset(void);
 +void qemu_system_reset(bool report);
  
  void qemu_add_exit_notifier(Notifier *notify);
  void qemu_remove_exit_notifier(Notifier *notify);
 diff --git a/vl.c b/vl.c
 index d7f905d..91380d2 100644
 --- a/vl.c
 +++ b/vl.c
 @@ -1249,7 +1249,7 @@ void qemu_unregister_reset(QEMUResetHandler *func, void 
 *opaque)
  }
  }
  
 -void qemu_system_reset(void)
 +void qemu_system_reset(bool report)
  {
  QEMUResetEntry *re, *nre;
  
 @@ -1257,7 +1257,9 @@ void qemu_system_reset(void)
  QTAILQ_FOREACH_SAFE(re, reset_handlers, entry, nre) {
  re-func(re-opaque);
  }
 -monitor_protocol_event(QEVENT_RESET, NULL);
 +if (report) {
 +monitor_protocol_event(QEVENT_RESET, NULL);
 +}
  cpu_synchronize_all_post_reset();
  }
  
 @@ -1399,7 +1401,7 @@ static void main_loop(void)
  if (qemu_reset_requested()) {
  pause_all_vcpus();
  cpu_synchronize_all_states();
 -qemu_system_reset();
 +qemu_system_reset(VMRESET_REPORT);
  resume_all_vcpus();
  }
  if (qemu_powerdown_requested()) {
 @@ -3272,7 +3274,7 @@ int main(int argc, char **argv, char **envp)
  qemu_register_reset(qbus_reset_all_fn, sysbus_get_default());
  qemu_run_machine_init_done_notifiers();
  
 -qemu_system_reset();
 +qemu_system_reset(VMRESET_SILENT);
  if (loadvm) {
  if (load_vmstate(loadvm)  0) {
  autostart = 0;
 diff --git a/xen-all.c b/xen-all.c
 index 0eac202..41fd98a 100644
 --- a/xen-all.c
 +++ b/xen-all.c
 @@ -452,7 +452,7 @@ static void cpu_handle_ioreq(void *opaque)
  destroy_hvm_domain();
  }
  if (qemu_reset_requested_get()) {
 -qemu_system_reset();
 +qemu_system_reset(VMRESET_REPORT);
  }
  }
  




Re: [Qemu-devel] [PULL] libcacard libtoolized + usb-ccid fix

2011-06-15 Thread Anthony Liguori

On 05/31/2011 11:42 AM, Alon Levy wrote:

Hi,

This pull request includes the libcacard.la library target and a memory leak
fix by Markus. Please pull.

The following changes since commit b1d7d2b93a1d6b2d2848b616cc35acdf521c923c:

   Merge remote-tracking branch 'stefanha/trivial-patches' into staging 
(2011-05-31 08:23:11 -0500)

are available in the git repository at:

   git://anongit.freedesktop.org/~alon/qemu pull-libcacard-1


Pulled.  Thanks.

Regards,

Anthony Liguori



Alon Levy (2):
   configure: add libdir and --libdir
   libcacard: add libcacard.la target

Markus Armbruster (1):
   usb-ccid: Plug memory leak on qdev exit()

  Makefile   |   20 +++-
  Makefile.objs  |8 
  configure  |   17 -
  hw/usb-ccid.c  |   28 
  libcacard/Makefile |   32 
  rules.mak  |8 
  6 files changed, 87 insertions(+), 26 deletions(-)







Re: [Qemu-devel] [PULL] usb patch queue

2011-06-15 Thread Anthony Liguori

On 06/14/2011 06:05 AM, Gerd Hoffmann wrote:

   Hi,

The USB patch queue has been rebased, got a minor fix (wrong comment in
patch #8, spotted by David Ahern) and three new patches.  I'm just
posting the three new patches to avoid spamming the list with 30
identical patches ...

please pull,
   Gerd


Pulled.  Thanks.

Regards,

Anthony Liguori



The following changes since commit 0b862cedf36d927818c50584ddd611b0370673df:

   configure: Detect and don't try to use older libcurl (2011-06-13 21:16:27 
+0200)

are available in the git repository at:
   git://git.kraxel.org/qemu usb.16

Brad Hards (3):
   usb: Add defines for USB Serial Bus Release Number register
   usb: Use defines for serial bus release number register for UHCI
   usb: Use defines for serial bus release number register for EHCI

Gerd Hoffmann (18):
   usb-linux: catch ENODEV in more places.
   usb-ehci: trace mmio and usbsts
   usb-ehci: trace state machine changes
   usb-ehci: trace port state
   usb-ehci: improve mmio tracing
   usb-ehci: trace buffer copy
   usb-ehci: add queue data struct
   usb-ehci: multiqueue support
   usb-ehci: fix offset writeback in ehci_buffer_rw
   usb-ehci: fix error handling.
   usb: cancel async packets on unplug
   usb-ehci: drop EXECUTING checks.
   usb-ehci: itd handling fixes.
   usb-ehci: split trace calls to handle arg count limits
   usb: documentation update
   usb-linux: only cleanup in host_close when host_open was successful.
   usb: don't call usb_host_device_open from vl.c
   usb-uhci: fix expire time initialization.

Hans de Goede (9):
   ehci: fix a number of unused-but-set-variable warnings (new with gcc-4.6)
   usb-linux: Get speed from sysfs rather then from the connectinfo ioctl
   usb-linux: Teach about super speed
   usb-linux: Don't do perror when errno is not set
   usb-linux: Ensure devep != 0
   usb-linux: Don't try to open the same device twice
   usb-linux: Enlarge buffer for descriptors to 8192 bytes
   usb-bus: Add knowledge of USB_SPEED_SUPER to usb_speed helper
   usb-bus: Don't detach non attached devices on device exit

Kevin O'Connor (2):
   Fix USB mouse Set_Protocol behavior
   The USB tablet should not claim boot protocol support.

Peter Maydell (2):
   hw/usb-ohci.c: Ignore writes to HcPeriodCurrentED register
   hw/usb-ohci.c: Implement remote wakeup

  docs/usb2.txt  |   85 
  hw/milkymist-softusb.c |   10 +-
  hw/usb-bus.c   |   10 +-
  hw/usb-ehci.c  | 1198 
  hw/usb-hid.c   |5 +-
  hw/usb-musb.c  |   23 +-
  hw/usb-ohci.c  |   37 ++-
  hw/usb-uhci.c  |   32 ++-
  hw/usb.h   |   14 +-
  trace-events   |   20 +
  usb-linux.c|   96 +++--
  vl.c   |6 +-
  12 files changed, 990 insertions(+), 546 deletions(-)







Re: [Qemu-devel] [PATCH v3 04/21] qapi: add QAPI visitor core

2011-06-15 Thread Luiz Capitulino
On Mon, 13 Jun 2011 21:31:09 -0500
Michael Roth mdr...@linux.vnet.ibm.com wrote:

 Base definitions/includes for Visiter interface used by generated
 visiter/marshalling code.
 
 Includes a GenericList type. Our lists require an embedded element.
 Since these types are generated, if you want to use them in a different
 type of data structure, there's no easy way to add another embedded
 element. The solution is to have non-embedded lists and that what this is.
 
 Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
 ---
  Makefile.objs  |6 +++
  qapi/qapi-types-core.h |   21 ++
  qapi/qapi-visit-core.c |  101 
 
  qapi/qapi-visit-core.h |   68 
  4 files changed, 196 insertions(+), 0 deletions(-)
  create mode 100644 qapi/qapi-types-core.h
  create mode 100644 qapi/qapi-visit-core.c
  create mode 100644 qapi/qapi-visit-core.h
 
 diff --git a/Makefile.objs b/Makefile.objs
 index a7807e8..68d7b5a 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -364,6 +364,12 @@ endif
  
  libcacard-y = cac.o event.o vcard.o vreader.o vcard_emul_nss.o 
 vcard_emul_type.o card_7816.o
  
 +##
 +# qapi
 +
 +qapi-nested-y = qapi-visit-core.o
 +qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y))

To build this I have to add qapi-obj-y to common-obj-y and add the
qapi directory to the DIR variable in configure (so that it builds
in a different directory).

I saw that you send the configure change in a different patch, but it
has to be done here.

 +
  vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
  
  vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS)
 diff --git a/qapi/qapi-types-core.h b/qapi/qapi-types-core.h
 new file mode 100644
 index 000..de733ab
 --- /dev/null
 +++ b/qapi/qapi-types-core.h
 @@ -0,0 +1,21 @@
 +/*
 + * Core Definitions for QAPI-generated Types
 + *
 + * Copyright IBM, Corp. 2011
 + *
 + * Authors:
 + *  Anthony Liguori   aligu...@us.ibm.com
 + *
 + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
 later.
 + * See the COPYING.LIB file in the top-level directory.
 + *
 + */
 +
 +#ifndef QAPI_TYPES_CORE_H
 +#define QAPI_TYPES_CORE_H
 +
 +#include stdbool.h
 +#include stdint.h
 +#include error.h
 +
 +#endif
 diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
 new file mode 100644
 index 000..948818e
 --- /dev/null
 +++ b/qapi/qapi-visit-core.c
 @@ -0,0 +1,101 @@
 +#include qapi/qapi-visit-core.h

No license info.

 +
 +void visit_start_handle(Visitor *v, void **obj, const char *kind, const char 
 *name, Error **errp)
 +{
 +if (!error_is_set(errp)  v-start_handle) {
 +v-start_handle(v, obj, kind, name, errp);
 +}
 +}
 +
 +void visit_end_handle(Visitor *v, Error **errp)
 +{
 +if (!error_is_set(errp)  v-end_handle) {
 +v-end_handle(v, errp);
 +}
 +}
 +
 +void visit_start_struct(Visitor *v, void **obj, const char *kind, const char 
 *name, size_t size, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-start_struct(v, obj, kind, name, size, errp);
 +}
 +}
 +
 +void visit_end_struct(Visitor *v, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-end_struct(v, errp);
 +}
 +}
 +
 +void visit_start_list(Visitor *v, const char *name, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-start_list(v, name, errp);
 +}
 +}
 +
 +GenericList *visit_next_list(Visitor *v, GenericList **list, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +return v-next_list(v, list, errp);
 +}
 +
 +return 0;
 +}
 +
 +void visit_end_list(Visitor *v, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-end_list(v, errp);
 +}
 +}
 +
 +void visit_start_optional(Visitor *v, bool *present, const char *name, Error 
 **errp)
 +{
 +if (!error_is_set(errp)  v-start_optional) {
 +v-start_optional(v, present, name, errp);
 +}
 +}
 +
 +void visit_end_optional(Visitor *v, Error **errp)
 +{
 +if (!error_is_set(errp)  v-end_optional) {
 +v-end_optional(v, errp);
 +}
 +}
 +
 +void visit_type_enum(Visitor *v, int *obj, const char *kind, const char 
 *name, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-type_enum(v, obj, kind, name, errp);
 +}
 +}
 +
 +void visit_type_int(Visitor *v, int64_t *obj, const char *name, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-type_int(v, obj, name, errp);
 +}
 +}
 +
 +void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-type_bool(v, obj, name, errp);
 +}
 +}
 +
 +void visit_type_str(Visitor *v, char **obj, const char *name, Error **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-type_str(v, obj, name, errp);
 +}
 +}
 +
 +void visit_type_number(Visitor *v, double *obj, const char *name, Error 
 **errp)
 +{
 +if (!error_is_set(errp)) {
 +v-type_number(v, obj, name, 

Re: [Qemu-devel] [RFC] Specifying storage devices for live migration

2011-06-15 Thread Avishay Traeger
Stefan Hajnoczi stefa...@gmail.com wrote on 15/06/2011 15:49:12:
  Hello all,
  Currently there is no way to choose storage devices to be migrated.
There
  was some discussion about making the monitor API a bit more flexible,
but
  the code was never written:
  http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html

 Hmm...2009.  Have you seen this recent thread?

 http://lists.gnu.org/archive/html/qemu-devel/2011-05/msg00733.html

 Are you planning to extend the block migration API and fix known
 issues with block migration?

 Stefan

This feature (selectively migrating block devices during live migration) is
missing for us right now.  I know it's an old thread, but as far as I know,
this hasn't been fixed.  In the more recent thread I see that there are
thoughts to re-work this code (pre-copy vs. post-copy), but doesn't the
interface need to be fixed anyway?  Where are the known issues with block
migration documented?

Thanks,
Avishay




Re: [Qemu-devel] [PATCH v3 04/21] qapi: add QAPI visitor core

2011-06-15 Thread Michael Roth

On 06/15/2011 09:33 AM, Luiz Capitulino wrote:

On Mon, 13 Jun 2011 21:31:09 -0500
Michael Rothmdr...@linux.vnet.ibm.com  wrote:


Base definitions/includes for Visiter interface used by generated
visiter/marshalling code.

Includes a GenericList type. Our lists require an embedded element.
Since these types are generated, if you want to use them in a different
type of data structure, there's no easy way to add another embedded
element. The solution is to have non-embedded lists and that what this is.

Signed-off-by: Michael Rothmdr...@linux.vnet.ibm.com
---
  Makefile.objs  |6 +++
  qapi/qapi-types-core.h |   21 ++
  qapi/qapi-visit-core.c |  101 
  qapi/qapi-visit-core.h |   68 
  4 files changed, 196 insertions(+), 0 deletions(-)
  create mode 100644 qapi/qapi-types-core.h
  create mode 100644 qapi/qapi-visit-core.c
  create mode 100644 qapi/qapi-visit-core.h

diff --git a/Makefile.objs b/Makefile.objs
index a7807e8..68d7b5a 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -364,6 +364,12 @@ endif

  libcacard-y = cac.o event.o vcard.o vreader.o vcard_emul_nss.o 
vcard_emul_type.o card_7816.o

+##
+# qapi
+
+qapi-nested-y = qapi-visit-core.o
+qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y))


To build this I have to add qapi-obj-y to common-obj-y and add the
qapi directory to the DIR variable in configure (so that it builds
in a different directory).


Yah, nobody actually uses these right now except the unit tests/guest 
agent so for test builds you have to manually add qapi-obj-y to another 
target like qemu-img. I'll move the configure change to this patch in 
the next run.




I saw that you send the configure change in a different patch, but it
has to be done here.


+
  vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)

  vl.o: QEMU_CFLAGS+=$(SDL_CFLAGS)
diff --git a/qapi/qapi-types-core.h b/qapi/qapi-types-core.h
new file mode 100644
index 000..de733ab
--- /dev/null
+++ b/qapi/qapi-types-core.h
@@ -0,0 +1,21 @@
+/*
+ * Core Definitions for QAPI-generated Types
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Anthony Liguorialigu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QAPI_TYPES_CORE_H
+#define QAPI_TYPES_CORE_H
+
+#includestdbool.h
+#includestdint.h
+#include error.h
+
+#endif
diff --git a/qapi/qapi-visit-core.c b/qapi/qapi-visit-core.c
new file mode 100644
index 000..948818e
--- /dev/null
+++ b/qapi/qapi-visit-core.c
@@ -0,0 +1,101 @@
+#include qapi/qapi-visit-core.h


No license info.



Doh! Something about the term copyright header frequently causes me to 
disregard non-.h files. These will all get fixed in the next run.



+
+void visit_start_handle(Visitor *v, void **obj, const char *kind, const char 
*name, Error **errp)
+{
+if (!error_is_set(errp)  v-start_handle) {
+v-start_handle(v, obj, kind, name, errp);
+}
+}
+
+void visit_end_handle(Visitor *v, Error **errp)
+{
+if (!error_is_set(errp)  v-end_handle) {
+v-end_handle(v, errp);
+}
+}
+
+void visit_start_struct(Visitor *v, void **obj, const char *kind, const char 
*name, size_t size, Error **errp)
+{
+if (!error_is_set(errp)) {
+v-start_struct(v, obj, kind, name, size, errp);
+}
+}
+
+void visit_end_struct(Visitor *v, Error **errp)
+{
+if (!error_is_set(errp)) {
+v-end_struct(v, errp);
+}
+}
+
+void visit_start_list(Visitor *v, const char *name, Error **errp)
+{
+if (!error_is_set(errp)) {
+v-start_list(v, name, errp);
+}
+}
+
+GenericList *visit_next_list(Visitor *v, GenericList **list, Error **errp)
+{
+if (!error_is_set(errp)) {
+return v-next_list(v, list, errp);
+}
+
+return 0;
+}
+
+void visit_end_list(Visitor *v, Error **errp)
+{
+if (!error_is_set(errp)) {
+v-end_list(v, errp);
+}
+}
+
+void visit_start_optional(Visitor *v, bool *present, const char *name, Error 
**errp)
+{
+if (!error_is_set(errp)  v-start_optional) {
+v-start_optional(v, present, name, errp);
+}
+}
+
+void visit_end_optional(Visitor *v, Error **errp)
+{
+if (!error_is_set(errp)  v-end_optional) {
+v-end_optional(v, errp);
+}
+}
+
+void visit_type_enum(Visitor *v, int *obj, const char *kind, const char *name, 
Error **errp)
+{
+if (!error_is_set(errp)) {
+v-type_enum(v, obj, kind, name, errp);
+}
+}
+
+void visit_type_int(Visitor *v, int64_t *obj, const char *name, Error **errp)
+{
+if (!error_is_set(errp)) {
+v-type_int(v, obj, name, errp);
+}
+}
+
+void visit_type_bool(Visitor *v, bool *obj, const char *name, Error **errp)
+{
+if (!error_is_set(errp)) {
+v-type_bool(v, obj, name, errp);
+}
+}
+
+void visit_type_str(Visitor *v, char **obj, const char *name, Error 

Re: [Qemu-devel] [PATCH v3 05/21] qapi: add QMP input visitor

2011-06-15 Thread Luiz Capitulino
On Mon, 13 Jun 2011 21:31:10 -0500
Michael Roth mdr...@linux.vnet.ibm.com wrote:

 A type of Visiter class that is used to walk a qobject's
 structure and assign each entry to the corresponding native C type.
 Command marshaling function will use this to pull out QMP command
 parameters recieved over the wire and pass them as native arguments
 to the corresponding C functions.
 
 Signed-off-by: Michael Roth mdr...@linux.vnet.ibm.com
 ---
  Makefile.objs|2 +-
  qapi/qmp-input-visitor.c |  251 
 ++
  qapi/qmp-input-visitor.h |   27 +
  qerror.h |3 +
  4 files changed, 282 insertions(+), 1 deletions(-)
  create mode 100644 qapi/qmp-input-visitor.c
  create mode 100644 qapi/qmp-input-visitor.h
 
 diff --git a/Makefile.objs b/Makefile.objs
 index 68d7b5a..2eb90b8 100644
 --- a/Makefile.objs
 +++ b/Makefile.objs
 @@ -367,7 +367,7 @@ libcacard-y = cac.o event.o vcard.o vreader.o 
 vcard_emul_nss.o vcard_emul_type.o
  ##
  # qapi
  
 -qapi-nested-y = qapi-visit-core.o
 +qapi-nested-y = qapi-visit-core.o qmp-input-visitor.o
  qapi-obj-y = $(addprefix qapi/, $(qapi-nested-y))
  
  vl.o: QEMU_CFLAGS+=$(GPROF_CFLAGS)
 diff --git a/qapi/qmp-input-visitor.c b/qapi/qmp-input-visitor.c
 new file mode 100644
 index 000..9344d37
 --- /dev/null
 +++ b/qapi/qmp-input-visitor.c

No license info. It seems to be missing in more files.

 @@ -0,0 +1,251 @@
 +#include qmp-input-visitor.h
 +#include qemu-queue.h
 +#include qemu-common.h
 +#include qemu-objects.h
 +#include qerror.h
 +
 +#define QIV_STACK_SIZE 1024
 +
 +typedef struct StackObject
 +{
 +QObject *obj;
 +const  QListEntry *entry;
 +} StackObject;
 +
 +struct QmpInputVisitor
 +{
 +Visitor visitor;
 +QObject *obj;
 +StackObject stack[QIV_STACK_SIZE];
 +int nb_stack;
 +};
 +
 +static QmpInputVisitor *to_qiv(Visitor *v)
 +{
 +return container_of(v, QmpInputVisitor, visitor);
 +}
 +
 +static QObject *qmp_input_get_object(QmpInputVisitor *qiv, const char *name)
 +{
 +QObject *qobj;
 +
 +if (qiv-nb_stack == 0) {
 +qobj = qiv-obj;
 +} else {
 +qobj = qiv-stack[qiv-nb_stack - 1].obj;
 +}
 +
 +if (name  qobject_type(qobj) == QTYPE_QDICT) {
 +return qdict_get(qobject_to_qdict(qobj), name);
 +} else if (qiv-nb_stack  0  qobject_type(qobj) == QTYPE_QLIST) {
 +return qlist_entry_obj(qiv-stack[qiv-nb_stack - 1].entry);
 +}
 +
 +return qobj;
 +}
 +
 +static void qmp_input_push(QmpInputVisitor *qiv, QObject *obj, Error **errp)
 +{
 +qiv-stack[qiv-nb_stack].obj = obj;
 +if (qobject_type(obj) == QTYPE_QLIST) {
 +qiv-stack[qiv-nb_stack].entry = qlist_first(qobject_to_qlist(obj));
 +}
 +qiv-nb_stack++;
 +
 +if (qiv-nb_stack = QIV_STACK_SIZE) {
 +error_set(errp, QERR_QAPI_VISITOR_STACK_OVERRUN);

QAPI is something internal to qemu, the error name and description
should be more generic like not enough memory or not enough space.

 +return;
 +}
 +}
 +
 +static void qmp_input_pop(QmpInputVisitor *qiv, Error **errp)
 +{
 +qiv-nb_stack--;
 +if (qiv-nb_stack  0) {
 +error_set(errp, QERR_QAPI_VISITOR_STACK_OVERRUN);
 +return;
 +}
 +}
 +
 +static void qmp_input_start_struct(Visitor *v, void **obj, const char *kind, 
 const char *name, size_t size, Error **errp)
 +{
 +QmpInputVisitor *qiv = to_qiv(v);
 +QObject *qobj = qmp_input_get_object(qiv, name);
 +
 +if (!qobj || qobject_type(qobj) != QTYPE_QDICT) {
 +error_set(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : null, 
 QDict);
 +return;
 +}
 +
 +qmp_input_push(qiv, qobj, errp);
 +if (error_is_set(errp)) {
 +return;
 +}
 +
 +if (obj) {
 +*obj = qemu_mallocz(size);
 +}
 +}
 +
 +static void qmp_input_end_struct(Visitor *v, Error **errp)
 +{
 +QmpInputVisitor *qiv = to_qiv(v);
 +
 +qmp_input_pop(qiv, errp);
 +}
 +
 +static void qmp_input_start_list(Visitor *v, const char *name, Error **errp)
 +{
 +QmpInputVisitor *qiv = to_qiv(v);
 +QObject *qobj = qmp_input_get_object(qiv, name);
 +
 +if (!qobj || qobject_type(qobj) != QTYPE_QLIST) {
 +error_set(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : null, 
 list);
 +return;
 +}
 +
 +qmp_input_push(qiv, qobj, errp);
 +}
 +
 +static GenericList *qmp_input_next_list(Visitor *v, GenericList **list, 
 Error **errp)
 +{
 +QmpInputVisitor *qiv = to_qiv(v);
 +GenericList *entry;
 +StackObject *so = qiv-stack[qiv-nb_stack - 1];
 +
 +if (so-entry == NULL) {
 +return NULL;
 +}
 +
 +entry = qemu_mallocz(sizeof(*entry));
 +if (*list) {
 +so-entry = qlist_next(so-entry);
 +if (so-entry == NULL) {
 +qemu_free(entry);
 +return NULL;
 +}
 +(*list)-next = entry;
 +}
 +*list = entry;
 +
 +
 +

Re: [Qemu-devel] [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability

2011-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2011 at 9:12 AM, M. Mohan Kumar mo...@in.ibm.com wrote:
 [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability

 In passthrough security model, following a symbolic link in the server
 side could result in TOCTTOU vulnerability.

 Use clone system call to create a thread which runs in chrooted
 environment. All passthrough model file operations are done from this
 thread to avoid TOCTTOU vulnerability.

 Signed-off-by: Venkateswararao Jujjuri jv...@linux.vnet.ibm.com
 Signed-off-by: M. Mohan Kumar mo...@in.ibm.com
 ---
  fsdev/file-op-9p.h         |    1 +
  hw/9pfs/virtio-9p-coth.c   |  105 +--
  hw/9pfs/virtio-9p-coth.h   |   13 +-
  hw/9pfs/virtio-9p-device.c |    7 +++-
  hw/9pfs/virtio-9p.h        |    6 ++-
  5 files changed, 124 insertions(+), 8 deletions(-)

This patch isn't against upstream virtio-9p.  Please post a link to a
repo or more information.


 diff --git a/fsdev/file-op-9p.h b/fsdev/file-op-9p.h
 index 5d088d4..c54f6bf 100644
 --- a/fsdev/file-op-9p.h
 +++ b/fsdev/file-op-9p.h
 @@ -60,6 +60,7 @@ typedef struct FsContext
     SecModel fs_sm;
     uid_t uid;
     int open_flags;
 +    int enable_chroot;

Please use bool.  Using int is like using void*, it throws away information.

     struct xattr_operations **xops;
     /* fs driver specific data */
     void *private;
 diff --git a/hw/9pfs/virtio-9p-coth.c b/hw/9pfs/virtio-9p-coth.c
 index e61b656..aa71a83 100644
 --- a/hw/9pfs/virtio-9p-coth.c
 +++ b/hw/9pfs/virtio-9p-coth.c
 @@ -17,6 +17,8 @@
  #include qemu-thread.h
  #include qemu-coroutine.h
  #include virtio-9p-coth.h
 +#include sys/syscall.h

Do you need this header?  Please include system headers first, then
QEMU headers.

 +#include qemu-error.h

  /* v9fs glib thread pool */
  static V9fsThPool v9fs_pool;
 @@ -56,7 +58,96 @@ static void v9fs_thread_routine(gpointer data, gpointer 
 user_data)
     } while (len == -1  errno == EINTR);
  }

 -int v9fs_init_worker_threads(void)
 +static int v9fs_chroot_function(void *arg)
 +{
 +    GError *err;
 +    V9fsChrootState *csp = arg;
 +    FsContext *fs_ctx = csp-fs_ctx;
 +    V9fsThPool *p = v9fs_pool;
 +    int notifier_fds[2];
 +

Must acquire cs-mutex before calling any QEMU functions here -
otherwise this thread runs concurrently with QEMU threads and could
lead to race conditions.

 +    if (chroot(fs_ctx-fs_root)  0) {
 +        error_report(chroot);
 +        goto error;
 +    }
 +
 +    if (qemu_pipe(notifier_fds) == -1) {
 +        return -1;
 +    }
 +    p-pool = g_thread_pool_new(v9fs_thread_routine, p, 10, TRUE, err);
 +    if (!p-pool) {
 +        error_report(g_thread_pool_new);
 +        goto error;
 +    }
 +
 +    p-completed = g_async_queue_new();
 +    if (!p-completed) {
 +        /*
 +         * We are going to terminate.
 +         * So don't worry about cleanup
 +         */
 +        goto error;
 +    }
 +    p-rfd = notifier_fds[0];
 +    p-wfd = notifier_fds[1];
 +
 +    fcntl(p-rfd, F_SETFL, O_NONBLOCK);
 +    fcntl(p-wfd, F_SETFL, O_NONBLOCK);
 +
 +    qemu_set_fd_handler(p-rfd, v9fs_qemu_process_req_done, NULL, NULL);

These should be in a common function that v9fs_init_worker_threads()
can call instead of copy-paste.

 +
 +    /* Signal parent thread that thread pools are created */
 +    g_cond_signal(csp-cond);

Now cs-mutex can be released and the QEMU thread can continue.

 +    g_mutex_lock(csp-mutex_term);
 +    /* TODO: Wake up cs-terminate during 9p hotplug support */
 +    g_cond_wait(csp-terminate, csp-mutex);
 +    g_mutex_unlock(csp-mutex_term);
 +    g_mutex_free(csp-mutex);
 +    g_cond_free(csp-cond);
 +    g_mutex_free(csp-mutex_term);
 +    g_cond_free(csp-terminate);
 +    qemu_free(csp-stack);

There goes my stack!  It's only safe to free this in the QEMU thread
that signalled, but you need a way to join this thread.

Did you try syscall(SYS_exit, 0) instead?  I think this would make
cleanup much easier and you wouldn't have to juggle around many heap
allocated resources.

 +    qemu_free(csp);
 +    return 0;
 +error:
 +    p-pool = NULL;
 +    g_cond_signal(csp-cond);
 +    return 0;
 +}
 +
 +static int v9fs_clone_chroot_th(FsContext *fs_ctx)
 +{
 +    pid_t pid;
 +    V9fsChrootState *cs;
 +
 +    cs = qemu_malloc(sizeof(*cs));
 +    cs-stack = qemu_malloc(THREAD_STACK);
 +    cs-mutex = g_mutex_new();
 +    cs-cond = g_cond_new();
 +    cs-mutex_term = g_mutex_new();
 +    cs-terminate = g_cond_new();
 +    cs-fs_ctx = fs_ctx;
 +
 +    g_mutex_lock(cs-mutex);
 +    pid = clone(v9fs_chroot_function, cs-stack + THREAD_STACK, CLONE_FILES 
 |
 +                                CLONE_SIGHAND | CLONE_VM | CLONE_THREAD, cs);

I'm a little concerned that thread-local storage is going to be broken
and glib will act weird, but I don't know the implementation details.

 +    if (pid == -1) {
 +        error_report(clone);
 +        g_mutex_unlock(cs-mutex);
 +        g_mutex_free(cs-mutex);
 +        

Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing

2011-06-15 Thread Stefano Stabellini
On Wed, 15 Jun 2011, Alexander Graf wrote:
  commit 973bb091a967fdec37a1bc8fe30d46a483d2903d
  Author: Stefano Stabellini stefano.stabell...@eu.citrix.com
  Date:   Tue May 17 12:10:36 2011 +
  
 xen: fix interrupt routing
  
 - remove i440FX-xen and i440fx_write_config_xen
 we don't need to intercept pci config writes to i440FX anymore;
  
 - introduce PIIX3-xen and piix3_write_config_xen
 we do need to intercept pci config write to the PCI-ISA bridge to update
 the PCI link routing;
  
 - set the number of PIIX3-xen interrupts line to 128;
 
 I still find it unpretty and I'm pretty sure it's completely different from 
 real hardware, but since Xen code is your call and this doesn't affect 
 non-Xen workloads, I won't block it, unless someone else is very much opposed 
 to it.
 
 Please resend as proper patch.
 
 
Thanks, I'll do that.



Re: [Qemu-devel] [PATCH 10/10] linux-user: remove unused variables

2011-06-15 Thread Alexander Graf

On 14.06.2011, at 19:36, Michael S. Tsirkin wrote:

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
 linux-user/flatload.c  |   10 ++
 linux-user/linuxload.c |   25 +
 linux-user/main.c  |6 +++---
 linux-user/signal.c|5 -
 linux-user/syscall.c   |6 --
 5 files changed, 10 insertions(+), 42 deletions(-)
 
 diff --git a/linux-user/flatload.c b/linux-user/flatload.c
 index cd7af7c..2933c5f 100644
 --- a/linux-user/flatload.c
 +++ b/linux-user/flatload.c
 @@ -379,12 +379,14 @@ static int load_flat_file(struct linux_binprm * bprm,
 abi_long result;
 abi_ulong realdatastart = 0;
 abi_ulong text_len, data_len, bss_len, stack_len, flags;
 -abi_ulong memp = 0; /* for finding the brk area */
 abi_ulong extra;
 abi_ulong reloc = 0, rp;
 int i, rev, relocs = 0;
 abi_ulong fpos;
 -abi_ulong start_code, end_code;
 +abi_ulong start_code;
 +#ifdef DEBUG
 +abi_ulong end_code;
 +#endif
 abi_ulong indx_len;
 
 hdr = ((struct flat_hdr *) bprm-buf);/* exec-header */
 @@ -491,7 +493,6 @@ static int load_flat_file(struct linux_binprm * bprm,
 }
 
 reloc = datapos + (ntohl(hdr-reloc_start) - text_len);
 -memp = realdatastart;
 
 } else {
 
 @@ -506,7 +507,6 @@ static int load_flat_file(struct linux_binprm * bprm,
 realdatastart = textpos + ntohl(hdr-data_start);
 datapos = realdatastart + indx_len;
 reloc = (textpos + ntohl(hdr-reloc_start) + indx_len);
 -memp = textpos;
 
 #ifdef CONFIG_BINFMT_ZFLAT
 #error code needs checking
 @@ -552,7 +552,9 @@ static int load_flat_file(struct linux_binprm * bprm,
 
 /* The main program needs a little extra setup in the task structure */
 start_code = textpos + sizeof (struct flat_hdr);
 +#ifdef DEBUG
 end_code = textpos + text_len;
 +#endif
 
 DBG_FLT(%s %s: TEXT=%x-%x DATA=%x-%x BSS=%x-%x\n,
 id ? Lib : Load, bprm-filename,
 diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c
 index ac8c486..62ebc7e 100644
 --- a/linux-user/linuxload.c
 +++ b/linux-user/linuxload.c
 @@ -26,22 +26,6 @@ abi_long memcpy_to_target(abi_ulong dest, const void *src,
 return 0;
 }
 
 -static int in_group_p(gid_t g)
 -{
 -/* return TRUE if we're in the specified group, FALSE otherwise */
 -int  ngroup;
 -int  i;
 -gid_tgrouplist[NGROUPS];
 -
 -ngroup = getgroups(NGROUPS, grouplist);
 -for(i = 0; i  ngroup; i++) {
 - if(grouplist[i] == g) {
 - return 1;
 - }
 -}
 -return 0;
 -}
 -
 static int count(char ** vec)
 {
 int   i;
 @@ -57,7 +41,7 @@ static int prepare_binprm(struct linux_binprm *bprm)
 {
 struct stat   st;
 int mode;
 -int retval, id_change;
 +int retval;
 
 if(fstat(bprm-fd, st)  0) {
   return(-errno);
 @@ -73,14 +57,10 @@ static int prepare_binprm(struct linux_binprm *bprm)
 
 bprm-e_uid = geteuid();
 bprm-e_gid = getegid();
 -id_change = 0;
 
 /* Set-uid? */
 if(mode  S_ISUID) {
   bprm-e_uid = st.st_uid;
 - if(bprm-e_uid != geteuid()) {
 - id_change = 1;
 - }
 }
 
 /* Set-gid? */
 @@ -91,9 +71,6 @@ static int prepare_binprm(struct linux_binprm *bprm)
  */
 if ((mode  (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
   bprm-e_gid = st.st_gid;
 - if (!in_group_p(bprm-e_gid)) {
 - id_change = 1;
 - }
 }
 
 retval = read(bprm-fd, bprm-buf, BPRM_BUF_SIZE);
 diff --git a/linux-user/main.c b/linux-user/main.c
 index 04da0a4..9b995e5 100644
 --- a/linux-user/main.c
 +++ b/linux-user/main.c
 @@ -2053,15 +2053,15 @@ void cpu_loop(CPUMIPSState *env)
 } else {
 int nb_args;
 abi_ulong sp_reg;
 -abi_ulong arg5 = 0, arg6 = 0, arg7 = 0, arg8 = 0;
 +abi_ulong arg5 = 0, arg6 = 0;
 
 nb_args = mips_syscall_args[syscall_num];
 sp_reg = env-active_tc.gpr[29];
 switch (nb_args) {
 /* these arguments are taken from the stack */
 /* FIXME - what to do if get_user() fails? */
 -case 8: get_user_ual(arg8, sp_reg + 28);
 -case 7: get_user_ual(arg7, sp_reg + 24);
 +case 8: /* get_user_ual(arg8, sp_reg + 28); */
 +case 7: /* get_user_ual(arg7, sp_reg + 24); */

I'd prefer to see these and the respective variable definitions #if 0'd with a 
comment, stating that they're currently unused.

 case 6: get_user_ual(arg6, sp_reg + 20);
 case 5: get_user_ual(arg5, sp_reg + 16);
 default:
 diff --git a/linux-user/signal.c b/linux-user/signal.c
 index 11b25be..685ae61 100644
 --- a/linux-user/signal.c
 +++ b/linux-user/signal.c
 @@ -2080,7 +2080,6 @@ long do_sigreturn(CPUState *env)
 uint32_t up_psr, pc, npc;
 

Re: [Qemu-devel] [PATCH 06/10] kvm: remove unused variables

2011-06-15 Thread Kevin Wolf
Am 14.06.2011 19:36, schrieb Michael S. Tsirkin:
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/virtio-pci.h   |8 +---
  target-i386/kvm.c |3 +--
  2 files changed, 6 insertions(+), 5 deletions(-)
 
 diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
 index a4b5fd3..b518917 100644
 --- a/hw/virtio-pci.h
 +++ b/hw/virtio-pci.h
 @@ -37,7 +37,9 @@ typedef struct {
  bool ioeventfd_started;
  } VirtIOPCIProxy;
  
 -extern void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
 -uint16_t vendor, uint16_t device,
 -uint16_t class_code, uint8_t pif);
 +void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev);
 +
 +/* Virtio ABI version, if we increment this, we break the guest driver. */
 +#define VIRTIO_PCI_ABI_VERSION  0
 +
  #endif

Is this hunk there intentionally?

Kevin


 diff --git a/target-i386/kvm.c b/target-i386/kvm.c
 index faedc6c..58a70bc 100644
 --- a/target-i386/kvm.c
 +++ b/target-i386/kvm.c
 @@ -970,7 +970,7 @@ static int kvm_get_xsave(CPUState *env)
  #ifdef KVM_CAP_XSAVE
  struct kvm_xsave* xsave;
  int ret, i;
 -uint16_t cwd, swd, twd, fop;
 +uint16_t cwd, swd, twd;
  
  if (!kvm_has_xsave()) {
  return kvm_get_fpu(env);
 @@ -986,7 +986,6 @@ static int kvm_get_xsave(CPUState *env)
  cwd = (uint16_t)xsave-region[0];
  swd = (uint16_t)(xsave-region[0]  16);
  twd = (uint16_t)xsave-region[1];
 -fop = (uint16_t)(xsave-region[1]  16);
  env-fpstt = (swd  11)  7;
  env-fpus = swd;
  env-fpuc = cwd;




Re: [Qemu-devel] [PATCH 03/10] usb-ehci: remove unused variables

2011-06-15 Thread Gerd Hoffmann

On 06/14/11 19:35, Michael S. Tsirkin wrote:

Signed-off-by: Michael S. Tsirkinm...@redhat.com


usb patch queue has a simliar fix already.
I'd suggest to drop this to reduce merge conflicts.

cheers,
  Gerd




Re: [Qemu-devel] [PATCH 06/10] kvm: remove unused variables

2011-06-15 Thread Michael S. Tsirkin
On Wed, Jun 15, 2011 at 09:38:39AM +0200, Kevin Wolf wrote:
 Am 14.06.2011 19:36, schrieb Michael S. Tsirkin:
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
   hw/virtio-pci.h   |8 +---
   target-i386/kvm.c |3 +--
   2 files changed, 6 insertions(+), 5 deletions(-)
  
  diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
  index a4b5fd3..b518917 100644
  --- a/hw/virtio-pci.h
  +++ b/hw/virtio-pci.h
  @@ -37,7 +37,9 @@ typedef struct {
   bool ioeventfd_started;
   } VirtIOPCIProxy;
   
  -extern void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
  -uint16_t vendor, uint16_t device,
  -uint16_t class_code, uint8_t pif);
  +void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev);
  +
  +/* Virtio ABI version, if we increment this, we break the guest driver. */
  +#define VIRTIO_PCI_ABI_VERSION  0
  +
   #endif
 
 Is this hunk there intentionally?
 
 Kevin

Sorry, this belongs in another patch.
Thanks for pointing this out.
Otherwise ack?

 
  diff --git a/target-i386/kvm.c b/target-i386/kvm.c
  index faedc6c..58a70bc 100644
  --- a/target-i386/kvm.c
  +++ b/target-i386/kvm.c
  @@ -970,7 +970,7 @@ static int kvm_get_xsave(CPUState *env)
   #ifdef KVM_CAP_XSAVE
   struct kvm_xsave* xsave;
   int ret, i;
  -uint16_t cwd, swd, twd, fop;
  +uint16_t cwd, swd, twd;
   
   if (!kvm_has_xsave()) {
   return kvm_get_fpu(env);
  @@ -986,7 +986,6 @@ static int kvm_get_xsave(CPUState *env)
   cwd = (uint16_t)xsave-region[0];
   swd = (uint16_t)(xsave-region[0]  16);
   twd = (uint16_t)xsave-region[1];
  -fop = (uint16_t)(xsave-region[1]  16);
   env-fpstt = (swd  11)  7;
   env-fpus = swd;
   env-fpuc = cwd;



Re: [Qemu-devel] [PATCH 06/10] kvm: remove unused variables

2011-06-15 Thread Chris Krumme

On 06/14/2011 12:36 PM, Michael S. Tsirkin wrote:

Signed-off-by: Michael S. Tsirkinm...@redhat.com
---
  hw/virtio-pci.h   |8 +---
  target-i386/kvm.c |3 +--
  2 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/hw/virtio-pci.h b/hw/virtio-pci.h
index a4b5fd3..b518917 100644
--- a/hw/virtio-pci.h
+++ b/hw/virtio-pci.h
@@ -37,7 +37,9 @@ typedef struct {
  bool ioeventfd_started;
  } VirtIOPCIProxy;

-extern void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev,
-uint16_t vendor, uint16_t device,
-uint16_t class_code, uint8_t pif);
+void virtio_init_pci(VirtIOPCIProxy *proxy, VirtIODevice *vdev);


Hello MST,

The above change shows why we should be including a header, rather than 
having private prototypes (w/bitrot).



+
+/* Virtio ABI version, if we increment this, we break the guest driver. */
+#define VIRTIO_PCI_ABI_VERSION  0
+


This change seems unrelated.

Thanks

Chris


  #endif
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index faedc6c..58a70bc 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -970,7 +970,7 @@ static int kvm_get_xsave(CPUState *env)
  #ifdef KVM_CAP_XSAVE
  struct kvm_xsave* xsave;
  int ret, i;
-uint16_t cwd, swd, twd, fop;
+uint16_t cwd, swd, twd;

  if (!kvm_has_xsave()) {
  return kvm_get_fpu(env);
@@ -986,7 +986,6 @@ static int kvm_get_xsave(CPUState *env)
  cwd = (uint16_t)xsave-region[0];
  swd = (uint16_t)(xsave-region[0]  16);
  twd = (uint16_t)xsave-region[1];
-fop = (uint16_t)(xsave-region[1]  16);
  env-fpstt = (swd  11)  7;
  env-fpus = swd;
  env-fpuc = cwd;





Re: [Qemu-devel] [PATCH 10/10] linux-user: remove unused variables

2011-06-15 Thread Michael S. Tsirkin
On Wed, Jun 15, 2011 at 10:35:19AM +0200, Alexander Graf wrote:
 
 On 14.06.2011, at 19:36, Michael S. Tsirkin wrote:
 
  Signed-off-by: Michael S. Tsirkin m...@redhat.com
  ---
  linux-user/flatload.c  |   10 ++
  linux-user/linuxload.c |   25 +
  linux-user/main.c  |6 +++---
  linux-user/signal.c|5 -
  linux-user/syscall.c   |6 --
  5 files changed, 10 insertions(+), 42 deletions(-)
  
  diff --git a/linux-user/flatload.c b/linux-user/flatload.c
  index cd7af7c..2933c5f 100644
  --- a/linux-user/flatload.c
  +++ b/linux-user/flatload.c
  @@ -379,12 +379,14 @@ static int load_flat_file(struct linux_binprm * bprm,
  abi_long result;
  abi_ulong realdatastart = 0;
  abi_ulong text_len, data_len, bss_len, stack_len, flags;
  -abi_ulong memp = 0; /* for finding the brk area */
  abi_ulong extra;
  abi_ulong reloc = 0, rp;
  int i, rev, relocs = 0;
  abi_ulong fpos;
  -abi_ulong start_code, end_code;
  +abi_ulong start_code;
  +#ifdef DEBUG
  +abi_ulong end_code;
  +#endif
  abi_ulong indx_len;
  
  hdr = ((struct flat_hdr *) bprm-buf);  /* exec-header */
  @@ -491,7 +493,6 @@ static int load_flat_file(struct linux_binprm * bprm,
  }
  
  reloc = datapos + (ntohl(hdr-reloc_start) - text_len);
  -memp = realdatastart;
  
  } else {
  
  @@ -506,7 +507,6 @@ static int load_flat_file(struct linux_binprm * bprm,
  realdatastart = textpos + ntohl(hdr-data_start);
  datapos = realdatastart + indx_len;
  reloc = (textpos + ntohl(hdr-reloc_start) + indx_len);
  -memp = textpos;
  
  #ifdef CONFIG_BINFMT_ZFLAT
  #error code needs checking
  @@ -552,7 +552,9 @@ static int load_flat_file(struct linux_binprm * bprm,
  
  /* The main program needs a little extra setup in the task structure */
  start_code = textpos + sizeof (struct flat_hdr);
  +#ifdef DEBUG
  end_code = textpos + text_len;
  +#endif
  
  DBG_FLT(%s %s: TEXT=%x-%x DATA=%x-%x BSS=%x-%x\n,
  id ? Lib : Load, bprm-filename,
  diff --git a/linux-user/linuxload.c b/linux-user/linuxload.c
  index ac8c486..62ebc7e 100644
  --- a/linux-user/linuxload.c
  +++ b/linux-user/linuxload.c
  @@ -26,22 +26,6 @@ abi_long memcpy_to_target(abi_ulong dest, const void 
  *src,
  return 0;
  }
  
  -static int in_group_p(gid_t g)
  -{
  -/* return TRUE if we're in the specified group, FALSE otherwise */
  -intngroup;
  -inti;
  -gid_t  grouplist[NGROUPS];
  -
  -ngroup = getgroups(NGROUPS, grouplist);
  -for(i = 0; i  ngroup; i++) {
  -   if(grouplist[i] == g) {
  -   return 1;
  -   }
  -}
  -return 0;
  -}
  -
  static int count(char ** vec)
  {
  int i;
  @@ -57,7 +41,7 @@ static int prepare_binprm(struct linux_binprm *bprm)
  {
  struct stat st;
  int mode;
  -int retval, id_change;
  +int retval;
  
  if(fstat(bprm-fd, st)  0) {
  return(-errno);
  @@ -73,14 +57,10 @@ static int prepare_binprm(struct linux_binprm *bprm)
  
  bprm-e_uid = geteuid();
  bprm-e_gid = getegid();
  -id_change = 0;
  
  /* Set-uid? */
  if(mode  S_ISUID) {
  bprm-e_uid = st.st_uid;
  -   if(bprm-e_uid != geteuid()) {
  -   id_change = 1;
  -   }
  }
  
  /* Set-gid? */
  @@ -91,9 +71,6 @@ static int prepare_binprm(struct linux_binprm *bprm)
   */
  if ((mode  (S_ISGID | S_IXGRP)) == (S_ISGID | S_IXGRP)) {
  bprm-e_gid = st.st_gid;
  -   if (!in_group_p(bprm-e_gid)) {
  -   id_change = 1;
  -   }
  }
  
  retval = read(bprm-fd, bprm-buf, BPRM_BUF_SIZE);
  diff --git a/linux-user/main.c b/linux-user/main.c
  index 04da0a4..9b995e5 100644
  --- a/linux-user/main.c
  +++ b/linux-user/main.c
  @@ -2053,15 +2053,15 @@ void cpu_loop(CPUMIPSState *env)
  } else {
  int nb_args;
  abi_ulong sp_reg;
  -abi_ulong arg5 = 0, arg6 = 0, arg7 = 0, arg8 = 0;
  +abi_ulong arg5 = 0, arg6 = 0;
  
  nb_args = mips_syscall_args[syscall_num];
  sp_reg = env-active_tc.gpr[29];
  switch (nb_args) {
  /* these arguments are taken from the stack */
  /* FIXME - what to do if get_user() fails? */
  -case 8: get_user_ual(arg8, sp_reg + 28);
  -case 7: get_user_ual(arg7, sp_reg + 24);
  +case 8: /* get_user_ual(arg8, sp_reg + 28); */
  +case 7: /* get_user_ual(arg7, sp_reg + 24); */
 
 I'd prefer to see these and the respective variable definitions #if 0'd with 
 a comment, stating that they're currently unused.
 
  case 6: get_user_ual(arg6, sp_reg + 20);
  case 5: get_user_ual(arg5, sp_reg + 16);
  default:
  diff --git a/linux-user/signal.c b/linux-user/signal.c
  index 

Re: [Qemu-devel] [PATCH 04/10] lsi53c895a: remove unused variables

2011-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2011 at 08:35:44PM +0300, Michael S. Tsirkin wrote:
 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  hw/lsi53c895a.c |2 --
  1 files changed, 0 insertions(+), 2 deletions(-)

This one is already in the trivial-patches tree.

Stefan



Re: [Qemu-devel] [patch 6/7] QEMU live block copy (update)

2011-06-15 Thread Marcelo Tosatti
On Tue, Jun 07, 2011 at 12:15:55PM +0200, Jiri Denemark wrote:
 On Mon, Jun 06, 2011 at 14:03:44 -0300, Marcelo Tosatti wrote:
 ...
  +  return:[
  +{device:ide1-hd0,
  +status:active,
  +info:{
  +   percentage:23,
  +}
  +},
 
 What about using the same form of progress reporting as used by query-migrate?
 That is, instead of
 
 info:{
 percentage:23
 }
 
 the following would be reported:
 
 disk:{
 transferred:123,
 remaining:123,
 total:246
 }
 
 One can trivially compute percentage from that but it's impossible to get this
 kind of data when only percentage is reported. And total can even change in
 time if needed (just like it changes during migration).

Done.

  +{device:ide1-hd1,
  + status:failed
  +}
 
 Is there any way we can get the exact error which made it fail, such as EPERM
 or ENOSPC?

The errors that can generate failed here are internal to QEMU, so i
don't think exporting them to management application makes sense. This
are errors such as EIO from file system.

Error processing should be done in the block_copy command, that is where
checks of the destination image are performed.




Re: [Qemu-devel] [PATCH 10/10] linux-user: remove unused variables

2011-06-15 Thread Richard Henderson
On 06/15/2011 01:35 AM, Alexander Graf wrote:
 -abi_ulong arg5 = 0, arg6 = 0, arg7 = 0, arg8 = 0;
 +abi_ulong arg5 = 0, arg6 = 0;

 nb_args = mips_syscall_args[syscall_num];
 sp_reg = env-active_tc.gpr[29];
 switch (nb_args) {
 /* these arguments are taken from the stack */
 /* FIXME - what to do if get_user() fails? */
 -case 8: get_user_ual(arg8, sp_reg + 28);
 -case 7: get_user_ual(arg7, sp_reg + 24);
 +case 8: /* get_user_ual(arg8, sp_reg + 28); */
 +case 7: /* get_user_ual(arg7, sp_reg + 24); */
 
 I'd prefer to see these and the respective variable definitions #if
 0'd with a comment, stating that they're currently unused.

I'd prefer not to see if 0 code.  Better, I think, to mark the
variables as __attribute__((unused)) with that same comment.

 @@ -7058,18 +7056,14 @@ abi_long do_syscall(void *cpu_env, int num, abi_long 
 arg1,
 case TARGET_NR_osf_sigprocmask:
 {
 abi_ulong mask;
 -int how = arg1;
 sigset_t set, oldset;

 switch(arg1) {
 case TARGET_SIG_BLOCK:
 -how = SIG_BLOCK;
 break;
 case TARGET_SIG_UNBLOCK:
 -how = SIG_UNBLOCK;
 break;
 case TARGET_SIG_SETMASK:
 -how = SIG_SETMASK;
 
 why go through the effort of setting how and then not using it? I'm
 pretty sure this is a bug as well. A few lines down is the following
 code:
 
sigprocmask(arg1, set, oldset);
 
 which in TARGET_NR_sigprocmask would be:
 
   ret = get_errno(sigprocmask(how, set, oldset));
 
 So we end up sending guest masks to the host. Richard, this is Alpha
 specific code. Mind to double-check?

I remember fixing this before.  Perhaps it was in a patch tree that 
never got pulled...


r~



Re: [Qemu-devel] [PATCH] xen: avoid tracking the region 0xa0000 - 0xbffff

2011-06-15 Thread Stefano Stabellini
On Wed, 15 Jun 2011, Alexander Graf wrote:
 Please add a comment here, explaining that Xen can only handle a single dirty 
 log region for now, and that we want the linear framebuffer to be that 
 region. Also, please resend with proper patch headers and I'll pull it into 
 the xen-next tree.
 

OK, I'll do that.



[Qemu-devel] [PATCH] xen: only track the linear framebuffer

2011-06-15 Thread Stefano Stabellini
Xen can only do dirty bit tracking for one memory region, so we should
explicitly avoid trying to track anything but the vga vram region.

Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com

diff --git a/xen-all.c b/xen-all.c
index 9a5c3ec..fa1d2e1 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -214,6 +214,7 @@ static int xen_add_to_physmap(XenIOState *state,
 unsigned long i = 0;
 int rc = 0;
 XenPhysmap *physmap = NULL;
+RAMBlock *block;
 
 if (get_physmapping(state, start_addr, size)) {
 return 0;
@@ -221,7 +222,19 @@ static int xen_add_to_physmap(XenIOState *state,
 if (size = 0) {
 return -1;
 }
+/* Xen can only handle a single dirty log region for now and we want
+ * the linear framebuffer to be that region.
+ * Avoid tracking any regions that is not videoram and avoid tracking
+ * the legacy vga region. */
+QLIST_FOREACH(block, ram_list.blocks, next) {
+if (!strcmp(block-idstr, vga.vram)  block-offset == phys_offset
+ start_addr  0xb) {
+goto go_physmap;
+}
+}
+return -1;
 
+go_physmap:
 DPRINTF(mapping vram to %llx - %llx, from %llx\n, start_addr, start_addr 
+ size, phys_offset);
 for (i = 0; i  size  TARGET_PAGE_BITS; i++) {
 unsigned long idx = (phys_offset  TARGET_PAGE_BITS) + i;



[Qemu-devel] [PATCH 1/2] lsi: Fix unused-but-set-variable warning

2011-06-15 Thread Stefan Hajnoczi
From: Christophe Fergeau cferg...@redhat.com

This warning is new in gcc 4.6.

Signed-off-by: Christophe Fergeau cferg...@redhat.com
Acked-by: Paolo Bonzini pbonz...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hw/lsi53c895a.c |2 --
 1 files changed, 0 insertions(+), 2 deletions(-)

diff --git a/hw/lsi53c895a.c b/hw/lsi53c895a.c
index 83084b6..90c6cbc 100644
--- a/hw/lsi53c895a.c
+++ b/hw/lsi53c895a.c
@@ -889,7 +889,6 @@ static void lsi_do_msgout(LSIState *s)
 uint8_t msg;
 int len;
 uint32_t current_tag;
-SCSIDevice *current_dev;
 lsi_request *current_req, *p, *p_next;
 int id;
 
@@ -901,7 +900,6 @@ static void lsi_do_msgout(LSIState *s)
 current_req = lsi_find_by_tag(s, current_tag);
 }
 id = (current_tag  8)  0xf;
-current_dev = s-bus.devs[id];
 
 DPRINTF(MSG out len=%d\n, s-dbc);
 while (s-dbc) {
-- 
1.7.5.3




Re: [Qemu-devel] [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability

2011-06-15 Thread Stefan Hajnoczi
On Tue, Jun 14, 2011 at 9:12 AM, M. Mohan Kumar mo...@in.ibm.com wrote:
 [RFC PATCH] virtio-9p: Use clone approach to fix TOCTOU vulnerability

 In passthrough security model, following a symbolic link in the server
 side could result in TOCTTOU vulnerability.

 Use clone system call to create a thread which runs in chrooted
 environment. All passthrough model file operations are done from this
 thread to avoid TOCTTOU vulnerability.

How will chroot(2) work when QEMU runs as non-root (i.e. secure
production environments)?

Stefan



[Qemu-devel] [RFC] qxl: set mm_time in vga update

2011-06-15 Thread Alon Levy
This fixes a problem where on windows 7 startup phase, before the qxl driver
is loaded, the drawables are sufficiently large and video like to trigger a
stream, but the lack of a filled mm time field triggers a warning in spice-gtk.
---
 ui/spice-display.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/ui/spice-display.c b/ui/spice-display.c
index 15f0704..f73 100644
--- a/ui/spice-display.c
+++ b/ui/spice-display.c
@@ -70,6 +70,7 @@ static SimpleSpiceUpdate 
*qemu_spice_create_update(SimpleSpiceDisplay *ssd)
 QXLCommand *cmd;
 uint8_t *src, *dst;
 int by, bw, bh;
+struct timespec time_space;
 
 if (qemu_spice_rect_is_empty(ssd-dirty)) {
 return NULL;
@@ -96,6 +97,10 @@ static SimpleSpiceUpdate 
*qemu_spice_create_update(SimpleSpiceDisplay *ssd)
 drawable-surfaces_dest[0] = -1;
 drawable-surfaces_dest[1] = -1;
 drawable-surfaces_dest[2] = -1;
+clock_gettime(CLOCK_MONOTONIC, time_space);
+/* time in milliseconds from epoch. */
+drawable-mm_time = time_space.tv_sec * 1000
+  + time_space.tv_nsec / 1000 / 1000;
 
 drawable-u.copy.rop_descriptor  = SPICE_ROPD_OP_PUT;
 drawable-u.copy.src_bitmap  = (intptr_t)image;
-- 
1.7.5.2




Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing

2011-06-15 Thread Avi Kivity

On 06/15/2011 11:24 AM, Alexander Graf wrote:

I'm actually not quite sure what exactly he's describing here. But if it's 
bypassing the bus logic, it's not a normal PCI device :). Sure, there are 
special case devices that also expose a PCI interface. But real PCI cards that 
you plug in onto the PCI bus can't bypass the interrupt logic of the bus, as 
the only interrupt wires they have go to the bus. And since the PCI adapters we 
use in PC machines in Qemu are all non-special, guests can possibly choke on 
this.



There actually is a special device in qemu - acpi power management is 
configured as a PCI device, but its interrupt is hard-wired to gsi 9 and 
is edge-triggered (so it can't share the irq line).


I other devices that are special in this regard to also be part of the 
chipset, not devices you can plug into arbitrary slots.


--
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH 2/2] Fix typo in cpus.c

2011-06-15 Thread Stefan Hajnoczi
From: Alexandre Raymond cerb...@gmail.com

filed - failed

Signed-off-by: Alexandre Raymond cerb...@gmail.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 cpus.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/cpus.c b/cpus.c
index 1fc34b7..4ab76f0 100644
--- a/cpus.c
+++ b/cpus.c
@@ -297,7 +297,7 @@ static void qemu_event_increment(void)
 
 /* EAGAIN is fine, a read must be pending.  */
 if (ret  0  errno != EAGAIN) {
-fprintf(stderr, qemu_event_increment: write() filed: %s\n,
+fprintf(stderr, qemu_event_increment: write() failed: %s\n,
 strerror(errno));
 exit (1);
 }
-- 
1.7.5.3




[Qemu-devel] [patch 0/4] live block copy (v5)

2011-06-15 Thread Marcelo Tosatti
v5:
- fix several leaks in block-copy and blkmirror
- change progress status reporting
- support migration
- fix buffer overwrite bug in blkmirror

v4:
- add block-copy documentation
- block-copy style fixes
- fix blkmirror_cancel method
- blkmirror style fixes
- add escaping to blkmirror pathnames
- add blkmirror documentation

v3:
- replace commit file with mirrored writes
- address comments from round 2

v2:
- use reference counting to be safe against device hotplug / bdrv_truncate
- add comment about usage of timer






[Qemu-devel] [PULL 0/2] Trivial patches for June 9 to June 15 2011

2011-06-15 Thread Stefan Hajnoczi
The following changes since commit 71f34ad05359d7fa97996562d904979281ddc7f5:

  Merge remote-tracking branch 'alon/pull-libcacard-1' into staging (2011-06-15 
09:03:49 -0500)

are available in the git repository at:

  ssh://repo.or.cz/srv/git/qemu/stefanha.git trivial-patches

Alexandre Raymond (1):
  Fix typo in cpus.c

Christophe Fergeau (1):
  lsi: Fix unused-but-set-variable warning

 cpus.c  |2 +-
 hw/lsi53c895a.c |2 --
 2 files changed, 1 insertions(+), 3 deletions(-)

-- 
1.7.5.3




[Qemu-devel] [PATCH v2] xen: fix interrupt routing

2011-06-15 Thread Stefano Stabellini
Compared to the last version I only added a comment to the code.

- remove i440FX-xen and i440fx_write_config_xen
we don't need to intercept pci config writes to i440FX anymore;

- introduce PIIX3-xen and piix3_write_config_xen
we do need to intercept pci config write to the PCI-ISA bridge to update
the PCI link routing;

- set the number of PIIX3-xen interrupts line to 128;

Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com

diff --git a/hw/pc.h b/hw/pc.h
index 0dcbee7..6d5730b 100644
--- a/hw/pc.h
+++ b/hw/pc.h
@@ -176,7 +176,6 @@ struct PCII440FXState;
 typedef struct PCII440FXState PCII440FXState;
 
 PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int *piix_devfn, qemu_irq 
*pic, ram_addr_t ram_size);
-PCIBus *i440fx_xen_init(PCII440FXState **pi440fx_state, int *piix3_devfn, 
qemu_irq *pic, ram_addr_t ram_size);
 void i440fx_init_memory_mappings(PCII440FXState *d);
 
 /* piix4.c */
diff --git a/hw/pc_piix.c b/hw/pc_piix.c
index 9a22a8a..ba198de 100644
--- a/hw/pc_piix.c
+++ b/hw/pc_piix.c
@@ -124,11 +124,7 @@ static void pc_init1(ram_addr_t ram_size,
 isa_irq = qemu_allocate_irqs(isa_irq_handler, isa_irq_state, 24);
 
 if (pci_enabled) {
-if (!xen_enabled()) {
-pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_irq, 
ram_size);
-} else {
-pci_bus = i440fx_xen_init(i440fx_state, piix3_devfn, isa_irq, 
ram_size);
-}
+pci_bus = i440fx_init(i440fx_state, piix3_devfn, isa_irq, ram_size);
 } else {
 pci_bus = NULL;
 i440fx_state = NULL;
diff --git a/hw/piix_pci.c b/hw/piix_pci.c
index 7f1c4cc..1e29ec4 100644
--- a/hw/piix_pci.c
+++ b/hw/piix_pci.c
@@ -40,6 +40,7 @@ typedef PCIHostState I440FXState;
 
 #define PIIX_NUM_PIC_IRQS   16  /* i8259 * 2 */
 #define PIIX_NUM_PIRQS  4ULL/* PIRQ[A-D] */
+#define XEN_PIIX_NUM_PIRQS  128ULL
 #define PIIX_PIRQC  0x60
 
 typedef struct PIIX3State {
@@ -78,6 +79,8 @@ struct PCII440FXState {
 #define I440FX_SMRAM0x72
 
 static void piix3_set_irq(void *opaque, int pirq, int level);
+static void piix3_write_config_xen(PCIDevice *dev,
+   uint32_t address, uint32_t val, int len);
 
 /* return the global irq number corresponding to a given device irq
pin. We could also use the bus number to have a more precise
@@ -173,13 +176,6 @@ static void i440fx_write_config(PCIDevice *dev,
 }
 }
 
-static void i440fx_write_config_xen(PCIDevice *dev,
-uint32_t address, uint32_t val, int len)
-{
-xen_piix_pci_write_config_client(address, val, len);
-i440fx_write_config(dev, address, val, len);
-}
-
 static int i440fx_load_old(QEMUFile* f, void *opaque, int version_id)
 {
 PCII440FXState *d = opaque;
@@ -267,8 +263,21 @@ static PCIBus *i440fx_common_init(const char *device_name,
 d = pci_create_simple(b, 0, device_name);
 *pi440fx_state = DO_UPCAST(PCII440FXState, dev, d);
 
-piix3 = DO_UPCAST(PIIX3State, dev,
-  pci_create_simple_multifunction(b, -1, true, PIIX3));
+/* Xen supports additional interrupt routes from the PCI devices to
+ * the IOAPIC: the four pins of each PCI device on the bus are also
+ * connected to the IOAPIC directly.
+ * These additional routes can be discovered through ACPI. */
+if (xen_enabled()) {
+piix3 = DO_UPCAST(PIIX3State, dev,
+pci_create_simple_multifunction(b, -1, true, PIIX3-xen));
+pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
+piix3, XEN_PIIX_NUM_PIRQS);
+} else {
+piix3 = DO_UPCAST(PIIX3State, dev,
+pci_create_simple_multifunction(b, -1, true, PIIX3));
+pci_bus_irqs(b, piix3_set_irq, pci_slot_get_pirq, piix3,
+PIIX_NUM_PIRQS);
+}
 piix3-pic = pic;
 
 (*pi440fx_state)-piix3 = piix3;
@@ -289,21 +298,6 @@ PCIBus *i440fx_init(PCII440FXState **pi440fx_state, int 
*piix3_devfn,
 PCIBus *b;
 
 b = i440fx_common_init(i440FX, pi440fx_state, piix3_devfn, pic, 
ram_size);
-pci_bus_irqs(b, piix3_set_irq, pci_slot_get_pirq, (*pi440fx_state)-piix3,
- PIIX_NUM_PIRQS);
-
-return b;
-}
-
-PCIBus *i440fx_xen_init(PCII440FXState **pi440fx_state, int *piix3_devfn,
-qemu_irq *pic, ram_addr_t ram_size)
-{
-PCIBus *b;
-
-b = i440fx_common_init(i440FX-xen, pi440fx_state, piix3_devfn, pic, 
ram_size);
-pci_bus_irqs(b, xen_piix3_set_irq, xen_pci_slot_get_pirq,
- (*pi440fx_state)-piix3, PIIX_NUM_PIRQS);
-
 return b;
 }
 
@@ -365,6 +359,13 @@ static void piix3_write_config(PCIDevice *dev,
 }
 }
 
+static void piix3_write_config_xen(PCIDevice *dev,
+   uint32_t address, uint32_t val, int len)
+{
+xen_piix_pci_write_config_client(address, val, len);
+piix3_write_config(dev, address, val, len);
+}
+
 static void piix3_reset(void *opaque)
 {
 PIIX3State *d = 

Re: [Qemu-devel] [RFC] Specifying storage devices for live migration

2011-06-15 Thread Marcelo Tosatti
On Wed, Jun 15, 2011 at 02:22:22PM -0300, Marcelo Tosatti wrote:
 On Wed, Jun 15, 2011 at 02:06:21PM +0300, Avishay Traeger wrote:
  
  Hello all,
  Currently there is no way to choose storage devices to be migrated.  There
  was some discussion about making the monitor API a bit more flexible, but
  the code was never written:
  http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html
  
  So today users can choose only:
  1) No storage migrated
  2) All disks migrated with fully copy
  3) All disks migrated with incremental
  
  I'd like to write the code to allow users to choose which disks to migrate,
  and how.  What API would you recommend?  What would break the least amount
  of utilities?
 
 I'm unaware of any application that makes use of the -b flag, so if
 thats true breaking the changing the current syntax is fine. The
 one Liran suggests looks alright.

We also need the same feature, BTW.




Re: [Qemu-devel] [Xen-devel] Re: [PATCH] xen: fix interrupt routing

2011-06-15 Thread Alexander Graf

Am 15.06.2011 um 18:34 schrieb Avi Kivity a...@redhat.com:

 On 06/15/2011 11:24 AM, Alexander Graf wrote:
 I'm actually not quite sure what exactly he's describing here. But if it's 
 bypassing the bus logic, it's not a normal PCI device :). Sure, there are 
 special case devices that also expose a PCI interface. But real PCI cards 
 that you plug in onto the PCI bus can't bypass the interrupt logic of the 
 bus, as the only interrupt wires they have go to the bus. And since the PCI 
 adapters we use in PC machines in Qemu are all non-special, guests can 
 possibly choke on this.
 
 
 There actually is a special device in qemu - acpi power management is 
 configured as a PCI device, but its interrupt is hard-wired to gsi 9 and is 
 edge-triggered (so it can't share the irq line).
 
 I other devices that are special in this regard to also be part of the 
 chipset, not devices you can plug into arbitrary slots.

Sure, platform devices can do that. Real PCI cards can not. Have you ever seen 
an e1000 with direct mapped interrupt lines? :)

I admit though that we also emulate platform devices that happen to expose 
themselves on the PCI bus. It's not common though and I wouldn't expect every 
OS/driver to be happy about it.


Alex

 



Re: [Qemu-devel] [PATCH] Command line support for altering the log file location

2011-06-15 Thread Blue Swirl
Thanks, applied.

On Wed, Jun 8, 2011 at 5:32 AM, Matthew Fernandez
matthew.fernan...@gmail.com wrote:
 Add command line support for logging to a location other than /tmp/qemu.log.

 With logging enabled (command line option -d), the log is written to
 the hard-coded path /tmp/qemu.log. This patch adds support for writing
 the log to a different location by passing the -D option.

 Signed-off-by: Matthew Fernandez matthew.fernan...@gmail.com
 

 Thanks Kevin and Blue Swirl for the feedback. Amended patch is below.
 On 7 June 2011 18:49, Kevin Wolf kw...@redhat.com wrote:
 Am 05.06.2011 02:46, schrieb Matthew Fernandez:
 So, to clarify, all text above the '' is included as the commit
 message and the sign off line should be in this section. All text
 below the '' before the first diff is treated as comments that are
 not to be committed. Is this correct? If so I'll send through an
 ammended patch with these changes.

 Yes, this is correct.

 Kevin


 On 4 June 2011 20:18, Blue Swirl blauwir...@gmail.com wrote:
 On Tue, May 31, 2011 at 9:20 AM, Matthew Fernandez
 matthew.fernan...@gmail.com wrote:
 Hi,

 The included patch adds command line support for logging to a location
 other than /tmp/qemu.log. The diff is relative to commit
 2eb9f241824d000fcd90bd7f4b49e40b88e62975. Please let me know if
 anything needs to be cleaned up or changed. Anthony, I'm not sure who
 should be responsible for reviewing/accepting this, but I've CCed you
 as it touches vl.c.

 The patch looks OK, however the above text (which git-am would use as
 the commit message) needs to be changed. How about something like
 this:
 Add command line support for logging to a location other than 
 /tmp/qemu.log.

 If you want to add comments which should not go to the commit message,
 please put them below the line with ''.

 Thanks,
 Matthew

 
 Signed-off-by: Matthew Fernandez matthew.fernan...@gmail.com

 Also this needs to be above the  '' line.



 diff --git a/bsd-user/main.c b/bsd-user/main.c
 index 0c3fca1..0af8a7e 100644
 --- a/bsd-user/main.c
 +++ b/bsd-user/main.c
 @@ -690,7 +690,8 @@ static void usage(void)
            -bsd type         select emulated BSD type
 FreeBSD/NetBSD/OpenBSD (default)\n
            \n
            Debug options:\n
 -           -d options   activate log (logfile=%s)\n
 +           -d options   activate log (default logfile=%s)\n
 +           -D logfile   override default logfile location\n
            -p pagesize  set the host page size to 'pagesize'\n
            -singlestep  always run in singlestep mode\n
            -strace      log system calls\n
 @@ -731,6 +732,8 @@ int main(int argc, char **argv)
  {
     const char *filename;
     const char *cpu_model;
 +    const char *log_file = DEBUG_LOGFILE;
 +    const char *log_mask = NULL;
     struct target_pt_regs regs1, *regs = regs1;
     struct image_info info1, *info = info1;
     TaskState ts1, *ts = ts1;
 @@ -745,9 +748,6 @@ int main(int argc, char **argv)
     if (argc = 1)
         usage();

 -    /* init debug */
 -    cpu_set_log_filename(DEBUG_LOGFILE);
 -
     if ((envlist = envlist_create()) == NULL) {
         (void) fprintf(stderr, Unable to allocate envlist\n);
         exit(1);
 @@ -775,22 +775,15 @@ int main(int argc, char **argv)
         if (!strcmp(r, -)) {
             break;
         } else if (!strcmp(r, d)) {
 -            int mask;
 -            const CPULogItem *item;
 -
 -            if (optind = argc)
 +            if (optind = argc) {
                 break;
 -
 -            r = argv[optind++];
 -            mask = cpu_str_to_log_mask(r);
 -            if (!mask) {
 -                printf(Log items (comma separated):\n);
 -                for(item = cpu_log_items; item-mask != 0; item++) {
 -                    printf(%-10s %s\n, item-name, item-help);
 -                }
 -                exit(1);
             }
 -            cpu_set_log(mask);
 +            log_mask = argv[optind++];
 +        } else if (!strcmp(r, D)) {
 +            if (optind = argc) {
 +                break;
 +            }
 +            log_file = argv[optind++];
         } else if (!strcmp(r, E)) {
             r = argv[optind++];
             if (envlist_setenv(envlist, r) != 0)
 @@ -867,6 +860,23 @@ int main(int argc, char **argv)
         usage();
     filename = argv[optind];

 +    /* init debug */
 +    cpu_set_log_filename(log_file);
 +    if (log_mask) {
 +        int mask;
 +        const CPULogItem *item;
 +
 +        mask = cpu_str_to_log_mask(r);
 +        if (!mask) {
 +            printf(Log items (comma separated):\n);
 +            for (item = cpu_log_items; item-mask != 0; item++) {
 +                printf(%-10s %s\n, item-name, item-help);
 +            }
 +            exit(1);
 +        }
 +        cpu_set_log(mask);
 +    }
 +
     /* Zero out regs */
     memset(regs, 0, sizeof(struct target_pt_regs));

 diff --git a/cpus.c b/cpus.c
 index 1fc34b7..17e96b5 100644
 --- a/cpus.c
 +++ b/cpus.c
 @@ -1142,6 

Re: [Qemu-devel] [RFC] Specifying storage devices for live migration

2011-06-15 Thread Marcelo Tosatti
On Wed, Jun 15, 2011 at 02:06:21PM +0300, Avishay Traeger wrote:
 
 Hello all,
 Currently there is no way to choose storage devices to be migrated.  There
 was some discussion about making the monitor API a bit more flexible, but
 the code was never written:
 http://lists.gnu.org/archive/html/qemu-devel/2009-11/msg01494.html
 
 So today users can choose only:
 1) No storage migrated
 2) All disks migrated with fully copy
 3) All disks migrated with incremental
 
 I'd like to write the code to allow users to choose which disks to migrate,
 and how.  What API would you recommend?  What would break the least amount
 of utilities?

I'm unaware of any application that makes use of the -b flag, so if
thats true breaking the changing the current syntax is fine. The
one Liran suggests looks alright.




Re: [Qemu-devel] [PATCH v3 05/21] qapi: add QMP input visitor

2011-06-15 Thread Luiz Capitulino
On Wed, 15 Jun 2011 11:56:03 -0500
Michael Roth mdr...@linux.vnet.ibm.com wrote:

 On 06/15/2011 11:11 AM, Luiz Capitulino wrote:
  On Mon, 13 Jun 2011 21:31:10 -0500
  Michael Rothmdr...@linux.vnet.ibm.com  wrote:
 
 snip
  +Visitor *qmp_input_get_visitor(QmpInputVisitor *v)
  +{
  +returnv-visitor;
  +}
  +
  +void qmp_input_visitor_cleanup(QmpInputVisitor *v)
  +{
 
  You're not decrementing the reference to v-obj. Actually, we should also
  increment the reference to it before storing or document that the reference
  ownership is transferred to this function.
 
 
 Ownership never gets transferred to the visitor in the dispatch path; 
 the caller is supposed to free it. I thought there was some manipulation 
 elsewhere that prevented us from using a const argument to 
 qmp_input_visitor_new() instead, but it looks like we can. Otherwise, 
 I'd prefer to do an inc+dec to make this more explicit.

Yeah, I prefer that too.

 
  +qemu_free(v);
  +}
  +
  +QmpInputVisitor *qmp_input_visitor_new(QObject *obj)
  +{
  +QmpInputVisitor *v;
  +
  +v = qemu_mallocz(sizeof(*v));
  +
  +v-visitor.start_struct = qmp_input_start_struct;
  +v-visitor.end_struct = qmp_input_end_struct;
  +v-visitor.start_list = qmp_input_start_list;
  +v-visitor.next_list = qmp_input_next_list;
  +v-visitor.end_list = qmp_input_end_list;
  +v-visitor.type_enum = qmp_input_type_enum;
  +v-visitor.type_int = qmp_input_type_int;
  +v-visitor.type_bool = qmp_input_type_bool;
  +v-visitor.type_str = qmp_input_type_str;
  +v-visitor.type_number = qmp_input_type_number;
  +v-visitor.start_optional = qmp_input_start_optional;
  +v-visitor.end_optional = qmp_input_end_optional;
  +
  +v-obj = obj;
  +
  +return v;
  +}
  diff --git a/qapi/qmp-input-visitor.h b/qapi/qmp-input-visitor.h
  new file mode 100644
  index 000..3f798f0
  --- /dev/null
  +++ b/qapi/qmp-input-visitor.h
  @@ -0,0 +1,27 @@
  +/*
  + * Input Visitor
  + *
  + * Copyright IBM, Corp. 2011
  + *
  + * Authors:
  + *  Anthony Liguorialigu...@us.ibm.com
  + *
  + * This work is licensed under the terms of the GNU LGPL, version 2.1 or 
  later.
  + * See the COPYING.LIB file in the top-level directory.
  + *
  + */
  +
  +#ifndef QMP_INPUT_VISITOR_H
  +#define QMP_INPUT_VISITOR_H
  +
  +#include qapi-visit-core.h
  +#include qobject.h
  +
  +typedef struct QmpInputVisitor QmpInputVisitor;
  +
  +QmpInputVisitor *qmp_input_visitor_new(QObject *obj);
  +void qmp_input_visitor_cleanup(QmpInputVisitor *v);
  +
  +Visitor *qmp_input_get_visitor(QmpInputVisitor *v);
  +
  +#endif
  diff --git a/qerror.h b/qerror.h
  index 16c830d..7a89a50 100644
  --- a/qerror.h
  +++ b/qerror.h
  @@ -124,6 +124,9 @@ QError *qobject_to_qerror(const QObject *obj);
#define QERR_JSON_PARSE_ERROR \
{ 'class': 'JSONParseError', 'data': { 'message': %s } }
 
  +#define QERR_QAPI_VISITOR_STACK_OVERRUN \
  +{ 'class': 'QAPIVisitorStackOverrun', 'data': {} }
  +
#define QERR_KVM_MISSING_CAP \
{ 'class': 'KVMMissingCap', 'data': { 'capability': %s, 'feature': 
  %s } }
 
 
 




Re: [Qemu-devel] [PATCH v3 05/21] qapi: add QMP input visitor

2011-06-15 Thread Michael Roth

On 06/15/2011 11:11 AM, Luiz Capitulino wrote:

On Mon, 13 Jun 2011 21:31:10 -0500
Michael Rothmdr...@linux.vnet.ibm.com  wrote:


snip

+Visitor *qmp_input_get_visitor(QmpInputVisitor *v)
+{
+returnv-visitor;
+}
+
+void qmp_input_visitor_cleanup(QmpInputVisitor *v)
+{


You're not decrementing the reference to v-obj. Actually, we should also
increment the reference to it before storing or document that the reference
ownership is transferred to this function.



Ownership never gets transferred to the visitor in the dispatch path; 
the caller is supposed to free it. I thought there was some manipulation 
elsewhere that prevented us from using a const argument to 
qmp_input_visitor_new() instead, but it looks like we can. Otherwise, 
I'd prefer to do an inc+dec to make this more explicit.



+qemu_free(v);
+}
+
+QmpInputVisitor *qmp_input_visitor_new(QObject *obj)
+{
+QmpInputVisitor *v;
+
+v = qemu_mallocz(sizeof(*v));
+
+v-visitor.start_struct = qmp_input_start_struct;
+v-visitor.end_struct = qmp_input_end_struct;
+v-visitor.start_list = qmp_input_start_list;
+v-visitor.next_list = qmp_input_next_list;
+v-visitor.end_list = qmp_input_end_list;
+v-visitor.type_enum = qmp_input_type_enum;
+v-visitor.type_int = qmp_input_type_int;
+v-visitor.type_bool = qmp_input_type_bool;
+v-visitor.type_str = qmp_input_type_str;
+v-visitor.type_number = qmp_input_type_number;
+v-visitor.start_optional = qmp_input_start_optional;
+v-visitor.end_optional = qmp_input_end_optional;
+
+v-obj = obj;
+
+return v;
+}
diff --git a/qapi/qmp-input-visitor.h b/qapi/qmp-input-visitor.h
new file mode 100644
index 000..3f798f0
--- /dev/null
+++ b/qapi/qmp-input-visitor.h
@@ -0,0 +1,27 @@
+/*
+ * Input Visitor
+ *
+ * Copyright IBM, Corp. 2011
+ *
+ * Authors:
+ *  Anthony Liguorialigu...@us.ibm.com
+ *
+ * This work is licensed under the terms of the GNU LGPL, version 2.1 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#ifndef QMP_INPUT_VISITOR_H
+#define QMP_INPUT_VISITOR_H
+
+#include qapi-visit-core.h
+#include qobject.h
+
+typedef struct QmpInputVisitor QmpInputVisitor;
+
+QmpInputVisitor *qmp_input_visitor_new(QObject *obj);
+void qmp_input_visitor_cleanup(QmpInputVisitor *v);
+
+Visitor *qmp_input_get_visitor(QmpInputVisitor *v);
+
+#endif
diff --git a/qerror.h b/qerror.h
index 16c830d..7a89a50 100644
--- a/qerror.h
+++ b/qerror.h
@@ -124,6 +124,9 @@ QError *qobject_to_qerror(const QObject *obj);
  #define QERR_JSON_PARSE_ERROR \
  { 'class': 'JSONParseError', 'data': { 'message': %s } }

+#define QERR_QAPI_VISITOR_STACK_OVERRUN \
+{ 'class': 'QAPIVisitorStackOverrun', 'data': {} }
+
  #define QERR_KVM_MISSING_CAP \
  { 'class': 'KVMMissingCap', 'data': { 'capability': %s, 'feature': %s } 
}








[Qemu-devel] [PATCH] Added legacy 9P2000 support back into QEMU

2011-06-15 Thread William K. Bittner
It was tested with v9fs in the guest by using: sudo mount -t 9p -o 
trans=virtio,version=9p2000 v_boot /pmnt

Signed-off-by: William K. Bittner wkbit...@us.ibm.com

diff --git a/hw/9pfs/virtio-9p.c b/hw/9pfs/virtio-9p.c
index b5fc52b..febfba2 100644
--- a/hw/9pfs/virtio-9p.c
+++ b/hw/9pfs/virtio-9p.c
@@ -1186,6 +1186,8 @@ static void v9fs_version(V9fsState *s, V9fsPDU *pdu)
 s-proto_version = V9FS_PROTO_2000U;
 } else if (!strcmp(version.data, 9P2000.L)) {
 s-proto_version = V9FS_PROTO_2000L;
+} else if (!strcmp(version.data, 9P2000)) {
+s-proto_version = V9FS_PROTO_2000;
 } else {
 v9fs_string_sprintf(version, unknown);
 }
diff --git a/hw/9pfs/virtio-9p.h b/hw/9pfs/virtio-9p.h
index 622928f..c5b5229 100644
--- a/hw/9pfs/virtio-9p.h
+++ b/hw/9pfs/virtio-9p.h
@@ -94,6 +94,7 @@ enum {
 };
 
 enum p9_proto_version {
+V9FS_PROTO_2000  = 0x00,
 V9FS_PROTO_2000U = 0x01,
 V9FS_PROTO_2000L = 0x02,
 };
-- 
1.7.4.1




  1   2   >