date:20120229

Re: [Qemu-devel] Broken screendump with unconnected vnc

2012-02-29 Thread Gerd Hoffmann

On 02/29/12 15:05, Avi Kivity wrote:
> qemu-system-x86_64 -monitor stdio -vnc :0
> QEMU 1.0.50 monitor - type 'help' for more information
> (qemu) screendump /tmp/x.ppm
> 
> display /tmp/x.ppm shows an empty screenshot.  Breaks autotest for me. 
> Connecting a vnc viewer works around the problem.

Patch sent.  Attached here for convenience as git send-email hasn't
picked up Reported-by: for Cc'ing.

cheers,
  Gerd

From d30d22f5a71ba60d0bc71b6879791f2377c70a1f Mon Sep 17 00:00:00 2001
From: Gerd Hoffmann 
Date: Thu, 1 Mar 2012 08:28:32 +0100
Subject: [PATCH] fix screendump

Commit 45efb16124efef51de5157afc31984b5a47700f9 optimized a bit too
much.  We can skip the vga_invalidate_display() in case no console
switch happened because we don't need a full redraw then.  We can *not*
skip vga_hw_update() though, because the screen content will be stale
then in case nobody else calls vga_hw_update().

Trigger: vga textmode with vnc display and no client connected.

Reported-by: Avi Kivity 
Signed-off-by: Gerd Hoffmann 
---
 hw/blizzard.c  |4 +---
 hw/omap_lcdc.c |5 ++---
 hw/vga.c   |2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/blizzard.c b/hw/blizzard.c
index c7d844d..29074c4 100644
--- a/hw/blizzard.c
+++ b/hw/blizzard.c
@@ -937,9 +937,7 @@ static void blizzard_screen_dump(void *opaque, const char *filename,
 {
 BlizzardState *s = (BlizzardState *) opaque;
 
-if (cswitch) {
-blizzard_update_display(opaque);
-}
+blizzard_update_display(opaque);
 if (s && ds_get_data(s->state))
 ppm_save(filename, s->state->surface);
 }
diff --git a/hw/omap_lcdc.c b/hw/omap_lcdc.c
index f172093..4a08e9d 100644
--- a/hw/omap_lcdc.c
+++ b/hw/omap_lcdc.c
@@ -267,9 +267,8 @@ static int ppm_save(const char *filename, uint8_t *data,
 static void omap_screen_dump(void *opaque, const char *filename, bool cswitch)
 {
 struct omap_lcd_panel_s *omap_lcd = opaque;
-if (cswitch) {
-omap_update_display(opaque);
-}
+
+omap_update_display(opaque);
 if (omap_lcd && ds_get_data(omap_lcd->state))
 ppm_save(filename, ds_get_data(omap_lcd->state),
 omap_lcd->width, omap_lcd->height,
diff --git a/hw/vga.c b/hw/vga.c
index 5994f43..16546ef 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -2413,7 +2413,7 @@ static void vga_screen_dump(void *opaque, const char *filename, bool cswitch)
 
 if (cswitch) {
 vga_invalidate_display(s);
-vga_hw_update();
 }
+vga_hw_update();
 ppm_save(filename, s->ds->surface);
 }
-- 
1.7.1
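
The behaviour described in the commit message can be modelled outside QEMU. The following is a minimal sketch, not QEMU code: `ToyDisplay`, `toy_invalidate()`, `toy_hw_update()` and `toy_screen_dump()` are hypothetical names that only mimic the roles of vga_invalidate_display(), vga_hw_update() and vga_screen_dump(). It shows the fixed ordering: a full redraw is requested only on a console switch, but the surface is always refreshed before it is saved.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for a display device: guest state plus a surface. */
typedef struct ToyDisplay {
    char guest_text[16];   /* what the guest has written */
    char surface[16];      /* what a screendump would save */
    int full_redraws;      /* counts forced full redraws */
    bool invalidated;
} ToyDisplay;

/* Plays the role of vga_invalidate_display(): request a full redraw. */
static void toy_invalidate(ToyDisplay *d)
{
    d->invalidated = true;
}

/* Plays the role of vga_hw_update(): refresh the surface from guest state. */
static void toy_hw_update(ToyDisplay *d)
{
    if (d->invalidated) {
        d->full_redraws++;
        d->invalidated = false;
    }
    memcpy(d->surface, d->guest_text, sizeof(d->surface));
}

/* Fixed logic: invalidate only on a console switch, but always update. */
static const char *toy_screen_dump(ToyDisplay *d, bool cswitch)
{
    if (cswitch) {
        toy_invalidate(d);
    }
    toy_hw_update(d);
    return d->surface;
}
```

With the update call left inside the `if`, a dump with `cswitch == false` would return whatever the surface last held, which matches the empty screenshot reported when no VNC client is connected.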

[Qemu-devel] [PATCH] fix screendump

2012-02-29 Thread Gerd Hoffmann

Commit 45efb16124efef51de5157afc31984b5a47700f9 optimized a bit too
much.  We can skip the vga_invalidate_display() in case no console
switch happened because we don't need a full redraw then.  We can *not*
skip vga_hw_update() though, because the screen content will be stale
then in case nobody else calls vga_hw_update().

Trigger: vga textmode with vnc display and no client connected.

Reported-by: Avi Kivity 
Signed-off-by: Gerd Hoffmann 
---
 hw/blizzard.c  |4 +---
 hw/omap_lcdc.c |5 ++---
 hw/vga.c   |2 +-
 3 files changed, 4 insertions(+), 7 deletions(-)

diff --git a/hw/blizzard.c b/hw/blizzard.c
index c7d844d..29074c4 100644
--- a/hw/blizzard.c
+++ b/hw/blizzard.c
@@ -937,9 +937,7 @@ static void blizzard_screen_dump(void *opaque, const char *filename,
 {
 BlizzardState *s = (BlizzardState *) opaque;
 
-if (cswitch) {
-blizzard_update_display(opaque);
-}
+blizzard_update_display(opaque);
 if (s && ds_get_data(s->state))
 ppm_save(filename, s->state->surface);
 }
diff --git a/hw/omap_lcdc.c b/hw/omap_lcdc.c
index f172093..4a08e9d 100644
--- a/hw/omap_lcdc.c
+++ b/hw/omap_lcdc.c
@@ -267,9 +267,8 @@ static int ppm_save(const char *filename, uint8_t *data,
 static void omap_screen_dump(void *opaque, const char *filename, bool cswitch)
 {
 struct omap_lcd_panel_s *omap_lcd = opaque;
-if (cswitch) {
-omap_update_display(opaque);
-}
+
+omap_update_display(opaque);
 if (omap_lcd && ds_get_data(omap_lcd->state))
 ppm_save(filename, ds_get_data(omap_lcd->state),
 omap_lcd->width, omap_lcd->height,
diff --git a/hw/vga.c b/hw/vga.c
index 5994f43..16546ef 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -2413,7 +2413,7 @@ static void vga_screen_dump(void *opaque, const char *filename, bool cswitch)
 
 if (cswitch) {
 vga_invalidate_display(s);
-vga_hw_update();
 }
+vga_hw_update();
 ppm_save(filename, s->ds->surface);
 }
-- 
1.7.1

Re: [Qemu-devel] [RFC][PATCH 04/14 v7] Add API to get memory mapping

2012-02-29 Thread Wen Congyang

At 03/01/2012 03:11 PM, HATAYAMA Daisuke Wrote:
> From: Wen Congyang 
> Subject: Re: [RFC][PATCH 04/14 v7] Add API to get memory mapping
> Date: Thu, 01 Mar 2012 14:17:53 +0800
> 
>> At 03/01/2012 02:01 PM, HATAYAMA Daisuke Wrote:
>>> From: Wen Congyang 
>>> Subject: [RFC][PATCH 04/14 v7] Add API to get memory mapping
>>> Date: Thu, 01 Mar 2012 10:43:13 +0800
>>>
>>>> +int qemu_get_guest_memory_mapping(MemoryMappingList *list)
>>>> +{
>>>> +CPUState *env;
>>>> +MemoryMapping *memory_mapping;
>>>> +RAMBlock *block;
>>>> +ram_addr_t offset, length;
>>>> +int ret;
>>>> +
>>>> +#if defined(CONFIG_HAVE_GET_MEMORY_MAPPING)
>>>> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>>> +ret = cpu_get_memory_mapping(list, env);
>>>> +if (ret < 0) {
>>>> +return -1;
>>>> +}
>>>> +}
>>>> +#else
>>>> +return -2;
>>>> +#endif
>>>> +
>>>> +/* some memory may be not mapped, add them into memory mapping's list */
>>>
>>> Is the part from here on logic purely for the 2nd kernel? If so, I
>>> think it is better to describe why this additional mapping is needed;
>>> we should assume most people don't know the kdump mechanism.
>>
>> Not only for 2nd kernel. If the guest has 1 vcpu, and is in the 2nd kernel,
>> we need this logic for 1st kernel.
>>
> 
> So you should describe the two cases explicitly. I can't work them out
> from ``some memory''.
> 
> Also, please consider the cleanup below. Conditionals spanning two lines
> are hard to read.

OK. I will fix it.

Thanks
Wen Congyang

> 
>>>
>>> I think it would be more readable to shorten memory_mapping->phys_addr
>>> and memory_mapping->length at the beginning of the innermost foreach loop.
>>>
>>>   m_phys_addr = memory_mapping->phys_addr;
>>>   m_length = memory_mapping->length;
>>>
>>> Then each conditional becomes compact.
> 
> Thanks.
> HATAYAMA, Daisuke
> 
>

Re: [Qemu-devel] [RFC][PATCH 09/14 v7] introduce a new monitor command 'dump' to dump guest's memory

2012-02-29 Thread Wen Congyang

At 03/01/2012 03:04 PM, HATAYAMA Daisuke Wrote:
> From: Wen Congyang 
> Subject: [RFC][PATCH 09/14 v7] introduce a new monitor command 'dump' to dump 
> guest's memory
> Date: Thu, 01 Mar 2012 10:51:42 +0800
> 
>> +/*
>> + * calculate phdr_num
>> + *
>> + * the type of phdr->num is uint16_t, so we should avoid overflow
>> + */
>> +s->phdr_num = 1; /* PT_NOTE */
>> +if (s->list.num > (1 << 16) - 2) {
>> +s->phdr_num = (1 << 16) - 1;
>> +} else {
>> +s->phdr_num += s->list.num;
>> +}
>> +
>> +return s;
>> +}
> 
> Though e_phnum is uint16_t by default, there is an extension up to
> uint32_t. Look at a relatively recent manual page; this one is from FC14.
> 
>  e_phnum This member  holds the number of  entries in the
>  program  header  table.   Thus  the  product  of
>  e_phentsize and  e_phnum gives the  table's size
>  in  bytes.  If  a  file has  no program  header,
>  e_phnum holds the value zero.
> 
>  If the  number of entries in  the program header
>  table  is  larger   than  or  equal  to  PN_XNUM
>  (0xffff), this member holds PN_XNUM (0xffff) and
>  the real number of entries in the program header
>  table  is  held in  the  sh_info  member of  the
>  initial   entry   in   section   header   table.
>  Otherwise,  the sh_info  member  of the  initial
>  entry contains the value zero.
> 
>  PN_XNUM  This is defined  as 0xffff, the largest
>   number  e_phnum  can  have,  specifying
>   where  the  actual  number  of  program
>   headers is assigned.

Good news.

> 
> Recent kernels, gdb and the binutils tools support this, but crash
> doesn't, so you need to fix this.

I think it can be easily fixed.

> 
> I'm interested in the number of program headers in the worst case.
> According to Table 4-1 of the Intel Programming Guide 3A, the
> physical-address width on IA-32e is up to 52 bits and the linear-address
> width is 48 bits. Can the number exceed this limit in theory? Also, how
> many program headers are typically created?

In my test, the guest has 512M of memory, and the dump contains about
1000~2000 program headers.

In theory, if the guest has 2^52 bytes of memory, the number can still
exceed this limit. The maximum number is 2^52/2^12.
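
The arithmetic can be checked with a few lines of C. This is only a worked sketch: the helper names and `SKETCH_PAGE_SHIFT` are made up for illustration, and it assumes the worst case in which every 4 KiB page becomes its own program header. For 52 physical-address bits that gives 2^40 headers, which does not fit in the 16-bit e_phnum and is also larger than a 32-bit sh_info can hold.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12   /* 4 KiB pages, the 2^12 in the estimate */

/*
 * Upper bound on program headers if every page became its own segment.
 * phys_bits is expected to be at least SKETCH_PAGE_SHIFT and below 76.
 */
static uint64_t max_phdrs_for_phys_bits(unsigned phys_bits)
{
    return UINT64_C(1) << (phys_bits - SKETCH_PAGE_SHIFT);
}

/* Does a count fit in the 16-bit e_phnum field without the extension? */
static int fits_in_e_phnum(uint64_t count)
{
    return count < UINT64_C(0xffff);
}

/* Does a count fit in the 32-bit sh_info field used by the extension? */
static int fits_in_sh_info(uint64_t count)
{
    return count <= UINT32_MAX;
}
```

By the same worst-case estimate, a 512M guest (2^29 bytes) could need up to 2^17 headers, so the observed 1000~2000 is far below the bound but the bound itself already exceeds the plain e_phnum range.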

Thanks
Wen Congyang

> 
> Thanks.
> HATAYAMA, Daisuke
> 
>

Re: [Qemu-devel] [RFC][PATCH 04/14 v7] Add API to get memory mapping

2012-02-29 Thread HATAYAMA Daisuke

From: Wen Congyang 
Subject: Re: [RFC][PATCH 04/14 v7] Add API to get memory mapping
Date: Thu, 01 Mar 2012 14:17:53 +0800

> At 03/01/2012 02:01 PM, HATAYAMA Daisuke Wrote:
>> From: Wen Congyang 
>> Subject: [RFC][PATCH 04/14 v7] Add API to get memory mapping
>> Date: Thu, 01 Mar 2012 10:43:13 +0800
>> 
>>> +int qemu_get_guest_memory_mapping(MemoryMappingList *list)
>>> +{
>>> +CPUState *env;
>>> +MemoryMapping *memory_mapping;
>>> +RAMBlock *block;
>>> +ram_addr_t offset, length;
>>> +int ret;
>>> +
>>> +#if defined(CONFIG_HAVE_GET_MEMORY_MAPPING)
>>> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
>>> +ret = cpu_get_memory_mapping(list, env);
>>> +if (ret < 0) {
>>> +return -1;
>>> +}
>>> +}
>>> +#else
>>> +return -2;
>>> +#endif
>>> +
>>> +/* some memory may be not mapped, add them into memory mapping's list 
>>> */
>> 
>> Is the part from here on logic purely for the 2nd kernel? If so, I
>> think it is better to describe why this additional mapping is needed;
>> we should assume most people don't know the kdump mechanism.
> 
> Not only for 2nd kernel. If the guest has 1 vcpu, and is in the 2nd kernel,
> we need this logic for 1st kernel.
> 

So you should describe the two cases explicitly. I can't work them out
from ``some memory''.

Also, please consider the cleanup below. Conditionals spanning two lines
are hard to read.

>> 
>> I think it would be more readable to shorten memory_mapping->phys_addr
>> and memory_mapping->length at the beginning of the innermost foreach loop.
>> 
>>   m_phys_addr = memory_mapping->phys_addr;
>>   m_length = memory_mapping->length;
>> 
>> Then each conditional becomes compact.
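
To illustrate the suggested cleanup, here is a small sketch of a range check written with the two local copies. It is not the code from the patch: `SketchMapping` and `range_overlaps_mapping()` are hypothetical, and only the `phys_addr` and `length` member names are taken from the discussion above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the patch's MemoryMapping. */
typedef struct SketchMapping {
    uint64_t phys_addr;
    uint64_t length;
} SketchMapping;

/* True if [offset, offset + length) overlaps the mapping's physical range. */
static bool range_overlaps_mapping(const SketchMapping *memory_mapping,
                                   uint64_t offset, uint64_t length)
{
    /* Shorten the long member accesses once, as suggested. */
    uint64_t m_phys_addr = memory_mapping->phys_addr;
    uint64_t m_length = memory_mapping->length;

    return offset < m_phys_addr + m_length && m_phys_addr < offset + length;
}
```

With the locals in place the whole condition fits on one line, which is the readability gain being asked for.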

Thanks.
HATAYAMA, Daisuke

[Qemu-devel] [PATCH 6/6] 64bit PCI range in _CRS table

2012-02-29 Thread Alexey Korolev

This patch was originally proposed by Michael to solve issues I've seen
on Windows guests when 64bit BARs are present.
It might also be helpful on Linux guests when the use_crs kernel boot
option is set.

Signed-off-by: Alexey Korolev  
Signed-off-by: Michael S. Tsirkin 
---
 src/acpi-dsdt.dsl |7 +
 src/acpi-dsdt.hex |   72 +++-
 2 files changed, 66 insertions(+), 13 deletions(-)

diff --git a/src/acpi-dsdt.dsl b/src/acpi-dsdt.dsl
index 7082b65..c17e947 100644
--- a/src/acpi-dsdt.dsl
+++ b/src/acpi-dsdt.dsl
@@ -175,6 +175,13 @@ DefinitionBlock (
 0x, // Address Translation Offset
 0x1EC0, // Address Length
 ,, , AddressRangeMemory, TypeStatic)
+QWordMemory (ResourceProducer, PosDecode, MinFixed, MaxFixed, Cacheable, ReadWrite,
+0x00000000,  // Address Space Granularity
+0x8000000000,// Address Range Minimum
+0xFFFFFFFFFF,// Address Range Maximum
+0x00000000,  // Address Translation Offset
+0x8000000000,// Address Length
+,, , AddressRangeMemory, TypeStatic)
 })
 }
 }
diff --git a/src/acpi-dsdt.hex b/src/acpi-dsdt.hex
index 5dc7bb4..2393827 100644
--- a/src/acpi-dsdt.hex
+++ b/src/acpi-dsdt.hex
@@ -3,12 +3,12 @@ static unsigned char AmlCode[] = {
 0x53,
 0x44,
 0x54,
-0xd3,
-0x10,
+0x1,
+0x11,
 0x0,
 0x0,
 0x1,
-0x2d,
+0x1e,
 0x42,
 0x58,
 0x50,
@@ -31,9 +31,9 @@ static unsigned char AmlCode[] = {
 0x4e,
 0x54,
 0x4c,
-0x28,
-0x5,
-0x10,
+0x23,
+0x1,
+0x9,
 0x20,
 0x10,
 0x49,
@@ -110,16 +110,16 @@ static unsigned char AmlCode[] = {
 0x47,
 0x42,
 0x10,
-0x44,
-0x81,
+0x42,
+0x84,
 0x5f,
 0x53,
 0x42,
 0x5f,
 0x5b,
 0x82,
-0x4c,
-0x80,
+0x4a,
+0x83,
 0x50,
 0x43,
 0x49,
@@ -2064,10 +2064,10 @@ static unsigned char AmlCode[] = {
 0x52,
 0x53,
 0x11,
-0x42,
-0x7,
+0x40,
+0xa,
 0xa,
-0x6e,
+0x9c,
 0x88,
 0xd,
 0x0,
@@ -2176,6 +2176,52 @@ static unsigned char AmlCode[] = {
 0x0,
 0xc0,
 0x1e,
+0x8a,
+0x2b,
+0x0,
+0x0,
+0xc,
+0x3,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x80,
+0x0,
+0x0,
+0x0,
+0xff,
+0xff,
+0xff,
+0xff,
+0xff,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x0,
+0x80,
+0x0,
+0x0,
+0x0,
 0x79,
 0x0,
 0x10,
-- 
1.7.5.4

Re: [Qemu-devel] [RFC][PATCH 09/14 v7] introduce a new monitor command 'dump' to dump guest's memory

2012-02-29 Thread HATAYAMA Daisuke

From: Wen Congyang 
Subject: [RFC][PATCH 09/14 v7] introduce a new monitor command 'dump' to dump 
guest's memory
Date: Thu, 01 Mar 2012 10:51:42 +0800

> +/*
> + * calculate phdr_num
> + *
> + * the type of phdr->num is uint16_t, so we should avoid overflow
> + */
> +s->phdr_num = 1; /* PT_NOTE */
> +if (s->list.num > (1 << 16) - 2) {
> +s->phdr_num = (1 << 16) - 1;
> +} else {
> +s->phdr_num += s->list.num;
> +}
> +
> +return s;
> +}

Though e_phnum is uint16_t by default, there is an extension up to
uint32_t. Look at a relatively recent manual page; this one is from FC14.

 e_phnum This member  holds the number of  entries in the
 program  header  table.   Thus  the  product  of
 e_phentsize and  e_phnum gives the  table's size
 in  bytes.  If  a  file has  no program  header,
 e_phnum holds the value zero.

 If the  number of entries in  the program header
 table  is  larger   than  or  equal  to  PN_XNUM
 (0xffff), this member holds PN_XNUM (0xffff) and
 the real number of entries in the program header
 table  is  held in  the  sh_info  member of  the
 initial   entry   in   section   header   table.
 Otherwise,  the sh_info  member  of the  initial
 entry contains the value zero.

 PN_XNUM  This is defined  as 0xffff, the largest
  number  e_phnum  can  have,  specifying
  where  the  actual  number  of  program
  headers is assigned.

Recent kernels, gdb and the binutils tools support this, but crash
doesn't, so you need to fix this.
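
The rule quoted from the manual page can be sketched as a pair of helpers. This is an illustration only, assuming PN_XNUM is 0xffff as glibc's <elf.h> defines it; `PhnumEncoding`, `encode_phnum()`, `decode_phnum()` and `SKETCH_PN_XNUM` are hypothetical names, not part of the dump patch or of any ELF library.

```c
#include <stdint.h>

#define SKETCH_PN_XNUM 0xffff  /* assumed equal to PN_XNUM from <elf.h> */

/* Hypothetical holder for the two fields involved. */
typedef struct PhnumEncoding {
    uint16_t e_phnum;   /* stored in the ELF header */
    uint32_t sh_info;   /* stored in the initial section header entry */
} PhnumEncoding;

/* Split a real program header count into e_phnum and sh_info. */
static PhnumEncoding encode_phnum(uint32_t real_count)
{
    PhnumEncoding enc;

    if (real_count >= SKETCH_PN_XNUM) {
        enc.e_phnum = SKETCH_PN_XNUM;   /* marker: look at sh_info */
        enc.sh_info = real_count;
    } else {
        enc.e_phnum = (uint16_t)real_count;
        enc.sh_info = 0;
    }
    return enc;
}

/* Recover the real count the way a reader of the file would. */
static uint32_t decode_phnum(PhnumEncoding enc)
{
    return enc.e_phnum == SKETCH_PN_XNUM ? enc.sh_info : enc.e_phnum;
}
```

Compared with clamping phdr_num at (1 << 16) - 1, this keeps the real count available to tools that understand the extension.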

I'm interested in the number of program headers in the worst case.
According to Table 4-1 of the Intel Programming Guide 3A, the
physical-address width on IA-32e is up to 52 bits and the linear-address
width is 48 bits. Can the number exceed this limit in theory? Also, how
many program headers are typically created?

Thanks.
HATAYAMA, Daisuke

[Qemu-devel] [PATCH 5/6] Delete old code

2012-02-29 Thread Alexey Korolev

Delete old code.

Signed-off-by: Alexey Korolev 
---
 src/pciinit.c |  212 -
 1 files changed, 0 insertions(+), 212 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 0fba130..9c41e3c 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -617,208 +617,6 @@ static int pci_bios_map_regions(struct pci_region *regions)
 return 0;
 }
 
-static void pci_bios_bus_reserve(struct pci_bus *bus, int type, u32 size)
-{
-u32 index;
-
-index = pci_size_to_index(size, type);
-size = pci_index_to_size(index, type);
-bus->r[type].count[index]++;
-bus->r[type].sum += size;
-if (bus->r[type].max < size)
-bus->r[type].max = size;
-}
-
-static void pci_bios_check_devices(struct pci_bus *busses)
-{
-dprintf(1, "PCI: check devices\n");
-
-// Calculate resources needed for regular (non-bus) devices.
-struct pci_device *pci;
-foreachpci(pci) {
-if (pci->class == PCI_CLASS_BRIDGE_PCI) {
-busses[pci->secondary_bus].bus_dev = pci;
-continue;
-}
-struct pci_bus *bus = &busses[pci_bdf_to_bus(pci->bdf)];
-int i;
-for (i = 0; i < PCI_NUM_REGIONS; i++) {
-u32 val, size;
-pci_bios_get_bar(pci, i, &val, &size);
-if (val == 0)
-continue;
-
-pci_bios_bus_reserve(bus, pci_addr_to_type(val), size);
-pci->bars[i].addr = val;
-pci->bars[i].size = size;
-pci->bars[i].is64 = (!(val & PCI_BASE_ADDRESS_SPACE_IO) &&
- (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK)
- == PCI_BASE_ADDRESS_MEM_TYPE_64);
-
-if (pci->bars[i].is64)
-i++;
-}
-}
-
-// Propagate required bus resources to parent busses.
-int secondary_bus;
-for (secondary_bus=MaxPCIBus; secondary_bus>0; secondary_bus--) {
-struct pci_bus *s = &busses[secondary_bus];
-if (!s->bus_dev)
-continue;
-struct pci_bus *parent = &busses[pci_bdf_to_bus(s->bus_dev->bdf)];
-int type;
-for (type = 0; type < PCI_REGION_TYPE_COUNT; type++) {
-u32 limit = (type == PCI_REGION_TYPE_IO) ?
-PCI_BRIDGE_IO_MIN : PCI_BRIDGE_MEM_MIN;
-s->r[type].size = s->r[type].sum;
-if (s->r[type].size < limit)
-s->r[type].size = limit;
-s->r[type].size = pci_size_roundup(s->r[type].size);
-pci_bios_bus_reserve(parent, type, s->r[type].size);
-}
-dprintf(1, "PCI: secondary bus %d sizes: io %x, mem %x, prefmem %x\n",
-secondary_bus,
-s->r[PCI_REGION_TYPE_IO].size,
-s->r[PCI_REGION_TYPE_MEM].size,
-s->r[PCI_REGION_TYPE_PREFMEM].size);
-}
-}
-
-#define ROOT_BASE(top, sum, max) ALIGN_DOWN((top)-(sum),(max) ?: 1)
-
-// Setup region bases (given the regions' size and alignment)
-static int pci_bios_init_root_regions(struct pci_bus *bus, u32 start, u32 end)
-{
-bus->r[PCI_REGION_TYPE_IO].base = 0xc000;
-
-int reg1 = PCI_REGION_TYPE_PREFMEM, reg2 = PCI_REGION_TYPE_MEM;
-if (bus->r[reg1].sum < bus->r[reg2].sum) {
-// Swap regions so larger area is more likely to align well.
-reg1 = PCI_REGION_TYPE_MEM;
-reg2 = PCI_REGION_TYPE_PREFMEM;
-}
-bus->r[reg2].base = ROOT_BASE(end, bus->r[reg2].sum, bus->r[reg2].max);
-bus->r[reg1].base = ROOT_BASE(bus->r[reg2].base, bus->r[reg1].sum
-  , bus->r[reg1].max);
-if (bus->r[reg1].base < start)
-// Memory range requested is larger than available.
-return -1;
-return 0;
-}
-
-
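-/****************************************************************
- * BAR assignment
- ****************************************************************/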
-
-static void pci_bios_init_bus_bases(struct pci_bus *bus)
-{
-u32 base, newbase, size;
-int type, i;
-
-for (type = 0; type < PCI_REGION_TYPE_COUNT; type++) {
-dprintf(1, "  type %s max %x sum %x base %x\n", region_type_name[type],
-bus->r[type].max, bus->r[type].sum, bus->r[type].base);
-base = bus->r[type].base;
-for (i = ARRAY_SIZE(bus->r[type].count)-1; i >= 0; i--) {
-size = pci_index_to_size(i, type);
-if (!bus->r[type].count[i])
-continue;
-newbase = base + size * bus->r[type].count[i];
-dprintf(1, "size %8x: %d bar(s), %8x -> %8x\n",
-size, bus->r[type].count[i], base, newbase - 1);
-bus->r[type].bases[i] = base;
-base = newbase;
-}
-}
-}
-
-static u32 pci_bios_bus_get_addr(struct pci_bus *bus, int type, u32 size)
-{
-u32 index, addr;
-
-index = pci_size_to_index(size, type);
-addr = bus->r[type].bases[index];
-bus->r[type].bases[index] += pci_index_to_size(index, type);
-re

[Qemu-devel] [PATCH 4/6] Mapping of BARs and Bridge regions

2012-02-29 Thread Alexey Korolev

In pci_bios_map_regions() we try to reserve memory for all entries of
the root bus regions.
If pci_bios_init_root_regions() fails (e.g. there is not enough space),
we create two new pci_regions, r64pref and r64mem, and migrate all
64bit-capable entries to them. The migration process is very simple:
delete the entry from one list and add it to the other.
Then we try pci_bios_init_root_regions() again.

If it passes, we map the entries of each region:
1. Calculate the base address of the entry and increase the pci_region
   base address.
2. Program the PCI BAR or bridge region. If the entry belongs to a
   PCI-to-PCI bridge and provides a pci_region for downstream devices,
   we set the base address of the region the entry provides.
3. Delete the entry.
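
The mapping pass described in the steps above can be sketched in isolation. This is a simplified model rather than the SeaBIOS code: `SketchEntry`, `sketch_max_size()` and `sketch_map_entries()` are hypothetical, the entries live in a plain array instead of a linked list, and programming the BAR is reduced to storing the base. It assumes every size is a power of two and that the region base is aligned to the largest size, in which case placing entries largest-first keeps each one naturally aligned.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified entry: only the fields the sketch needs. */
typedef struct SketchEntry {
    uint64_t size;   /* power of two */
    uint64_t base;   /* filled in by the mapping pass */
} SketchEntry;

/* Largest entry size in the array, or 0 if the array is empty. */
static uint64_t sketch_max_size(const SketchEntry *e, size_t n)
{
    uint64_t max = 0;
    for (size_t i = 0; i < n; i++) {
        if (e[i].size > max) {
            max = e[i].size;
        }
    }
    return max;
}

/*
 * Assign bases largest-first, starting at region_base.
 * Returns the first address past the last mapped entry.
 */
static uint64_t sketch_map_entries(SketchEntry *e, size_t n,
                                   uint64_t region_base)
{
    for (uint64_t size = sketch_max_size(e, n); size > 0; size >>= 1) {
        for (size_t i = 0; i < n; i++) {
            if (e[i].size == size) {
                e[i].base = region_base;   /* step 1: take the current base */
                region_base += size;       /* ... and advance the region */
            }
        }
    }
    return region_base;
}
```

For example, entries of 0x1000, 0x100000, 0x1000 and 0x10000 bytes placed from 0xe0000000 end up at 0xe0110000, 0xe0000000, 0xe0111000 and 0xe0100000 respectively.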


Signed-off-by: Alexey Korolev 
---
 src/config.h  |2 +
 src/pciinit.c |  123 -
 2 files changed, 124 insertions(+), 1 deletions(-)

diff --git a/src/config.h b/src/config.h
index b0187a4..bbacae7 100644
--- a/src/config.h
+++ b/src/config.h
@@ -47,6 +47,8 @@
 
 #define BUILD_PCIMEM_START0xe000
 #define BUILD_PCIMEM_END  0xfec0/* IOAPIC is mapped at */
+#define BUILD_PCIMEM64_START  0x80ULL
+#define BUILD_PCIMEM64_END0x100ULL
 
 #define BUILD_IOAPIC_ADDR 0xfec0
 #define BUILD_HPET_ADDRESS0xfed0
diff --git a/src/pciinit.c b/src/pciinit.c
index 03ece34..0fba130 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -496,6 +496,126 @@ static int pci_bios_fill_regions(struct pci_region *regions)
 return 0;
 }
 
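+/****************************************************************
+ * Map pci region entries
+ ****************************************************************/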
+
+#define ROOT_BASE(top, sum, max) ALIGN_DOWN((top)-(sum),(max) ?: 1)
+// Setup region bases (given the regions' size and alignment)
+static int pci_bios_init_root_regions(struct pci_region *regions)
+{
+struct pci_region *r_end, *r_start;
+regions[PCI_REGION_TYPE_IO].base = 0xc000;
+
+r_end = &regions[PCI_REGION_TYPE_PREFMEM];
+r_start = &regions[PCI_REGION_TYPE_MEM];
+if (pci_region_sum(r_end) > pci_region_sum(r_start)) {
+// Swap regions so larger area is more likely to align well.
+r_end = r_start;
+r_start = &regions[PCI_REGION_TYPE_PREFMEM];
+}
+// Out of space
+if ((pci_region_sum(r_end) +  pci_region_sum(r_start) > BUILD_PCIMEM_END))
+return -1;
+
+r_end->base = ROOT_BASE(BUILD_PCIMEM_END, pci_region_sum(r_end),
+pci_region_max_size(r_end));
+r_start->base = ROOT_BASE(r_end->base, pci_region_sum(r_start),
+pci_region_max_size(r_start));
+if (r_start->base < BUILD_PCIMEM_START)
+// Memory range requested is larger than available...
+return -1;
+return 0;
+}
+
+static void
+pci_region_move_64bit_entries(struct pci_region *to, struct pci_region *from)
+{
+struct pci_region_entry *entry, *next;
+foreach_region_entry_safe(from, next, entry) {
+if (entry->is64bit) {
+region_entry_del(entry);
+region_entry_add(to, entry);
+entry->parent_region = to;
+}
+}
+}
+
+static void pci_region_map_one_entry(struct pci_region_entry *entry)
+{
+if (!entry->this_region ) {
+pci_set_io_region_addr(entry->dev, entry->bar, entry->base);
+if (entry->is64bit)
+pci_set_io_region_addr(entry->dev, entry->bar + 1, entry->base >> 32);
+return;
+}
+
+entry->this_region->base = entry->base;
+u16 bdf = entry->dev->bdf;
+u64 base = entry->base;
+u64 limit = entry->base + entry->size - 1;
+if (entry->type == PCI_REGION_TYPE_IO) {
+pci_config_writeb(bdf, PCI_IO_BASE, base >> 8);
+pci_config_writew(bdf, PCI_IO_BASE_UPPER16, 0);
+pci_config_writeb(bdf, PCI_IO_LIMIT, limit >> 8);
+pci_config_writew(bdf, PCI_IO_LIMIT_UPPER16, 0);
+}
+if (entry->type == PCI_REGION_TYPE_MEM) {
+pci_config_writew(bdf, PCI_MEMORY_BASE, base >> 16);
+pci_config_writew(bdf, PCI_MEMORY_LIMIT, limit >> 16);
+}
+if (entry->type == PCI_REGION_TYPE_PREFMEM) {
+pci_config_writew(bdf, PCI_PREF_MEMORY_BASE, base >> 16);
+pci_config_writew(bdf, PCI_PREF_MEMORY_LIMIT, limit >> 16);
+pci_config_writel(bdf, PCI_PREF_BASE_UPPER32, base >> 32);
+pci_config_writel(bdf, PCI_PREF_LIMIT_UPPER32, limit >> 32);
+}
+return;
+}
+
+static void pci_region_map_entries(struct pci_region *r)
+{
+struct pci_region_entry *entry, *next;
+u64 size, max_size = pci_region_max_size(r);
+
+for (size = max_size; size > 0; size >>= 1) {
+foreach_region_entry_safe(r, next, entry) {
+if (size == entry->size) {
+entry->base = r->base;
+r->base += size;
+dump_entry(entry);
+pci_region_map_one_entry(entry);
+region_entry_del(entry);
+

Re: [Qemu-devel] [PATCH 6/6] add mirroring to blockdev-transaction

2012-02-29 Thread Paolo Bonzini

On 29/02/2012 20:47, Eric Blake wrote:
> This falls out very nicely.
> 
> Do we want to also add a 'reopen' operation to the union, for the
> remaining action needed by oVirt live migration?

We can add it later, I think, if need arises.  Switching to the
destination need not be done atomically for all disks.  (In fact,
neither does the start-mirroring operation if you want to migrate
multiple disks; it's just the snapshot+mirror pair that needs it).

Paolo

[Qemu-devel] [PATCH 11/13] pseries: Convert sPAPR TCEs to use generic IOMMU infrastructure

2012-02-29 Thread David Gibson

The pseries platform already contains an IOMMU implementation, since it is
essential for the platform's paravirtualized VIO devices.  This IOMMU
support is currently built into the implementation of the VIO "bus" and
the various VIO devices.

This patch converts this code to make use of the new common IOMMU
infrastructure.

Cc: Alex Graf 

Signed-off-by: David Gibson 
---
 Makefile.target  |2 +-
 configure|2 +-
 hw/spapr.c   |3 +
 hw/spapr.h   |   16 +++
 hw/spapr_iommu.c |  238 +
 hw/spapr_llan.c  |   61 ++--
 hw/spapr_vio.c   |  282 --
 hw/spapr_vio.h   |   71 +++---
 hw/spapr_vscsi.c |   24 +++--
 target-ppc/kvm.c |4 +-
 10 files changed, 361 insertions(+), 342 deletions(-)
 create mode 100644 hw/spapr_iommu.c

diff --git a/Makefile.target b/Makefile.target
index 68a5641..af199ba 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -256,7 +256,7 @@ obj-ppc-y += ppc_oldworld.o
 # NewWorld PowerMac
 obj-ppc-y += ppc_newworld.o
 # IBM pSeries (sPAPR)
-obj-ppc-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o
+obj-ppc-$(CONFIG_PSERIES) += spapr.o spapr_hcall.o spapr_rtas.o spapr_vio.o spapr_iommu.o
 obj-ppc-$(CONFIG_PSERIES) += xics.o spapr_vty.o spapr_llan.o spapr_vscsi.o
 obj-ppc-$(CONFIG_PSERIES) += spapr_pci.o device-hotplug.o pci-hotplug.o
 # PowerPC 4xx boards
diff --git a/configure b/configure
index 3eff89c..16edee8 100755
--- a/configure
+++ b/configure
@@ -3651,7 +3651,7 @@ case "$target_arch2" in
   fi
 fi
 esac
-if test "$target_arch2" = "ppc64" -a "$fdt" = "yes"; then
+if test "$target_arch2" = "ppc64" -a "$fdt" = "yes" -a "$iommu" = "yes" ; then
   echo "CONFIG_PSERIES=y" >> $config_target_mak
 fi
 if test "$target_bigendian" = "yes" ; then
diff --git a/hw/spapr.c b/hw/spapr.c
index d1cb6cd..c6dc21d 100644
--- a/hw/spapr.c
+++ b/hw/spapr.c
@@ -618,6 +618,9 @@ static void ppc_spapr_init(ram_addr_t ram_size,
 spapr->icp = xics_system_init(XICS_IRQS);
 spapr->next_irq = 16;
 
+/* Set up IOMMU */
+spapr_iommu_init();
+
 /* Set up VIO bus */
 spapr->vio_bus = spapr_vio_bus_init();
 
diff --git a/hw/spapr.h b/hw/spapr.h
index 4aae1a0..6ecdca2 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -318,4 +318,20 @@ target_ulong spapr_rtas_call(sPAPREnvironment *spapr,
 int spapr_rtas_device_tree_setup(void *fdt, target_phys_addr_t rtas_addr,
  target_phys_addr_t rtas_size);
 
+#define SPAPR_TCE_PAGE_SHIFT   12
+#define SPAPR_TCE_PAGE_SIZE(1ULL << SPAPR_TCE_PAGE_SHIFT)
+#define SPAPR_TCE_PAGE_MASK(SPAPR_TCE_PAGE_SIZE - 1)
+
+typedef struct sPAPRTCE {
+uint64_t tce;
+} sPAPRTCE;
+
+#define SPAPR_VIO_BASE_LIOBN0x
+
+void spapr_iommu_init(void);
+DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
+void spapr_tce_free(DMAContext *dma);
+int spapr_dma_dt(void *fdt, int node_off, const char *propname,
+ DMAContext *dma);
+
 #endif /* !defined (__HW_SPAPR_H__) */
diff --git a/hw/spapr_iommu.c b/hw/spapr_iommu.c
new file mode 100644
index 000..c9f48c9
--- /dev/null
+++ b/hw/spapr_iommu.c
@@ -0,0 +1,238 @@
+/*
+ * QEMU sPAPR IOMMU (TCE) code
+ *
+ * Copyright (c) 2010 David Gibson, IBM Corporation 
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2 of the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, see .
+ */
+#include "hw.h"
+#include "kvm.h"
+#include "qdev.h"
+#include "kvm_ppc.h"
+#include "dma.h"
+
+#include "hw/spapr.h"
+
+#include 
+
+/* #define DEBUG_TCE */
+
+enum sPAPRTCEAccess {
+SPAPR_TCE_FAULT = 0,
+SPAPR_TCE_RO = 1,
+SPAPR_TCE_WO = 2,
+SPAPR_TCE_RW = 3,
+};
+
+typedef struct sPAPRTCETable sPAPRTCETable;
+
+struct sPAPRTCETable {
+DMAContext dma;
+uint32_t liobn;
+uint32_t window_size;
+sPAPRTCE *table;
+int fd;
+QLIST_ENTRY(sPAPRTCETable) list;
+};
+
+
+QLIST_HEAD(spapr_tce_tables, sPAPRTCETable) spapr_tce_tables;
+
+static sPAPRTCETable *spapr_tce_find_by_liobn(uint32_t liobn)
+{
+sPAPRTCETable *tcet;
+
+QLIST_FOREACH(tcet, &spapr_tce_tables, list) {
+if (tcet->liobn == liobn) {
+return tcet;
+}
+}
+
+return NULL;
+}
+
+static int spapr_tce_translate(DMAContext *dma,
+   dma_addr_t addr,
+   dma_addr_t *paddr,
+

Re: [Qemu-devel] [PATCH] pc: make user-triggerable exit conditional to DEBUG_BIOS define

2012-02-29 Thread Hervé Poussineau


Hi,

I have no idea what the installer is trying to do with this port, or
which device it is probing.

Maybe we can add a runtime switch that controls whether Qemu exits. What
do you think of '-vgabios-backdoor', which, if specified, enables the exit?

Otherwise, which solution do you propose?
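
To make the runtime-switch idea concrete, here is a rough sketch of the decision logic only. Nothing in it is existing QEMU code: `SketchConfig` and `sketch_backdoor_write()` are hypothetical, and the flag is assumed to be set when the proposed '-vgabios-backdoor' option is given. The exit status expression is the one currently used in hw/pc.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical runtime flag, e.g. set when '-vgabios-backdoor' is given. */
typedef struct SketchConfig {
    bool vgabios_backdoor;
} SketchConfig;

/*
 * Decide what a write to port 0x501/0x502 should do. Returns true and
 * stores the exit status in *status if the process should exit; returns
 * false, leaving *status untouched, if the write should be ignored.
 */
static bool sketch_backdoor_write(const SketchConfig *cfg, uint32_t val,
                                  int *status)
{
    if (!cfg->vgabios_backdoor) {
        return false;
    }
    *status = (int)((val << 1) | 1);   /* same expression as in hw/pc.c */
    return true;
}
```

A regression suite that relies on the exit would pass the option explicitly, while a guest such as the Xenix installer would see its writes ignored by default.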

Regards,

Hervé

Anthony Liguori wrote:

On 02/29/2012 04:44 PM, Hervé Poussineau wrote:

The port 0x501 is (at least) used by SCO Xenix 2.3.4 installer.


For what?  What device would normally be there?

I don't want to disable this by default.  My regression suite depends on 
this as an exit mechanism.


Regards,

Anthony Liguori



Signed-off-by: Hervé Poussineau
---
  hw/pc.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 12c02f2..113a38a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -565,7 +565,10 @@ static void bochs_bios_write(void *opaque, uint32_t addr, uint32_t val)

  /* LGPL'ed VGA BIOS messages */
  case 0x501:
  case 0x502:
+#ifdef DEBUG_BIOS
  exit((val<<  1) | 1);
+#endif
+break;
  case 0x500:
  case 0x503:
  #ifdef DEBUG_BIOS

[Qemu-devel] [PATCH 3/6] Fill PCI regions with entries

2012-02-29 Thread Alexey Korolev

In this patch we fill pci_regions with entries.

The idea of the implementation is much the same as it was before in the
pci_check_devices() function.
The pci_bios_fill_regions() function scans PCI devices.
1) If the PCI device is a PCI-to-PCI bridge:
   a) Create an empty entry.
   b) Associate the new entry with the pci_region provided by the
      PCI-to-PCI bridge.
   c) Add the new entry to the pci_region list of the parent bus.
2) If the PCI device is not a bridge:
   a) Scan the PCI BARs.
   b) Get the size and attributes (type and is64bit).
   c) Add the new entry to the pci_region list of the parent bus.

Then pci_bios_fill_regions() scans the pci_regions in reverse order
to calculate the size of the pci_region_entries belonging to a bridge.
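
The size calculation for a bridge entry can be sketched as follows. This is a simplified model, not the patch's code: `sketch_size_roundup()` and `sketch_bridge_window()` are hypothetical, the children are passed as a plain array of sizes, the 1 MiB minimum for a memory window is an assumption, and pci_size_roundup() is assumed to round up to the next power of two.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_BRIDGE_MEM_MINSIZE 0x100000  /* assumed 1 MiB minimum window */

/* Round up to the next power of two; assumes size <= 2^63. */
static uint64_t sketch_size_roundup(uint64_t size)
{
    uint64_t rounded = 1;
    while (rounded < size) {
        rounded <<= 1;
    }
    return rounded;
}

/* Bridge window size: children's sum, clamped to a minimum, rounded up. */
static uint64_t sketch_bridge_window(const uint64_t *child_sizes, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += child_sizes[i];
    }
    if (sum < SKETCH_BRIDGE_MEM_MINSIZE) {
        sum = SKETCH_BRIDGE_MEM_MINSIZE;
    }
    return sketch_size_roundup(sum);
}
```

Walking the regions in reverse order matters because a bridge's own size has to be known before it is counted as a child of its parent bus.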


Signed-off-by: Alexey Korolev 
---
 src/pciinit.c |  103 ++---
 1 files changed, 98 insertions(+), 5 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index b02da89..03ece34 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -12,11 +12,10 @@
 #include "pci_regs.h" // PCI_COMMAND
 #include "xen.h" // usingXen
 
-#define PCI_IO_INDEX_SHIFT 2
-#define PCI_MEM_INDEX_SHIFT 12
-
-#define PCI_BRIDGE_IO_MIN  0x1000
-#define PCI_BRIDGE_MEM_MIN   0x10
+#define PCI_DEV_IO_MINSIZE 4
+#define PCI_DEV_MEM_MINSIZE 0x1000
+#define PCI_BRIDGE_IO_MINSIZE 0x1000
+#define PCI_BRIDGE_MEM_MINSIZE 0x10
 
 enum pci_region_type {
 PCI_REGION_TYPE_IO,
@@ -408,6 +407,96 @@ dump_entry(struct pci_region_entry *entry)
 region_type_name[entry->type],entry->is64bit ? "64bits" : "32bits");
 }
 
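+/****************************************************************
+ * Build topology and calculate size of entries
+ ****************************************************************/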
+struct pci_region_entry *
+pci_region_create_entry(struct pci_region *parent, struct pci_device *dev,
+   u64 size, int type, int is64bit)
+{
+struct pci_region_entry *entry= malloc_tmp(sizeof(*entry));
+if (!entry) {
+warn_noalloc();
+return NULL;
+}
+memset(entry, 0, sizeof(*entry));
+
+entry->dev = dev;
+entry->type = type;
+entry->is64bit = is64bit;
+entry->size = size;
+region_entry_add(parent, entry);
+entry->parent_region = parent;
+return entry;
+}
+
+static int pci_bios_fill_regions(struct pci_region *regions)
+{
+struct pci_region *this_region, *parent;
+enum pci_region_type type;
+struct pci_device *pci;
+struct pci_region_entry *entry;
+int is64bit, i;
+u64 size, min_size;
+
+foreachpci(pci) {
+if (pci->class == PCI_CLASS_BRIDGE_PCI) {
+this_region = &regions[pci->secondary_bus * PCI_REGION_TYPE_COUNT];
+parent = &regions[pci_bdf_to_bus(pci->bdf) * PCI_REGION_TYPE_COUNT];
+for (type = 0; type < PCI_REGION_TYPE_COUNT;
+   type++, this_region++, parent++) {
+/* Only prefetchable bridge regions can be 64bit */
+is64bit = (type == PCI_REGION_TYPE_PREFMEM);
+entry = pci_region_create_entry(parent, pci, 0, type, is64bit);
+if (!entry)
+return -1;
+entry->this_region = this_region;
+this_region->this_entry = entry;
+}
+continue;
+}
+for (i = 0; i < PCI_NUM_REGIONS; i++) {
+size = pci_get_bar_size(pci, i, &type, &is64bit);
+if (size == 0)
+continue;
+min_size = (type == PCI_REGION_TYPE_IO) ?
+PCI_DEV_IO_MINSIZE : PCI_DEV_MEM_MINSIZE;
+size = (size > min_size) ? size : min_size;
+
+parent = &regions[pci_bdf_to_bus(pci->bdf) * PCI_REGION_TYPE_COUNT
+  + type];
+entry = pci_region_create_entry(parent, pci, size, type, is64bit);
+if (!entry)
+return -1;
+entry->bar = i;
+dump_entry(entry);
+if (is64bit)
+i++;
+}
+}
+
+for (i = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT ; i > 0; i--) {
+struct pci_region_entry *this_entry = regions[i-1].this_entry;
+if(!this_entry)
+continue;
+
+is64bit = this_entry->is64bit;
+size = 0;
+foreach_region_entry(&regions[i-1], entry) {
+size += entry->size;
+is64bit &= entry->is64bit;
+}
+min_size = (this_entry->type == PCI_REGION_TYPE_IO) ?
+PCI_BRIDGE_IO_MINSIZE : PCI_BRIDGE_MEM_MINSIZE;
+size = (size > min_size) ? size : min_size;
+this_entry->is64bit = is64bit;
+this_entry->size = pci_size_roundup(size);
+dump_entry(this_entry);
+}
+return 0;
+}
+
+
 static void pci_bios_bus_reserve(struct pci_bus *bus, int type, u32 size)
 {
 u32 index;
@@ -645,6 +734,10 @@ pci_setup(void)
 return;
 }
 memset(regions, 0, sizeof(*regions) * num_regions);
+if (p

[Qemu-devel] [PATCH 02/13] Better support for dma_addr_t variables

2012-02-29 Thread David Gibson

A while back, we introduced the dma_addr_t type, which is supposed to
be used for bus visible memory addresses.  At present, this is an
alias for target_phys_addr_t, but this will change when we eventually
add support for guest visible IOMMUs.

There are some instances of target_phys_addr_t in the code now which
should really be dma_addr_t, but can't be trivially converted due to
missing features which this patch corrects.

 * We add DMA_ADDR_BITS, analogous to TARGET_PHYS_ADDR_BITS.  This is
   important where we need to make a compile-time (#if) decision based
   on the size of dma_addr_t.

 * We add a new helper macro to create device properties which take a
   dma_addr_t.  This lets us correctly convert code which currently
   has DEFINE_PROP_TADDR() for variables which should be dma_addr_t
   instead of target_phys_addr_t.

Signed-off-by: David Gibson 
---
 Makefile.objs |2 +-
 dma.h |1 +
 hw/qdev-dma.c |   77 +
 hw/qdev-dma.h |8 ++
 4 files changed, 87 insertions(+), 1 deletions(-)
 create mode 100644 hw/qdev-dma.c
 create mode 100644 hw/qdev-dma.h

diff --git a/Makefile.objs b/Makefile.objs
index 808de6a..59aed69 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -287,7 +287,7 @@ hw-obj-$(CONFIG_LSI_SCSI_PCI) += lsi53c895a.o
 hw-obj-$(CONFIG_ESP) += esp.o
 
 hw-obj-y += dma-helpers.o sysbus.o isa-bus.o
-hw-obj-y += qdev-addr.o
+hw-obj-y += qdev-addr.o qdev-dma.o
 
 # VGA
 hw-obj-$(CONFIG_VGA_PCI) += vga-pci.o
diff --git a/dma.h b/dma.h
index 05ac325..463095c 100644
--- a/dma.h
+++ b/dma.h
@@ -32,6 +32,7 @@ struct QEMUSGList {
 #if defined(TARGET_PHYS_ADDR_BITS)
 typedef target_phys_addr_t dma_addr_t;
 
+#define DMA_ADDR_BITS TARGET_PHYS_ADDR_BITS
 #define DMA_ADDR_FMT TARGET_FMT_plx
 
 struct ScatterGatherEntry {
diff --git a/hw/qdev-dma.c b/hw/qdev-dma.c
new file mode 100644
index 000..2b21bf4
--- /dev/null
+++ b/hw/qdev-dma.c
@@ -0,0 +1,77 @@
+#include "qdev.h"
+#include "qdev-dma.h"
+#include "dma.h"
+
+/* --- target physical address --- */
+
+static int parse_dmaaddr(DeviceState *dev, Property *prop, const char *str)
+{
+dma_addr_t *ptr = qdev_get_prop_ptr(dev, prop);
+
+*ptr = strtoull(str, NULL, 16);
+return 0;
+}
+
+static int print_dmaaddr(DeviceState *dev, Property *prop, char *dest,
+ size_t len)
+{
+dma_addr_t *ptr = qdev_get_prop_ptr(dev, prop);
+return snprintf(dest, len, "0x" DMA_ADDR_FMT, *ptr);
+}
+
+static void get_dmaaddr(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+dma_addr_t *ptr = qdev_get_prop_ptr(dev, prop);
+int64_t value;
+
+value = *ptr;
+visit_type_int(v, &value, name, errp);
+}
+
+static void set_dmaaddr(Object *obj, Visitor *v, void *opaque,
+  const char *name, Error **errp)
+{
+DeviceState *dev = DEVICE(obj);
+Property *prop = opaque;
+dma_addr_t *ptr = qdev_get_prop_ptr(dev, prop);
+Error *local_err = NULL;
+int64_t value;
+
+if (dev->state != DEV_STATE_CREATED) {
+error_set(errp, QERR_PERMISSION_DENIED);
+return;
+}
+
+visit_type_int(v, &value, name, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+return;
+}
+if ((uint64_t)value <= (uint64_t) ~(dma_addr_t)0) {
+*ptr = value;
+} else {
+error_set(errp, QERR_PROPERTY_VALUE_OUT_OF_RANGE,
+  dev->id?:"", name, value, (uint64_t) 0,
+  (uint64_t) ~(dma_addr_t)0);
+}
+}
+
+
+PropertyInfo qdev_prop_dmaaddr = {
+.name  = "dmaaddr",
+.parse = parse_dmaaddr,
+.print = print_dmaaddr,
+.get   = get_dmaaddr,
+.set   = set_dmaaddr,
+};
+
+void qdev_prop_set_dmaaddr(DeviceState *dev, const char *name, dma_addr_t value)
+{
+Error *errp = NULL;
+object_property_set_int(OBJECT(dev), value, name, &errp);
+assert(!errp);
+
+}
diff --git a/hw/qdev-dma.h b/hw/qdev-dma.h
new file mode 100644
index 000..8b01fda
--- /dev/null
+++ b/hw/qdev-dma.h
@@ -0,0 +1,8 @@
+#include "dma.h"
+
+#define DEFINE_PROP_DMAADDR(_n, _s, _f, _d)   \
+DEFINE_PROP_DEFAULT(_n, _s, _f, _d, qdev_prop_dmaaddr, dma_addr_t)
+
+extern PropertyInfo qdev_prop_dmaaddr;
+void qdev_prop_set_dmaaddr(DeviceState *dev, const char *name,
+   dma_addr_t value);
-- 
1.7.9
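
The range check in set_dmaaddr() can be looked at in isolation.  This is a
minimal sketch, assuming a 32-bit bus address type for the demonstration
(in the patch dma_addr_t is an alias for target_phys_addr_t, whose width
depends on the target).  demo_dma_addr_t and demo_fits_dmaaddr are
hypothetical names, not part of QEMU.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: assume a 32-bit bus address type for the demo. */
typedef uint32_t demo_dma_addr_t;

/* Returns true if a visitor-supplied int64_t fits in demo_dma_addr_t.
 * Casting to uint64_t first makes negative inputs compare as very large
 * values, so the same comparison rejects them. */
static bool demo_fits_dmaaddr(int64_t value)
{
    uint64_t max = (uint64_t)(demo_dma_addr_t)~(demo_dma_addr_t)0;

    return (uint64_t)value <= max;
}
```

With a 64-bit address type the comparison is always true, which is why the
check only matters on targets with a narrower dma_addr_t.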

[Qemu-devel] [PATCH 10/13] iommu: Introduce IOMMU emulation infrastructure

2012-02-29 Thread David Gibson

This patch adds the basic infrastructure necessary to emulate an IOMMU
visible to the guest.  The DMAContext structure is extended with
information and a callback describing the translation, and the various
DMA functions used by devices will now perform IOMMU translation using
this callback.
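
The translate-then-access loop used by the DMA functions can be sketched
outside QEMU as follows.  Everything named demo_* is hypothetical: the
callback signature is simplified (no context or direction argument) and
the toy translation simply swaps two pages.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_IOMMU_PAGE 4096u  /* hypothetical translation granule */

/* Hypothetical callback type: translate one bus address, reporting the
 * physical address and how many bytes the translation stays valid for.
 * Returns 0 on success, nonzero on a translation fault. */
typedef int demo_translate_fn(uint64_t addr, uint64_t *paddr, uint64_t *plen);

/* Toy translation: bus pages 0 and 1 map to physical pages 1 and 0. */
static int demo_swap_translate(uint64_t addr, uint64_t *paddr, uint64_t *plen)
{
    uint64_t page = addr / DEMO_IOMMU_PAGE;
    uint64_t off = addr % DEMO_IOMMU_PAGE;

    if (page > 1) {
        return -1;                       /* unmapped: fault */
    }
    *paddr = (1 - page) * DEMO_IOMMU_PAGE + off;
    *plen = DEMO_IOMMU_PAGE - off;       /* valid up to the end of the page */
    return 0;
}

/* Copy len bytes from buf into "physical memory" through the callback,
 * splitting the access wherever a translation ends. */
static int demo_dma_write(demo_translate_fn *xlate, uint8_t *phys,
                          uint64_t addr, const uint8_t *buf, uint64_t len)
{
    while (len) {
        uint64_t paddr, plen;

        if (xlate(addr, &paddr, &plen) != 0) {
            return -1;
        }
        if (plen > len) {                /* translation covers more than needed */
            plen = len;
        }
        memcpy(phys + paddr, buf, plen);
        len -= plen;
        addr += plen;
        buf += plen;
    }
    return 0;
}
```

A write that straddles the boundary between bus pages 0 and 1 ends up in
two non-adjacent physical locations, which is the case the loop exists for.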

Cc: Michael S. Tsirkin 
Cc: Richard Henderson 

Signed-off-by: Eduard - Gabriel Munteanu 
Signed-off-by: David Gibson 
---
 configure |   12 
 dma-helpers.c |  189 +
 dma.h |  126 +++---
 3 files changed, 305 insertions(+), 22 deletions(-)

diff --git a/configure b/configure
index fb0e18e..3eff89c 100755
--- a/configure
+++ b/configure
@@ -138,6 +138,7 @@ linux_aio=""
 cap_ng=""
 attr=""
 libattr=""
+iommu="yes"
 xfs=""
 
 vhost_net="no"
@@ -784,6 +785,10 @@ for opt do
   ;;
   --enable-vhost-net) vhost_net="yes"
   ;;
+  --enable-iommu) iommu="yes"
+  ;;
+  --disable-iommu) iommu="no"
+  ;;
   --disable-opengl) opengl="no"
   ;;
   --enable-opengl) opengl="yes"
 echo "  --enable-docs    enable documentation build"
 echo "  --disable-docs   disable documentation build"
 echo "  --disable-vhost-net  disable vhost-net acceleration support"
 echo "  --enable-vhost-net   enable vhost-net acceleration support"
+echo "  --disable-iommu  disable IOMMU emulation support"
+echo "  --enable-iommu   enable IOMMU emulation support"
 echo "  --enable-trace-backend=B Set trace backend"
 echo "   Available backends:" $("$source_path"/scripts/tracetool --list-backends)
 echo "  --with-trace-file=NAME   Full PATH,NAME of file to store traces"
@@ -2935,6 +2942,7 @@ echo "posix_madvise $posix_madvise"
 echo "uuid support  $uuid"
 echo "libcap-ng support $cap_ng"
 echo "vhost-net support $vhost_net"
+echo "IOMMU support $iommu"
 echo "Trace backend $trace_backend"
 echo "Trace output file $trace_file-"
 echo "spice support $spice"
@@ -3812,6 +3820,10 @@ if test "$target_softmmu" = "yes" -a \( \
   echo "CONFIG_NEED_MMU=y" >> $config_target_mak
 fi
 
+if test "$iommu" = "yes" ; then
+  echo "CONFIG_IOMMU=y" >> $config_host_mak
+fi
+
 if test "$gprof" = "yes" ; then
   echo "TARGET_GPROF=yes" >> $config_target_mak
   if test "$target_linux_user" = "yes" ; then
diff --git a/dma-helpers.c b/dma-helpers.c
index 9dcfb2c..8269d60 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -10,6 +10,9 @@
 #include "dma.h"
 #include "block_int.h"
 #include "trace.h"
+#include "range.h"
+
+/*#define DEBUG_IOMMU*/
 
 void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMAContext *dma)
 {
@@ -255,3 +258,189 @@ void dma_acct_start(BlockDriverState *bs, BlockAcctCookie *cookie,
 {
 bdrv_acct_start(bs, cookie, sg->size, type);
 }
+
+#ifdef CONFIG_IOMMU
+bool __dma_memory_valid(DMAContext *dma, dma_addr_t addr, dma_addr_t len,
+DMADirection dir)
+{
+target_phys_addr_t paddr, plen;
+
+#ifdef DEBUG_IOMMU
+fprintf(stderr, "dma_memory_check iommu=%p addr=0x%llx len=0x%llx dir=%d\n",
+(void *)dma, (unsigned long long)addr, (unsigned long long)len, dir);
+#endif
+
+while (len) {
+if (dma->translate(dma, addr, &paddr, &plen, dir) != 0) {
+return false;
+}
+
+/* The translation might be valid for larger regions. */
+if (plen > len) {
+plen = len;
+}
+
+len -= plen;
+addr += plen;
+}
+
+return true;
+}
+
+int __dma_memory_rw(DMAContext *dma, dma_addr_t addr,
+void *buf, dma_addr_t len, DMADirection dir)
+{
+target_phys_addr_t paddr, plen;
+int err;
+
+#ifdef DEBUG_IOMMU
+fprintf(stderr, "dma_memory_rw iommu=%p addr=0x%llx len=0x%llx dir=%d\n",
+(void *)dma, (unsigned long long)addr, (unsigned long long)len, dir);
+#endif
+
+while (len) {
+err = dma->translate(dma, addr, &paddr, &plen, dir);
+if (err) {
+return -1;
+}
+
+/* The translation might be valid for larger regions. */
+if (plen > len) {
+plen = len;
+}
+
+cpu_physical_memory_rw(paddr, buf, plen,
+   dir == DMA_DIRECTION_FROM_DEVICE);
+
+len -= plen;
+addr += plen;
+buf += plen;
+}
+
+return 0;
+}
+
+int __dma_memory_zero(DMAContext *dma, dma_addr_t addr, dma_addr_t len)
+{
+target_phys_addr_t paddr, plen;
+int err;
+
+#ifdef DEBUG_IOMMU
+fprintf(stderr, "dma_memory_zero iommu=%p addr=0x%llx len=0x%llx\n",
+(void *)dma, (unsigned long long)addr, (unsigned long long)len);
+#endif
+
+while (len) {
+err = dma->translate(dma, addr, &paddr, &plen,
+ DMA_DIRECTION_FROM_DEVICE);
+if (err) {
+return -1;
+}
+
+/* The translation might be valid for larger regions. */
+if (plen > len) {
+plen = len;
+}
+
+

[Qemu-devel] [PATCH 04/13] Implement cpu_physical_memory_zero()

2012-02-29 Thread David Gibson

This patch adds a cpu_physical_memory_zero() function.  This is equivalent
to calling cpu_physical_memory_write() with a buffer full of zeroes, but
avoids actually allocating such a buffer along the way.

Signed-off-by: David Gibson 
---
 cpu-common.h |1 +
 exec.c   |   56 
 2 files changed, 57 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index a40c57d..aee37bf 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -53,6 +53,7 @@ void qemu_ram_set_idstr(ram_addr_t addr, const char *name, DeviceState *dev);
 
 void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
 int len, int is_write);
+void cpu_physical_memory_zero(target_phys_addr_t addr, int len);
 static inline void cpu_physical_memory_read(target_phys_addr_t addr,
 void *buf, int len)
 {
diff --git a/exec.c b/exec.c
index 0094f53..145a753 100644
--- a/exec.c
+++ b/exec.c
@@ -3639,6 +3639,62 @@ void cpu_physical_memory_rw(target_phys_addr_t addr, uint8_t *buf,
 }
 }
 
+void cpu_physical_memory_zero(target_phys_addr_t addr, int len)
+{
+int l, io_index;
+uint8_t *ptr;
+target_phys_addr_t page;
+ram_addr_t pd;
+PhysPageDesc p;
+
+while (len > 0) {
+page = addr & TARGET_PAGE_MASK;
+l = (page + TARGET_PAGE_SIZE) - addr;
+if (l > len) {
+l = len;
+}
+p = phys_page_find(page >> TARGET_PAGE_BITS);
+pd = p.phys_offset;
+
+if ((pd & ~TARGET_PAGE_MASK) != io_mem_ram.ram_addr) {
+target_phys_addr_t addr1;
+io_index = pd & (IO_MEM_NB_ENTRIES - 1);
+addr1 = (addr & ~TARGET_PAGE_MASK) + p.region_offset;
+/* XXX: could force cpu_single_env to NULL to avoid
+   potential bugs */
+if (l >= 4 && ((addr1 & 3) == 0)) {
+/* 32 bit write access */
+io_mem_write(io_index, addr1, 0, 4);
+l = 4;
+} else if (l >= 2 && ((addr1 & 1) == 0)) {
+/* 16 bit write access */
+io_mem_write(io_index, addr1, 0, 2);
+l = 2;
+} else {
+/* 8 bit write access */
+io_mem_write(io_index, addr1, 0, 1);
+l = 1;
+}
+} else {
+ram_addr_t addr1;
+addr1 = (pd & TARGET_PAGE_MASK) + (addr & ~TARGET_PAGE_MASK);
+/* RAM case */
+ptr = qemu_get_ram_ptr(addr1);
+memset(ptr, 0, l);
+if (!cpu_physical_memory_is_dirty(addr1)) {
+/* invalidate code */
+tb_invalidate_phys_page_range(addr1, addr1 + l, 0);
+/* set dirty bit */
+cpu_physical_memory_set_dirty_flags(
+addr1, (0xff & ~CODE_DIRTY_FLAG));
+}
+qemu_put_ram_ptr(ptr);
+}
+len -= l;
+addr += l;
+}
+}
+
 /* used for ROM loading : can write in RAM and ROM */
 void cpu_physical_memory_write_rom(target_phys_addr_t addr,
const uint8_t *buf, int len)
-- 
1.7.9
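
The page-by-page chunking in cpu_physical_memory_zero() can be seen in a
stripped-down form below.  This is only a sketch: it assumes a flat byte
array in place of guest memory and a 4096-byte page, and it ignores the
MMIO and dirty-tracking paths.  The demo_* names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096u                 /* hypothetical page size */
#define DEMO_PAGE_MASK (~(uint64_t)(DEMO_PAGE_SIZE - 1))

/* Zero len bytes starting at addr inside a flat buffer, one page-bounded
 * chunk at a time.  Returns the number of chunks processed. */
static int demo_zero_range(uint8_t *mem, uint64_t addr, int len)
{
    int chunks = 0;

    while (len > 0) {
        uint64_t page = addr & DEMO_PAGE_MASK;
        int l = (int)((page + DEMO_PAGE_SIZE) - addr); /* bytes to page end */

        if (l > len) {
            l = len;
        }
        memset(mem + addr, 0, (size_t)l);
        len -= l;
        addr += l;
        chunks++;
    }
    return chunks;
}
```

No chunk ever crosses a page boundary, so each one can be resolved to RAM
or MMIO independently, as the real function does.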

[Qemu-devel] [PATCH 09/13] usb: Convert usb_packet_{map, unmap} to universal DMA helpers

2012-02-29 Thread David Gibson

The USB UHCI and EHCI drivers were converted some time ago to use the
pci_dma_*() helper functions.  However, this conversion was not complete
because in some places both these drivers do DMA via the usb_packet_map()
function in usb-libhw.c.  That function directly used
cpu_physical_memory_map().

Now that the sglist code uses DMA wrappers properly, we can convert the
functions in usb-libhw.c, thus completing the conversion of UHCI and EHCI
to use the DMA wrappers.

Note that usb_packet_map() invokes dma_memory_map() with a NULL invalidate
callback function.  When IOMMU support is added, this will mean that
usb_packet_map() and the corresponding usb_packet_unmap() must be called in
close proximity without dropping the qemu device lock - otherwise the guest
might invalidate IOMMU mappings while they are still in use by the device
code.

Signed-off-by: David Gibson 
---
 hw/usb-ehci.c  |4 ++--
 hw/usb-libhw.c |   20 +++-
 hw/usb-uhci.c  |2 +-
 hw/usb.h   |2 +-
 4 files changed, 15 insertions(+), 13 deletions(-)

diff --git a/hw/usb-ehci.c b/hw/usb-ehci.c
index afc8ccf..f7dd50e 100644
--- a/hw/usb-ehci.c
+++ b/hw/usb-ehci.c
@@ -1343,7 +1343,7 @@ err:
 set_field(&q->qh.token, q->tbytes, QTD_TOKEN_TBYTES);
 }
 ehci_finish_transfer(q, q->usb_status);
-usb_packet_unmap(&q->packet);
+usb_packet_unmap(&q->packet, &q->sgl);
 
 q->qh.token ^= QTD_TOKEN_DTOGGLE;
 q->qh.token &= ~QTD_TOKEN_ACTIVE;
@@ -1464,7 +1464,7 @@ static int ehci_process_itd(EHCIState *ehci,
 usb_packet_map(&ehci->ipacket, &ehci->isgl);
 ret = usb_handle_packet(dev, &ehci->ipacket);
 assert(ret != USB_RET_ASYNC);
-usb_packet_unmap(&ehci->ipacket);
+usb_packet_unmap(&ehci->ipacket, &ehci->isgl);
 } else {
 DPRINTF("ISOCH: attempt to addess non-iso endpoint\n");
 ret = USB_RET_NAK;
diff --git a/hw/usb-libhw.c b/hw/usb-libhw.c
index 162b42b..88282de 100644
--- a/hw/usb-libhw.c
+++ b/hw/usb-libhw.c
@@ -26,15 +26,16 @@
 
 int usb_packet_map(USBPacket *p, QEMUSGList *sgl)
 {
-int is_write = (p->pid == USB_TOKEN_IN);
+DMADirection dir = (p->pid == USB_TOKEN_IN) ?
+DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE;
 target_phys_addr_t len;
 void *mem;
 int i;
 
 for (i = 0; i < sgl->nsg; i++) {
 len = sgl->sg[i].len;
-mem = cpu_physical_memory_map(sgl->sg[i].base, &len,
-  is_write);
+mem = dma_memory_map(sgl->dma, NULL, NULL,
+ sgl->sg[i].base, &len, dir);
 if (!mem) {
 goto err;
 }
@@ -46,18 +47,19 @@ int usb_packet_map(USBPacket *p, QEMUSGList *sgl)
 return 0;
 
 err:
-usb_packet_unmap(p);
+usb_packet_unmap(p, sgl);
 return -1;
 }
 
-void usb_packet_unmap(USBPacket *p)
+void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl)
 {
-int is_write = (p->pid == USB_TOKEN_IN);
+DMADirection dir = (p->pid == USB_TOKEN_IN) ?
+DMA_DIRECTION_FROM_DEVICE : DMA_DIRECTION_TO_DEVICE;
 int i;
 
 for (i = 0; i < p->iov.niov; i++) {
-cpu_physical_memory_unmap(p->iov.iov[i].iov_base,
-  p->iov.iov[i].iov_len, is_write,
-  p->iov.iov[i].iov_len);
+dma_memory_unmap(sgl->dma, p->iov.iov[i].iov_base,
+ p->iov.iov[i].iov_len, dir,
+ p->iov.iov[i].iov_len);
 }
 }
diff --git a/hw/usb-uhci.c b/hw/usb-uhci.c
index 70e3881..a7d2dbc 100644
--- a/hw/usb-uhci.c
+++ b/hw/usb-uhci.c
@@ -863,7 +863,7 @@ static int uhci_handle_td(UHCIState *s, uint32_t addr, UHCI_TD *td, uint32_t *in
 
 done:
 len = uhci_complete_td(s, td, async, int_mask);
-usb_packet_unmap(&async->packet);
+usb_packet_unmap(&async->packet, &async->sgl);
 uhci_async_free(async);
 return len;
 }
diff --git a/hw/usb.h b/hw/usb.h
index 8e83697..720a045 100644
--- a/hw/usb.h
+++ b/hw/usb.h
@@ -336,7 +336,7 @@ void usb_packet_set_state(USBPacket *p, USBPacketState state);
 void usb_packet_setup(USBPacket *p, int pid, USBEndpoint *ep);
 void usb_packet_addbuf(USBPacket *p, void *ptr, size_t len);
 int usb_packet_map(USBPacket *p, QEMUSGList *sgl);
-void usb_packet_unmap(USBPacket *p);
+void usb_packet_unmap(USBPacket *p, QEMUSGList *sgl);
 void usb_packet_copy(USBPacket *p, void *ptr, size_t bytes);
 void usb_packet_skip(USBPacket *p, size_t bytes);
 void usb_packet_cleanup(USBPacket *p);
-- 
1.7.9
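
The mapping from USB token to DMA direction used in usb_packet_map() and
usb_packet_unmap() is small enough to show on its own.  A minimal sketch
with hypothetical demo_* names.  The PID values are the standard USB ones;
the enum is a stand-in for QEMU's DMADirection.

```c
#define DEMO_USB_TOKEN_IN  0x69  /* standard USB PID for an IN token */
#define DEMO_USB_TOKEN_OUT 0xe1  /* standard USB PID for an OUT token */

typedef enum {
    DEMO_DMA_TO_DEVICE,    /* device reads guest memory */
    DEMO_DMA_FROM_DEVICE   /* device writes guest memory */
} demo_dma_direction;

/* An IN transfer moves data from the USB device into guest memory, so the
 * controller writes to memory; every other token reads from memory. */
static demo_dma_direction demo_usb_dma_direction(int pid)
{
    return pid == DEMO_USB_TOKEN_IN ? DEMO_DMA_FROM_DEVICE
                                    : DEMO_DMA_TO_DEVICE;
}
```

Map and unmap must agree on the direction, which is why both functions in
the patch derive it from p->pid in the same way.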

[Qemu-devel] [PATCH 13/13] pseries: Implement IOMMU and DMA for PAPR PCI devices

2012-02-29 Thread David Gibson

Currently the pseries machine emulation does not support DMA for emulated
PCI devices, because the PAPR spec always requires a (guest visible,
paravirtualized) IOMMU which was not implemented.  Now that we have
infrastructure for IOMMU emulation, we can correct this and allow PCI DMA
for pseries.

With the existing PAPR IOMMU code used for VIO devices, this is almost
trivial. We use a single DMAContext for each (virtual) PCI host bridge,
which is the usual configuration on real PAPR machines (which often have
_many_ PCI host bridges).

Cc: Alex Graf 

Signed-off-by: David Gibson 
---
 hw/spapr.h |1 +
 hw/spapr_pci.c |   15 +++
 hw/spapr_pci.h |1 +
 3 files changed, 17 insertions(+), 0 deletions(-)

diff --git a/hw/spapr.h b/hw/spapr.h
index 6ecdca2..0306140 100644
--- a/hw/spapr.h
+++ b/hw/spapr.h
@@ -327,6 +327,7 @@ typedef struct sPAPRTCE {
 } sPAPRTCE;
 
 #define SPAPR_VIO_BASE_LIOBN0x
+#define SPAPR_PCI_BASE_LIOBN0x8000
 
 void spapr_iommu_init(void);
 DMAContext *spapr_tce_new_dma_context(uint32_t liobn, size_t window_size);
diff --git a/hw/spapr_pci.c b/hw/spapr_pci.c
index 654cb56..0a7aab7 100644
--- a/hw/spapr_pci.c
+++ b/hw/spapr_pci.c
@@ -201,6 +201,14 @@ static MemoryRegionOps spapr_io_ops = {
 /*
  * PHB PCI device
  */
+static DMAContext *spapr_pci_dma_context_fn(PCIBus *bus, void *opaque,
+int devfn)
+{
+sPAPRPHBState *phb = opaque;
+
+return phb->dma;
+}
+
 static int spapr_phb_init(SysBusDevice *s)
 {
 sPAPRPHBState *phb = FROM_SYSBUS(sPAPRPHBState, s);
@@ -208,6 +216,7 @@ static int spapr_phb_init(SysBusDevice *s)
 char namebuf[64];
 int i;
 PCIBus *bus;
+uint32_t liobn;
 
 sprintf(busname, "pci@%" PRIx64, phb->buid);
 
@@ -248,6 +257,10 @@ static int spapr_phb_init(SysBusDevice *s)
  PCI_DEVFN(0, 0),
  SPAPR_PCI_NUM_LSI);
 
+liobn = SPAPR_PCI_BASE_LIOBN | (pci_find_domain(bus) << 16);
+phb->dma = spapr_tce_new_dma_context(liobn, 0x4000);
+pci_setup_iommu(bus, spapr_pci_dma_context_fn, phb);
+
 QLIST_INSERT_HEAD(&spapr->phbs, phb, list);
 
 /* Initialize the LSI table */
@@ -407,5 +420,7 @@ int spapr_populate_pci_devices(sPAPRPHBState *phb,
 _FDT(fdt_setprop(fdt, bus_off, "interrupt-map", &interrupt_map,
  7 * sizeof(interrupt_map[0])));
 
+spapr_dma_dt(fdt, bus_off, "ibm,dma-window", phb->dma);
+
 return 0;
 }
diff --git a/hw/spapr_pci.h b/hw/spapr_pci.h
index b4b8a73..365c75e 100644
--- a/hw/spapr_pci.h
+++ b/hw/spapr_pci.h
@@ -37,6 +37,7 @@ typedef struct sPAPRPHBState {
 MemoryRegion memspace, iospace;
 target_phys_addr_t mem_win_addr, mem_win_size, io_win_addr, io_win_size;
 MemoryRegion memwindow, iowindow;
+DMAContext *dma;
 
 struct {
 uint32_t dt_irq;
-- 
1.7.9

[Qemu-devel] [PATCH] usb: Fix compilation for hosts which use the usb-stub

2012-02-29 Thread Stefan Weil

Commit 3741715cf2e54727fe3d9884ea6dcea68c7f7d4b added a new parameter
which was still missing here.

Signed-off-by: Stefan Weil 
---
 usb-stub.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/usb-stub.c b/usb-stub.c
index 9c3fcea..bfd8bb0 100644
--- a/usb-stub.c
+++ b/usb-stub.c
@@ -41,7 +41,7 @@ void usb_host_info(Monitor *mon)
 }
 
 /* XXX: modify configure to compile the right host driver */
-USBDevice *usb_host_device_open(const char *devname)
+USBDevice *usb_host_device_open(USBBus *guest_bus, const char *devname)
 {
 return NULL;
 }
-- 
1.7.0.4

[Qemu-devel] [PATCH 05/13] iommu: Add universal DMA helper functions

2012-02-29 Thread David Gibson

Not that long ago, every device implementation using DMA directly
accessed guest memory using cpu_physical_memory_*().  This meant that
adding support for a guest visible IOMMU would require changing every
one of these devices to go through IOMMU translation.

Shortly before qemu 1.0, I made a start on fixing this by providing
helper functions for PCI DMA.  These are currently just stubs which
call the direct access functions, but mean that an IOMMU can be
implemented in one place, rather than for every PCI device.

Clearly, this doesn't help for non-PCI devices, which could also be
IOMMU translated on some platforms.  It is also problematic for the
devices which have both PCI and non-PCI versions (e.g. OHCI, AHCI) - we
cannot use the pci_dma_*() functions, because they assume the
presence of a PCIDevice, but we don't want to have to choose between
pci_dma_*() and cpu_physical_memory_*() every time we do a DMA in the
device code.

This patch makes the first step toward addressing both these problems, by
introducing new (stub) dma helper functions which can be used for any
DMA capable device.

These dma functions take a DMAContext *, a new (currently empty)
variable describing the DMA address space in which the operation is to
take place.  NULL indicates untranslated DMA directly into guest
physical address space.  The intention is that in future non-NULL
values will give information about any necessary IOMMU translation.

DMA using devices must obtain a DMAContext (or, potentially, contexts)
from their bus or platform.  For now this patch just converts the PCI
wrappers to be implemented in terms of the universal wrappers;
converting other drivers can take place over time.

Cc: Michael S. Tsirkin 
Cc: Joerg Rodel 
Cc: Eduard - Gabriel Munteanu 
Cc: Richard Henderson 

Signed-off-by: David Gibson 
---
 dma.h |  103 +
 hw/pci.h  |   22 +++-
 qemu-common.h |1 +
 3 files changed, 117 insertions(+), 9 deletions(-)

diff --git a/dma.h b/dma.h
index 463095c..8b6ef44 100644
--- a/dma.h
+++ b/dma.h
@@ -35,6 +35,109 @@ typedef target_phys_addr_t dma_addr_t;
 #define DMA_ADDR_BITS TARGET_PHYS_ADDR_BITS
 #define DMA_ADDR_FMT TARGET_FMT_plx
 
+typedef void DMAInvalidateMapFunc(void *);
+
+/* Checks that the given range of addresses is valid for DMA.  This is
+ * useful for certain cases, but usually you should just use
+ * dma_memory_{read,write}() and check for errors */
+static inline bool dma_memory_valid(DMAContext *dma, dma_addr_t addr,
+dma_addr_t len, DMADirection dir)
+{
+/* Stub version, with no iommu we assume all bus addresses are valid */
+return true;
+}
+
+static inline int dma_memory_rw(DMAContext *dma, dma_addr_t addr,
+void *buf, dma_addr_t len, DMADirection dir)
+{
+/* Stub version when we have no iommu support */
+cpu_physical_memory_rw(addr, buf, (target_phys_addr_t)len,
+   dir == DMA_DIRECTION_FROM_DEVICE);
+return 0;
+}
+
+static inline int dma_memory_read(DMAContext *dma, dma_addr_t addr,
+  void *buf, dma_addr_t len)
+{
+return dma_memory_rw(dma, addr, buf, len, DMA_DIRECTION_TO_DEVICE);
+}
+
+static inline int dma_memory_write(DMAContext *dma, dma_addr_t addr,
+   const void *buf, dma_addr_t len)
+{
+return dma_memory_rw(dma, addr, (void *)buf, len,
+ DMA_DIRECTION_FROM_DEVICE);
+}
+
+static inline int dma_memory_zero(DMAContext *dma, dma_addr_t addr,
+  dma_addr_t len)
+{
+/* Stub version when we have no iommu support */
+cpu_physical_memory_zero(addr, len);
+return 0;
+}
+
+static inline void *dma_memory_map(DMAContext *dma,
+   DMAInvalidateMapFunc *cb, void *opaque,
+   dma_addr_t addr, dma_addr_t *len,
+   DMADirection dir)
+{
+target_phys_addr_t xlen = *len;
+void *p;
+
+p = cpu_physical_memory_map(addr, &xlen,
+dir == DMA_DIRECTION_FROM_DEVICE);
+*len = xlen;
+return p;
+}
+
+static inline void dma_memory_unmap(DMAContext *dma,
+void *buffer, dma_addr_t len,
+DMADirection dir, dma_addr_t access_len)
+{
+return cpu_physical_memory_unmap(buffer, (target_phys_addr_t)len,
+ dir == DMA_DIRECTION_FROM_DEVICE,
+ access_len);
+}
+
+#define DEFINE_LDST_DMA(_lname, _sname, _bits, _end) \
+static inline uint##_bits##_t ld##_lname##_##_end##_dma(DMAContext *dma, \
+dma_addr_t addr) \
+{   \
+uint##_bits##_t val;

Re: [Qemu-devel] [RFC][PATCH 03/14 v7] target-i386: implement cpu_get_memory_mapping()

2012-02-29 Thread Wen Congyang

At 03/01/2012 02:13 PM, HATAYAMA Daisuke Wrote:
> From: Wen Congyang 
> Subject: [RFC][PATCH 03/14 v7] target-i386: implement cpu_get_memory_mapping()
> Date: Thu, 01 Mar 2012 10:41:47 +0800
> 
>> +int cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
>> +{
>> +if (env->cr[4] & CR4_PAE_MASK) {
>> +#ifdef TARGET_X86_64
>> +if (env->hflags & HF_LMA_MASK) {
>> +target_phys_addr_t pml4e_addr;
>> +
>> +pml4e_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
>> +walk_pml4e(list, pml4e_addr, env->a20_mask);
>> +} else
>> +#endif
>> +{
>> +target_phys_addr_t pdpe_addr;
>> +
>> +pdpe_addr = (env->cr[3] & ~0x1f) & env->a20_mask;
>> +walk_pdpe2(list, pdpe_addr, env->a20_mask);
>> +}
>> +} else {
>> +target_phys_addr_t pde_addr;
>> +bool pse;
>> +
>> +pde_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
>> +pse = !!(env->cr[4] & CR4_PSE_MASK);
>> +walk_pde2(list, pde_addr, env->a20_mask, pse);
>> +}
>> +
>> +return 0;
>> +}
> 
> Does this assume paging mode? I don't know qemu very well, but the qemu
> dump command runs externally to the guest machine, so I think the machine
> could be in a state with paging disabled, where CR4 doesn't refer to
> the page table as expected.

CR4? I think you mean CR3.

Yes, the guest may be running with paging disabled. I will fix it.

Thanks
Wen Congyang

> 
> Thanks.
> HATAYAMA, Daisuke
> 
>

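One way the missing check could look is sketched below.  This is not the
fix that was eventually posted, only an illustration of testing CR0.PG
before trusting CR3 and CR4.  The bit positions are the architectural x86
ones; the demo_* names are hypothetical.

```c
#include <stdint.h>

#define DEMO_CR0_PG  (1u << 31)  /* CR0.PG: paging enabled */
#define DEMO_CR4_PAE (1u << 5)   /* CR4.PAE: physical address extension */

typedef enum {
    DEMO_NO_PAGING,      /* CR3 does not point at live page tables */
    DEMO_PAGING_32BIT,   /* two-level 32-bit paging */
    DEMO_PAGING_PAE      /* PAE (or long mode) paging */
} demo_paging_mode;

/* Classify the paging mode from the control registers.  With CR0.PG clear
 * the page-table walk must be skipped entirely. */
static demo_paging_mode demo_paging_mode_of(uint32_t cr0, uint32_t cr4)
{
    if (!(cr0 & DEMO_CR0_PG)) {
        return DEMO_NO_PAGING;
    }
    return (cr4 & DEMO_CR4_PAE) ? DEMO_PAGING_PAE : DEMO_PAGING_32BIT;
}
```

A caller would return early with an empty mapping list in the
DEMO_NO_PAGING case instead of walking whatever CR3 happens to contain.
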
Re: [Qemu-devel] [RFC][PATCH 04/14 v7] Add API to get memory mapping

2012-02-29 Thread Wen Congyang

At 03/01/2012 02:01 PM, HATAYAMA Daisuke Wrote:
> From: Wen Congyang 
> Subject: [RFC][PATCH 04/14 v7] Add API to get memory mapping
> Date: Thu, 01 Mar 2012 10:43:13 +0800
> 
>> +int qemu_get_guest_memory_mapping(MemoryMappingList *list)
>> +{
>> +CPUState *env;
>> +MemoryMapping *memory_mapping;
>> +RAMBlock *block;
>> +ram_addr_t offset, length;
>> +int ret;
>> +
>> +#if defined(CONFIG_HAVE_GET_MEMORY_MAPPING)
>> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
>> +ret = cpu_get_memory_mapping(list, env);
>> +if (ret < 0) {
>> +return -1;
>> +}
>> +}
>> +#else
>> +return -2;
>> +#endif
>> +
>> +/* some memory may be not mapped, add them into memory mapping's list */
> 
> Is the part from here on logic purely for the 2nd kernel? If so, I think
> it would be better to describe why this additional mapping is needed; we
> should assume most people don't know the kdump mechanism.

Not only for 2nd kernel. If the guest has 1 vcpu, and is in the 2nd kernel,
we need this logic for 1st kernel.

Thanks
Wen Congyang

> 
>> +QLIST_FOREACH(block, &ram_list.blocks, next) {
>> +offset = block->offset;
>> +length = block->length;
>> +
>> +QTAILQ_FOREACH(memory_mapping, &list->head, next) {
>> +if (memory_mapping->phys_addr >= (offset + length)) {
>> +/*
>> + * memory_mapping's list does not contain the region
>> + * [offset, offset+length)
>> + */
>> +create_new_memory_mapping(list, offset, 0, length);
>> +length = 0;
>> +break;
>> +}
>> +
>> +if ((memory_mapping->phys_addr + memory_mapping->length) <=
>> +offset) {
>> +continue;
>> +}
>> +
>> +if (memory_mapping->phys_addr > offset) {
>> +/*
>> + * memory_mapping's list does not contain the region
>> + * [offset, memory_mapping->phys_addr)
>> + */
>> +create_new_memory_mapping(list, offset, 0,
>> +  memory_mapping->phys_addr - offset);
>> +}
>> +
>> +if ((offset + length) <=
>> +(memory_mapping->phys_addr + memory_mapping->length)) {
>> +length = 0;
>> +break;
>> +}
>> +length -= memory_mapping->phys_addr + memory_mapping->length -
>> +  offset;
>> +offset = memory_mapping->phys_addr + memory_mapping->length;
>> +}
>> +
>> +if (length > 0) {
>> +/*
>> + * memory_mapping's list does not contain the region
>> + * [offset, memory_mapping->phys_addr)
>> + */
>> +create_new_memory_mapping(list, offset, 0, length);
>> +}
>> +}
>> +
>> +return 0;
>> +}
> 
> I think it would be more readable to shorten memory_mapping->phys_addr
> and memory_mapping->length at the beginning of the innermost foreach loop.
> 
>   m_phys_addr = memory_mapping->phys_addr;
>   m_length = memory_mapping->length;
> 
> Then each conditional becomes compact.
> 
> Thanks.
> HATAYAMA, Daisuke
> 
>

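The gap-filling logic discussed above, written with the shortened local
names suggested in the review, could look roughly like this.  It is a
standalone sketch over plain arrays rather than QEMU's lists; demo_range
and demo_find_gaps are hypothetical names, and the mappings are assumed to
be sorted by start address and non-overlapping.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t start, len; } demo_range;

/* Collect the parts of [offset, offset+length) not covered by any entry of
 * maps[].  Stores at most max_out gaps in out[] and returns the number of
 * gaps found, which may exceed max_out. */
static size_t demo_find_gaps(const demo_range *maps, size_t nmaps,
                             uint64_t offset, uint64_t length,
                             demo_range *out, size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < nmaps && length > 0; i++) {
        uint64_t m_start = maps[i].start;
        uint64_t m_end = m_start + maps[i].len;

        if (m_end <= offset) {
            continue;                    /* mapping lies entirely before us */
        }
        if (m_start >= offset + length) {
            break;                       /* mapping lies entirely after us */
        }
        if (m_start > offset) {          /* hole in front of this mapping */
            if (n < max_out) {
                out[n] = (demo_range){ offset, m_start - offset };
            }
            n++;
        }
        if (offset + length <= m_end) {
            length = 0;                  /* rest of the block is covered */
            break;
        }
        length -= m_end - offset;
        offset = m_end;
    }
    if (length > 0) {                    /* uncovered tail of the block */
        if (n < max_out) {
            out[n] = (demo_range){ offset, length };
        }
        n++;
    }
    return n;
}
```

For a block [0, 100) with mappings at [10, 20) and [50, 60) this reports
three gaps: [0, 10), [20, 50) and [60, 100).
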
[Qemu-devel] [PATCH 2/6] New service functions and ported old functions to 64bit

2012-02-29 Thread Alexey Korolev

This patch is all about service functions.
It includes:
- basic list operations
- a 64-bit version of pci_size_roundup()
- a modified pci_bios_get_bar to support huge BARs (size over 4GB)
- 2 service functions to get pci_region statistics
- dump_entry, for debug output
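
Two of the items above, the 64-bit roundup and the sizing of a BAR larger
than 4GB, can be sketched independently of the SeaBIOS helpers.  This is an
illustration using only standard C types, not the code from the patch; the
demo_* names are hypothetical.  The mask of the low four bits matches the
flag bits of a PCI memory BAR.

```c
#include <stdint.h>

/* Round size up to the next power of two using only 64-bit arithmetic. */
static uint64_t demo_size_roundup(uint64_t size)
{
    uint64_t p = 1;

    while (p < size && p != 0) {
        p <<= 1;   /* wraps to 0 only if size exceeds 2^63 */
    }
    return p;
}

/* Decode the size of a 64-bit memory BAR from the two values read back
 * after writing all-ones to both halves.  The low four bits of the lower
 * half are flag bits, not address bits, and are masked out. */
static uint64_t demo_bar64_size(uint32_t lo_readback, uint32_t hi_readback)
{
    uint64_t sz = ((uint64_t)hi_readback << 32) | lo_readback;
    uint64_t mask = ~(uint64_t)0xf;

    return ~(sz & mask) + 1;
}
```

For an 8GB prefetchable 64-bit BAR the readback is 0x0000000c in the lower
half and 0xfffffffe in the upper half, which decodes to 0x200000000.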

Signed-off-by: Alexey Korolev 
---
 src/pciinit.c |  132 +++--
 1 files changed, 91 insertions(+), 41 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index 2e5416c..dbfa4f2 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -51,33 +51,30 @@ struct pci_region {
 u64 base;
 };
 
-static int pci_size_to_index(u32 size, enum pci_region_type type)
-{
-int index = __fls(size);
-int shift = (type == PCI_REGION_TYPE_IO) ?
-PCI_IO_INDEX_SHIFT : PCI_MEM_INDEX_SHIFT;
-
-if (index < shift)
-index = shift;
-index -= shift;
-return index;
-}
+#define foreach_region_entry(R, ENTRY)\
+for (ENTRY = (R)->list; ENTRY; ENTRY = ENTRY->next)
 
-static u32 pci_index_to_size(int index, enum pci_region_type type)
-{
-int shift = (type == PCI_REGION_TYPE_IO) ?
-PCI_IO_INDEX_SHIFT : PCI_MEM_INDEX_SHIFT;
+#define foreach_region_entry_safe(R, N, ENTRY)\
+for (ENTRY = (R)->list; ENTRY && ({N=ENTRY->next; 1;}); \
+ENTRY = N)
 
-return 0x1 << (index + shift);
+static inline void region_entry_del(struct pci_region_entry *entry)
+{
+struct pci_region_entry *next = entry->next;
+*entry->pprev = next;
+if (next)
+next->pprev = entry->pprev;
 }
 
-static enum pci_region_type pci_addr_to_type(u32 addr)
+static inline void
+region_entry_add(struct pci_region *r, struct pci_region_entry *entry)
 {
-if (addr & PCI_BASE_ADDRESS_SPACE_IO)
-return PCI_REGION_TYPE_IO;
-if (addr & PCI_BASE_ADDRESS_MEM_PREFETCH)
-return PCI_REGION_TYPE_PREFMEM;
-return PCI_REGION_TYPE_MEM;
+struct pci_region_entry *first = r->list;
+entry->next = first;
+if (first)
+first->pprev = &entry->next;
+r->list = entry;
+entry->pprev = &r->list;
 }
 
 static u32 pci_bar(struct pci_device *pci, int region_num)
@@ -324,38 +321,91 @@ pci_bios_init_bus(void)
 pci_bios_init_bus_rec(0 /* host bus */, &pci_bus);
 }
 
-
 /*
  * Bus sizing
  */
 
-static u32 pci_size_roundup(u32 size)
+static u64 pci_size_roundup(u64 size)
 {
-int index = __fls(size-1)+1;
-return 0x1 << index;
+int index = __fls((u32)((size - 1) >> 32));
+if (!index)
+   index = __fls((u32)(size - 1));
+return 0x1 << (index + 1);
 }
 
-static void
-pci_bios_get_bar(struct pci_device *pci, int bar, u32 *val, u32 *size)
+static u64
+pci_get_bar_size(struct pci_device *pci, int bar,
+  enum pci_region_type *type, int *is64bit)
 {
 u32 ofs = pci_bar(pci, bar);
 u16 bdf = pci->bdf;
-u32 old = pci_config_readl(bdf, ofs);
-u32 mask;
-
-if (bar == PCI_ROM_SLOT) {
-mask = PCI_ROM_ADDRESS_MASK;
-pci_config_writel(bdf, ofs, mask);
+u32 l, sz, mask;
+
+mask = (bar == PCI_ROM_SLOT) ? PCI_ROM_ADDRESS_MASK : ~0;
+l = pci_config_readl(bdf, ofs);
+pci_config_writel(bdf, ofs, mask);
+sz = pci_config_readl(bdf, ofs);
+pci_config_writel(bdf, ofs, l);
+
+*is64bit = 0;
+if (l & PCI_BASE_ADDRESS_SPACE_IO) {
+mask = PCI_BASE_ADDRESS_IO_MASK;
+*type = PCI_REGION_TYPE_IO;
 } else {
-if (old & PCI_BASE_ADDRESS_SPACE_IO)
-mask = PCI_BASE_ADDRESS_IO_MASK;
+mask = PCI_BASE_ADDRESS_MEM_MASK;
+if (l & PCI_BASE_ADDRESS_MEM_TYPE_64)
+*is64bit = 1;
+if (l & PCI_BASE_ADDRESS_MEM_PREFETCH)
+*type = PCI_REGION_TYPE_PREFMEM;
 else
-mask = PCI_BASE_ADDRESS_MEM_MASK;
-pci_config_writel(bdf, ofs, ~0);
+*type = PCI_REGION_TYPE_MEM;
+}
+if (*is64bit) {
+u64 mask64, sz64 = sz;
+l = pci_config_readl(bdf, ofs + 4);
+pci_config_writel(bdf, ofs + 4, ~0);
+sz = pci_config_readl(bdf, ofs + 4);
+pci_config_writel(bdf, ofs + 4, l);
+mask64 = mask | ((u64)0xffffffff << 32);
+sz64 |= ((u64)sz << 32);
+return (~(sz64 & mask64)) + 1;
+}
+return (u32)((~(sz & mask)) + 1);
+}
+
+static u64 pci_region_max_size(struct pci_region *r)
+{
+u64 max = 0;
+struct pci_region_entry *entry;
+foreach_region_entry(r, entry) {
+max = (max > entry->size) ? max : entry->size;
+}
+return max;
+}
+
+static u64 pci_region_sum(struct pci_region *r)
+{
+u64 sum = 0;
+struct pci_region_entry *entry;
+foreach_region_entry(r, entry) {
+sum += entry->size;
 }
-*val = pci_config_readl(bdf, ofs);
-pci_config_writel(bdf, ofs, old);
-*size = (~(*val & mask)) + 1;
+return sum;
+}
+
+static void
+dump

Re: [Qemu-devel] [RFC][PATCH 03/14 v7] target-i386: implement cpu_get_memory_mapping()

2012-02-29 Thread HATAYAMA Daisuke

From: Wen Congyang 
Subject: [RFC][PATCH 03/14 v7] target-i386: implement cpu_get_memory_mapping()
Date: Thu, 01 Mar 2012 10:41:47 +0800

> +int cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env)
> +{
> +if (env->cr[4] & CR4_PAE_MASK) {
> +#ifdef TARGET_X86_64
> +if (env->hflags & HF_LMA_MASK) {
> +target_phys_addr_t pml4e_addr;
> +
> +pml4e_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
> +walk_pml4e(list, pml4e_addr, env->a20_mask);
> +} else
> +#endif
> +{
> +target_phys_addr_t pdpe_addr;
> +
> +pdpe_addr = (env->cr[3] & ~0x1f) & env->a20_mask;
> +walk_pdpe2(list, pdpe_addr, env->a20_mask);
> +}
> +} else {
> +target_phys_addr_t pde_addr;
> +bool pse;
> +
> +pde_addr = (env->cr[3] & ~0xfff) & env->a20_mask;
> +pse = !!(env->cr[4] & CR4_PSE_MASK);
> +walk_pde2(list, pde_addr, env->a20_mask, pse);
> +}
> +
> +return 0;
> +}

Does this assume that paging is enabled? I don't know qemu very well, but
the qemu dump command runs externally to the guest machine, so I think the
machine could be in a state with paging disabled, where CR4 doesn't describe
the page tables as expected.
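
To make the concern concrete, here is a minimal sketch (not code from the
patch). It assumes a simplified stand-in for CPUState, the hypothetical
struct demo_cpu, and tests the architectural CR0.PG bit (bit 31) before
trusting CR4.PAE (bit 5). All demo_* and DEMO_* names are made up for
illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural x86 control-register bits. */
#define DEMO_CR0_PG  (1u << 31)  /* CR0.PG: paging enabled */
#define DEMO_CR4_PAE (1u << 5)   /* CR4.PAE: physical address extension */

/* Hypothetical stand-in for the CPUState fields used here. */
struct demo_cpu {
    uint32_t cr0;
    uint32_t cr4;
    int long_mode_active;        /* stands in for hflags & HF_LMA_MASK */
};

enum demo_paging {
    DEMO_PAGING_OFF,             /* no page tables to walk */
    DEMO_PAGING_32BIT,
    DEMO_PAGING_PAE,
    DEMO_PAGING_LONG
};

/* Decide which page-table walker applies, checking CR0.PG first. */
static enum demo_paging demo_get_paging_mode(const struct demo_cpu *cpu)
{
    if (!(cpu->cr0 & DEMO_CR0_PG))
        return DEMO_PAGING_OFF;
    if (!(cpu->cr4 & DEMO_CR4_PAE))
        return DEMO_PAGING_32BIT;
    return cpu->long_mode_active ? DEMO_PAGING_LONG : DEMO_PAGING_PAE;
}
```

With a check like this, the paging-disabled case could be handled separately
(for example by returning an empty mapping list) instead of walking whatever
CR3 happens to contain.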

Thanks.
HATAYAMA, Daisuke

[Qemu-devel] [Seabios] [PATCH 1/6] Adding new structures

2012-02-29 Thread Alexey Korolev

This patch introduces two structures in place of the old pci_bus structure.
The pci_region structure describes one bus region. It holds a list
of pci_region_entry structures and a base address.
The number of pci_region structures is:
PCI_REGION_TYPE_COUNT * number of busses.
Two extra regions can be added if we need 64bit address ranges.

The pci_region_entry structure describes a PCI BAR resource or a downstream
PCI region (bridge region).
Each entry is assigned to a particular pci_region. If the entry describes a
bridge region, it provides a pci_region to the downstream devices.
With this in place we can easily build the topology and migrate entries if
necessary.
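
As a rough illustration of how entries hang off a region and can be
migrated, here is a self-contained sketch. It uses cut-down, hypothetical
demo_* structures rather than the real ones from pciinit.c, and the
demo_entry_migrate() helper is an assumption made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_region;

/* Cut-down stand-in for pci_region_entry. */
struct demo_region_entry {
    uint64_t size;
    struct demo_region *parent_region;
    struct demo_region_entry *next;
    struct demo_region_entry **pprev;   /* address of the pointer to us */
};

/* Cut-down stand-in for pci_region. */
struct demo_region {
    struct demo_region_entry *list;
    uint64_t base;
};

/* Insert an entry at the head of a region's list. */
static void demo_entry_add(struct demo_region *r, struct demo_region_entry *e)
{
    struct demo_region_entry *first = r->list;
    e->next = first;
    if (first)
        first->pprev = &e->next;
    r->list = e;
    e->pprev = &r->list;
    e->parent_region = r;
}

/* Unlink an entry from whichever list it is on. */
static void demo_entry_del(struct demo_region_entry *e)
{
    struct demo_region_entry *next = e->next;
    *e->pprev = next;
    if (next)
        next->pprev = e->pprev;
    e->next = NULL;
    e->pprev = NULL;
    e->parent_region = NULL;
}

/* Move an entry to another region, e.g. from a 32bit to a 64bit one. */
static void demo_entry_migrate(struct demo_region *to,
                               struct demo_region_entry *e)
{
    demo_entry_del(e);
    demo_entry_add(to, e);
}

/* Total size of all entries currently in the region. */
static uint64_t demo_region_sum(const struct demo_region *r)
{
    uint64_t sum = 0;
    for (const struct demo_region_entry *e = r->list; e; e = e->next)
        sum += e->size;
    return sum;
}
```

Because each entry stores pprev, removing it does not require walking the
list, which keeps migration cheap.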


Signed-off-by: Alexey Korolev  

---
 src/pci.h |6 --
 src/pciinit.c |   37 ++---
 2 files changed, 22 insertions(+), 21 deletions(-)

diff --git a/src/pci.h b/src/pci.h
index a2a5a4c..8fa064f 100644
--- a/src/pci.h
+++ b/src/pci.h
@@ -51,12 +51,6 @@ struct pci_device {
 u8 prog_if, revision;
 u8 header_type;
 u8 secondary_bus;
-struct {
-u32 addr;
-u32 size;
-int is64;
-} bars[PCI_NUM_REGIONS];
-
 // Local information on device.
 int have_driver;
 };
diff --git a/src/pciinit.c b/src/pciinit.c
index 9f3fdd4..2e5416c 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -31,18 +31,24 @@ static const char *region_type_name[] = {
 [ PCI_REGION_TYPE_PREFMEM ] = "prefmem",
 };
 
-struct pci_bus {
-struct {
-/* pci region stats */
-u32 count[32 - PCI_MEM_INDEX_SHIFT];
-u32 sum, max;
-/* seconday bus region sizes */
-u32 size;
-/* pci region assignments */
-u32 bases[32 - PCI_MEM_INDEX_SHIFT];
-u32 base;
-} r[PCI_REGION_TYPE_COUNT];
-struct pci_device *bus_dev;
+struct pci_region;
+struct pci_region_entry {
+struct pci_device *dev;
+int bar;
+u64 base;
+u64 size;
+int is64bit;
+enum pci_region_type type;
+struct pci_region *this_region;
+struct pci_region *parent_region;
+struct pci_region_entry *next;
+struct pci_region_entry **pprev;
+};
+
+struct pci_region {
+struct pci_region_entry *list;
+struct pci_region_entry *this_entry;
+u64 base;
 };
 
 static int pci_size_to_index(u32 size, enum pci_region_type type)
@@ -582,12 +588,13 @@ pci_setup(void)
 pci_probe_devices();
 
 dprintf(1, "=== PCI new allocation pass #1 ===\n");
-struct pci_bus *busses = malloc_tmp(sizeof(*busses) * (MaxPCIBus + 1));
-if (!busses) {
+int num_regions = (MaxPCIBus + 1) * PCI_REGION_TYPE_COUNT;
+struct pci_region *regions = malloc_tmp(sizeof(*regions) * num_regions);
+if (!regions) {
 warn_noalloc();
 return;
 }
-memset(busses, 0, sizeof(*busses) * (MaxPCIBus + 1));
+memset(regions, 0, sizeof(*regions) * num_regions);
 pci_bios_check_devices(busses);
 if (pci_bios_init_root_regions(&busses[0], start, end) != 0) {
 panic("PCI: out of address space\n");
-- 
1.7.5.4

[Qemu-devel] [PATCH 12/13] iommu: Allow PCI to use IOMMU infrastructure

2012-02-29 Thread David Gibson

This patch adds some hooks to let PCI devices and busses use the new IOMMU
infrastructure.  When IOMMU support is enabled, each PCI device now
contains a DMAContext * which is used by the pci_dma_*() wrapper functions.

By default, the contexts are initialized to NULL, assuming no IOMMU.
However the platform or host bridge code which sets up the PCI bus can use
pci_setup_iommu() to set a function which will determine the correct
DMAContext for a given PCI device.
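
A minimal sketch of how the hook is meant to be used follows. It is
self-contained and uses simplified, hypothetical Demo* types in place of
PCIBus, PCIDevice and DMAContext; demo_shared_context() is an invented
example callback, not something provided by this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the QEMU types. */
typedef struct DemoDMAContext { int id; } DemoDMAContext;
typedef struct DemoPCIBus DemoPCIBus;
typedef DemoDMAContext *(*DemoDMAContextFunc)(DemoPCIBus *, void *, int);

struct DemoPCIBus {
    DemoDMAContextFunc dma_context_fn;
    void *dma_context_opaque;
};

typedef struct DemoPCIDevice {
    DemoPCIBus *bus;
    int devfn;
    DemoDMAContext *dma;
} DemoPCIDevice;

/* Mirrors pci_setup_iommu(): remember the callback and its argument. */
static void demo_setup_iommu(DemoPCIBus *bus, DemoDMAContextFunc fn,
                             void *opaque)
{
    bus->dma_context_fn = fn;
    bus->dma_context_opaque = opaque;
}

/* Mirrors the hunk in do_pci_register_device(): NULL means no IOMMU. */
static void demo_register_device(DemoPCIDevice *dev, DemoPCIBus *bus,
                                 int devfn)
{
    dev->bus = bus;
    dev->devfn = devfn;
    dev->dma = NULL;
    if (bus->dma_context_fn)
        dev->dma = bus->dma_context_fn(bus, bus->dma_context_opaque, devfn);
}

/* Example callback: every device on the bus shares one context. */
static DemoDMAContext *demo_shared_context(DemoPCIBus *bus, void *opaque,
                                           int devfn)
{
    (void)bus;
    (void)devfn;
    return opaque;
}
```

A host bridge with per-device translation would instead look up the context
from devfn inside its callback.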

Cc: Michael S. Tsirkin 
Cc: Richard Henderson 

Signed-off-by: David Gibson 
Signed-off-by: Eduard - Gabriel Munteanu 
---
 hw/pci.c   |   13 +
 hw/pci.h   |   13 -
 hw/pci_internals.h |4 
 3 files changed, 29 insertions(+), 1 deletions(-)

diff --git a/hw/pci.c b/hw/pci.c
index bf046bf..1f54c45 100644
--- a/hw/pci.c
+++ b/hw/pci.c
@@ -761,6 +761,11 @@ static PCIDevice *do_pci_register_device(PCIDevice 
*pci_dev, PCIBus *bus,
 return NULL;
 }
 pci_dev->bus = bus;
+#ifdef CONFIG_IOMMU
+if (bus->dma_context_fn) {
+pci_dev->dma = bus->dma_context_fn(bus, bus->dma_context_opaque, 
devfn);
+}
+#endif
 pci_dev->devfn = devfn;
 pstrcpy(pci_dev->name, sizeof(pci_dev->name), name);
 pci_dev->irq_state = 0;
@@ -2005,6 +2010,14 @@ static void pci_device_class_init(ObjectClass *klass, 
void *data)
 k->bus_info = &pci_bus_info;
 }
 
+#ifdef CONFIG_IOMMU
+void pci_setup_iommu(PCIBus *bus, PCIDMAContextFunc fn, void *opaque)
+{
+bus->dma_context_fn = fn;
+bus->dma_context_opaque = opaque;
+}
+#endif
+
 static TypeInfo pci_device_type_info = {
 .name = TYPE_PCI_DEVICE,
 .parent = TYPE_DEVICE,
diff --git a/hw/pci.h b/hw/pci.h
index 41dcd05..1273dc3 100644
--- a/hw/pci.h
+++ b/hw/pci.h
@@ -170,6 +170,7 @@ typedef struct PCIDeviceClass {
 
 struct PCIDevice {
 DeviceState qdev;
+
 /* PCI config space */
 uint8_t *config;
 
@@ -191,6 +192,9 @@ struct PCIDevice {
 uint32_t devfn;
 char name[64];
 PCIIORegion io_regions[PCI_NUM_REGIONS];
+#ifdef CONFIG_IOMMU
+DMAContext *dma;
+#endif
 
 /* do not access the following fields */
 PCIConfigReadFunc *config_read;
@@ -311,6 +315,10 @@ int pci_read_devaddr(Monitor *mon, const char *addr, int 
*domp, int *busp,
 
 void pci_device_deassert_intx(PCIDevice *dev);
 
+typedef DMAContext *(*PCIDMAContextFunc)(PCIBus *, void *, int);
+
+void pci_setup_iommu(PCIBus *bus, PCIDMAContextFunc fn, void *opaque);
+
 static inline void
 pci_set_byte(uint8_t *config, uint8_t val)
 {
@@ -547,8 +555,11 @@ static inline uint32_t pci_config_size(const PCIDevice *d)
 /* DMA access functions */
 static inline DMAContext *pci_dma_context(PCIDevice *dev)
 {
-/* Stub for when we have no PCI iommu support */
+#ifdef CONFIG_IOMMU
+return dev->dma;
+#else
 return NULL;
+#endif
 }
 
 static inline int pci_dma_rw(PCIDevice *dev, dma_addr_t addr,
diff --git a/hw/pci_internals.h b/hw/pci_internals.h
index 96690b7..b6b7a0e 100644
--- a/hw/pci_internals.h
+++ b/hw/pci_internals.h
@@ -16,6 +16,10 @@ extern struct BusInfo pci_bus_info;
 
 struct PCIBus {
 BusState qbus;
+#ifdef CONFIG_IOMMU
+PCIDMAContextFunc dma_context_fn;
+void *dma_context_opaque;
+#endif
 uint8_t devfn_min;
 pci_set_irq_fn set_irq;
 pci_map_irq_fn map_irq;
-- 
1.7.9

Re: [Qemu-devel] [RFC PATCH v4 00/14] Tracing Improvements, Simpletrace v2

2012-02-29 Thread Harsh Bora


On 02/29/2012 01:53 AM, Lluís Vilanova wrote:
> Harsh Prateek Bora writes:
>
>> This patchset introduces 2 major updates:
>> 1) Tracetool Improvements (Conversion from shell script to python)
>> 2) Simpletrace v2 log format (Support for variable args, strings)
>
> ping

Hi Lluis,
I have actually moved to another project and am therefore struggling to
find time for the last patch, which updates the log reader script. I am
planning to work on it this weekend.

BTW, I hope the tracetool conversion (shell script to Python) patches can
be reviewed/merged early, as they are independent of simpletrace v2. Stefan?


regards,
Harsh



> Lluis

[Qemu-devel] [PATCH 01/13] Use DMADirection type for dma_bdrv_io

2012-02-29 Thread David Gibson

Currently dma_bdrv_io() takes a 'to_dev' boolean parameter to
determine the direction of DMA it is emulating.  We already have a
DMADirection enum designed specifically to encode DMA directions.
This patch uses it for dma_bdrv_io() as well.  This involves removing
the DMADirection definition from the #ifdef it was inside, but since that
only existed to protect the definition of dma_addr_t from places where
config.h is not included, there wasn't any reason for it to be there in
the first place.
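
As a rough illustration of the mapping (a sketch, not part of the patch):
the enum below is the one from dma.h, while demo_is_write() and
demo_dir_from_to_dev() are hypothetical helper names showing how the old
boolean and the is_write flag of cpu_physical_memory_map() relate to it.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum {
    DMA_DIRECTION_TO_DEVICE = 0,
    DMA_DIRECTION_FROM_DEVICE = 1,
} DMADirection;

/*
 * Guest memory is written when data flows from the device into memory,
 * so this is the replacement for the old "!to_dev" expression.
 */
static bool demo_is_write(DMADirection dir)
{
    return dir != DMA_DIRECTION_TO_DEVICE;
}

/* Translate the old boolean parameter into the enum. */
static DMADirection demo_dir_from_to_dev(bool to_dev)
{
    return to_dev ? DMA_DIRECTION_TO_DEVICE : DMA_DIRECTION_FROM_DEVICE;
}
```

So a disk read (dma_bdrv_read) is DMA_DIRECTION_FROM_DEVICE and maps guest
memory for writing, and a disk write is the reverse.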

Cc: Kevin Wolf 

Signed-off-by: David Gibson 
---
 dma-helpers.c  |   20 
 dma.h  |   12 ++--
 hw/ide/core.c  |3 ++-
 hw/ide/macio.c |3 ++-
 4 files changed, 22 insertions(+), 16 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index c29ea6d..5f19a85 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -42,7 +42,7 @@ typedef struct {
 BlockDriverAIOCB *acb;
 QEMUSGList *sg;
 uint64_t sector_num;
-bool to_dev;
+DMADirection dir;
 bool in_cancel;
 int sg_cur_index;
 dma_addr_t sg_cur_byte;
@@ -76,7 +76,8 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
 
 for (i = 0; i < dbs->iov.niov; ++i) {
 cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
-  dbs->iov.iov[i].iov_len, !dbs->to_dev,
+  dbs->iov.iov[i].iov_len,
+  dbs->dir != DMA_DIRECTION_TO_DEVICE,
   dbs->iov.iov[i].iov_len);
 }
 qemu_iovec_reset(&dbs->iov);
@@ -123,7 +124,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
 while (dbs->sg_cur_index < dbs->sg->nsg) {
 cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
 cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-mem = cpu_physical_memory_map(cur_addr, &cur_len, !dbs->to_dev);
+mem = cpu_physical_memory_map(cur_addr, &cur_len,
+  dbs->dir != DMA_DIRECTION_TO_DEVICE);
 if (!mem)
 break;
 qemu_iovec_add(&dbs->iov, mem, cur_len);
@@ -170,11 +172,11 @@ static AIOPool dma_aio_pool = {
 BlockDriverAIOCB *dma_bdrv_io(
 BlockDriverState *bs, QEMUSGList *sg, uint64_t sector_num,
 DMAIOFunc *io_func, BlockDriverCompletionFunc *cb,
-void *opaque, bool to_dev)
+void *opaque, DMADirection dir)
 {
 DMAAIOCB *dbs = qemu_aio_get(&dma_aio_pool, bs, cb, opaque);
 
-trace_dma_bdrv_io(dbs, bs, sector_num, to_dev);
+trace_dma_bdrv_io(dbs, bs, sector_num, (dir == DMA_DIRECTION_TO_DEVICE));
 
 dbs->acb = NULL;
 dbs->bs = bs;
@@ -182,7 +184,7 @@ BlockDriverAIOCB *dma_bdrv_io(
 dbs->sector_num = sector_num;
 dbs->sg_cur_index = 0;
 dbs->sg_cur_byte = 0;
-dbs->to_dev = to_dev;
+dbs->dir = dir;
 dbs->io_func = io_func;
 dbs->bh = NULL;
 qemu_iovec_init(&dbs->iov, sg->nsg);
@@ -195,14 +197,16 @@ BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
 QEMUSGList *sg, uint64_t sector,
 void (*cb)(void *opaque, int ret), void 
*opaque)
 {
-return dma_bdrv_io(bs, sg, sector, bdrv_aio_readv, cb, opaque, false);
+return dma_bdrv_io(bs, sg, sector, bdrv_aio_readv, cb, opaque,
+   DMA_DIRECTION_FROM_DEVICE);
 }
 
 BlockDriverAIOCB *dma_bdrv_write(BlockDriverState *bs,
  QEMUSGList *sg, uint64_t sector,
  void (*cb)(void *opaque, int ret), void 
*opaque)
 {
-return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque, true);
+return dma_bdrv_io(bs, sg, sector, bdrv_aio_writev, cb, opaque,
+   DMA_DIRECTION_TO_DEVICE);
 }
 
 
diff --git a/dma.h b/dma.h
index 20e86d2..05ac325 100644
--- a/dma.h
+++ b/dma.h
@@ -17,6 +17,11 @@
 
 typedef struct ScatterGatherEntry ScatterGatherEntry;
 
+typedef enum {
+DMA_DIRECTION_TO_DEVICE = 0,
+DMA_DIRECTION_FROM_DEVICE = 1,
+} DMADirection;
+
 struct QEMUSGList {
 ScatterGatherEntry *sg;
 int nsg;
@@ -29,11 +34,6 @@ typedef target_phys_addr_t dma_addr_t;
 
 #define DMA_ADDR_FMT TARGET_FMT_plx
 
-typedef enum {
-DMA_DIRECTION_TO_DEVICE = 0,
-DMA_DIRECTION_FROM_DEVICE = 1,
-} DMADirection;
-
 struct ScatterGatherEntry {
 dma_addr_t base;
 dma_addr_t len;
@@ -51,7 +51,7 @@ typedef BlockDriverAIOCB *DMAIOFunc(BlockDriverState *bs, 
int64_t sector_num,
 BlockDriverAIOCB *dma_bdrv_io(BlockDriverState *bs,
   QEMUSGList *sg, uint64_t sector_num,
   DMAIOFunc *io_func, BlockDriverCompletionFunc 
*cb,
-  void *opaque, bool to_dev);
+  void *opaque, DMADirection dir);
 BlockDriverAIOCB *dma_bdrv_read(BlockDriverState *bs,
 QEMUSGList *sg, uint64_t sector,
 BlockDriverCompletionFunc *cb, void *opaque);
diff --git a/hw/

Re: [Qemu-devel] [RFC][PATCH 04/14 v7] Add API to get memory mapping

2012-02-29 Thread HATAYAMA Daisuke

From: Wen Congyang 
Subject: [RFC][PATCH 04/14 v7] Add API to get memory mapping
Date: Thu, 01 Mar 2012 10:43:13 +0800

> +int qemu_get_guest_memory_mapping(MemoryMappingList *list)
> +{
> +CPUState *env;
> +MemoryMapping *memory_mapping;
> +RAMBlock *block;
> +ram_addr_t offset, length;
> +int ret;
> +
> +#if defined(CONFIG_HAVE_GET_MEMORY_MAPPING)
> +for (env = first_cpu; env != NULL; env = env->next_cpu) {
> +ret = cpu_get_memory_mapping(list, env);
> +if (ret < 0) {
> +return -1;
> +}
> +}
> +#else
> +return -2;
> +#endif
> +
> +/* some memory may be not mapped, add them into memory mapping's list */

Is the part from here on logic purely for the 2nd kernel? If so, I think it
would be better to describe why this additional mapping is needed; we should
assume most people don't know the kdump mechanism.

> +QLIST_FOREACH(block, &ram_list.blocks, next) {
> +offset = block->offset;
> +length = block->length;
> +
> +QTAILQ_FOREACH(memory_mapping, &list->head, next) {
> +if (memory_mapping->phys_addr >= (offset + length)) {
> +/*
> + * memory_mapping's list does not conatin the region
> + * [offset, offset+length)
> + */
> +create_new_memory_mapping(list, offset, 0, length);
> +length = 0;
> +break;
> +}
> +
> +if ((memory_mapping->phys_addr + memory_mapping->length) <=
> +offset) {
> +continue;
> +}
> +
> +if (memory_mapping->phys_addr > offset) {
> +/*
> + * memory_mapping's list does not conatin the region
> + * [offset, memory_mapping->phys_addr)
> + */
> +create_new_memory_mapping(list, offset, 0,
> +  memory_mapping->phys_addr - 
> offset);
> +}
> +
> +if ((offset + length) <=
> +(memory_mapping->phys_addr + memory_mapping->length)) {
> +length = 0;
> +break;
> +}
> +length -= memory_mapping->phys_addr + memory_mapping->length -
> +  offset;
> +offset = memory_mapping->phys_addr + memory_mapping->length;
> +}
> +
> +if (length > 0) {
> +/*
> + * memory_mapping's list does not conatin the region
> + * [offset, memory_mapping->phys_addr)
> + */
> +create_new_memory_mapping(list, offset, 0, length);
> +}
> +}
> +
> +return 0;
> +}

I think it would be more readable to shorten memory_mapping->phys_addr and
memory_mapping->length at the beginning of the innermost foreach loop:

  m_phys_addr = memory_mapping->phys_addr;
  m_length = memory_mapping->length;

Then each conditional becomes compact.
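
For example, here is a self-contained sketch of the same walk with
shortened locals. It is not the patch code: it uses a plain sorted array of
hypothetical struct demo_range items instead of the QTAILQ list, and it
records the gaps in an output array instead of calling
create_new_memory_mapping().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_range {
    uint64_t start;
    uint64_t length;
};

/* Append one gap to the output array. */
static size_t demo_add_gap(struct demo_range *gaps, size_t ngaps,
                           size_t max_gaps, uint64_t start, uint64_t length)
{
    assert(ngaps < max_gaps);
    gaps[ngaps].start = start;
    gaps[ngaps].length = length;
    return ngaps + 1;
}

/*
 * Find the parts of [offset, offset + length) that are not covered by
 * maps[], which must be sorted by start and non-overlapping.
 * Returns the number of gaps written to gaps[].
 */
static size_t demo_find_gaps(const struct demo_range *maps, size_t nmaps,
                             uint64_t offset, uint64_t length,
                             struct demo_range *gaps, size_t max_gaps)
{
    size_t ngaps = 0;

    for (size_t i = 0; i < nmaps && length > 0; i++) {
        uint64_t m_start = maps[i].start;
        uint64_t m_end = m_start + maps[i].length;

        if (m_start >= offset + length)
            break;                      /* mapping lies beyond the block */
        if (m_end <= offset)
            continue;                   /* mapping lies before the block */
        if (m_start > offset)           /* hole in front of this mapping */
            ngaps = demo_add_gap(gaps, ngaps, max_gaps,
                                 offset, m_start - offset);
        if (offset + length <= m_end) {
            length = 0;                 /* rest of the block is covered */
            break;
        }
        length -= m_end - offset;
        offset = m_end;
    }
    if (length > 0)                     /* tail not covered by any mapping */
        ngaps = demo_add_gap(gaps, ngaps, max_gaps, offset, length);
    return ngaps;
}
```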

Thanks.
HATAYAMA, Daisuke

[Qemu-devel] [PATCH 07/13] iommu: Make sglists and dma_bdrv helpers use new universal DMA helpers

2012-02-29 Thread David Gibson

dma-helpers.c contains a number of helper functions for doing
scatter/gather DMA, and various block device related DMA.  Currently,
these directly access guest memory using cpu_physical_memory_*(),
assuming no IOMMU translation.

This patch updates this code to use the new universal DMA helper
functions.  qemu_sglist_init() now takes a DMAContext * to describe
the DMA address space in which the scatter/gather will take place.

We minimally update the callers of qemu_sglist_init() to pass NULL
(i.e. no translation, same as current behaviour).  Some of those
callers should pass something else in some cases to allow proper IOMMU
translation in future, but that will be fixed in later patches.
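
For illustration only, a self-contained sketch of a scatter/gather list that
carries its DMA context. The Demo* types and demo_* functions are
hypothetical stand-ins that use plain malloc() rather than the glib
allocators used in dma-helpers.c, and the growth policy is an assumption of
this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct DemoDMAContext DemoDMAContext;   /* opaque; NULL = no IOMMU */
typedef uint64_t demo_dma_addr_t;

typedef struct {
    demo_dma_addr_t base;
    demo_dma_addr_t len;
} DemoSGEntry;

typedef struct {
    DemoSGEntry *sg;
    int nsg;
    int nalloc;
    size_t size;
    DemoDMAContext *dma;    /* address space the entries live in */
} DemoSGList;

static void demo_sglist_init(DemoSGList *qsg, int alloc_hint,
                             DemoDMAContext *dma)
{
    if (alloc_hint < 1)
        alloc_hint = 1;                 /* avoid a zero-sized allocation */
    qsg->sg = malloc((size_t)alloc_hint * sizeof(DemoSGEntry));
    if (!qsg->sg)
        abort();
    qsg->nsg = 0;
    qsg->nalloc = alloc_hint;
    qsg->size = 0;
    qsg->dma = dma;
}

static void demo_sglist_add(DemoSGList *qsg, demo_dma_addr_t base,
                            demo_dma_addr_t len)
{
    if (qsg->nsg == qsg->nalloc) {      /* grow when the array is full */
        int nalloc = 2 * qsg->nalloc + 1;
        DemoSGEntry *sg = realloc(qsg->sg,
                                  (size_t)nalloc * sizeof(DemoSGEntry));
        if (!sg)
            abort();
        qsg->sg = sg;
        qsg->nalloc = nalloc;
    }
    qsg->sg[qsg->nsg].base = base;
    qsg->sg[qsg->nsg].len = len;
    qsg->size += len;
    qsg->nsg++;
}

static void demo_sglist_destroy(DemoSGList *qsg)
{
    free(qsg->sg);
    qsg->sg = NULL;
    qsg->nsg = 0;
    qsg->nalloc = 0;
}
```

Storing the context in the list means the map and unmap paths can find it
through the list itself, without an extra parameter on every helper.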

Cc: Kevin Wolf 
Cc: Michael S. Tsirkin 

Signed-off-by: David Gibson 
---
 dma-helpers.c  |   26 ++
 dma.h  |3 ++-
 hw/ide/ahci.c  |3 ++-
 hw/ide/macio.c |4 ++--
 hw/pci.h   |2 +-
 5 files changed, 25 insertions(+), 13 deletions(-)

diff --git a/dma-helpers.c b/dma-helpers.c
index 5f19a85..9dcfb2c 100644
--- a/dma-helpers.c
+++ b/dma-helpers.c
@@ -11,12 +11,13 @@
 #include "block_int.h"
 #include "trace.h"
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint)
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMAContext *dma)
 {
 qsg->sg = g_malloc(alloc_hint * sizeof(ScatterGatherEntry));
 qsg->nsg = 0;
 qsg->nalloc = alloc_hint;
 qsg->size = 0;
+qsg->dma = dma;
 }
 
 void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len)
@@ -75,10 +76,9 @@ static void dma_bdrv_unmap(DMAAIOCB *dbs)
 int i;
 
 for (i = 0; i < dbs->iov.niov; ++i) {
-cpu_physical_memory_unmap(dbs->iov.iov[i].iov_base,
-  dbs->iov.iov[i].iov_len,
-  dbs->dir != DMA_DIRECTION_TO_DEVICE,
-  dbs->iov.iov[i].iov_len);
+dma_memory_unmap(dbs->sg->dma, dbs->iov.iov[i].iov_base,
+ dbs->iov.iov[i].iov_len, dbs->dir,
+ dbs->iov.iov[i].iov_len);
 }
 qemu_iovec_reset(&dbs->iov);
 }
@@ -104,10 +104,20 @@ static void dma_complete(DMAAIOCB *dbs, int ret)
 }
 }
 
+static void dma_bdrv_cancel(void *opaque)
+{
+DMAAIOCB *dbs = opaque;
+
+bdrv_aio_cancel(dbs->acb);
+dma_bdrv_unmap(dbs);
+qemu_iovec_destroy(&dbs->iov);
+qemu_aio_release(dbs);
+}
+
 static void dma_bdrv_cb(void *opaque, int ret)
 {
 DMAAIOCB *dbs = (DMAAIOCB *)opaque;
-target_phys_addr_t cur_addr, cur_len;
+dma_addr_t cur_addr, cur_len;
 void *mem;
 
 trace_dma_bdrv_cb(dbs, ret);
@@ -124,8 +134,8 @@ static void dma_bdrv_cb(void *opaque, int ret)
 while (dbs->sg_cur_index < dbs->sg->nsg) {
 cur_addr = dbs->sg->sg[dbs->sg_cur_index].base + dbs->sg_cur_byte;
 cur_len = dbs->sg->sg[dbs->sg_cur_index].len - dbs->sg_cur_byte;
-mem = cpu_physical_memory_map(cur_addr, &cur_len,
-  dbs->dir != DMA_DIRECTION_TO_DEVICE);
+mem = dma_memory_map(dbs->sg->dma, dma_bdrv_cancel, dbs,
+ cur_addr, &cur_len, dbs->dir);
 if (!mem)
 break;
 qemu_iovec_add(&dbs->iov, mem, cur_len);
diff --git a/dma.h b/dma.h
index 8b6ef44..a66e3d7 100644
--- a/dma.h
+++ b/dma.h
@@ -27,6 +27,7 @@ struct QEMUSGList {
 int nsg;
 int nalloc;
 size_t size;
+DMAContext *dma;
 };
 
 #if defined(TARGET_PHYS_ADDR_BITS)
@@ -143,7 +144,7 @@ struct ScatterGatherEntry {
 dma_addr_t len;
 };
 
-void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint);
+void qemu_sglist_init(QEMUSGList *qsg, int alloc_hint, DMAContext *dma);
 void qemu_sglist_add(QEMUSGList *qsg, dma_addr_t base, dma_addr_t len);
 void qemu_sglist_destroy(QEMUSGList *qsg);
 #endif
diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 041ce1e..6a218b5 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -667,7 +667,8 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist)
 if (sglist_alloc_hint > 0) {
 AHCI_SG *tbl = (AHCI_SG *)prdt;
 
-qemu_sglist_init(sglist, sglist_alloc_hint);
+/* FIXME: pass the correct DMAContext */
+qemu_sglist_init(sglist, sglist_alloc_hint, NULL);
 for (i = 0; i < sglist_alloc_hint; i++) {
 /* flags_size is zero-based */
 qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index edcf885..568a299 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -76,7 +76,7 @@ static void pmac_ide_atapi_transfer_cb(void *opaque, int ret)
 
 s->io_buffer_size = io->len;
 
-qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1);
+qemu_sglist_init(&s->sg, io->len / MACIO_PAGE_SIZE + 1, NULL);
 qemu_sglist_add(&s->sg, io->addr, io->len);
 io->addr += io->len;
 io->len = 0;
@@ -133,7 +133,7 @@ static void pmac_ide_transfer_cb(void *opaque, int ret)
 s->io_buffer_index = 0;
 s->io_buffer_size =

[Qemu-devel] [Seabios] [PATCH 0/6] 64bit PCI BARs allocations (take 2)

2012-02-29 Thread Alexey Korolev

Hi,

This patch series enables 64bit BAR support in seabios.
It takes a somewhat different approach to resource accounting. We did this
because we wanted to:
a) Provide 64bit support for PCI BARs and for bridges with a 64bit memory
window.
b) Allow migration to 64bit ranges if we do not fit into the 32bit
range.
c) Keep the implementation simple.

There are still two main passes to enumerate resources and map
devices, but the structures have changed.
We introduced two new structures: pci_region and pci_region_entry.

The pci_region structure includes a list of pci_region_entries. Each
pci_region_entry could be a PCI bar or a downstream PCI region (bridge).
Each entry has a set of attributes: type (IO, MEM, PREFMEM), is64bit (if
address can be over 4GB), size, base address, PCI device owner, and a
pointer to the pci_region it belongs to.

In the first pass we fill the pci_regions with entries and discover the
topology.
In the second pass we try to assign memory addresses to the pci_regions. If
there is not enough space available, the 64bit entries of the root regions
are migrated to 64bit ranges and we try to assign memory addresses
again.
Then each entry of each region is mapped.
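
The fallback in the second pass can be sketched roughly as follows. This is
a simplified illustration, not the code from the series: it ignores
alignment and bridges, works on a flat array of hypothetical struct
demo_entry items, and only checks whether the total size fits into a given
32bit window.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct demo_entry {
    uint64_t size;
    int is64bit;    /* BAR may be placed above 4GB */
    int above_4g;   /* set once the entry has been migrated */
};

/* Sum the sizes of the entries that still live below 4GB. */
static uint64_t demo_sum_below_4g(const struct demo_entry *e, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        if (!e[i].above_4g)
            sum += e[i].size;
    return sum;
}

/*
 * Try to fit everything into the 32bit window. If that fails, migrate
 * the 64bit capable entries and try again.
 * Returns 0 on success, -1 if the remaining entries still do not fit.
 */
static int demo_assign(struct demo_entry *e, size_t n, uint64_t window)
{
    if (demo_sum_below_4g(e, n) <= window)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (e[i].is64bit)
            e[i].above_4g = 1;
    return demo_sum_below_4g(e, n) <= window ? 0 : -1;
}
```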


The patch series includes 6 patches.
In the 1st patch we introduce the new structures.

In the 2nd patch we introduce support functions for basic hlist
operations, and modify service functions to support 64bit address
ranges.
Note: I've seen similar hlist operations in the post memory manager
and in the stack location operations; it makes sense to move
them to a header file.

In the 3rd patch we add a new function that fills the pci_region structures
with entries and discovers the topology.

In the 4th patch we define the address range for each pci_region structure,
migrate entries to the 64bit address range if necessary, and program the PCI
BAR addresses and bridge regions.

In the 5th patch we remove the old code.

The last patch was proposed by Michael Tsirkin; it contains the changes to
the acpi-dsdt.dsl file that are necessary to support 64bit BARs in Windows.

 src/acpi-dsdt.dsl |7 +
 src/acpi-dsdt.hex |   72 ++--
 src/config.h  |2 +
 src/pci.h |6 -
 src/pciinit.c |  509 +---
 5 files changed, 352 insertions(+), 244 deletions(-)


Note:
At the moment there are three issues related to 64bit BAR support in
qemu (head of master branch). It's very likely they will be fixed in the
next qemu release.

The 1st one is described here (this issue causes problems if a 64bit BAR
is mapped below 4GB and the Linux guest OS version is < 2.6.27):
http://www.mail-archive.com/qemu-devel@nongnu.org/msg94522.html

The 2nd one is just a typo in the i440fx init code (this issue causes
problems when accessing 64bit PCI memory: the memory will
be inaccessible):
http://www.mail-archive.com/qemu-devel@nongnu.org/msg99423.html

The 3rd issue concerns HUGE PCI BARs, where the BAR size is 4GB
or more. Qemu for some reason reports zero size in this case. The new
seabios should handle huge BARs well.

I've sent patches for the first two issues. If they are all applied,
all problems except the "huge BARs issue" should be gone.

[Qemu-devel] [PATCH 03/13] usb-xhci: Use PCI DMA helper functions

2012-02-29 Thread David Gibson

Shortly before 1.0, we added helper functions / wrappers for doing PCI DMA
from individual devices.  This makes what's going on clearer and means that
when we add IOMMU support somewhere in the future, only the general PCI
code will have to change, not every device that uses PCI DMA.

However, usb-xhci is not using these wrappers, despite being a PCI only
device.  This patch remedies the situation, using the pci dma functions
instead of direct calls to cpu_physical_memory_{read,write}().  Likewise
address parameters for DMA are changed to dma_addr_t instead of
target_phys_addr_t.

Cc: Gerd Hoffman 

Signed-off-by: David Gibson 
---
 hw/usb-xhci.c |  176 +++-
 1 files changed, 85 insertions(+), 91 deletions(-)

diff --git a/hw/usb-xhci.c b/hw/usb-xhci.c
index fc5b542..f0ffa8f 100644
--- a/hw/usb-xhci.c
+++ b/hw/usb-xhci.c
@@ -22,7 +22,6 @@
 #include "qemu-timer.h"
 #include "usb.h"
 #include "pci.h"
-#include "qdev-addr.h"
 #include "msi.h"
 
 //#define DEBUG_XHCI
@@ -140,7 +139,7 @@ typedef struct XHCITRB {
 uint64_t parameter;
 uint32_t status;
 uint32_t control;
-target_phys_addr_t addr;
+dma_addr_t addr;
 bool ccs;
 } XHCITRB;
 
@@ -291,8 +290,8 @@ typedef enum EPType {
 } EPType;
 
 typedef struct XHCIRing {
-target_phys_addr_t base;
-target_phys_addr_t dequeue;
+dma_addr_t base;
+dma_addr_t dequeue;
 bool ccs;
 } XHCIRing;
 
@@ -345,7 +344,7 @@ typedef struct XHCIEPContext {
 unsigned int next_bg;
 XHCITransfer bg_transfers[BG_XFERS];
 EPType type;
-target_phys_addr_t pctx;
+dma_addr_t pctx;
 unsigned int max_psize;
 bool has_bg;
 uint32_t state;
@@ -353,7 +352,7 @@ typedef struct XHCIEPContext {
 
 typedef struct XHCISlot {
 bool enabled;
-target_phys_addr_t ctx;
+dma_addr_t ctx;
 unsigned int port;
 unsigned int devaddr;
 XHCIEPContext * eps[31];
@@ -402,7 +401,7 @@ struct XHCIState {
 uint32_t erdp_low;
 uint32_t erdp_high;
 
-target_phys_addr_t er_start;
+dma_addr_t er_start;
 uint32_t er_size;
 bool er_pcs;
 unsigned int er_ep_idx;
@@ -479,18 +478,18 @@ static const char *trb_name(XHCITRB *trb)
 static void xhci_kick_ep(XHCIState *xhci, unsigned int slotid,
  unsigned int epid);
 
-static inline target_phys_addr_t xhci_addr64(uint32_t low, uint32_t high)
+static inline dma_addr_t xhci_addr64(uint32_t low, uint32_t high)
 {
-#if TARGET_PHYS_ADDR_BITS > 32
-return low | ((target_phys_addr_t)high << 32);
+#if DMA_ADDR_BITS > 32
+return low | ((dma_addr_t)high << 32);
 #else
 return low;
 #endif
 }
 
-static inline target_phys_addr_t xhci_mask64(uint64_t addr)
+static inline dma_addr_t xhci_mask64(uint64_t addr)
 {
-#if TARGET_PHYS_ADDR_BITS > 32
+#if DMA_ADDR_BITS > 32
 return addr;
 #else
 return addr & 0x;
@@ -532,7 +531,7 @@ static void xhci_die(XHCIState *xhci)
 static void xhci_write_event(XHCIState *xhci, XHCIEvent *event)
 {
 XHCITRB ev_trb;
-target_phys_addr_t addr;
+dma_addr_t addr;
 
 ev_trb.parameter = cpu_to_le64(event->ptr);
 ev_trb.status = cpu_to_le32(event->length | (event->ccode << 24));
@@ -548,7 +547,7 @@ static void xhci_write_event(XHCIState *xhci, XHCIEvent 
*event)
 trb_name(&ev_trb));
 
 addr = xhci->er_start + TRB_SIZE*xhci->er_ep_idx;
-cpu_physical_memory_write(addr, (uint8_t *) &ev_trb, TRB_SIZE);
+pci_dma_write(&xhci->pci_dev, addr, &ev_trb, TRB_SIZE);
 
 xhci->er_ep_idx++;
 if (xhci->er_ep_idx >= xhci->er_size) {
@@ -559,7 +558,7 @@ static void xhci_write_event(XHCIState *xhci, XHCIEvent 
*event)
 
 static void xhci_events_update(XHCIState *xhci)
 {
-target_phys_addr_t erdp;
+dma_addr_t erdp;
 unsigned int dp_idx;
 bool do_irq = 0;
 
@@ -570,8 +569,8 @@ static void xhci_events_update(XHCIState *xhci)
 erdp = xhci_addr64(xhci->erdp_low, xhci->erdp_high);
 if (erdp < xhci->er_start ||
 erdp >= (xhci->er_start + TRB_SIZE*xhci->er_size)) {
-fprintf(stderr, "xhci: ERDP out of bounds: "TARGET_FMT_plx"\n", erdp);
-fprintf(stderr, "xhci: ER at "TARGET_FMT_plx" len %d\n",
+fprintf(stderr, "xhci: ERDP out of bounds: "DMA_ADDR_FMT"\n", erdp);
+fprintf(stderr, "xhci: ER at "DMA_ADDR_FMT" len %d\n",
 xhci->er_start, xhci->er_size);
 xhci_die(xhci);
 return;
@@ -630,7 +629,7 @@ static void xhci_events_update(XHCIState *xhci)
 
 static void xhci_event(XHCIState *xhci, XHCIEvent *event)
 {
-target_phys_addr_t erdp;
+dma_addr_t erdp;
 unsigned int dp_idx;
 
 if (xhci->er_full) {
@@ -649,8 +648,8 @@ static void xhci_event(XHCIState *xhci, XHCIEvent *event)
 erdp = xhci_addr64(xhci->erdp_low, xhci->erdp_high);
 if (erdp < xhci->er_start ||
 erdp >= (xhci->er_start + TRB_SIZE*xhci->er_size)) {
-fprintf(stderr, "xhci: ERDP out of bounds: "TARGET_FMT_plx"\n", erdp);
-fprin

[Qemu-devel] [PATCH 08/13] ide/ahci: Use universal DMA helper functions

2012-02-29 Thread David Gibson

The AHCI device can provide both PCI and SysBus AHCI device
emulations.  For this reason, it wasn't previously converted to use
the pci_dma_*() helper functions.  Now that we have universal DMA
helper functions, this converts AHCI to use them.

The DMAContext is obtained from pci_dma_context() in the PCI case and
set to NULL in the SysBus case (i.e. we assume for now that a SysBus
AHCI has no IOMMU translation).

Cc: Kevin Wolf 
Cc: Michael S. Tsirkin 

Signed-off-by: David Gibson 
---
 hw/ide/ahci.c |7 ---
 hw/ide/ahci.h |3 ++-
 hw/ide/ich.c  |2 +-
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/hw/ide/ahci.c b/hw/ide/ahci.c
index 6a218b5..3d31179 100644
--- a/hw/ide/ahci.c
+++ b/hw/ide/ahci.c
@@ -668,7 +668,7 @@ static int ahci_populate_sglist(AHCIDevice *ad, QEMUSGList 
*sglist)
 AHCI_SG *tbl = (AHCI_SG *)prdt;
 
 /* FIXME: pass the correct DMAContext */
-qemu_sglist_init(sglist, sglist_alloc_hint, NULL);
+qemu_sglist_init(sglist, sglist_alloc_hint, ad->hba->dma);
 for (i = 0; i < sglist_alloc_hint; i++) {
 /* flags_size is zero-based */
 qemu_sglist_add(sglist, le64_to_cpu(tbl[i].addr),
@@ -1115,11 +1115,12 @@ static const IDEDMAOps ahci_dma_ops = {
 .reset = ahci_dma_reset,
 };
 
-void ahci_init(AHCIState *s, DeviceState *qdev, int ports)
+void ahci_init(AHCIState *s, DeviceState *qdev, DMAContext *dma, int ports)
 {
 qemu_irq *irqs;
 int i;
 
+s->dma = dma;
 s->ports = ports;
 s->dev = g_malloc0(sizeof(AHCIDevice) * ports);
 ahci_reg_init(s);
@@ -1182,7 +1183,7 @@ static const VMStateDescription vmstate_sysbus_ahci = {
 static int sysbus_ahci_init(SysBusDevice *dev)
 {
 SysbusAHCIState *s = FROM_SYSBUS(SysbusAHCIState, dev);
-ahci_init(&s->ahci, &dev->qdev, s->num_ports);
+ahci_init(&s->ahci, &dev->qdev, NULL, s->num_ports);
 
 sysbus_init_mmio(dev, &s->ahci.mem);
 sysbus_init_irq(dev, &s->ahci.irq);
diff --git a/hw/ide/ahci.h b/hw/ide/ahci.h
index b223d2c..af8c6ef 100644
--- a/hw/ide/ahci.h
+++ b/hw/ide/ahci.h
@@ -299,6 +299,7 @@ typedef struct AHCIState {
 uint32_t idp_index; /* Current IDP index */
 int ports;
 qemu_irq irq;
+DMAContext *dma;
 } AHCIState;
 
 typedef struct AHCIPCIState {
@@ -329,7 +330,7 @@ typedef struct NCQFrame {
 uint8_t reserved10;
 } QEMU_PACKED NCQFrame;
 
-void ahci_init(AHCIState *s, DeviceState *qdev, int ports);
+void ahci_init(AHCIState *s, DeviceState *qdev, DMAContext *dma, int ports);
 void ahci_uninit(AHCIState *s);
 
 void ahci_reset(void *opaque);
diff --git a/hw/ide/ich.c b/hw/ide/ich.c
index 560ae37..5354e13 100644
--- a/hw/ide/ich.c
+++ b/hw/ide/ich.c
@@ -91,7 +91,7 @@ static int pci_ich9_ahci_init(PCIDevice *dev)
 uint8_t *sata_cap;
 d = DO_UPCAST(struct AHCIPCIState, card, dev);
 
-ahci_init(&d->ahci, &dev->qdev, 6);
+ahci_init(&d->ahci, &dev->qdev, pci_dma_context(dev), 6);
 
 pci_config_set_prog_interface(d->card.config, AHCI_PROGMODE_MAJOR_REV_1);
 
-- 
1.7.9

[Qemu-devel] [PATCH 06/13] usb-ohci: Use universal DMA helper functions

2012-02-29 Thread David Gibson

The OHCI device emulation can provide both PCI and SysBus OHCI
implementations.  Because of this, it was not previously converted to
use the PCI DMA helper functions.

This patch converts it to use the new universal DMA helper functions.
In the PCI case, it obtains its DMAContext from pci_dma_context(), in
the SysBus case, it uses NULL - i.e. assumes for now that there will
be no IOMMU translation for a SysBus OHCI.
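
As a rough, self-contained sketch of what a get_dwords() style helper does
once it goes through a DMA read function: the "guest memory" here is just a
small byte array, and demo_dma_read() and the other demo_* names are
hypothetical stand-ins, not the real dma_memory_read() API.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest memory backing the demo DMA accessor. */
static uint8_t demo_guest_mem[64];

/* Copy len bytes out of guest memory; returns -1 if out of range. */
static int demo_dma_read(uint64_t addr, void *buf, size_t len)
{
    if (addr > sizeof(demo_guest_mem) ||
        len > sizeof(demo_guest_mem) - addr)
        return -1;
    memcpy(buf, demo_guest_mem + addr, len);
    return 0;
}

/* Decode a little-endian 32bit value independent of host byte order. */
static uint32_t demo_le32(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Read num little-endian dwords; returns 1 on success, 0 on failure. */
static int demo_get_dwords(uint64_t localmem_base, uint64_t addr,
                           uint32_t *buf, int num)
{
    addr += localmem_base;
    for (int i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
        uint8_t raw[4];
        if (demo_dma_read(addr, raw, sizeof(raw)) < 0)
            return 0;
        *buf = demo_le32(raw);
    }
    return 1;
}
```

Taking the address as a 64bit value, as the patch does with dma_addr_t,
avoids truncation when localmem_base is added.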

Cc: Gerd Hoffmann 
Cc: Michael S. Tsirkin 

Signed-off-by: David Gibson 
---
 hw/usb-ohci.c |   93 +++-
 1 files changed, 51 insertions(+), 42 deletions(-)

diff --git a/hw/usb-ohci.c b/hw/usb-ohci.c
index bc99519..4ad53d7 100644
--- a/hw/usb-ohci.c
+++ b/hw/usb-ohci.c
@@ -32,7 +32,7 @@
 #include "pci.h"
 #include "usb-ohci.h"
 #include "sysbus.h"
-#include "qdev-addr.h"
+#include "qdev-dma.h"
 
 //#define DEBUG_OHCI
 /* Dump packet contents.  */
@@ -63,6 +63,7 @@ typedef struct {
 USBBus bus;
 qemu_irq irq;
 MemoryRegion mem;
+DMAContext *dma;
 int num_ports;
 const char *name;
 
@@ -105,7 +106,7 @@ typedef struct {
 uint32_t htest;
 
 /* SM501 local memory offset */
-target_phys_addr_t localmem_base;
+dma_addr_t localmem_base;
 
 /* Active packets.  */
 uint32_t old_ctl;
@@ -483,14 +484,14 @@ static void ohci_reset(void *opaque)
 
 /* Get an array of dwords from main memory */
 static inline int get_dwords(OHCIState *ohci,
- uint32_t addr, uint32_t *buf, int num)
+ dma_addr_t addr, uint32_t *buf, int num)
 {
 int i;
 
 addr += ohci->localmem_base;
 
 for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
-cpu_physical_memory_read(addr, buf, sizeof(*buf));
+dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
 *buf = le32_to_cpu(*buf);
 }
 
@@ -499,7 +500,7 @@ static inline int get_dwords(OHCIState *ohci,
 
 /* Put an array of dwords in to main memory */
 static inline int put_dwords(OHCIState *ohci,
- uint32_t addr, uint32_t *buf, int num)
+ dma_addr_t addr, uint32_t *buf, int num)
 {
 int i;
 
@@ -507,7 +508,7 @@ static inline int put_dwords(OHCIState *ohci,
 
 for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
 uint32_t tmp = cpu_to_le32(*buf);
-cpu_physical_memory_write(addr, &tmp, sizeof(tmp));
+dma_memory_write(ohci->dma, addr, &tmp, sizeof(tmp));
 }
 
 return 1;
@@ -515,14 +516,14 @@ static inline int put_dwords(OHCIState *ohci,
 
 /* Get an array of words from main memory */
 static inline int get_words(OHCIState *ohci,
-uint32_t addr, uint16_t *buf, int num)
+dma_addr_t addr, uint16_t *buf, int num)
 {
 int i;
 
 addr += ohci->localmem_base;
 
 for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
-cpu_physical_memory_read(addr, buf, sizeof(*buf));
+dma_memory_read(ohci->dma, addr, buf, sizeof(*buf));
 *buf = le16_to_cpu(*buf);
 }
 
@@ -531,7 +532,7 @@ static inline int get_words(OHCIState *ohci,
 
 /* Put an array of words in to main memory */
 static inline int put_words(OHCIState *ohci,
-uint32_t addr, uint16_t *buf, int num)
+dma_addr_t addr, uint16_t *buf, int num)
 {
 int i;
 
@@ -539,40 +540,40 @@ static inline int put_words(OHCIState *ohci,
 
 for (i = 0; i < num; i++, buf++, addr += sizeof(*buf)) {
 uint16_t tmp = cpu_to_le16(*buf);
-cpu_physical_memory_write(addr, &tmp, sizeof(tmp));
+dma_memory_write(ohci->dma, addr, &tmp, sizeof(tmp));
 }
 
 return 1;
 }
 
 static inline int ohci_read_ed(OHCIState *ohci,
-   uint32_t addr, struct ohci_ed *ed)
+   dma_addr_t addr, struct ohci_ed *ed)
 {
 return get_dwords(ohci, addr, (uint32_t *)ed, sizeof(*ed) >> 2);
 }
 
 static inline int ohci_read_td(OHCIState *ohci,
-   uint32_t addr, struct ohci_td *td)
+   dma_addr_t addr, struct ohci_td *td)
 {
 return get_dwords(ohci, addr, (uint32_t *)td, sizeof(*td) >> 2);
 }
 
 static inline int ohci_read_iso_td(OHCIState *ohci,
-   uint32_t addr, struct ohci_iso_td *td)
+   dma_addr_t addr, struct ohci_iso_td *td)
 {
 return (get_dwords(ohci, addr, (uint32_t *)td, 4) &&
 get_words(ohci, addr + 16, td->offset, 8));
 }
 
 static inline int ohci_read_hcca(OHCIState *ohci,
- uint32_t addr, struct ohci_hcca *hcca)
+ dma_addr_t addr, struct ohci_hcca *hcca)
 {
-cpu_physical_memory_read(addr + ohci->localmem_base, hcca, sizeof(*hcca));
+dma_memory_read(ohci->dma, addr + ohci->localmem_base, hcca, sizeof(*hcca));
 return 1;
 }
 
 static inline

[Qemu-devel] [0/13] RFC: Support for guest-visible IOMMUs

2012-02-29 Thread David Gibson

This patch series introduces a general DMA infrastructure which allows
the emulation of guest-visible IOMMUs.  That is, it provides a
framework by which an IOMMU device can be implemented, such that DMA
from other device emulations will be translated according to the
mappings provided by the IOMMU.

One example IOMMU implementation is included, for the para-virtualized
TCE tables specified by PAPR and used in the pseries machine for both
virtual IO and PCI devices.

This series is an updated and cleaned-up version of patches posted by
Eduard - Gabriel Munteanu some time ago.  Those prompted some
discussion at the time, but no resolution was reached.  These patches
are now pretty polished although I'd like to get some comment from
people working with other IOMMUs to make sure the infrastructure is
sufficient to cover those.

Specifically, Eduard - Gabriel, if you have an updated version of your
AMD IOMMU driver, it would be good to see if that can work with this
infrastructure.

The series also converts a number of existing device models to use the
new DMA infrastructure, so they can be used with IOMMUs.  Along with
the pci_dma_*() wrapper functions which are already in, this means
that with this series applied, essentially all PCI devices should work
with an emulated IOMMU, as well as pseries VIO devices.  Other types
of devices would need some further conversion to work with the new
framework, but that should be quite straightforward.
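
To illustrate the idea, here is a minimal, self-contained sketch of how a device model's DMA read might go through an optional translation context. It does not use the actual API from this series: ToyDMAContext, toy_translate() and toy_dma_read() are hypothetical names, and the flat per-page table is only an assumption made for the example. A NULL context stands for "no IOMMU", so bus addresses are used untranslated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1u << TOY_PAGE_SHIFT)
#define TOY_NUM_PAGES  4

/* Hypothetical context: one entry per bus page, holding the
 * guest-physical page it maps to, or -1 if the page is unmapped. */
typedef struct ToyDMAContext {
    int64_t page_map[TOY_NUM_PAGES];
} ToyDMAContext;

/* Translate a bus address. Returns 0 on success, -1 on a fault. */
static int toy_translate(const ToyDMAContext *ctx, uint64_t bus_addr,
                         uint64_t *phys)
{
    uint64_t page;

    if (!ctx) {
        /* No IOMMU: identity mapping. */
        *phys = bus_addr;
        return 0;
    }
    page = bus_addr >> TOY_PAGE_SHIFT;
    if (page >= TOY_NUM_PAGES || ctx->page_map[page] < 0) {
        return -1;
    }
    *phys = ((uint64_t)ctx->page_map[page] << TOY_PAGE_SHIFT) |
            (bus_addr & (TOY_PAGE_SIZE - 1));
    return 0;
}

/* Read len bytes from "guest RAM" through ctx, one page at a time,
 * since adjacent bus pages need not be adjacent in RAM. */
static int toy_dma_read(const ToyDMAContext *ctx, const uint8_t *ram,
                        size_t ram_size, uint64_t bus_addr,
                        void *buf, size_t len)
{
    uint8_t *out = buf;

    assert(ram != NULL && buf != NULL);
    while (len > 0) {
        uint64_t phys;
        size_t chunk = TOY_PAGE_SIZE - (bus_addr & (TOY_PAGE_SIZE - 1));

        if (chunk > len) {
            chunk = len;
        }
        if (toy_translate(ctx, bus_addr, &phys) < 0) {
            return -1;
        }
        if (phys > ram_size || chunk > ram_size - phys) {
            return -1;
        }
        memcpy(out, ram + phys, chunk);
        out += chunk;
        bus_addr += chunk;
        len -= chunk;
    }
    return 0;
}
```

A device that sits behind an IOMMU would pass its context to every access, while a device on a bus without translation would simply pass NULL and keep its current behaviour.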

Re: [Qemu-devel] [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to core file

2012-02-29 Thread Wen Congyang

At 03/01/2012 01:10 PM, HATAYAMA Daisuke Wrote:
> From: Wen Congyang 
> Subject: [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to 
> core file
> Date: Thu, 01 Mar 2012 10:48:17 +0800
> 
>> +int cpu_write_elf64_qemunote(write_core_dump_function f, CPUState *env,
>> + target_phys_addr_t *offset, void *opaque)
>> +{
>> +QEMUCPUState state;
>> +Elf64_Nhdr *note;
>> +char *buf;
>> +int descsz, note_size, name_size = 5;
>> +const char *name = "QEMU";
>> +int ret;
>> +
>> +qemu_get_cpustate(&state, env);
>> +
>> +descsz = sizeof(state);
>> +note_size = ((sizeof(Elf32_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
>> +(descsz + 3) / 4) * 4;
>> +note = g_malloc(note_size);
>> +
>> +memset(note, 0, note_size);
>> +note->n_namesz = cpu_to_le32(name_size);
>> +note->n_descsz = cpu_to_le32(descsz);
>> +note->n_type = 0;
>> +buf = (char *)note;
>> +buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
>> +memcpy(buf, name, name_size);
>> +buf += ((name_size + 3) / 4) * 4;
>> +memcpy(buf, &state, sizeof(state));
> 
> x86_64_write_elf64_note() and x86_write_elf64_note() do the same
> processing for note data. Would it be better to do this in a common
> helper function?

I forgot to merge them. I will fix it.

Thanks
Wen Congyang

> 
> Thanks.
> HATAYAMA, Daisuke
> 
>

Re: [Qemu-devel] [PATCH] kvm: notify host when guest paniced

2012-02-29 Thread Wen Congyang

At 02/29/2012 06:39 PM, Avi Kivity Wrote:
> On 02/29/2012 12:17 PM, Wen Congyang wrote:
>
 Yes, crash can be so severe that it is not even detected by a kernel
 itself, so not OOPS message even printed. But in most cases if kernel is
 functional enough to print OOPS it is functional enough to call single
 hypercall instruction.
>>>
>>> Why not print the oops to virtio-serial?  Or even just a regular serial
>>> port?  That's what bare metal does.
>>
>> If virtio-serial's driver has bug or the guest doesn't have such device...
> 
> We have the same issue with the hypercall; and virtio-serial is
> available on many deployed versions.

virtio-serial is available, but it is an optional device. If the guest does
not have this device, the guest cannot tell the host that it has panicked. So
I still prefer to touch the hypervisor.

Thanks
Wen Congyang

> 
>>>
>> Having special kdump
>> kernel that transfers dump to a host via virtio-serial channel though
>> sounds interesting. May be that's what you mean.
>
> Yes.  The "panic, starting dump" signal should be initiated by the
> panicking kernel though, in case the dump fails.
>
 Then panic hypercall sounds like a reasonable solution.
>>>
>>> It is, but I'm trying to see if we can get away with doing nothing.
>>>
>>
>> If there were a reliable way that requires doing nothing, that would be
>> better. But I have not found one.
> 
> We won't have a 100% reliable way.  But I think a variant of the driver
> that doesn't use interrupts, or just using the ordinary serial driver,
> should be reliable enough.
>

Re: [Qemu-devel] [RFC][PATCH 00/14 v7] introducing a new, dedicated memory dump mechanism

2012-02-29 Thread Wen Congyang

At 03/01/2012 12:42 PM, HATAYAMA Daisuke Wrote:
> From: Wen Congyang 
> Subject: [RFC][PATCH 00/14 v7] introducing a new, dedicated memory dump 
> mechanism
> Date: Thu, 01 Mar 2012 10:35:44 +0800
> 
>> Hi, all
>>
>> 'virsh dump' can not work when host pci device is used by guest. We have
>> discussed this issue here:
>> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
>>
>> The last version is here:
>> http://lists.nongnu.org/archive/html/qemu-devel/2012-02/msg01007.html
>>
>> We have determined to introduce a new command dump to dump memory. The core
>> file's format can be elf.
>>
>> Note:
>> 1. The guest should be x86 or x86_64. The other arch is not supported now.
>> 2. If you use old gdb, gdb may crash. I use gdb-7.3.1, and it does not crash.
> 
> Is this caused by gdb versions without DWARF v3 support? If so, it
> would be better to state that explicitly here.

I do not know why gdb crashed, and I cannot reproduce the problem now.

> 
>> 3. If the OS is in the second kernel, gdb may not work well, and crash can
>>work by specifying '--machdep phys_addr=xxx' in the command line. The
>>reason is that the second kernel will update the page table, and we can
>>not get the page table for the first kernel.
>> 4. The cpu's state is stored in QEMU note. You need to modify crash to use
>>it to calculate phys_base.
> 
> Again, you still need to fix crash utility to recover the 1st kernel's
> first 640kB physical memory that has been reserved during switch from
> 1st kernel to 2nd kernel.

That is separate work; I will try to do it in the future.

Thanks
Wen Congyang

> 
> Thanks.
> HATAYAMA, Daisuke
> 
>

Re: [Qemu-devel] [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to core file

2012-02-29 Thread HATAYAMA Daisuke

From: Wen Congyang 
Subject: [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to core 
file
Date: Thu, 01 Mar 2012 10:48:17 +0800

> +int cpu_write_elf64_qemunote(write_core_dump_function f, CPUState *env,
> + target_phys_addr_t *offset, void *opaque)
> +{
> +QEMUCPUState state;
> +Elf64_Nhdr *note;
> +char *buf;
> +int descsz, note_size, name_size = 5;
> +const char *name = "QEMU";
> +int ret;
> +
> +qemu_get_cpustate(&state, env);
> +
> +descsz = sizeof(state);
> +note_size = ((sizeof(Elf32_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
> +(descsz + 3) / 4) * 4;
> +note = g_malloc(note_size);
> +
> +memset(note, 0, note_size);
> +note->n_namesz = cpu_to_le32(name_size);
> +note->n_descsz = cpu_to_le32(descsz);
> +note->n_type = 0;
> +buf = (char *)note;
> +buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
> +memcpy(buf, name, name_size);
> +buf += ((name_size + 3) / 4) * 4;
> +memcpy(buf, &state, sizeof(state));

x86_64_write_elf64_note() and x86_write_elf64_note() do the same
processing for note data. Would it be better to do this in a common
helper function?
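
As a rough sketch of what such a shared helper could look like (ToyNhdr, round_up4(), toy_note_size() and toy_build_note() are names made up for illustration and are not part of the patch; the header is defined locally instead of pulling in the ELF headers, and the byte-order conversion done with cpu_to_le32() in the patch is left out):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same layout as an ELF note header: three 32-bit words.
 * Defined locally so the sketch does not depend on <elf.h>. */
typedef struct ToyNhdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
} ToyNhdr;

/* Round n up to the next multiple of 4, as note fields require. */
static size_t round_up4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/* Total size of a note: header, name and descriptor, each padded. */
static size_t toy_note_size(size_t name_size, size_t descsz)
{
    return round_up4(sizeof(ToyNhdr)) + round_up4(name_size) +
           round_up4(descsz);
}

/* Allocate and fill a zero-padded note. The caller frees the result.
 * Header fields are stored in host byte order here (an assumption of
 * this sketch; a real dump would convert them explicitly). */
static void *toy_build_note(const char *name, uint32_t type,
                            const void *desc, size_t descsz,
                            size_t *note_size)
{
    size_t name_size = strlen(name) + 1;    /* include the NUL */
    ToyNhdr hdr;
    char *buf, *p;

    assert(desc != NULL || descsz == 0);
    *note_size = toy_note_size(name_size, descsz);
    buf = calloc(1, *note_size);
    if (!buf) {
        return NULL;
    }
    hdr.n_namesz = (uint32_t)name_size;
    hdr.n_descsz = (uint32_t)descsz;
    hdr.n_type = type;
    memcpy(buf, &hdr, sizeof(hdr));
    p = buf + round_up4(sizeof(hdr));
    memcpy(p, name, name_size);
    p += round_up4(name_size);
    if (descsz) {
        memcpy(p, desc, descsz);
    }
    return buf;
}
```

Each note writer would then only have to fill in its descriptor structure and pass it to the helper, instead of repeating the padding arithmetic.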

Thanks.
HATAYAMA, Daisuke

Re: [Qemu-devel] [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to core file

2012-02-29 Thread Wen Congyang

At 03/01/2012 01:01 PM, HATAYAMA Daisuke Wrote:
> From: Wen Congyang 
> Subject: [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to 
> core file
> Date: Thu, 01 Mar 2012 10:48:17 +0800
> 
>> +struct QEMUCPUState {
>> +uint32_t version;
>> +uint32_t size;
>> +uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
>> +uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
>> +uint64_t rip, rflags;
>> +QEMUCPUSegment cs, ds, es, fs, gs, ss;
>> +QEMUCPUSegment ldt, tr, gdt, idt;
>> +uint64_t cr[5];
>> +};
>> +
>> +typedef struct QEMUCPUState QEMUCPUState;
> 
>> +static void qemu_get_cpustate(QEMUCPUState *s, CPUState *env)
>> +{
>> +memset(s, 0, sizeof(QEMUCPUState));
>> +
>> +s->version = 1;
> 
> It seems to me better to prepare a macro:
> 
>   #define QEMUCPUSTATE_VERSION (1)
> 
> and use it as:
> 
>   s->version = QEMUCPUSTATE_VERSION;
> 
> and add a comment above the macro definition saying: please increment
> QEMUCPUSTATE_VERSION if you change the definition of QEMUCPUState,
> and update the tools that use this information accordingly.

Yes, I will fix it.

PS: Do you have any comments on QEMUCPUState? I think its content is now enough
to calculate phys_base.

Thanks
Wen Congyang

> 
> Thanks.
> HATAYAMA, Daisuke
> 
>

Re: [Qemu-devel] [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to core file

2012-02-29 Thread HATAYAMA Daisuke

From: Wen Congyang 
Subject: [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to core 
file
Date: Thu, 01 Mar 2012 10:48:17 +0800

> +struct QEMUCPUState {
> +uint32_t version;
> +uint32_t size;
> +uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
> +uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
> +uint64_t rip, rflags;
> +QEMUCPUSegment cs, ds, es, fs, gs, ss;
> +QEMUCPUSegment ldt, tr, gdt, idt;
> +uint64_t cr[5];
> +};
> +
> +typedef struct QEMUCPUState QEMUCPUState;

> +static void qemu_get_cpustate(QEMUCPUState *s, CPUState *env)
> +{
> +memset(s, 0, sizeof(QEMUCPUState));
> +
> +s->version = 1;

It seems to me better to prepare a macro:

  #define QEMUCPUSTATE_VERSION (1)

and use it as:

  s->version = QEMUCPUSTATE_VERSION;

and add a comment above the macro definition saying: please increment
QEMUCPUSTATE_VERSION if you change the definition of QEMUCPUState,
and update the tools that use this information accordingly.
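
Put together, it could look roughly like the sketch below. ToyCPUState is a cut-down stand-in for QEMUCPUState, and toy_init_cpustate() and toy_cpustate_supported() are hypothetical names used only here; filling in the size field and checking it on the reader side are also assumptions, since the quoted hunk only shows the version being set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Version of the CPU state layout written into the QEMU note.
 * Please count this up whenever the structure definition changes,
 * and update the tools that parse the note accordingly.
 */
#define QEMUCPUSTATE_VERSION (1)

/* Cut-down stand-in for QEMUCPUState; only a few fields are kept. */
typedef struct ToyCPUState {
    uint32_t version;
    uint32_t size;
    uint64_t rip, rflags;
} ToyCPUState;

/* Writer side: always stamp the current layout version. */
static void toy_init_cpustate(ToyCPUState *s)
{
    assert(s != NULL);
    memset(s, 0, sizeof(*s));
    s->version = QEMUCPUSTATE_VERSION;
    s->size = sizeof(*s);
}

/* Reader side: accept only a layout this code knows how to parse. */
static int toy_cpustate_supported(const ToyCPUState *s)
{
    return s->version == QEMUCPUSTATE_VERSION && s->size == sizeof(*s);
}
```

With the version in one place, a tool such as crash can reject a note it does not understand instead of misreading it.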

Thanks.
HATAYAMA, Daisuke

Re: [Qemu-devel] [RFC][PATCH 00/14 v7] introducing a new, dedicated memory dump mechanism

2012-02-29 Thread HATAYAMA Daisuke

From: Wen Congyang 
Subject: [RFC][PATCH 00/14 v7] introducing a new, dedicated memory dump 
mechanism
Date: Thu, 01 Mar 2012 10:35:44 +0800

> Hi, all
> 
> 'virsh dump' can not work when host pci device is used by guest. We have
> discussed this issue here:
> http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html
> 
> The last version is here:
> http://lists.nongnu.org/archive/html/qemu-devel/2012-02/msg01007.html
> 
> We have determined to introduce a new command dump to dump memory. The core
> file's format can be elf.
> 
> Note:
> 1. The guest should be x86 or x86_64. The other arch is not supported now.
> 2. If you use old gdb, gdb may crash. I use gdb-7.3.1, and it does not crash.

Is this caused by gdb versions without DWARF v3 support? If so, it
would be better to state that explicitly here.

> 3. If the OS is in the second kernel, gdb may not work well, and crash can
>work by specifying '--machdep phys_addr=xxx' in the command line. The
>reason is that the second kernel will update the page table, and we can
>not get the page table for the first kernel.
> 4. The cpu's state is stored in QEMU note. You need to modify crash to use
>it to calculate phys_base.

Again, you still need to fix crash utility to recover the 1st kernel's
first 640kB physical memory that has been reserved during switch from
1st kernel to 2nd kernel.

Thanks.
HATAYAMA, Daisuke

[Qemu-devel] Windows 8 fails to boot

2012-02-29 Thread Jun Koi

hi,

anybody tested Qemu with the Windows 8 Consumer Preview?

i tried the 32-bit ISO file with 1.0.1, with and without -enable-kvm,
and Qemu reboots immediately after the first screen. that is no more
than 10 seconds into the boot.

thanks,
Jun

Re: [Qemu-devel] Fail to share Samba directory with guest

2012-02-29 Thread Jun Koi

On Tue, Feb 28, 2012 at 9:43 AM, Jun Koi  wrote:
> On Tue, Feb 28, 2012 at 12:08 AM, Shu Ming  wrote:
>> On 2012-2-27 17:21, Jun Koi wrote:
>>>
>>> hi,
>>>
>>> on qemu 1.0.1, i am trying to share a host directory with the Windows
>>> guest like below:
>>>
>>> qemu-system-i386 -enable-kvm -m 1000 -net nic,model=rtl8139 -net
>>> user,smb=/tmp img.winxp
>>>
>>> but in the guest, \\10.0.2.4 doesnt show me any shared directory.
>>>
>>> i already run Samba on the host (default configuration).
>>>
>>> did i miss something, or is it a bug??
>>
>>
>> So 10.0.2.4 is your host IP with samba server?   And what's the network the
>> guest belongs to?
>>
>
> according to some network schemes used by Qemu, 10.0.2.4 is the IP of
> the Samba server (DHCP: 10.0.2.2, DNS: 10.0.2.3, Samba: 10.0.2.4)
>
> http://en.wikibooks.org/wiki/QEMU/Networking
>
> i tried \\10.0.2.2, but dont see any share folder, either.

i tested again and again, but Samba folder sharing never works for me.
meanwhile, my guest can see the folder shared via /etc/samba/smb.conf

so this is definitely a bug. perhaps the Samba setting in net/slirp.c is wrong?
the current configuration is shown below. i am not experienced with
Samba, so i cannot figure out what is wrong.

btw, it seems the Qemu unit tests do not cover this folder sharing?

thanks,
Jun


// from net/slirp.c, function slirp_smb()
   fprintf(f,
"[global]\n"
"private dir=%s\n"
"smb ports=0\n"
"socket address=127.0.0.1\n"
"pid directory=%s\n"
"lock directory=%s\n"
"log file=%s/log.smbd\n"
"smb passwd file=%s/smbpasswd\n"
"security = share\n"
"[qemu]\n"
"path=%s\n"
"read only=no\n"
"guest ok=yes\n",
s->smb_dir,
s->smb_dir,
s->smb_dir,
s->smb_dir,
s->smb_dir,
exported_dir
);

Re: [Qemu-devel] [PATCH] kvm: notify host when guest paniced

2012-02-29 Thread Wen Congyang

At 02/29/2012 06:39 PM, Avi Kivity Wrote:
> On 02/29/2012 12:17 PM, Wen Congyang wrote:
>
 Yes, crash can be so severe that it is not even detected by a kernel
 itself, so not OOPS message even printed. But in most cases if kernel is
 functional enough to print OOPS it is functional enough to call single
 hypercall instruction.
>>>
>>> Why not print the oops to virtio-serial?  Or even just a regular serial
>>> port?  That's what bare metal does.
>>
>> If virtio-serial's driver has bug or the guest doesn't have such device...
> 
> We have the same issue with the hypercall; and virtio-serial is
> available on many deployed versions.

How to know whether a guest has virtio-serial?

Thanks
Wen Congyang

> 
>>>
>> Having special kdump
>> kernel that transfers dump to a host via virtio-serial channel though
>> sounds interesting. May be that's what you mean.
>
> Yes.  The "panic, starting dump" signal should be initiated by the
> panicking kernel though, in case the dump fails.
>
 Then panic hypercall sounds like a reasonable solution.
>>>
>>> It is, but I'm trying to see if we can get away with doing nothing.
>>>
>>
>> If there were a reliable way that requires doing nothing, that would be
>> better. But I have not found one.
> 
> We won't have a 100% reliable way.  But I think a variant of the driver
> that doesn't use interrupts, or just using the ordinary serial driver,
> should be reliable enough.
>

Re: [Qemu-devel] [PATCH 1/2 v7] block: add-cow file format

2012-02-29 Thread Dong Xu Wang

Sorry, I missed add-cow-cache.c; please ignore this patch. I will re-send it.

On Thu, Mar 1, 2012 at 10:49, Dong Xu Wang  wrote:
> From: Dong Xu Wang 
>
> Provide a new file format: add-cow. The usage can be found in add-cow.txt of
> this patch.
>
> CC: Marcelo Tosatti 
> CC: Kevin Wolf 
> CC: Stefan Hajnoczi 
> Signed-off-by: Dong Xu Wang 
> ---
>  Makefile.objs          |    1 +
>  block.c                |    2 +-
>  block.h                |    1 +
>  block/add-cow.c        |  402 
> 
>  block_int.h            |    1 +
>  docs/specs/add-cow.txt |   68 
>  6 files changed, 474 insertions(+), 1 deletions(-)
>  create mode 100644 block/add-cow.c
>  create mode 100644 docs/specs/add-cow.txt
>
> diff --git a/Makefile.objs b/Makefile.objs
> index 808de6a..fa9dde0 100644
> --- a/Makefile.objs
> +++ b/Makefile.objs
> @@ -34,6 +34,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o 
> dmg.o bochs.o vpc.o vv
>  block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o 
> qcow2-cache.o
>  block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
>  block-nested-y += qed-check.o
> +block-nested-y += add-cow.o add-cow-cache.o
>  block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
>  block-nested-y += stream.o
>  block-nested-$(CONFIG_WIN32) += raw-win32.o
> diff --git a/block.c b/block.c
> index 52ffe14..581c092 100644
> --- a/block.c
> +++ b/block.c
> @@ -194,7 +194,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
>  }
>
>  /* check if the path starts with ":" */
> -static int path_has_protocol(const char *path)
> +int path_has_protocol(const char *path)
>  {
>  #ifdef _WIN32
>     if (is_windows_drive(path) ||
> diff --git a/block.h b/block.h
> index 48d0bf3..3d96444 100644
> --- a/block.h
> +++ b/block.h
> @@ -310,6 +310,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, 
> QEMUSnapshotInfo *sn);
>
>  char *get_human_readable_size(char *buf, int buf_size, int64_t size);
>  int path_is_absolute(const char *path);
> +int path_has_protocol(const char *path);
>  void path_combine(char *dest, int dest_size,
>                   const char *base_path,
>                   const char *filename);
> diff --git a/block/add-cow.c b/block/add-cow.c
> new file mode 100644
> index 000..6897a52
> --- /dev/null
> +++ b/block/add-cow.c
> @@ -0,0 +1,402 @@
> +/*
> + * QEMU ADD-COW Disk Format
> + *
> + * Copyright IBM, Corp. 2012
> + *
> + * Authors:
> + *  Dong Xu Wang 
> + * This work is licensed under the terms of the GNU LGPL, version 2 or later.
> + * See the COPYING.LIB file in the top-level directory.
> + *
> + */
> +
> +#include "qemu-common.h"
> +#include "block_int.h"
> +#include "module.h"
> +#include "add-cow.h"
> +
> +static int add_cow_probe(const uint8_t *buf, int buf_size, const char 
> *filename)
> +{
> +    const AddCowHeader *header = (const void *)buf;
> +
> +    if (be64_to_cpu(header->magic) == ADD_COW_MAGIC &&
> +        be32_to_cpu(header->version) == ADD_COW_VERSION) {
> +        return 100;
> +    } else {
> +        return 0;
> +    }
> +}
> +
> +static int add_cow_open(BlockDriverState *bs, int flags)
> +{
> +    AddCowHeader        header;
> +    char                image_filename[ADD_COW_FILE_LEN];
> +    BlockDriver         *image_drv = NULL;
> +    int                 ret;
> +    BDRVAddCowState     *s = bs->opaque;
> +
> +    ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
> +    if (ret != sizeof(header)) {
> +        goto fail;
> +    }
> +
> +    if (be64_to_cpu(header.magic) != ADD_COW_MAGIC) {
> +        ret = -EINVAL;
> +        goto fail;
> +    }
> +    if (be32_to_cpu(header.version) != ADD_COW_VERSION) {
> +        char version[64];
> +        snprintf(version, sizeof(version), "ADD-COW version %d", 
> header.version);
> +        qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
> +            bs->device_name, "add-cow", version);
> +        ret = -ENOTSUP;
> +        goto fail;
> +    }
> +
> +    QEMU_BUILD_BUG_ON(sizeof(bs->backing_file) != 
> sizeof(header.backing_file));
> +    strncpy(bs->backing_file, header.backing_file,
> +            sizeof(bs->backing_file));
> +
> +    if (header.image_file[0] == '\0') {
> +        ret = -ENOENT;
> +        goto fail;
> +    }
> +    s->image_hd = bdrv_new("");
> +    if (path_has_protocol(header.image_file)) {
> +        strncpy(image_filename, header.image_file, sizeof(image_filename));
> +    } else {
> +        path_combine(image_filename, sizeof(image_filename),
> +                     bs->filename, header.image_file);
> +    }
> +
> +    image_drv = bdrv_find_format("raw");
> +    ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
> +    if (ret < 0) {
> +        bdrv_delete(s->image_hd);
> +        goto fail;
> +    }
> +    bs->total_sectors = s->image_hd->total_sectors;
> +    s->cluster_size = ADD_COW_CLUSTER_SIZE;
> +    s->bitmap_cache = add_cow_cache_create(bs, ADD_COW_CACH

[Qemu-devel] [PATCH 1/2 v7] block: add-cow file format

2012-02-29 Thread Dong Xu Wang

From: Dong Xu Wang 


Provide a new file format: add-cow. The usage can be found in add-cow.txt of
this patch.

CC: Marcelo Tosatti 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
Signed-off-by: Dong Xu Wang 
---
 Makefile.objs  |1 +
 block.c|2 +-
 block.h|1 +
 block/add-cow-cache.c  |  171 
 block/add-cow.c|  402 
 block_int.h|1 +
 docs/specs/add-cow.txt |   68 
 7 files changed, 645 insertions(+), 1 deletions(-)
 create mode 100644 block/add-cow-cache.c
 create mode 100644 block/add-cow.c
 create mode 100644 docs/specs/add-cow.txt

diff --git a/Makefile.objs b/Makefile.objs
index 808de6a..fa9dde0 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -34,6 +34,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
+block-nested-y += add-cow.o add-cow-cache.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-y += stream.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block.c b/block.c
index 52ffe14..581c092 100644
--- a/block.c
+++ b/block.c
@@ -194,7 +194,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with "<protocol>:" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
 #ifdef _WIN32
 if (is_windows_drive(path) ||
diff --git a/block.h b/block.h
index 48d0bf3..3d96444 100644
--- a/block.h
+++ b/block.h
@@ -310,6 +310,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
   const char *base_path,
   const char *filename);
diff --git a/block/add-cow-cache.c b/block/add-cow-cache.c
new file mode 100644
index 000..6be02ff
--- /dev/null
+++ b/block/add-cow-cache.c
@@ -0,0 +1,171 @@
+/*
+ * Cache For QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang 
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "block_int.h"
+#include "qemu-common.h"
+#include "add-cow.h"
+
+AddCowCache *add_cow_cache_create(BlockDriverState *bs, int num_tables)
+{
+BDRVAddCowState *s = bs->opaque;
+AddCowCache *c;
+int i;
+
+c = g_malloc0(sizeof(*c));
+c->size = num_tables;
+c->entries = g_malloc0(sizeof(*c->entries) * num_tables);
+
+for (i = 0; i < c->size; i++) {
+c->entries[i].table = qemu_blockalign(bs, s->cluster_size);
+c->entries[i].offset = -1;
+}
+
+return c;
+}
+
+int add_cow_cache_destroy(BlockDriverState *bs, AddCowCache *c)
+{
+int i;
+
+for (i = 0; i < c->size; i++) {
+qemu_vfree(c->entries[i].table);
+}
+
+g_free(c->entries);
+g_free(c);
+
+return 0;
+}
+
+static int add_cow_cache_find_entry_to_replace(AddCowCache *c)
+{
+int i;
+int min_count = INT_MAX;
+int min_index = -1;
+
+
+for (i = 0; i < c->size; i++) {
+if (c->entries[i].cache_hits < min_count) {
+min_index = i;
+min_count = c->entries[i].cache_hits;
+}
+
+c->entries[i].cache_hits /= 2;
+}
+
+if (min_index == -1) {
+abort();
+}
+return min_index;
+}
+
+static int add_cow_cache_entry_flush(BlockDriverState *bs,
+AddCowCache *c, int i)
+{
+BDRVAddCowState *s = bs->opaque;
+int ret = 0;
+
+if (!c->entries[i].dirty || (-1 == c->entries[i].offset)) {
+return 0;
+}
+ret = bdrv_flush(bs->file);
+if (ret < 0) {
+return ret;
+}
+
+ret = bdrv_pwrite(bs->file,
+sizeof(AddCowHeader) + c->entries[i].offset,
+c->entries[i].table,
+s->cluster_size);
+if (ret < 0) {
+return ret;
+}
+
+c->entries[i].dirty = false;
+
+return 0;
+}
+
+void add_cow_cache_entry_mark_dirty(AddCowCache *c, void *table)
+{
+int i;
+
+for (i = 0; i < c->size; i++) {
+if (c->entries[i].table == table) {
+goto found;
+}
+}
+abort();
+
+found:
+c->entries[i].dirty = true;
+}
+
+int add_cow_cache_flush(BlockDriverState *bs, AddCowCache *c)
+{
+int result = 0;
+int ret;
+int i;
+
+for (i = 0; i < c->size; i++) {
+ret = add_cow_cache_entry_flush(bs, c, i);
+if (ret < 0 && result != -ENOSPC) {
+result = ret;
+}
+}
+
+if (result == 0) {
+ret = bdrv_flush(bs->

[Qemu-devel] [RFC][PATCH 14/14 v7] allow user to dump a fraction of the memory

2012-02-29 Thread Wen Congyang

Signed-off-by: Wen Congyang 
---
 dump.c   |  124 +++--
 hmp-commands.hx  |   14 --
 hmp.c|   13 +-
 memory_mapping.c |   27 
 memory_mapping.h |2 +
 qapi-schema.json |6 ++-
 qmp-commands.hx  |8 +++-
 7 files changed, 172 insertions(+), 22 deletions(-)

diff --git a/dump.c b/dump.c
index dd3a72c..3aa160e 100644
--- a/dump.c
+++ b/dump.c
@@ -91,6 +91,9 @@ typedef struct DumpState {
 void *opaque;
 RAMBlock *block;
 ram_addr_t start;
+bool has_filter;
+int64_t begin;
+int64_t length;
 target_phys_addr_t offset;
 VMChangeStateEntry *handler;
 } DumpState;
@@ -413,17 +416,47 @@ static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start,
 
 /* get the memory's offset in the vmcore */
 static target_phys_addr_t get_offset(target_phys_addr_t phys_addr,
- target_phys_addr_t memory_offset)
+ DumpState *s)
 {
 RAMBlock *block;
-target_phys_addr_t offset = memory_offset;
+target_phys_addr_t offset = s->memory_offset;
+int64_t size_in_block, start;
+
+if (s->has_filter) {
+if (phys_addr < s->begin || phys_addr >= s->begin + s->length) {
+return -1;
+}
+}
 
 QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (phys_addr >= block->offset &&
-phys_addr < block->offset + block->length) {
-return phys_addr - block->offset + offset;
+if (s->has_filter) {
+if (block->offset >= s->begin + s->length ||
+block->offset + block->length <= s->begin) {
+/* This block is out of the range */
+continue;
+}
+
+if (s->begin <= block->offset) {
+start = block->offset;
+} else {
+start = s->begin;
+}
+
+size_in_block = block->length - (start - block->offset);
+if (s->begin + s->length < block->offset + block->length) {
+size_in_block -= block->offset + block->length -
+ (s->begin + s->length);
+}
+} else {
+start = block->offset;
+size_in_block = block->length;
 }
-offset += block->length;
+
+if (phys_addr >= start && phys_addr < start + size_in_block) {
+return phys_addr - start + offset;
+}
+
+offset += size_in_block;
 }
 
 return -1;
@@ -495,7 +528,7 @@ static int dump_completed(DumpState *s)
 int phdr_index = 1, ret;
 
 QTAILQ_FOREACH(memory_mapping, &s->list.head, next) {
-offset = get_offset(memory_mapping->phys_addr, s->memory_offset);
+offset = get_offset(memory_mapping->phys_addr, s);
 if (s->dump_info.d_class == ELFCLASS64) {
 ret = write_elf64_load(s, memory_mapping, phdr_index++, offset);
 } else {
@@ -522,6 +555,17 @@ static int get_next_block(DumpState *s, RAMBlock *block)
 
 s->start = 0;
 s->block = block;
+if (s->has_filter) {
+if (block->offset >= s->begin + s->length ||
+block->offset + block->length <= s->begin) {
+/* This block is out of the range */
+continue;
+}
+
+if (s->begin > block->offset) {
+s->start = s->begin - block->offset;
+}
+}
 
 return 0;
 }
@@ -600,7 +644,36 @@ static void dump_vm_state_change(void *opaque, int running, RunState state)
 }
 }
 
-static DumpState *dump_init(Error **errp)
+static ram_addr_t get_start_block(DumpState *s)
+{
+RAMBlock *block;
+
+if (!s->has_filter) {
+s->block = QLIST_FIRST(&ram_list.blocks);
+return 0;
+}
+
+QLIST_FOREACH(block, &ram_list.blocks, next) {
+if (block->offset >= s->begin + s->length ||
+block->offset + block->length <= s->begin) {
+/* This block is out of the range */
+continue;
+}
+
+s->block = block;
+if (s->begin > block->offset ) {
+s->start = s->begin - block->offset;
+} else {
+s->start = 0;
+}
+return s->start;
+}
+
+return -1;
+}
+
+static DumpState *dump_init(bool has_filter, int64_t begin, int64_t length,
+Error **errp)
 {
 CPUState *env;
 DumpState *s = dump_get_current();
@@ -617,8 +690,16 @@ static DumpState *dump_init(Error **errp)
 g_free(s->error);
 s->error = NULL;
 }
-s->block = QLIST_FIRST(&ram_list.blocks);
-s->start = 0;
+
+s->has_filter = has_filter;
+s->begin = begin;
+s->length = length;
+s->start = get_start_block(s);
+if (s->start == -1) {
+error_set(errp, QERR_INVALID_PARAMETER, "begin");
+return NULL;
+}
+
 s->handler = qemu_add_vm_change_state_h

[Qemu-devel] [RFC][PATCH 13/14 v7] support detached dump

2012-02-29 Thread Wen Congyang

Let the user choose whether to block other monitor commands while dumping.

Signed-off-by: Wen Congyang 
---
 dump.c   |2 +-
 hmp-commands.hx  |9 +
 hmp.c|   49 +++--
 qapi-schema.json |3 ++-
 qmp-commands.hx  |7 +--
 5 files changed, 60 insertions(+), 10 deletions(-)

diff --git a/dump.c b/dump.c
index 8224116..dd3a72c 100644
--- a/dump.c
+++ b/dump.c
@@ -721,7 +721,7 @@ static DumpState *dump_init_fd(int fd, Error **errp)
 return s;
 }
 
-void qmp_dump(const char *file, Error **errp)
+void qmp_dump(bool detach, const char *file, Error **errp)
 {
 const char *p;
 int fd = -1;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 0c0a7b4..bd0c95d 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -883,20 +883,21 @@ ETEXI
 #if defined(CONFIG_HAVE_CORE_DUMP)
 {
 .name   = "dump",
-.args_type  = "file:s",
-.params = "file",
-.help   = "dump to file",
+.args_type  = "detach:-d,file:s",
+.params = "[-d] file",
+.help   = "dump to file (using -d to not wait for completion)",
 .user_print = monitor_user_noop,
 .mhandler.cmd = hmp_dump,
 },
 
 
 STEXI
-@item dump @var{file}
+@item dump [-d] @var{file}
 @findex dump
 Dump to @var{file}. The file can be processed with crash or gdb.
 file: destination file(started with "file:") or destination file descriptor
   (started with "fd:")
+  -d: do not wait for completion.
 ETEXI
 #endif
 
diff --git a/hmp.c b/hmp.c
index 476d355..707701b 100644
--- a/hmp.c
+++ b/hmp.c
@@ -857,13 +857,58 @@ void hmp_block_job_cancel(Monitor *mon, const QDict *qdict)
 hmp_handle_error(mon, &error);
 }
 
+typedef struct DumpingStatus
+{
+QEMUTimer *timer;
+Monitor *mon;
+} DumpingStatus;
+
+static void hmp_dumping_status_cb(void *opaque)
+{
+DumpingStatus *status = opaque;
+DumpInfo *info;
+
+info = qmp_query_dump(NULL);
+if (!info->has_status || strcmp(info->status, "active") == 0) {
+qemu_mod_timer(status->timer, qemu_get_clock_ms(rt_clock) + 1000);
+} else {
+monitor_resume(status->mon);
+qemu_del_timer(status->timer);
+g_free(status);
+}
+
+qapi_free_DumpInfo(info);
+}
+
 void hmp_dump(Monitor *mon, const QDict *qdict)
 {
 Error *errp = NULL;
+int detach = qdict_get_try_bool(qdict, "detach", 0);
 const char *file = qdict_get_str(qdict, "file");
 
-qmp_dump(file, &errp);
-hmp_handle_error(mon, &errp);
+qmp_dump(!!detach, file, &errp);
+if (errp) {
+hmp_handle_error(mon, &errp);
+return;
+}
+
+if (!detach) {
+DumpingStatus *status;
+int ret;
+
+ret = monitor_suspend(mon);
+if (ret < 0) {
+monitor_printf(mon, "terminal does not allow synchronous "
+   "dumping, continuing detached\n");
+return;
+}
+
+status = g_malloc0(sizeof(*status));
+status->mon = mon;
+status->timer = qemu_new_timer_ms(rt_clock, hmp_dumping_status_cb,
+  status);
+qemu_mod_timer(status->timer, qemu_get_clock_ms(rt_clock));
+}
 }
 
 void hmp_dump_cancel(Monitor *mon, const QDict *qdict)
diff --git a/qapi-schema.json b/qapi-schema.json
index 1e9af32..ed157d7 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1599,13 +1599,14 @@
 #
 # Dump guest's memory to vmcore.
 #
+# @detach: detached dumping.
 # @file: the filename or file descriptor of the vmcore.
 #
 # Returns: nothing on success
 #
 # Since: 1.1
 ##
-{ 'command': 'dump', 'data': { 'file': 'str' } }
+{ 'command': 'dump', 'data': { 'detach': 'bool', 'file': 'str' } }
 
 ##
 # @dump_cancel
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 666f1bc..c7b9c82 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -589,8 +589,8 @@ EQMP
 #if defined(CONFIG_HAVE_CORE_DUMP)
 {
 .name   = "dump",
-.args_type  = "file:s",
-.params = "file",
+.args_type  = "detach:-d,file:s",
+.params = "[-d] file",
 .help   = "dump to file",
 .user_print = monitor_user_noop,
 .mhandler.cmd_new = qmp_marshal_input_dump,
@@ -616,6 +616,9 @@ Notes:
 
 (1) The 'info dump' command should be used to check dumping's progress
 and final result (this information is provided by the 'status' member)
+(2) All boolean arguments default to false
+(3) The user Monitor's "detach" argument is invalid in QMP and should not
+be used
 
 EQMP
 #endif
-- 
1.7.1
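
For readers following the non-detached path: the timer callback in this patch boils down to one decision per tick, namely whether to re-arm the timer or to resume the monitor. A minimal sketch of that decision, outside QEMU, could look like the following. The names dump_info_sketch, poll_action and dump_poll_step are hypothetical and exist only for illustration; they are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the DumpInfo result returned by query-dump. */
typedef struct {
    bool has_status;     /* false until a dump has been started */
    const char *status;  /* "active", "completed", "failed", "cancelled" */
} dump_info_sketch;

/* What the periodic status callback should do next. */
typedef enum {
    POLL_AGAIN,     /* dump not finished: re-arm the timer */
    POLL_FINISHED   /* dump ended: resume the monitor, drop the timer */
} poll_action;

/*
 * Mirrors the condition used in hmp_dumping_status_cb(): keep polling while
 * there is no status yet or the status is still "active".
 */
static poll_action dump_poll_step(const dump_info_sketch *info)
{
    assert(info != NULL);

    if (!info->has_status || strcmp(info->status, "active") == 0) {
        return POLL_AGAIN;
    }
    return POLL_FINISHED;
}
```

Any terminal status ("completed", "failed" or "cancelled") ends the polling, which is why the monitor is resumed in all three cases.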

[Qemu-devel] [RFC][PATCH 12/14 v7] run dump at the background

2012-02-29 Thread Wen Congyang

The new monitor command dump may take a long time to finish, so we need to run
it in the background.
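
The idea can be sketched outside QEMU as a writer that emits one page at a time, stops once a deadline has passed, and picks up from the saved position on the next call. This is only an illustration under stated assumptions: slice_state, write_slice, the callback types and the fake_* helpers are hypothetical names, and unlike the patch the sketch does not report a timeout when nothing is left to write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096

/* Same convention as the patch: 0 success, -1 error, -2 "try again later". */
enum { SLICE_DONE = 0, SLICE_ERROR = -1, SLICE_AGAIN = -2 };

typedef struct {
    size_t total;    /* bytes to write in total */
    size_t written;  /* bytes written so far; survives across slices */
} slice_state;

/* Hypothetical callbacks so the sketch does not depend on QEMU. */
typedef int64_t (*clock_fn)(void *opaque);
typedef int (*write_fn)(size_t offset, size_t len, void *opaque);

/* Write pages until everything is written or the deadline has passed. */
static int write_slice(slice_state *s, write_fn w, clock_fn now,
                       int64_t deadline, void *opaque)
{
    assert(s != NULL && w != NULL && now != NULL);

    while (s->written < s->total) {
        size_t len = s->total - s->written;

        if (len > SKETCH_PAGE_SIZE) {
            len = SKETCH_PAGE_SIZE;
        }
        if (w(s->written, len, opaque) < 0) {
            return SLICE_ERROR;
        }
        s->written += len;

        /* Yield to the main loop once the time budget is used up. */
        if (s->written < s->total && now(opaque) >= deadline) {
            return SLICE_AGAIN;
        }
    }
    return SLICE_DONE;
}

/* Fake environment for experiments: every write "costs" 2 ms. */
typedef struct {
    int64_t now_ms;
    int writes;
} fake_env;

static int64_t fake_now(void *opaque)
{
    fake_env *e = opaque;
    return e->now_ms;
}

static int fake_write(size_t offset, size_t len, void *opaque)
{
    fake_env *e = opaque;
    (void)offset;
    (void)len;
    e->writes++;
    e->now_ms += 2;
    return 0;
}
```

The caller re-invokes write_slice() with a fresh deadline whenever it gets SLICE_AGAIN, which is roughly the role the rescheduled dump_iterate() plays in the patch.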

Signed-off-by: Wen Congyang 
---
 dump.c |  168 
 vl.c   |5 +-
 2 files changed, 150 insertions(+), 23 deletions(-)

diff --git a/dump.c b/dump.c
index 48779d8..8224116 100644
--- a/dump.c
+++ b/dump.c
@@ -78,9 +78,21 @@ typedef struct DumpState {
 bool resume;
 char *error;
 target_phys_addr_t memory_offset;
+
+/*
+ * Return value:
+ * -2: EAGAIN
+ * -1: error
+ *  0: success
+ */
 write_core_dump_function f;
 void (*cleanup)(void *opaque);
+int (*dump_begin_iterate)(struct DumpState *, void *opaque);
 void *opaque;
+RAMBlock *block;
+ram_addr_t start;
+target_phys_addr_t offset;
+VMChangeStateEntry *handler;
 } DumpState;
 
 static DumpState *dump_get_current(void)
@@ -98,6 +110,12 @@ static int dump_cleanup(DumpState *s)
 
 memory_mapping_list_free(&s->list);
 s->cleanup(s->opaque);
+
+if (s->handler) {
+qemu_del_vm_change_state_handler(s->handler);
+s->handler = NULL;
+}
+
 if (s->resume) {
 vm_start();
 }
@@ -323,40 +341,70 @@ static int write_elf32_notes(DumpState *s, int phdr_index,
 return 0;
 }
 
+/*
+ * Return value:
+ * -2: blocked
+ * -1: failed
+ *  0: success
+ */
 static int write_data(DumpState *s, void *buf, int length,
   target_phys_addr_t *offset)
 {
 int ret;
 
 ret = s->f(*offset, buf, length, s->opaque);
-if (ret < 0) {
+if (ret == -1) {
 dump_error(s, "dump: failed to save memory.\n");
 return -1;
 }
 
+if (ret == -2) {
+return -2;
+}
+
 *offset += length;
 return 0;
 }
 
 /* write the memroy to vmcore. 1 page per I/O. */
-static int write_memory(DumpState *s, RAMBlock *block,
-target_phys_addr_t *offset)
+static int write_memory(DumpState *s, RAMBlock *block, ram_addr_t start,
+target_phys_addr_t *offset, int64_t *size,
+int64_t deadline)
 {
 int i, ret;
+int64_t writen_size = 0;
+int64_t time;
 
-for (i = 0; i < block->length / TARGET_PAGE_SIZE; i++) {
-ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
+*size = block->length - start;
+for (i = 0; i < *size / TARGET_PAGE_SIZE; i++) {
+ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
  TARGET_PAGE_SIZE, offset);
 if (ret < 0) {
-return -1;
+*size = writen_size;
+return ret;
+}
+
+writen_size += TARGET_PAGE_SIZE;
+time = qemu_get_clock_ms(rt_clock);
+if (time >= deadline) {
+/* time out */
+*size = writen_size;
+return -2;
 }
 }
 
-if ((block->length % TARGET_PAGE_SIZE) != 0) {
-ret = write_data(s, block->host + i * TARGET_PAGE_SIZE,
- block->length % TARGET_PAGE_SIZE, offset);
+if ((*size % TARGET_PAGE_SIZE) != 0) {
+ret = write_data(s, block->host + start + i * TARGET_PAGE_SIZE,
+ *size % TARGET_PAGE_SIZE, offset);
 if (ret < 0) {
-return -1;
+*size = writen_size;
+return ret;
+}
+
+time = qemu_get_clock_ms(rt_clock);
+if (time >= deadline) {
+/* time out */
+return -2;
 }
 }
 
@@ -435,6 +483,7 @@ static int dump_begin(DumpState *s)
 }
 
 s->memory_offset = offset;
+s->offset = offset;
 return 0;
 }
 
@@ -462,22 +511,65 @@ static int dump_completed(DumpState *s)
 return 0;
 }
 
-/* write all memory to vmcore */
-static int dump_iterate(DumpState *s)
+static int get_next_block(DumpState *s, RAMBlock *block)
 {
+while (1) {
+block = QLIST_NEXT(block, next);
+if (!block) {
+/* no more block */
+return 1;
+}
+
+s->start = 0;
+s->block = block;
+
+return 0;
+}
+}
+
+/* write memory to vmcore */
+static void dump_iterate(void *opaque)
+{
+DumpState *s = opaque;
 RAMBlock *block;
-target_phys_addr_t offset = s->memory_offset;
+target_phys_addr_t offset = s->offset;
+int64_t size;
+int64_t deadline, now;
 int ret;
 
-/* write all memory to vmcore */
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-ret = write_memory(s, block, &offset);
-if (ret < 0) {
-return -1;
+now = qemu_get_clock_ms(rt_clock);
+deadline = now + 5;
+while(1) {
+block = s->block;
+ret = write_memory(s, block, s->start, &offset, &size, deadline);
+if (ret == -1) {
+return;
+}
+
+if (ret == -2) {
+break;
+}
+
+ret = get_next_block(s, block);
+if (ret == 1) {
+dump_completed(s);
+

[Qemu-devel] [RFC][PATCH 11/14 v7] support to query dumping status

2012-02-29 Thread Wen Congyang

Add API to allow the user to query dumping status.
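
As a rough illustration of what the query reports, the mapping from internal state to the optional status and error fields can be sketched as below. The types sketch_state and sketch_info and the function sketch_query are hypothetical names used only here; the real code is qmp_query_dump() in the diff that follows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical copy of the dump states used in dump.c. */
typedef enum {
    SKETCH_STATE_ERROR,
    SKETCH_STATE_SETUP,
    SKETCH_STATE_CANCELLED,
    SKETCH_STATE_ACTIVE,
    SKETCH_STATE_COMPLETED
} sketch_state;

/* Hypothetical stand-in for DumpInfo: both fields are optional. */
typedef struct {
    bool has_status;
    const char *status;
    bool has_error;
    const char *error;
} sketch_info;

/* Fill 'out' the way the query is described: no status before any dump. */
static void sketch_query(sketch_state st, const char *error, sketch_info *out)
{
    assert(out != NULL);
    *out = (sketch_info){ 0 };

    switch (st) {
    case SKETCH_STATE_SETUP:
        break;                      /* nothing ran yet: report no status */
    case SKETCH_STATE_ACTIVE:
        out->status = "active";
        break;
    case SKETCH_STATE_COMPLETED:
        out->status = "completed";
        break;
    case SKETCH_STATE_CANCELLED:
        out->status = "cancelled";
        break;
    case SKETCH_STATE_ERROR:
        out->status = "failed";
        out->has_error = true;      /* the reason is only set on failure */
        out->error = error;
        break;
    }
    out->has_status = (out->status != NULL);
}
```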

Signed-off-by: Wen Congyang 
---
 dump.c   |   32 
 hmp-commands.hx  |2 ++
 hmp.c|   17 +
 hmp.h|1 +
 monitor.c|7 +++
 qapi-schema.json |   26 ++
 qmp-commands.hx  |   52 
 7 files changed, 137 insertions(+), 0 deletions(-)

diff --git a/dump.c b/dump.c
index 673720f..48779d8 100644
--- a/dump.c
+++ b/dump.c
@@ -648,3 +648,35 @@ void qmp_dump_cancel(Error **errp)
 dump_cleanup(s);
 return;
 }
+
+DumpInfo *qmp_query_dump(Error **errp)
+{
+DumpInfo *info = g_malloc0(sizeof(*info));
+DumpState *s = dump_get_current();
+
+switch (s->state) {
+case DUMP_STATE_SETUP:
+/* no dump has ever been started */
+break;
+case DUMP_STATE_ACTIVE:
+info->has_status = true;
+info->status = g_strdup("active");
+break;
+case DUMP_STATE_COMPLETED:
+info->has_status = true;
+info->status = g_strdup("completed");
+break;
+case DUMP_STATE_ERROR:
+info->has_status = true;
+info->status = g_strdup("failed");
+info->has_error = true;
+info->error = g_strdup(s->error);
+break;
+case DUMP_STATE_CANCELLED:
+info->has_status = true;
+info->status = g_strdup("cancelled");
+break;
+}
+
+return info;
+}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index e6db6b6..0c0a7b4 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1434,6 +1434,8 @@ show device tree
 show qdev device model list
 @item info roms
 show roms
+@item info dump
+show dumping status
 @end table
 ETEXI
 
diff --git a/hmp.c b/hmp.c
index a20a7b0..476d355 100644
--- a/hmp.c
+++ b/hmp.c
@@ -870,3 +870,20 @@ void hmp_dump_cancel(Monitor *mon, const QDict *qdict)
 {
 qmp_dump_cancel(NULL);
 }
+
+void hmp_info_dump(Monitor *mon)
+{
+DumpInfo *info;
+
+info = qmp_query_dump(NULL);
+
+if (info->has_status) {
+monitor_printf(mon, "Dumping status: %s\n", info->status);
+}
+
+if (info->has_error) {
+monitor_printf(mon, "Dumping failed reason: %s\n", info->error);
+}
+
+qapi_free_DumpInfo(info);
+}
diff --git a/hmp.h b/hmp.h
index 75c6c1d..3d105a9 100644
--- a/hmp.h
+++ b/hmp.h
@@ -61,5 +61,6 @@ void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_dump(Monitor *mon, const QDict *qdict);
 void hmp_dump_cancel(Monitor *mon, const QDict *qdict);
+void hmp_info_dump(Monitor *mon);
 
 #endif
diff --git a/monitor.c b/monitor.c
index 96af5e0..f240895 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2603,6 +2603,13 @@ static mon_cmd_t info_cmds[] = {
 .mhandler.info = do_trace_print_events,
 },
 {
+.name   = "dump",
+.args_type  = "",
+.params = "",
+.help   = "show dumping status",
+.mhandler.info = hmp_info_dump,
+},
+{
 .name   = NULL,
 },
 };
diff --git a/qapi-schema.json b/qapi-schema.json
index a764cd3..1e9af32 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1619,3 +1619,29 @@
 # Since: 1.1
 ##
 { 'command': 'dump_cancel' }
+
+##
+# @DumpInfo
+#
+# Information about the current dumping process.
+#
+# @status: #optional string describing the current dumping status.
+#  As of 1.1 this can be 'active', 'completed', 'failed' or
+#  'cancelled'. If this field is not returned, no dumping process
+#  has been initiated
+#
+# Since: 1.1
+##
+{ 'type': 'DumpInfo',
+  'data': { '*status': 'str', '*error': 'str' } }
+
+##
+# @query-dump
+#
+# Returns information about current dumping process.
+#
+# Returns: @DumpInfo
+#
+# Since: 1.1
+##
+{ 'command': 'query-dump', 'returns': 'DumpInfo' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index a2d94a9..666f1bc 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -612,6 +612,11 @@ Example:
 -> { "execute": "dump", "arguments": { "file": "fd:dump" } }
 <- { "return": {} }
 
+Notes:
+
+(1) The 'info dump' command should be used to check dumping's progress
+and final result (this information is provided by the 'status' member)
+
 EQMP
 #endif
 
@@ -2046,6 +2051,53 @@ EQMP
 },
 
 SQMP
+query-dump
+-
+
+Dumping status.
+
+Return a json-object.
+
+The main json-object contains the following:
+
+- "status": dumping status (json-string)
+ - Possible values: "active", "completed", "failed", "cancelled"
+
+Examples:
+
+1. Before the first dump
+
+-> { "execute": "query-dump" }
+<- { "return": {} }
+
+2. Dump is done and has succeeded
+
+-> { "execute": "query-dump" }
+<- { "return": { "status": "completed" } }
+
+3. Dump is done and has failed
+
+-> { "execute": "query-dump" }
+<- { "return": { "status": "failed" } }
+
+4. Dump is being performed:
+
+-> { "execute":

[Qemu-devel] [RFC][PATCH 10/14 v7] support to cancel the current dumping

2012-02-29 Thread Wen Congyang

Add API to allow the user to cancel the current dumping.

Signed-off-by: Wen Congyang 
---
 dump.c   |   13 +
 hmp-commands.hx  |   14 ++
 hmp.c|5 +
 hmp.h|1 +
 qapi-schema.json |   13 +
 qmp-commands.hx  |   21 +
 6 files changed, 67 insertions(+), 0 deletions(-)

diff --git a/dump.c b/dump.c
index c9baedb..673720f 100644
--- a/dump.c
+++ b/dump.c
@@ -635,3 +635,16 @@ void qmp_dump(const char *file, Error **errp)
 
 return;
 }
+
+void qmp_dump_cancel(Error **errp)
+{
+DumpState *s = dump_get_current();
+
+if (s->state != DUMP_STATE_ACTIVE) {
+return;
+}
+
+s->state = DUMP_STATE_CANCELLED;
+dump_cleanup(s);
+return;
+}
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 19200ad..e6db6b6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -901,6 +901,20 @@ ETEXI
 #endif
 
 {
+.name   = "dump_cancel",
+.args_type  = "",
+.params = "",
+.help   = "cancel the current VM dumping",
+.mhandler.cmd = hmp_dump_cancel,
+},
+
+STEXI
+@item dump_cancel
+@findex dump_cancel
+Cancel the current VM dumping.
+ETEXI
+
+{
 .name   = "snapshot_blkdev",
 .args_type  = "device:B,snapshot-file:s?,format:s?",
 .params = "device [new-image-file] [format]",
diff --git a/hmp.c b/hmp.c
index 309ccec..a20a7b0 100644
--- a/hmp.c
+++ b/hmp.c
@@ -865,3 +865,8 @@ void hmp_dump(Monitor *mon, const QDict *qdict)
 qmp_dump(file, &errp);
 hmp_handle_error(mon, &errp);
 }
+
+void hmp_dump_cancel(Monitor *mon, const QDict *qdict)
+{
+qmp_dump_cancel(NULL);
+}
diff --git a/hmp.h b/hmp.h
index b055e50..75c6c1d 100644
--- a/hmp.h
+++ b/hmp.h
@@ -60,5 +60,6 @@ void hmp_block_stream(Monitor *mon, const QDict *qdict);
 void hmp_block_job_set_speed(Monitor *mon, const QDict *qdict);
 void hmp_block_job_cancel(Monitor *mon, const QDict *qdict);
 void hmp_dump(Monitor *mon, const QDict *qdict);
+void hmp_dump_cancel(Monitor *mon, const QDict *qdict);
 
 #endif
diff --git a/qapi-schema.json b/qapi-schema.json
index 3c4aa70..a764cd3 100644
--- a/qapi-schema.json
+++ b/qapi-schema.json
@@ -1606,3 +1606,16 @@
 # Since: 1.1
 ##
 { 'command': 'dump', 'data': { 'file': 'str' } }
+
+##
+# @dump_cancel
+#
+# Cancel the currently executing dumping process.
+#
+# Returns: nothing on success
+#
+# Notes: This command succeeds even if there is no dumping process running.
+#
+# Since: 1.1
+##
+{ 'command': 'dump_cancel' }
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 1199316..a2d94a9 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -616,6 +616,27 @@ EQMP
 #endif
 
 {
+.name   = "dump_cancel",
+.args_type  = "",
+.mhandler.cmd_new = qmp_marshal_input_dump_cancel,
+},
+
+SQMP
+dump_cancel
+
+
+Cancel the current dumping.
+
+Arguments: None.
+
+Example:
+
+-> { "execute": "dump_cancel" }
+<- { "return": {} }
+
+EQMP
+
+{
 .name   = "netdev_add",
 .args_type  = "netdev:O",
 .params = "[user|tap|socket],id=str[,prop=value][,...]",
-- 
1.7.1
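
One property of this command worth spelling out is that cancelling is harmless when no dump is running: only an active dump changes state and gets cleaned up. A small sketch of that behaviour follows. The names cancel_sketch_state, cancel_sketch_dump and sketch_cancel are hypothetical, and the cleanup is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum {
    CS_SETUP,
    CS_ACTIVE,
    CS_COMPLETED,
    CS_CANCELLED
} cancel_sketch_state;

typedef struct {
    cancel_sketch_state state;
    int cleanups;   /* how often cleanup ran; stands in for dump_cleanup() */
} cancel_sketch_dump;

/*
 * Returns true if a running dump was actually cancelled. A false return is
 * not an error: the command still succeeds from the caller's point of view.
 */
static bool sketch_cancel(cancel_sketch_dump *d)
{
    assert(d != NULL);

    if (d->state != CS_ACTIVE) {
        return false;   /* nothing to cancel, leave the state untouched */
    }
    d->state = CS_CANCELLED;
    d->cleanups++;
    return true;
}
```

Calling it twice in a row cleans up only once, so a repeated dump_cancel does not free resources twice.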

[Qemu-devel] [RFC][PATCH 09/14 v7] introduce a new monitor command 'dump' to dump guest's memory

2012-02-29 Thread Wen Congyang

Signed-off-by: Wen Congyang 
---
 Makefile.target  |2 +-
 dump.c   |  637 ++
 hmp-commands.hx  |   20 ++
 hmp.c|9 +
 hmp.h|1 +
 qapi-schema.json |   13 ++
 qmp-commands.hx  |   29 +++
 7 files changed, 710 insertions(+), 1 deletions(-)
 create mode 100644 dump.c

diff --git a/Makefile.target b/Makefile.target
index cfd3113..4ae59f5 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -210,7 +210,7 @@ obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-$(CONFIG_VGA) += vga.o
 obj-y += memory.o savevm.o
 obj-y += memory_mapping.o
-obj-$(CONFIG_HAVE_CORE_DUMP) += arch_dump.o
+obj-$(CONFIG_HAVE_CORE_DUMP) += arch_dump.o dump.o
 LIBS+=-lz
 
 obj-i386-$(CONFIG_KVM) += hyperv.o
diff --git a/dump.c b/dump.c
new file mode 100644
index 000..c9baedb
--- /dev/null
+++ b/dump.c
@@ -0,0 +1,637 @@
+/*
+ * QEMU dump
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include 
+#include "elf.h"
+#include 
+#include 
+#include "cpu.h"
+#include "cpu-all.h"
+#include "targphys.h"
+#include "monitor.h"
+#include "kvm.h"
+#include "dump.h"
+#include "sysemu.h"
+#include "bswap.h"
+#include "memory_mapping.h"
+#include "error.h"
+#include "qmp-commands.h"
+#include "gdbstub.h"
+
+static inline uint16_t cpu_contert_to_target16(uint16_t val, int endian)
+{
+if (endian == ELFDATA2LSB) {
+val = cpu_to_le16(val);
+} else {
+val = cpu_to_be16(val);
+}
+
+return val;
+}
+
+static inline uint32_t cpu_contert_to_target32(uint32_t val, int endian)
+{
+if (endian == ELFDATA2LSB) {
+val = cpu_to_le32(val);
+} else {
+val = cpu_to_be32(val);
+}
+
+return val;
+}
+
+static inline uint64_t cpu_contert_to_target64(uint64_t val, int endian)
+{
+if (endian == ELFDATA2LSB) {
+val = cpu_to_le64(val);
+} else {
+val = cpu_to_be64(val);
+}
+
+return val;
+}
+
+enum {
+DUMP_STATE_ERROR,
+DUMP_STATE_SETUP,
+DUMP_STATE_CANCELLED,
+DUMP_STATE_ACTIVE,
+DUMP_STATE_COMPLETED,
+};
+
+typedef struct DumpState {
+ArchDumpInfo dump_info;
+MemoryMappingList list;
+int phdr_num;
+int state;
+bool resume;
+char *error;
+target_phys_addr_t memory_offset;
+write_core_dump_function f;
+void (*cleanup)(void *opaque);
+void *opaque;
+} DumpState;
+
+static DumpState *dump_get_current(void)
+{
+static DumpState current_dump = {
+.state = DUMP_STATE_SETUP,
+};
+
+return &current_dump;
+}
+
+static int dump_cleanup(DumpState *s)
+{
+int ret = 0;
+
+memory_mapping_list_free(&s->list);
+s->cleanup(s->opaque);
+if (s->resume) {
+vm_start();
+}
+
+return ret;
+}
+
+static void dump_error(DumpState *s, const char *reason)
+{
+s->state = DUMP_STATE_ERROR;
+s->error = g_strdup(reason);
+dump_cleanup(s);
+}
+
+static int write_elf64_header(DumpState *s)
+{
+Elf64_Ehdr elf_header;
+int ret;
+int endian = s->dump_info.d_endian;
+
+memset(&elf_header, 0, sizeof(Elf64_Ehdr));
+memcpy(&elf_header, ELFMAG, 4);
+elf_header.e_ident[EI_CLASS] = ELFCLASS64;
+elf_header.e_ident[EI_DATA] = s->dump_info.d_endian;
+elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+elf_header.e_type = cpu_contert_to_target16(ET_CORE, endian);
+elf_header.e_machine = cpu_contert_to_target16(s->dump_info.d_machine,
+   endian);
+elf_header.e_version = cpu_contert_to_target32(EV_CURRENT, endian);
+elf_header.e_ehsize = cpu_contert_to_target16(sizeof(elf_header), endian);
+elf_header.e_phoff = cpu_contert_to_target64(sizeof(Elf64_Ehdr), endian);
+elf_header.e_phentsize = cpu_contert_to_target16(sizeof(Elf64_Phdr), endian);
+elf_header.e_phnum = cpu_contert_to_target16(s->phdr_num, endian);
+
+ret = s->f(0, &elf_header, sizeof(elf_header), s->opaque);
+if (ret < 0) {
+dump_error(s, "dump: failed to write elf header.\n");
+return -1;
+}
+
+return 0;
+}
+
+static int write_elf32_header(DumpState *s)
+{
+Elf32_Ehdr elf_header;
+int ret;
+int endian = s->dump_info.d_endian;
+
+memset(&elf_header, 0, sizeof(Elf32_Ehdr));
+memcpy(&elf_header, ELFMAG, 4);
+elf_header.e_ident[EI_CLASS] = ELFCLASS32;
+elf_header.e_ident[EI_DATA] = endian;
+elf_header.e_ident[EI_VERSION] = EV_CURRENT;
+elf_header.e_type = cpu_contert_to_target16(ET_CORE, endian);
+elf_header.e_machine = cpu_contert_to_target16(s->dump_info.d_machine,
+   endian);
+elf_header.e_version = cpu_contert_to_target32(EV_CURRENT, endian);
+elf_header.e_ehsize = cpu_contert_to_target16(sizeof(elf_header), endian);
+elf_header.e_phoff = cpu_contert_to

[Qemu-devel] [PATCH 1/2 v7] block: add-cow file format

2012-02-29 Thread Dong Xu Wang

From: Dong Xu Wang 

Provide a new file format: add-cow. The usage can be found in add-cow.txt of
this patch.

CC: Marcelo Tosatti 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
Signed-off-by: Dong Xu Wang 
---
 Makefile.objs  |1 +
 block.c|2 +-
 block.h|1 +
 block/add-cow.c|  402 
 block_int.h|1 +
 docs/specs/add-cow.txt |   68 
 6 files changed, 474 insertions(+), 1 deletions(-)
 create mode 100644 block/add-cow.c
 create mode 100644 docs/specs/add-cow.txt

diff --git a/Makefile.objs b/Makefile.objs
index 808de6a..fa9dde0 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -34,6 +34,7 @@ block-nested-y += raw.o cow.o qcow.o vdi.o vmdk.o cloop.o dmg.o bochs.o vpc.o vv
 block-nested-y += qcow2.o qcow2-refcount.o qcow2-cluster.o qcow2-snapshot.o qcow2-cache.o
 block-nested-y += qed.o qed-gencb.o qed-l2-cache.o qed-table.o qed-cluster.o
 block-nested-y += qed-check.o
+block-nested-y += add-cow.o add-cow-cache.o
 block-nested-y += parallels.o nbd.o blkdebug.o sheepdog.o blkverify.o
 block-nested-y += stream.o
 block-nested-$(CONFIG_WIN32) += raw-win32.o
diff --git a/block.c b/block.c
index 52ffe14..581c092 100644
--- a/block.c
+++ b/block.c
@@ -194,7 +194,7 @@ static void bdrv_io_limits_intercept(BlockDriverState *bs,
 }
 
 /* check if the path starts with ":" */
-static int path_has_protocol(const char *path)
+int path_has_protocol(const char *path)
 {
 #ifdef _WIN32
 if (is_windows_drive(path) ||
diff --git a/block.h b/block.h
index 48d0bf3..3d96444 100644
--- a/block.h
+++ b/block.h
@@ -310,6 +310,7 @@ char *bdrv_snapshot_dump(char *buf, int buf_size, QEMUSnapshotInfo *sn);
 
 char *get_human_readable_size(char *buf, int buf_size, int64_t size);
 int path_is_absolute(const char *path);
+int path_has_protocol(const char *path);
 void path_combine(char *dest, int dest_size,
   const char *base_path,
   const char *filename);
diff --git a/block/add-cow.c b/block/add-cow.c
new file mode 100644
index 000..6897a52
--- /dev/null
+++ b/block/add-cow.c
@@ -0,0 +1,402 @@
+/*
+ * QEMU ADD-COW Disk Format
+ *
+ * Copyright IBM, Corp. 2012
+ *
+ * Authors:
+ *  Dong Xu Wang 
+ * This work is licensed under the terms of the GNU LGPL, version 2 or later.
+ * See the COPYING.LIB file in the top-level directory.
+ *
+ */
+
+#include "qemu-common.h"
+#include "block_int.h"
+#include "module.h"
+#include "add-cow.h"
+
+static int add_cow_probe(const uint8_t *buf, int buf_size, const char *filename)
+{
+const AddCowHeader *header = (const void *)buf;
+
+if (be64_to_cpu(header->magic) == ADD_COW_MAGIC &&
+be32_to_cpu(header->version) == ADD_COW_VERSION) {
+return 100;
+} else {
+return 0;
+}
+}
+
+static int add_cow_open(BlockDriverState *bs, int flags)
+{
+AddCowHeaderheader;
+charimage_filename[ADD_COW_FILE_LEN];
+BlockDriver *image_drv = NULL;
+int ret;
+BDRVAddCowState *s = bs->opaque;
+
+ret = bdrv_pread(bs->file, 0, &header, sizeof(header));
+if (ret != sizeof(header)) {
+goto fail;
+}
+
+if (be64_to_cpu(header.magic) != ADD_COW_MAGIC) {
+ret = -EINVAL;
+goto fail;
+}
+if (be32_to_cpu(header.version) != ADD_COW_VERSION) {
+char version[64];
+snprintf(version, sizeof(version), "ADD-COW version %d",
+ be32_to_cpu(header.version));
+qerror_report(QERR_UNKNOWN_BLOCK_FORMAT_FEATURE,
+bs->device_name, "add-cow", version);
+ret = -ENOTSUP;
+goto fail;
+}
+
+QEMU_BUILD_BUG_ON(sizeof(bs->backing_file) != sizeof(header.backing_file));
+strncpy(bs->backing_file, header.backing_file,
+sizeof(bs->backing_file));
+
+if (header.image_file[0] == '\0') {
+ret = -ENOENT;
+goto fail;
+}
+s->image_hd = bdrv_new("");
+if (path_has_protocol(header.image_file)) {
+strncpy(image_filename, header.image_file, sizeof(image_filename));
+} else {
+path_combine(image_filename, sizeof(image_filename),
+ bs->filename, header.image_file);
+}
+
+image_drv = bdrv_find_format("raw");
+ret = bdrv_open(s->image_hd, image_filename, flags, image_drv);
+if (ret < 0) {
+bdrv_delete(s->image_hd);
+goto fail;
+}
+bs->total_sectors = s->image_hd->total_sectors;
+s->cluster_size = ADD_COW_CLUSTER_SIZE;
+s->bitmap_cache = add_cow_cache_create(bs, ADD_COW_CACHE_SIZE);
+qemu_co_mutex_init(&s->lock);
+return 0;
+ fail:
+return ret;
+}
+
+static inline bool is_bit_set(BlockDriverState *bs, int64_t bitnum)
+{
+BDRVAddCowState *s = bs->opaque;
+uint64_t offset = bitnum >> 3;
+uint8_t *bitmap;
+int ret = add_cow_cache_get(bs, s->bitmap_cache,
+offset & ~(ADD_COW_CLUSTER_SIZE - 1), (void **)&bitmap);
+if (ret < 0) {
+abort();
+

[Qemu-devel] [PATCH 2/2] block: add-cow support snapshot_blkdev

2012-02-29 Thread Dong Xu Wang

From: Dong Xu Wang 

The raw format cannot be used for snapshot_file, but add-cow can.

CC: Marcelo Tosatti 
CC: Kevin Wolf 
CC: Stefan Hajnoczi 
Signed-off-by: Dong Xu Wang 
---
 blockdev.c  |   53 ++
 docs/live-block-ops.txt |8 ++-
 2 files changed, 55 insertions(+), 6 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d78aa51..c820fcb 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -687,12 +687,55 @@ void qmp_blockdev_snapshot_sync(const char *device, const char *snapshot_file,
 return;
 }
 
-ret = bdrv_img_create(snapshot_file, format, bs->filename,
-  bs->drv->format_name, NULL, -1, flags);
-if (ret) {
-error_set(errp, QERR_UNDEFINED_ERROR);
-return;
+if (strcmp(format, "add-cow")) {
+ret = bdrv_img_create(snapshot_file, format, bs->filename,
+  bs->drv->format_name, NULL, -1, flags);
+if (ret) {
+error_set(errp, QERR_UNDEFINED_ERROR);
+return;
+}
+} else {
+char image_file[1024];
+char option[1024];
+
+uint64_t size;
+BlockDriver *backing_drv = NULL;
+BlockDriverState *backing_bs = NULL;
+
+backing_bs = bdrv_new("");
+backing_drv = bdrv_find_format(bs->drv->format_name);
+if (!backing_drv) {
+error_report("Unknown backing file format '%s'",
+ bs->drv->format_name);
+error_set(errp, QERR_UNDEFINED_ERROR);
+return;
+}
+ret = bdrv_open(backing_bs, bs->filename, flags, backing_drv);
+if (ret < 0) {
+error_set(errp, QERR_UNDEFINED_ERROR);
+return;
+}
+bdrv_get_geometry(backing_bs, &size);
+size *= 512;
+bdrv_delete(backing_bs);
+
+snprintf(image_file, sizeof(image_file), "%s.raw", snapshot_file);
+
+ret = bdrv_img_create(image_file, "raw", NULL,
+  NULL, NULL, size, flags);
+if (ret) {
+error_set(errp, QERR_UNDEFINED_ERROR);
+return;
+}
+snprintf(option, sizeof(option), "image_file=%s.raw", snapshot_file);
+ret = bdrv_img_create(snapshot_file, format, bs->filename,
+  bs->drv->format_name, option, -1, flags);
+if (ret) {
+error_set(errp, QERR_UNDEFINED_ERROR);
+return;
+}
 }
+bs->backing_format[0] = '\0';
 
 bdrv_drain_all();
 bdrv_flush(bs);
diff --git a/docs/live-block-ops.txt b/docs/live-block-ops.txt
index a257087..7edbf91 100644
--- a/docs/live-block-ops.txt
+++ b/docs/live-block-ops.txt
@@ -2,7 +2,8 @@ LIVE BLOCK OPERATIONS
 =
 
 High level description of live block operations. Note these are not
-supported for use with the raw format at the moment.
+supported for use with the raw format at the moment, but add-cow can be
+used as metadata to support the raw format.
 
 Snapshot live merge
 ===
@@ -55,4 +56,9 @@ into that image. Example:
 
 (qemu) block_stream ide0-hd0
 
+Raw is not supported, but we can use add-cow in the 1st step:
 
+(qemu) snapshot_blkdev ide0-hd0 /new-path/disk.img add-cow
+
+This first creates a raw file named disk.img.raw with the same virtual size as
+ide0-hd0, and then creates disk.img.
-- 
1.7.5.4

[Qemu-devel] [RFC][PATCH 08/14 v7] make gdb_id() generally avialable

2012-02-29 Thread Wen Congyang

The following patch also needs this API, so make it generally available.

Signed-off-by: Wen Congyang 
---
 gdbstub.c |9 -
 gdbstub.h |9 +
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 7d470b6..046b036 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -1939,15 +1939,6 @@ static void gdb_set_cpu_pc(GDBState *s, target_ulong pc)
 #endif
 }
 
-static inline int gdb_id(CPUState *env)
-{
-#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
-return env->host_tid;
-#else
-return env->cpu_index + 1;
-#endif
-}
-
 static CPUState *find_cpu(uint32_t thread_id)
 {
 CPUState *env;
diff --git a/gdbstub.h b/gdbstub.h
index d82334f..f30bfe8 100644
--- a/gdbstub.h
+++ b/gdbstub.h
@@ -30,6 +30,15 @@ void gdb_register_coprocessor(CPUState *env,
   gdb_reg_cb get_reg, gdb_reg_cb set_reg,
   int num_regs, const char *xml, int g_pos);
 
+static inline int gdb_id(CPUState *env)
+{
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_USE_NPTL)
+return env->host_tid;
+#else
+return env->cpu_index + 1;
+#endif
+}
+
 #endif
 
 #ifdef CONFIG_USER_ONLY
-- 
1.7.1

[Qemu-devel] [RFC][PATCH 07/14 v7] target-i386: add API to get dump info

2012-02-29 Thread Wen Congyang

Dump info contains the endianness, ELF class and architecture. The next
patch will use this information to create the vmcore.

Signed-off-by: Wen Congyang 
---
 cpu-all.h   |7 +++
 dump.h  |   23 +++
 target-i386/arch_dump.c |   34 ++
 3 files changed, 64 insertions(+), 0 deletions(-)
 create mode 100644 dump.h

diff --git a/cpu-all.h b/cpu-all.h
index ad69269..a77f9e8 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -23,6 +23,7 @@
 #include "qemu-tls.h"
 #include "cpu-common.h"
 #include "memory_mapping.h"
+#include "dump.h"
 
 /* some important defines:
  *
@@ -544,6 +545,7 @@ int cpu_write_elf64_qemunote(write_core_dump_function f, CPUState *env,
  target_phys_addr_t *offset, void *opaque);
 int cpu_write_elf32_qemunote(write_core_dump_function f, CPUState *env,
  target_phys_addr_t *offset, void *opaque);
+int cpu_get_dump_info(ArchDumpInfo *info);
 #else
 static inline int cpu_write_elf64_note(write_core_dump_function f,
CPUState *env, int cpuid,
@@ -574,6 +576,11 @@ static inline int cpu_write_elf32_qemunote(write_core_dump_function f,
 {
 return -1;
 }
+
+static inline int cpu_get_dump_info(ArchDumpInfo *info)
+{
+return -1;
+}
 #endif
 
 #endif /* CPU_ALL_H */
diff --git a/dump.h b/dump.h
new file mode 100644
index 000..28340cf
--- /dev/null
+++ b/dump.h
@@ -0,0 +1,23 @@
+/*
+ * QEMU dump
+ *
+ * Copyright Fujitsu, Corp. 2011, 2012
+ *
+ * Authors:
+ * Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef DUMP_H
+#define DUMP_H
+
+typedef struct ArchDumpInfo {
+int d_machine;  /* Architecture */
+int d_endian;   /* ELFDATA2LSB or ELFDATA2MSB */
+int d_class;/* ELFCLASS32 or ELFCLASS64 */
+} ArchDumpInfo;
+
+#endif
diff --git a/target-i386/arch_dump.c b/target-i386/arch_dump.c
index 560c8a3..e4351f4 100644
--- a/target-i386/arch_dump.c
+++ b/target-i386/arch_dump.c
@@ -13,6 +13,7 @@
 
 #include "cpu.h"
 #include "cpu-all.h"
+#include "dump.h"
 #include "elf.h"
 
 #ifdef TARGET_X86_64
@@ -401,3 +402,36 @@ int cpu_write_elf32_qemunote(write_core_dump_function f, CPUState *env,
 
 return 0;
 }
+
+int cpu_get_dump_info(ArchDumpInfo *info)
+{
+bool lma = false;
+RAMBlock *block;
+
+#ifdef TARGET_X86_64
+lma = !!(first_cpu->hflags & HF_LMA_MASK);
+#endif
+
+if (lma) {
+info->d_machine = EM_X86_64;
+} else {
+info->d_machine = EM_386;
+}
+info->d_endian = ELFDATA2LSB;
+
+if (lma) {
+info->d_class = ELFCLASS64;
+} else {
+info->d_class = ELFCLASS32;
+
+QLIST_FOREACH(block, &ram_list.blocks, next) {
+if (block->offset + block->length > UINT_MAX) {
+/* The memory size is greater than 4G */
+info->d_class = ELFCLASS64;
+break;
+}
+}
+}
+
+return 0;
+}
-- 
1.7.1

[Qemu-devel] [RFC][PATCH 06/14 v7] target-i386: Add API to write cpu status to core file

2012-02-29 Thread Wen Congyang

The core file contains register values, but it does not include all registers.
Store the CPU status in a QEMU note, so that the user can get more information
from the vmcore.

Signed-off-by: Wen Congyang 
---
 cpu-all.h   |   20 ++
 target-i386/arch_dump.c |  154 +++
 2 files changed, 174 insertions(+), 0 deletions(-)
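
Editorial sketch of the note-size arithmetic used by the qemunote writers in
this patch: header, name and descriptor are each rounded up to a multiple of
four bytes. The names SketchNhdr, round_up4(), note_size() and
note_desc_offset() are hypothetical; only the rounding rule is taken from the
patch.

```c
#include <stddef.h>
#include <stdint.h>

/* An ELF note header is three 32-bit words (as in Elf32_Nhdr/Elf64_Nhdr). */
typedef struct SketchNhdr {
    uint32_t n_namesz;
    uint32_t n_descsz;
    uint32_t n_type;
} SketchNhdr;

/* Round n up to the next multiple of 4. */
static size_t round_up4(size_t n)
{
    return (n + 3) / 4 * 4;
}

/* Total bytes needed for one note: padded header + name + descriptor. */
static size_t note_size(size_t name_size, size_t descsz)
{
    return round_up4(sizeof(SketchNhdr)) + round_up4(name_size) +
           round_up4(descsz);
}

/* Byte offset of the descriptor from the start of the note. */
static size_t note_desc_offset(size_t name_size)
{
    return round_up4(sizeof(SketchNhdr)) + round_up4(name_size);
}
```

For the name "QEMU" the name size is 5 (four characters plus the terminating
NUL), which is padded to 8, so the descriptor starts 20 bytes into the note.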

diff --git a/cpu-all.h b/cpu-all.h
index f7c6321..ad69269 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -540,6 +540,10 @@ int cpu_write_elf64_note(write_core_dump_function f, 
CPUState *env, int cpuid,
  target_phys_addr_t *offset, void *opaque);
 int cpu_write_elf32_note(write_core_dump_function f, CPUState *env, int cpuid,
  target_phys_addr_t *offset, void *opaque);
+int cpu_write_elf64_qemunote(write_core_dump_function f, CPUState *env,
+ target_phys_addr_t *offset, void *opaque);
+int cpu_write_elf32_qemunote(write_core_dump_function f, CPUState *env,
+ target_phys_addr_t *offset, void *opaque);
 #else
 static inline int cpu_write_elf64_note(write_core_dump_function f,
CPUState *env, int cpuid,
@@ -554,6 +558,22 @@ static inline int 
cpu_write_elf32_note(write_core_dump_function f,
 {
 return -1;
 }
+
+static inline int cpu_write_elf64_qemunote(write_core_dump_function f,
+   CPUState *env,
+   target_phys_addr_t *offset,
+   void *opaque)
+{
+return -1;
+}
+
+static inline int cpu_write_elf32_qemunote(write_core_dump_function f,
+   CPUState *env,
+   target_phys_addr_t *offset,
+   void *opaque)
+{
+return -1;
+}
 #endif
 
 #endif /* CPU_ALL_H */
diff --git a/target-i386/arch_dump.c b/target-i386/arch_dump.c
index 3239c40..560c8a3 100644
--- a/target-i386/arch_dump.c
+++ b/target-i386/arch_dump.c
@@ -247,3 +247,157 @@ int cpu_write_elf32_note(write_core_dump_function f, 
CPUState *env, int cpuid,
 
 return 0;
 }
+
+struct QEMUCPUSegment {
+uint32_t selector;
+uint32_t limit;
+uint32_t flags;
+uint32_t pad;
+uint64_t base;
+};
+
+typedef struct QEMUCPUSegment QEMUCPUSegment;
+
+struct QEMUCPUState {
+uint32_t version;
+uint32_t size;
+uint64_t rax, rbx, rcx, rdx, rsi, rdi, rsp, rbp;
+uint64_t r8, r9, r10, r11, r12, r13, r14, r15;
+uint64_t rip, rflags;
+QEMUCPUSegment cs, ds, es, fs, gs, ss;
+QEMUCPUSegment ldt, tr, gdt, idt;
+uint64_t cr[5];
+};
+
+typedef struct QEMUCPUState QEMUCPUState;
+
+static void copy_segment(QEMUCPUSegment *d, SegmentCache *s)
+{
+d->pad = 0;
+d->selector = s->selector;
+d->limit = s->limit;
+d->flags = s->flags;
+d->base = s->base;
+}
+
+static void qemu_get_cpustate(QEMUCPUState *s, CPUState *env)
+{
+memset(s, 0, sizeof(QEMUCPUState));
+
+s->version = 1;
+s->size = sizeof(QEMUCPUState);
+
+s->rax = env->regs[R_EAX];
+s->rbx = env->regs[R_EBX];
+s->rcx = env->regs[R_ECX];
+s->rdx = env->regs[R_EDX];
+s->rsi = env->regs[R_ESI];
+s->rdi = env->regs[R_EDI];
+s->rsp = env->regs[R_ESP];
+s->rbp = env->regs[R_EBP];
+#ifdef TARGET_X86_64
+s->r8  = env->regs[8];
+s->r9  = env->regs[9];
+s->r10 = env->regs[10];
+s->r11 = env->regs[11];
+s->r12 = env->regs[12];
+s->r13 = env->regs[13];
+s->r14 = env->regs[14];
+s->r15 = env->regs[15];
+#endif
+s->rip = env->eip;
+s->rflags = env->eflags;
+
+copy_segment(&s->cs, &env->segs[R_CS]);
+copy_segment(&s->ds, &env->segs[R_DS]);
+copy_segment(&s->es, &env->segs[R_ES]);
+copy_segment(&s->fs, &env->segs[R_FS]);
+copy_segment(&s->gs, &env->segs[R_GS]);
+copy_segment(&s->ss, &env->segs[R_SS]);
+copy_segment(&s->ldt, &env->ldt);
+copy_segment(&s->tr, &env->tr);
+copy_segment(&s->gdt, &env->gdt);
+copy_segment(&s->idt, &env->idt);
+
+s->cr[0] = env->cr[0];
+s->cr[1] = env->cr[1];
+s->cr[2] = env->cr[2];
+s->cr[3] = env->cr[3];
+s->cr[4] = env->cr[4];
+}
+
+int cpu_write_elf64_qemunote(write_core_dump_function f, CPUState *env,
+ target_phys_addr_t *offset, void *opaque)
+{
+QEMUCPUState state;
+Elf64_Nhdr *note;
+char *buf;
+int descsz, note_size, name_size = 5;
+const char *name = "QEMU";
+int ret;
+
+qemu_get_cpustate(&state, env);
+
+descsz = sizeof(state);
+note_size = ((sizeof(Elf32_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+(descsz + 3) / 4) * 4;
+note = g_malloc(note_size);
+
+memset(note, 0, note_size);
+note->n_namesz = cpu_to_le32(name_size);
+note->n_descsz = cpu_to_le32(descsz);
+note->n_type = 0;
+buf = (char *)note;
+buf += ((sizeof(Elf32_Nhdr) + 3) / 4) * 4;
+mem

[Qemu-devel] [RFC][PATCH 05/14 v7] target-i386: Add API to write elf notes to core file

2012-02-29 Thread Wen Congyang

The core file contains register values. These APIs write the registers to the
core file, and they will be called in a following patch.

Signed-off-by: Wen Congyang 
---
 Makefile.target |1 +
 configure   |4 +
 cpu-all.h   |   23 +
 target-i386/arch_dump.c |  249 +++
 4 files changed, 277 insertions(+), 0 deletions(-)
 create mode 100644 target-i386/arch_dump.c

diff --git a/Makefile.target b/Makefile.target
index a87e678..cfd3113 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -210,6 +210,7 @@ obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-$(CONFIG_VGA) += vga.o
 obj-y += memory.o savevm.o
 obj-y += memory_mapping.o
+obj-$(CONFIG_HAVE_CORE_DUMP) += arch_dump.o
 LIBS+=-lz
 
 obj-i386-$(CONFIG_KVM) += hyperv.o
diff --git a/configure b/configure
index ddc54f5..d2f24d3 100755
--- a/configure
+++ b/configure
@@ -3649,6 +3649,10 @@ if test "$target_softmmu" = "yes" ; then
   if test "$smartcard_nss" = "yes" ; then
 echo "subdir-$target: subdir-libcacard" >> $config_host_mak
   fi
+  case "$target_arch2" in
+i386|x86_64)
+  echo "CONFIG_HAVE_CORE_DUMP=y" >> $config_target_mak
+  esac
 fi
 if test "$target_user_only" = "yes" ; then
   echo "CONFIG_USER_ONLY=y" >> $config_target_mak
diff --git a/cpu-all.h b/cpu-all.h
index cb72680..f7c6321 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -533,4 +533,27 @@ static inline int cpu_get_memory_mapping(MemoryMappingList 
*list, CPUState *env)
 }
 #endif
 
+typedef int (*write_core_dump_function)
+(target_phys_addr_t offset, void *buf, size_t size, void *opaque);
+#if defined(CONFIG_HAVE_CORE_DUMP)
+int cpu_write_elf64_note(write_core_dump_function f, CPUState *env, int cpuid,
+ target_phys_addr_t *offset, void *opaque);
+int cpu_write_elf32_note(write_core_dump_function f, CPUState *env, int cpuid,
+ target_phys_addr_t *offset, void *opaque);
+#else
+static inline int cpu_write_elf64_note(write_core_dump_function f,
+   CPUState *env, int cpuid,
+   target_phys_addr_t *offset, void 
*opaque)
+{
+return -1;
+}
+
+static inline int cpu_write_elf32_note(write_core_dump_function f,
+   CPUState *env, int cpuid,
+   target_phys_addr_t *offset, void 
*opaque)
+{
+return -1;
+}
+#endif
+
 #endif /* CPU_ALL_H */
diff --git a/target-i386/arch_dump.c b/target-i386/arch_dump.c
new file mode 100644
index 000..3239c40
--- /dev/null
+++ b/target-i386/arch_dump.c
@@ -0,0 +1,249 @@
+/*
+ * i386 memory mapping
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "cpu.h"
+#include "cpu-all.h"
+#include "elf.h"
+
+#ifdef TARGET_X86_64
+typedef struct {
+target_ulong r15, r14, r13, r12, rbp, rbx, r11, r10;
+target_ulong r9, r8, rax, rcx, rdx, rsi, rdi, orig_rax;
+target_ulong rip, cs, eflags;
+target_ulong rsp, ss;
+target_ulong fs_base, gs_base;
+target_ulong ds, es, fs, gs;
+} x86_64_user_regs_struct;
+
+static int x86_64_write_elf64_note(write_core_dump_function f, CPUState *env,
+   int id, target_phys_addr_t *offset,
+   void *opaque)
+{
+x86_64_user_regs_struct regs;
+Elf64_Nhdr *note;
+char *buf;
+int descsz, note_size, name_size = 5;
+const char *name = "CORE";
+int ret;
+
+regs.r15 = env->regs[15];
+regs.r14 = env->regs[14];
+regs.r13 = env->regs[13];
+regs.r12 = env->regs[12];
+regs.r11 = env->regs[11];
+regs.r10 = env->regs[10];
+regs.r9  = env->regs[9];
+regs.r8  = env->regs[8];
+regs.rbp = env->regs[R_EBP];
+regs.rsp = env->regs[R_ESP];
+regs.rdi = env->regs[R_EDI];
+regs.rsi = env->regs[R_ESI];
+regs.rdx = env->regs[R_EDX];
+regs.rcx = env->regs[R_ECX];
+regs.rbx = env->regs[R_EBX];
+regs.rax = env->regs[R_EAX];
+regs.rip = env->eip;
+regs.eflags = env->eflags;
+
+regs.orig_rax = 0; /* FIXME */
+regs.cs = env->segs[R_CS].selector;
+regs.ss = env->segs[R_SS].selector;
+regs.fs_base = env->segs[R_FS].base;
+regs.gs_base = env->segs[R_GS].base;
+regs.ds = env->segs[R_DS].selector;
+regs.es = env->segs[R_ES].selector;
+regs.fs = env->segs[R_FS].selector;
+regs.gs = env->segs[R_GS].selector;
+
+descsz = 336; /* sizeof(prstatus_t) is 336 on x86_64 box */
+note_size = ((sizeof(Elf64_Nhdr) + 3) / 4 + (name_size + 3) / 4 +
+(descsz + 3) / 4) * 4;
+note = g_malloc(note_size);
+
+memset(note, 0, note_size);
+note->n_namesz = cpu_to_le32(name_size);
+note->n_descsz = cpu_to_le32(descsz);
+note->n_type = cpu_to_le32(NT_PRSTATUS);
+buf = (char *)note;
+buf += ((sizeo

[Qemu-devel] [RFC][PATCH 04/14 v7] Add API to get memory mapping

2012-02-29 Thread Wen Congyang

Add an API to get all virtual-to-physical address mappings.
If a physical address has no virtual address, its virtual address is
recorded as 0.

Signed-off-by: Wen Congyang 
---
 memory_mapping.c |   71 ++
 memory_mapping.h |8 ++
 2 files changed, 79 insertions(+), 0 deletions(-)
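
Editorial sketch of the gap-filling step in qemu_get_guest_memory_mapping():
given the mappings already collected (sorted by physical address) and one RAM
block, find the parts of the block that no mapping covers. This is a
simplified illustration, not the patch's code: Range and find_unmapped() are
hypothetical, the mappings are assumed non-overlapping, and the gaps are
written to a separate array instead of being inserted into the list with
virtual address 0.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical half-open physical range [start, start + length). */
typedef struct Range {
    uint64_t start;
    uint64_t length;
} Range;

/*
 * Store in gaps[] the parts of block that are not covered by mapped[] and
 * return how many were stored. mapped[] must be sorted by start and must
 * not overlap; gaps[] needs room for n_mapped + 1 entries.
 */
static size_t find_unmapped(const Range *mapped, size_t n_mapped,
                            Range block, Range *gaps)
{
    uint64_t pos = block.start;                 /* first byte not handled */
    uint64_t end = block.start + block.length;
    size_t n_gaps = 0;
    size_t i;

    for (i = 0; i < n_mapped && pos < end; i++) {
        uint64_t m_start = mapped[i].start;
        uint64_t m_end = m_start + mapped[i].length;

        if (m_end <= pos) {
            continue;           /* mapping lies entirely before pos */
        }
        if (m_start >= end) {
            break;              /* mapping lies entirely after the block */
        }
        if (m_start > pos) {
            /* [pos, m_start) is RAM without any mapping */
            gaps[n_gaps].start = pos;
            gaps[n_gaps].length = m_start - pos;
            n_gaps++;
        }
        pos = m_end;            /* skip the part this mapping covers */
    }

    if (pos < end) {
        /* tail of the block after the last overlapping mapping */
        gaps[n_gaps].start = pos;
        gaps[n_gaps].length = end - pos;
        n_gaps++;
    }
    return n_gaps;
}
```

In the patch each such gap becomes a new list entry whose virtual address is
0, which is how unmapped guest RAM still reaches the vmcore.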

diff --git a/memory_mapping.c b/memory_mapping.c
index 84fb2c8..3743805 100644
--- a/memory_mapping.c
+++ b/memory_mapping.c
@@ -132,3 +132,74 @@ void memory_mapping_list_init(MemoryMappingList *list)
 list->last_mapping = NULL;
 QTAILQ_INIT(&list->head);
 }
+
+int qemu_get_guest_memory_mapping(MemoryMappingList *list)
+{
+CPUState *env;
+MemoryMapping *memory_mapping;
+RAMBlock *block;
+ram_addr_t offset, length;
+int ret;
+
+#if defined(CONFIG_HAVE_GET_MEMORY_MAPPING)
+for (env = first_cpu; env != NULL; env = env->next_cpu) {
+ret = cpu_get_memory_mapping(list, env);
+if (ret < 0) {
+return -1;
+}
+}
+#else
+return -2;
+#endif
+
+/* some memory may not be mapped; add it to the memory mapping list */
+QLIST_FOREACH(block, &ram_list.blocks, next) {
+offset = block->offset;
+length = block->length;
+
+QTAILQ_FOREACH(memory_mapping, &list->head, next) {
+if (memory_mapping->phys_addr >= (offset + length)) {
+/*
+ * memory_mapping's list does not contain the region
+ * [offset, offset+length)
+ */
+create_new_memory_mapping(list, offset, 0, length);
+length = 0;
+break;
+}
+
+if ((memory_mapping->phys_addr + memory_mapping->length) <=
+offset) {
+continue;
+}
+
+if (memory_mapping->phys_addr > offset) {
+/*
+ * memory_mapping's list does not contain the region
+ * [offset, memory_mapping->phys_addr)
+ */
+create_new_memory_mapping(list, offset, 0,
+  memory_mapping->phys_addr - offset);
+}
+
+if ((offset + length) <=
+(memory_mapping->phys_addr + memory_mapping->length)) {
+length = 0;
+break;
+}
+length -= memory_mapping->phys_addr + memory_mapping->length -
+  offset;
+offset = memory_mapping->phys_addr + memory_mapping->length;
+}
+
+if (length > 0) {
+/*
+ * memory_mapping's list does not contain the region
+ * [offset, offset+length)
+ */
+create_new_memory_mapping(list, offset, 0, length);
+}
+}
+
+return 0;
+}
diff --git a/memory_mapping.h b/memory_mapping.h
index 633fcb9..5760d1d 100644
--- a/memory_mapping.h
+++ b/memory_mapping.h
@@ -41,4 +41,12 @@ void memory_mapping_list_add_sorted(MemoryMappingList *list,
 void memory_mapping_list_free(MemoryMappingList *list);
 void memory_mapping_list_init(MemoryMappingList *list);
 
+/*
+ * Return value:
+ *0: success
+ *   -1: failed
+ *   -2: unsupported
+ */
+int qemu_get_guest_memory_mapping(MemoryMappingList *list);
+
 #endif
-- 
1.7.1

[Qemu-devel] [RFC][PATCH 03/14 v7] target-i386: implement cpu_get_memory_mapping()

2012-02-29 Thread Wen Congyang

Walk the CPU's page table and collect all virtual-to-physical address
mappings. Then add these mappings to the memory mapping list.

Signed-off-by: Wen Congyang 
---
 Makefile.target   |1 +
 configure |4 +
 cpu-all.h |   10 ++
 target-i386/arch_memory_mapping.c |  256 +
 4 files changed, 271 insertions(+), 0 deletions(-)
 create mode 100644 target-i386/arch_memory_mapping.c
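
Editorial sketch of the bit arithmetic behind the PAE/IA-32e walkers in this
patch (walk_pte() and walk_pde()): how a table index turns into part of the
virtual address, and how the physical page address is taken from a 64-bit
entry. The helper names and SKETCH_PG_PRESENT are hypothetical; the shifts
and masks follow the code in the patch, and bit 0 is assumed to be what
PG_PRESENT_MASK tests.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for PG_PRESENT_MASK (bit 0 of an entry). */
#define SKETCH_PG_PRESENT 0x1ULL

/* True if the entry maps something. */
static bool pte_present(uint64_t pte)
{
    return (pte & SKETCH_PG_PRESENT) != 0;
}

/*
 * Physical address of the 4 KiB page: drop the 12 low flag bits and
 * bit 63 (the no-execute bit in PAE/IA-32e entries).
 */
static uint64_t pte_phys_addr(uint64_t pte)
{
    return (pte & ~0xfffULL) & ~(1ULL << 63);
}

/* Each page-directory slot covers 2 MiB, so the index lands at bit 21. */
static uint64_t pde_line_addr(uint64_t line_addr, unsigned pde_index)
{
    return line_addr | ((uint64_t)(pde_index & 0x1ff) << 21);
}

/* Each page-table slot covers 4 KiB, so the index lands at bit 12. */
static uint64_t pte_virt_addr(uint64_t line_addr, unsigned pte_index)
{
    return line_addr | ((uint64_t)(pte_index & 0x1ff) << 12);
}
```

For example, slot 3 of a page directory followed by slot 5 of its page table
gives the virtual address 0x605000 when the walk starts at linear address 0.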

diff --git a/Makefile.target b/Makefile.target
index 9227e4e..a87e678 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -84,6 +84,7 @@ libobj-y += op_helper.o helper.o
 ifeq ($(TARGET_BASE_ARCH), i386)
 libobj-y += cpuid.o
 endif
+libobj-$(CONFIG_HAVE_GET_MEMORY_MAPPING) += arch_memory_mapping.o
 libobj-$(TARGET_SPARC64) += vis_helper.o
 libobj-$(CONFIG_NEED_MMU) += mmu.o
 libobj-$(TARGET_ARM) += neon_helper.o iwmmxt_helper.o
diff --git a/configure b/configure
index f9d5330..ddc54f5 100755
--- a/configure
+++ b/configure
@@ -3630,6 +3630,10 @@ case "$target_arch2" in
   fi
 fi
 esac
+case "$target_arch2" in
+  i386|x86_64)
+echo "CONFIG_HAVE_GET_MEMORY_MAPPING=y" >> $config_target_mak
+esac
 if test "$target_arch2" = "ppc64" -a "$fdt" = "yes"; then
   echo "CONFIG_PSERIES=y" >> $config_target_mak
 fi
diff --git a/cpu-all.h b/cpu-all.h
index e2c3c49..cb72680 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -22,6 +22,7 @@
 #include "qemu-common.h"
 #include "qemu-tls.h"
 #include "cpu-common.h"
+#include "memory_mapping.h"
 
 /* some important defines:
  *
@@ -523,4 +524,13 @@ void dump_exec_info(FILE *f, fprintf_function cpu_fprintf);
 int cpu_memory_rw_debug(CPUState *env, target_ulong addr,
 uint8_t *buf, int len, int is_write);
 
+#if defined(CONFIG_HAVE_GET_MEMORY_MAPPING)
+int cpu_get_memory_mapping(MemoryMappingList *list, CPUState *env);
+#else
+static inline int cpu_get_memory_mapping(MemoryMappingList *list, CPUState 
*env)
+{
+return -1;
+}
+#endif
+
 #endif /* CPU_ALL_H */
diff --git a/target-i386/arch_memory_mapping.c 
b/target-i386/arch_memory_mapping.c
new file mode 100644
index 000..8dcc010
--- /dev/null
+++ b/target-i386/arch_memory_mapping.c
@@ -0,0 +1,256 @@
+/*
+ * i386 memory mapping
+ *
+ * Copyright Fujitsu, Corp. 2011
+ *
+ * Authors:
+ * Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2.  See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "cpu.h"
+#include "cpu-all.h"
+
+/* PAE Paging or IA-32e Paging */
+static void walk_pte(MemoryMappingList *list, target_phys_addr_t 
pte_start_addr,
+ int32_t a20_mask, target_ulong start_line_addr)
+{
+target_phys_addr_t pte_addr, start_paddr;
+uint64_t pte;
+target_ulong start_vaddr;
+int i;
+
+for (i = 0; i < 512; i++) {
+pte_addr = (pte_start_addr + i * 8) & a20_mask;
+pte = ldq_phys(pte_addr);
+if (!(pte & PG_PRESENT_MASK)) {
+/* not present */
+continue;
+}
+
+start_paddr = (pte & ~0xfff) & ~(0x1ULL << 63);
+if (cpu_physical_memory_is_io(start_paddr)) {
+/* I/O region */
+continue;
+}
+
+start_vaddr = start_line_addr | ((i & 0x1fff) << 12);
+memory_mapping_list_add_sorted(list, start_paddr, start_vaddr, 1 << 
12);
+}
+}
+
+/* 32-bit Paging */
+static void walk_pte2(MemoryMappingList *list,
+  target_phys_addr_t pte_start_addr, int32_t a20_mask,
+  target_ulong start_line_addr)
+{
+target_phys_addr_t pte_addr, start_paddr;
+uint32_t pte;
+target_ulong start_vaddr;
+int i;
+
+for (i = 0; i < 1024; i++) {
+pte_addr = (pte_start_addr + i * 4) & a20_mask;
+pte = ldl_phys(pte_addr);
+if (!(pte & PG_PRESENT_MASK)) {
+/* not present */
+continue;
+}
+
+start_paddr = pte & ~0xfff;
+if (cpu_physical_memory_is_io(start_paddr)) {
+/* I/O region */
+continue;
+}
+
+start_vaddr = start_line_addr | ((i & 0x3ff) << 12);
+memory_mapping_list_add_sorted(list, start_paddr, start_vaddr, 1 << 
12);
+}
+}
+
+/* PAE Paging or IA-32e Paging */
+static void walk_pde(MemoryMappingList *list, target_phys_addr_t 
pde_start_addr,
+ int32_t a20_mask, target_ulong start_line_addr)
+{
+target_phys_addr_t pde_addr, pte_start_addr, start_paddr;
+uint64_t pde;
+target_ulong line_addr, start_vaddr;
+int i;
+
+for (i = 0; i < 512; i++) {
+pde_addr = (pde_start_addr + i * 8) & a20_mask;
+pde = ldq_phys(pde_addr);
+if (!(pde & PG_PRESENT_MASK)) {
+/* not present */
+continue;
+}
+
+line_addr = start_line_addr | ((i & 0x1ff) << 21);
+if (pde & PG_PSE_MASK) {
+/* 2 MB page */
+start_paddr = (pde & ~0x1f) & ~(0x1ULL << 63);

[Qemu-devel] [RFC][PATCH 02/14 v7] Add API to check whether a physical address is I/O address

2012-02-29 Thread Wen Congyang

This API will be used in the following patch.

Signed-off-by: Wen Congyang 
---
 cpu-common.h |2 ++
 exec.c   |   11 +++
 2 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/cpu-common.h b/cpu-common.h
index a40c57d..fde3e5d 100644
--- a/cpu-common.h
+++ b/cpu-common.h
@@ -71,6 +71,8 @@ void cpu_physical_memory_unmap(void *buffer, 
target_phys_addr_t len,
 void *cpu_register_map_client(void *opaque, void (*callback)(void *opaque));
 void cpu_unregister_map_client(void *cookie);
 
+bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr);
+
 /* Coalesced MMIO regions are areas where write operations can be reordered.
  * This usually implies that write operations are side-effect free.  This 
allows
  * batching which can make a major impact on performance when using
diff --git a/exec.c b/exec.c
index b81677a..2114dd5 100644
--- a/exec.c
+++ b/exec.c
@@ -4435,3 +4435,14 @@ bool virtio_is_big_endian(void)
 #undef env
 
 #endif
+
+bool cpu_physical_memory_is_io(target_phys_addr_t phys_addr)
+{
+ram_addr_t pd;
+PhysPageDesc p;
+
+p = phys_page_find(phys_addr >> TARGET_PAGE_BITS);
+pd = p.phys_offset;
+
+return !is_ram_rom_romd(pd);
+}
-- 
1.7.1

[Qemu-devel] [RFC][PATCH 01/14 v7] Add API to create memory mapping list

2012-02-29 Thread Wen Congyang

The memory mapping list stores virtual-to-physical address mappings.
A following patch will use this information to create PT_LOAD in the vmcore.

Signed-off-by: Wen Congyang 

---
 Makefile.target  |1 +
 memory_mapping.c |  134 ++
 memory_mapping.h |   44 ++
 3 files changed, 179 insertions(+), 0 deletions(-)
 create mode 100644 memory_mapping.c
 create mode 100644 memory_mapping.h
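
Editorial sketch of the fast path in memory_mapping_list_add_sorted(): a new
region is folded into an existing mapping when it continues that mapping in
both the physical and the virtual address space. Mapping and try_extend() are
hypothetical names for illustration; the real code works on MemoryMapping
entries in a QTAILQ and has further merge cases not shown here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for MemoryMapping without the list linkage. */
typedef struct Mapping {
    uint64_t phys_addr;
    uint64_t virt_addr;
    uint64_t length;
} Mapping;

/*
 * Extend m by length bytes if the new region starts exactly where m ends,
 * both physically and virtually. Returns true if m was extended.
 */
static bool try_extend(Mapping *m, uint64_t phys_addr, uint64_t virt_addr,
                       uint64_t length)
{
    if (phys_addr == m->phys_addr + m->length &&
        virt_addr == m->virt_addr + m->length) {
        m->length += length;
        return true;
    }
    return false;
}
```

Since the page table walk reports one page at a time, this check is what
keeps a large contiguous region from turning into thousands of list entries.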

diff --git a/Makefile.target b/Makefile.target
index 68a5641..9227e4e 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -208,6 +208,7 @@ obj-$(CONFIG_KVM) += kvm.o kvm-all.o
 obj-$(CONFIG_NO_KVM) += kvm-stub.o
 obj-$(CONFIG_VGA) += vga.o
 obj-y += memory.o savevm.o
+obj-y += memory_mapping.o
 LIBS+=-lz
 
 obj-i386-$(CONFIG_KVM) += hyperv.o
diff --git a/memory_mapping.c b/memory_mapping.c
new file mode 100644
index 000..84fb2c8
--- /dev/null
+++ b/memory_mapping.c
@@ -0,0 +1,134 @@
+/*
+ * QEMU memory mapping
+ *
+ * Copyright Fujitsu, Corp. 2011, 2012
+ *
+ * Authors:
+ * Wen Congyang 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ *
+ */
+
+#include "cpu.h"
+#include "cpu-all.h"
+#include "memory_mapping.h"
+
+static void memory_mapping_list_add_mapping_sorted(MemoryMappingList *list,
+   MemoryMapping *mapping)
+{
+MemoryMapping *p;
+
+QTAILQ_FOREACH(p, &list->head, next) {
+if (p->phys_addr >= mapping->phys_addr) {
+QTAILQ_INSERT_BEFORE(p, mapping, next);
+return;
+}
+}
+QTAILQ_INSERT_TAIL(&list->head, mapping, next);
+}
+
+static void create_new_memory_mapping(MemoryMappingList *list,
+  target_phys_addr_t phys_addr,
+  target_phys_addr_t virt_addr,
+  ram_addr_t length)
+{
+MemoryMapping *memory_mapping;
+
+memory_mapping = g_malloc(sizeof(MemoryMapping));
+memory_mapping->phys_addr = phys_addr;
+memory_mapping->virt_addr = virt_addr;
+memory_mapping->length = length;
+list->last_mapping = memory_mapping;
+list->num++;
+memory_mapping_list_add_mapping_sorted(list, memory_mapping);
+}
+
+void memory_mapping_list_add_sorted(MemoryMappingList *list,
+target_phys_addr_t phys_addr,
+target_phys_addr_t virt_addr,
+ram_addr_t length)
+{
+MemoryMapping *memory_mapping, *last_mapping;
+
+if (QTAILQ_EMPTY(&list->head)) {
+create_new_memory_mapping(list, phys_addr, virt_addr, length);
+return;
+}
+
+last_mapping = list->last_mapping;
+if (last_mapping) {
+if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
+(virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
+last_mapping->length += length;
+return;
+}
+}
+
+QTAILQ_FOREACH(memory_mapping, &list->head, next) {
+last_mapping = memory_mapping;
+if ((phys_addr == (last_mapping->phys_addr + last_mapping->length)) &&
+(virt_addr == (last_mapping->virt_addr + last_mapping->length))) {
+last_mapping->length += length;
+list->last_mapping = last_mapping;
+return;
+}
+
+if (phys_addr + length < last_mapping->phys_addr) {
+/* create a new region before last_mapping */
+break;
+}
+
+if (phys_addr >= (last_mapping->phys_addr + last_mapping->length)) {
+/* last_mapping does not contain this region */
+continue;
+}
+
+if ((virt_addr - last_mapping->virt_addr) !=
+(phys_addr - last_mapping->phys_addr)) {
+/*
+ * last_mapping contains this region, but we cannot merge this
+ * region into last_mapping. Try the next memory mapping.
+ */
+continue;
+}
+
+/* merge this region into last_mapping */
+if (virt_addr < last_mapping->virt_addr) {
+last_mapping->length += last_mapping->virt_addr - virt_addr;
+last_mapping->virt_addr = virt_addr;
+}
+
+if ((virt_addr + length) >
+(last_mapping->virt_addr + last_mapping->length)) {
+last_mapping->length = virt_addr + length - 
last_mapping->virt_addr;
+}
+
+list->last_mapping = last_mapping;
+return;
+}
+
+/* this region cannot be merged into any existing memory mapping. */
+create_new_memory_mapping(list, phys_addr, virt_addr, length);
+}
+
+void memory_mapping_list_free(MemoryMappingList *list)
+{
+MemoryMapping *p, *q;
+
+QTAILQ_FOREACH_SAFE(p, &list->head, next, q) {
+QTAILQ_REMOVE(&list->head, p, next);
+g_free(p);
+}
+
+list->num = 0

[Qemu-devel] [RFC][PATCH 00/14 v7] introducing a new, dedicated memory dump mechanism

2012-02-29 Thread Wen Congyang

Hi, all

'virsh dump' does not work when a host PCI device is used by the guest. We have
discussed this issue here:
http://lists.nongnu.org/archive/html/qemu-devel/2011-10/msg00736.html

The previous version is here:
http://lists.nongnu.org/archive/html/qemu-devel/2012-02/msg01007.html

We have decided to introduce a new command, 'dump', to dump guest memory. The
core file's format can be ELF.

Note:
1. The guest must be x86 or x86_64. Other architectures are not supported yet.
2. An old gdb may crash on the vmcore. I use gdb-7.3.1, and it does not crash.
3. If the OS is in the second kernel, gdb may not work well, and crash can
   work by specifying '--machdep phys_addr=xxx' on the command line. The
   reason is that the second kernel updates the page table, so we cannot
   get the page table of the first kernel.
4. The CPU's state is stored in a QEMU note. You need to modify crash to use
   it to calculate phys_base.
5. If the guest OS is 32-bit and the memory size is larger than 4G, the vmcore
   is in elf64 format. You should use a gdb that is built with
   --enable-64-bit-bfd.
6. This patchset is based on the upstream tree plus one patch that is still
   in Luiz Capitulino's tree, because I use the API qemu_get_fd() in this
   patchset.

Changes from v6 to v7:
1. addressed Jan's comments
2. fix some bugs
3. store cpu's state into the vmcore

Changes from v5 to v6:
1. allow user to dump a fraction of the memory
2. fix some bugs

Changes from v4 to v5:
1. convert the new command dump to QAPI 

Changes from v3 to v4:
1. support it to run asynchronously
2. add API to cancel dumping and query dumping progress
3. add API to control dumping speed
4. automatically cancel dumping when the user resumes the VM; the status is
   then 'failed'.

Changes from v2 to v3:
1. address Jan Kiszka's comment

Changes from v1 to v2:
1. fix virt addr in the vmcore.

Wen Congyang (14):
  Add API to create memory mapping list
  Add API to check whether a physical address is I/O address
  target-i386: implement cpu_get_memory_mapping()
  Add API to get memory mapping
  target-i386: Add API to write elf notes to core file
  target-i386: Add API to write cpu status to core file
  target-i386: add API to get dump info
  make gdb_id() generally avialable
  introduce a new monitor command 'dump' to dump guest's memory
  support to cancel the current dumping
  support to query dumping status
  run dump at the background
  support detached dump
  allow user to dump a fraction of the memory

 Makefile.target   |3 +
 configure |8 +
 cpu-all.h |   60 +++
 cpu-common.h  |2 +
 dump.c|  904 +
 dump.h|   23 +
 exec.c|   11 +
 gdbstub.c |9 -
 gdbstub.h |9 +
 hmp-commands.hx   |   43 ++
 hmp.c |   87 
 hmp.h |3 +
 memory_mapping.c  |  232 ++
 memory_mapping.h  |   54 +++
 monitor.c |7 +
 qapi-schema.json  |   57 +++
 qmp-commands.hx   |  109 +
 target-i386/arch_dump.c   |  437 ++
 target-i386/arch_memory_mapping.c |  256 +++
 vl.c  |5 +-
 20 files changed, 2308 insertions(+), 11 deletions(-)
 create mode 100644 dump.c
 create mode 100644 dump.h
 create mode 100644 memory_mapping.c
 create mode 100644 memory_mapping.h
 create mode 100644 target-i386/arch_dump.c
 create mode 100644 target-i386/arch_memory_mapping.c

Re: [Qemu-devel] [PATCH] pc: make user-triggerable exit conditional to DEBUG_BIOS define

2012-02-29 Thread Anthony Liguori


On 02/29/2012 04:44 PM, Hervé Poussineau wrote:

The port 0x501 is (at least) used by SCO Xenix 2.3.4 installer.


For what?  What device would normally be there?

I don't want to disable this by default.  My regression suite depends on this as 
an exit mechanism.


Regards,

Anthony Liguori



Signed-off-by: Hervé Poussineau
---
  hw/pc.c |3 +++
  1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 12c02f2..113a38a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -565,7 +565,10 @@ static void bochs_bios_write(void *opaque, uint32_t addr, 
uint32_t val)
  /* LGPL'ed VGA BIOS messages */
  case 0x501:
  case 0x502:
+#ifdef DEBUG_BIOS
  exit((val<<  1) | 1);
+#endif
+break;
  case 0x500:
  case 0x503:
  #ifdef DEBUG_BIOS

Re: [Qemu-devel] buildbot failure in qemu on default_mingw32

2012-02-29 Thread Roy Tam

2012/3/1  :
> The Buildbot has detected a new failure on builder default_mingw32 while 
> building qemu.
> Full details are available at:
>  http://buildbot.b1-systems.de/qemu/builders/default_mingw32/builds/193
>
> Buildbot URL: http://buildbot.b1-systems.de/qemu/
>
> Buildslave for this Build: kraxel_rhel61
>
> Build Reason: The Nightly scheduler named 'nightly_default' triggered this 
> build
> Build Source Stamp: [branch master] HEAD
> Blamelist:
>
> BUILD FAILED: failed compile
>
> sincerely,
>  -The Buildbot
>

Commit 3741715cf2e54727fe3d9884ea6dcea68c7f7d4b causes this.
CC: Jan Kiszka

Re: [Qemu-devel] Add support for new image type

2012-02-29 Thread Brian Jackson

On Wed, 29 Feb 2012 15:52:49 -0600, Kai Meyer wrote:


Is it possible to extend qemu to support a new image type? I have an  
image type that is ready for consumption and I'm looking for the  
integration point between qemu and the new image format.



The last new image format that was added to Qemu was probably qed.
You could look through the list archives to see the patches that added
that support. Alternatively look through the gitweb interface for the
commit. Part of it is at:

http://git.qemu.org/?p=qemu.git;a=commit;h=75411d236d93d79d8052e0116c3eeebe23e2778b





-Kai Meyer





[Qemu-devel] buildbot failure in qemu on default_mingw32

2012-02-29 Thread qemu

The Buildbot has detected a new failure on builder default_mingw32 while 
building qemu.
Full details are available at:
 http://buildbot.b1-systems.de/qemu/builders/default_mingw32/builds/193

Buildbot URL: http://buildbot.b1-systems.de/qemu/

Buildslave for this Build: kraxel_rhel61

Build Reason: The Nightly scheduler named 'nightly_default' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

[Qemu-devel] buildbot failure in qemu on default_openbsd_current

2012-02-29 Thread qemu

The Buildbot has detected a new failure on builder default_openbsd_current 
while building qemu.
Full details are available at:
 http://buildbot.b1-systems.de/qemu/builders/default_openbsd_current/builds/190

Buildbot URL: http://buildbot.b1-systems.de/qemu/

Buildslave for this Build: brad_openbsd_current

Build Reason: The Nightly scheduler named 'nightly_default' triggered this build
Build Source Stamp: [branch master] HEAD
Blamelist: 

BUILD FAILED: failed compile

sincerely,
 -The Buildbot

[Qemu-devel] [PATCH] pc: make user-triggerable exit conditional to DEBUG_BIOS define

2012-02-29 Thread Hervé Poussineau

The port 0x501 is (at least) used by SCO Xenix 2.3.4 installer.

Signed-off-by: Hervé Poussineau 
---
 hw/pc.c |3 +++
 1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/hw/pc.c b/hw/pc.c
index 12c02f2..113a38a 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -565,7 +565,10 @@ static void bochs_bios_write(void *opaque, uint32_t addr, 
uint32_t val)
 /* LGPL'ed VGA BIOS messages */
 case 0x501:
 case 0x502:
+#ifdef DEBUG_BIOS
 exit((val << 1) | 1);
+#endif
+break;
 case 0x500:
 case 0x503:
 #ifdef DEBUG_BIOS
-- 
1.7.9

[Qemu-devel] Add support for new image type

2012-02-29 Thread Kai Meyer

Is it possible to extend qemu to support a new image type? I have an 
image type that is ready for consumption and I'm looking for the 
integration point between qemu and the new image format.


-Kai Meyer

Re: [Qemu-devel] [PATCH 0/4] slirp: Fix for requeuing crash, cleanups

2012-02-29 Thread Jan Kiszka

On 2012-02-29 22:48, Stefan Weil wrote:
> Am 29.02.2012 22:33, schrieb Jan Kiszka:
>> On 2012-02-29 22:00, Stefan Weil wrote:
>>> Am 29.02.2012 20:15, schrieb Jan Kiszka:
>>>> This is an alternative, more complete approach to fix the requeuing-
>>>> related crashes reported recently. See patch 2 for details. The rest are
>>>> simple cleanups.
>>>>
>>>> Please check carefully if I messed something up.
>>>>
>>>
>>> Hi Jan,
>>>
>>> here is the result of MIPS Malta with your patch series applied:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> 0x5577db5b in slirp_remque (a=0x56cff360) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
>>> 39 ((struct quehead *)(element->qh_rlink))->qh_link =
>>> element->qh_link;
>>> (gdb) i s
>>> #0 0x5577db5b in slirp_remque (a=0x56cff360) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
>>> #1 0x5577b7a2 in if_start (slirp=0x564bfb80) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:208
>>> #2 0x5577b607 in if_output (so=0x56ea0b70,
>>> ifm=0x56cff9e0) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:139
>>> #3 0x5577d040 in ip_output (so=0x56ea0b70,
>>> m0=0x56cff9e0) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/ip_output.c:84
>>> #4 0x557865d6 in tcp_output (tp=0x56ea0c20) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/tcp_output.c:456
>>> #5 0x5577ff5a in slirp_select_poll (readfds=0x7fffda10,
>>> writefds=0x7fffda90, xfds=0x7fffdb10, select_error=0)
>>> at /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/slirp.c:480
>>> #6 0x5572d8c0 in main_loop_wait (nonblocking=0) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/main-loop.c:469
>>> #7 0x55721a61 in main_loop () at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:1558
>>> #8 0x557284a2 in main (argc=25, argv=0x7fffdfe8,
>>> envp=0x7fffe0b8) at
>>> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:3667
>>> (gdb) p element
>>> $1 = (struct quehead *) 0x56cff360
>>> (gdb) p *element
>>> $2 = {qh_link = 0x56cff360, qh_rlink = 0x0}
>>> (gdb) p (struct quehead *)(element->qh_rlink)
>>> $3 = (struct quehead *) 0x0
>>
>> Hmm. Two options:
>>
>> - you try to debug what happens to that mbuf, why its queue anchors
>> get corrupted (maybe while in if_encap?)
>> - you tell me how to reproduce it (image file, host characteristics)
>>
>> Jan
> 
> I'm afraid that the first variant won't happen this or next week
> for lack of time.
> 
> This is my test environment:
> 
> Debian Squeeze x86_64 host, Debian Squeeze mips guest.
> 
> I use NFS root, and the latest crash happened during boot.
> All other crashes happened after the guest had booted
> when I started apt-get update, so maybe booting from a
> Debian CDROM might also reproduce the crash.
> 
> I compiled QEMU with a default configuration, but used
> CFLAGS=-g (no optimization) and started QEMU like this:
> 
> gdb --args
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/bin/debug/x86/mips-softmmu/qemu-system-mips
> --kernel /tftpboot/malta/boot/vmlinux-2.6.26-2-4kc-malta --initrd
> /tftpboot/malta/boot/initrd.img-2.6.26-2-4kc-malta --append "debug
> nohz=off root=/dev/nfs rw ip=malta::dhcp
> nfsroot=10.0.2.2:/tftpboot/malta -bootp abc -tftp /tftpboot/malta" -M
> malta --cpu 4KEc -m 256 --net nic,model=pcnet --net user,hostname=malta
> --redir tcp:5800::5800 --redir tcp:5900::5900 --redir tcp:10022::22
> --redir tcp:10080::80
> 
> Kernel and initrd are from Debian Squeeze (mips).

OK, thanks.

Here is a last shot (on top of my queue) before I try to reproduce:

diff --git a/slirp/if.c b/slirp/if.c
index 90bf398..d3bdf58 100644
--- a/slirp/if.c
+++ b/slirp/if.c
@@ -181,13 +181,12 @@ void if_start(Slirp *slirp)
 from_batchq = from_batchq_next;
 
 ifm_next = ifm->ifq_next;
-if (!from_batchq) {
-if (ifm_next == &slirp->if_fastq) {
-/* No more packets in fastq, switch to batchq */
-ifm_next = slirp->next_m;
-from_batchq_next = true;
-}
-} else if (ifm_next == &slirp->if_batchq) {
+if (ifm_next == &slirp->if_fastq) {
+/* No more packets in fastq, switch to batchq */
+ifm_next = slirp->next_m;
+from_batchq_next = true;
+}
+if (ifm_next == &slirp->if_batchq) {
 /* end of batchq */
 ifm_next = NULL;
 }

> 
> I had no slirp problems with that test environment during the last two
> years.

Yes, these regressions are unfortunate. I hope we can resolve them
quickly.

Jan




Re: [Qemu-devel] [PATCH 0/4] slirp: Fix for requeuing crash, cleanups

2012-02-29 Thread Stefan Weil


On 29.02.2012 22:33, Jan Kiszka wrote:

On 2012-02-29 22:00, Stefan Weil wrote:

On 29.02.2012 20:15, Jan Kiszka wrote:

This is an alternative, more complete approach to fix the requeuing-
related crashes reported recently. See patch 2 for details. The rest are
simple cleanups.

Please check carefully if I messed something up.



Hi Jan,

here is the result of MIPS Malta with your patch series applied:

Program received signal SIGSEGV, Segmentation fault.
0x5577db5b in slirp_remque (a=0x56cff360) at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
39 ((struct quehead *)(element->qh_rlink))->qh_link =
element->qh_link;
(gdb) i s
#0 0x5577db5b in slirp_remque (a=0x56cff360) at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
#1 0x5577b7a2 in if_start (slirp=0x564bfb80) at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:208
#2 0x5577b607 in if_output (so=0x56ea0b70,
ifm=0x56cff9e0) at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:139
#3 0x5577d040 in ip_output (so=0x56ea0b70,
m0=0x56cff9e0) at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/ip_output.c:84
#4 0x557865d6 in tcp_output (tp=0x56ea0c20) at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/tcp_output.c:456
#5 0x5577ff5a in slirp_select_poll (readfds=0x7fffda10,
writefds=0x7fffda90, xfds=0x7fffdb10, select_error=0)
at /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/slirp.c:480
#6 0x5572d8c0 in main_loop_wait (nonblocking=0) at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/main-loop.c:469
#7 0x55721a61 in main_loop () at
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:1558
#8 0x557284a2 in main (argc=25, argv=0x7fffdfe8,
envp=0x7fffe0b8) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:3667

(gdb) p element
$1 = (struct quehead *) 0x56cff360
(gdb) p *element
$2 = {qh_link = 0x56cff360, qh_rlink = 0x0}
(gdb) p (struct quehead *)(element->qh_rlink)
$3 = (struct quehead *) 0x0


Hmm. Two options:

- you try to debug what happens to that mbuf, why its queue anchors
get corrupted (maybe while in if_encap?)
- you tell me how to reproduce it (image file, host characteristics)

Jan


I'm afraid that the first variant won't happen this week or next
for lack of time.

This is my test environment:

Debian Squeeze x86_64 host, Debian Squeeze mips guest.

I use NFS root, and the latest crash happened during boot.
All other crashes happened after the guest had booted
when I started apt-get update, so maybe booting from a
Debian CDROM might also reproduce the crash.

I compiled QEMU with a default configuration, but used
CFLAGS=-g (no optimization) and started QEMU like this:

gdb --args 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/bin/debug/x86/mips-softmmu/qemu-system-mips 
--kernel /tftpboot/malta/boot/vmlinux-2.6.26-2-4kc-malta --initrd 
/tftpboot/malta/boot/initrd.img-2.6.26-2-4kc-malta --append "debug 
nohz=off root=/dev/nfs rw ip=malta::dhcp 
nfsroot=10.0.2.2:/tftpboot/malta -bootp abc -tftp /tftpboot/malta" -M 
malta --cpu 4KEc -m 256 --net nic,model=pcnet --net user,hostname=malta 
--redir tcp:5800::5800 --redir tcp:5900::5900 --redir tcp:10022::22 
--redir tcp:10080::80


Kernel and initrd are from Debian Squeeze (mips).

I had no slirp problems with that test environment during the last two 
years.


Regards,

Stefan W.

Re: [Qemu-devel] [PATCH 0/4] slirp: Fix for requeuing crash, cleanups

2012-02-29 Thread Jan Kiszka

On 2012-02-29 22:00, Stefan Weil wrote:
> On 29.02.2012 20:15, Jan Kiszka wrote:
>> This is an alternative, more complete approach to fix the requeuing-
>> related crashes reported recently. See patch 2 for details. The rest are
>> simple cleanups.
>>
>> Please check carefully if I messed something up.
>>
> 
> Hi Jan,
> 
> here is the result of MIPS Malta with your patch series applied:
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x5577db5b in slirp_remque (a=0x56cff360) at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
> 39 ((struct quehead *)(element->qh_rlink))->qh_link =
> element->qh_link;
> (gdb) i s
> #0  0x5577db5b in slirp_remque (a=0x56cff360) at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
> #1  0x5577b7a2 in if_start (slirp=0x564bfb80) at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:208
> #2  0x5577b607 in if_output (so=0x56ea0b70,
> ifm=0x56cff9e0) at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:139
> #3  0x5577d040 in ip_output (so=0x56ea0b70,
> m0=0x56cff9e0) at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/ip_output.c:84
> #4  0x557865d6 in tcp_output (tp=0x56ea0c20) at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/tcp_output.c:456
> #5  0x5577ff5a in slirp_select_poll (readfds=0x7fffda10,
> writefds=0x7fffda90, xfds=0x7fffdb10, select_error=0)
> at /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/slirp.c:480
> #6  0x5572d8c0 in main_loop_wait (nonblocking=0) at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/main-loop.c:469
> #7  0x55721a61 in main_loop () at
> /home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:1558
> #8  0x557284a2 in main (argc=25, argv=0x7fffdfe8,
> envp=0x7fffe0b8) at /home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:3667
> (gdb) p element
> $1 = (struct quehead *) 0x56cff360
> (gdb) p *element
> $2 = {qh_link = 0x56cff360, qh_rlink = 0x0}
> (gdb) p (struct quehead *)(element->qh_rlink)
> $3 = (struct quehead *) 0x0

Hmm. Two options:

 - you try to debug what happens to that mbuf, why its queue anchors
   get corrupted (maybe while in if_encap?)
 - you tell me how to reproduce it (image file, host characteristics)

Jan




Re: [Qemu-devel] QEMU PEX HW device

2012-02-29 Thread Natalia Portillo

Hi,

On 23/02/2012, at 15:44, Shlomo Pongratz wrote:

> Hi,
> 
> I want to add a new PEX HW device emulation to QEMU, but I can't find a 
> skeleton/template driver or documentation that explains how to do it.
Great! (What's a PEX HW device?)

> Are there any guidelines for this task?

Start from an existing emulated device, on the same bus if possible (PCI,
ISA, USB, and so on), delete all the code you don't need, and implement your
emulation. Once you start, it's pretty straightforward; the function names are
almost self-explanatory.

Regards,
Natalia Portillo

Re: [Qemu-devel] [PULL] usb patch queue

2012-02-29 Thread Anthony Liguori


On 02/28/2012 04:20 AM, Gerd Hoffmann wrote:

   Hi,

Next batch of usb updates.  This one brings packet queuing for uhci and
xhci, so we have per-endpoint queues at usb-bus level now.  Need to
bring those to the usb drivers as next step, so they (especially
usb-host) can pipeline requests.

Also a bunch of bugfixes in ehci, smartcard emulation and usb redirect.


Regards,

Anthony Liguori



cheers,
   Gerd

The following changes since commit b4bd0b168e9f4898b98308f4a8a089f647a86d16:

   audio: Add some fall through comments (2012-02-25 18:16:11 +0400)

are available in the git repository at:
   git://git.kraxel.org/qemu usb.39

Alon Levy (4):
   usb-desc: fix user trigerrable segfaults (!config)
   libcacard: link with glib for g_strndup
   usb-ccid: advertise SELF_POWERED
   libcacard: fix reported ATR length

Gerd Hoffmann (10):
   usb-hid: fix tablet activation
   usb-ehci: fix reset
   usb-uhci: cleanup UHCIAsync allocation&  initialization.
   usb-uhci: add UHCIQueue
   usb-uhci: process uhci_handle_td return code via switch.
   usb-uhci: implement packet queuing
   usb-xhci: enable packet queuing
   usb: add tracepoint for usb packet state changes.
   usb-ehci: sanity-check iso xfers
   ehci: drop old stuff

Hans de Goede (6):
   usb-ehci: Handle ISO packets failing with an error other then NAK
   usb-redir: Fix printing of device version
   usb-redir: Always clear device state on filter reject
   usb-redir: Let the usb-host know about our device filtering
   usb-redir: Limit return values returned by iso packets
   usb-redir: Return USB_RET_NAK when we've no data for an interrupt 
endpoint

Jan Kiszka (1):
   usb: Resolve warnings about unassigned bus on usb device creation

  configure  |6 +-
  hw/usb-bt.c|4 +-
  hw/usb-bus.c   |   18 +---
  hw/usb-ccid.c  |2 +-
  hw/usb-desc.c  |   20 +++-
  hw/usb-ehci.c  |   71 ++---
  hw/usb-hid.c   |3 +
  hw/usb-msd.c   |4 +-
  hw/usb-net.c   |4 +-
  hw/usb-serial.c|8 +-
  hw/usb-uhci.c  |  314 +++-
  hw/usb-xhci.c  |6 -
  hw/usb.c   |   27 +
  hw/usb.h   |7 +-
  libcacard/vcardt.h |4 +-
  trace-events   |3 +
  usb-bsd.c  |4 +-
  usb-linux.c|4 +-
  usb-redir.c|   46 ++--
  vl.c   |7 +-
  20 files changed, 317 insertions(+), 245 deletions(-)

Re: [Qemu-devel] [PULL] spice patch queue

2012-02-29 Thread Anthony Liguori


On 02/28/2012 10:29 AM, Gerd Hoffmann wrote:

   Hi,

Here comes the spice patch queue.  For the most part this brings the
async local rendering (for vnc, sdl and screenshots) work done by Alon.
Also a few bug fixes.

please pull,
   Gerd


Pulled.  Thanks.

Regards,

Anthony Liguori


The following changes since commit b4bd0b168e9f4898b98308f4a8a089f647a86d16:

   audio: Add some fall through comments (2012-02-25 18:16:11 +0400)

are available in the git repository at:
   git://anongit.freedesktop.org/spice/qemu spice.v49

Alon Levy (7):
   qxl: fix spice+sdl no cursor regression
   sdl: remove NULL check, g_malloc0 can't fail
   qxl: drop qxl_spice_update_area_async definition
   qxl: require spice>= 0.8.2
   qxl: remove flipped
   qxl: introduce QXLCookie
   qxl: make qxl_render_update async

Christophe Fergeau (2):
   spice: use error_report to report errors
   Error out when tls-channel option is used without TLS

Gerd Hoffmann (2):
   qxl: add optinal 64bit vram bar
   qxl: properly handle upright and non-shared surfaces

  configure  |2 +-
  hw/qxl-render.c|  170 ++
  hw/qxl.c   |  215 +---
  hw/qxl.h   |   31 +---
  ui/sdl.c   |4 -
  ui/spice-core.c|   47 +---
  ui/spice-display.c |   57 --
  ui/spice-display.h |   21 +
  8 files changed, 353 insertions(+), 194 deletions(-)

Re: [Qemu-devel] [PULL 00/27] Block patches

2012-02-29 Thread Anthony Liguori


On 02/29/2012 09:17 AM, Kevin Wolf wrote:

The following changes since commit b55c952aea6de024bf1a06357b49367fba045443:

   Merge remote-tracking branch 'aneesh/for-upstream' into staging (2012-02-27 
11:19:27 -0600)

are available in the git repository at:

   git://repo.or.cz/qemu/kevin.git for-anthony


Pulled.  Thanks.

Regards,

Anthony Liguori



Hervé Poussineau (10):
   fdc: take side count into account
   fdc: set busy bit when starting a command
   fdc: most control commands do not generate interrupts
   fdc: handle read-only floppies (abort early on write commands)
   fdc: add CCR (Configuration Control Register) write register
   block: add a transfer rate for floppy types
   fdc: add a 'check media rate' property. Not used yet
   fdc: check if media rate is correct before doing any transfer
   fdc: fix seek command, which shouldn't check tracks
   fdc: DIR (Digital Input Register) should return status of current 
drive...

Jeff Cody (2):
   qapi: Introduce blockdev-group-snapshot-sync command
   QMP: Add qmp command for blockdev-group-snapshot-sync

Kevin Wolf (6):
   qcow2: Fix build with DEBUG_EXT enabled
   qcow2: Fix offset in qcow2_read_extensions
   qcow2: Reject too large header extensions
   qemu-iotests: Filter out DOS line endings
   qemu-iotests: 026: Reduce output changes for cache=none qcow2
   qemu-iotests: Test rebase with short backing file

Paolo Bonzini (3):
   ide: fail I/O to empty disk
   block: remove unused fields in BlockDriverState
   block: drop aio_multiwrite in BlockDriver

Stefan Hajnoczi (4):
   qemu-iotests: export TEST_DIR for non-bash tests
   qemu-iotests: add iotests Python module
   test: add image streaming tests
   qemu-tool: revert cpu_get_clock() abort(3)

Zhi Yong Wu (2):
   qemu-io: fix segment fault when the image format is qed
   qemu-img: fix segment fault when the image format is qed

  block.c  |  178 ++
  block.h  |   11 ++-
  block/qcow2.c|   11 ++-
  block_int.h  |   17 ++---
  blockdev.c   |  131 
  hw/fdc.c |  142 --
  hw/ide/core.c|   24 -
  hw/pc.c  |3 +-
  hw/pc_piix.c |   28 ++
  qapi-schema.json |   38 
  qemu-img.c   |2 +
  qemu-io.c|2 +
  qemu-tool.c  |2 +-
  qmp-commands.hx  |   39 
  tests/qemu-iotests/026   |6 ++
  tests/qemu-iotests/028   |5 +
  tests/qemu-iotests/028.out   |1 +
  tests/qemu-iotests/030   |  151 
  tests/qemu-iotests/030.out   |5 +
  tests/qemu-iotests/check |4 +-
  tests/qemu-iotests/common.config |2 +
  tests/qemu-iotests/common.filter |8 ++-
  tests/qemu-iotests/group |1 +
  tests/qemu-iotests/iotests.py|  164 +++
  24 files changed, 868 insertions(+), 107 deletions(-)
  create mode 100755 tests/qemu-iotests/030
  create mode 100644 tests/qemu-iotests/030.out
  create mode 100644 tests/qemu-iotests/iotests.py

Re: [Qemu-devel] [PATCH 0/4] slirp: Fix for requeuing crash, cleanups

2012-02-29 Thread Stefan Weil


On 29.02.2012 20:15, Jan Kiszka wrote:

This is an alternative, more complete approach to fix the requeuing-
related crashes reported recently. See patch 2 for details. The rest are
simple cleanups.

Please check carefully if I messed something up.



Hi Jan,

here is the result of MIPS Malta with your patch series applied:

Program received signal SIGSEGV, Segmentation fault.
0x5577db5b in slirp_remque (a=0x56cff360) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
39 ((struct quehead *)(element->qh_rlink))->qh_link =
element->qh_link;

(gdb) i s
#0  0x5577db5b in slirp_remque (a=0x56cff360) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/misc.c:39
#1  0x5577b7a2 in if_start (slirp=0x564bfb80) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:208
#2  0x5577b607 in if_output (so=0x56ea0b70, 
ifm=0x56cff9e0) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/if.c:139
#3  0x5577d040 in ip_output (so=0x56ea0b70, 
m0=0x56cff9e0) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/ip_output.c:84
#4  0x557865d6 in tcp_output (tp=0x56ea0c20) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/tcp_output.c:456
#5  0x5577ff5a in slirp_select_poll (readfds=0x7fffda10, 
writefds=0x7fffda90, xfds=0x7fffdb10, select_error=0)

at /home/stefan/src/qemu/repo.or.cz/qemu/ar7/slirp/slirp.c:480
#6  0x5572d8c0 in main_loop_wait (nonblocking=0) at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/main-loop.c:469
#7  0x55721a61 in main_loop () at 
/home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:1558
#8  0x557284a2 in main (argc=25, argv=0x7fffdfe8, 
envp=0x7fffe0b8) at /home/stefan/src/qemu/repo.or.cz/qemu/ar7/vl.c:3667

(gdb) p element
$1 = (struct quehead *) 0x56cff360
(gdb) p *element
$2 = {qh_link = 0x56cff360, qh_rlink = 0x0}
(gdb) p (struct quehead *)(element->qh_rlink)
$3 = (struct quehead *) 0x0

Cheers,

Stefan
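
The gdb output above shows slirp_remque being called on an element whose
qh_rlink is NULL, so the statement at misc.c:39 dereferences a NULL pointer.
One way such a state can arise is removing the same element twice, assuming
the removal routine clears qh_rlink after unlinking (an assumption of this
sketch, not something confirmed in this thread). The following is a minimal
illustration only; q_init, q_insert and q_remove_checked are hypothetical
names and not part of slirp.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a two-pointer queue head as printed by gdb. */
struct quehead {
    struct quehead *qh_link;   /* next element */
    struct quehead *qh_rlink;  /* previous element */
};

/* Turn a node into an empty circular list: both links point at itself. */
void q_init(struct quehead *q)
{
    q->qh_link = q;
    q->qh_rlink = q;
}

/* Insert element directly after head in a circular doubly linked list. */
void q_insert(struct quehead *element, struct quehead *head)
{
    assert(head->qh_link != NULL && head->qh_rlink != NULL);
    element->qh_link = head->qh_link;
    element->qh_rlink = head;
    head->qh_link->qh_rlink = element;
    head->qh_link = element;
}

/*
 * Unlink element, but refuse to touch it when a link is NULL.
 * An unchecked version would dereference the NULL qh_rlink here,
 * which is the kind of fault reported in the backtrace.
 */
bool q_remove_checked(struct quehead *element)
{
    if (element->qh_link == NULL || element->qh_rlink == NULL) {
        return false;  /* already removed, or corrupted */
    }
    element->qh_rlink->qh_link = element->qh_link;
    element->qh_link->qh_rlink = element->qh_rlink;
    element->qh_rlink = NULL;  /* assumption: removal marks the node unlinked */
    return true;
}
```

With this sketch, a second q_remove_checked() on the same element returns
false instead of faulting, which makes a double removal easy to spot while
debugging.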

Re: [Qemu-devel] [PATCH 5/6] Add blkmirror block driver

2012-02-29 Thread Eric Blake

On 02/29/2012 06:37 AM, Paolo Bonzini wrote:
> From: Marcelo Tosatti 
> 
> Mirrored writes are used by live block copy.
> 
> The blkmirror driver is for internal use only, because it requires
> bdrv_append to set up a backing_hd for it.  It relies on a quirk
> of bdrv_append, which leaves the old image open for writes.
> 
> The source is hardcoded as the backing_hd for the destination, so that
> copy-on-write functions properly.  Since the source is not yet available
> at the time blkmirror_open is called, the backing_hd is set later.
> 

> +++ b/block/blkmirror.c
> @@ -0,0 +1,153 @@
> +/*
> + * Block driver for mirrored writes.
> + *
> + * Copyright (C) 2011 Red Hat, Inc.

It's now 2012; should this be expanded?

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] [PATCH 6/6] add mirroring to blockdev-transaction

2012-02-29 Thread Eric Blake

On 02/29/2012 06:37 AM, Paolo Bonzini wrote:
> Add a new transaction type, "mirror".  It stacks a new blkmirror
> file (instead of a snapshot) on top of the existing image.
> 
> It is possible to combine snapshot and mirror as two actions in the
> same transaction.  Because of atomicity ensured by blockdev-transaction,
> this will create a snapshot *and* ensure that _all_ operations that are
> sent to it are also mirrored.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  blockdev.c   |   47 +++
>  qapi-schema.json |   19 ++-
>  qmp-commands.hx  |   12 +++-
>  3 files changed, 64 insertions(+), 14 deletions(-)
> 

> @@ -721,6 +725,12 @@ actions array:
>- "format": format of new image (json-string, optional)
>- "reuse": whether QEMU should look for an existing image file
>  (json-bool, optional, default false)
> +  When "type" is "mirror":
> +  - "device": device name to snapshot (json-string)
> +  - "target": name of destination image file (json-string)
> +  - "format": format of new image (json-string, optional)
> +  - "reuse": whether QEMU should look for an existing image file
> +(json-bool, optional, default false)

This falls out very nicely.

Do we want to also add a 'reopen' operation to the union, for the
remaining action needed by oVirt live migration?

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




Re: [Qemu-devel] RFC: Device isolation groups

2012-02-29 Thread Alex Williamson

On Thu, 2012-02-02 at 12:24 +1100, David Gibson wrote:
> On Wed, Feb 01, 2012 at 01:08:39PM -0700, Alex Williamson wrote:
> > On Wed, 2012-02-01 at 15:46 +1100, David Gibson wrote:
> > > This patch series introduces a new infrastructure to the driver core
> > > for representing "device isolation groups".  That is, groups of
> > > devices which can be "isolated" in such a way that the rest of the
> > > system can be protected from them, even in the presence of userspace
> > > or a guest OS directly driving the devices.
> > > 
> > > Isolation will typically be due to an IOMMU which can safely remap DMA
> > > and interrupts coming from these devices.  We need to represent whole
> > > groups, rather than individual devices, because there are a number of
> > > cases where the group can be isolated as a whole, but devices within
> > > it cannot be safely isolated from each other - this usually occurs
> > > because the IOMMU cannot reliably distinguish which device in the
> > > group initiated a transaction.  In other words, isolation groups
> > > represent the minimum safe granularity for passthrough to guests or
> > > userspace.
> > > 
> > > This series provides the core infrastructure for tracking isolation
> > > groups, and example implementations initializing the groups
> > > appropriately for two PCI bridges (which include IOMMUs) found on IBM
> > > POWER systems.
> > > 
> > > Actually using the group information is not included here, but David
> > > Woodhouse has expressed an interest in using a structure like this to
> > > represent operations in iommu_ops more correctly.
> > > 
> > > Some tracking of groups is a prerequisite for safe passthrough of
> > > devices to guests or userspace, such as done by VFIO.  Current VFIO
> > > patches use the iommu_ops->device_group mechanism for this.  However,
> > > that mechanism is awkward, because without an in-kernel concrete
> > > representation of groups, enumerating a group requires traversing
> > > every device on a given bus type.  It also fails to cover some very
> > > plausible IOMMU topologies, because its groups cannot span devices on
> > > multiple bus types.
> > 
> > So far so good, but there's not much meat on the bone yet.
> 
> True..

Any update to this series?  It would be great if we could map out the
functionality to the point of breaking down and distributing work... or
determining if the end result has any value add to what VFIO already
does.  Thanks,

Alex

Re: [Qemu-devel] [PATCH 0/4] slirp: Fix for requeuing crash, cleanups

2012-02-29 Thread Jan Kiszka

On 2012-02-29 20:15, Jan Kiszka wrote:
> This is an alternative, more complete approach to fix the requeuing-
> related crashes reported recently. See patch 2 for details. The rest are
> simple cleanups.
> 
> Please check carefully if I messed something up.

Oops, outdated intro. Should have been:

"Well, this requeuing bug seems to have a long breath. Previous attempts
to fix it (mine included) neglected the fact that we need to walk the
queue of pending packets, not just restart from the beginning after a
requeue. This version should get it Right(TM).

This also comes with a fix for resource cleanups on slirp shutdown. At
least valgrind is happy now.

Reviews welcome!"

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

[Qemu-devel] [PATCH 0/4] slirp: Fix for requeuing crash, cleanups

2012-02-29 Thread Jan Kiszka

This is an alternative, more complete approach to fix the requeuing-
related crashes reported recently. See patch 2 for details. The rest are
simple cleanups.

Please check carefully if I messed something up.

CC: Fabien Chouteau 
CC: Michael S. Tsirkin 
CC: Stefan Weil 
CC: Zhi Yong Wu 

Jan Kiszka (4):
  slirp: Keep next_m always valid
  slirp: Fix queue walking in if_start
  slirp: Remove unneeded if_queued
  slirp: Cleanup resources on instance removal

 slirp/if.c   |   68 +++---
 slirp/ip_icmp.c  |7 +
 slirp/ip_icmp.h  |1 +
 slirp/ip_input.c |7 +
 slirp/mbuf.c |   21 
 slirp/mbuf.h |1 +
 slirp/slirp.c|   10 +++
 slirp/slirp.h|3 +-
 slirp/tcp_subr.c |7 +
 slirp/udp.c  |8 ++
 slirp/udp.h  |1 +
 11 files changed, 98 insertions(+), 36 deletions(-)

-- 
1.7.3.4

[Qemu-devel] [PATCH 2/4] slirp: Fix queue walking in if_start

2012-02-29 Thread Jan Kiszka

Another attempt to get this right: We need to carefully walk both the
fastq and the batchq in if_start while trying to send packets to
possibly not yet resolved hosts on the virtual network.

So far we just requeued a delayed packet where it was and then started
walking the queues from the top again - that couldn't work. Now we pre-
calculate the next packet in the queue so that the current one can
safely be removed if it was sent successfully. We also need to take into
account that the next packet can be from the same session if the current
one was sent or from another if it wasn't sent.

CC: Fabien Chouteau 
CC: Zhi Yong Wu 
CC: Stefan Weil 
Signed-off-by: Jan Kiszka 
---
 slirp/if.c |   50 ++
 1 files changed, 34 insertions(+), 16 deletions(-)

diff --git a/slirp/if.c b/slirp/if.c
index 166852a..78a9b78 100644
--- a/slirp/if.c
+++ b/slirp/if.c
@@ -158,26 +158,41 @@ void if_start(Slirp *slirp)
 {
 uint64_t now = qemu_get_clock_ns(rt_clock);
 int requeued = 0;
-bool from_batchq = false;
-struct mbuf *ifm, *ifqt;
+bool from_batchq, from_batchq_next;
+struct mbuf *ifm, *ifm_next, *ifqt;
 
 DEBUG_CALL("if_start");
 
-while (slirp->if_queued) {
+if (slirp->if_fastq.ifq_next != &slirp->if_fastq) {
+ifm_next = slirp->if_fastq.ifq_next;
+from_batchq_next = false;
+} else if (slirp->next_m != &slirp->if_batchq) {
+/* Nothing on fastq, pick up from batchq via next_m */
+ifm_next = slirp->next_m;
+from_batchq_next = true;
+} else {
+ifm_next = NULL;
+}
+
+while (ifm_next) {
 /* check if we can really output */
-if (!slirp_can_output(slirp->opaque))
+if (!slirp_can_output(slirp->opaque)) {
 return;
+}
 
-/*
- * See which queue to get next packet from
- * If there's something in the fastq, select it immediately
- */
-if (slirp->if_fastq.ifq_next != &slirp->if_fastq) {
-ifm = slirp->if_fastq.ifq_next;
-} else {
-/* Nothing on fastq, pick up from batchq via next_m */
-ifm = slirp->next_m;
-from_batchq = true;
+ifm = ifm_next;
+from_batchq = from_batchq_next;
+
+ifm_next = ifm->ifq_next;
+if (!from_batchq) {
+if (ifm_next == &slirp->if_fastq) {
+/* No more packets in fastq, switch to batchq */
+ifm_next = slirp->next_m;
+from_batchq_next = true;
+}
+} else if (ifm_next == &slirp->if_batchq) {
+/* end of batchq */
+ifm_next = NULL;
 }
 
 slirp->if_queued--;
@@ -189,7 +204,7 @@ void if_start(Slirp *slirp)
 continue;
 }
 
-if (from_batchq) {
+if (ifm == slirp->next_m) {
 /* Set which packet to send on next iteration */
 slirp->next_m = ifm->ifq_next;
 }
@@ -202,6 +217,10 @@ void if_start(Slirp *slirp)
 if (ifm->ifs_next != ifm) {
 insque(ifm->ifs_next, ifqt);
 ifs_remque(ifm);
+/* Also update ifm_next to point to this next session packet,
+ * same for from_batchq_next */
+ifm_next = ifm->ifs_next;
+from_batchq_next = from_batchq;
 }
 
 /* Update so_queued */
@@ -211,7 +230,6 @@ void if_start(Slirp *slirp)
 }
 
 m_free(ifm);
-
 }
 
 slirp->if_queued = requeued;
-- 
1.7.3.4
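
The core idea of the patch above, computing the next packet before the
current one may be unlinked, can be shown on a plain circular list. This is
a simplified sketch under stated assumptions, not the slirp code: struct pkt,
the can_send flag (standing in for a successful if_encap) and the pktq_*
helpers are all hypothetical, and the two-queue and per-session handling of
the real if_start is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical packet node on a circular doubly linked queue. */
struct pkt {
    struct pkt *next, *prev;
    bool can_send;  /* stand-in for "the send attempt succeeded" */
};

/* An empty queue is a sentinel head that points at itself. */
void pktq_init(struct pkt *head)
{
    head->next = head->prev = head;
}

/* Append p at the tail of the queue. */
void pktq_append(struct pkt *head, struct pkt *p)
{
    p->prev = head->prev;
    p->next = head;
    head->prev->next = p;
    head->prev = p;
}

/* Unlink p; its own links are cleared so reuse is detectable. */
void pktq_unlink(struct pkt *p)
{
    assert(p->next != NULL && p->prev != NULL);
    p->prev->next = p->next;
    p->next->prev = p->prev;
    p->next = p->prev = NULL;
}

/*
 * Walk the queue exactly once. Sendable packets are removed, delayed
 * ones stay where they are. Returns the number of packets removed.
 */
int pktq_flush(struct pkt *head)
{
    int sent = 0;
    struct pkt *p = head->next;
    struct pkt *p_next;

    while (p != head) {
        p_next = p->next;  /* saved before p may be unlinked */
        if (p->can_send) {
            pktq_unlink(p);
            sent++;
        }
        p = p_next;
    }
    return sent;
}
```

Because the successor is read before the removal, the walk never follows a
link of a node that has already left the queue, and delayed packets do not
force a restart from the top.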

[Qemu-devel] [PATCH 1/4] slirp: Keep next_m always valid

2012-02-29 Thread Jan Kiszka

Make sure that next_m always points to a packet if batchq is non-empty.
This will simplify walking the queues in if_start.

CC: Fabien Chouteau 
CC: Zhi Yong Wu 
CC: Stefan Weil 
Signed-off-by: Jan Kiszka 
---
 slirp/if.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/slirp/if.c b/slirp/if.c
index 33f08e1..166852a 100644
--- a/slirp/if.c
+++ b/slirp/if.c
@@ -96,8 +96,13 @@ if_output(struct socket *so, struct mbuf *ifm)
ifs_insque(ifm, ifq->ifs_prev);
goto diddit;
}
-   } else
+} else {
ifq = slirp->if_batchq.ifq_prev;
+/* Set next_m if the queue was empty so far */
+if (slirp->next_m == &slirp->if_batchq) {
+slirp->next_m = ifm;
+}
+}
 
/* Create a new doubly linked list for this session */
ifm->ifq_so = so;
@@ -170,13 +175,8 @@ void if_start(Slirp *slirp)
 if (slirp->if_fastq.ifq_next != &slirp->if_fastq) {
 ifm = slirp->if_fastq.ifq_next;
 } else {
-/* Nothing on fastq, see if next_m is valid */
-if (slirp->next_m != &slirp->if_batchq) {
-ifm = slirp->next_m;
-} else {
-ifm = slirp->if_batchq.ifq_next;
-}
-
+/* Nothing on fastq, pick up from batchq via next_m */
+ifm = slirp->next_m;
 from_batchq = true;
 }
 
-- 
1.7.3.4
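
The invariant this patch establishes (next_m points to a packet whenever the
batch queue is non-empty) can be sketched in isolation. The code below is an
illustration under assumptions, not the slirp implementation: struct batchq,
struct bnode and the batchq_* functions are hypothetical names, and dequeuing
is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical list node. */
struct bnode {
    struct bnode *next, *prev;
};

/* Hypothetical batch queue with a "next to send" cursor. */
struct batchq {
    struct bnode head;     /* sentinel; empty when head.next == &head */
    struct bnode *next_m;  /* next packet to send, or &head when empty */
};

void batchq_init(struct batchq *q)
{
    q->head.next = q->head.prev = &q->head;
    q->next_m = &q->head;
}

/* Append n at the tail and repair the cursor if the queue was empty. */
void batchq_enqueue(struct batchq *q, struct bnode *n)
{
    n->prev = q->head.prev;
    n->next = &q->head;
    q->head.prev->next = n;
    q->head.prev = n;
    if (q->next_m == &q->head) {
        q->next_m = n;  /* queue was empty so far */
    }
}

/* Invariant: a non-empty queue always has a cursor on a real node. */
void batchq_check(const struct batchq *q)
{
    assert(q->head.next == &q->head || q->next_m != &q->head);
}

/* With the invariant in place the consumer needs no fallback path. */
struct bnode *batchq_peek(struct batchq *q)
{
    batchq_check(q);
    return q->next_m == &q->head ? NULL : q->next_m;
}
```

Maintaining the cursor at enqueue time is what lets the consumer side drop
its "is next_m valid?" branch, as the second hunk of the patch does.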

[Qemu-devel] [PATCH 3/4] slirp: Remove unneeded if_queued

2012-02-29 Thread Jan Kiszka

There is now a trivial check on entry of if_start for pending packets,
so we can drop the additional tracking via if_queued.

Signed-off-by: Jan Kiszka 
---
 slirp/if.c|8 
 slirp/slirp.c |7 +--
 slirp/slirp.h |1 -
 3 files changed, 1 insertions(+), 15 deletions(-)

diff --git a/slirp/if.c b/slirp/if.c
index 78a9b78..90bf398 100644
--- a/slirp/if.c
+++ b/slirp/if.c
@@ -110,8 +110,6 @@ if_output(struct socket *so, struct mbuf *ifm)
insque(ifm, ifq);
 
 diddit:
-   slirp->if_queued++;
-
if (so) {
/* Update *_queued */
so->so_queued++;
@@ -157,7 +155,6 @@ diddit:
 void if_start(Slirp *slirp)
 {
 uint64_t now = qemu_get_clock_ns(rt_clock);
-int requeued = 0;
 bool from_batchq, from_batchq_next;
 struct mbuf *ifm, *ifm_next, *ifqt;
 
@@ -195,12 +192,9 @@ void if_start(Slirp *slirp)
 ifm_next = NULL;
 }
 
-slirp->if_queued--;
-
 /* Try to send packet unless it already expired */
 if (ifm->expiration_date >= now && !if_encap(slirp, ifm)) {
 /* Packet is delayed due to pending ARP resolution */
-requeued++;
 continue;
 }
 
@@ -231,6 +225,4 @@ void if_start(Slirp *slirp)
 
 m_free(ifm);
 }
-
-slirp->if_queued = requeued;
 }
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 19d69eb..bcffc34 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -581,12 +581,7 @@ void slirp_select_poll(fd_set *readfds, fd_set *writefds, fd_set *xfds,
 }
}
 
-   /*
-* See if we can start outputting
-*/
-   if (slirp->if_queued) {
-   if_start(slirp);
-   }
+if_start(slirp);
 }
 
/* clear global file descriptor sets.
diff --git a/slirp/slirp.h b/slirp/slirp.h
index 28a5c03..950eccd 100644
--- a/slirp/slirp.h
+++ b/slirp/slirp.h
@@ -235,7 +235,6 @@ struct Slirp {
 int mbuf_alloced;
 
 /* if states */
-int if_queued;  /* number of packets queued so far */
 struct mbuf if_fastq;   /* fast queue (for interactive data) */
 struct mbuf if_batchq;  /* queue for non-interactive data */
 struct mbuf *next_m;/* pointer to next mbuf to output */
-- 
1.7.3.4

[Qemu-devel] [PATCH 4/4] slirp: Cleanup resources on instance removal

2012-02-29 Thread Jan Kiszka

Close & free sockets when shutting down a slirp instance, also release
all buffers.

CC: Michael S. Tsirkin 
Signed-off-by: Jan Kiszka 
---
 slirp/ip_icmp.c  |7 +++
 slirp/ip_icmp.h  |1 +
 slirp/ip_input.c |7 +++
 slirp/mbuf.c |   21 +
 slirp/mbuf.h |1 +
 slirp/slirp.c|3 +++
 slirp/slirp.h|2 ++
 slirp/tcp_subr.c |7 +++
 slirp/udp.c  |8 
 slirp/udp.h  |1 +
 10 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/slirp/ip_icmp.c b/slirp/ip_icmp.c
index 5dbf21d..d571fd0 100644
--- a/slirp/ip_icmp.c
+++ b/slirp/ip_icmp.c
@@ -66,6 +66,13 @@ void icmp_init(Slirp *slirp)
 slirp->icmp_last_so = &slirp->icmp;
 }
 
+void icmp_cleanup(Slirp *slirp)
+{
+while (slirp->icmp.so_next != &slirp->icmp) {
+icmp_detach(slirp->icmp.so_next);
+}
+}
+
 static int icmp_send(struct socket *so, struct mbuf *m, int hlen)
 {
 struct ip *ip = mtod(m, struct ip *);
diff --git a/slirp/ip_icmp.h b/slirp/ip_icmp.h
index b3da1f2..1a1af91 100644
--- a/slirp/ip_icmp.h
+++ b/slirp/ip_icmp.h
@@ -154,6 +154,7 @@ struct icmp {
(type) == ICMP_MASKREQ || (type) == ICMP_MASKREPLY)
 
 void icmp_init(Slirp *slirp);
+void icmp_cleanup(Slirp *slirp);
 void icmp_input(struct mbuf *, int);
 void icmp_error(struct mbuf *msrc, u_char type, u_char code, int minsize,
 const char *message);
diff --git a/slirp/ip_input.c b/slirp/ip_input.c
index c7b3eb4..ce24faf 100644
--- a/slirp/ip_input.c
+++ b/slirp/ip_input.c
@@ -61,6 +61,13 @@ ip_init(Slirp *slirp)
 icmp_init(slirp);
 }
 
+void ip_cleanup(Slirp *slirp)
+{
+udp_cleanup(slirp);
+tcp_cleanup(slirp);
+icmp_cleanup(slirp);
+}
+
 /*
  * Ip input routine.  Checksum and byte swap header.  If fragmented
  * try to reassemble.  Process options.  Pass to next level.
diff --git a/slirp/mbuf.c b/slirp/mbuf.c
index c699c75..4fefb04 100644
--- a/slirp/mbuf.c
+++ b/slirp/mbuf.c
@@ -32,6 +32,27 @@ m_init(Slirp *slirp)
 slirp->m_usedlist.m_next = slirp->m_usedlist.m_prev = &slirp->m_usedlist;
 }
 
+void m_cleanup(Slirp *slirp)
+{
+struct mbuf *m, *next;
+
+m = slirp->m_usedlist.m_next;
+while (m != &slirp->m_usedlist) {
+next = m->m_next;
+if (m->m_flags & M_EXT) {
+free(m->m_ext);
+}
+free(m);
+m = next;
+}
+m = slirp->m_freelist.m_next;
+while (m != &slirp->m_freelist) {
+next = m->m_next;
+free(m);
+m = next;
+}
+}
+
 /*
  * Get an mbuf from the free list, if there are none
  * malloc one
diff --git a/slirp/mbuf.h b/slirp/mbuf.h
index 8d7951f..3f3ab09 100644
--- a/slirp/mbuf.h
+++ b/slirp/mbuf.h
@@ -116,6 +116,7 @@ struct mbuf {
  * it rather than putting it on the free list */
 
 void m_init(Slirp *);
+void m_cleanup(Slirp *slirp);
 struct mbuf * m_get(Slirp *);
 void m_free(struct mbuf *);
 void m_cat(register struct mbuf *, register struct mbuf *);
diff --git a/slirp/slirp.c b/slirp/slirp.c
index bcffc34..1502830 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -246,6 +246,9 @@ void slirp_cleanup(Slirp *slirp)
 
 unregister_savevm(NULL, "slirp", slirp);
 
+ip_cleanup(slirp);
+m_cleanup(slirp);
+
 g_free(slirp->tftp_prefix);
 g_free(slirp->bootp_filename);
 g_free(slirp);
diff --git a/slirp/slirp.h b/slirp/slirp.h
index 950eccd..013a3b3 100644
--- a/slirp/slirp.h
+++ b/slirp/slirp.h
@@ -314,6 +314,7 @@ void if_output(struct socket *, struct mbuf *);
 
 /* ip_input.c */
 void ip_init(Slirp *);
+void ip_cleanup(Slirp *);
 void ip_input(struct mbuf *);
 void ip_slowtimo(Slirp *);
 void ip_stripoptions(register struct mbuf *, struct mbuf *);
@@ -331,6 +332,7 @@ void tcp_setpersist(register struct tcpcb *);
 
 /* tcp_subr.c */
 void tcp_init(Slirp *);
+void tcp_cleanup(Slirp *);
 void tcp_template(struct tcpcb *);
 void tcp_respond(struct tcpcb *, register struct tcpiphdr *, register struct mbuf *, tcp_seq, tcp_seq, int);
 struct tcpcb * tcp_newtcpcb(struct socket *);
diff --git a/slirp/tcp_subr.c b/slirp/tcp_subr.c
index 143a238..6f6585a 100644
--- a/slirp/tcp_subr.c
+++ b/slirp/tcp_subr.c
@@ -55,6 +55,13 @@ tcp_init(Slirp *slirp)
 slirp->tcp_last_so = &slirp->tcb;
 }
 
+void tcp_cleanup(Slirp *slirp)
+{
+while (slirp->tcb.so_next != &slirp->tcb) {
+tcp_close(sototcpcb(slirp->tcb.so_next));
+}
+}
+
 /*
  * Create template to be used to send tcp packets on a connection.
  * Call after host entry created, fills
diff --git a/slirp/udp.c b/slirp/udp.c
index 5b060f3..ced5096 100644
--- a/slirp/udp.c
+++ b/slirp/udp.c
@@ -49,6 +49,14 @@ udp_init(Slirp *slirp)
 slirp->udb.so_next = slirp->udb.so_prev = &slirp->udb;
 slirp->udp_last_so = &slirp->udb;
 }
+
+void udp_cleanup(Slirp *slirp)
+{
+while (slirp->udb.so_next != &slirp->udb) {
+udp_detach(slirp->udb.so_next);
+}
+}
+
 /* m->m_data  points at ip packet header
  * m->m_len   length ip packe
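
Note on the cleanup loops in the patch above: udp_cleanup(), tcp_cleanup() and icmp_cleanup() all drain a circular list whose head is a sentinel embedded in the Slirp state, by detaching the first element until the head points back at itself. This only terminates if each detach/close call unlinks the element it is given, which is assumed here. A minimal, self-contained sketch of that pattern follows; struct node and every function name in it are hypothetical stand-ins, not slirp code.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a slirp socket: one node of a circular,
 * doubly linked list.  The list head is a sentinel node, so an empty
 * list is one whose head points at itself. */
struct node {
    struct node *next;
    struct node *prev;
};

static void list_init(struct node *head)
{
    head->next = head->prev = head;
}

/* Insert n right after the sentinel. */
static void list_insert(struct node *head, struct node *n)
{
    n->next = head->next;
    n->prev = head;
    head->next->prev = n;
    head->next = n;
}

/* Unlink one node and free it (what the detach helpers are assumed to do). */
static void node_detach(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    free(n);
}

/* Drain the list the way the *_cleanup() functions do: keep removing
 * the first element until only the sentinel is left.  Returns the
 * number of nodes released. */
static int list_cleanup(struct node *head)
{
    int freed = 0;

    while (head->next != head) {
        node_detach(head->next);
        freed++;
    }
    return freed;
}

/* Build a list of `count` heap nodes, then drain it. */
static int demo_drain(int count)
{
    struct node head;

    list_init(&head);
    for (int i = 0; i < count; i++) {
        struct node *n = malloc(sizeof(*n));
        if (n == NULL) {
            break;
        }
        list_insert(&head, n);
    }
    return list_cleanup(&head);
}
```

Re-reading head->next on every iteration, rather than caching a "next" pointer, keeps the loop valid even if detaching one element also removes others.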

[Qemu-devel] [PATCH] build: Fix installation of target-dependant files

2012-02-29 Thread Lluís Vilanova

Pass all the relevant sub-directory make variables.

Signed-off-by: Lluís Vilanova 
Cc: Anthony Liguori 
Cc: Paul Brook 
---
 Makefile |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Makefile b/Makefile
index ad1e627..3f1e2b9 100644
--- a/Makefile
+++ b/Makefile
@@ -299,7 +299,7 @@ endif
 	$(INSTALL_DATA) $(SRC_PATH)/pc-bios/keymaps/$$x "$(DESTDIR)$(datadir)/keymaps"; \
done
for d in $(TARGET_DIRS); do \
-   $(MAKE) -C $$d $@ || exit 1 ; \
+   $(MAKE) $(SUBDIR_MAKEFLAGS) -C $$d $@ || exit 1 ; \
 done
 
 # various test targets

Re: [Qemu-devel] [PATCH 4/6] add reuse field

2012-02-29 Thread Eric Blake

On 02/29/2012 06:37 AM, Paolo Bonzini wrote:
> In some cases it can be useful to use an existing file as the new image
> in a snapshot.  Add this capability to blockdev-transaction.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  blockdev.c   |   18 +++---
>  qapi-schema.json |3 ++-
>  qmp-commands.hx  |7 +++
>  3 files changed, 20 insertions(+), 8 deletions(-)
> @@ -805,13 +807,15 @@ void qmp_blockdev_transaction(BlockdevActionList *dev_list,
>  }
>  
>  /* create new image w/backing file */
> -ret = bdrv_img_create(new_image_file, format,
> -  states->old_bs->filename,
> -  states->old_bs->drv->format_name,
> -  NULL, -1, flags);
> -if (ret) {
> -error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
> -goto delete_and_fail;
> +if (do_snapshot) {
> +ret = bdrv_img_create(new_image_file, format,
> +  states->old_bs->filename,
> +  states->old_bs->drv->format_name,
> +  NULL, -1, flags);
> +if (ret) {
> +error_set(errp, QERR_OPEN_FILE_FAILED, new_image_file);
> +goto delete_and_fail;
> +}

Is there any sanity checking that we should be doing if the 'reuse' flag
was provided, or is this a case of 'the user better be giving us the
right information, and it's their fault if things break'?

> +++ b/qapi-schema.json
> @@ -1127,7 +1127,8 @@
>  # @format: #optional the format of the snapshot image, default is 'qcow2'.
>  ##
>  { 'type': 'BlockdevSnapshot',
> -  'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str' } }
> +  'data': { 'device': 'str', 'snapshot-file': 'str', '*format': 'str',
> +'*reuse': 'bool' } }
>  
>  ##
>  # @BlockdevAction
> diff --git a/qmp-commands.hx b/qmp-commands.hx
> index ee05ec2..6728495 100644
> --- a/qmp-commands.hx
> +++ b/qmp-commands.hx
> @@ -704,6 +704,10 @@ the original disks pre-snapshot attempt are used.
> A list of dictionaries is accepted, that contains the actions to be performed.
>  For snapshots this is the device, the file to use for the new snapshot,
>  and the format.  The default format, if not specified, is qcow2.
> +Image files can be created by QEMU, or it can be created externally.

'files' is plural, 'it' is singular.  Perhaps this wording is better?

Each new snapshot defaults to being created by QEMU (wiping any contents
if the file already exists), but it is also possible to reuse an
externally-created file.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH 3/6] rename blockdev-group-snapshot-sync

2012-02-29 Thread Eric Blake

On 02/29/2012 06:37 AM, Paolo Bonzini wrote:
> We will add other kinds of operation.  Prepare for this by adjusting
> the schema and renaming some types/variables.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  blockdev.c   |   73 +
>  qapi-schema.json |   33 
>  qmp-commands.hx  |   51 -
>  3 files changed, 90 insertions(+), 67 deletions(-)
> 

>  ##
> -# @blockdev-group-snapshot-sync
> +# @blockdev-transaction
>  #
> -# Generates a synchronous snapshot of a group of one or more block devices,
> -# as atomically as possible.  If the snapshot of any device in the group
> -# fails, then the entire group snapshot will be abandoned and the
> -# appropriate error returned.
> +# Atomically operate on a group of one or more block devices.  If
> +# any operation fails, then the entire set of actions will be
> +# abandoned and the appropriate error returned.  The only operation
> +# supported is currently snapshot.
>  #
>  #  List of:
> -#  @SnapshotDev: information needed for the device snapshot
> +#  @BlockdevAction: information needed for the device snapshot

>  Arguments:
>  
> -devlist array:
> -- "device": device name to snapshot (json-string)
> -- "snapshot-file": name of new image file (json-string)
> -- "format": format of new image (json-string, optional)
> +actions array:
> +- "type": the operation to perform.  The only supported
> +  value is "snapshot". (json-string)
> +- "data": a dictionary.  The contents depend on the value
> +  of "type".  When "type" is "snapshot":
> +  - "device": device name to snapshot (json-string)
> +  - "snapshot-file": name of new image file (json-string)
> +  - "format": format of new image (json-string, optional)

I like it.  Of course, this means I still have a moving target for
implementing the libvirt side of things, so I'd like consensus sooner
rather than later.  More importantly, as long as
blockdev-group-snapshot-sync is not part of a release, renaming it is
fine; but if we get to the qemu 1.1 release with Jeff's patches but not
Paolo's rename, then libvirt has a harder job to cope with both names.


>  
>  Example:
>  
> --> { "execute": "blockdev-group-snapshot-sync", "arguments":
> -  { "devlist": [{ "device": "ide-hd0",
> -  "snapshot-file": "/some/place/my-image",
> -  "format": "qcow2" },
> -{ "device": "ide-hd1",
> -  "snapshot-file": "/some/place/my-image2",
> -  "format": "qcow2" }] } }
> +-> { "execute": "blockdev-transaction",
> + "arguments": { "actions": [
> + { 'type': 'snapshot, 'data' : { "device": "ide-hd0",
> + "snapshot-file": "/some/place/my-image",
> + "format": "qcow2" } },
> + { 'type': 'snapshot, 'data' : { "device": "ide-hd1",
> + "snapshot-file": "/some/place/my-image2",
> + "format": "qcow2",
> + "op": "snapshot" } } ] } }

Drop the "op":"snapshot".  It's a leftover from before your conversion
to a union type.

Question - when reading/writing these examples, are 'type' and "type"
interchangeable (like in XML)?

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org




[Qemu-devel] [PATCH] target-arm: Fix typo in ARM946 cp15 c5 handling

2012-02-29 Thread Peter Maydell

Fix a typo in the handling of the ARM946 cp15 c5 c0 0 1 register
(instruction access permission bits) that meant it would
return the data access permission bits by mistake.

Signed-off-by: Peter Maydell 
---
(Yeah, it says ARM_FEATURE_MPU but actually (a) the only MPU
core we support is the 946 and (b) these registers are 946
specific -- in PMSAv6 and v7 this encoding is the IFSR, the
same as it is for VMSA.)

 target-arm/helper.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/target-arm/helper.c b/target-arm/helper.c
index 4929372..8e6da06 100644
--- a/target-arm/helper.c
+++ b/target-arm/helper.c
@@ -2030,7 +2030,7 @@ uint32_t HELPER(get_cp15)(CPUState *env, uint32_t insn)
 return env->cp15.c5_data;
 case 1:
 if (arm_feature(env, ARM_FEATURE_MPU))
-return simple_mpu_ap_bits(env->cp15.c5_data);
+return simple_mpu_ap_bits(env->cp15.c5_insn);
 return env->cp15.c5_insn;
 case 2:
 if (!arm_feature(env, ARM_FEATURE_MPU))
-- 
1.7.1

Re: [Qemu-devel] [PATCH] MSI / MSIX injection for Xen HVM

2012-02-29 Thread Jan Kiszka

On 2012-02-29 18:21, Wei Liu wrote:
> Hi all
> 
> This patch adds MSI / MSIX injection for Xen HVM guest. This is not new,
> six months ago we had a discussion in
> http://marc.info/?l=qemu-devel&m=130639451725966&w=2

There are some coding style issues, please use checkpatch.pl.

Back then I voted against "if (xen_enabled())" as I was planning an
MSI injection hook that Xen could use as well. That may change again; the
final MSI layer design is not settled yet. So no concerns from
that POV, since that work will take longer. We can refactor the Xen hooks
again during that effort.

However, are you aware that you miss those (uncommon) messages that are
injected via DMA? They end up directly in apic_deliver_msi (where KVM
will eventually pick them up as well).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux

Re: [Qemu-devel] [PATCH] qemu-iotests: Filter out DOS line endings

2012-02-29 Thread Eric Blake

On 02/29/2012 06:59 AM, Kevin Wolf wrote:
> This one makes it possible to run qemu-iotests on a Windows build using Wine
> and get somewhat meaningful results.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/common.filter |8 +++-
>  1 files changed, 7 insertions(+), 1 deletions(-)
> 
> diff --git a/tests/qemu-iotests/common.filter b/tests/qemu-iotests/common.filter
> index da77ede..fa26b62 100644
> --- a/tests/qemu-iotests/common.filter
> +++ b/tests/qemu-iotests/common.filter
> @@ -140,10 +140,16 @@ _filter_imgfmt()
>  sed -e "s#$IMGFMT#IMGFMT#g"
>  }
>  
> +# Removes \r from messages
> +_filter_win32()
> +{
> +sed -e 's/\r//g'

POSIX does not require sed to recognize \r as a synonym for carriage
return.  You are better off using tr(1) (tr -d '\r') if all you want to
do is strip carriage returns in a POSIX-compliant manner.  Also be aware
that on Solaris, you have to make sure you are using a PATH that first
finds a POSIX-compliant tr.

-- 
Eric Blake   ebl...@redhat.com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
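
Note on the review above: the filter under discussion only has to delete carriage-return bytes, which is why tr -d '\r' is enough. To make the intended effect concrete, here is the same byte-level operation sketched in C. strip_cr() is a hypothetical helper written for this note; it is not part of qemu-iotests, and the test suite itself would still use a shell filter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Remove every carriage return ('\r') from buf in place, which is the
 * effect tr -d '\r' has on a stream.  Returns the new length; the
 * caller re-terminates the buffer if it is used as a C string. */
static size_t strip_cr(char *buf, size_t len)
{
    size_t out = 0;

    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\r') {
            buf[out++] = buf[i];
        }
    }
    return out;
}
```

All other bytes, including the newlines, pass through unchanged, so DOS-style line endings become Unix-style ones.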




Re: [Qemu-devel] full valgrind report

2012-02-29 Thread Stefan Weil


On 29.02.2012 17:19, Michael S. Tsirkin wrote:
> Here's a full report of possible leaks:
> Any idea? I am investigating.


Hi Michael,

try valgrind with --track-origins=yes. It costs some memory, but
improves diagnostics not only for memory leaks.

Most important are the leaks marked "definitely lost".
Many of them are caused by missing destructors when QEMU terminates.
Some QEMU classes provide an init function but no exit function,
for example. If you suspect a leak, you can re-run QEMU
and see whether the leak can be made to grow: repeat an
action in the QEMU monitor several times, connect to the VNC
server more than once, let the emulation run for a long time,
and so on. This kind of leak is dangerous for long-running
QEMU instances and can allow denial-of-service attacks.

Good (bug) hunting!

Stefan W.

[Qemu-devel] [PATCH 13/27] block: remove unused fields in BlockDriverState

2012-02-29 Thread Kevin Wolf

From: Paolo Bonzini 

sync_aiocb is unused since commit ce1a14d (Dynamically allocate AIO
Completion Blocks., 2006-08-07).

private is unused since commit 56a1493 (drive cleanup fixes., 2009-09-25).

Signed-off-by: Paolo Bonzini 
Signed-off-by: Kevin Wolf 
---
 block_int.h |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/block_int.h b/block_int.h
index 04f4b83..25aff80 100644
--- a/block_int.h
+++ b/block_int.h
@@ -259,10 +259,6 @@ struct BlockDriverState {
 /* number of in-flight copy-on-read requests */
 unsigned int copy_on_read_in_flight;
 
-/* async read/write emulation */
-
-void *sync_aiocb;
-
 /* the time for latest disk I/O */
 int64_t slice_time;
 int64_t slice_start;
@@ -299,7 +295,6 @@ struct BlockDriverState {
 int64_t dirty_count;
 int in_use; /* users other than guest access, eg. block migration */
 QTAILQ_ENTRY(BlockDriverState) list;
-void *private;
 
 QLIST_HEAD(, BdrvTrackedRequest) tracked_requests;
 
-- 
1.7.6.5

[Qemu-devel] [PATCH] MSI / MSIX injection for Xen HVM

2012-02-29 Thread Wei Liu

Hi all

This patch adds MSI / MSIX injection for Xen HVM guest. This is not new,
six months ago we had a discussion in
http://marc.info/?l=qemu-devel&m=130639451725966&w=2


Wei.

---8<-
MSI / MSIX injection for Xen.

This is supposed to be used in conjunction with Xen's
hypercall interface for emulated MSI / MSIX injection.

Signed-off-by: Wei Liu 
---
 hw/msi.c   |7 ++-
 hw/msix.c  |8 +++-
 hw/xen.h   |1 +
 xen-all.c  |5 +
 xen-stub.c |4 
 5 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/hw/msi.c b/hw/msi.c
index 5d6ceb6..b11eeac 100644
--- a/hw/msi.c
+++ b/hw/msi.c
@@ -20,6 +20,7 @@
 
 #include "msi.h"
 #include "range.h"
+#include "xen.h"
 
 /* Eventually those constants should go to Linux pci_regs.h */
 #define PCI_MSI_PENDING_32  0x10
@@ -257,7 +258,11 @@ void msi_notify(PCIDevice *dev, unsigned int vector)
"notify vector 0x%x"
" address: 0x%"PRIx64" data: 0x%"PRIx32"\n",
vector, address, data);
-stl_le_phys(address, data);
+if (xen_enabled()) {
+xen_hvm_inject_msi(address, data);
+} else {
+stl_le_phys(address, data);
+}
 }
 
 /* call this function after updating configs by pci_default_write_config(). */
diff --git a/hw/msix.c b/hw/msix.c
index 3835eaa..1ddd34e 100644
--- a/hw/msix.c
+++ b/hw/msix.c
@@ -19,6 +19,7 @@
 #include "msix.h"
 #include "pci.h"
 #include "range.h"
+#include "xen.h"
 
 #define MSIX_CAP_LENGTH 12
 
@@ -365,7 +366,12 @@ void msix_notify(PCIDevice *dev, unsigned vector)
 
 address = pci_get_quad(table_entry + PCI_MSIX_ENTRY_LOWER_ADDR);
 data = pci_get_long(table_entry + PCI_MSIX_ENTRY_DATA);
-stl_le_phys(address, data);
+
+if (xen_enabled()) {
+   xen_hvm_inject_msi(address, data);
+} else {
+   stl_le_phys(address, data);
+}
 }
 
 void msix_reset(PCIDevice *dev)
diff --git a/hw/xen.h b/hw/xen.h
index b46879c..e5926b7 100644
--- a/hw/xen.h
+++ b/hw/xen.h
@@ -34,6 +34,7 @@ static inline int xen_enabled(void)
 int xen_pci_slot_get_pirq(PCIDevice *pci_dev, int irq_num);
 void xen_piix3_set_irq(void *opaque, int irq_num, int level);
 void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len);
+void xen_hvm_inject_msi(uint64_t addr, uint32_t data);
 void xen_cmos_set_s3_resume(void *opaque, int irq, int level);
 
 qemu_irq *xen_interrupt_controller_init(void);
diff --git a/xen-all.c b/xen-all.c
index b0ed1ed..78c6df3 100644
--- a/xen-all.c
+++ b/xen-all.c
@@ -122,6 +122,11 @@ void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len)
 }
 }
 
+void xen_hvm_inject_msi(uint64_t addr, uint32_t data)
+{
+xc_hvm_inject_msi(xen_xc, xen_domid, addr, data);
+}
+
 static void xen_suspend_notifier(Notifier *notifier, void *data)
 {
 xc_set_hvm_param(xen_xc, xen_domid, HVM_PARAM_ACPI_S_STATE, 3);
diff --git a/xen-stub.c b/xen-stub.c
index 9ea02d4..8ff2b79 100644
--- a/xen-stub.c
+++ b/xen-stub.c
@@ -29,6 +29,10 @@ void xen_piix_pci_write_config_client(uint32_t address, uint32_t val, int len)
 {
 }
 
+void xen_hvm_inject_msi(uint64_t addr, uint32_t data)
+{
+}
+
 void xen_cmos_set_s3_resume(void *opaque, int irq, int level)
 {
 }
-- 
1.7.2.5

[Qemu-devel] [PATCH 09/27] fdc: fix seek command, which shouldn't check tracks

2012-02-29 Thread Kevin Wolf

From: Hervé Poussineau 

The seek command just sends step pulses to the drive and doesn't care if
there is a medium inserted or if it is banging the head against the drive.

Signed-off-by: Hervé Poussineau 
Signed-off-by: Kevin Wolf 
---
 hw/fdc.c |9 ++---
 1 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index cc03326..7879b70 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -1622,13 +1622,16 @@ static void fdctrl_handle_seek(FDCtrl *fdctrl, int direction)
 SET_CUR_DRV(fdctrl, fdctrl->fifo[1] & FD_DOR_SELMASK);
 cur_drv = get_cur_drv(fdctrl);
 fdctrl_reset_fifo(fdctrl);
+/* The seek command just sends step pulses to the drive and doesn't care if
+ * there is a medium inserted or if it's banging the head against the drive.
+ */
 if (fdctrl->fifo[2] > cur_drv->max_track) {
-fdctrl_raise_irq(fdctrl, FD_SR0_ABNTERM | FD_SR0_SEEK);
+cur_drv->track = cur_drv->max_track;
 } else {
 cur_drv->track = fdctrl->fifo[2];
-/* Raise Interrupt */
-fdctrl_raise_irq(fdctrl, FD_SR0_SEEK);
 }
+/* Raise Interrupt */
+fdctrl_raise_irq(fdctrl, FD_SR0_SEEK);
 }
 
 static void fdctrl_handle_perpendicular_mode(FDCtrl *fdctrl, int direction)
-- 
1.7.6.5
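
Note on the patch above: after this change a seek past the last track no longer ends in an abnormal-termination status. The head position is clamped to max_track and the same "seek end" interrupt is raised in both cases. A minimal sketch of that behaviour follows; struct sketch_drive, sketch_seek() and the SKETCH_SR0_SEEK value are hypothetical and only mirror the logic of the hunk, not the real FDCtrl types or register encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status flag standing in for FD_SR0_SEEK; the numeric
 * value is an assumption made for this sketch. */
#define SKETCH_SR0_SEEK 0x20

/* Hypothetical reduced drive state: current and highest track. */
struct sketch_drive {
    uint8_t track;
    uint8_t max_track;
};

/* Move the head to the requested track, clamping at max_track, and
 * return the status that would be passed to the interrupt routine.
 * The status is the same whether or not clamping happened. */
static uint8_t sketch_seek(struct sketch_drive *drv, uint8_t requested)
{
    if (requested > drv->max_track) {
        drv->track = drv->max_track;   /* head stops at the last track */
    } else {
        drv->track = requested;
    }
    return SKETCH_SR0_SEEK;
}
```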

[Qemu-devel] [PATCH 15/27] qcow2: Fix offset in qcow2_read_extensions

2012-02-29 Thread Kevin Wolf

The spec says that the length of extensions is padded to 8 bytes, not
the offset. Currently this is the same because the header size is a
multiple of 8, so this is only about compatibility with future changes
to the header size.

While touching it, move the calculation to a common place instead of
duplicating it for each header extension type.

Signed-off-by: Kevin Wolf 
Reviewed-by: Stefan Hajnoczi 
---
 block/qcow2.c |5 ++---
 1 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index dea12c1..f68f0e1 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -126,7 +126,6 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 #ifdef DEBUG_EXT
 printf("Qcow2: Got format extension %s\n", bs->backing_format);
 #endif
-offset = ((offset + ext.len + 7) & ~7);
 break;
 
 default:
@@ -143,11 +142,11 @@ static int qcow2_read_extensions(BlockDriverState *bs, uint64_t start_offset,
 if (ret < 0) {
 return ret;
 }
-
-offset = ((offset + ext.len + 7) & ~7);
 }
 break;
 }
+
+offset += ((ext.len + 7) & ~7);
 }
 
 return 0;
-- 
1.7.6.5
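
Note on the patch above: the two expressions differ only when the offset of the extension data is not itself a multiple of 8. The old code rounded the end offset up to an 8-byte boundary; the new code rounds the length up and adds it to the offset. A small worked sketch, with hypothetical function names and example offsets chosen for illustration (offset is taken to be the start of the extension data):

```c
#include <assert.h>
#include <stdint.h>

/* Old behaviour: pad the resulting offset to a multiple of 8. */
static uint64_t next_offset_pad_offset(uint64_t offset, uint32_t len)
{
    return (offset + len + 7) & ~(uint64_t)7;
}

/* New behaviour: pad the extension length to a multiple of 8, then
 * advance the offset by the padded length. */
static uint64_t next_offset_pad_length(uint64_t offset, uint32_t len)
{
    return offset + (((uint64_t)len + 7) & ~(uint64_t)7);
}
```

With an 8-aligned offset of 80 and a 5-byte extension, both give 88. With a hypothetical unaligned offset of 100, the old form gives 112 while the new form gives 108, which is the "length padded to 8 bytes" reading of the spec described in the commit message.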

[Qemu-devel] [PATCH 10/27] fdc: DIR (Digital Input Register) should return status of current drive...

2012-02-29 Thread Kevin Wolf

From: Hervé Poussineau 

Signed-off-by: Hervé Poussineau 
Signed-off-by: Kevin Wolf 
---
 hw/fdc.c |   10 +++---
 1 files changed, 3 insertions(+), 7 deletions(-)

diff --git a/hw/fdc.c b/hw/fdc.c
index 7879b70..a0236b7 100644
--- a/hw/fdc.c
+++ b/hw/fdc.c
@@ -216,6 +216,7 @@ static void fdctrl_reset_fifo(FDCtrl *fdctrl);
 static int fdctrl_transfer_handler (void *opaque, int nchan,
 int dma_pos, int dma_len);
 static void fdctrl_raise_irq(FDCtrl *fdctrl, uint8_t status0);
+static FDrive *get_cur_drv(FDCtrl *fdctrl);
 
 static uint32_t fdctrl_read_statusA(FDCtrl *fdctrl);
 static uint32_t fdctrl_read_statusB(FDCtrl *fdctrl);
@@ -956,14 +957,9 @@ static uint32_t fdctrl_read_dir(FDCtrl *fdctrl)
 {
 uint32_t retval = 0;
 
-if (fdctrl_media_changed(drv0(fdctrl))
- || fdctrl_media_changed(drv1(fdctrl))
-#if MAX_FD == 4
- || fdctrl_media_changed(drv2(fdctrl))
- || fdctrl_media_changed(drv3(fdctrl))
-#endif
-)
+if (fdctrl_media_changed(get_cur_drv(fdctrl))) {
 retval |= FD_DIR_DSKCHG;
+}
 if (retval != 0) {
 FLOPPY_DPRINTF("Floppy digital input register: 0x%02x\n", retval);
 }
-- 
1.7.6.5
