Re: [Qemu-devel] add debugger command

2011-12-28 Thread Stefan Weil
On 28.12.2011 07:35, Peter Cheung wrote:
> Dear All
> Please take a look at http://peter-bochs.googlecode.com . I am an
> operating system developer. Bochs has a great built-in command-line
> debugger, but it is not good enough for normal use, so I created
> peter-bochs for it. But Bochs has a serious weak point: it runs very
> slowly. So I want to add a debugger feature to QEMU. To let peter-bochs
> work with QEMU, the following need to be added to QEMU:
> 1) The ability to let peter-bochs pause QEMU while it is running. In Bochs,
> peter-bochs just sends a "ctrl-c" command to Bochs, and it pauses.
> 2) A magic breakpoint: in Bochs, when Bochs executes the instruction xchg
> bx,bx, it pauses.
> 3) The ability to send debug commands to QEMU, through a pipeline/socket/whatever.
>
> Is QEMU a Taiwanese project? I live in Hong Kong.
>
> Thanks
> from Peter
>

Hello Peter,

QEMU is not a national project. QEMU contributors and users
live all over the world. See http://www.ohloh.net/p/qemu/map.

QEMU has no built-in debugger, but it supports the GDB remote protocol,
which allows remote debugging via a TCP socket, for example. See
http://sourceware.org/gdb/onlinedocs/gdb/Remote-Debugging.html and
http://sourceware.org/gdb/onlinedocs/gdb/Remote-Protocol.html
for more information on this protocol and
http://qemu.weilnetz.de/qemu-doc.html#gdb_005fusage for instructions
on how to use this remote debugging feature with QEMU.

So it should be possible to attach your Java application to QEMU
without any changes to the QEMU source code. All you have to do
is extend your application to support the GDB remote protocol.
There are other graphical debugging front ends (for example DDD
or Insight) which work like this.
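
To give an idea of how small the core of the GDB remote protocol is, here
is a minimal sketch of its packet framing in Python. It is an illustration
only, not code from QEMU or GDB: the function names are made up, and it
ignores acknowledgements ('+'/'-') and the escaping of special bytes in
binary payloads. A packet is '$', the payload, '#', and a two-digit hex
checksum (the sum of the payload bytes modulo 256). A single 0x03 byte
sent outside a packet asks the stub to interrupt the running target,
which is the counterpart of the "ctrl-c" pause in the original request.

```python
def rsp_checksum(payload: bytes) -> int:
    """Sum of the payload bytes modulo 256, as the protocol defines it."""
    return sum(payload) % 256


def rsp_frame(payload: bytes) -> bytes:
    """Wrap a payload as $<payload>#<two lowercase hex digits>."""
    checksum = f"{rsp_checksum(payload):02x}".encode("ascii")
    return b"$" + payload + b"#" + checksum


def rsp_unframe(packet: bytes) -> bytes:
    """Validate a framed packet and return its payload."""
    if len(packet) < 4 or not packet.startswith(b"$") or packet[-3:-2] != b"#":
        raise ValueError("not a framed packet")
    payload = packet[1:-3]
    if rsp_checksum(payload) != int(packet[-2:], 16):
        raise ValueError("checksum mismatch")
    return payload


# Out-of-band interrupt request: one raw byte, no framing.
INTERRUPT = b"\x03"
```

With QEMU started with "-gdb tcp::1234" (or the shorthand "-s"), a front
end would open a TCP connection to that port and exchange packets such as
rsp_frame(b"g") (read registers) or rsp_frame(b"?") (query stop reason).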

Please note that QEMU is not limited to 80x86 emulation. Any
debugging interface must be able to support all QEMU emulation
targets.

Regards,
Stefan Weil




Re: [Qemu-devel] add debugger command

2011-12-28 Thread Peter Cheung

Good, thanks a lot. I will give it a try tonight. Why did I get this email twice from
the mailing list? Did I subscribe twice?

Thanks
from Peter


Re: [Qemu-devel] [PATCH 2/3] hw/sd.c: add SD card save/load support

2011-12-28 Thread Avi Kivity
On 12/27/2011 11:30 PM, Peter Maydell wrote:
> On 27 December 2011 14:13, Avi Kivity  wrote:
> > On 12/26/2011 04:58 PM, Peter Maydell wrote:
> >> >  void sd_enable(SDState *sd, int enable)
> >> >  {
> >> > -sd->enable = enable;
> >> > +sd->enable = enable ? true : false;
> >>
> >> This kind of thing is why I don't like bool :-)
> >
> > /me leaps to bool's defence:
> >
> >  sd->enable = enable should work just fine.
>
> This is true, but the code snippet also illustrates that it sits
> oddly to have the internal state variable be bool when the external
> facing function's API is clearly using the traditional C style of
> int-for-booleans.

We should change those too.  bool is self-documenting.

> Plus 'bool' gives me C++ flashbacks :-)

And QOM doesn't?  How about
glue(glue(glue(cirrus_colorexpand_pattern_transp_, ROP_NAME), _),DEPTH)?

-- 
error compiling committee.c: too many arguments to function




[Qemu-devel] virtio-net with virtio-mmio

2011-12-28 Thread Ying-Shiuan Pan
Hi, all:

I'm very interested in the virtio-mmio work Peter Maydell did for QEMU
(http://lists.nongnu.org/archive/html/qemu-devel/2011-11/msg01870.html).

I have tested virtio-blk, and it works.
I applied those patches to QEMU-1.0 and backported virtio_mmio.c from
Linux-3.2-rc to Linux-linaro-2.6.38.

I also found a small bug in virtio_mmio_write() in virtio-mmio.c:
the break statements are missing from each case of the switch block.
After fixing this bug, virtio-balloon works as well.

Then, when I attempted to enable virtio-net, the initialization part was
fine, but QEMU crashed and printed this message:
"virtio-net header not in first element"

It happens when the front-end virtio-net driver invokes virtqueue_kick() at
the end of try_fill_recv().
QEMU is then notified and invokes virtio_net_receive(), where the
error occurs.

The parameters I passed to qemu-system-arm are
qemu-system-arm XXX -netdev type=tap,id=mynet -global
virtio-net-mmio.netdev=mynet
I don't know whether this is correct.

Does anyone have any idea about this problem,
or can you give me some directions for solving this bug?

Any comment is welcome. I really appreciate your help!



Best Regards,
Ying-Shiuan Pan


[Qemu-devel] [PATCH] Fix qapi code generation fix

2011-12-28 Thread Avi Kivity
The fixes to qapi code generation had multiple bugs:
- the Null class used to drop output was missing some methods
- in some scripts it was never instantiated, leading to a None return,
  which is missing even more methods
- the --source and --header options were swapped

Luckily, all those bugs were hidden by a makefile bug which caused the
old behaviour (with the race) to be invoked.

Signed-off-by: Avi Kivity 
---
 Makefile |2 +-
 scripts/qapi-commands.py |   12 
 scripts/qapi-types.py|   12 +---
 scripts/qapi-visit.py|   12 +---
 4 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/Makefile b/Makefile
index 0838bc4..8118478 100644
--- a/Makefile
+++ b/Makefile
@@ -173,7 +173,7 @@ qapi-dir := $(BUILD_DIR)/qapi-generated
 test-qmp-input-visitor.o test-qmp-output-visitor.o test-qmp-commands.o 
qemu-ga$(EXESUF): QEMU_CFLAGS += -I $(qapi-dir)
 qemu-ga$(EXESUF): LIBS = $(LIBS_QGA)
 
-gen-out-type = $(subst .,-,$@)
+gen-out-type = $(subst .,-,$(suffix $@))
 
 $(qapi-dir)/test-qapi-types.c $(qapi-dir)/test-qapi-types.h :\
 $(SRC_PATH)/qapi-schema-test.json $(SRC_PATH)/scripts/qapi-types.py
diff --git a/scripts/qapi-commands.py b/scripts/qapi-commands.py
index bd7b207..3aabf61 100644
--- a/scripts/qapi-commands.py
+++ b/scripts/qapi-commands.py
@@ -399,9 +399,9 @@ for o, a in opts:
 elif o in ("-m", "--middle"):
 middle_mode = True
 elif o in ("-c", "--source"):
-do_h = True
-elif o in ("-h", "--header"):
 do_c = True
+elif o in ("-h", "--header"):
+do_h = True
 
 if not do_c and not do_h:
 do_c = True
@@ -411,15 +411,11 @@ c_file = output_dir + prefix + c_file
 h_file = output_dir + prefix + h_file
 
 def maybe_open(really, name, opt):
-class Null(object):
-def write(self, str):
-pass
-def read(self):
-return ''
 if really:
 return open(name, opt)
 else:
-return Null()
+import StringIO
+return StringIO.StringIO()
 
 try:
 os.makedirs(output_dir)
diff --git a/scripts/qapi-types.py b/scripts/qapi-types.py
index ae644bc..b56225b 100644
--- a/scripts/qapi-types.py
+++ b/scripts/qapi-types.py
@@ -183,9 +183,9 @@ for o, a in opts:
 elif o in ("-o", "--output-dir"):
 output_dir = a + "/"
 elif o in ("-c", "--source"):
-do_h = True
-elif o in ("-h", "--header"):
 do_c = True
+elif o in ("-h", "--header"):
+do_h = True
 
 if not do_c and not do_h:
 do_c = True
@@ -201,13 +201,11 @@ except os.error, e:
 raise
 
 def maybe_open(really, name, opt):
-class Null(object):
-def write(self, str):
-pass
-def read(self):
-return ''
 if really:
 return open(name, opt)
+else:
+import StringIO
+return StringIO.StringIO()
 
 fdef = maybe_open(do_c, c_file, 'w')
 fdecl = maybe_open(do_h, h_file, 'w')
diff --git a/scripts/qapi-visit.py b/scripts/qapi-visit.py
index e9d0584..5160d83 100644
--- a/scripts/qapi-visit.py
+++ b/scripts/qapi-visit.py
@@ -159,9 +159,9 @@ for o, a in opts:
 elif o in ("-o", "--output-dir"):
 output_dir = a + "/"
 elif o in ("-c", "--source"):
-do_h = True
-elif o in ("-h", "--header"):
 do_c = True
+elif o in ("-h", "--header"):
+do_h = True
 
 if not do_c and not do_h:
 do_c = True
@@ -177,13 +177,11 @@ except os.error, e:
 raise
 
 def maybe_open(really, name, opt):
-class Null(object):
-def write(self, str):
-pass
-def read(self):
-return ''
 if really:
 return open(name, opt)
+else:
+import StringIO
+return StringIO.StringIO()
 
 fdef = maybe_open(do_c, c_file, 'w')
 fdecl = maybe_open(do_h, h_file, 'w')
-- 
1.7.7.1
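
For readers following the patch, here is a small Python 3 sketch of the two
ideas in it. It is an illustration under assumptions, not the scripts
themselves: the patch targets Python 2 (hence "import StringIO"), while the
sketch uses io.StringIO, and gen_out_type_old/gen_out_type_new are made-up
names that only model what the two make expressions evaluate to for a given
target. The old expression rewrites every '.' in the whole target name, so
the result is not a recognised option; the new one yields just "-c" or "-h".

```python
import io
import os


def maybe_open(really, name, mode):
    """Return a real file when 'really' is true, else an in-memory sink.

    The sink supports the full file interface, so callers never need to
    special-case the "do not generate this file" path.
    """
    if really:
        return open(name, mode)
    return io.StringIO()


def gen_out_type_old(target):
    """Model of $(subst .,-,$@): every '.' in the target becomes '-'."""
    return target.replace(".", "-")


def gen_out_type_new(target):
    """Model of $(subst .,-,$(suffix $@)): only the suffix is rewritten."""
    suffix = os.path.splitext(target)[1]
    return suffix.replace(".", "-")
```

For a target such as "qapi-generated/qga-qapi-types.c" the new form gives
"-c", which is presumably what the scripts were meant to receive all along.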




Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Avi Kivity
On 12/28/2011 01:12 AM, Xin Tong wrote:
> QEMU does not exit to handle interrupts within translation blocks; it
> only exits after the translation block has finished. Assuming a
> translation block is very long, is it possible that QEMU could
> exceed the interrupt's "timing window" and yield unexpected
> behavior?
>
> The reason I ask is that I am searching for alternatives to QEMU's
> current way of handling interrupts (unlinking translation blocks on
> interrupt). However, an obvious approach, checking for an interrupt in
> every basic block, seems to be too heavy (too many TB enters/exits).
> Maybe checking for an interrupt every few basic blocks might be better, but
> what is a good measure for the number of basic blocks to execute
> before checking for an interrupt?
>

It's possible to check for an interrupt before every instruction,
without any overhead:

- when a signal arrives, check the instruction pointer. If it points
outside tcg code, set a flag and return.
- consult a table indexed by the instruction pointer, that gives the
number of bytes to the next guest instruction boundary
- if nonzero, set a breakpoint at that boundary, and resume
- remove the breakpoint (if set)
- adjust the TB to return on the current instruction pointer
- return
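
The table lookup in the steps above can be sketched as a toy model in
Python. This is only an illustration of the idea, not QEMU code: the names
are hypothetical, the "table" is assumed to be a sorted list of offsets
into the generated host code at which the code for a guest instruction
begins (with the end of the block as the last entry), and the handler just
reports which action it would take.

```python
import bisect


def bytes_to_next_boundary(boundaries, offset):
    """Distance from a host-code offset to the next guest instruction boundary.

    Returns 0 when the offset is already on a boundary.
    """
    i = bisect.bisect_left(boundaries, offset)
    if i == len(boundaries):
        raise ValueError("offset is past the last known boundary")
    return boundaries[i] - offset


def on_signal(boundaries, tcg_start, tcg_end, ip):
    """Decide what the signal handler would do for host instruction pointer ip."""
    if not (tcg_start <= ip < tcg_end):
        # Interrupted outside generated code: just record the request.
        return ("set_flag", None)
    delta = bytes_to_next_boundary(boundaries, ip - tcg_start)
    if delta:
        # Mid-instruction: run on to the boundary, then stop there.
        return ("breakpoint", ip + delta)
    # Already on a boundary: the TB can return at the current ip.
    return ("exit_tb", ip)
```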

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [PATCH 3/3] linux-user:Signal handling for MIPS64

2011-12-28 Thread Khansa Butt
On Wed, Dec 14, 2011 at 9:20 PM, Richard Henderson  wrote:
> On 12/07/2011 09:25 PM, kha...@kics.edu.pk wrote:
>> +#if defined(TARGET_MIPS64)
>> +        /* tswapal() do 64 bit swap in case of MIPS64 but
>> +           we need 32 bit swap as sa_flags is 32 bit */
>> +        k->sa_flags = bswap32(act->sa_flags);
>> +#else
>>          k->sa_flags = tswapal(act->sa_flags);
>> +#endif
>
> The condition in syscall_defs.h is TARGET_MIPS, not TARGET_MIPS64.
> They should match, despite the fact that it doesn't actually matter
> for the 32-bit abis.
>

Actually, sa_flags is 32 bits wide on MIPS64, but tswapal() calls tswap64()
because TARGET_LONG_SIZE != 4 in the MIPS64 case (see cpu-all.h). As a result,
sa_flags ends up with the wrong value, which is why I used the hunk above.
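
The effect can be illustrated with a few lines of Python. This is a sketch
with made-up helper names and an arbitrary example value, not QEMU code: it
models a 32-bit field that is byte-swapped once as a 32-bit quantity and
once as a 64-bit quantity whose result is then stored back into 32 bits.

```python
def bswap(value, width_bytes):
    """Reverse the byte order of an unsigned integer of the given width."""
    return int.from_bytes(value.to_bytes(width_bytes, "little"), "big")


def store_u32(value):
    """Model assigning a wider value to a 32-bit field (keeps the low 32 bits)."""
    return value & 0xFFFFFFFF


flags = 0x00000008  # arbitrary example value, not a specific SA_* constant

swapped_32 = store_u32(bswap(flags, 4))  # 32-bit swap: bytes stay in the field
swapped_64 = store_u32(bswap(flags, 8))  # 64-bit swap: bytes move to the top half
```

With the 64-bit swap the interesting bytes land in the upper half of the
64-bit result and are lost when it is narrowed to 32 bits.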

>>  #elif defined(TARGET_ABI_MIPSN64)
>>
>> -# warning signal handling not implemented
>> +struct target_sigcontext {
>> +    uint32_t   sc_regmask;     /* Unused */
>> +    uint32_t   sc_status;
>
> There's no reason to duplicate all this code.  Yes, when someone wrote
> this in the first place, they wrote separate sections for each mips abi.
> However, as you can see that huge portions of this block are identical,
> this was obviously a mistake.
>
> Start by changing the original section to #elif defined(TARGET_MIPS)
> and see what needs changing specifically for the ABIs.  I'm not even
> sure there are any differences at all.
>
>
> r~



Re: [Qemu-devel] [PATCH 2/3] Add a new PCI region type to supports 64 bit ranges

2011-12-28 Thread Michael S. Tsirkin
On Wed, Dec 28, 2011 at 06:26:05PM +1300, Alexey Korolev wrote:
> This patch adds PCI_REGION_TYPE_PREFMEM_64 region type and modifies types of
> variables to make it possible to work with 64 bit addresses.
> 
> Why have I added just one region type, PCI_REGION_TYPE_PREFMEM_64, and not
> PCI_REGION_TYPE_MEM_64 as well? According to the PCI architecture
> specification, bridges can describe 64bit ranges for prefetchable memory
> only. So it's very unlikely that devices export 64bit non-prefetchable BARs.

Might happen for system devices I guess.

> Anyway, this code will work with 64bit non-prefetchable BARs unless the
> PCI device is not behind the secondary bus.

So what happens if such a device is on the root bus?
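
To make the case under discussion concrete, here is a small Python sketch
of how the low bits of a memory BAR encode its type. The constant values
are the standard PCI BAR bits; classify_bar and the "mem64" label are
hypothetical and not part of the patch, which maps every 64bit BAR to
PCI_REGION_TYPE_PREFMEM_64 regardless of the prefetch bit.

```python
PCI_BASE_ADDRESS_SPACE_IO = 0x01       # bit 0: I/O space BAR
PCI_BASE_ADDRESS_MEM_TYPE_MASK = 0x06  # bits 2:1: memory BAR type
PCI_BASE_ADDRESS_MEM_TYPE_64 = 0x04    # type value for a 64bit BAR
PCI_BASE_ADDRESS_MEM_PREFETCH = 0x08   # bit 3: prefetchable


def classify_bar(val):
    """Classify a raw BAR value by its low flag bits."""
    if val & PCI_BASE_ADDRESS_SPACE_IO:
        # For I/O BARs the remaining low bits are not memory type flags.
        return "io"
    is64 = (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64
    prefetch = bool(val & PCI_BASE_ADDRESS_MEM_PREFETCH)
    if is64 and prefetch:
        return "prefmem64"
    if is64:
        return "mem64"  # hypothetical label: 64bit, non-prefetchable
    if prefetch:
        return "prefmem"
    return "mem"
```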

> 
> Signed-off-by: Alexey Korolev 
> ---
>  src/pci.h |1 -
>  src/pciinit.c |   59
> +---
>  2 files changed, 35 insertions(+), 25 deletions(-)
> 
> diff --git a/src/pci.h b/src/pci.h
> index a2a5a4c..71f15fe 100644
> --- a/src/pci.h
> +++ b/src/pci.h
> @@ -54,7 +54,6 @@ struct pci_device {
>  struct {
>  u32 addr;
>  u32 size;
> -int is64;
>  } bars[PCI_NUM_REGIONS];
> 
>  // Local information on device.
> diff --git a/src/pciinit.c b/src/pciinit.c
> index 7d83368..a574e38 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -22,6 +22,7 @@ enum pci_region_type {
>  PCI_REGION_TYPE_IO,
>  PCI_REGION_TYPE_MEM,
>  PCI_REGION_TYPE_PREFMEM,
> +PCI_REGION_TYPE_PREFMEM_64,
>  PCI_REGION_TYPE_COUNT,
>  };
> 
> @@ -29,18 +30,20 @@ static const char *region_type_name[] = {
>  [ PCI_REGION_TYPE_IO ]  = "io",
>  [ PCI_REGION_TYPE_MEM ] = "mem",
>  [ PCI_REGION_TYPE_PREFMEM ] = "prefmem",
> +[ PCI_REGION_TYPE_PREFMEM_64 ] = "prefmem64",
>  };
> 
>  struct pci_bus {
>  struct {
> -/* pci region stats */
> -u32 count[32 - PCI_MEM_INDEX_SHIFT];
> -u32 sum, max;
>  /* seconday bus region sizes */
>  u32 size;
> -/* pci region assignments */
> -u32 bases[32 - PCI_MEM_INDEX_SHIFT];
> -u32 base;
> +/* pci region stats */
> +u32 max;
> +u32 count[32 - PCI_MEM_INDEX_SHIFT];
> +s64 sum;
> + /* pci region assignments */
> +s64 base;
> +s64 bases[32 - PCI_MEM_INDEX_SHIFT];
>  } r[PCI_REGION_TYPE_COUNT];
>  struct pci_device *bus_dev;
>  };
> @@ -69,6 +72,8 @@ static enum pci_region_type pci_addr_to_type(u32 addr)
>  {
>  if (addr & PCI_BASE_ADDRESS_SPACE_IO)
>  return PCI_REGION_TYPE_IO;
> +if (addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
> +return PCI_REGION_TYPE_PREFMEM_64;
>  if (addr & PCI_BASE_ADDRESS_MEM_PREFETCH)
>  return PCI_REGION_TYPE_PREFMEM;
>  return PCI_REGION_TYPE_MEM;
> @@ -378,19 +383,16 @@ static void pci_bios_check_devices(struct
> pci_bus *busses)
>  struct pci_bus *bus = &busses[pci_bdf_to_bus(pci->bdf)];
>  int i;
>  for (i = 0; i < PCI_NUM_REGIONS; i++) {
> -u32 val, size;
> +u32 val, size, type;
>  pci_bios_get_bar(pci, i, &val, &size);
>  if (val == 0)
>  continue;
> 
> -pci_bios_bus_reserve(bus, pci_addr_to_type(val), size);
> +type = pci_addr_to_type(val);
> +pci_bios_bus_reserve(bus, type, size);
>  pci->bars[i].addr = val;
>  pci->bars[i].size = size;
> -pci->bars[i].is64 = (!(val & PCI_BASE_ADDRESS_SPACE_IO) &&
> - (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK)
> - == PCI_BASE_ADDRESS_MEM_TYPE_64);
> -
> -if (pci->bars[i].is64)
> +if (type == PCI_REGION_TYPE_PREFMEM_64)
>  i++;
>  }
>  }
> @@ -426,6 +428,7 @@ static void pci_bios_check_devices(struct
> pci_bus *busses)
>  static int pci_bios_init_root_regions(struct pci_bus *bus, u32
> start, u32 end)
>  {
>  bus->r[PCI_REGION_TYPE_IO].base = 0xc000;
> +bus->r[PCI_REGION_TYPE_PREFMEM_64].base = BUILD_PCIMEM_64_START;
> 
>  int reg1 = PCI_REGION_TYPE_PREFMEM, reg2 = PCI_REGION_TYPE_MEM;
>  if (bus->r[reg1].sum < bus->r[reg2].sum) {
> @@ -449,29 +452,34 @@ static int pci_bios_init_root_regions(struct
> pci_bus *bus, u32 start, u32 end)
> 
>  static void pci_bios_init_bus_bases(struct pci_bus *bus)
>  {
> -u32 base, newbase, size;
> +u32 size;
> +s64 base, newbase;
>  int type, i;
> 
>  for (type = 0; type < PCI_REGION_TYPE_COUNT; type++) {
> -dprintf(1, "  type %s max %x sum %x base %x\n",
> region_type_name[type],
> -bus->r[type].max, bus->r[type].sum, bus->r[type].base);
> +dprintf(1, "  type %s max %x sum 0x%08x%08x base %08x%08x\n",
> +region_type_name[type], bus->r[type].max,
> +(u32)(bus->r[type].sum>>32), (u32)bus->r[type].sum,
> +(u32)(bus->

Re: [Qemu-devel] DMA active hw_error

2011-12-28 Thread Peter Maydell
On 28 December 2011 06:44, Richard Cole  wrote:
> I'll have to learn quite a bit more about QEMU and arm before I'll be
> able to contribute back any patches. I bought a beagle board today so
> that at least is a start, being able to compare QEMU to some real
> hardware.

There's no beagle board model in qemu at the moment (there is one
in qemu-linaro but it's taking me a long time to get the patches
cleaned up to submit upstream).

> Is there an issues list for QEMU.

There's https://bugs.launchpad.net/qemu, although it mostly contains
user-reported bugs rather than "things the developer community
knows are broken" issues.

> Does anyone use github features like
> being able to comment on particular sources lines or create issues
> refer to specific lines of code? That would seem a good way to track
> bugs discovered in the code.

The github mirror is a very recent development. I'm not sure
we want to have two different bug trackers on two different sites...

> A more broad question. Does anyone know why linux doesn't use the DMA?
> Is the DMA really old school (I grew up with an Amiga that was full of
> them), or is it just for portability, i.e that DMA differ from
> platform to platform too much?

The bugs we've been talking about are specific to the PL180 (which
is used on the ARM dev boards like vexpress, realview, etc).
Since these are only dev boards, I think nobody has ever got round
to implementing the DMA support in Linux for them. Other platforms
have different DMA setups and get more attention because there is
more interest in tuning the performance on those platforms. (eg OMAP
has its own DMA infrastructure, which is modelled in QEMU, because
the kernel uses it.)

-- PMM



Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Peter Maydell
On 28 December 2011 10:42, Avi Kivity  wrote:
> It's possible to check for an interrupt before every instruction,
> without any overhead:
>
> - when a signal arrives, check the instruction pointer. If it points
> outside tcg code, set a flag and return.
> - consult a table indexed by the instruction pointer, that gives the
> number of bytes to the next guest instruction boundary
> - if nonzero, set a breakpoint at that boundary, and resume
> - remove the breakpoint (if set)
> - adjust the TB to return on the current instruction pointer
> - return

This assumes you have hardware breakpoints on your host, so
it's not portable.

(You also need to add a check-and-handle-flag for every return
from a helper function to TCG code, and of course you need to
actually create the instruction-boundary table. These are both
overheads.)

-- PMM



Re: [Qemu-devel] [PATCH 3/3] Changes related to secondary buses and 64bit regions

2011-12-28 Thread Michael S. Tsirkin
On Wed, Dec 28, 2011 at 06:35:55PM +1300, Alexey Korolev wrote:
> All devices behind a bridge need to have all their regions consecutive and
> not overlapping with all the normal memory ranges.
> Since prefetchable memory is described by one record, we must avoid the 
> situations
> when 32bit and 64bit prefetchable regions are present within one secondary 
> bus.

How do we avoid this? Assume we have two devices:
a 32 bit and a 64 bit one, behind a bridge.
There are two main things we can do:
1. Make the 64 bit device only use the low 32 bits
2. Put the 32 bit one in the non-prefetchable range

1 probably makes more sense for small BARs
2 probably makes more sense for large ones

Try also looking at e.g. linux bus scanning code for more ideas.

Another thing I don't see addressed here is that support for 64 bit
ranges is I think optional in the bridge.

> 
> Signed-off-by: Alexey Korolev

Whitespace is corrupted: check your mail setup?
There should be spaces around operators:
a < b, I see a< b. Sometimes a<  b (two spaces after).

> ---
>  src/pciinit.c |   69 +++-
>  1 files changed, 48 insertions(+), 21 deletions(-)
> 
> diff --git a/src/pciinit.c b/src/pciinit.c
> index a574e38..92942d5 100644
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -17,6 +17,7 @@
> 
>  #define PCI_BRIDGE_IO_MIN  0x1000
>  #define PCI_BRIDGE_MEM_MIN   0x10
> +#define PCI_BRIDGE_MEM_MAX   0x8000
> 
>  enum pci_region_type {
>  PCI_REGION_TYPE_IO,
> @@ -45,6 +46,7 @@ struct pci_bus {
>  s64 base;
>  s64 bases[32 - PCI_MEM_INDEX_SHIFT];
>  } r[PCI_REGION_TYPE_COUNT];
> +int is64;
>  struct pci_device *bus_dev;
>  };
> 
> @@ -369,6 +371,26 @@ static void pci_bios_bus_reserve(struct pci_bus *bus, 
> int type, u32 size)
>  bus->r[type].max = size;
>  }
> 
> +static void pci_bios_secondary_bus_reserve(struct pci_bus *parent,
> +   struct pci_bus *s, int type)
> +{
> +u32 limit = (type == PCI_REGION_TYPE_IO) ?
> +PCI_BRIDGE_IO_MIN : PCI_BRIDGE_MEM_MIN;
> +
> +if (s->r[type].sum>  PCI_BRIDGE_MEM_MAX) {
> +panic("Size: %08x%08x is too big\n",
> +(u32)s->r[type].sum, (u32)(s->r[type].sum>>32));
> +}
> +s->r[type].size = (u32)s->r[type].sum;
> +if (s->r[type].size<  limit)
> +s->r[type].size = limit;
> +s->r[type].size = pci_size_roundup(s->r[type].size);
> +
> +pci_bios_bus_reserve(parent, type, s->r[type].size);
> +dprintf(1, "size: %x, type %s\n",
> +s->r[type].size, region_type_name[type]);
> +}
> +
>  static void pci_bios_check_devices(struct pci_bus *busses)
>  {
>  dprintf(1, "PCI: check devices\n");
> @@ -392,8 +414,10 @@ static void pci_bios_check_devices(struct pci_bus 
> *busses)
>  pci_bios_bus_reserve(bus, type, size);
>  pci->bars[i].addr = val;
>  pci->bars[i].size = size;
> -if (type == PCI_REGION_TYPE_PREFMEM_64)
> +if (type == PCI_REGION_TYPE_PREFMEM_64) {
> +bus->is64 = 1;
>  i++;
> +}
>  }
>  }
> 
> @@ -404,22 +428,21 @@ static void pci_bios_check_devices(struct pci_bus 
> *busses)
>  if (!s->bus_dev)
>  continue;
>  struct pci_bus *parent =&busses[pci_bdf_to_bus(s->bus_dev->bdf)];
> +
> +if (s->r[PCI_REGION_TYPE_PREFMEM_64].sum&&

Space before && here and elsewhere.

> +   s->r[PCI_REGION_TYPE_PREFMEM].sum) {
> +   panic("Sparse PCI prefmem regions on the bus %d\n", 
> secondary_bus);
> +}
> +
> +dprintf(1, "PCI: secondary bus %d\n", secondary_bus);
>  int type;
>  for (type = 0; type<  PCI_REGION_TYPE_COUNT; type++) {
> -u32 limit = (type == PCI_REGION_TYPE_IO) ?
> -PCI_BRIDGE_IO_MIN : PCI_BRIDGE_MEM_MIN;
> -s->r[type].size = s->r[type].sum;
> -if (s->r[type].size<  limit)
> -s->r[type].size = limit;
> -s->r[type].size = pci_size_roundup(s->r[type].size);
> -pci_bios_bus_reserve(parent, type, s->r[type].size);
> -}
> -dprintf(1, "PCI: secondary bus %d sizes: io %x, mem %x, prefmem 
> %x\n",
> -secondary_bus,
> -s->r[PCI_REGION_TYPE_IO].size,
> -s->r[PCI_REGION_TYPE_MEM].size,
> -s->r[PCI_REGION_TYPE_PREFMEM].size);
> -}
> +if ((type == PCI_REGION_TYPE_PREFMEM_64&&  !s->is64) ||
> +(type == PCI_REGION_TYPE_PREFMEM&&  s->is64))
> +continue;

Can't figure this out. What does this do?

> +pci_bios_secondary_bus_reserve(parent, s, type);
> +   }
> +   }
>  }
> 
>  #define ROOT_BASE(top, sum, max) ALIGN_DOWN((top)-(sum),(max) ?: 1)
> @@ -507,14 +530,17 @@ static void pci_bios_map_devices(struct pci_bus *busses)
>  struct pci_bus *parent =&busses

Re: [Qemu-devel] [Seabios] [PATCH 0/3] 64bit PCI BARs allocations

2011-12-28 Thread Michael S. Tsirkin
On Wed, Dec 28, 2011 at 05:41:20PM +1300, Alexey Korolev wrote:
> Hi,
> 
> There were a number of requests about support of 64bit PCI BAR allocations.
> 
> Also we have observed an issue on guests with an older linux version
> (2.6.18): if we have a 64bit BAR allocated within the first 4GB, the OS
> may hang during the start process.
> (I guess it is an OS bug, but we need to take care of this).
> 
> This patch addresses these two issues and allows 64bit BARs to be
> allocated in ranges above 4GB.
> Patch consists of three parts:
> 1. Add new range above 4GB in _CRS table to let Windows 2008 work
> properly. Thanks a lot to Michael S. Tsirkin for this brilliant idea.
> 2. Added new PCI_REGION_TYPE_PREFMEM_64 region type in pciinit and changed
> types of variables.
> 3. Take care about PCI devices with 64bit BARs on secondary buses.
> 
> Patches have been tested on several configurations which include
> linux 2.6.18 - 3.0 & windows 2008. Everything works quite well.

Which qemu version did you use?



Re: [Qemu-devel] [PATCH v4 2/7] arm: Set frequencies for arm_timer

2011-12-28 Thread Andreas Färber
On 28.12.2011 02:24, Mark Langsdorf wrote:
> Use qdev properties to allow board modelers to set the frequencies
> for the sp804 timer. Each of the sp804's timers can have an
> individual frequency. The timers default to 1MHz.
> 
> Signed-off-by: Mark Langsdorf 
> Reviewed-by: Peter Maydell 

Reviewed-by: Andreas Färber 

Thanks,
Andreas

> ---
> Changes from v3
>   None
> Changes from v2
> Comment correctly describes behavior of properties
> freqX variables are defined as uint32_t, not int
> Changes from v1
> Simplified multiple timer frequency handling
> Removed the shared default
> 
>  hw/arm_timer.c |   24 +++-
>  1 files changed, 19 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/arm_timer.c b/hw/arm_timer.c
> index 0a5b9d2..60e1c63 100644
> --- a/hw/arm_timer.c
> +++ b/hw/arm_timer.c
> @@ -9,6 +9,8 @@
>  
>  #include "sysbus.h"
>  #include "qemu-timer.h"
> +#include "qemu-common.h"
> +#include "qdev.h"
>  
>  /* Common timer implementation.  */
>  
> @@ -178,6 +180,7 @@ typedef struct {
>  SysBusDevice busdev;
>  MemoryRegion iomem;
>  arm_timer_state *timer[2];
> +uint32_t freq0, freq1;
>  int level[2];
>  qemu_irq irq;
>  } sp804_state;
> @@ -269,10 +272,11 @@ static int sp804_init(SysBusDevice *dev)
>  
>  qi = qemu_allocate_irqs(sp804_set_irq, s, 2);
>  sysbus_init_irq(dev, &s->irq);
> -/* ??? The timers are actually configurable between 32kHz and 1MHz, but
> -   we don't implement that.  */
> -s->timer[0] = arm_timer_init(100);
> -s->timer[1] = arm_timer_init(100);
> +/* The timers are configurable between 32kHz and 1MHz
> + * defaulting to 1MHz but overrideable as individual properties */
> +s->timer[0] = arm_timer_init(s->freq0);
> +s->timer[1] = arm_timer_init(s->freq1);
> +
>  s->timer[0]->irq = qi[0];
>  s->timer[1]->irq = qi[1];
>  memory_region_init_io(&s->iomem, &sp804_ops, s, "sp804", 0x1000);
> @@ -281,6 +285,16 @@ static int sp804_init(SysBusDevice *dev)
>  return 0;
>  }
>  
> +static SysBusDeviceInfo sp804_info = {
> +.init = sp804_init,
> +.qdev.name = "sp804",
> +.qdev.size = sizeof(sp804_state),
> +.qdev.props = (Property[]) {
> +DEFINE_PROP_UINT32("freq0", sp804_state, freq0, 100),
> +DEFINE_PROP_UINT32("freq1", sp804_state, freq1, 100),
> +DEFINE_PROP_END_OF_LIST(),
> +}
> +};
>  
>  /* Integrator/CP timer module.  */
>  
> @@ -349,7 +363,7 @@ static int icp_pit_init(SysBusDevice *dev)
>  static void arm_timer_register_devices(void)
>  {
>  sysbus_register_dev("integrator_pit", sizeof(icp_pit_state), 
> icp_pit_init);
> -sysbus_register_dev("sp804", sizeof(sp804_state), sp804_init);
> +sysbus_register_withprop(&sp804_info);
>  }
>  
>  device_init(arm_timer_register_devices)


-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg



Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Avi Kivity
On 12/28/2011 01:40 PM, Peter Maydell wrote:
> On 28 December 2011 10:42, Avi Kivity  wrote:
> > It's possible to check for an interrupt before every instruction,
> > without any overhead:
> >
> > - when a signal arrives, check the instruction pointer. If it points
> > outside tcg code, set a flag and return.
> > - consult a table indexed by the instruction pointer, that gives the
> > number of bytes to the next guest instruction boundary
> > - if nonzero, set a breakpoint at that boundary, and resume
> > - remove the breakpoint (if set)
> > - adjust the TB to return on the current instruction pointer
> > - return
>
> This assumes you have hardware breakpoints on your host, so
> it's not portable.

You could also use software breakpoints.  Or just temporarily replace
the host instruction on the next guest instruction boundary with a return.

> (You also need to add a check-and-handle-flag for every return
> from a helper function to TCG code, 

Ah yes, I didn't consider that.

You could put all helpers in their own section and do something around
that, but that assumes no callouts from helpers to the standard library.

> and of course you need to
> actually create the instruction-boundary table. 

This should be well amortized.

> These are both
> overheads.)

-- 
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH V2 0/3] Improve SD controllers emulation

2011-12-28 Thread Mitsyanko Igor
Changelog v1->v2:
 PATCH 1/3: 
 - .calc_size field replaced with .get_bufsize field in VMStateField;
 - .size_offset removed completely, macros based on it rewritten to use
 new .get_bufsize field.
 PATCH 2/3:
 - all binary variables in SDState now have bool type;
 - SDState structure rearranged to ensure alignment;
 - sd_init(), sd_enable() now receive bool;
 - new version of PATCH 1/3 now used to save wp_groups array;
 PATCH 3/3:
 - DMA transfers modified and now operate with data blocks of
 BLKSIZE only, like real hardware does;
 - reset procedure optimized;
 - new version of PATCH 1/3 now used to save fifo_buffer.

The first patch of this set modifies the existing VMStateField interface
to ease implementing save/restore for a device's dynamically
allocated buffers. This modification is used in the second patch to
implement the SD card's VMStateDescription structure.
The third patch adds an implementation of a new device: an SD host controller
fully compliant with the "SD host controller specification version 2.00". It
also uses the modifications from the first patch.

Mitsyanko Igor (3):
  vmstate: introduce get_bufsize entry in VMStateField
  hw/sd.c: add SD card save/load support
  hw: Introduce spec. ver. 2.00 compliant SD host controller

 Makefile.target |1 +
 hw/g364fb.c |7 +-
 hw/hw.h |   41 +--
 hw/m48t59.c |7 +-
 hw/mac_nvram.c  |8 +-
 hw/onenand.c|7 +-
 hw/sd.c |  130 +++--
 hw/sd.h |4 +-
 hw/sdhc_ver2.c  | 1569 +++
 hw/sdhc_ver2.h  |  327 
 savevm.c|   10 +-
 11 files changed, 2020 insertions(+), 91 deletions(-)
 create mode 100644 hw/sdhc_ver2.c
 create mode 100644 hw/sdhc_ver2.h

-- 
1.7.4.1




[Qemu-devel] [PATCH V2 1/3] vmstate: introduce get_bufsize entry in VMStateField

2011-12-28 Thread Mitsyanko Igor
The new get_bufsize field in VMStateField is meant to make it easy to add
save/restore support for dynamically allocated buffers in device state.
In some cases the size of a dynamically allocated buffer is already
present in the device's state structure, but in a form that cannot be
used with the existing VMStateField interface. Currently, we can either
take the size from another variable in the device's state, as the
VMSTATE_VBUFFER_* macros do, or multiply the value kept in a variable by
a constant with the VMSTATE_BUFFER_MULTIPLY macro. If we need to perform
any other operation, we are forced to add an extra variable holding the
size to the device state structure solely so that it can be used in
VMStateDescription. This approach is not very pretty. Adding extra flags
to the VMStateFlags enum for every other possible operation on the size
field seems redundant, and it still wouldn't cover cases where a sequence
of operations is needed to get the size.
With the get_bufsize callback we can calculate the size of a dynamic array
in whichever way we need. The .size_offset field is no longer needed, so
we can remove it from the VMStateField structure to compensate for the
extra memory consumed by get_bufsize. The VMSTATE_VBUFFER* macros are
modified to use the new callback instead of .size_offset. The
VMSTATE_BUFFER_MULTIPLY macro and the VMS_MULTIPLY flag are removed
completely as they are now redundant.
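
As a rough, self-contained sketch of the idea (not the actual hw/hw.h code): DemoField is a cut-down stand-in for VMStateField, and DemoState, demo_get_fifo_size and demo_field_size are hypothetical names invented for this example. Only the get_bufsize signature is taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for VMStateField: only the members needed here. */
typedef struct {
    const char *name;
    size_t offset;                                     /* where the buffer pointer lives */
    int (*get_bufsize)(void *opaque, int version_id);  /* same signature as in the patch */
} DemoField;

/* Hypothetical device state whose buffer size is derived, not stored. */
typedef struct {
    uint8_t *fifo;
    uint32_t blocks;     /* number of blocks in the FIFO */
    uint32_t blk_shift;  /* log2 of the block size in bytes */
} DemoState;

/* The size needs a shift, which neither a plain size variable nor a
 * multiply-by-constant flag could express. */
static int demo_get_fifo_size(void *opaque, int version_id)
{
    DemoState *s = opaque;
    (void)version_id;
    return (int)(s->blocks << s->blk_shift);
}

static const DemoField demo_fifo_field = {
    .name = "fifo",
    .offset = offsetof(DemoState, fifo),
    .get_bufsize = demo_get_fifo_size,
};

/* What a save/load loop would do to learn how many bytes to transfer. */
static int demo_field_size(const DemoField *f, void *opaque, int version_id)
{
    assert(f != NULL && f->get_bufsize != NULL);
    return f->get_bufsize(opaque, version_id);
}
```

With blocks = 4 and blk_shift = 9, demo_field_size() reports 2048 bytes without any extra size variable in the device state.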

Signed-off-by: Mitsyanko Igor 
---
 hw/g364fb.c|7 ++-
 hw/hw.h|   41 +++--
 hw/m48t59.c|7 ++-
 hw/mac_nvram.c |8 +++-
 hw/onenand.c   |7 ++-
 savevm.c   |   10 ++
 6 files changed, 34 insertions(+), 46 deletions(-)

diff --git a/hw/g364fb.c b/hw/g364fb.c
index 34fb08c..1ab36c2 100644
--- a/hw/g364fb.c
+++ b/hw/g364fb.c
@@ -495,6 +495,11 @@ static int g364fb_post_load(void *opaque, int version_id)
 return 0;
 }
 
+static int g364fb_get_vramsize(void *opaque, int version_id)
+{
+return ((G364State *)opaque)->vram_size;
+}
+
 static const VMStateDescription vmstate_g364fb = {
 .name = "g364fb",
 .version_id = 1,
@@ -502,7 +507,7 @@ static const VMStateDescription vmstate_g364fb = {
 .minimum_version_id_old = 1,
 .post_load = g364fb_post_load,
 .fields = (VMStateField[]) {
-VMSTATE_VBUFFER_UINT32(vram, G364State, 1, NULL, 0, vram_size),
+VMSTATE_VBUFFER(vram, G364State, 1, NULL, 0, g364fb_get_vramsize),
 VMSTATE_BUFFER_UNSAFE(color_palette, G364State, 0, 256 * 3),
 VMSTATE_BUFFER_UNSAFE(cursor_palette, G364State, 0, 9),
 VMSTATE_UINT16_ARRAY(cursor, G364State, 512),
diff --git a/hw/hw.h b/hw/hw.h
index efa04d1..a2a43b6 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -303,7 +303,6 @@ enum VMStateFlags {
 VMS_ARRAY_OF_POINTER = 0x040,
 VMS_VARRAY_UINT16= 0x080,  /* Array with size in uint16_t field */
 VMS_VBUFFER  = 0x100,  /* Buffer with size in int32_t field */
-VMS_MULTIPLY = 0x200,  /* multiply "size" field by field_size */
 VMS_VARRAY_UINT8 = 0x400,  /* Array with size in uint8_t field*/
 VMS_VARRAY_UINT32= 0x800,  /* Array with size in uint32_t field*/
 };
@@ -315,12 +314,12 @@ typedef struct {
 size_t start;
 int num;
 size_t num_offset;
-size_t size_offset;
 const VMStateInfo *info;
 enum VMStateFlags flags;
 const VMStateDescription *vmsd;
 int version_id;
 bool (*field_exists)(void *opaque, int version_id);
+int (*get_bufsize)(void *opaque, int version_id);
 } VMStateField;
 
 typedef struct VMStateSubsection {
@@ -584,34 +583,11 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 .offset   = vmstate_offset_buffer(_state, _field) + _start,  \
 }
 
-#define VMSTATE_BUFFER_MULTIPLY(_field, _state, _version, _test, _start, _field_size, _multiply) { \
+#define VMSTATE_VBUFFER(_field, _state, _version, _test, _start, _get_bufsize) { \
 .name = (stringify(_field)), \
 .version_id   = (_version),  \
 .field_exists = (_test), \
-.size_offset  = vmstate_offset_value(_state, _field_size, uint32_t),\
-.size = (_multiply),  \
-.info = &vmstate_info_buffer,\
-.flags= VMS_VBUFFER|VMS_MULTIPLY,\
-.offset   = offsetof(_state, _field),\
-.start= (_start),\
-}
-
-#define VMSTATE_VBUFFER(_field, _state, _version, _test, _start, _field_size) { \
-.name = (stringify(_field)), \
-.version_id   = (_version),  \
-.field_exists = (_test), \
-.size_offset  = vmstate_offset_value(_state, _field

[Qemu-devel] [PATCH V2 2/3] hw/sd.c: add SD card save/load support

2011-12-28 Thread Mitsyanko Igor
We couldn't properly implement save/restore of SD host controller state
without a VMStateDescription for the SD card's state. This patch updates
SD card emulation to support save/load of the card's state. The update
requires changing the data type of several variables in SDState. The order
of the variables is rearranged to ensure proper data alignment in the
SDState structure.
For consistency, because several variables now have bool type, the API was
modified to use bool as well: 0 was changed to 'false' and 1 to 'true'
where appropriate.

Signed-off-by: Mitsyanko Igor 
---
 hw/sd.c |  130 ++-
 hw/sd.h |4 +-
 2 files changed, 89 insertions(+), 45 deletions(-)

diff --git a/hw/sd.c b/hw/sd.c
index 07eb263..4b5b538 100644
--- a/hw/sd.c
+++ b/hw/sd.c
@@ -54,49 +54,53 @@ typedef enum {
 sd_illegal = -2,
 } sd_rsp_type_t;
 
+enum {
+sd_inactive,
+sd_card_identification_mode,
+sd_data_transfer_mode,
+};
+
+enum {
+sd_inactive_state = -1,
+sd_idle_state = 0,
+sd_ready_state,
+sd_identification_state,
+sd_standby_state,
+sd_transfer_state,
+sd_sendingdata_state,
+sd_receivingdata_state,
+sd_programming_state,
+sd_disconnect_state,
+};
+
 struct SDState {
-enum {
-sd_inactive,
-sd_card_identification_mode,
-sd_data_transfer_mode,
-} mode;
-enum {
-sd_inactive_state = -1,
-sd_idle_state = 0,
-sd_ready_state,
-sd_identification_state,
-sd_standby_state,
-sd_transfer_state,
-sd_sendingdata_state,
-sd_receivingdata_state,
-sd_programming_state,
-sd_disconnect_state,
-} state;
+int32_t state;
 uint32_t ocr;
 uint8_t scr[8];
 uint8_t cid[16];
 uint8_t csd[16];
-uint16_t rca;
 uint32_t card_status;
 uint8_t sd_status[64];
 uint32_t vhs;
-int wp_switch;
-int *wp_groups;
 uint64_t size;
-int blk_len;
+uint32_t blk_len;
 uint32_t erase_start;
 uint32_t erase_end;
 uint8_t pwd[16];
-int pwd_len;
-int function_group[6];
-
-int spi;
-int current_cmd;
+uint32_t pwd_len;
+uint8_t function_group[6];
+uint8_t mode;
+uint8_t current_cmd;
 /* True if we will handle the next command as an ACMD. Note that this does
  * *not* track the APP_CMD status bit!
  */
-int expecting_acmd;
-int blk_written;
+bool expecting_acmd;
+bool spi;
+bool enable;
+bool wp_switch;
+bool *wp_groups;
+
+uint32_t blk_written;
 uint64_t data_start;
 uint32_t data_offset;
 uint8_t data[512];
@@ -104,8 +108,7 @@ struct SDState {
 qemu_irq inserted_cb;
 BlockDriverState *bdrv;
 uint8_t *buf;
-
-int enable;
+uint16_t rca;
 };
 
 static void sd_set_mode(SDState *sd)
@@ -415,14 +418,14 @@ static void sd_reset(SDState *sd, BlockDriverState *bdrv)
 if (sd->wp_groups)
 g_free(sd->wp_groups);
 sd->wp_switch = bdrv ? bdrv_is_read_only(bdrv) : 0;
-sd->wp_groups = (int *) g_malloc0(sizeof(int) * sect);
-memset(sd->function_group, 0, sizeof(int) * 6);
+sd->wp_groups = (bool *)g_malloc0(sizeof(bool) * sect);
+memset(sd->function_group, 0, sizeof(sd->function_group));
 sd->erase_start = 0;
 sd->erase_end = 0;
 sd->size = size;
 sd->blk_len = 0x200;
 sd->pwd_len = 0;
-sd->expecting_acmd = 0;
+sd->expecting_acmd = false;
 }
 
 static void sd_cardchange(void *opaque, bool load)
@@ -440,23 +443,64 @@ static const BlockDevOps sd_block_ops = {
 .change_media_cb = sd_cardchange,
 };
 
+
+static int sd_get_wpgroups_size(void *opaque, int version_id)
+{
+SDState *sd = (SDState *)opaque;
+return sizeof(bool) * (sd->size >> (HWBLOCK_SHIFT + SECTOR_SHIFT +
+WPGROUP_SHIFT));
+}
+
+static const VMStateDescription sd_vmstate = {
+.name = "sd_card",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields  = (VMStateField[]) {
+VMSTATE_INT32(state, SDState),
+VMSTATE_UINT8_ARRAY(cid, SDState, 16),
+VMSTATE_UINT8_ARRAY(csd, SDState, 16),
+VMSTATE_UINT32(card_status, SDState),
+VMSTATE_PARTIAL_BUFFER(sd_status, SDState, 1),
+VMSTATE_UINT32(vhs, SDState),
+VMSTATE_UINT32(blk_len, SDState),
+VMSTATE_UINT32(erase_start, SDState),
+VMSTATE_UINT32(erase_end, SDState),
+VMSTATE_UINT8_ARRAY(pwd, SDState, 16),
+VMSTATE_UINT32(pwd_len, SDState),
+VMSTATE_UINT8_ARRAY(function_group, SDState, 6),
+VMSTATE_UINT8(mode, SDState),
+VMSTATE_UINT8(current_cmd, SDState),
+VMSTATE_BOOL(expecting_acmd, SDState),
+VMSTATE_BOOL(enable, SDState),
+VMSTATE_VBUFFER(wp_groups, SDState, 1, NULL, 0, sd_get_wpgroups_size),
+VMSTATE_UINT32(blk_written, SDState),
+VMSTATE_UINT64(data_start, SDState),
+VMSTATE_UINT32(data_offset, 

Re: [Qemu-devel] [PATCH V2 2/3] hw/sd.c: add SD card save/load support

2011-12-28 Thread Peter Maydell
On 28 December 2011 12:08, Mitsyanko Igor  wrote:
> We couldn't properly implement save/restore functionality of SD host
> controllers states without SD card's state VMStateDescription
> implementation. This patch updates SD card emulation to support
> save/load of card's state. Update requires changing of data type of
> several variables in SDState. Variables order rearranged to ensure
> proper data alignment in SDState structure.
> For consistency, because several variables now have bool datatype, API
> was modified to use bool as well, 0 was changed to 'false' and 1 was
> changed to 'true' in those places where it was appropriate.

If you're going to switch things to bool, can you break those out
into separate patches for the individual things you're changing,
please? Otherwise this patch is trying to do too many things
at once.

Also, why should we care particularly about the order of
fields in SDState? There will be at most a handful of copies
of this struct in qemu, costing a handful of bytes in extra
padding. ("ensure proper data alignment" is wrong -- the compiler
does this for us.) If you feel you must rearrange things, again,
please put it in a separate patch so it's easier to read.
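
A small illustration of the alignment point (not taken from the patch; struct Mixed, struct Ordered and member_is_aligned are made up for this example): whatever order the members are declared in, the compiler places each one at an offset that satisfies its alignment. Reordering can only change how much padding is inserted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same members, two declaration orders. */
struct Mixed   { uint8_t a; uint32_t b; uint8_t c; uint64_t d; };
struct Ordered { uint64_t d; uint32_t b; uint8_t a; uint8_t c; };

/* True if a member placed at 'offset' satisfies 'alignment'. */
static int member_is_aligned(size_t offset, size_t alignment)
{
    assert(alignment != 0);
    return offset % alignment == 0;
}
```

Checking offsetof(struct Mixed, d) against _Alignof(uint64_t) with member_is_aligned() succeeds for both layouts; only sizeof may differ between them, by a few bytes of padding.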

> @@ -415,14 +418,14 @@ static void sd_reset(SDState *sd, BlockDriverState *bdrv)
>     if (sd->wp_groups)
>         g_free(sd->wp_groups);
>     sd->wp_switch = bdrv ? bdrv_is_read_only(bdrv) : 0;
> -    sd->wp_groups = (int *) g_malloc0(sizeof(int) * sect);
> -    memset(sd->function_group, 0, sizeof(int) * 6);
> +    sd->wp_groups = (bool *)g_malloc0(sizeof(bool) * sect);

 sd->wp_groups = g_new0(bool, sect);

-- PMM



Re: [Qemu-devel] [PATCH V2 2/3] hw/sd.c: add SD card save/load support

2011-12-28 Thread Mitsyanko Igor

On 12/28/2011 05:26 PM, Peter Maydell wrote:
> On 28 December 2011 12:08, Mitsyanko Igor  wrote:
>> We couldn't properly implement save/restore functionality of SD host
>> controllers states without SD card's state VMStateDescription
>> implementation. This patch updates SD card emulation to support
>> save/load of card's state. Update requires changing of data type of
>> several variables in SDState. Variables order rearranged to ensure
>> proper data alignment in SDState structure.
>> For consistency, because several variables now have bool datatype, API
>> was modified to use bool as well, 0 was changed to 'false' and 1 was
>> changed to 'true' in those places where it was appropriate.
>
> If you're going to switch things to bool, can you break those out
> into separate patches for the individual things you're changing,
> please? Otherwise this patch is trying to do too many things
> at once.

Sure, I'll split this patch into a few smaller ones, thanks.

> Also, why should we care particularly about the order of
> fields in SDState? There will be at most a handful of copies
> of this struct in qemu, costing a handful of bytes in extra
> padding. ("ensure proper data alignment" is wrong -- the compiler
> does this for us.) If you feel you must rearrange things, again,
> please put it in a separate patch so it's easier to read.

Why not, it wouldn't hurt anyone.

--
Mitsyanko Igor
ASWG, Moscow R&D center, Samsung Electronics
email: i.mitsya...@samsung.com



Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Cleber Rosa

On 12/27/2011 11:37 PM, Anthony Liguori wrote:

On 12/27/2011 04:35 PM, Cleber Rosa wrote:

On 12/26/2011 08:00 PM, Dor Laor wrote:

On 12/26/2011 05:12 PM, Anthony Liguori wrote:

Hi Dor,



Merry Christmas Anthony,


On 12/25/2011 09:19 AM, Dor Laor wrote:

On 12/19/2011 07:13 PM, Anthony Liguori wrote:

Well, I'm still not convinced that a new standalone package should
handle these cases instead of kvm autotest. I'll be happy to integrate the
tests to kvm autotest anyway and the more the merrier but imho it's a
duplicate.


I'm sure kvm autotest could be taught to do exactly what qemu-test is
doing. But why does kvm autotest have to do everything? I doubt there
would be much code reuse.

I think it's not a bad thing to have multiple test suites when there
isn't considerable overlap.


I think the main goal of qemu-tests (may be implicit) is to be quick 
and simple.


qemu-test doesn't have a main goal.  My goal is to improve QEMU's 
quality. qemu-test is just a tool to help achieve that goal.


Maybe I've used the wrong wording. Besides the different approach 
(simpler requirements, initramfs with busybox, etc), it looks to me that 
keeping it simple played an important role in your decision to write 
qemu-tests, which is indeed a sign of good design. But read on...




That is indeed great, but if one thinks that's all we're ever going to
need, that thought is pretty naive.


I don't know who "we" is, but I can tell you that qemu-test is exactly 
what *I* need.  Consider that I spent a good portion of every single 
day testing QEMU with either my own or other people's patches, making 
that job easier and more automated is fairly important to me.


"We" is everyone that contributes one way or another to qemu. So if you 
are only concerned with your needs, you're definitely on the right track.


If not, I think (trying to think as a community) it'd be beneficial to 
concentrate all efforts, unless it's not possible to do so. It's as 
simple as that. All my reasoning here revolves around this.




I'm sharing it because I suspect that a lot of other developers have a 
similar need.



And it may be true that there's room for both test
suites... or that, as busy developers, we're refusing to deal with the
added complexity (qemu alone accounts for a lot) and delaying to fix the
fixable. I believe in the latter.

One example: kvm-autotest has a complex configuration file format with a
steep learning curve, while a test such as qemu-tests/tests/simple-ping.sh
would have to be tweaked if somehow the kernel detects the first ethernet
interface as em1 (such as recent Fedora systems do). Simple, but not scalable.


I can tell by this comment that you don't actually understand how 
qemu-test works.  Please take a look at it before jumping to 
conclusions about whether it should or shouldn't be part of kvm-autotest.


I can tell you did not grasp my point: kvm autotest is more complex, but 
more capable and flexible. And I did *not* come to a conclusion, I'm 
giving examples in an attempt to enrich the discussion.




Hint: qemu-test always uses the same kernel because it builds it as 
part of the test suite.  The behavior of a nic test will never 
change unless someone explicitly changes the kernel.


Hopefully you understand by now that I'm giving some reasons of why 
kvm-autotest does some things the way it does (usually more complex), 
and how qemu-tests, because of its approach, does not have to deal with 
that.





1) It builds a custom kernel and initramfs based on busybox. This is
fairly important to ensure that we can run tests with no device
pre-requisites.


This can be done easily w/ autotest too.


The Python requirement inside the guest is true *if* we want to run regular
autotest tests inside the guest (see autotest/client/virt/tests/autotest.py)
and this accounts for very very little of kvm autotest usage. All other
tests interact with the monitor directly and with the guest via
ssh/telnet/serial.


qemu-test does not require any specific hardware to be used in the 
guest which lets it test a wider variety of scenarios in QEMU.  So you 
cannot assume there is ssh/telnet/serial available.


I was assuming that we could count on at least a serial port in the 
guest. If not, then current kvm-autotest can not absorb the same 
functionality of qemu-tests without re-writing it. A






So, I see no reason for not using a more expressive language,


I seriously doubt you can build a useful initramfs that contains 
python without doing something crazy like livecd-tools does



Actually, kvm-autotest has various layers of abstraction in how QEMU
ends up being launched. As you mention below, those layers are there to
allow for things like using libvirt.


Indeed the qemu command line parameters get generated depending on many
configuration parameters. It'd be *really* simple to add a configuration
parameter that overrides the qemu command with a static one.


But if you're a QEMU dev

Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Anthony Liguori

On 12/27/2011 11:01 PM, Cleber Rosa wrote:

On 12/27/2011 11:37 PM, Anthony Liguori wrote:

I think the main goal of qemu-tests (may be implicit) is to be quick and simple.


qemu-test doesn't have a main goal. My goal is to improve QEMU's quality.
qemu-test is just a tool to help achieve that goal.


Maybe I've used the wrong wording. I got the feeling that, besides testing qemu
the way you need it, keeping qemu-test simple was really important.


Simple is always important.  In the case of qemu-test, there are some important 
trade offs made to keep things simple and fast.  It doesn't try to work with 
arbitrary guests which means the tests need to handle only one environment.  The 
guest is pre-made to have exactly what is needed for the tests so there is no 
setup required.  The guest is as small as possible such that the test can run as 
quickly as possible.



That is indeed great, but if one thinks that's all we're ever going to need,
that thought is pretty naive.


I don't know who "we" is, but I can tell you that qemu-test is exactly what
*I* need. Consider that I spent a good portion of every single day testing
QEMU with either my own or other people's patches, making that job easier and
more automated is fairly important to me.


"We" is everyone that somehow contributes to QEMU, that is, the QEMU community.
If you're only concerned about what *you* need, then you're on the right track.
If, besides that, you feel it'd be nice to *try to* concentrate our efforts,
then we're all on the same track.


There is no need to have a single tool that meets every possible need.  In fact, 
the Unix tradition is to have separate single purposed tools.


Having two test tools is not a bad thing provided that the overlap isn't 
significant.  We shouldn't be discussing whether it's possible to merge the two 
tools, but rather what the technical benefits of doing so would be.


Since at this point, there is almost no overlap between the two, I don't see any 
actual technical benefit to merging them.  I see benefit to autotest executing 
qemu-test, of course.



I'm sharing it because I suspect that a lot of other developers have a similar
need.


And it may be true that there's room for both test
suites... or that, as busy developers, we're refusing to deal with the added
complexity (qemu alone accounts for a lot) and delaying to fix the fixable. I
believe in the latter.

One example: kvm-autotest has a complex configuration file format with a steep
learning curve, while a test such as qemu-tests/tests/simple-ping.sh would have
to be tweaked if somehow the kernel detects the first ethernet interface as em1
(such as recent Fedora systems do). Simple, but not scalable.


I can tell by this comment that you don't actually understand how qemu-test
works. Please take a look at it before jumping to conclusions about whether it
should or shouldn't be part of kvm-autotest.

Hint: qemu-test always uses the same kernel because it builds it as part of
the test suite. The behavior of a nic test will never change unless
someone explicitly changes the kernel.


I can tell you did not get my point: I'm giving some reasons of why current kvm
autotest is somehow complex, and how qemu-tests gets away and keeps it simple.


You're claiming that "we're refusing to deal with the added complexity (qemu 
alone accounts for a lot) and delaying to fix the fixable".


There is no way that qemu-test would ever need to deal with how Fedora 
configures udev to name ethernet devices without becoming something totally 
different than it is.  So there's no "delaying to fix the fixable" here.


qemu-test makes a simplifying assumption.  By restricting the guest to a fixed 
environment (initramfs w/busybox), things inherently become much, much simpler.


Of course, this is not an adequate assumption to make if it were our only test 
tool but fortunately, we have an existing test suite that does a very good job 
at testing a wide variety of guests :-)




The Python requirement inside the guest is true *if* we want to run regular
autotest tests inside the guest (see autotest/client/virt/tests/autotest.py) and
this accounts for very very little of kvm autotest usage. All other tests
interact with the monitor directly and with the guest via ssh/telnet/serial.


qemu-test does not require any specific hardware to be used in the guest which
lets it test a wider variety of scenarios in QEMU. So you cannot assume there
is ssh/telnet/serial available.


I really thought we could rely on, at least, a serial connection. If not, then
indeed the current kvm autotest approach is not compatible with that test
environment. That is not to say that kvm autotest couldn't incorporate the
qemu-tests approach/functionality.


With the scope that qemu-test has, it cannot assume any hardware is present 
because it wants to test every piece of hardware.



BTW, I just don't like the idea of having lots of functionalities/tests
implemented on two test suites for a s

Re: [Qemu-devel] [PATCH V2 2/3] hw/sd.c: add SD card save/load support

2011-12-28 Thread Peter Maydell
On 28 December 2011 14:02, Mitsyanko Igor  wrote:
> On 12/28/2011 05:26 PM, Peter Maydell wrote:
>> Also, why should we care particularly about the order of
>> fields in SDState? There will be at most a handful of copies
>> of this struct in qemu, costing a handful of bytes in extra
>> padding. ("ensure proper data alignment" is wrong -- the compiler
>> does this for us.) If you feel you must rearrange things, again,
>> please put it in a separate patch so it's easier to read.
>>
> Why not, it wouldn't hurt anyone.

Basically I think this is pointless microoptimisation. If you
haven't measured something and determined that this is a useful
change because it makes things faster or usefully decreases
memory use by some non-trivial amount, then this kind of change
is a poor use of your time and also mine (as reviewer) and of
whoever eventually commits the patch to git.

-- PMM



Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Christoph Hellwig
On Mon, Dec 19, 2011 at 11:13:24AM -0600, Anthony Liguori wrote:
> Hi,
>
> I've published a set of tests I wrote over the weekend on qemu.org.  My 
> motivations were 1) to prevent regressions like the libguestfs one and 2) 
> to have an easier way to do development testing as I work on QEMU Object 
> Model.
>
> Now before sending the obligatory, "What about using KVM autotest" reply, 
> note that this is significantly different than KVM autotest and really 
> occupies a different use-case.

Different question:  Do you want to merge qemu-iotests into it?




Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Avi Kivity
On 12/28/2011 04:27 PM, Anthony Liguori wrote:
>> Maybe I've used the wrong wording. I got the feeling that, besides
>> testing qemu
>> the way you need it, keeping qemu-test simple was really important.
>
>
> Simple is always important.  In the case of qemu-test, there are some
> important trade offs made to keep things simple and fast.  It doesn't
> try to work with arbitrary guests which means the tests need to handle
> only one environment.  The guest is pre-made to have exactly what is
> needed for the tests so there is no setup required.  The guest is as
> small as possible such that the test can run as quickly as possible.

In fact using linux as a guest negates that.  First of all, which linux
version? if it's fixed, you'll eventually miss functionality and need to
migrate.  If it keeps changing, so does your test, and it will keep
breaking.

Using Linux as a guest allows you to do system tests (ping was an
example) but doesn't allow you to do unit tests (test regression where
if this bit in that register was set, but if them bits in thar registers
were clear, then the packet would be encoded with ROT-26 before being
sent out).

I think qemu-test needs to use its own drivers which allow full control
over what you do with the hardware.  Otherwise, a regression that only
shows up in non-Linux guests will not be testable with qemu-test.

> There is no need to have a single tool that meets every possible
> need.  In fact, the Unix tradition is to have separate single purposed
> tools.

Those single purpose tools, if successful, tend to grow more purposes
(cf. qemu), and if unsuccessful, tend to lose purpose.

> Having two test tools it not a bad thing provided that the overlap
> isn't significant.  

This is important, if the boundary isn't clear, then it will grow more
fuzzy in time.

I suggest the following:

 - qemu-test: qemu unit tests
 - kvm-unit-tests: kvm unit tests
 - kvm-autotest: unit test drivers + system tests

> We shouldn't be discussing whether it's possible to merge the two
> tools, but rather what the technical benefits of doing so would be.
>
> Since at this point, there is almost no overlap between the two, I
> don't see any actual technical benefit to merging them.  I see benefit
> to autotest executing qemu-test, of course.

I'd say that running a ping test is a weak version of kvm-autotest's
system tests.  Running a synthetic test that pokes values into memory
and mmio and sees a packet coming out is a unit test (the latter can in
fact be executed without a guest at all, just a table driving calls to
the memory and irq APIs).
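
To make the "table driving calls" idea concrete, here is a minimal sketch. ToyDev, toy_write, TestStep and run_table are all invented for this example and are not QEMU APIs; a real test would drive the actual memory and irq interfaces instead of a toy device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy device invented for this sketch: writing 1 to CTRL while DATA is
 * non-zero raises the interrupt line. */
enum { REG_DATA = 0, REG_CTRL = 4 };

typedef struct {
    uint32_t data;
    int irq_level;
} ToyDev;

static void toy_write(ToyDev *d, uint32_t addr, uint32_t val)
{
    if (addr == REG_DATA) {
        d->data = val;
    } else if (addr == REG_CTRL && (val & 1)) {
        d->irq_level = d->data != 0;
    }
}

/* One row of the test table: a register write and the irq level that is
 * expected once the write has been applied. */
typedef struct {
    uint32_t addr;
    uint32_t val;
    int expect_irq;
} TestStep;

/* Returns the index of the first failing step, or -1 if every step passes. */
static int run_table(ToyDev *d, const TestStep *steps, size_t n)
{
    assert(d != NULL && steps != NULL);
    for (size_t i = 0; i < n; i++) {
        toy_write(d, steps[i].addr, steps[i].val);
        if (d->irq_level != steps[i].expect_irq) {
            return (int)i;
        }
    }
    return -1;
}
```

An edge case that no Linux driver ever triggers then becomes one more row in the table rather than a new guest image.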


> Just putting the code in kvm-autotest.git in a directory doesn't make
> sense to me.  Beyond the lack of a technical reason to do so,
> logistically, it makes it harder for me to ask people to submit test
> cases with a patch series if I can't actually apply the test case when
> I'm applying the patches to qemu.git.
>
> If qemu-test didn't use large submodules (like linux.git), I would
> have made qemu-test part of qemu.git.  As far as I'm concerned,
> qemu-test.git is just an extension to qemu.git.

Why not just put it in qemu.git?

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Avi Kivity
On 12/28/2011 05:01 PM, Avi Kivity wrote:
> I'd say that running a ping test is a weak version of kvm-autotest's
> system tests.  Running a synthetic test that pokes values into memory
> and mmio and sees a packet coming out is a unit test (the latter can in
> fact be executed without a guest at all, just a table driving calls to
> the memory and irq APIs).
>

Consider
  98d23704138e0
  7b4252e83f6f7d
  f7e80adf3cc4
  c16ada980f43
  4abf12f4ea8

(found by looking for 'fix' in the commit log and filtering out the
commits that don't support my case)

how can you reject such patches on the grounds that they're not
accompanied by unit tests? only by making it easy to add tests for
them.  I think it would be hard/impossible to test them with
linux-as-a-guest, since they fix edge cases that linux doesn't invoke. 
But by having our own driver (often just using qmp to poke at memory),
we can easily generate the sequence that triggers the error.

We'd probably need a library to support setting up a pci device's BARs,
but that's easy with qmp/python integration.  You can even poke a small
executable into memory and execute it directly, if you really need guest
cpu interaction.

-- 
error compiling committee.c: too many arguments to function




[Qemu-devel] [PATCH V3 1/5] vmstate: introduce get_bufsize entry in VMStateField

2011-12-28 Thread Mitsyanko Igor
The new get_bufsize field in VMStateField is meant to make it easy to add
save/restore support for dynamically allocated buffers in device state.
In some cases the size of a dynamically allocated buffer is already
present in the device's state structure, but in a form that cannot be
used with the existing VMStateField interface. Currently, we can either
take the size from another variable in the device's state, as the
VMSTATE_VBUFFER_* macros do, or multiply the value kept in a variable by
a constant with the VMSTATE_BUFFER_MULTIPLY macro. If we need to perform
any other operation, we are forced to add an extra variable holding the
size to the device state structure solely so that it can be used in
VMStateDescription. This approach is not very pretty. Adding extra flags
to the VMStateFlags enum for every other possible operation on the size
field seems redundant, and it still wouldn't cover cases where a sequence
of operations is needed to get the size.
With the get_bufsize callback we can calculate the size of a dynamic array
in whichever way we need. The .size_offset field is no longer needed, so
we can remove it from the VMStateField structure to compensate for the
extra memory consumed by get_bufsize. The VMSTATE_VBUFFER* macros are
modified to use the new callback instead of .size_offset. The
VMSTATE_BUFFER_MULTIPLY macro and the VMS_MULTIPLY flag are removed
completely as they are now redundant.

Signed-off-by: Mitsyanko Igor 
---
 hw/g364fb.c|7 ++-
 hw/hw.h|   41 +++--
 hw/m48t59.c|7 ++-
 hw/mac_nvram.c |8 +++-
 hw/onenand.c   |7 ++-
 savevm.c   |   10 ++
 6 files changed, 34 insertions(+), 46 deletions(-)

diff --git a/hw/g364fb.c b/hw/g364fb.c
index 34fb08c..1ab36c2 100644
--- a/hw/g364fb.c
+++ b/hw/g364fb.c
@@ -495,6 +495,11 @@ static int g364fb_post_load(void *opaque, int version_id)
 return 0;
 }
 
+static int g364fb_get_vramsize(void *opaque, int version_id)
+{
+return ((G364State *)opaque)->vram_size;
+}
+
 static const VMStateDescription vmstate_g364fb = {
 .name = "g364fb",
 .version_id = 1,
@@ -502,7 +507,7 @@ static const VMStateDescription vmstate_g364fb = {
 .minimum_version_id_old = 1,
 .post_load = g364fb_post_load,
 .fields = (VMStateField[]) {
-VMSTATE_VBUFFER_UINT32(vram, G364State, 1, NULL, 0, vram_size),
+VMSTATE_VBUFFER(vram, G364State, 1, NULL, 0, g364fb_get_vramsize),
 VMSTATE_BUFFER_UNSAFE(color_palette, G364State, 0, 256 * 3),
 VMSTATE_BUFFER_UNSAFE(cursor_palette, G364State, 0, 9),
 VMSTATE_UINT16_ARRAY(cursor, G364State, 512),
diff --git a/hw/hw.h b/hw/hw.h
index efa04d1..a2a43b6 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -303,7 +303,6 @@ enum VMStateFlags {
 VMS_ARRAY_OF_POINTER = 0x040,
 VMS_VARRAY_UINT16= 0x080,  /* Array with size in uint16_t field */
 VMS_VBUFFER  = 0x100,  /* Buffer with size in int32_t field */
-VMS_MULTIPLY = 0x200,  /* multiply "size" field by field_size */
 VMS_VARRAY_UINT8 = 0x400,  /* Array with size in uint8_t field*/
 VMS_VARRAY_UINT32= 0x800,  /* Array with size in uint32_t field*/
 };
@@ -315,12 +314,12 @@ typedef struct {
 size_t start;
 int num;
 size_t num_offset;
-size_t size_offset;
 const VMStateInfo *info;
 enum VMStateFlags flags;
 const VMStateDescription *vmsd;
 int version_id;
 bool (*field_exists)(void *opaque, int version_id);
+int (*get_bufsize)(void *opaque, int version_id);
 } VMStateField;
 
 typedef struct VMStateSubsection {
@@ -584,34 +583,11 @@ extern const VMStateInfo vmstate_info_unused_buffer;
 .offset   = vmstate_offset_buffer(_state, _field) + _start,  \
 }
 
-#define VMSTATE_BUFFER_MULTIPLY(_field, _state, _version, _test, _start, _field_size, _multiply) { \
+#define VMSTATE_VBUFFER(_field, _state, _version, _test, _start, _get_bufsize) { \
 .name = (stringify(_field)), \
 .version_id   = (_version),  \
 .field_exists = (_test), \
-.size_offset  = vmstate_offset_value(_state, _field_size, uint32_t),\
-.size = (_multiply),  \
-.info = &vmstate_info_buffer,\
-.flags= VMS_VBUFFER|VMS_MULTIPLY,\
-.offset   = offsetof(_state, _field),\
-.start= (_start),\
-}
-
-#define VMSTATE_VBUFFER(_field, _state, _version, _test, _start, _field_size) { \
-.name = (stringify(_field)), \
-.version_id   = (_version),  \
-.field_exists = (_test), \
-.size_offset  = vmstate_offset_value(_state, _field

[Qemu-devel] [PATCH V3 0/5] Improve SD controllers emulation

2011-12-28 Thread Mitsyanko Igor
Changelog
v2->v3:
 - PATCH 2/3 split into smaller patches 2-4/5.
 - SDState structure rearrangement dropped.

v1->v2:
 PATCH 1/3: 
 - .calc_size field replaced with .get_bufsize field in VMStateField;
 - .size_offset removed completely, macros based on it rewritten to use
 new .get_bufsize field.
 PATCH 2/3:
 - all binary variables in SDState now have bool type;
 - SDState structure rearranged to ensure alignment;
 - sd_init(), sd_enable() now receive bool;
 - new version of PATCH 1/3 now used to save wp_groups array;
 PATCH 3/3:
 - DMA transfers modified and now operate with data blocks of
 BLKSIZE only, like real hardware does;
 - reset procedure optimized;
 - new version of PATCH 1/3 now used to save fifo_buffer.

The first patch of this set modifies the existing VMStateField interface
to ease implementing save/restore for a device's dynamically allocated
buffers. This modification is used later in the second patch to
implement the SD card's VMStateDescription structure.
The third patch adds an implementation of a new device: an SD host controller
fully compliant with "SD host controller specification version 2.00". It also
uses the first patch's modifications.
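
As a rough illustration of the callback idea (a standalone sketch, not code
from the patches), the following models a field descriptor whose buffer size
is computed at save time. DemoField, DemoCard, DEMO_GROUP_SHIFT and the
demo_* functions are hypothetical stand-ins; only the get_bufsize signature
mirrors the one added to VMStateField.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for VMStateField: only what this sketch needs. */
typedef struct DemoField {
    const char *name;
    size_t offset;  /* offset of the buffer pointer inside the state struct */
    int (*get_bufsize)(void *opaque, int version_id);
} DemoField;

/* Hypothetical device state with a dynamically sized buffer. */
typedef struct DemoCard {
    uint64_t size;       /* card capacity in bytes */
    uint8_t *wp_groups;  /* one byte per write-protect group */
} DemoCard;

#define DEMO_GROUP_SHIFT 17  /* assumption for the sketch: 128 KiB per group */

/* Size is derived from other state, so no extra "length" member is needed. */
static int demo_get_wpgroups_size(void *opaque, int version_id)
{
    DemoCard *card = opaque;
    (void)version_id;
    return (int)(card->size >> DEMO_GROUP_SHIFT);
}

/* Copy the buffer described by 'field' into 'out'.
 * Returns the number of bytes written, or -1 if 'out' is too small. */
static int demo_save_buffer(const DemoField *field, void *opaque,
                            uint8_t *out, size_t out_len)
{
    int size;
    uint8_t *src;

    assert(field != NULL && field->get_bufsize != NULL);
    size = field->get_bufsize(opaque, 1);
    src = *(uint8_t **)((char *)opaque + field->offset);
    if (size < 0 || (size_t)size > out_len) {
        return -1;
    }
    memcpy(out, src, (size_t)size);
    return size;
}
```

The generic save code only ever calls the callback, so any arithmetic needed
to obtain the size stays in the device that owns the buffer.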

Mitsyanko Igor (5):
  vmstate: introduce get_bufsize entry in VMStateField
  hw/sd.c: add SD card save/load support
  hw/sd.c: convert wp_groups, expecting_acmd and enable to bool
  hw/sd.c: convert wp_switch and spi to bool
  hw: Introduce spec. ver. 2.00 compliant SD host controller

 Makefile.target |    1 +
 hw/g364fb.c     |    7 +-
 hw/hw.h         |   41 +--
 hw/m48t59.c     |    7 +-
 hw/mac_nvram.c  |    8 +-
 hw/onenand.c    |    7 +-
 hw/sd.c         |  126 +++--
 hw/sd.h         |    4 +-
 hw/sdhc_ver2.c  | 1569 +++
 hw/sdhc_ver2.h  |  327 
 savevm.c        |   10 +-
 11 files changed, 2018 insertions(+), 89 deletions(-)
 create mode 100644 hw/sdhc_ver2.c
 create mode 100644 hw/sdhc_ver2.h

-- 
1.7.4.1




[Qemu-devel] [PATCH V3 2/5] hw/sd.c: add SD card save/load support

2011-12-28 Thread Mitsyanko Igor
We couldn't properly implement save/restore of SD host controller state
without a VMStateDescription for the SD card's own state. This patch
updates SD card emulation to support save/load of the card's state.

Signed-off-by: Mitsyanko Igor 
---
 hw/sd.c |  100 +-
 1 files changed, 72 insertions(+), 28 deletions(-)

diff --git a/hw/sd.c b/hw/sd.c
index 07eb263..3e5628e 100644
--- a/hw/sd.c
+++ b/hw/sd.c
@@ -54,24 +54,28 @@ typedef enum {
 sd_illegal = -2,
 } sd_rsp_type_t;
 
+enum {
+sd_inactive,
+sd_card_identification_mode,
+sd_data_transfer_mode,
+};
+
+enum {
+sd_inactive_state = -1,
+sd_idle_state = 0,
+sd_ready_state,
+sd_identification_state,
+sd_standby_state,
+sd_transfer_state,
+sd_sendingdata_state,
+sd_receivingdata_state,
+sd_programming_state,
+sd_disconnect_state,
+} state;
+
 struct SDState {
-enum {
-sd_inactive,
-sd_card_identification_mode,
-sd_data_transfer_mode,
-} mode;
-enum {
-sd_inactive_state = -1,
-sd_idle_state = 0,
-sd_ready_state,
-sd_identification_state,
-sd_standby_state,
-sd_transfer_state,
-sd_sendingdata_state,
-sd_receivingdata_state,
-sd_programming_state,
-sd_disconnect_state,
-} state;
+uint32_t mode;
+int32_t state;
 uint32_t ocr;
 uint8_t scr[8];
 uint8_t cid[16];
@@ -81,22 +85,22 @@ struct SDState {
 uint8_t sd_status[64];
 uint32_t vhs;
 int wp_switch;
-int *wp_groups;
+bool *wp_groups;
 uint64_t size;
-int blk_len;
+uint32_t blk_len;
 uint32_t erase_start;
 uint32_t erase_end;
 uint8_t pwd[16];
-int pwd_len;
-int function_group[6];
+uint32_t pwd_len;
+uint8_t function_group[6];
 
 int spi;
-int current_cmd;
+uint8_t current_cmd;
 /* True if we will handle the next command as an ACMD. Note that this does
  * *not* track the APP_CMD status bit!
  */
-int expecting_acmd;
-int blk_written;
+bool expecting_acmd;
+uint32_t blk_written;
 uint64_t data_start;
 uint32_t data_offset;
 uint8_t data[512];
@@ -105,7 +109,7 @@ struct SDState {
 BlockDriverState *bdrv;
 uint8_t *buf;
 
-int enable;
+bool enable;
 };
 
 static void sd_set_mode(SDState *sd)
@@ -415,8 +419,8 @@ static void sd_reset(SDState *sd, BlockDriverState *bdrv)
 if (sd->wp_groups)
 g_free(sd->wp_groups);
 sd->wp_switch = bdrv ? bdrv_is_read_only(bdrv) : 0;
-sd->wp_groups = (int *) g_malloc0(sizeof(int) * sect);
-memset(sd->function_group, 0, sizeof(int) * 6);
+sd->wp_groups = g_new0(bool, sect);
+memset(sd->function_group, 0, sizeof(sd->function_group));
 sd->erase_start = 0;
 sd->erase_end = 0;
 sd->size = size;
@@ -440,6 +444,45 @@ static const BlockDevOps sd_block_ops = {
 .change_media_cb = sd_cardchange,
 };
 
+static int sd_get_wpgroups_size(void *opaque, int version_id)
+{
+SDState *sd = (SDState *)opaque;
+return sizeof(bool) * (sd->size >> (HWBLOCK_SHIFT + SECTOR_SHIFT +
+WPGROUP_SHIFT));
+}
+
+static const VMStateDescription sd_vmstate = {
+.name = "sd_card",
+.version_id = 1,
+.minimum_version_id = 1,
+.fields  = (VMStateField[]) {
+VMSTATE_UINT32(mode, SDState),
+VMSTATE_INT32(state, SDState),
+VMSTATE_UINT8_ARRAY(cid, SDState, 16),
+VMSTATE_UINT8_ARRAY(csd, SDState, 16),
+VMSTATE_UINT16(rca, SDState),
+VMSTATE_UINT32(card_status, SDState),
+VMSTATE_PARTIAL_BUFFER(sd_status, SDState, 1),
+VMSTATE_UINT32(vhs, SDState),
+VMSTATE_VBUFFER(wp_groups, SDState, 1, NULL, 0, sd_get_wpgroups_size),
+VMSTATE_UINT32(blk_len, SDState),
+VMSTATE_UINT32(erase_start, SDState),
+VMSTATE_UINT32(erase_end, SDState),
+VMSTATE_UINT8_ARRAY(pwd, SDState, 16),
+VMSTATE_UINT32(pwd_len, SDState),
+VMSTATE_UINT8_ARRAY(function_group, SDState, 6),
+VMSTATE_UINT8(current_cmd, SDState),
+VMSTATE_BOOL(expecting_acmd, SDState),
+VMSTATE_UINT32(blk_written, SDState),
+VMSTATE_UINT64(data_start, SDState),
+VMSTATE_UINT32(data_offset, SDState),
+VMSTATE_UINT8_ARRAY(data, SDState, 512),
+VMSTATE_BUFFER_UNSAFE(buf, SDState, 1, 512),
+VMSTATE_BOOL(enable, SDState),
+VMSTATE_END_OF_LIST()
+}
+};
+
 /* We do not model the chip select pin, so allow the board to select
whether card should be in SSI or MMC/SD mode.  It is also up to the
board to ensure that ssi transfers only occur when the chip select
@@ -457,6 +500,7 @@ SDState *sd_init(BlockDriverState *bs, int is_spi)
 bdrv_attach_dev_nofail(sd->bdrv, sd);
 bdrv_set_dev_ops(sd->bdrv, &sd_block_ops, sd);
 }
+vmstate_register(NULL, -1, &sd_vmstate, sd);
 return sd;
 }
 
@@ -560,7 +6

[Qemu-devel] [PATCH V3 3/5] hw/sd.c: convert wp_groups, expecting_acmd and enable to bool

2011-12-28 Thread Mitsyanko Igor
SDState variables wp_groups, expecting_acmd and enable are of bool data type but
are currently treated as int variables by the rest of the code. This patch
updates sd_enable() and sd_wp_addr() so that they explicitly work with bool, and
replaces 0 and 1 with 'false' and 'true' where required.

Signed-off-by: Mitsyanko Igor 
---
 hw/sd.c |   18 +-
 hw/sd.h |    2 +-
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/hw/sd.c b/hw/sd.c
index 3e5628e..955f4fb 100644
--- a/hw/sd.c
+++ b/hw/sd.c
@@ -426,7 +426,7 @@ static void sd_reset(SDState *sd, BlockDriverState *bdrv)
 sd->size = size;
 sd->blk_len = 0x200;
 sd->pwd_len = 0;
-sd->expecting_acmd = 0;
+sd->expecting_acmd = false;
 }
 
 static void sd_cardchange(void *opaque, bool load)
@@ -494,7 +494,7 @@ SDState *sd_init(BlockDriverState *bs, int is_spi)
 sd = (SDState *) g_malloc0(sizeof(SDState));
 sd->buf = qemu_blockalign(bs, 512);
 sd->spi = is_spi;
-sd->enable = 1;
+sd->enable = true;
 sd_reset(sd, bs);
 if (sd->bdrv) {
 bdrv_attach_dev_nofail(sd->bdrv, sd);
@@ -578,7 +578,7 @@ static void sd_function_switch(SDState *sd, uint32_t arg)
 sd->data[66] = crc & 0xff;
 }
 
-static inline int sd_wp_addr(SDState *sd, uint32_t addr)
+static inline bool sd_wp_addr(SDState *sd, uint32_t addr)
 {
 return sd->wp_groups[addr >>
 (HWBLOCK_SHIFT + SECTOR_SHIFT + WPGROUP_SHIFT)];
@@ -1052,7 +1052,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 
 sd->state = sd_programming_state;
 sd->wp_groups[addr >> (HWBLOCK_SHIFT +
-SECTOR_SHIFT + WPGROUP_SHIFT)] = 1;
+SECTOR_SHIFT + WPGROUP_SHIFT)] = true;
 /* Bzzztt  Operation complete.  */
 sd->state = sd_transfer_state;
 return sd_r1b;
@@ -1072,7 +1072,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 
 sd->state = sd_programming_state;
 sd->wp_groups[addr >> (HWBLOCK_SHIFT +
-SECTOR_SHIFT + WPGROUP_SHIFT)] = 0;
+SECTOR_SHIFT + WPGROUP_SHIFT)] = false;
 /* Bzzztt  Operation complete.  */
 sd->state = sd_transfer_state;
 return sd_r1b;
@@ -1169,7 +1169,7 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 if (sd->rca != rca)
 return sd_r0;
 
-sd->expecting_acmd = 1;
+sd->expecting_acmd = true;
 sd->card_status |= APP_CMD;
 return sd_r1;
 
@@ -1351,7 +1351,7 @@ int sd_do_command(SDState *sd, SDRequest *req,
 if (sd->card_status & CARD_IS_LOCKED) {
 if (!cmd_valid_while_locked(sd, req)) {
 sd->card_status |= ILLEGAL_COMMAND;
-sd->expecting_acmd = 0;
+sd->expecting_acmd = false;
 fprintf(stderr, "SD: Card is locked\n");
 rtype = sd_illegal;
 goto send_response;
@@ -1362,7 +1362,7 @@ int sd_do_command(SDState *sd, SDRequest *req,
 sd_set_mode(sd);
 
 if (sd->expecting_acmd) {
-sd->expecting_acmd = 0;
+sd->expecting_acmd = false;
 rtype = sd_app_command(sd, *req);
 } else {
 rtype = sd_normal_command(sd, *req);
@@ -1748,7 +1748,7 @@ int sd_data_ready(SDState *sd)
 return sd->state == sd_sendingdata_state;
 }
 
-void sd_enable(SDState *sd, int enable)
+void sd_enable(SDState *sd, bool enable)
 {
 sd->enable = enable;
 }
diff --git a/hw/sd.h b/hw/sd.h
index ac4b7c4..f446783 100644
--- a/hw/sd.h
+++ b/hw/sd.h
@@ -74,6 +74,6 @@ void sd_write_data(SDState *sd, uint8_t value);
 uint8_t sd_read_data(SDState *sd);
 void sd_set_cb(SDState *sd, qemu_irq readonly, qemu_irq insert);
 int sd_data_ready(SDState *sd);
-void sd_enable(SDState *sd, int enable);
+void sd_enable(SDState *sd, bool enable);
 
 #endif /* __hw_sd_h */
-- 
1.7.4.1




[Qemu-devel] [PATCH V3 4/5] hw/sd.c: convert wp_switch and spi to bool

2011-12-28 Thread Mitsyanko Igor
Currently some binary variables in SDState are represented as bool while
others are represented as int. This patch converts the wp_switch and spi
variables to bool and modifies the rest of the code to treat these variables as
bool instead of int.

Signed-off-by: Mitsyanko Igor 
---
 hw/sd.c |    8
 hw/sd.h |    2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/hw/sd.c b/hw/sd.c
index 955f4fb..147a7e0 100644
--- a/hw/sd.c
+++ b/hw/sd.c
@@ -84,7 +84,7 @@ struct SDState {
 uint32_t card_status;
 uint8_t sd_status[64];
 uint32_t vhs;
-int wp_switch;
+bool wp_switch;
 bool *wp_groups;
 uint64_t size;
 uint32_t blk_len;
@@ -94,7 +94,7 @@ struct SDState {
 uint32_t pwd_len;
 uint8_t function_group[6];
 
-int spi;
+bool spi;
 uint8_t current_cmd;
 /* True if we will handle the next command as an ACMD. Note that this does
  * *not* track the APP_CMD status bit!
@@ -418,7 +418,7 @@ static void sd_reset(SDState *sd, BlockDriverState *bdrv)
 
 if (sd->wp_groups)
 g_free(sd->wp_groups);
-sd->wp_switch = bdrv ? bdrv_is_read_only(bdrv) : 0;
+sd->wp_switch = bdrv ? bdrv_is_read_only(bdrv) : false;
 sd->wp_groups = g_new0(bool, sect);
 memset(sd->function_group, 0, sizeof(sd->function_group));
 sd->erase_start = 0;
@@ -487,7 +487,7 @@ static const VMStateDescription sd_vmstate = {
whether card should be in SSI or MMC/SD mode.  It is also up to the
board to ensure that ssi transfers only occur when the chip select
is asserted.  */
-SDState *sd_init(BlockDriverState *bs, int is_spi)
+SDState *sd_init(BlockDriverState *bs, bool is_spi)
 {
 SDState *sd;
 
diff --git a/hw/sd.h b/hw/sd.h
index f446783..d25342f 100644
--- a/hw/sd.h
+++ b/hw/sd.h
@@ -67,7 +67,7 @@ typedef struct {
 
 typedef struct SDState SDState;
 
-SDState *sd_init(BlockDriverState *bs, int is_spi);
+SDState *sd_init(BlockDriverState *bs, bool is_spi);
 int sd_do_command(SDState *sd, SDRequest *req,
   uint8_t *response);
 void sd_write_data(SDState *sd, uint8_t value);
-- 
1.7.4.1




Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Anthony Liguori

On 12/28/2011 08:49 AM, Christoph Hellwig wrote:

On Mon, Dec 19, 2011 at 11:13:24AM -0600, Anthony Liguori wrote:

Hi,

I've published a set of tests I wrote over the weekend on qemu.org.  My
motivations were 1) to prevent regressions like the libguestfs one and 2)
to have an easier way to do development testing as I work on QEMU Object
Model.

Now before sending the obligatory, "What about using KVM autotest" reply,
note that this is significantly different than KVM autotest and really
occupies a different use-case.


Different question:  Do you want to merge qemu-iotests into it?


I'd rather just merge it into qemu.git since qemu-io already lives there.

Regards,

Anthony Liguori









Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Anthony Liguori

On 12/28/2011 09:01 AM, Avi Kivity wrote:

On 12/28/2011 04:27 PM, Anthony Liguori wrote:

Maybe I've used the wrong wording. I got the feeling that, besides
testing qemu
the way you need it, keeping qemu-test simple was really important.



Simple is always important.  In the case of qemu-test, there are some
important trade offs made to keep things simple and fast.  It doesn't
try to work with arbitrary guests which means the tests need to handle
only one environment.  The guest is pre-made to have exactly what is
needed for the tests so there is no setup required.  The guest is as
small as possible such that the test can run as quickly as possible.


In fact using linux as a guest negates that.  First of all, which linux
version? if it's fixed, you'll eventually miss functionality and need to
migrate.  If it keeps changing, so does your test, and it will keep
breaking.


The kernel is a git submodule so it's a very specific version.  Yes, we may need 
to bump the version down the road and obviously, if we have to change any tests 
in the process, we can.




Using Linux as a guest allows you to do system tests (ping was an
example) but doesn't allow you to do unit tests (test regression where
if this bit in that register was set, but if them bits in thar registers
were clear, then the packet would be encoded with ROT-26 before being
sent out).

I think qemu-test needs to use its own drivers which allow full control
over what you do with the hardware.  Otherwise, a regression that only
shows up in non-Linux guests will not be testable with qemu-test.


I think you're advocating for qtest.  This is another important part of my 
testing strategy.  I haven't received a lot of input on that RFC...


http://mid.gmane.org/1322765012-3164-1-git-send-email-aligu...@us.ibm.com

But there's certain things that I still consider to be unit testing (like basic 
networking tests) that I don't want to have to write with qtest.  I'm not up for 
writing a TCP/IP stack in Python...



There is no need to have a single tool that meets every possible
need.  In fact, the Unix tradition is to have separate single purposed
tools.


Those single purpose tools, if successful, tend to grow more purposes
(cf. qemu), and if unsuccessful, tend to lose purpose.


Having two test tools is not a bad thing provided that the overlap
isn't significant.


This is important, if the boundary isn't clear, then it will grow more
fuzzy in time.

I suggest the following:

  - qemu-test: qemu unit tests
  - kvm-unit-tests: kvm unit tests
  - kvm-autotest: unit test drivers + system tests


I would counter with:

- gtester unit tests (test-visitor, test-qobject, etc., qemu-iotest)
- qtest: low level, single device functional tests
- kvm-unit-tests: low level, instruction-set level functional tests
- qemu-test: higher level functional/coverage tests (multiple device
 interaction)
- kvm-autotest: unit/functional test drivers + acceptance testing

Note that the line I'm drawing is acceptance vs. functional testing, not unit 
vs. integration testing.  Technically, our unit tests are things like 
test-visitor.  Everything else is an integration test.


But the separation between kvm-autotest is acceptance testing vs. functional 
testing.


Acceptance testing is, "does Windows boot", "can I create three virtio-serial 
devices".


Obviously, part of acceptance testing is, "does this set of functional tests 
pass".


We shouldn't be discussing whether it's possible to merge the two
tools, but rather what the technical benefits of doing so would be.

Since at this point, there is almost no overlap between the two, I
don't see any actual technical benefit to merging them.  I see benefit
to autotest executing qemu-test, of course.


I'd say that running a ping test is a weak version of kvm-autotest's
system tests.


Consider the Linux kernel to be our library of functionality to write our unit 
tests.  I don't want to write a TCP/IP stack.  We aren't just grabbing a random 
distro kernel.  We're building one from scratch configured in a specific way.


In theory, we could even host the kernel source on git.qemu.org and fork it to 
add more interesting things in the kernel (although I'd prefer not to do this, 
obviously).



 Running a synthetic test that pokes values into memory
and mmio and sees a packet coming out is a unit test (the latter can in
fact be executed without a guest at all, just a table driving calls to
the memory and irq APIs).



Just putting the code in kvm-autotest.git in a directory doesn't make
sense to me.  Beyond the lack of a technical reason to do so,
logistically, it makes it harder for me to ask people to submit test
cases with a patch series if I can't actually apply the test case when
I'm applying the patches to qemu.git.

If qemu-test didn't use large submodules (like linux.git), I would
have made qemu-test part of qemu.git.  As far as I'm concerned,
qemu-test.git is just an extension to qemu.git.


Why not just pu

Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Anthony Liguori

On 12/28/2011 09:28 AM, Avi Kivity wrote:

On 12/28/2011 05:01 PM, Avi Kivity wrote:

I'd say that running a ping test is a weak version of kvm-autotest's
system tests.  Running a synthetic test that pokes values into memory
and mmio and sees a packet coming out is a unit test (the latter can in
fact be executed without a guest at all, just a table driving calls to
the memory and irq APIs).



Consider
   98d23704138e0
   7b4252e83f6f7d
   f7e80adf3cc4
   c16ada980f43
   4abf12f4ea8

(found by looking for 'fix' in the commit log and filtering out the
commits that don't support my case)

how can you reject such patches on the grounds that they're not
accompanied by unit tests?


That's why I've also proposed qtest.  But having written quite a few qtest unit 
tests by now, you hit the limits of this type of testing pretty quickly.



only by making it easy to add tests for
them.  I think it would be hard/impossible to test them with
linux-as-a-guest, since they fix edge cases that linux doesn't invoke.
But by having our own driver (often just using qmp to poke at memory),
we can easily generate the sequence that triggers the error.

We'd probably need a library to support setting up a pci device's BARs,
but that's easy with qmp/python integration.  You can even poke a small
executable into memory and execute it directly, if you really need guest
cpu interaction.


Please review the qtest series.  I think it offers a pretty good approach to 
writing this style of test.  But as I mentioned, you hit the limits pretty quickly.


Regards,

Anthony Liguori








Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Xin Tong
My main concern here is not how timely the interrupts can be handled;
I am more interested in reducing the number of TB enters/exits due to
interrupts. Returning to the qemu main loop requires saving and restoring
register contexts, which is expensive. What I am thinking is: can
we check and handle interrupts only every few TBs executed? The
drawback is that I do not know how many TBs would be a good number
such that the interrupts do not get delayed too much.
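
To make the trade-off concrete, here is a toy sketch of the polling idea. It
is not QEMU code: DemoCpu, demo_exec_tb, demo_handle_irq and demo_run are
made-up names, and a "TB" here is just a counter increment. It only shows
that the number of checks drops to total/interval while a pending interrupt
may wait up to 'interval' TBs.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct DemoCpu {
    bool irq_pending;       /* would be set asynchronously by a device model */
    unsigned irqs_handled;
    unsigned tbs_executed;
} DemoCpu;

/* Stand-in for executing one translated block. */
static void demo_exec_tb(DemoCpu *cpu)
{
    cpu->tbs_executed++;
}

/* Stand-in for leaving generated code and servicing the interrupt. */
static void demo_handle_irq(DemoCpu *cpu)
{
    cpu->irq_pending = false;
    cpu->irqs_handled++;
}

/* Run 'total' TBs, polling for interrupts once every 'interval' TBs.
 * Returns how many polls were performed. */
static unsigned demo_run(DemoCpu *cpu, unsigned total, unsigned interval)
{
    unsigned polls = 0;

    assert(interval > 0);
    for (unsigned i = 0; i < total; i++) {
        demo_exec_tb(cpu);
        if ((i + 1) % interval == 0) {
            polls++;
            if (cpu->irq_pending) {
                demo_handle_irq(cpu);
            }
        }
    }
    return polls;
}
```

With interval = 1 this degenerates to checking after every TB, so the
interval is the single knob that trades interrupt latency against the
number of checks.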

Thanks


On Wed, Dec 28, 2011 at 5:04 AM, Avi Kivity  wrote:
> On 12/28/2011 01:40 PM, Peter Maydell wrote:
>> On 28 December 2011 10:42, Avi Kivity  wrote:
>> > It's possible to check for an interrupt before every instruction,
>> > without any overhead:
>> >
>> > - when a signal arrives, check the instruction pointer. If it points
>> > outside tcg code, set a flag and return.
>> > - consult a table indexed by the instruction pointer, that gives the
>> > number of bytes to the next guest instruction boundary
>> > - if nonzero, set a breakpoint at that boundary, and resume
>> > - remove the breakpoint (if set)
>> > - adjust the TB to return on the current instruction pointer
>> > - return
>>
>> This assumes you have hardware breakpoints on your host, so
>> it's not portable.
>
> You could also use software breakpoints.  Or just temporarily replace
> the host instruction on the next guest instruction boundary with a return.
>
>> (You also need to add a check-and-handle-flag for every return
>> from a helper function to TCG code,
>
> ah yes - didn't consider that.
>
> You could put all helpers in their own section, and do something around
> that - but that assumes no callouts from helpers to the standard library.
>
>> and of course you need to
>> actually create the instruction-boundary table.
>
> This should be well amortized.
>
>> These are both
>> overheads.)
>
> --
> error compiling committee.c: too many arguments to function
>



Re: [Qemu-devel] [PATCH 1/5] vfio: Introduce documentation for VFIO driver

2011-12-28 Thread Ronen Hod

On 12/21/2011 11:42 PM, Alex Williamson wrote:

Including rationale for design, example usage and API description.

Signed-off-by: Alex Williamson
---

  Documentation/vfio.txt |  352 
  1 files changed, 352 insertions(+), 0 deletions(-)
  create mode 100644 Documentation/vfio.txt

diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
new file mode 100644
index 000..09a5a5b
--- /dev/null
+++ b/Documentation/vfio.txt
@@ -0,0 +1,352 @@
+VFIO - "Virtual Function I/O"[1]
+---
+Many modern systems now provide DMA and interrupt remapping facilities
+to help ensure I/O devices behave within the boundaries they've been
+allotted.  This includes x86 hardware with AMD-Vi and Intel VT-d,
+POWER systems with Partitionable Endpoints (PEs) and embedded PowerPC
+systems such as Freescale PAMU.  The VFIO driver is an IOMMU/device
+agnostic framework for exposing direct device access to userspace, in
+a secure, IOMMU protected environment.  In other words, this allows
+safe[2], non-privileged, userspace drivers.
+
+Why do we want that?  Virtual machines often make use of direct device
+access ("device assignment") when configured for the highest possible
+I/O performance.  From a device and host perspective, this simply
+turns the VM into a userspace driver, with the benefits of
+significantly reduced latency, higher bandwidth, and direct use of
+bare-metal device drivers[3].
+
+Some applications, particularly in the high performance computing
+field, also benefit from low-overhead, direct device access from
+userspace.  Examples include network adapters (often non-TCP/IP based)
+and compute accelerators.  Prior to VFIO, these drivers had to either
+go through the full development cycle to become proper upstream
+driver, be maintained out of tree, or make use of the UIO framework,
+which has no notion of IOMMU protection, limited interrupt support,
+and requires root privileges to access things like PCI configuration
+space.
+
+The VFIO driver framework intends to unify these, replacing both the
+KVM PCI specific device assignment code as well as provide a more
+secure, more featureful userspace driver environment than UIO.
+
+Groups, Devices, and IOMMUs
+---
+
+Userspace drivers are primarily concerned with manipulating individual
+devices and setting up mappings in the IOMMU for those devices.
+Unfortunately, the IOMMU doesn't always have the granularity to track
+mappings for an individual device.  Sometimes this is a topology
+barrier, such as a PCIe-to-PCI bridge interposing the device and
+IOMMU, other times this is an IOMMU limitation.  In any case, the
+reality is that devices are not always independent with respect to the
+IOMMU.  Translations setup for one device can be used by another
+device in these scenarios.
+
+The IOMMU API exposes these relationships by identifying an "IOMMU
+group" for these dependent devices.  Devices on the same bus with the
+same IOMMU group (or just "group" for this document) are not isolated
+from each other with respect to DMA mappings.  For userspace usage,
+this logically means that instead of being able to grant ownership of
+an individual device, we must grant ownership of a group, which may
+contain one or more devices.
+
+These groups therefore become a fundamental component of VFIO and the
+working unit we use for exposing devices and granting permissions to
+userspace.  In addition, VFIO makes efforts to ensure the integrity of
+the group for user access.  This includes ensuring that all devices
+within the group are controlled by VFIO (vs native host drivers)
+before allowing a user to access any member of the group or the IOMMU
+mappings, as well as maintaining the group viability as devices are
+dynamically added or removed from the system.
+
+To access a device through VFIO, a user must open a character device
+for the group that the device belongs to and then issue an ioctl to
+retrieve a file descriptor for the individual device.  This ensures
+that the user has permissions to the group (file based access to the
+/dev entry) and allows a check point at which VFIO can deny access to
+the device if the group is not viable (all devices within the group
+controlled by VFIO).  A file descriptor for the IOMMU is obtained in the
+same fashion.
+
+VFIO defines a standard set of APIs for access to devices and a
+modular interface for adding new, bus-specific VFIO device drivers.
+We call these "VFIO bus drivers".  The vfio-pci module is an example
+of a bus driver for exposing PCI devices.  When the bus driver module
+is loaded it enumerates all of the devices for its bus, registering
+each device with the vfio core along with a set of callbacks.  For
+buses that support hotplug, the bus driver also adds itself to the
+notification chain for such events.  The callbacks reg

Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Avi Kivity
On 12/28/2011 06:42 PM, Anthony Liguori wrote:
>> In fact using linux as a guest negates that.  First of all, which linux
>> version? if it's fixed, you'll eventually miss functionality and need to
>> migrate.  If it keeps changing, so does your test, and it will keep
>> breaking.
>
>
> The kernel is a git submodule so it's a very specific version.  Yes,
> we may need to bump the version down the road and obviously, if we
> have to change any tests in the process, we can.

Having a full linux source as part of the build process detracts
somewhat from the advantages here.

>
>>
>> Using Linux as a guest allows you to do system tests (ping was an
>> example) but doesn't allow you to do unit tests (test regression where
>> if this bit in that register was set, but if them bits in thar registers
>> were clear, then the packet would be encoded with ROT-26 before being
>> sent out).
>>
>> I think qemu-test needs to use its own drivers which allow full control
>> over what you do with the hardware.  Otherwise, a regression that only
>> shows up in non-Linux guests will not be testable with qemu-test.
>
> I think you're advocating for qtest.  This is another important part
> of my testing strategy.  I haven't received a lot of input on that RFC...
>
> http://mid.gmane.org/1322765012-3164-1-git-send-email-aligu...@us.ibm.com
>
> But there's certain things that I still consider to be unit testing
> (like basic networking tests) that I don't want to have to write with
> qtest.  I'm not up for writing a TCP/IP stack in Python...

A ping test is not a unit test.

The ping test is already covered by kvm-autotest; just set up a config
that runs just that; after the initial run you'll have a guest installed
so it'll be quick.  If we have a DSL guest it'll even be very quick.

To test the various NIC emulations, you don't need a full TCP stack,
just like you didn't need to write an NTP implementation for qtest's rtc
test.  Instead you just poke values until it sends out a packet.  If you
want to test virtio-net with both direct and indirect buffers, you can
only do that with a unit test, you can't do it using a full linux guest
since it has its own ideas of when to use indirects and when to avoid
them (for example, it may choose to always avoid them).
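
A minimal sketch of what such a table-driven test could look like, assuming
a toy device model rather than any real QEMU NIC: the DemoNic registers,
DEMO_CTRL_SEND bit and demo_* functions below are invented for illustration.
The test is just a table of register writes plus a check on the number of
packets that came out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Register layout of the toy NIC (hypothetical). */
enum { DEMO_REG_DATA = 0, DEMO_REG_LEN = 1, DEMO_REG_CTRL = 2 };
#define DEMO_CTRL_SEND 0x1u

typedef struct DemoNic {
    uint32_t data;
    uint32_t len;
    unsigned packets_sent;
    uint32_t last_len;
} DemoNic;

/* MMIO write handler of the toy device. */
static void demo_nic_write(DemoNic *nic, unsigned reg, uint32_t val)
{
    switch (reg) {
    case DEMO_REG_DATA:
        nic->data = val;
        break;
    case DEMO_REG_LEN:
        nic->len = val;
        break;
    case DEMO_REG_CTRL:
        /* A send request with zero length is ignored. */
        if ((val & DEMO_CTRL_SEND) && nic->len > 0) {
            nic->packets_sent++;
            nic->last_len = nic->len;
        }
        break;
    default:
        break;
    }
}

/* One row of the test table: which register to poke and with what value. */
typedef struct DemoStep {
    unsigned reg;
    uint32_t val;
} DemoStep;

/* Apply every row of the table; returns the number of packets sent so far. */
static unsigned demo_run_table(DemoNic *nic, const DemoStep *steps, size_t n)
{
    assert(nic != NULL && steps != NULL);
    for (size_t i = 0; i < n; i++) {
        demo_nic_write(nic, steps[i].reg, steps[i].val);
    }
    return nic->packets_sent;
}
```

The point is that the sequence of writes is entirely under the test's
control, including orderings a Linux driver would never issue (here, a send
request before any length is programmed).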

>> I suggest the following:
>>
>>   - qemu-test: qemu unit tests
>>   - kvm-unit-tests: kvm unit tests
>>   - kvm-autotest: unit test drivers + system tests
>
>
> I would counter with:
>
> - gtester unit tests (test-visitor, test-qobject, etc., qemu-iotest)
> - qtest: low level, single device functional tests
> - kvm-unit-tests: low level, instruction-set level functional tests

Not really.  kvm-unit-tests tests things specific to kvm.

> - qemu-test: higher level functional/coverage tests (multiple device
>  interaction)
> - kvm-autotest: unit/functional test drivers + acceptance testing
>
> Note that the line I'm drawing is acceptance vs. functional testing,
> not unit vs. integration testing.  Technically, our unit tests are
> things like test-visitor.  Everything else is an integration test.
>
> But the separation between kvm-autotest is acceptance testing vs.
> functional testing.
>
> Acceptance testing is, "does Windows boot", "can I create three
> virtio-serial devices".
>
> Obviously, part of acceptance testing is, "does this set of functional
> tests pass".

Seems like a very blurry line.  Especially as the functional test is
weaker than either qtest or kvm-autotest.  I now have to agree with the
others that it's duplicate functionality.  Does it really matter whether
you're creating an image by compiling Linux and assembling an
initramfs, or by downloading Fedora.iso and installing it?  It's testing
exactly the same thing, guest boot and functionality.

Would you add live migration testing to qemu-test?  If yes, you're
duplicating some more.  If not, you're not doing functional or coverage
tests for that functionality.

>
>>> We shouldn't be discussing whether it's possible to merge the two
>>> tools, but rather what the technical benefits of doing so would be.
>>>
>>> Since at this point, there is almost no overlap between the two, I
>>> don't see any actual technical benefit to merging them.  I see benefit
>>> to autotest executing qemu-test, of course.
>>
>> I'd say that running a ping test is a weak version of kvm-autotest's
>> system tests.
>
> Consider the Linux kernel to be our library of functionality to write
> our unit tests.  

I was about to say the same thing, but with a negative implication. 
Using Linux restricts your tests to what Linux does with the devices.

> I don't want to write a TCP/IP stack.  We aren't just grabbing a
> random distro kernel.  We're building one from scratch configured in a
> specific way.

How does that help?

> In theory, we could even host the kernel source on git.qemu.org and
> fork it to add more interesting things in the kernel (although I'd
> prefer not to do this, obviously).

That way lies madness, thou

Re: [Qemu-devel] [ANNOUNCE] qemu-test: a set of tests scripts for QEMU

2011-12-28 Thread Avi Kivity
On 12/28/2011 06:44 PM, Anthony Liguori wrote:
> On 12/28/2011 09:28 AM, Avi Kivity wrote:
>> On 12/28/2011 05:01 PM, Avi Kivity wrote:
>>> I'd say that running a ping test is a weak version of kvm-autotest's
>>> system tests.  Running a synthetic test that pokes values into memory
>>> and mmio and sees a packet coming out is a unit test (the latter can in
>>> fact be executed without a guest at all, just a table driving calls to
>>> the memory and irq APIs).
>>>
>>
>> Consider
>>98d23704138e0
>>7b4252e83f6f7d
>>f7e80adf3cc4
>>c16ada980f43
>>4abf12f4ea8
>>
>> (found by looking for 'fix' in the commit log and filtering out the
>> commits that don't support my case)
>>
>> how can you reject such patches on the grounds that they're not
>> accompanied by unit tests?
>
> That's why I've also proposed qtest.  But having written quite a few
> qtest unit tests by now, you hit the limits of this type of testing
> pretty quickly.

Can you describe those limits?

>
>> only by making it easy to add tests for
>> them.  I think it would be hard/impossible to test them with
>> linux-as-a-guest, since they fix edge cases that linux doesn't invoke.
>> But by having our own driver (often just using qmp to poke at memory),
>> we can easily generate the sequence that triggers the error.
>>
>> We'd probably need a library to support setting up a pci device's BARs,
>> but that's easy with qmp/python integration.  You can even poke a small
>> executable into memory and execute it directly, if you really need guest
>> cpu interaction.
>
> Please review the qtest series.  I think it offers a pretty good
> approach to writing this style of test.  But as I mentioned, you hit
> the limits pretty quickly.

I think it's great, it looks like exactly what I wanted, except it's
been delivered on time.  I'd really like to see it integrated quickly
with some flesh around it, then replying -ENOTEST to all patches.  This
will improve qemu's quality a lot more than guest boot/ping tests, which
we do regularly with kvm-autotest anyway.

Think of how new instruction emulations are always accompanied by new
kvm-unit-tests tests, I often don't even have to ask for them.

-- 
error compiling committee.c: too many arguments to function




Re: [Qemu-devel] [PATCH] Expose tsc deadline timer cpuid to guest

2011-12-28 Thread Liu, Jinsong
>> diff --git a/qemu-kvm.h b/qemu-kvm.h
>> index 2bd5602..8c6c2ea 100644
>> --- a/qemu-kvm.h
>> +++ b/qemu-kvm.h
>> @@ -260,6 +260,7 @@ extern int kvm_irqchip;
>>  extern int kvm_pit;
>>  extern int kvm_pit_reinject;
>>  extern unsigned int kvm_shadow_memory;
>> +extern int tsc_deadline_timer;
>> 
>>  int kvm_handle_tpr_access(CPUState *env);
>>  void kvm_tpr_enable_vapic(CPUState *env);
>> diff --git a/qemu-options.hx b/qemu-options.hx
>> index f6df6b9..eff6644 100644
>> --- a/qemu-options.hx
>> +++ b/qemu-options.hx
>> @@ -2619,6 +2619,9 @@ DEF("no-kvm-pit-reinjection", 0,
>>  QEMU_OPTION_no_kvm_pit_reinjection, "-no-kvm-pit-reinjection\n"
>>  "disable KVM kernel mode PIT interrupt
>> reinjection\n",  QEMU_ARCH_I386) +DEF("no-tsc-deadline-timer",
>> 0, QEMU_OPTION_no_tsc_deadline_timer, +"-no-tsc-deadline-timer  
>> disable tsc deadline timer\n", +QEMU_ARCH_I386)
> 
> Hmm, I would really prefer to stop adding switches like this. They
> won't make it upstream anyway.

OK, I will try to write a patch that gives the user better control over
cpuid, i.e. via plus_features and minus_features.

> 
> Can't this control be attached to legacy qemu machine models, ie. here
> anything <= pc-1.0? See how we handle kvmclock.
> 

You mean by adding input parameters, like pc_init1(..., kvmclock_enabled,
tscdeadline_enabled)?
I don't think that's a good approach. With more and more cpuid features (N)
controlled this way, the number of machine models would grow to 2^N.

Thanks,
Jinsong


Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Lluís Vilanova
Xin Tong writes:

> My main concern here is not how timely the interrupts can be handled,
> i am more interested in reducing the number of TB enters/exits due to
> interrupt. Returning to qemu mainloop requires saving and restoring
> register contexts which are expensive, what i am thinking is that can
> we check and handle interrupts every few TBs executed. But the
> drawback is that I do not know how many TBs would be a good number
> such that the interrupts do not get delayed too much.

I think a maximum amount of guest time for which interrupts can be delayed would
provide a better response (maybe together with a maximum number of delayed
interrupts).

For that you could program a "special" timer that forces a return-from-guest,
whatever the mechanism. But I'm sure that's going to make the system slower than
just checking every fixed number of TBs.
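
To make the two bounds concrete, here is a minimal sketch that combines a
fixed TB count with a maximum guest-time delay. Everything in it is
hypothetical (struct irq_poll_policy and both functions do not exist in
QEMU), and it polls a clock value passed in by the caller instead of
programming a timer, so it says nothing about which variant is faster.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical policy state; none of these names exist in QEMU. */
struct irq_poll_policy {
    uint32_t max_tbs;        /* poll at least every max_tbs blocks */
    uint64_t max_delay_ns;   /* ...or once this much guest time has passed */
    uint32_t tbs_since_poll;
    uint64_t last_poll_ns;
};

void irq_poll_init(struct irq_poll_policy *p, uint32_t max_tbs,
                   uint64_t max_delay_ns, uint64_t now_ns)
{
    assert(max_tbs > 0);
    p->max_tbs = max_tbs;
    p->max_delay_ns = max_delay_ns;
    p->tbs_since_poll = 0;
    p->last_poll_ns = now_ns;
}

/* Call after each translation block. Returns true when the execution
 * loop should leave generated code and look for pending interrupts. */
bool irq_poll_due(struct irq_poll_policy *p, uint64_t now_ns)
{
    p->tbs_since_poll++;
    if (p->tbs_since_poll >= p->max_tbs ||
        now_ns - p->last_poll_ns >= p->max_delay_ns) {
        p->tbs_since_poll = 0;
        p->last_poll_ns = now_ns;
        return true;
    }
    return false;
}
```

Reading the guest clock after every TB has a cost of its own, so a real
implementation would have to measure whether the time bound pays for itself.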


Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth



Re: [Qemu-devel] [PATCH] Expose tsc deadline timer cpuid to guest

2011-12-28 Thread Jan Kiszka
On 2011-12-28 18:35, Liu, Jinsong wrote:
>>> diff --git a/qemu-kvm.h b/qemu-kvm.h
>>> index 2bd5602..8c6c2ea 100644
>>> --- a/qemu-kvm.h
>>> +++ b/qemu-kvm.h
>>> @@ -260,6 +260,7 @@ extern int kvm_irqchip;
>>>  extern int kvm_pit;
>>>  extern int kvm_pit_reinject;
>>>  extern unsigned int kvm_shadow_memory;
>>> +extern int tsc_deadline_timer;
>>>
>>>  int kvm_handle_tpr_access(CPUState *env);
>>>  void kvm_tpr_enable_vapic(CPUState *env);
>>> diff --git a/qemu-options.hx b/qemu-options.hx
>>> index f6df6b9..eff6644 100644
>>> --- a/qemu-options.hx
>>> +++ b/qemu-options.hx
>>> @@ -2619,6 +2619,9 @@ DEF("no-kvm-pit-reinjection", 0,
>>>  QEMU_OPTION_no_kvm_pit_reinjection, "-no-kvm-pit-reinjection\n"
>>>  "disable KVM kernel mode PIT interrupt
>>> reinjection\n",  QEMU_ARCH_I386) +DEF("no-tsc-deadline-timer",
>>> 0, QEMU_OPTION_no_tsc_deadline_timer, +"-no-tsc-deadline-timer  
>>> disable tsc deadline timer\n", +QEMU_ARCH_I386)
>>
>> Hmm, I would really prefer to stop adding switches like this. They
>> won't make it upstream anyway.
> 
> OK, I will try to write a patch w/ better user control cpuid method, i.e. by 
> plus_features and minus_features.

Yep, that would be better.

> 
>>
>> Can't this control be attached to legacy qemu machine models, ie. here
>> anything <= pc-1.0? See how we handle kvmclock.
>>
> 
> You mean, by adding input para like pc_init1(..., kvmclock_enabled, 
> tscdeadline_enabled)?
> I think that's not a good way.

I think it is mandatory as older qemu versions won't expose tscdeadline
to the guest, thus newer versions must not do this when emulating older
machines.

> With more and more cpuid features (N) controlled in this way, machine models 
> would be 2^N.

We likely need a better way to express this via code, I agree. Likely
something declarative as for compat_props.

Jan
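
A declarative form could look roughly like the sketch below: one table row
per (machine type, feature) pair instead of one pc_init1() parameter per
feature. The struct, the table contents and feature_allowed() are
illustrative assumptions only; this is not how compat_props is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: a feature that must stay hidden on a
 * given legacy machine type. Names are illustrative only. */
struct compat_feature {
    const char *machine;   /* e.g. "pc-1.0" */
    const char *feature;   /* e.g. "tsc_deadline" */
};

static const struct compat_feature compat_table[] = {
    { "pc-1.0",  "tsc_deadline" },
    { "pc-0.15", "tsc_deadline" },
};

/* Returns true if the feature may be exposed on this machine type. */
bool feature_allowed(const char *machine, const char *feature)
{
    assert(machine != NULL && feature != NULL);
    for (size_t i = 0; i < sizeof(compat_table) / sizeof(compat_table[0]); i++) {
        if (strcmp(compat_table[i].machine, machine) == 0 &&
            strcmp(compat_table[i].feature, feature) == 0) {
            return false;
        }
    }
    return true;
}
```

With this shape, adding the N-th feature adds rows to the table and leaves
the number of init function variants unchanged.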





Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Peter Maydell
On 28 December 2011 00:43, Xin Tong  wrote:
> I modified QEMU to check for interrupt status at the end of every TB
> and ran it on SPECINT2000 benchmarks with QEMU 0.15.0. The performance
> is 70% of the unmodified one for some benchmarks on a x86_64 host. I
> agree that the extra load-test-branch-not-taken per TB is minimal, but
> what I found is that the average number of TB executed per TB enter is
> low (~3.5 TBs), while the unmodified approach has ~10 TBs per TB
> enter. this makes me wonder why. Maybe the mechanism i used to gather
> this statistics is flawed. but the performance is indeed hindered.

Since you said you're using system mode, here's my guess. The
unlink-tbs method of interrupting the guest CPU thread runs
in a second thread (the io thread), and doesn't stop the guest
CPU thread. So while the io thread is trying to unlink TBs,
the CPU thread is still running on, and might well execute
a few more TBs before the io thread's traversal of the TB
graph catches up with it and manages to unlink the TB link
the CPU thread is about to traverse.

More generally: are we really taking an interrupt every 3 to
5 TBs? This seems very high -- surely we will be spending more
time in the OS servicing interrupts than running useful guest
userspace code...

-- PMM



[Qemu-devel] [PATCH 1/2] Define KVM_CAP_TSC_DEADLINE_TIMER

2011-12-28 Thread Liu, Jinsong
From 5afecc308bc25c7fd8d124e7557f08fb067d6caa Mon Sep 17 00:00:00 2001
From: Liu Jinsong 
Date: Thu, 29 Dec 2011 01:45:45 +0800
Subject: [PATCH 1/2] Define KVM_CAP_TSC_DEADLINE_TIMER

Signed-off-by: Liu, Jinsong 
Signed-off-by: Jan Kiszka 
---
 linux-headers/linux/kvm.h |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/linux-headers/linux/kvm.h b/linux-headers/linux/kvm.h
index a8761d3..1d3a4f4 100644
--- a/linux-headers/linux/kvm.h
+++ b/linux-headers/linux/kvm.h
@@ -558,6 +558,7 @@ struct kvm_ppc_pvinfo {
 #define KVM_CAP_PPC_PAPR 68
 #define KVM_CAP_SW_TLB 69
 #define KVM_CAP_ONE_REG 70
+#define KVM_CAP_TSC_DEADLINE_TIMER 72
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
-- 
1.6.5.6




[Qemu-devel] [PATCH 2/2] Expose tsc deadline timer cpuid to guest

2011-12-28 Thread Liu, Jinsong
From 3a78adf8006ec6189bfe2f55f7ae213e75bf3815 Mon Sep 17 00:00:00 2001
From: Liu Jinsong 
Date: Thu, 29 Dec 2011 05:28:12 +0800
Subject: [PATCH 2/2] Expose tsc deadline timer cpuid to guest

Whether the feature is exposed depends on several factors:
1. For live migration, the user can enable or disable the tsc deadline timer;
2. If the guest uses the kvm apic and kvm emulates the tsc deadline timer,
   expose it;
3. If qemu supports tsc deadline timer emulation in the future
   and the guest uses the qemu apic, add a cpuid exposing case then.

Signed-off-by: Liu, Jinsong 
---
 target-i386/cpu.h   |2 ++
 target-i386/cpuid.c |7 ++-
 target-i386/kvm.c   |   13 +
 3 files changed, 21 insertions(+), 1 deletions(-)

diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 177d8aa..f2d0ad5 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -399,6 +399,7 @@
 #define CPUID_EXT_X2APIC   (1 << 21)
 #define CPUID_EXT_MOVBE(1 << 22)
 #define CPUID_EXT_POPCNT   (1 << 23)
+#define CPUID_EXT_TSC_DEADLINE_TIMER (1 << 24)
 #define CPUID_EXT_XSAVE(1 << 26)
 #define CPUID_EXT_OSXSAVE  (1 << 27)
 #define CPUID_EXT_HYPERVISOR  (1 << 31)
@@ -693,6 +694,7 @@ typedef struct CPUX86State {
 
 uint64_t tsc;
 uint64_t tsc_deadline;
+bool tsc_deadline_timer_enabled;
 
 uint64_t mcg_status;
 uint64_t msr_ia32_misc_enable;
diff --git a/target-i386/cpuid.c b/target-i386/cpuid.c
index 0b3af90..fe749e0 100644
--- a/target-i386/cpuid.c
+++ b/target-i386/cpuid.c
@@ -48,7 +48,7 @@ static const char *ext_feature_name[] = {
 "fma", "cx16", "xtpr", "pdcm",
 NULL, NULL, "dca", "sse4.1|sse4_1",
 "sse4.2|sse4_2", "x2apic", "movbe", "popcnt",
-NULL, "aes", "xsave", "osxsave",
+"tsc_deadline", "aes", "xsave", "osxsave",
 "avx", NULL, NULL, "hypervisor",
 };
 static const char *ext2_feature_name[] = {
@@ -225,6 +225,7 @@ typedef struct x86_def_t {
 int model;
 int stepping;
 int tsc_khz;
+bool tsc_deadline_timer_enabled;
 uint32_t features, ext_features, ext2_features, ext3_features;
 uint32_t kvm_features, svm_features;
 uint32_t xlevel;
@@ -742,6 +743,9 @@ static int cpu_x86_find_by_name(x86_def_t *x86_cpu_def, 
const char *cpu_model)
 x86_cpu_def->ext3_features &= ~minus_ext3_features;
 x86_cpu_def->kvm_features &= ~minus_kvm_features;
 x86_cpu_def->svm_features &= ~minus_svm_features;
+/* By default, the user does not disable tsc_deadline_timer */
+x86_cpu_def->tsc_deadline_timer_enabled =
+!(minus_ext_features & CPUID_EXT_TSC_DEADLINE_TIMER);
 if (check_cpuid) {
 if (check_features_against_host(x86_cpu_def) && enforce_cpuid)
 goto error;
@@ -885,6 +889,7 @@ int cpu_x86_register (CPUX86State *env, const char 
*cpu_model)
 env->cpuid_ext4_features = def->ext4_features;
 env->cpuid_xlevel2 = def->xlevel2;
 env->tsc_khz = def->tsc_khz;
+env->tsc_deadline_timer_enabled = def->tsc_deadline_timer_enabled;
 if (!kvm_enabled()) {
 env->cpuid_features &= TCG_FEATURES;
 env->cpuid_ext_features &= TCG_EXT_FEATURES;
diff --git a/target-i386/kvm.c b/target-i386/kvm.c
index d50de90..79baf0b 100644
--- a/target-i386/kvm.c
+++ b/target-i386/kvm.c
@@ -370,6 +370,19 @@ int kvm_arch_init_vcpu(CPUState *env)
 i = env->cpuid_ext_features & CPUID_EXT_HYPERVISOR;
 env->cpuid_ext_features &= kvm_arch_get_supported_cpuid(s, 1, 0, R_ECX);
 env->cpuid_ext_features |= i;
+/*
+ * 1. Considering live migration, user enable/disable tsc deadline timer;
+ * 2. If guest use kvm apic and kvm emulate tsc deadline timer, expose it;
+ * 3. If in the future qemu support tsc deadline timer emulation,
+ *and guest use qemu apic, add cpuid exposing case then.
+ */
+env->cpuid_ext_features &= ~CPUID_EXT_TSC_DEADLINE_TIMER;
+if (env->tsc_deadline_timer_enabled) {
+if (kvm_irqchip_in_kernel() &&
+kvm_check_extension(s, KVM_CAP_TSC_DEADLINE_TIMER)) {
+env->cpuid_ext_features |= CPUID_EXT_TSC_DEADLINE_TIMER;
+}
+}
 
 env->cpuid_ext2_features &= kvm_arch_get_supported_cpuid(s, 0x80000001,
  0, R_EDX);
-- 
1.6.5.6
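
The exposure rule in the kvm.c hunk can be read as a pure function of three
inputs. The sketch below restates it that way; filter_tsc_deadline() and its
boolean parameters are hypothetical stand-ins for the values the patch reads
from env and from KVM, and the macro repeats the bit position (CPUID.01H:ECX
bit 24) defined in the cpu.h hunk.

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.01H:ECX bit 24, matching the define added by the patch. */
#define EXT_TSC_DEADLINE_TIMER (1u << 24)

/* Hypothetical stand-alone helper: inputs that the patch reads from
 * env and from KVM are passed in as plain booleans here. */
uint32_t filter_tsc_deadline(uint32_t ext_features, bool user_enabled,
                             bool irqchip_in_kernel, bool kvm_has_cap)
{
    /* Always start from "hidden", as the patch does. */
    ext_features &= ~EXT_TSC_DEADLINE_TIMER;
    if (user_enabled && irqchip_in_kernel && kvm_has_cap) {
        ext_features |= EXT_TSC_DEADLINE_TIMER;
    }
    return ext_features;
}
```

Note that, as in the patch, the bit is set whenever all three conditions
hold, whether or not the chosen CPU model had it set beforehand.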




Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Xin Tong
That was my first guess as well, but my QEMU is built with
CONFIG_IOTHREAD set to 0.

I am not 100% sure how interrupts are delivered in QEMU. My guess is
that some kind of timer device has to fire, and that QEMU has installed
a signal handler which takes the signal and invokes unlink_tb.  I hope
you can enlighten me on that.

Thanks

Xin
On Wed, Dec 28, 2011 at 2:10 PM, Peter Maydell  wrote:
> On 28 December 2011 00:43, Xin Tong  wrote:
>> I modified QEMU to check for interrupt status at the end of every TB
>> and ran it on SPECINT2000 benchmarks with QEMU 0.15.0. The performance
>> is 70% of the unmodified one for some benchmarks on a x86_64 host. I
>> agree that the extra load-test-branch-not-taken per TB is minimal, but
>> what I found is that the average number of TB executed per TB enter is
>> low (~3.5 TBs), while the unmodified approach has ~10 TBs per TB
>> enter. this makes me wonder why. Maybe the mechanism i used to gather
>> this statistics is flawed. but the performance is indeed hindered.
>
> Since you said you're using system mode, here's my guess. The
> unlink-tbs method of interrupting the guest CPU thread runs
> in a second thread (the io thread), and doesn't stop the guest
> CPU thread. So while the io thread is trying to unlink TBs,
> the CPU thread is still running on, and might well execute
> a few more TBs before the io thread's traversal of the TB
> graph catches up with it and manages to unlink the TB link
> the CPU thread is about to traverse.
>
> More generally: are we really taking an interrupt every 3 to
> 5 TBs? This seems very high -- surely we will be spending more
> time in the OS servicing interrupts than running useful guest
> userspace code...
>
> -- PMM
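
The signal-handler guess above can be sketched in isolation as follows. This
is a generic POSIX pattern under stated assumptions, not QEMU's actual code:
the handler only sets a flag, and run_blocks() is a made-up stand-in for the
TB execution loop. The signal is raised synchronously so that the behaviour
is deterministic; a real timer would deliver it asynchronously.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the signal handler, polled by the execution loop. */
static volatile sig_atomic_t exit_request;

static void alarm_handler(int signum)
{
    (void)signum;
    exit_request = 1;   /* only async-signal-safe work here */
}

/* Install the handler; returns 0 on success, -1 on failure. */
int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = alarm_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGALRM, &sa, NULL);
}

/* Stand-in for the TB execution loop: "runs" up to max_tbs blocks and
 * stops early once the handler has requested an exit. The signal is
 * raised at block raise_at (0 means never). */
int run_blocks(int max_tbs, int raise_at)
{
    int executed = 0;
    assert(max_tbs >= 0);
    exit_request = 0;
    while (executed < max_tbs && !exit_request) {
        executed++;
        if (executed == raise_at) {
            raise(SIGALRM);
        }
    }
    return executed;
}
```

The number of blocks executed per loop entry then depends directly on how
often the signal fires, which is the statistic being discussed in this thread.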



[Qemu-devel] [PATCH 20/21] postcopy outgoing: add -p and -n option to migrate command

2011-12-28 Thread Isaku Yamahata
Add a -p option to the migrate command to enable postcopy mode, and
introduce a postcopy parameter for migration to indicate that postcopy
mode is enabled.
Add a -n option for postcopy migration, which disables background
transfer.

Signed-off-by: Isaku Yamahata 
---
 hmp-commands.hx |   12 
 migration.c |2 ++
 migration.h |2 ++
 qmp-commands.hx |   10 +++---
 savevm.c|2 ++
 5 files changed, 21 insertions(+), 7 deletions(-)

diff --git a/hmp-commands.hx b/hmp-commands.hx
index 14838b7..42a5f7e 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -746,24 +746,28 @@ ETEXI
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-.params = "[-d] [-b] [-i] uri",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+.params = "[-d] [-b] [-i] [-p [-n]] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
- "(base image shared between src and destination)",
+ "(base image shared between src and destination)"
+ "\n\t\t\t-p for migration with postcopy mode enabled"
+ "\n\t\t\t-n for no background transfer of postcopy mode",
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
 
 
 STEXI
-@item migrate [-d] [-b] [-i] @var{uri}
+@item migrate [-d] [-b] [-i] [-p [-n]] @var{uri}
 @findex migrate
 Migrate to @var{uri} (using -d to not wait for completion).
-b for migration with full copy of disk
-i for migration with incremental copy of disk (base image is shared)
+   -p for migration with postcopy mode enabled
+   -n for migration with postcopy mode enabled without background transfer
 ETEXI
 
 {
diff --git a/migration.c b/migration.c
index 2cef246..0149ab3 100644
--- a/migration.c
+++ b/migration.c
@@ -422,6 +422,8 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 
 params.blk = qdict_get_try_bool(qdict, "blk", 0);
 params.shared = qdict_get_try_bool(qdict, "inc", 0);
+params.postcopy = qdict_get_try_bool(qdict, "postcopy", 0);
+params.nobg = qdict_get_try_bool(qdict, "nobg", 0);
 
 if (s->state == MIG_STATE_ACTIVE) {
 monitor_printf(mon, "migration already in progress\n");
diff --git a/migration.h b/migration.h
index 29f468c..90ae362 100644
--- a/migration.h
+++ b/migration.h
@@ -22,6 +22,8 @@
 struct MigrationParams {
 int blk;
 int shared;
+int postcopy;
+int nobg;
 };
 
 typedef struct MigrationState MigrationState;
diff --git a/qmp-commands.hx b/qmp-commands.hx
index 7e3f4b9..67c7df6 100644
--- a/qmp-commands.hx
+++ b/qmp-commands.hx
@@ -430,13 +430,15 @@ EQMP
 
 {
 .name   = "migrate",
-.args_type  = "detach:-d,blk:-b,inc:-i,uri:s",
-.params = "[-d] [-b] [-i] uri",
+.args_type  = "detach:-d,blk:-b,inc:-i,postcopy:-p,nobg:-n,uri:s",
+.params = "[-d] [-b] [-i] [-p [-n]] uri",
 .help   = "migrate to URI (using -d to not wait for completion)"
  "\n\t\t\t -b for migration without shared storage with"
  " full copy of disk\n\t\t\t -i for migration without "
  "shared storage with incremental copy of disk "
- "(base image shared between src and destination)",
+ "(base image shared between src and destination)"
+ "\n\t\t\t-p for migration with postcopy mode enabled"
+ "\n\t\t\t-n for no background transfer of postcopy mode",
 .user_print = monitor_user_noop,   
.mhandler.cmd_new = do_migrate,
 },
@@ -451,6 +453,8 @@ Arguments:
 
 - "blk": block migration, full disk copy (json-bool, optional)
 - "inc": incremental disk copy (json-bool, optional)
+- "postcopy": postcopy migration (json-bool, optional)
+- "nobg": postcopy without background transfer (json-bool, optional)
 - "uri": Destination URI (json-string)
 
 Example:
diff --git a/savevm.c b/savevm.c
index 2d8e09f..bafb706 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1715,6 +1715,8 @@ static int qemu_savevm_state(Monitor *mon, QEMUFile *f)
 MigrationParams params = {
 .blk = 0,
 .shared = 0,
+.postcopy = 0,
+.nobg = 0,
 };
 
 if (qemu_savevm_state_blocked(mon)) {
-- 
1.7.1.1
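
The usage string "[-p [-n]]" documents -n as a modifier of -p, but the hunks
shown here do not enforce that. A possible check is sketched below; it is
not part of the patch, and struct migration_params and
migration_params_valid() are hypothetical names that only mirror the fields
added to MigrationParams.

```c
#include <stdbool.h>

/* Mirrors the fields the patch adds to MigrationParams. */
struct migration_params {
    int blk;
    int shared;
    int postcopy;
    int nobg;
};

/* Hypothetical check, not part of the patch: -n is documented as a
 * modifier of -p, so reject nobg when postcopy is not requested. */
bool migration_params_valid(const struct migration_params *p)
{
    if (p->nobg && !p->postcopy) {
        return false;
    }
    return true;
}
```

Such a check would fit in do_migrate() right after the parameters are read
from the qdict, before the migration state is initialised.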




[Qemu-devel] [PATCH 2/2] umem: chardevice for kvm postcopy

2011-12-28 Thread Isaku Yamahata
This is a character device to hook page access.
The page fault in the area is reported to another user process by
this chardriver. Then, the process fills the page contents and
resolves the page fault.
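
The fault flow described above can be modelled in user space as a toy, under
the assumption that the driver keeps one bitmap of pages already filled and
one of pages already requested (struct umem has fields named cached and
faulted). All names below (sim_umem, sim_fault, sim_mark_cached) are
hypothetical and the model ignores locking and blocking entirely.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_PAGES      64
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((NR_PAGES + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Toy model of the per-device state. */
struct sim_umem {
    unsigned long cached[NR_WORDS];   /* page contents are available */
    unsigned long faulted[NR_WORDS];  /* page was already requested */
    size_t pending[NR_PAGES];         /* requests for the user process */
    size_t nr_pending;
};

static bool bit_test(const unsigned long *map, size_t n)
{
    return (map[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL;
}

static void bit_set(unsigned long *map, size_t n)
{
    map[n / BITS_PER_WORD] |= 1UL << (n % BITS_PER_WORD);
}

/* A fault on pgoff: returns true if it can be resolved at once,
 * otherwise queues a single request for the servicing process. */
bool sim_fault(struct sim_umem *u, size_t pgoff)
{
    assert(pgoff < NR_PAGES);
    if (bit_test(u->cached, pgoff)) {
        return true;
    }
    if (!bit_test(u->faulted, pgoff)) {
        bit_set(u->faulted, pgoff);
        u->pending[u->nr_pending++] = pgoff;
    }
    return false;
}

/* The servicing process has filled the page contents. */
void sim_mark_cached(struct sim_umem *u, size_t pgoff)
{
    assert(pgoff < NR_PAGES);
    bit_set(u->cached, pgoff);
}
```

In the real driver the faulting task would sleep until the page is marked
cached; here the caller simply retries sim_fault().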

Signed-off-by: Isaku Yamahata 
---
 drivers/char/Kconfig  |9 +
 drivers/char/Makefile |1 +
 drivers/char/umem.c   |  898 +
 include/linux/umem.h  |   83 +
 4 files changed, 991 insertions(+), 0 deletions(-)
 create mode 100644 drivers/char/umem.c
 create mode 100644 include/linux/umem.h

diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig
index 4364303..001e3e4 100644
--- a/drivers/char/Kconfig
+++ b/drivers/char/Kconfig
@@ -15,6 +15,15 @@ config DEVKMEM
  kind of kernel debugging operations.
  When in doubt, say "N".
 
+config UMEM
+tristate "/dev/umem user process backed memory support"
+   default n
+   help
+ User process backed memory driver provides /dev/umem device.
+ The /dev/umem device is designed for some sort of distributed
+ shared memory, especially post-copy live migration with KVM.
+ When in doubt, say "N".
+
 config STALDRV
bool "Stallion multiport serial support"
depends on SERIAL_NONSTANDARD
diff --git a/drivers/char/Makefile b/drivers/char/Makefile
index 32762ba..1eb14dc 100644
--- a/drivers/char/Makefile
+++ b/drivers/char/Makefile
@@ -3,6 +3,7 @@
 #
 
 obj-y  += mem.o random.o
+obj-$(CONFIG_UMEM) += umem.o
 obj-$(CONFIG_TTY_PRINTK)   += ttyprintk.o
 obj-y  += misc.o
 obj-$(CONFIG_ATARI_DSP56K) += dsp56k.o
diff --git a/drivers/char/umem.c b/drivers/char/umem.c
new file mode 100644
index 000..df669fb
--- /dev/null
+++ b/drivers/char/umem.c
@@ -0,0 +1,898 @@
+/*
+ * UMEM: user process backed memory.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct umem_page_req_list {
+   struct list_head list;
+   pgoff_t pgoff;
+};
+
+struct umem {
+   loff_t size;
+   pgoff_t pgoff_end;
+   spinlock_t lock;
+
+   wait_queue_head_t req_wait;
+
+   int async_req_max;
+   int async_req_nr;
+   pgoff_t *async_req;
+
+   int sync_req_max;
+   unsigned long *sync_req_bitmap;
+   unsigned long *sync_wait_bitmap;
+   pgoff_t *sync_req;
+   wait_queue_head_t *page_wait;
+
+   int req_list_nr;
+   struct list_head req_list;
+   wait_queue_head_t req_list_wait;
+
+   unsigned long *cached;
+   unsigned long *faulted;
+
+   bool mmapped;
+   unsigned long vm_start;
+   unsigned int vma_nr;
+   struct task_struct *task;
+
+   struct file *shmem_filp;
+   struct vm_area_struct vma;
+
+   struct kref kref;
+   struct list_head list;
+   struct umem_name name;
+};
+
+
+static LIST_HEAD(umem_list);
+DEFINE_MUTEX(umem_list_mutex);
+
+static bool umem_name_eq(const struct umem_name *lhs,
+ const struct umem_name *rhs)
+{
+   return memcmp(lhs->id, rhs->id, sizeof(lhs->id)) == 0 &&
+   memcmp(lhs->name, rhs->name, sizeof(lhs->name)) == 0;
+}
+
+static int umem_add_list(struct umem *umem)
+{
+   struct umem *entry;
+   BUG_ON(!mutex_is_locked(&umem_list_mutex));
+   list_for_each_entry(entry, &umem_list, list) {
+   if (umem_name_eq(&entry->name, &umem->name)) {
+   mutex_unlock(&umem_list_mutex);
+   return -EBUSY;
+   }
+   }
+
+   list_add(&umem->list, &umem_list);
+   return 0;
+}
+
+static void umem_release_fake_vmf(int ret, struct vm_fault *fake_vmf)
+{
+   if (ret & VM_FAULT_LOCKED) {
+   unlock_page(fake_vmf->page);
+   }
+   page_cache_release(fake_vmf->page);
+}
+
+static int umem_minor_fault(struct umem *umem,
+   struct vm_area_struct *vma,
+   struct vm_fault *vmf)
+{
+   struct vm_fault fake_vmf;
+   int ret;
+   struct page *page;
+
+   BUG_ON(!test_bit(vmf->pgoff, umem->cached));
+   fake_vmf = *

[Qemu-devel] [PATCH 15/21] migration: factor out parameters into MigrationParams

2011-12-28 Thread Isaku Yamahata
Introduce MigrationParams for parameters of migration.

Signed-off-by: Isaku Yamahata 
---
 block-migration.c |8 
 hw/hw.h   |2 +-
 migration.c   |   16 +---
 migration.h   |8 ++--
 qemu-common.h |1 +
 savevm.c  |   12 
 sysemu.h  |4 ++--
 7 files changed, 31 insertions(+), 20 deletions(-)

diff --git a/block-migration.c b/block-migration.c
index 2b7edbc..c320913 100644
--- a/block-migration.c
+++ b/block-migration.c
@@ -706,13 +706,13 @@ static int block_load(QEMUFile *f, void *opaque, int 
version_id)
 return 0;
 }
 
-static void block_set_params(int blk_enable, int shared_base, void *opaque)
+static void block_set_params(const MigrationParams *params, void *opaque)
 {
-block_mig_state.blk_enable = blk_enable;
-block_mig_state.shared_base = shared_base;
+block_mig_state.blk_enable = params->blk;
+block_mig_state.shared_base = params->shared;
 
 /* shared base means that blk_enable = 1 */
-block_mig_state.blk_enable |= shared_base;
+block_mig_state.blk_enable |= params->shared;
 }
 
 void blk_mig_init(void)
diff --git a/hw/hw.h b/hw/hw.h
index a59b770..c17f837 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -250,7 +250,7 @@ static inline void qemu_get_sbe64s(QEMUFile *f, int64_t *pv)
 int64_t qemu_ftell(QEMUFile *f);
 int64_t qemu_fseek(QEMUFile *f, int64_t pos, int whence);
 
-typedef void SaveSetParamsHandler(int blk_enable, int shared, void * opaque);
+typedef void SaveSetParamsHandler(const MigrationParams *params, void * 
opaque);
 typedef void SaveStateHandler(QEMUFile *f, void *opaque);
 typedef int SaveLiveStateHandler(Monitor *mon, QEMUFile *f, int stage,
  void *opaque);
diff --git a/migration.c b/migration.c
index 057dde7..2cef246 100644
--- a/migration.c
+++ b/migration.c
@@ -365,7 +365,7 @@ void migrate_fd_connect(MigrationState *s)
   migrate_fd_close);
 
 DPRINTF("beginning savevm\n");
-ret = qemu_savevm_state_begin(s->mon, s->file, s->blk, s->shared);
+ret = qemu_savevm_state_begin(s->mon, s->file, &s->params);
 if (ret < 0) {
 DPRINTF("failed, %d\n", ret);
 migrate_fd_error(s);
@@ -374,15 +374,15 @@ void migrate_fd_connect(MigrationState *s)
 migrate_fd_put_ready(s);
 }
 
-static MigrationState *migrate_init(Monitor *mon, int detach, int blk, int inc)
+static MigrationState *migrate_init(Monitor *mon, int detach,
+const MigrationParams *params)
 {
 MigrationState *s = migrate_get_current();
 int64_t bandwidth_limit = s->bandwidth_limit;
 
 memset(s, 0, sizeof(*s));
 s->bandwidth_limit = bandwidth_limit;
-s->blk = blk;
-s->shared = inc;
+s->params = *params;
 
 /* s->mon is used for two things:
- pass fd in fd migration
@@ -414,13 +414,15 @@ void migrate_del_blocker(Error *reason)
 int do_migrate(Monitor *mon, const QDict *qdict, QObject **ret_data)
 {
 MigrationState *s = migrate_get_current();
+MigrationParams params;
 const char *p;
 int detach = qdict_get_try_bool(qdict, "detach", 0);
-int blk = qdict_get_try_bool(qdict, "blk", 0);
-int inc = qdict_get_try_bool(qdict, "inc", 0);
 const char *uri = qdict_get_str(qdict, "uri");
 int ret;
 
+params.blk = qdict_get_try_bool(qdict, "blk", 0);
+params.shared = qdict_get_try_bool(qdict, "inc", 0);
+
 if (s->state == MIG_STATE_ACTIVE) {
 monitor_printf(mon, "migration already in progress\n");
 return -1;
@@ -436,7 +438,7 @@ int do_migrate(Monitor *mon, const QDict *qdict, QObject 
**ret_data)
 return -1;
 }
 
-s = migrate_init(mon, detach, blk, inc);
+s = migrate_init(mon, detach, &params);
 
 if (strstart(uri, "tcp:", &p)) {
 ret = tcp_start_outgoing_migration(s, p);
diff --git a/migration.h b/migration.h
index 0a5e66f..2e79779 100644
--- a/migration.h
+++ b/migration.h
@@ -19,6 +19,11 @@
 #include "notify.h"
 #include "error.h"
 
+struct MigrationParams {
+int blk;
+int shared;
+};
+
 typedef struct MigrationState MigrationState;
 
 struct MigrationState
@@ -32,8 +37,7 @@ struct MigrationState
 int (*close)(MigrationState *s);
 int (*write)(MigrationState *s, const void *buff, size_t size);
 void *opaque;
-int blk;
-int shared;
+MigrationParams params;
 };
 
 void process_incoming_migration(QEMUFile *f);
diff --git a/qemu-common.h b/qemu-common.h
index b2de015..725922b 100644
--- a/qemu-common.h
+++ b/qemu-common.h
@@ -240,6 +240,7 @@ typedef struct SSIBus SSIBus;
 typedef struct EventNotifier EventNotifier;
 typedef struct VirtIODevice VirtIODevice;
 typedef struct QEMUSGList QEMUSGList;
+typedef struct MigrationParams MigrationParams;
 
 typedef uint64_t pcibus_t;
 
diff --git a/savevm.c b/savevm.c
index 891c4fd..2d8e09f 100644
--- a/savevm.c
+++ b/savevm.c
@@ -1564,8 +1564,8 @@ bool qemu_savevm_state_blocked(Monitor *mon)
 retur

Re: [Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2011-12-28 Thread Isaku Yamahata
On Thu, Dec 29, 2011 at 10:26:16AM +0900, Isaku Yamahata wrote:

> UMEM_DEV_LIST: list created umem devices
> UMEM_DEV_REATTACH: re-attach the created umem device
> UMEM_DEV_LIST and UMEM_DEV_REATTACH are used when
> the process that services page faults disappears or gets stuck.
> Then, the administrator can list the umem devices and unblock
> the process which is waiting for a page.

Here is a simple utility which cleans up umem devices.

---

/*
 * simple cleanup utility for umem devices
 *
 * Copyright (c) 2011,
 * National Institute of Advanced Industrial Science and Technology
 *
 * https://sites.google.com/site/grivonhome/quick-kvm-migration
 * Author: Isaku Yamahata 
 *
 * This program is free software; you can redistribute it and/or modify it
 * under the terms and conditions of the GNU General Public License,
 * version 2, as published by the Free Software Foundation.
 *
 * This program is distributed in the hope it will be useful, but WITHOUT
 * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
 * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
 * more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, see .
 */

#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 
#include 

#include 

void mark_all_pages_cached(int umem_dev_fd, const char *id, const char *name)
{
struct umem_create create;
memset(&create, 0, sizeof(create));
strncpy(create.name.id, id, sizeof(create.name.id));
strncpy(create.name.name, name, sizeof(create.name.name));

if (ioctl(umem_dev_fd, UMEM_DEV_REATTACH, &create) < 0) {
err(EXIT_FAILURE, "UMEM_DEV_REATTACH");
}

close(create.shmem_fd);
long page_size = sysconf(_SC_PAGESIZE);
int page_shift = ffs(page_size) - 1;
int umem_fd = create.umem_fd;
printf("umem_fd %d size %"PRIu64"\n", umem_fd, (uint64_t)create.size);

__u64 i;
__u64 e_pgoff = (create.size + page_size - 1) >> page_shift;
#define UMEM_CACHED_MAX 512
__u64 pgoffs[UMEM_CACHED_MAX];
struct umem_page_cached page_cached = {
.nr = 0,
.pgoffs = pgoffs,
};

for (i = 0; i < e_pgoff; i++) {
page_cached.pgoffs[page_cached.nr] = i;
page_cached.nr++;
if (page_cached.nr == UMEM_CACHED_MAX) {
if (ioctl(umem_fd, UMEM_MARK_PAGE_CACHED,
  &page_cached) < 0) {
err(EXIT_FAILURE, "UMEM_MARK_PAGE_CACHED");
}
page_cached.nr = 0;
}
}
if (page_cached.nr > 0) {
if (ioctl(umem_fd, UMEM_MARK_PAGE_CACHED, &page_cached) < 0) {
err(EXIT_FAILURE, "UMEM_MARK_PAGE_CACHED");
}
}
close(umem_fd);
}

#define DEV_UMEM "/dev/umem"

int main(int argc, char **argv)
{
const char *id = NULL;
const char *name = NULL;
if (argc >= 2) {
id = argv[1];
}
if (argc >= 3) {
name = argv[2];
}

int umem_dev_fd = open(DEV_UMEM, O_RDWR);
if (umem_dev_fd < 0) {
perror("can't open "DEV_UMEM);
exit(EXIT_FAILURE);
}

struct umem_list tmp_ulist = {
.nr = 0,
};
if (ioctl(umem_dev_fd, UMEM_DEV_LIST, &tmp_ulist) < 0) {
err(EXIT_FAILURE, "UMEM_DEV_LIST");
}
if (tmp_ulist.nr == 0) {
printf("no umem files\n");
exit(EXIT_SUCCESS);
}
struct umem_list *ulist = malloc(
sizeof(*ulist) + sizeof(ulist->names[0]) * tmp_ulist.nr);
if (ulist == NULL) {
err(EXIT_FAILURE, "malloc");
}
ulist->nr = tmp_ulist.nr;
if (ioctl(umem_dev_fd, UMEM_DEV_LIST, ulist) < 0) {
err(EXIT_FAILURE, "UMEM_DEV_LIST");
}

uint32_t i;
for (i = 0; i < ulist->nr; ++i) {
char *u_id = ulist->names[i].id;
char *u_name = ulist->names[i].name;

char tmp_id_c = u_id[UMEM_ID_MAX - 1];
char tmp_name_c = u_name[UMEM_NAME_MAX - 1];
u_id[UMEM_ID_MAX - 1] = '\0';
u_name[UMEM_NAME_MAX - 1] = '\0';
printf("%d: id: %s name: %s\n", i, u_id, u_name);

if ((id != NULL || name != NULL) &&
(id == NULL || strncmp(id, u_id, UMEM_ID_MAX) == 0) &&
(name == NULL ||
 strncmp(name, u_name, UMEM_NAME_MAX) == 0)) {
printf("marking cached: %d: id: %s name: %s\n",
   i, u_id, 

Re: [Qemu-devel] interrupt handling in qemu

2011-12-28 Thread Peter Maydell
On 29 December 2011 00:48, Xin Tong  wrote:
> That is my guess as well in the first place, but my QEMU is built with
> CONFIG_IOTHREAD set to 0.

Your QEMU is old -- iothread is now the only option (the config
option to use not-iothread has gone away).

> I am not 100% sure about how interrupts are delivered in QEMU, my
> guess is that some kind of timer devices will have to fire and qemu
> might have installed a signal handler and the signal handler takes the
> signal and invokes unlink_tb.  I hope you can enlighten me on that.

I think the non-iothread config used to use a signal handler, yes.
However I don't recall the details and it's all a bit irrelevant
now anyway. I recommend using an up to date source tree to do your
experiments with...

-- PMM



[Qemu-devel] [PATCH 12/21] savevm: qemu_pending_size() to return pending buffered size

2011-12-28 Thread Isaku Yamahata
This will be used later by postcopy migration.

Signed-off-by: Isaku Yamahata 
---
 hw/hw.h  |1 +
 savevm.c |5 +
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index 0b481ba..d508b4e 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -80,6 +80,7 @@ int qemu_get_byte(QEMUFile *f);
 int qemu_peek_byte(QEMUFile *f, int offset);
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
 void qemu_file_skip(QEMUFile *f, int size);
+int qemu_pending_size(const QEMUFile *f);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index ff77846..1d9e218 100644
--- a/savevm.c
+++ b/savevm.c
@@ -593,6 +593,11 @@ void qemu_file_skip(QEMUFile *f, int size)
 }
 }
 
+int qemu_pending_size(const QEMUFile *f)
+{
+return f->buf_size - f->buf_index;
+}
+
 int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
 int pending;
-- 
1.7.1.1
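
The helper added above returns how many bytes are already buffered but not yet consumed. A minimal sketch of that arithmetic follows, using a toy struct rather than the real QEMUFile; `struct toy_file` and `toy_pending_size()` are hypothetical names used only for illustration, and only the two fields the patch reads are modelled.

```c
#include <stddef.h>

/* Toy stand-in (hypothetical) for the two QEMUFile fields the helper reads. */
struct toy_file {
    int buf_index; /* position of the next unread byte in the buffer */
    int buf_size;  /* number of valid bytes currently in the buffer */
};

/* Bytes that are buffered but not yet consumed by the reader. */
static int toy_pending_size(const struct toy_file *f)
{
    return f->buf_size - f->buf_index;
}
```

With 10 valid bytes and a read position of 3, the sketch reports 7 pending bytes; once the read position reaches the end it reports 0.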




[Qemu-devel] [PATCH 16/21] umem.h: import Linux umem.h

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 linux-headers/linux/umem.h |   83 
 1 files changed, 83 insertions(+), 0 deletions(-)
 create mode 100644 linux-headers/linux/umem.h

diff --git a/linux-headers/linux/umem.h b/linux-headers/linux/umem.h
new file mode 100644
index 0000000..e1a8633
--- /dev/null
+++ b/linux-headers/linux/umem.h
@@ -0,0 +1,83 @@
+/*
+ * User process backed memory.
+ * This is mainly for KVM post copy.
+ *
+ * Copyright (c) 2011,
+ * National Institute of Advanced Industrial Science and Technology
+ *
+ * https://sites.google.com/site/grivonhome/quick-kvm-migration
+ * Author: Isaku Yamahata 
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along
+ * with this program; if not, see .
+ */
+
+#ifndef __LINUX_UMEM_H
+#define __LINUX_UMEM_H
+
+#include <linux/ioctl.h>
+#include <linux/types.h>
+
+#ifdef __KERNEL__
+#include <linux/compiler.h>
+#else
+#define __user
+#endif
+
+#define UMEM_ID_MAX    256
+#define UMEM_NAME_MAX  256
+
+struct umem_name {
+   char id[UMEM_ID_MAX];   /* non-zero terminated */
+   char name[UMEM_NAME_MAX];   /* non-zero terminated */
+};
+
+struct umem_list {
+   __u32 nr;
+   __u32 padding;
+   struct umem_name names[0];
+};
+
+struct umem_create {
+   __u64 size; /* in bytes */
+   __s32 umem_fd;
+   __s32 shmem_fd;
+   __u32 async_req_max;
+   __u32 sync_req_max;
+   struct umem_name name;
+};
+
+struct umem_page_request {
+   __u64 __user *pgoffs;
+   __u32 nr;
+   __u32 padding;
+};
+
+struct umem_page_cached {
+   __u64 __user *pgoffs;
+   __u32 nr;
+   __u32 padding;
+};
+
+#define UMEMIO 0x1E
+
+/* ioctl for umem_dev fd */
+#define UMEM_DEV_CREATE_UMEM   _IOWR(UMEMIO, 0x0, struct umem_create)
+#define UMEM_DEV_LIST  _IOWR(UMEMIO, 0x1, struct umem_list)
+#define UMEM_DEV_REATTACH  _IOWR(UMEMIO, 0x2, struct umem_create)
+
+/* ioctl for umem fd */
+#define UMEM_GET_PAGE_REQUEST  _IOWR(UMEMIO, 0x10, struct umem_page_request)
+#define UMEM_MARK_PAGE_CACHED  _IOW (UMEMIO, 0x11, struct umem_page_cached)
+#define UMEM_MAKE_VMA_ANONYMOUS        _IO  (UMEMIO, 0x12)
+
+#endif /* __LINUX_UMEM_H */
-- 
1.7.1.1
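
Because struct umem_list ends in a variable-length array of names, a caller has to size the allocation from the entry count. The sketch below shows that calculation under some assumptions: the structs are local mirrors of the header above that use uint32_t instead of __u32 and a C99 flexible array member instead of names[0], so the example builds without the kernel header. `umem_list_bytes()` and `umem_list_alloc()` are hypothetical helpers, not part of the patch.

```c
#include <stdint.h>
#include <stdlib.h>

#define UMEM_ID_MAX   256
#define UMEM_NAME_MAX 256

/* Local mirrors of the header's structs (assumption: same layout). */
struct umem_name {
    char id[UMEM_ID_MAX];
    char name[UMEM_NAME_MAX];
};

struct umem_list {
    uint32_t nr;
    uint32_t padding;
    struct umem_name names[]; /* flexible array member instead of names[0] */
};

/* Bytes needed for a list that can hold nr names (hypothetical helper). */
static size_t umem_list_bytes(uint32_t nr)
{
    return sizeof(struct umem_list) + (size_t)nr * sizeof(struct umem_name);
}

/* Allocate a zeroed list with room for nr names; returns NULL on failure. */
static struct umem_list *umem_list_alloc(uint32_t nr)
{
    struct umem_list *list = calloc(1, umem_list_bytes(nr));
    if (list != NULL) {
        list->nr = nr;
    }
    return list;
}
```

A caller would typically query the count first with nr = 0, allocate with that count, and then issue the listing ioctl again on the larger buffer.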




[Qemu-devel] [PATCH 14/21] migration: export migrate_fd_completed() and migrate_fd_cleanup()

2011-12-28 Thread Isaku Yamahata
This will be used by postcopy migration.

Signed-off-by: Isaku Yamahata 
---
 migration.c |4 ++--
 migration.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/migration.c b/migration.c
index 412fdfe..057dde7 100644
--- a/migration.c
+++ b/migration.c
@@ -166,7 +166,7 @@ static void migrate_fd_monitor_suspend(MigrationState *s, Monitor *mon)
 }
 }
 
-static int migrate_fd_cleanup(MigrationState *s)
+int migrate_fd_cleanup(MigrationState *s)
 {
 int ret = 0;
 
@@ -198,7 +198,7 @@ void migrate_fd_error(MigrationState *s)
 migrate_fd_cleanup(s);
 }
 
-static void migrate_fd_completed(MigrationState *s)
+void migrate_fd_completed(MigrationState *s)
 {
 DPRINTF("setting completed state\n");
 if (migrate_fd_cleanup(s) < 0) {
diff --git a/migration.h b/migration.h
index 6459457..0a5e66f 100644
--- a/migration.h
+++ b/migration.h
@@ -64,7 +64,9 @@ int fd_start_incoming_migration(const char *path);
 
 int fd_start_outgoing_migration(MigrationState *s, const char *fdname);
 
+int migrate_fd_cleanup(MigrationState *s);
 void migrate_fd_error(MigrationState *s);
+void migrate_fd_completed(MigrationState *s);
 
 void migrate_fd_connect(MigrationState *s);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 19/21] postcopy: introduce -postcopy and -postcopy-flags option

2011-12-28 Thread Isaku Yamahata
This patch prepares for postcopy live migration.
It introduces the -postcopy option and its internal flag, incoming_postcopy.
It also introduces -postcopy-flags for changing the behavior of incoming
postcopy, mainly for benchmarking/debugging.

Signed-off-by: Isaku Yamahata 

postcopy: introduce -postcopy-flags option

Signed-off-by: Isaku Yamahata 
---
 migration.h |3 +++
 qemu-options.hx |   22 ++
 vl.c|8 
 3 files changed, 33 insertions(+), 0 deletions(-)

diff --git a/migration.h b/migration.h
index 2e79779..29f468c 100644
--- a/migration.h
+++ b/migration.h
@@ -105,4 +105,7 @@ void migrate_add_blocker(Error *reason);
  */
 void migrate_del_blocker(Error *reason);
 
+extern bool incoming_postcopy;
+extern unsigned long incoming_postcopy_flags;
+
 #endif
diff --git a/qemu-options.hx b/qemu-options.hx
index a60191f..5c5b8f3 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2497,6 +2497,28 @@ STEXI
 Prepare for incoming migration, listen on @var{port}.
 ETEXI
 
+DEF("postcopy", 0, QEMU_OPTION_postcopy,
+"-postcopy postcopy incoming migration when -incoming is specified\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -postcopy
+@findex -postcopy
+start incoming migration in postcopy mode.
+ETEXI
+
+DEF("postcopy-flags", HAS_ARG, QEMU_OPTION_postcopy_flags,
+"-postcopy-flags unsigned-int(flags)\n"
+"  flags for postcopy incoming migration\n"
+"   when -incoming and -postcopy are specified.\n"
+"   This is for benchmark/debug purpose (default: 0)\n",
+QEMU_ARCH_ALL)
+STEXI
+@item -postcopy-flags int
+@findex -postcopy-flags
+Specify flags for incoming postcopy migration when -incoming and -postcopy are
+specified. This is for benchmark/debug purposes. (default: 0)
+ETEXI
+
 DEF("nodefaults", 0, QEMU_OPTION_nodefaults, \
 "-nodefaults don't create default devices\n", QEMU_ARCH_ALL)
 STEXI
diff --git a/vl.c b/vl.c
index a4c9489..5430b8c 100644
--- a/vl.c
+++ b/vl.c
@@ -188,6 +188,8 @@ int mem_prealloc = 0; /* force preallocation of physical target memory */
 int nb_nics;
 NICInfo nd_table[MAX_NICS];
 int autostart;
+bool incoming_postcopy = false; /* When -incoming is specified, postcopy mode */
+unsigned long incoming_postcopy_flags = 0; /* flags for postcopy incoming mode */
 static int rtc_utc = 1;
 static int rtc_date_offset = -1; /* -1 means no change */
 QEMUClock *rtc_clock;
@@ -2969,6 +2971,12 @@ int main(int argc, char **argv, char **envp)
 case QEMU_OPTION_incoming:
 incoming = optarg;
 break;
+case QEMU_OPTION_postcopy:
+incoming_postcopy = true;
+break;
+case QEMU_OPTION_postcopy_flags:
+incoming_postcopy_flags = strtoul(optarg, NULL, 0);
+break;
 case QEMU_OPTION_nodefaults:
 default_serial = 0;
 default_parallel = 0;
-- 
1.7.1.1
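
The option handler above calls strtoul() with base 0, so the flags value may be written in decimal, hexadecimal (0x prefix) or octal (leading 0). A stricter variant that also rejects trailing garbage and out-of-range input could look like the sketch below; `parse_postcopy_flags()` is a hypothetical helper and is not part of the patch.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse a flags argument the same way strtoul(arg, NULL, 0) would,
 * but report malformed input. Returns 0 on success, -1 on error.
 */
static int parse_postcopy_flags(const char *arg, unsigned long *out)
{
    char *end;
    unsigned long value;

    errno = 0;
    value = strtoul(arg, &end, 0);
    if (errno != 0 || end == arg || *end != '\0') {
        return -1; /* overflow, no digits at all, or trailing characters */
    }
    *out = value;
    return 0;
}
```

The patch as posted passes NULL for the end pointer, so an argument such as "1x" would be silently accepted as 1; whether that matters for a benchmark/debug option is a judgment call.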




[Qemu-devel] [PATCH 09/21] exec.c: factor out qemu_get_ram_ptr()

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 cpu-all.h |2 ++
 exec.c|   51 +--
 2 files changed, 31 insertions(+), 22 deletions(-)

diff --git a/cpu-all.h b/cpu-all.h
index 9d78715..0244f7a 100644
--- a/cpu-all.h
+++ b/cpu-all.h
@@ -496,6 +496,8 @@ extern RAMList ram_list;
 extern const char *mem_path;
 extern int mem_prealloc;
 
+RAMBlock *qemu_get_ram_block(ram_addr_t addr);
+
 /* physical memory access */
 
 /* MMIO pages are identified by a combination of an IO device index and
diff --git a/exec.c b/exec.c
index 32782b4..51b8d15 100644
--- a/exec.c
+++ b/exec.c
@@ -3117,15 +3117,7 @@ void qemu_ram_remap(ram_addr_t addr, ram_addr_t length)
 }
 #endif /* !_WIN32 */
 
-/* Return a host pointer to ram allocated with qemu_ram_alloc.
-   With the exception of the softmmu code in this file, this should
-   only be used for local memory (e.g. video ram) that the device owns,
-   and knows it isn't going to access beyond the end of the block.
-
-   It should not be used for general purpose DMA.
-   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
- */
-void *qemu_get_ram_ptr(ram_addr_t addr)
+RAMBlock *qemu_get_ram_block(ram_addr_t addr)
 {
 RAMBlock *block;
 
@@ -3136,19 +3128,7 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 QLIST_REMOVE(block, next);
 QLIST_INSERT_HEAD(&ram_list.blocks, block, next);
 }
-if (xen_enabled()) {
-/* We need to check if the requested address is in the RAM
- * because we don't want to map the entire memory in QEMU.
- * In that case just map until the end of the page.
- */
-if (block->offset == 0) {
-return xen_map_cache(addr, 0, 0);
-} else if (block->host == NULL) {
-block->host =
-xen_map_cache(block->offset, block->length, 1);
-}
-}
-return block->host + (addr - block->offset);
+return block;
 }
 }
 
@@ -3159,6 +3139,33 @@ void *qemu_get_ram_ptr(ram_addr_t addr)
 }
 
 /* Return a host pointer to ram allocated with qemu_ram_alloc.
+   With the exception of the softmmu code in this file, this should
+   only be used for local memory (e.g. video ram) that the device owns,
+   and knows it isn't going to access beyond the end of the block.
+
+   It should not be used for general purpose DMA.
+   Use cpu_physical_memory_map/cpu_physical_memory_rw instead.
+ */
+void *qemu_get_ram_ptr(ram_addr_t addr)
+{
+RAMBlock *block = qemu_get_ram_block(addr);
+
+if (xen_enabled()) {
+/* We need to check if the requested address is in the RAM
+ * because we don't want to map the entire memory in QEMU.
+ * In that case just map until the end of the page.
+ */
+if (block->offset == 0) {
+return xen_map_cache(addr, 0, 0);
+} else if (block->host == NULL) {
+block->host =
+xen_map_cache(block->offset, block->length, 1);
+}
+}
+return block->host + (addr - block->offset);
+}
+
+/* Return a host pointer to ram allocated with qemu_ram_alloc.
  * Same as qemu_get_ram_ptr but avoid reordering ramblocks.
  */
 void *qemu_safe_ram_ptr(ram_addr_t addr)
-- 
1.7.1.1
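
The factored-out lookup keeps the existing behaviour of moving the matching block to the head of the list, so repeated lookups of the same block stay cheap. A minimal sketch of such a move-to-front lookup on a plain singly linked list follows. `struct toy_block` and `toy_find_block()` are hypothetical, the range check is an assumption about how a block is matched, and this sketch returns NULL for an unknown address, which may differ from what the real function does.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy RAM block (hypothetical): covers [offset, offset + length). */
struct toy_block {
    uint64_t offset;
    uint64_t length;
    struct toy_block *next;
};

/* Find the block containing addr and move it to the front of the list. */
static struct toy_block *toy_find_block(struct toy_block **head, uint64_t addr)
{
    struct toy_block **link = head; /* pointer that currently points at b */
    struct toy_block *b;

    for (b = *head; b != NULL; link = &b->next, b = b->next) {
        /* Unsigned subtraction wraps for addr < offset, so one compare suffices. */
        if (addr - b->offset < b->length) {
            if (b != *head) {
                *link = b->next; /* unlink b from its current position */
                b->next = *head; /* and reinsert it at the head */
                *head = b;
            }
            return b;
        }
    }
    return NULL; /* no block contains addr */
}
```

qemu_safe_ram_ptr(), per its comment in the diff, is the variant that avoids this reordering.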




[Qemu-devel] [PATCH 00/21][RFC] postcopy live migration

2011-12-28 Thread Isaku Yamahata
Intro
=====
This patch series implements postcopy live migration.[1]
As discussed at KVM Forum 2011, a dedicated character device is used for
distributed shared memory between the migration source and destination.
Now we can discuss/benchmark/compare with precopy. I believe there is
much room for improvement.

[1] http://wiki.qemu.org/Features/PostCopyLiveMigration


Usage
=====
You need to load the umem character device on the host before starting migration.
Postcopy can be used with the tcg and kvm accelerators. The implementation depends
only on the Linux umem character device, but the driver-dependent code is split
out into a separate file.
I tested only the host page size == guest page size case, but the implementation
allows for the host page size != guest page size case.

The following options are added with this patch series.
- incoming part
  command line options
  -postcopy [-postcopy-flags <flags>]
  where flags is for changing behavior for benchmark/debugging
  Currently the following flags are available
  0: default
  1: enable touching page request

  example:
  qemu -postcopy -incoming tcp:0: -monitor stdio -machine accel=kvm

- outgoing part
  options for migrate command 
  migrate [-p [-n]] URI
  -p: indicate postcopy migration
  -n: disable background transferring pages: This is for benchmark/debugging

  example:
  migrate -p -n tcp::


TODO
====
- benchmark/evaluation. Especially how async page fault affects the result.
- improvement/optimization
  At the moment, what I'm aware of is at least the following:
  - touching pages in the incoming qemu process by fd handler seems suboptimal.
    creating a dedicated thread?
  - making the incoming socket non-blocking
  - the outgoing handler seems suboptimal, causing latency.
- catch up with the memory API change
- consider the FUSE/CUSE possibility
- and more...

basic postcopy work flow
========================
qemu on the destination
      |
      V
open(/dev/umem)
      |
      V
UMEM_DEV_CREATE_UMEM
      |
      V
Here we have two file descriptors to
umem device and shmem file
      |
      |                                   umemd
      |                                   daemon on the destination
      |
      V    create pipe to communicate
fork()---------------------------------------,
      |                                      |
      V                                      |
close(socket)                                V
close(shmem)                              mmap(shmem file)
      |                                      |
      V                                      V
mmap(umem device) for guest RAM           close(shmem file)
      |                                      |
close(umem device)                           |
      |                                      |
      V                                      |
wait for ready from daemon                the owner of the socket
      |                                   to the source
      V                                      |
entering post copy stage                     |
start guest execution                        |
      |                                      |
      V                                      V
access guest RAM                          UMEM_GET_PAGE_REQUEST
      |                                      |
      V                                      V
page fault ------------------------------>page offset is returned
block                                        |
                                             V
                                          pull page from the source
                                          write the page contents
                                          to the shmem.
                                             |
                                             V
unblock <---------------------------------UMEM_MARK_PAGE_CACHED
the fault handler returns the page
page fault is resolved
      |
      |                                   pages can be sent
      |                                   in the background
      |                                      |
      |                                      V
      |                                   UMEM_MARK_PAGE_CACHED
      |                                      |
      V                                      V
The specified pages <-------pipe----------request to touch pages
are made present by                          |

[Qemu-devel] [PATCH 06/21] arch_init: refactor ram_save_block()

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 arch_init.c |   82 +++---
 arch_init.h |1 +
 2 files changed, 45 insertions(+), 38 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 9bc313e..982c846 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -102,6 +102,44 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
 return 1;
 }
 
+static RAMBlock *last_block_sent = NULL;
+
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset)
+{
+ram_addr_t current_addr = block->offset + offset;
+uint8_t *p;
+int cont;
+
+if (!cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
+return 0;
+}
+cpu_physical_memory_reset_dirty(current_addr,
+current_addr + TARGET_PAGE_SIZE,
+MIGRATION_DIRTY_FLAG);
+
+p = block->host + offset;
+cont = (block == last_block_sent) ? RAM_SAVE_FLAG_CONTINUE : 0;
+last_block_sent = block;
+
+if (is_dup_page(p, *p)) {
+qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
+if (!cont) {
+qemu_put_byte(f, strlen(block->idstr));
+qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+}
+qemu_put_byte(f, *p);
+return 1;
+}
+
+qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
+if (!cont) {
+qemu_put_byte(f, strlen(block->idstr));
+qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+}
+qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
+return TARGET_PAGE_SIZE;
+}
+
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
@@ -109,47 +147,13 @@ int ram_save_block(QEMUFile *f)
 {
 RAMBlock *block = last_block;
 ram_addr_t offset = last_offset;
-ram_addr_t current_addr;
 int bytes_sent = 0;
 
 if (!block)
 block = QLIST_FIRST(&ram_list.blocks);
 
-current_addr = block->offset + offset;
-
 do {
-if (cpu_physical_memory_get_dirty(current_addr, MIGRATION_DIRTY_FLAG)) {
-uint8_t *p;
-int cont = (block == last_block) ? RAM_SAVE_FLAG_CONTINUE : 0;
-
-cpu_physical_memory_reset_dirty(current_addr,
-current_addr + TARGET_PAGE_SIZE,
-MIGRATION_DIRTY_FLAG);
-
-p = block->host + offset;
-
-if (is_dup_page(p, *p)) {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_COMPRESS);
-if (!cont) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr,
-strlen(block->idstr));
-}
-qemu_put_byte(f, *p);
-bytes_sent = 1;
-} else {
-qemu_put_be64(f, offset | cont | RAM_SAVE_FLAG_PAGE);
-if (!cont) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr,
-strlen(block->idstr));
-}
-qemu_put_buffer(f, p, TARGET_PAGE_SIZE);
-bytes_sent = TARGET_PAGE_SIZE;
-}
-
-break;
-}
+bytes_sent = ram_save_page(f, block, offset);
 
 offset += TARGET_PAGE_SIZE;
 if (offset >= block->length) {
@@ -159,9 +163,10 @@ int ram_save_block(QEMUFile *f)
 block = QLIST_FIRST(&ram_list.blocks);
 }
 
-current_addr = block->offset + offset;
-
-} while (current_addr != last_block->offset + last_offset);
+if (bytes_sent > 0) {
+break;
+}
+} while (block->offset + offset != last_block->offset + last_offset);
 
 last_block = block;
 last_offset = offset;
@@ -277,6 +282,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 if (stage == 1) {
 RAMBlock *block;
 bytes_transferred = 0;
+last_block_sent = NULL;
 last_block = NULL;
 last_offset = 0;
 sort_ram_list();
diff --git a/arch_init.h b/arch_init.h
index 14d6644..118461a 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -44,6 +44,7 @@ int xen_available(void);
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
 #ifdef NEED_CPU_H
+int ram_save_page(QEMUFile *f, RAMBlock *block, ram_addr_t offset);
 void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
-- 
1.7.1.1




[Qemu-devel] [PATCH 10/21] exec.c: export last_ram_offset()

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 exec-obsolete.h |1 +
 exec.c  |4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/exec-obsolete.h b/exec-obsolete.h
index 34b9fc5..8f69f1c 100644
--- a/exec-obsolete.h
+++ b/exec-obsolete.h
@@ -25,6 +25,7 @@
 
 #ifndef CONFIG_USER_ONLY
 
+ram_addr_t qemu_last_ram_offset(void);
 ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
ram_addr_t size, void *host,
MemoryRegion *mr);
diff --git a/exec.c b/exec.c
index 51b8d15..c8c6692 100644
--- a/exec.c
+++ b/exec.c
@@ -2907,7 +2907,7 @@ static ram_addr_t find_ram_offset(ram_addr_t size)
 return offset;
 }
 
-static ram_addr_t last_ram_offset(void)
+ram_addr_t qemu_last_ram_offset(void)
 {
 RAMBlock *block;
 ram_addr_t last = 0;
@@ -2989,7 +2989,7 @@ ram_addr_t qemu_ram_alloc_from_ptr(DeviceState *dev, const char *name,
 QLIST_INSERT_HEAD(&ram_list.blocks, new_block, next);
 
 ram_list.phys_dirty = g_realloc(ram_list.phys_dirty,
-   last_ram_offset() >> TARGET_PAGE_BITS);
+qemu_last_ram_offset() >> TARGET_PAGE_BITS);
 memset(ram_list.phys_dirty + (new_block->offset >> TARGET_PAGE_BITS),
0xff, size >> TARGET_PAGE_BITS);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 02/21] arch_init: export RAM_SAVE_xxx flags for postcopy

2011-12-28 Thread Isaku Yamahata
These constants will also be used by postcopy.

Signed-off-by: Isaku Yamahata 
---
 arch_init.c |7 ---
 arch_init.h |7 +++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 1947396..4ede5ad 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -87,13 +87,6 @@ const uint32_t arch_type = QEMU_ARCH;
 /***/
 /* ram save/restore */
 
-#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
-#define RAM_SAVE_FLAG_COMPRESS 0x02
-#define RAM_SAVE_FLAG_MEM_SIZE 0x04
-#define RAM_SAVE_FLAG_PAGE 0x08
-#define RAM_SAVE_FLAG_EOS  0x10
-#define RAM_SAVE_FLAG_CONTINUE 0x20
-
 static int is_dup_page(uint8_t *page, uint8_t ch)
 {
 uint32_t val = ch << 24 | ch << 16 | ch << 8 | ch;
diff --git a/arch_init.h b/arch_init.h
index 828256c..cf27625 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -32,4 +32,11 @@ int tcg_available(void);
 int kvm_available(void);
 int xen_available(void);
 
+#define RAM_SAVE_FLAG_FULL 0x01 /* Obsolete, not used anymore */
+#define RAM_SAVE_FLAG_COMPRESS 0x02
+#define RAM_SAVE_FLAG_MEM_SIZE 0x04
+#define RAM_SAVE_FLAG_PAGE 0x08
+#define RAM_SAVE_FLAG_EOS  0x10
+#define RAM_SAVE_FLAG_CONTINUE 0x20
+
 #endif
-- 
1.7.1.1




[Qemu-devel] [PATCH 0/2][RFC] postcopy migration: Linux char device for postcopy

2011-12-28 Thread Isaku Yamahata
This is a Linux kernel driver for qemu/kvm postcopy live migration.
It is used by the qemu/kvm postcopy live migration patch series.

TODO:
- Consider the FUSE/CUSE option
  So far, several mmap patches for FUSE/CUSE are floating around (their
  purpose isn't different from our purpose, though). They haven't been
  merged upstream yet.
  The driver-specific part of the qemu patches is modularized, so I expect it
  wouldn't be difficult to switch the kernel driver to a CUSE-based driver.

ioctl commands:

UMEM_DEV_CREATE_UMEM: create umem device for qemu
UMEM_DEV_LIST: list created umem devices
UMEM_DEV_REATTACH: re-attach the created umem device
  UMEM_DEV_LIST and UMEM_DEV_REATTACH are used when
  the process that services page faults disappears or gets stuck.
  Then, the administrator can list the umem devices and unblock
  the process which is waiting for a page.

UMEM_GET_PAGE_REQUEST: retrieve page faults of the qemu process
UMEM_MARK_PAGE_CACHED: mark the specified pages as pulled from the source
                       (for the daemon)

UMEM_MAKE_VMA_ANONYMOUS: make the specified vma in the qemu process
                         anonymous.
                         This is _NOT_ implemented yet.
                         I'm not sure whether this can be implemented
                         or not.


---
Changes version 1 -> 2:
- make ioctl structures padded to align
- un-KVM
  KVM_VMEM -> UMEM
- dropped some ioctl commands as Avi requested

Isaku Yamahata (2):
  export necessary symbols
  umem: chardevice for kvm postcopy

 drivers/char/Kconfig  |9 +
 drivers/char/Makefile |1 +
 drivers/char/umem.c   |  898 +
 include/linux/umem.h  |   83 +
 mm/memcontrol.c   |1 +
 mm/shmem.c|1 +
 6 files changed, 993 insertions(+), 0 deletions(-)
 create mode 100644 drivers/char/umem.c
 create mode 100644 include/linux/umem.h




[Qemu-devel] [PATCH 01/21] arch_init: export sort_ram_list() and ram_save_block()

2011-12-28 Thread Isaku Yamahata
This will be used by postcopy.

Signed-off-by: Isaku Yamahata 
---
 arch_init.c |4 ++--
 migration.h |2 ++
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d4c92b0..1947396 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -112,7 +112,7 @@ static int is_dup_page(uint8_t *page, uint8_t ch)
 static RAMBlock *last_block;
 static ram_addr_t last_offset;
 
-static int ram_save_block(QEMUFile *f)
+int ram_save_block(QEMUFile *f)
 {
 RAMBlock *block = last_block;
 ram_addr_t offset = last_offset;
@@ -229,7 +229,7 @@ static int block_compar(const void *a, const void *b)
 return 0;
 }
 
-static void sort_ram_list(void)
+void sort_ram_list(void)
 {
 RAMBlock *block, *nblock, **blocks;
 int n;
diff --git a/migration.h b/migration.h
index 372b066..e79a69b 100644
--- a/migration.h
+++ b/migration.h
@@ -78,6 +78,8 @@ uint64_t ram_bytes_remaining(void);
 uint64_t ram_bytes_transferred(void);
 uint64_t ram_bytes_total(void);
 
+void sort_ram_list(void);
+int ram_save_block(QEMUFile *f);
 int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 1/2] export necessary symbols

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 mm/memcontrol.c |1 +
 mm/shmem.c  |1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index b63f5f7..85530fc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2807,6 +2807,7 @@ int mem_cgroup_cache_charge(struct page *page, struct mm_struct *mm,
 
return ret;
 }
+EXPORT_SYMBOL_GPL(mem_cgroup_cache_charge);
 
 /*
  * While swap-in, try_charge -> commit or cancel, the page is locked.
diff --git a/mm/shmem.c b/mm/shmem.c
index d672250..d137a37 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2546,6 +2546,7 @@ int shmem_zero_setup(struct vm_area_struct *vma)
vma->vm_flags |= VM_CAN_NONLINEAR;
return 0;
 }
+EXPORT_SYMBOL_GPL(shmem_zero_setup);
 
 /**
  * shmem_read_mapping_page_gfp - read into page cache, using specified page allocation flags.
-- 
1.7.1.1




[Qemu-devel] [PATCH 03/21] arch_init/ram_save: introduce constant for ram save version = 4

2011-12-28 Thread Isaku Yamahata
Introduce RAM_SAVE_VERSION_ID to represent version_id for ram save format.

Signed-off-by: Isaku Yamahata 
---
 arch_init.c |2 +-
 arch_init.h |2 ++
 vl.c|4 ++--
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 4ede5ad..5ad6956 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -371,7 +371,7 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 int flags;
 int error;
 
-if (version_id < 3 || version_id > 4) {
+if (version_id < 3 || version_id > RAM_SAVE_VERSION_ID) {
 return -EINVAL;
 }
 
diff --git a/arch_init.h b/arch_init.h
index cf27625..a3aa059 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -39,4 +39,6 @@ int xen_available(void);
 #define RAM_SAVE_FLAG_EOS  0x10
 #define RAM_SAVE_FLAG_CONTINUE 0x20
 
+#define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
+
 #endif
diff --git a/vl.c b/vl.c
index c03abb6..a4c9489 100644
--- a/vl.c
+++ b/vl.c
@@ -3266,8 +3266,8 @@ int main(int argc, char **argv, char **envp)
 default_drive(default_sdcard, snapshot, machine->use_scsi,
   IF_SD, 0, SD_OPTS);
 
-register_savevm_live(NULL, "ram", 0, 4, NULL, ram_save_live, NULL,
- ram_load, NULL);
+register_savevm_live(NULL, "ram", 0, RAM_SAVE_VERSION_ID, NULL,
+ ram_save_live, NULL, ram_load, NULL);
 
 if (nb_numa_nodes > 0) {
 int i;
-- 
1.7.1.1




[Qemu-devel] [PATCH 13/21] savevm, buffered_file: introduce method to drain buffer of buffered file

2011-12-28 Thread Isaku Yamahata
Introduce a new method to drain the buffer of QEMUFileBuffered.
During postcopy migration, the buffer size can grow without bound.
To keep the buffer size reasonably small, introduce a method that
waits for the buffer to drain.

Signed-off-by: Isaku Yamahata 
---
 buffered_file.c |   20 +++-
 buffered_file.h |1 +
 hw/hw.h |1 +
 savevm.c|6 ++
 4 files changed, 23 insertions(+), 5 deletions(-)

diff --git a/buffered_file.c b/buffered_file.c
index fed9a22..be1a192 100644
--- a/buffered_file.c
+++ b/buffered_file.c
@@ -168,6 +168,15 @@ static int buffered_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, in
 return offset;
 }
 
+static void buffered_drain(QEMUFileBuffered *s)
+{
+while (!qemu_file_get_error(s->file) && s->buffer_size) {
+buffered_flush(s);
+if (s->freeze_output)
+s->wait_for_unfreeze(s->opaque);
+}
+}
+
 static int buffered_close(void *opaque)
 {
 QEMUFileBuffered *s = opaque;
@@ -175,11 +184,7 @@ static int buffered_close(void *opaque)
 
 DPRINTF("closing\n");
 
-while (!qemu_file_get_error(s->file) && s->buffer_size) {
-buffered_flush(s);
-if (s->freeze_output)
-s->wait_for_unfreeze(s->opaque);
-}
+buffered_drain(s);
 
 ret = s->close(s->opaque);
 
@@ -289,3 +294,8 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque,
 
 return s->file;
 }
+
+void qemu_buffered_file_drain_buffer(void *buffered_file)
+{
+buffered_drain(buffered_file);
+}
diff --git a/buffered_file.h b/buffered_file.h
index 98d358b..cd8e1e8 100644
--- a/buffered_file.h
+++ b/buffered_file.h
@@ -26,5 +26,6 @@ QEMUFile *qemu_fopen_ops_buffered(void *opaque, size_t xfer_limit,
                                   BufferedPutReadyFunc *put_ready,
                                   BufferedWaitForUnfreezeFunc *wait_for_unfreeze,
                                   BufferedCloseFunc *close);
   BufferedCloseFunc *close);
+void qemu_buffered_file_drain_buffer(void *buffered_file);
 
 #endif
diff --git a/hw/hw.h b/hw/hw.h
index d508b4e..a59b770 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -61,6 +61,7 @@ QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
 void qemu_fflush(QEMUFile *f);
+void qemu_buffered_file_drain(QEMUFile *f);
 int qemu_fclose(QEMUFile *f);
 void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, int size);
 void qemu_put_byte(QEMUFile *f, int v);
diff --git a/savevm.c b/savevm.c
index 1d9e218..891c4fd 100644
--- a/savevm.c
+++ b/savevm.c
@@ -83,6 +83,7 @@
 #include "qemu-queue.h"
 #include "qemu-timer.h"
 #include "cpus.h"
+#include "buffered_file.h"
 
 #define SELF_ANNOUNCE_ROUNDS 5
 
@@ -475,6 +476,11 @@ void qemu_fflush(QEMUFile *f)
 }
 }
 
+void qemu_buffered_file_drain(QEMUFile *f)
+{
+qemu_buffered_file_drain_buffer(f->opaque);
+}
+
 static void qemu_fill_buffer(QEMUFile *f)
 {
 int len;
-- 
1.7.1.1




[Qemu-devel] [PATCH 08/21] arch_init/ram_load: refactor ram_load

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 arch_init.c |   67 +-
 arch_init.h |1 +
 2 files changed, 39 insertions(+), 29 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 249b440..bc53092 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -395,6 +395,41 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 return ram_load_host_from_stream_offset(f, offset, flags, &block);
 }
 
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes)
+{
+    /* Synchronize RAM block list */
+    char id[256];
+    ram_addr_t length;
+
+    while (total_ram_bytes) {
+        RAMBlock *block;
+        uint8_t len;
+
+        len = qemu_get_byte(f);
+        qemu_get_buffer(f, (uint8_t *)id, len);
+        id[len] = 0;
+        length = qemu_get_be64(f);
+
+        QLIST_FOREACH(block, &ram_list.blocks, next) {
+            if (!strncmp(id, block->idstr, sizeof(id))) {
+                if (block->length != length)
+                    return -EINVAL;
+                break;
+            }
+        }
+
+        if (!block) {
+            fprintf(stderr, "Unknown ramblock \"%s\", cannot "
+                    "accept migration\n", id);
+            return -EINVAL;
+        }
+
+        total_ram_bytes -= length;
+    }
+
+    return 0;
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 ram_addr_t addr;
@@ -417,35 +452,9 @@ int ram_load(QEMUFile *f, void *opaque, int version_id)
 return -EINVAL;
 }
 } else {
-/* Synchronize RAM block list */
-char id[256];
-ram_addr_t length;
-ram_addr_t total_ram_bytes = addr;
-
-while (total_ram_bytes) {
-RAMBlock *block;
-uint8_t len;
-
-len = qemu_get_byte(f);
-qemu_get_buffer(f, (uint8_t *)id, len);
-id[len] = 0;
-length = qemu_get_be64(f);
-
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id))) {
-if (block->length != length)
-return -EINVAL;
-break;
-}
-}
-
-if (!block) {
-fprintf(stderr, "Unknown ramblock \"%s\", cannot "
-"accept migration\n", id);
-return -EINVAL;
-}
-
-total_ram_bytes -= length;
+error = ram_load_mem_size(f, addr);
+if (error) {
+return error;
 }
 }
 }
diff --git a/arch_init.h b/arch_init.h
index 118461a..72b906d 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -49,6 +49,7 @@ void *ram_load_host_from_stream_offset(QEMUFile *f,
ram_addr_t offset,
int flags,
RAMBlock **last_blockp);
+int ram_load_mem_size(QEMUFile *f, ram_addr_t total_ram_bytes);
 #endif
 
 #endif
-- 
1.7.1.1




[Qemu-devel] [PATCH 17/21] update-linux-headers.sh: teach umem.h to update-linux-headers.sh

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 scripts/update-linux-headers.sh |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/scripts/update-linux-headers.sh b/scripts/update-linux-headers.sh
index 9d2a4bc..2afdd54 100755
--- a/scripts/update-linux-headers.sh
+++ b/scripts/update-linux-headers.sh
@@ -43,7 +43,7 @@ done
 
 rm -rf "$output/linux-headers/linux"
 mkdir -p "$output/linux-headers/linux"
-for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h; do
+for header in kvm.h kvm_para.h vhost.h virtio_config.h virtio_ring.h umem.h; do
 cp "$tmpdir/include/linux/$header" "$output/linux-headers/linux"
 done
 if [ -L "$linux/source" ]; then
-- 
1.7.1.1




[Qemu-devel] [PATCH 05/21] arch_init/ram_save_live: factor out RAM_SAVE_FLAG_MEM_SIZE case

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 arch_init.c |   21 ++---
 migration.h |1 +
 2 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index d55e39c..9bc313e 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -243,6 +243,19 @@ void sort_ram_list(void)
 g_free(blocks);
 }
 
+void ram_save_live_mem_size(QEMUFile *f)
+{
+    RAMBlock *block;
+
+    qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        qemu_put_byte(f, strlen(block->idstr));
+        qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
+        qemu_put_be64(f, block->length);
+    }
+}
+
 int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 {
 ram_addr_t addr;
@@ -282,13 +295,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 /* Enable dirty memory tracking */
 cpu_physical_memory_set_dirty_tracking(1);
 
-qemu_put_be64(f, ram_bytes_total() | RAM_SAVE_FLAG_MEM_SIZE);
-
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-qemu_put_byte(f, strlen(block->idstr));
-qemu_put_buffer(f, (uint8_t *)block->idstr, strlen(block->idstr));
-qemu_put_be64(f, block->length);
-}
+ram_save_live_mem_size(f);
 }
 
 bytes_transferred_last = bytes_transferred;
diff --git a/migration.h b/migration.h
index e79a69b..cb4a2d5 100644
--- a/migration.h
+++ b/migration.h
@@ -80,6 +80,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 07/21] arch_init/ram_save_live: factor out ram_save_limit

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 arch_init.c |   28 +---
 migration.h |1 +
 2 files changed, 18 insertions(+), 11 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 982c846..249b440 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -261,9 +261,24 @@ void ram_save_live_mem_size(QEMUFile *f)
 }
 }
 
+void ram_save_memory_set_dirty(void)
+{
+    RAMBlock *block;
+
+    QLIST_FOREACH(block, &ram_list.blocks, next) {
+        ram_addr_t addr;
+        for (addr = block->offset; addr < block->offset + block->length;
+             addr += TARGET_PAGE_SIZE) {
+            if (!cpu_physical_memory_get_dirty(addr,
+                                               MIGRATION_DIRTY_FLAG)) {
+                cpu_physical_memory_set_dirty(addr);
+            }
+        }
+    }
+}
+
 int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 {
-ram_addr_t addr;
 uint64_t bytes_transferred_last;
 double bwidth = 0;
 uint64_t expected_time = 0;
@@ -280,7 +295,6 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 }
 
 if (stage == 1) {
-RAMBlock *block;
 bytes_transferred = 0;
 last_block_sent = NULL;
 last_block = NULL;
@@ -288,15 +302,7 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 sort_ram_list();
 
 /* Make sure all dirty bits are set */
-QLIST_FOREACH(block, &ram_list.blocks, next) {
-for (addr = block->offset; addr < block->offset + block->length;
- addr += TARGET_PAGE_SIZE) {
-if (!cpu_physical_memory_get_dirty(addr,
-   MIGRATION_DIRTY_FLAG)) {
-cpu_physical_memory_set_dirty(addr);
-}
-}
-}
+ram_save_memory_set_dirty();
 
 /* Enable dirty memory tracking */
 cpu_physical_memory_set_dirty_tracking(1);
diff --git a/migration.h b/migration.h
index cb4a2d5..6459457 100644
--- a/migration.h
+++ b/migration.h
@@ -80,6 +80,7 @@ uint64_t ram_bytes_total(void);
 
 void sort_ram_list(void);
 int ram_save_block(QEMUFile *f);
+void ram_save_memory_set_dirty(void);
 void ram_save_live_mem_size(QEMUFile *f);
 int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque);
 int ram_load(QEMUFile *f, void *opaque, int version_id);
-- 
1.7.1.1




[Qemu-devel] [PATCH 18/21] configure: add CONFIG_POSTCOPY option

2011-12-28 Thread Isaku Yamahata
Add enable/disable postcopy mode. No dynamic test yet.

Signed-off-by: Isaku Yamahata 
---
 configure |   12 
 1 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/configure b/configure
index 640e815..440fa9e 100755
--- a/configure
+++ b/configure
@@ -190,6 +190,7 @@ opengl=""
 zlib="yes"
 guest_agent="yes"
 libiscsi=""
+postcopy="yes"
 
 # parse CC options first
 for opt do
@@ -789,6 +790,10 @@ for opt do
   ;;
   --disable-guest-agent) guest_agent="no"
   ;;
+  --enable-postcopy) postcopy="yes"
+  ;;
+  --disable-postcopy) postcopy="no"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -1075,6 +1080,8 @@ echo "  --disable-usb-redir      disable usb network redirection support"
 echo "  --enable-usb-redir       enable usb network redirection support"
 echo "  --disable-guest-agent    disable building of the QEMU Guest Agent"
 echo "  --enable-guest-agent     enable building of the QEMU Guest Agent"
+echo "  --disable-postcopy       disable postcopy mode for live migration"
+echo "  --enable-postcopy        enable postcopy mode for live migration"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2879,6 +2886,7 @@ echo "usb net redir     $usb_redir"
 echo "OpenGL support    $opengl"
 echo "libiscsi support  $libiscsi"
 echo "build guest agent $guest_agent"
+echo "postcopy support  $postcopy"
 
 if test "$sdl_too_old" = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -3192,6 +3200,10 @@ if test "$libiscsi" = "yes" ; then
   echo "CONFIG_LIBISCSI=y" >> $config_host_mak
 fi
 
+if test "$postcopy" = "yes" ; then
+  echo "CONFIG_POSTCOPY=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
-- 
1.7.1.1




[Qemu-devel] [PATCH 11/21] savevm: export qemu_peek_buffer, qemu_peek_byte, qemu_file_skip

2011-12-28 Thread Isaku Yamahata
Those will be used by postcopy.

Signed-off-by: Isaku Yamahata 
---
 hw/hw.h  |3 +++
 savevm.c |6 +++---
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/hw/hw.h b/hw/hw.h
index efa04d1..0b481ba 100644
--- a/hw/hw.h
+++ b/hw/hw.h
@@ -77,6 +77,9 @@ void qemu_put_be32(QEMUFile *f, unsigned int v);
 void qemu_put_be64(QEMUFile *f, uint64_t v);
 int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size);
 int qemu_get_byte(QEMUFile *f);
+int qemu_peek_byte(QEMUFile *f, int offset);
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset);
+void qemu_file_skip(QEMUFile *f, int size);
 
 static inline unsigned int qemu_get_ubyte(QEMUFile *f)
 {
diff --git a/savevm.c b/savevm.c
index f153c25..ff77846 100644
--- a/savevm.c
+++ b/savevm.c
@@ -586,14 +586,14 @@ void qemu_put_byte(QEMUFile *f, int v)
 qemu_fflush(f);
 }
 
-static void qemu_file_skip(QEMUFile *f, int size)
+void qemu_file_skip(QEMUFile *f, int size)
 {
 if (f->buf_index + size <= f->buf_size) {
 f->buf_index += size;
 }
 }
 
-static int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
+int qemu_peek_buffer(QEMUFile *f, uint8_t *buf, int size, size_t offset)
 {
 int pending;
 int index;
@@ -641,7 +641,7 @@ int qemu_get_buffer(QEMUFile *f, uint8_t *buf, int size)
 return done;
 }
 
-static int qemu_peek_byte(QEMUFile *f, int offset)
+int qemu_peek_byte(QEMUFile *f, int offset)
 {
 int index = f->buf_index + offset;
 
-- 
1.7.1.1




[Qemu-devel] [PATCH 04/21] arch_init: refactor host_from_stream_offset()

2011-12-28 Thread Isaku Yamahata
Signed-off-by: Isaku Yamahata 
---
 arch_init.c |   25 ++---
 arch_init.h |9 +
 2 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/arch_init.c b/arch_init.c
index 5ad6956..d55e39c 100644
--- a/arch_init.c
+++ b/arch_init.c
@@ -335,21 +335,22 @@ int ram_save_live(Monitor *mon, QEMUFile *f, int stage, void *opaque)
 return (stage == 2) && (expected_time <= migrate_max_downtime());
 }
 
-static inline void *host_from_stream_offset(QEMUFile *f,
-ram_addr_t offset,
-int flags)
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+   ram_addr_t offset,
+   int flags,
+   RAMBlock **last_blockp)
 {
-static RAMBlock *block = NULL;
+RAMBlock *block;
 char id[256];
 uint8_t len;
 
 if (flags & RAM_SAVE_FLAG_CONTINUE) {
-if (!block) {
+if (!(*last_blockp)) {
 fprintf(stderr, "Ack, bad migration stream!\n");
 return NULL;
 }
 
-return block->host + offset;
+return (*last_blockp)->host + offset;
 }
 
 len = qemu_get_byte(f);
@@ -357,14 +358,24 @@ static inline void *host_from_stream_offset(QEMUFile *f,
 id[len] = 0;
 
 QLIST_FOREACH(block, &ram_list.blocks, next) {
-if (!strncmp(id, block->idstr, sizeof(id)))
+if (!strncmp(id, block->idstr, sizeof(id))) {
+*last_blockp = block;
 return block->host + offset;
+}
 }
 
 fprintf(stderr, "Can't find block %s!\n", id);
 return NULL;
 }
 
+static inline void *host_from_stream_offset(QEMUFile *f,
+ram_addr_t offset,
+int flags)
+{
+static RAMBlock *block = NULL;
+return ram_load_host_from_stream_offset(f, offset, flags, &block);
+}
+
 int ram_load(QEMUFile *f, void *opaque, int version_id)
 {
 ram_addr_t addr;
diff --git a/arch_init.h b/arch_init.h
index a3aa059..14d6644 100644
--- a/arch_init.h
+++ b/arch_init.h
@@ -1,6 +1,8 @@
 #ifndef QEMU_ARCH_INIT_H
 #define QEMU_ARCH_INIT_H
 
+#include "qemu-common.h"
+
 extern const char arch_config_name[];
 
 enum {
@@ -41,4 +43,11 @@ int xen_available(void);
 
 #define RAM_SAVE_VERSION_ID 4 /* currently version 4 */
 
+#ifdef NEED_CPU_H
+void *ram_load_host_from_stream_offset(QEMUFile *f,
+   ram_addr_t offset,
+   int flags,
+   RAMBlock **last_blockp);
+#endif
+
 #endif
-- 
1.7.1.1




Re: [Qemu-devel] [PATCH 2/3] Add a new PCI region type to supports 64 bit ranges

2011-12-28 Thread Kevin O'Connor
On Wed, Dec 28, 2011 at 06:26:05PM +1300, Alexey Korolev wrote:
> This patch adds PCI_REGION_TYPE_PREFMEM_64 region type and modifies types of
> variables to make it possible to work with 64 bit addresses.
> 
> Why I've added just one region type PCI_REGION_TYPE_PREFMEM_64 and haven't
> added PCI_REGION_TYPE_MEM_64? According to PCI architecture specification,
> the bridges can describe 64bit ranges for prefetchable type of memory only.
> So it's very unlikely that devices exporting 64bit non-prefetchable BARs.
> Anyway this code will work with 64bit non-prefetchable BARs unless the PCI
> device is not behind the secondary bus.
[...]
> --- a/src/pciinit.c
> +++ b/src/pciinit.c
> @@ -22,6 +22,7 @@ enum pci_region_type {
>  PCI_REGION_TYPE_IO,
>  PCI_REGION_TYPE_MEM,
>  PCI_REGION_TYPE_PREFMEM,
> +PCI_REGION_TYPE_PREFMEM_64,
>  PCI_REGION_TYPE_COUNT,
>  };

This doesn't seem right.  A 64bit bar is not a new category - it's
just a way of representing larger values within the existing
categories.  Tracking of 64bit prefmem sections separately from
regular prefmem sections doesn't make sense, because both need to be
allocated from the same pool when behind a bridge.

>  struct pci_bus {
>  struct {
> -/* pci region stats */
> -u32 count[32 - PCI_MEM_INDEX_SHIFT];
> -u32 sum, max;
>  /* seconday bus region sizes */
>  u32 size;
> -/* pci region assignments */
> -u32 bases[32 - PCI_MEM_INDEX_SHIFT];
> -u32 base;
> +/* pci region stats */
> +u32 max;
> +u32 count[32 - PCI_MEM_INDEX_SHIFT];
> +s64 sum;
> + /* pci region assignments */
> +s64 base;
> +s64 bases[32 - PCI_MEM_INDEX_SHIFT];

Why the choice of s64 over u64?  Given the amount of bit manipulation
on these values, I think using u64 would be safer.

> @@ -69,6 +72,8 @@ static enum pci_region_type pci_addr_to_type(u32 addr)
>  {
>  if (addr & PCI_BASE_ADDRESS_SPACE_IO)
>  return PCI_REGION_TYPE_IO;
> +if (addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
> +return PCI_REGION_TYPE_PREFMEM_64;

This seems dangerous - a 64bit bar can be non-prefetchable - getting
this wrong could cause random (hard to debug) crashes.
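
To make the point above concrete, here is a minimal sketch of decoding the low bits of a BAR so that address width and region category stay independent. The bit values follow the standard PCI BAR layout; the enum and the two helper names are hypothetical and are not taken from the patch or from SeaBIOS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low bits of a PCI base address register (standard BAR layout). */
#define PCI_BASE_ADDRESS_SPACE_IO      0x01
#define PCI_BASE_ADDRESS_MEM_TYPE_MASK 0x06
#define PCI_BASE_ADDRESS_MEM_TYPE_64   0x04
#define PCI_BASE_ADDRESS_MEM_PREFETCH  0x08

/* Hypothetical category enum: three categories, no 64-bit variant. */
enum region_type { REGION_IO, REGION_MEM, REGION_PREFMEM };

/* The category depends only on the I/O bit and the prefetch bit. */
static enum region_type bar_region_type(uint32_t bar)
{
    if (bar & PCI_BASE_ADDRESS_SPACE_IO)
        return REGION_IO;
    if (bar & PCI_BASE_ADDRESS_MEM_PREFETCH)
        return REGION_PREFMEM;
    return REGION_MEM;
}

/* The width is a separate attribute of a memory BAR. */
static bool bar_is_64bit(uint32_t bar)
{
    return !(bar & PCI_BASE_ADDRESS_SPACE_IO) &&
           (bar & PCI_BASE_ADDRESS_MEM_TYPE_MASK) == PCI_BASE_ADDRESS_MEM_TYPE_64;
}
```

With this split, a BAR whose flags are 0x4 is reported as a plain memory region that happens to be 64 bits wide, rather than being forced into a prefetchable category.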

> @@ -378,19 +383,16 @@ static void pci_bios_check_devices(struct pci_bus *busses)
>  struct pci_bus *bus = &busses[pci_bdf_to_bus(pci->bdf)];
>  int i;
>  for (i = 0; i < PCI_NUM_REGIONS; i++) {
> -u32 val, size;
> +u32 val, size, type;
>  pci_bios_get_bar(pci, i, &val, &size);
>  if (val == 0)
>  continue;
> 
> -pci_bios_bus_reserve(bus, pci_addr_to_type(val), size);
> +type = pci_addr_to_type(val);
> +pci_bios_bus_reserve(bus, type, size);
>  pci->bars[i].addr = val;
>  pci->bars[i].size = size;
> -pci->bars[i].is64 = (!(val & PCI_BASE_ADDRESS_SPACE_IO) &&
> - (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK)
> - == PCI_BASE_ADDRESS_MEM_TYPE_64);
> -
> -if (pci->bars[i].is64)
> +if (type == PCI_REGION_TYPE_PREFMEM_64)
>  i++;

If there is a 64bit bar, then the size could be over 32bits - the code
really needs to handle this (or it could cause overlapping regions
which result in random crashes).
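
As an illustration of why a u32 size is not enough, the sketch below computes the size of a 64bit memory BAR from the two dwords read back after the usual all-ones sizing probe. The function name is hypothetical and the code is only an assumption about how such a helper could look, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* The low four bits of a memory BAR are flag bits, not address bits. */
#define PCI_BASE_ADDRESS_MEM_MASK (~0x0fULL)

/*
 * lo and hi are the values read back from the lower and upper BAR dwords
 * after writing 0xffffffff to both. Returns the region size in bytes.
 */
static uint64_t bar64_size(uint32_t lo, uint32_t hi)
{
    uint64_t mask = (((uint64_t)hi << 32) | lo) & PCI_BASE_ADDRESS_MEM_MASK;
    return ~mask + 1;
}
```

For an 8 GiB BAR the probe reads back lo = 0x0000000c and hi = 0xfffffffe, and the function returns 0x200000000. Storing that result in a u32 would truncate it to 0, which is the kind of silent error that leads to overlapping regions.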

-Kevin



Re: [Qemu-devel] [PATCH 2/3] Add a new PCI region type to supports 64 bit ranges

2011-12-28 Thread Alexey Korolev

On 29/12/11 00:30, Michael S. Tsirkin wrote:
> On Wed, Dec 28, 2011 at 06:26:05PM +1300, Alexey Korolev wrote:
>> This patch adds PCI_REGION_TYPE_PREFMEM_64 region type and modifies types of
>> variables to make it possible to work with 64 bit addresses.
>>
>> Why I've added just one region type PCI_REGION_TYPE_PREFMEM_64 and haven't
>> added PCI_REGION_TYPE_MEM_64? According to PCI architecture specification,
>> the bridges can describe 64bit ranges for prefetchable type of memory only.
>> So it's very unlikely that devices exporting 64bit non-prefetchable BARs.
>
> Might happen for system devices I guess.
>
>> Anyway this code will work with 64bit non-prefetchable BARs unless the PCI
>> device is not behind the secondary bus.
>
> So what happens if such a device is on root bus?

If a device is on the root bus and has BAR flags 0x4 (TYPE_MEMORY and
64 bit), memory is allocated in the 64bit range and all flags remain the
same. I tried this just out of curiosity and it appears to work well.








Re: [Qemu-devel] [PATCH 2/3] Add a new PCI region type to supports 64 bit ranges

2011-12-28 Thread Alexey Korolev

On 29/12/11 15:56, Kevin O'Connor wrote:
> On Wed, Dec 28, 2011 at 06:26:05PM +1300, Alexey Korolev wrote:
>> This patch adds PCI_REGION_TYPE_PREFMEM_64 region type and modifies types of
>> variables to make it possible to work with 64 bit addresses.
>>
>> Why I've added just one region type PCI_REGION_TYPE_PREFMEM_64 and haven't
>> added PCI_REGION_TYPE_MEM_64? According to PCI architecture specification,
>> the bridges can describe 64bit ranges for prefetchable type of memory only.
>> So it's very unlikely that devices exporting 64bit non-prefetchable BARs.
>> Anyway this code will work with 64bit non-prefetchable BARs unless the PCI
>> device is not behind the secondary bus.
> [...]
>> --- a/src/pciinit.c
>> +++ b/src/pciinit.c
>> @@ -22,6 +22,7 @@ enum pci_region_type {
>>  PCI_REGION_TYPE_IO,
>>  PCI_REGION_TYPE_MEM,
>>  PCI_REGION_TYPE_PREFMEM,
>> +PCI_REGION_TYPE_PREFMEM_64,
>>  PCI_REGION_TYPE_COUNT,
>>  };
>
> This doesn't seem right.  A 64bit bar is not a new category - it's
> just a way of representing larger values within the existing
> categories.  Tracking of 64bit prefmem sections separately from
> regular prefmem sections doesn't make sense, because both need to be
> allocated from the same pool when behind a bridge.

It's a way to account for all memory sections on the root bus, as
the root bus can have all 4 regions.
I don't like this part either; it causes confusion.

One possible solution is to have different descriptors for secondary buses
and the root bus.  In that case we can have 3 sections per secondary bus,
and the root bus will contain memory regions without any 'physical' meaning.


>>  struct pci_bus {
>>  struct {
>> -/* pci region stats */
>> -u32 count[32 - PCI_MEM_INDEX_SHIFT];
>> -u32 sum, max;
>>  /* seconday bus region sizes */
>>  u32 size;
>> -/* pci region assignments */
>> -u32 bases[32 - PCI_MEM_INDEX_SHIFT];
>> -u32 base;
>> +/* pci region stats */
>> +u32 max;
>> +u32 count[32 - PCI_MEM_INDEX_SHIFT];
>> +s64 sum;
>> + /* pci region assignments */
>> +s64 base;
>> +s64 bases[32 - PCI_MEM_INDEX_SHIFT];
>
> Why the choice of s64 over u64?  Given the amount of bit manipulation
> on these values, I think using u64 would be safer.

No problem.
Addresses cannot exceed 40 bits, so we would only touch the top bit
when a negative value is stored.



>> @@ -69,6 +72,8 @@ static enum pci_region_type pci_addr_to_type(u32 addr)
>>  {
>>  if (addr & PCI_BASE_ADDRESS_SPACE_IO)
>>  return PCI_REGION_TYPE_IO;
>> +if (addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
>> +return PCI_REGION_TYPE_PREFMEM_64;
>
> This seems dangerous - a 64bit bar can be non-prefetchable - getting
> this wrong could cause random (hard to debug) crashes.

It is a bit confusing, but this doesn't touch the actual types. So even if
we have a 64bit non-prefetchable BAR, the code will keep working.

>> @@ -378,19 +383,16 @@ static void pci_bios_check_devices(struct pci_bus *busses)
>>  struct pci_bus *bus = &busses[pci_bdf_to_bus(pci->bdf)];
>>  int i;
>>  for (i = 0; i < PCI_NUM_REGIONS; i++) {
>> -u32 val, size;
>> +u32 val, size, type;
>>  pci_bios_get_bar(pci, i, &val, &size);
>>  if (val == 0)
>>  continue;
>>
>> -pci_bios_bus_reserve(bus, pci_addr_to_type(val), size);
>> +type = pci_addr_to_type(val);
>> +pci_bios_bus_reserve(bus, type, size);
>>  pci->bars[i].addr = val;
>>  pci->bars[i].size = size;
>> -pci->bars[i].is64 = (!(val & PCI_BASE_ADDRESS_SPACE_IO) &&
>> - (val & PCI_BASE_ADDRESS_MEM_TYPE_MASK)
>> - == PCI_BASE_ADDRESS_MEM_TYPE_64);
>> -
>> -if (pci->bars[i].is64)
>> +if (type == PCI_REGION_TYPE_PREFMEM_64)
>>  i++;
>
> If there is a 64bit bar, then the size could be over 32bits - the code
> really needs to handle this (or it could cause overlapping regions
> which result in random crashes).

Agreed. Somehow I had it in my mind that BAR size is limited to 4GB,
but I just looked into the PCI spec and no such limitation is stated.
This requires bigger changes, as 64bit BAR size accounting touches
more computations than 64bit BAR addresses do.

It makes sense to first figure out how we should account for all memory
sections on the root bus.
Should there be separate structures for the root bus and secondary buses?
Do you have another idea?






Re: [Qemu-devel] [PATCH 2/3] Add a new PCI region type to supports 64 bit ranges

2011-12-28 Thread Alexey Korolev



>> @@ -69,6 +72,8 @@ static enum pci_region_type pci_addr_to_type(u32 addr)
>>  {
>>  if (addr & PCI_BASE_ADDRESS_SPACE_IO)
>>  return PCI_REGION_TYPE_IO;
>> +if (addr & PCI_BASE_ADDRESS_MEM_TYPE_64)
>> +return PCI_REGION_TYPE_PREFMEM_64;
>
> This seems dangerous - a 64bit bar can be non-prefetchable - getting
> this wrong could cause random (hard to debug) crashes.

Just out of curiosity - how could this happen? Having a 64bit
non-prefetchable BAR implies that the device is not behind
any bridge (as bridges describe 64bit ranges for prefetchable
memory only). Is that possible on present-day systems?



Re: [Qemu-devel] [PATCH 3/3] Changes related to secondary buses and 64bit regions

2011-12-28 Thread Alexey Korolev

On 29/12/11 00:43, Michael S. Tsirkin wrote:
> On Wed, Dec 28, 2011 at 06:35:55PM +1300, Alexey Korolev wrote:
>> All devices behind a bridge need to have all their regions consecutive and
>> not overlapping with all the normal memory ranges.
>> Since prefetchable memory is described by one record, we must avoid the
>> situations when 32bit and 64bit prefetchable regions are present within
>> one secondary bus.
>
> How do we avoid this? Assume we have two devices:
> a 32 bit and a 64 bit one, behind a bridge.
> There are two main things we can do:
> 1. Make the 64 bit device only use the low 32 bit

That was my first implementation. Unfortunately, older versions of Linux
(like 2.6.18) hang during startup with it.
As far as I remember that was with qemu-0.15, so maybe 1.0 has no such
issue. I will check this.

> 2. Put the 32 bit one in the non-prefetcheable range

I'd rather not do this. The BIOS should not change memory region types; it
will confuse guest OS drivers.

> 1 probably makes more sense for small BARs
> 2 probably makes more sense for large ones
>
> Try also looking at e.g. linux bus scanning code for more ideas.
>
> Another thing I don't see addressed here is that support for 64 bit
> ranges is I think optional in the bridge.
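
A minimal sketch of the check implied by the last remark, assuming the usual PCI-to-PCI bridge encoding in which the low nibble of the prefetchable memory base register reports whether the window decodes 64bit addresses. The constants mirror those in Linux's pci_regs.h; both function names are hypothetical, and the second function is only one possible way to combine this check with option 1 above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low nibble of the bridge's prefetchable memory base register. */
#define PCI_PREF_RANGE_TYPE_MASK 0x0f
#define PCI_PREF_RANGE_TYPE_32   0x00
#define PCI_PREF_RANGE_TYPE_64   0x01

/* True if the bridge can decode a prefetchable window above 4 GiB. */
static bool bridge_pref_window_is_64bit(uint16_t pref_base_reg)
{
    return (pref_base_reg & PCI_PREF_RANGE_TYPE_MASK) == PCI_PREF_RANGE_TYPE_64;
}

/*
 * The bridge has a single prefetchable window, so it may be placed above
 * 4 GiB only if the bridge supports that and every prefetchable BAR
 * behind it is 64bit capable. Otherwise everything stays below 4 GiB.
 */
static bool prefmem_may_go_above_4g(uint16_t pref_base_reg,
                                    bool all_pref_bars_64bit)
{
    return bridge_pref_window_is_64bit(pref_base_reg) && all_pref_bars_64bit;
}
```

Falling back to the low 32bit range whenever either condition fails corresponds to option 1; it avoids changing any region type but can exhaust the space below 4 GiB when the BARs are large.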


>> Signed-off-by: Alexey Korolev
>
> Whitespace is corrupted: check your mail setup?
> There should be spaces around operators:
> a < b, I see a<  b. Sometimes a<   b (two spaces after).

Yes, it's Thunderbird :(. Sorry about that.
It seems the patches need some functional changes.

---
  src/pciinit.c |   69 +++-
  1 files changed, 48 insertions(+), 21 deletions(-)

diff --git a/src/pciinit.c b/src/pciinit.c
index a574e38..92942d5 100644
--- a/src/pciinit.c
+++ b/src/pciinit.c
@@ -17,6 +17,7 @@

  #define PCI_BRIDGE_IO_MIN  0x1000
  #define PCI_BRIDGE_MEM_MIN   0x10
+#define PCI_BRIDGE_MEM_MAX   0x8000

  enum pci_region_type {
  PCI_REGION_TYPE_IO,
@@ -45,6 +46,7 @@ struct pci_bus {
  s64 base;
  s64 bases[32 - PCI_MEM_INDEX_SHIFT];
  } r[PCI_REGION_TYPE_COUNT];
+int is64;
  struct pci_device *bus_dev;
  };

@@ -369,6 +371,26 @@ static void pci_bios_bus_reserve(struct pci_bus *bus, int type, u32 size)
  bus->r[type].max = size;
  }

+static void pci_bios_secondary_bus_reserve(struct pci_bus *parent,
+   struct pci_bus *s, int type)
+{
+u32 limit = (type == PCI_REGION_TYPE_IO) ?
+PCI_BRIDGE_IO_MIN : PCI_BRIDGE_MEM_MIN;
+
+if (s->r[type].sum>   PCI_BRIDGE_MEM_MAX) {
+panic("Size: %08x%08x is too big\n",
+(u32)s->r[type].sum, (u32)(s->r[type].sum>>32));
+}
+s->r[type].size = (u32)s->r[type].sum;
+if (s->r[type].size<   limit)
+s->r[type].size = limit;
+s->r[type].size = pci_size_roundup(s->r[type].size);
+
+pci_bios_bus_reserve(parent, type, s->r[type].size);
+dprintf(1, "size: %x, type %s\n",
+s->r[type].size, region_type_name[type]);
+}
+
  static void pci_bios_check_devices(struct pci_bus *busses)
  {
  dprintf(1, "PCI: check devices\n");
@@ -392,8 +414,10 @@ static void pci_bios_check_devices(struct pci_bus *busses)
  pci_bios_bus_reserve(bus, type, size);
  pci->bars[i].addr = val;
  pci->bars[i].size = size;
-if (type == PCI_REGION_TYPE_PREFMEM_64)
+if (type == PCI_REGION_TYPE_PREFMEM_64) {
+bus->is64 = 1;
  i++;
+}
  }
  }

@@ -404,22 +428,21 @@ static void pci_bios_check_devices(struct pci_bus *busses)
  if (!s->bus_dev)
  continue;
  struct pci_bus *parent =&busses[pci_bdf_to_bus(s->bus_dev->bdf)];
+
+if (s->r[PCI_REGION_TYPE_PREFMEM_64].sum&&

Space before&&  here and elsewhere.


+   s->r[PCI_REGION_TYPE_PREFMEM].sum) {
+   panic("Sparse PCI prefmem regions on the bus %d\n", secondary_bus);
+}
+
+dprintf(1, "PCI: secondary bus %d\n", secondary_bus);
  int type;
  for (type = 0; type<   PCI_REGION_TYPE_COUNT; type++) {
-u32 limit = (type == PCI_REGION_TYPE_IO) ?
-PCI_BRIDGE_IO_MIN : PCI_BRIDGE_MEM_MIN;
-s->r[type].size = s->r[type].sum;
-if (s->r[type].size<   limit)
-s->r[type].size = limit;
-s->r[type].size = pci_size_roundup(s->r[type].size);
-pci_bios_bus_reserve(parent, type, s->r[type].size);
-}
-dprintf(1, "PCI: secondary bus %d sizes: io %x, mem %x, prefmem %x\n",
-secondary_bus,
-s->r[PCI_REGION_TYPE_IO].size,
-s->r[PCI_REGION_TYPE_MEM].size,
-s->r[PCI_REGION_TYPE_PREFMEM].size);
-}
+if ((type == PCI_REGION_TYPE_PREFMEM_64&&   !s->is64) ||
+(type == PCI_REGION_TYPE_PREFMEM&&   s->is64))
+continue;

Can't figure this out. What does this d

Re: [Qemu-devel] [PATCH 2/3] target-mips:enabling of 64 bit user mode and floating point operations MIPS_HFLAG_UX is included in env->hflags so that the address computation for LD instruction does not

2011-12-28 Thread Khansa Butt
On Fri, Dec 9, 2011 at 5:04 AM, Andreas Färber  wrote:
> Thanks for extending the commit description. Please see this for a
> template though:
>
> http://live.gnome.org/Git/CommitMessages
>
> Looks like there's an empty line missing between subject and description
> (and the space after "target-mips:").
>
> On 08.12.2011 06:25, kha...@kics.edu.pk wrote:
>> From: Khansa Butt 
>>
>>
>> Signed-off-by: Abdul Qadeer 
>> ---
>>  target-mips/translate.c |    4 
>>  1 files changed, 4 insertions(+), 0 deletions(-)
>>
>> diff --git a/target-mips/translate.c b/target-mips/translate.c
>> index d5b1c76..452a63b 100644
>> --- a/target-mips/translate.c
>> +++ b/target-mips/translate.c
>> @@ -12779,6 +12779,10 @@ void cpu_reset (CPUMIPSState *env)
>>          env->hflags |= MIPS_HFLAG_FPU;
>>      }
>>  #ifdef TARGET_MIPS64
>> +    env->hflags |=  MIPS_HFLAG_UX;
>
> So for those of us not knowing mips, it's defined as:
>
> #define MIPS_HFLAG_UX     0x00200 /* 64-bit user mode                 */
>
> The code above is inside CONFIG_USER_ONLY, so this looks right for n64
> but not for n32 ABI.
>
> If you put this into its own patch with a description of
>
> ---8<---
> target-mips: Enable 64 bit user mode for n64
>
> For user mode n64 ABI emulation, MIPS_HFLAG_UX is included in
> env->hflags so that the address computation for LD instruction does not
> get treated as 32 bit code, see gen_op_addr_add() in translate.c.
>
> Signed-off-by: Abdul Qadeer 
> Signed-off-by: (you)
> ---8<---
>
> and make it depend on TARGET_ABI_MIPSN64 then I will happily add my
> Acked-by.
>
>
>> +    /* if cpu has FPU, MIPS_HFLAG_F64 must be included in env->hflags
>> +       so that floating point operations can be emulated */
>> +    env->active_fpu.fcr0 = env->cpu_model->CP1_fcr0;
>>      if (env->active_fpu.fcr0 & (1 << FCR0_F64)) {
>>          env->hflags |= MIPS_HFLAG_F64;
>>      }
>
> Nack. env->active_fpu.fcr0 gets initialized in translate_init.c based on
> cpu_model->CR1_fcr0, where FCR0_F64 is set only for 24Kf, 34Kf,
> MIPS64R2-generic. TARGET_ABI_MIPSN64 linux-user defaults to 20Kc. So it
> seems to rather be an issue of using the right -cpu parameter or
> changing the default for n64. [cc'ing Nathan, who introduced the if]

The reason I added the line "env->active_fpu.fcr0 =
env->cpu_model->CP1_fcr0" is as follows. In translate_init.c,
fpu_init() initializes active_fpu for the given CPU model. Afterwards,
cpu_reset() resets those values to zero with
memset(env, 0, offsetof(CPUMIPSState, breakpoints));
so whatever the value of cpu_model->CP1_fcr0 was, env->active_fpu.fcr0
is now zero. That is why I added the line above: it restores the correct
env->active_fpu.fcr0 value for the CPU model (whether that is 24Kf, 20Kc
or something else).
I observed this issue during the development of mips64-linux-user: I
ran qemu-mips64 with the -cpu option set to MIPS64R2-generic and got an
illegal instruction error, so I used the hunk above.
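
The effect described above can be reproduced with a small stand-alone sketch. The structures and names below (toy_env, toy_cpu_model, toy_reset) are hypothetical stand-ins and not the real CPUMIPSState; the field value is arbitrary. The sketch only shows how a memset up to offsetof(..., breakpoints) wipes every field declared before that marker, and how re-reading the model value afterwards restores it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-model CPU definition. */
struct toy_cpu_model {
    uint32_t fcr0;
};

/* Hypothetical stand-in for the CPU state structure. */
struct toy_env {
    uint32_t hflags;
    uint32_t fcr0;        /* declared before 'breakpoints': cleared on reset */
    int breakpoints;      /* marker: fields from here on survive reset */
    const struct toy_cpu_model *cpu_model;
};

/* Clears everything before 'breakpoints', optionally restoring fcr0. */
static void toy_reset(struct toy_env *env, int restore_fcr0)
{
    memset(env, 0, offsetof(struct toy_env, breakpoints));
    if (restore_fcr0)
        env->fcr0 = env->cpu_model->fcr0;
}
```

Without the restore step, fcr0 stays zero after reset regardless of the model, so a later test of a bit in fcr0 can never succeed.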

>
> Andreas