date:20120312

Re: [Qemu-devel] [PATCH] fix bug of isa_bus irq

2012-03-12 Thread Wanpeng Li

On Sun, Mar 11, 2012 at 08:46:38AM +0100, Jan Kiszka wrote:
On 2012-03-11 08:04, Wanpeng Li wrote:
 ISA bus only use IRQ 0~15, so don't need to give an array qemu_irq 0~23, just
 array qemu_irq i8259 is ok.
 
 Signed-off-by: Wanpeng Li l...@linux.vnet.ibm.com
 ---
  hw/pc_piix.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)
 
 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 63dba4e..52f7cf8 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -210,7 +210,6 @@ static void pc_init1(MemoryRegion *system_memory,
  isa_bus = isa_bus_new(NULL, system_io);
  no_hpet = 1;
  }
 -isa_bus_irqs(isa_bus, gsi);
  
  if (kvm_irqchip_in_kernel()) {
  i8259 = kvm_i8259_init(isa_bus);
 @@ -221,6 +220,8 @@ static void pc_init1(MemoryRegion *system_memory,
  i8259 = i8259_init(isa_bus, cpu_irq[0]);
  }
  
 +isa_bus_irqs(isa_bus, i8259);
 +
  for (i = 0; i  ISA_NUM_IRQS; i++) {
  gsi_state-i8259_irq[i] = i8259[i];
  }

This is bogus. isa_bus_irqs sets the output IRQs of the ISA bus. And
those are not only delivered to the PIC on the PIIX2, but also the
IOAPIC. Thus we have to pass in the GSI input lines which dispatch to
both. Of those lines, only the first 16 will be used by the ISA bus
(there is even an assert to ensure this).

Did you see any concrete bug in the context of this logic?

Jan


Yes, but actually PIC is being used at present, whether passing qemu_irq
0~23 to isa_bus is not safe or not.

Wanpeng Li

Re: [Qemu-devel] [RFC][PATCH 05/16 v8] Add API to get memory mapping

2012-03-12 Thread HATAYAMA Daisuke

From: Jan Kiszka jan.kis...@siemens.com
Subject: Re: [RFC][PATCH 05/16 v8] Add API to get memory mapping
Date: Fri, 09 Mar 2012 14:24:41 +0100

 On 2012-03-09 13:53, HATAYAMA Daisuke wrote:
 From: Jan Kiszka jan.kis...@siemens.com
 Subject: Re: [RFC][PATCH 05/16 v8] Add API to get memory mapping
 Date: Fri, 09 Mar 2012 11:06:30 +0100

 On 2012-03-09 11:05, Jan Kiszka wrote:
 If crash can work both with and without paging, it should be default
 *on* to avoid writing cores that can later on only be analyzed with that
 tool. Still not sure, though, if that changes the requirement on what
 memory regions should be written in that mode.

 If this logic is not remvoed, crash can work both with and without paging.
 But the default value is 'off' now, because the option is '-p'.

 And this would be unfortunate if you do not want to use crash for
 analyzing (I'm working on gdb python scripts which will make gdb - one
 day - at least as powerful as crash). If paging mode has the same
 information that non-paging mode has, I would even suggest to drop it.

 Err, with it = non-paging mode.

 Jan

 Paging at default is not good idea. Performing paging in qemu has risk.

   - Guest machine is not always in the state where paging mode is
 enabled. Also, CR3 doesn't always refer to page table.

 That's detectable and means physical == linear address.

CR0 itself is one of the guest's resources. There's still the issue
whether to trust CR0 or not.

The assumption behind my idea is the host is running in a good
condition but the quest in a bad condition. So we can use qemu dump,
which is the host's feature.

Only checking if CR3 refers to page table correctly is considerably
complicated. CR3 can have any physical address. There's no hint such
as magic number to check it's invalid. So, to check if CR3 correctly
points at page table, the only way is to check if we can see the data
through actually performing paging for some virtual address. The
virtual address would be better if it's more typical one, but it tends
to be guest specific, and I think it's not suitable for qemu due to OS
dependency.
# Of course, this story is done out of the assumption that the data
# could be corrupted.

   - If guest machine is in catastrophic state, its memory data could
 be corrupted. Then, we cannot trust such corrupted page table.

 OK, here I agree.

 # In this point, managing PT_LOAD program headers based on such
 # potencially corruppted data has risk.

 The idea of yours that performing paging in debugger side is better
 than doing in qemu.

 Another alternative is to add guest-awareness to the dump code. If we
 detect (or get told) that the target is a Linux kernel, at least the
 linear kernel mapping can be written reliably.

 But also the fact that there can be as many different page tables as
 active processors means that paging likely needs a second thought and
 more awareness of the debugger.

Also, it seems to me that the lacking feature of gdb used to dumpfile
compared with used to running guest is paging only. Paging is a
feature in architecture, which is unlikely to change in the
future. It's better in maintainability than introducing OS-level
dependency.

Thanks.
HATAYAMA, Daisuke

Re: [Qemu-devel] How to trace all the guest OS instructions and the micro-ops

2012-03-12 Thread Chen Yufei

On Mon, Mar 12, 2012 at 5:43 AM, Mulyadi Santosa
mulyadi.sant...@gmail.com wrote:
 Hi

 On Sun, Mar 11, 2012 at 10:12, Yue Chen ycyc...@gmail.com wrote:
 I am doing some research based on the QEMU. Does anyone know how to get
 (trace) all the instructions of the guest OS, and get all the intermediate
 micro-ops ?  (Not in the 0.9.1 version)

QEMU has release version 1.0.1. Why are you still using 0.9.1?


 I believe it's -d option you're looking for. Please read qemu manual
 for further clarification and info.

-d can only give a static view of what instruction is translated,
but can't get a dynamic instruction execution trace.


 Additionally, how to get the whole memory or each process' memory data of
 the guest OS?

 you wanna do that simply from Qemu's monitor? I don't think that's
 doable...or at least easily. Qemu sees guest RAM like your physical
 RAM. It doesn't differentiate which pages belongs to which process.
 You need to hook or go straight inside the guest OS, maybe using gdb
 or other tool to get the core dump of those processes.

 I really appreciate your help.

 Hope it helps...

 --
 regards,

 Mulyadi Santosa
 Freelance Linux trainer and consultant

 blog: the-hydra.blogspot.com
 training: mulyaditraining.blogspot.com




-- 
Best regards,
Chen Yufei

Re: [Qemu-devel] [RFC][PATCH 05/16 v8] Add API to get memory mapping

2012-03-12 Thread HATAYAMA Daisuke

From: HATAYAMA Daisuke d.hatay...@jp.fujitsu.com
Subject: Re: [Qemu-devel] [RFC][PATCH 05/16 v8] Add API to get memory mapping
Date: Mon, 12 Mar 2012 15:16:55 +0900 (   )

 The assumption behind my idea is the host is running in a good
 condition but the quest in a bad condition. So we can use qemu dump,
 which is the host's feature.

There's also the situation that a guest is running in a good condition
and users can recognize that its data is obviously not corrupted. For
such situation, it's natural for qemu dump to have paging mode to some
amount.

Thanks.
HATAYAMA, Daisuke

[Qemu-devel] [PATCH v2 1/2] block: add the support to drain throttled requests

2012-03-12 Thread zwu . kernel

From: Zhi Yong Wu wu...@linux.vnet.ibm.com

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 block.c |   21 +
 block_int.h |1 +
 2 files changed, 22 insertions(+), 0 deletions(-)

diff --git a/block.c b/block.c
index 52ffe14..0825168 100644
--- a/block.c
+++ b/block.c
@@ -853,6 +853,21 @@ void bdrv_close_all(void)
 }
 }
 
+/**
+ * Complete all pending requests for a block device
+ */
+void bdrv_drain(BlockDriverState *bs)
+{
+do {
+qemu_co_queue_restart_all(bs-throttled_reqs);
+} while (!qemu_co_queue_empty(bs-throttled_reqs));
+
+qemu_aio_flush();
+
+assert(QLIST_EMPTY(bs-tracked_requests));
+assert(qemu_co_queue_empty(bs-throttled_reqs));
+}
+
 /*
  * Wait for pending requests to complete across all BlockDriverStates
  *
@@ -863,6 +878,12 @@ void bdrv_drain_all(void)
 {
 BlockDriverState *bs;
 
+QTAILQ_FOREACH(bs, bdrv_states, list) {
+do {
+qemu_co_queue_restart_all(bs-throttled_reqs);
+} while (!qemu_co_queue_empty(bs-throttled_reqs));
+}
+
 qemu_aio_flush();
 
 /* If requests are still pending there is a bug somewhere */
diff --git a/block_int.h b/block_int.h
index b460c36..c624399 100644
--- a/block_int.h
+++ b/block_int.h
@@ -318,6 +318,7 @@ void qemu_aio_release(void *p);
 
 void bdrv_set_io_limits(BlockDriverState *bs,
 BlockIOLimit *io_limits);
+void bdrv_drain(BlockDriverState *bs);
 
 #ifdef _WIN32
 int is_windows_drive(const char *filename);
-- 
1.7.6

[Qemu-devel] [PATCH v2 2/2] block: drain throttled requests for one block device

2012-03-12 Thread zwu . kernel

From: Zhi Yong Wu wu...@linux.vnet.ibm.com

Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
---
 blockdev.c |4 ++--
 hw/ide/macio.c |2 +-
 hw/ide/pci.c   |3 ++-
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d78aa51..1bc4667 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -694,7 +694,7 @@ void qmp_blockdev_snapshot_sync(const char *device, const 
char *snapshot_file,
 return;
 }
 
-bdrv_drain_all();
+bdrv_drain(bs);
 bdrv_flush(bs);
 
 bdrv_close(bs);
@@ -1014,7 +1014,7 @@ int do_drive_del(Monitor *mon, const QDict *qdict, 
QObject **ret_data)
 }
 
 /* quiesce block driver; prevent further io */
-bdrv_drain_all();
+bdrv_drain(bs);
 bdrv_flush(bs);
 bdrv_close(bs);
 
diff --git a/hw/ide/macio.c b/hw/ide/macio.c
index a4df244..454a020 100644
--- a/hw/ide/macio.c
+++ b/hw/ide/macio.c
@@ -192,7 +192,7 @@ static void pmac_ide_flush(DBDMA_io *io)
 MACIOIDEState *m = io-opaque;
 
 if (m-aiocb) {
-bdrv_drain_all();
+bdrv_drain(m-aiocb-bs);
 }
 }
 
diff --git a/hw/ide/pci.c b/hw/ide/pci.c
index 88c0942..205d65a 100644
--- a/hw/ide/pci.c
+++ b/hw/ide/pci.c
@@ -27,6 +27,7 @@
 #include hw/pci.h
 #include hw/isa.h
 #include block.h
+#include block_int.h
 #include dma.h
 
 #include hw/ide/pci.h
@@ -309,7 +310,7 @@ void bmdma_cmd_writeb(BMDMAState *bm, uint32_t val)
  * aio operation with preadv/pwritev.
  */
 if (bm-bus-dma-aiocb) {
-bdrv_drain_all();
+bdrv_drain(bm-bus-dma-aiocb-bs);
 assert(bm-bus-dma-aiocb == NULL);
 assert((bm-status  BM_STATUS_DMAING) == 0);
 }
-- 
1.7.6

Re: [Qemu-devel] How to trace all the guest OS instructions and the micro-ops

2012-03-12 Thread Yue Chen

Thanks a lot. So any approach to get the dynamic or static whole memory
information of the guest OS ? Not the memory of each process.

Sorry for the confusion. I do use version 1.0.1.  I mention not in 0.9.1
because someone has already implemented the dynamic tracing in 0.9.1, but
not in the latest version.




On Mon, Mar 12, 2012 at 2:20 AM, Chen Yufei cyfde...@gmail.com wrote:

 On Mon, Mar 12, 2012 at 5:43 AM, Mulyadi Santosa
 mulyadi.sant...@gmail.com wrote:
  Hi
 
  On Sun, Mar 11, 2012 at 10:12, Yue Chen ycyc...@gmail.com wrote:
  I am doing some research based on the QEMU. Does anyone know how to get
  (trace) all the instructions of the guest OS, and get all the
 intermediate
  micro-ops ?  (Not in the 0.9.1 version)

 QEMU has release version 1.0.1. Why are you still using 0.9.1?

 
  I believe it's -d option you're looking for. Please read qemu manual
  for further clarification and info.

 -d can only give a static view of what instruction is translated,
 but can't get a dynamic instruction execution trace.

 
  Additionally, how to get the whole memory or each process' memory data
 of
  the guest OS?
 
  you wanna do that simply from Qemu's monitor? I don't think that's
  doable...or at least easily. Qemu sees guest RAM like your physical
  RAM. It doesn't differentiate which pages belongs to which process.
  You need to hook or go straight inside the guest OS, maybe using gdb
  or other tool to get the core dump of those processes.
 
  I really appreciate your help.
 
  Hope it helps...
 
  --
  regards,
 
  Mulyadi Santosa
  Freelance Linux trainer and consultant
 
  blog: the-hydra.blogspot.com
  training: mulyaditraining.blogspot.com
 



 --
 Best regards,
 Chen Yufei

Re: [Qemu-devel] [PATCH] fix bug of isa_bus irq

2012-03-12 Thread Jan Kiszka

On 2012-03-12 07:08, Wanpeng Li wrote:
 On Sun, Mar 11, 2012 at 08:46:38AM +0100, Jan Kiszka wrote:
 On 2012-03-11 08:04, Wanpeng Li wrote:
 ISA bus only use IRQ 0~15, so don't need to give an array qemu_irq 0~23, 
 just
 array qemu_irq i8259 is ok.

 Signed-off-by: Wanpeng Li l...@linux.vnet.ibm.com
 ---
  hw/pc_piix.c |3 ++-
  1 files changed, 2 insertions(+), 1 deletions(-)

 diff --git a/hw/pc_piix.c b/hw/pc_piix.c
 index 63dba4e..52f7cf8 100644
 --- a/hw/pc_piix.c
 +++ b/hw/pc_piix.c
 @@ -210,7 +210,6 @@ static void pc_init1(MemoryRegion *system_memory,
  isa_bus = isa_bus_new(NULL, system_io);
  no_hpet = 1;
  }
 -isa_bus_irqs(isa_bus, gsi);
  
  if (kvm_irqchip_in_kernel()) {
  i8259 = kvm_i8259_init(isa_bus);
 @@ -221,6 +220,8 @@ static void pc_init1(MemoryRegion *system_memory,
  i8259 = i8259_init(isa_bus, cpu_irq[0]);
  }
  
 +isa_bus_irqs(isa_bus, i8259);
 +
  for (i = 0; i  ISA_NUM_IRQS; i++) {
  gsi_state-i8259_irq[i] = i8259[i];
  }

 This is bogus. isa_bus_irqs sets the output IRQs of the ISA bus. And
 those are not only delivered to the PIC on the PIIX2, but also the
 IOAPIC. Thus we have to pass in the GSI input lines which dispatch to
 both. Of those lines, only the first 16 will be used by the ISA bus
 (there is even an assert to ensure this).

 Did you see any concrete bug in the context of this logic?

 Jan

 
 Yes, but actually PIC is being used at present, whether passing qemu_irq
 0~23 to isa_bus is not safe or not.

Sorry, IRQ routing to PIC and IOAPIC is actually not a property of the
PIIX3 but the board we emulate. And here we follow the Multiprocessor
Specification of Intel and route ISA bus IRQs to both interrupt
controllers. Thus the bus must be connected to the GSIs. And, again,
GSI[16..13] aren't referenced by the ISA bus at any time.

Jan



signature.asc
Description: OpenPGP digital signature

Re: [Qemu-devel] [PATCH v2 1/2] block: add the support to drain throttled requests

2012-03-12 Thread Paolo Bonzini

Il 12/03/2012 07:29, zwu.ker...@gmail.com ha scritto:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com
 
 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  block.c |   21 +
  block_int.h |1 +
  2 files changed, 22 insertions(+), 0 deletions(-)
 
 diff --git a/block.c b/block.c
 index 52ffe14..0825168 100644
 --- a/block.c
 +++ b/block.c
 @@ -853,6 +853,21 @@ void bdrv_close_all(void)
  }
  }
  
 +/**
 + * Complete all pending requests for a block device
 + */
 +void bdrv_drain(BlockDriverState *bs)
 +{
 +do {
 +qemu_co_queue_restart_all(bs-throttled_reqs);
 +} while (!qemu_co_queue_empty(bs-throttled_reqs));
 +
 +qemu_aio_flush();

This doesn't work, qemu_aio_flush can start new I/O.

Paolo

Re: [Qemu-devel] QAPI conversion status and async commands support

2012-03-12 Thread Alon Levy

On Wed, Mar 07, 2012 at 10:06:31PM +0200, Alon Levy wrote:
 On Wed, Mar 07, 2012 at 11:36:06AM -0600, Anthony Liguori wrote:
  On 03/07/2012 11:29 AM, Paolo Bonzini wrote:
  Il 07/03/2012 17:36, Luiz Capitulino ha scritto:
  Hi there,
  
  In the last few weeks we've had some proposals for new QMP commands that 
  need
  to be asynchronous. As we lack a standard asynchronous API today, each 
  command
  ends up adding its own way to execute in the background.
  
  This multiplies the API complexity as each command has to be implemented 
  and
  learned by clients separately, with their own way of doing more or less 
  the
  same things.
  
  The solution for this, envisioned for us for a long time now, is to 
  introduce
  an unified QMP API for asynchronous commands.
  
  But before doing this we have to:
  
 1. Finish the commands conversion to the QAPI
  
This is almost done, the only missing commands are: 
   add_graphics_client,
do_closefd, do_device_add, do_device_del, do_getfd, do_migrate,
do_netdev_add, do_netdev_del, do_qmp_capabilities and 
   do_screen_dump.
  
Note that do_migrate has already been posted to the list, and I have
the screendump more or less done. Also, Anthony has an old branch 
   where most
of the conversions are already done, they just need to be rebased  
   tested.
  
 2. Integrate the new QAPI server
  
Implemented by Anthony, may have missing pieces.
  
 3. Implement async command support
  
  
  I think the missing commands to be converted can be done in around one 
  week,
  but unfortunately I've been busy at other things and will need a few days 
  to
  resume this work. Then there's the new QAPI server  async support, which 
  I'm
  not sure how much time we'll need to integrate them, but we should have 
  this
  done for 1.1.
  
  The main question is: what should we do for the already posted async 
  commands?
  Should we hold them until we finish this work?
  
  I think yes, and we could even have a list of features without which 1.1
  should not ship.  QOM buses, drive mirroring and QAPI async command
  support may be them.  Perhaps qtest too.
  
  Okay, let's get serious about what we can and can't do.
  
  Hard freeze for 1.1 is May 1st which is roughly 6 weeks from now.
  
  I think QOM buses can go in no problem along with qtest.  I would be
  okay considering QOM buses a release blocker but probably not qtest.
  
  I'm not really sure about drive mirroring.  Is the work already done
  such that we just need to talk about merging it?
  
  With QAPI async command, I don't think 1.1 is a viable target.
  We're not just talking about converting existing commands to QAPI,
  but also replacing the QMP server infrastructure.  I don't think
  that is a change that should be made at the tail end of the
  development cycle.
 
 ... so, Avi, Anthony, as:
 1. screendump is broken for qxl
 2. there will not be QAPI async for 1.1
 3. monitor async is unacceptable by maintainer
 
 Is screendump-async to be accepted?

I've send a patchset that works for hmp and I intend to fix libvirt by
using the hmp screendump command instead of the qmp one for now. See:
 http://lists.gnu.org/archive/html/qemu-devel/2012-03/msg01802.html

 
  
  Regards,
  
  Anthony Liguori
  
  
  Paolo

Re: [Qemu-devel] seamless migration with spice

2012-03-12 Thread Gerd Hoffmann

  Hi,

 What about the second part? it's independant of the async issue.
 
 Isn't this a client problem?  The client has this state, no?

It is state of the client - server session.  Today spice client
creates a new session on migration, so there is simply no need to
maintain any state.  Drawback is that everything needs to be resent from
the server to the client.  Thats why we want be able to continue the
spice session, so the client caches will stay valid.

Of course the spice-server on the migration target needs the session
state for that, i.e. know for example which bits the client has cached
which it hasn't.

 If the state is stored in the server, wouldn't it be marshaled as part
 of the server's migration state?

spice-server is stateless today when it comes to migration.  QXL handles
all (device) state, by keeping track of some commands (such as
create/destroy surface) which it needs to transfer on migration, and by
asking spice-server to render all surfaces on migration, which
effectively flushes the spice server state to qxl device memory.

To transfer the client session state there are basically two options:

  (a) transfer it as part of the qemu migration data stream.  I don't
  want have any details about the qemu migration implementation
  and/or protocol in the spice-server library api, which basically
  leaves a ugly transfer-this-blob-for-me-please style interface
  as only option.

  (b) transfer it as part of the spice protocol.  As the spice
  client has a connection to both source and target while the
  migration runs we can send session state from the source host via
  spice client to the target host.  This needs some form of
  synchronization, to make sure both vmstate and spice migration
  are completed when qemu on the source machine quits.

I think (b) is the better approach.

cheers,
  Gerd

Re: [Qemu-devel] [PATCH v2 2/3] make check: Add qemu-iotests subset

2012-03-12 Thread Kevin Wolf

Am 09.03.2012 20:31, schrieb Anthony Liguori:
 On 03/09/2012 06:46 AM, Kevin Wolf wrote:
 Run the 'quick' group from qemu-iotests during 'make check'.

 Signed-off-by: Kevin Wolfkw...@redhat.com
 ---
   tests/Makefile  |5 +
   tests/qemu-iotests-quick.sh |   17 +
   2 files changed, 22 insertions(+), 0 deletions(-)
   create mode 100755 tests/qemu-iotests-quick.sh

 diff --git a/tests/Makefile b/tests/Makefile
 index 74b29dc..571ad42 100644
 --- a/tests/Makefile
 +++ b/tests/Makefile
 @@ -1,6 +1,9 @@
 +export SRC_PATH
 +
   CHECKS = check-qdict check-qfloat check-qint check-qstring check-qlist
   CHECKS += check-qjson test-qmp-output-visitor test-qmp-input-visitor
   CHECKS += test-string-input-visitor test-string-output-visitor 
 test-coroutine
 +CHECKS += $(SRC_PATH)/tests/qemu-iotests-quick.sh
 
 Not to suggest that we don't start here, but I think we should look into how 
 to 
 make qemu-iotest use gtester in the near future.
 
 That will allow the qemu-iotest to be part of the make check-report output 
 and 
 will provide an easy way for other tools (like autotest) to run these tests.

Yes, we need to figure out how to integrate shell scripts with gtester.
If we can't, we need to rethink gtester as our framework and put
something more flexible on top.

Kevin

[Qemu-devel] [PATCH 1/4] add MIPS DSP helpers define

2012-03-12 Thread Jia Liu


This patch is the helper define of MIPS ASE DSP.

Signed-off-by: Jia Liu pro...@gmail.com
---
 target-mips/helper.h |  152 ++
 1 files changed, 152 insertions(+), 0 deletions(-)

diff --git a/target-mips/helper.h b/target-mips/helper.h
index 442f684..1abf582 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -297,4 +297,156 @@ DEF_HELPER_0(rdhwr_ccres, tl)
 DEF_HELPER_1(pmon, void, int)
 DEF_HELPER_0(wait, void)
 
+/* MIPS32 DSP */
+DEF_HELPER_1(absqsph,   i32,  i32)
+DEF_HELPER_1(absqsw,i32,  i32)
+DEF_HELPER_2(addqph,i32,  i32, i32)
+DEF_HELPER_2(addqsph,   i32,  i32, i32)
+DEF_HELPER_2(addqsw,i32,  i32, i32)
+DEF_HELPER_2(addsc, i32,  i32, i32)
+DEF_HELPER_2(addwc, i32,  i32, i32)
+DEF_HELPER_1(bitrev,i32,  i32)
+DEF_HELPER_2(cmpeqph,   void, i32, i32)
+DEF_HELPER_2(cmpltph,   void, i32, i32)
+DEF_HELPER_2(cmpleph,   void, i32, i32)
+DEF_HELPER_2(cmpgueqqb, i32,  i32, i32)
+DEF_HELPER_2(cmpgultqb, i32,  i32, i32)
+DEF_HELPER_2(cmpguleqb, i32,  i32, i32)
+DEF_HELPER_2(cmpueqqb,  void, i32, i32)
+DEF_HELPER_2(cmpultqb,  void, i32, i32)
+DEF_HELPER_2(cmpuleqb,  void, i32, i32)
+DEF_HELPER_3(dpaqswph,  void, int, i32, i32)
+DEF_HELPER_3(dpaqsalw,  void, int, i32, i32)
+DEF_HELPER_3(dpauhqbl,  void, int, i32, i32)
+DEF_HELPER_3(dpauhqbr,  void, int, i32, i32)
+DEF_HELPER_3(dpsqswph,  void, int, i32, i32)
+DEF_HELPER_3(dpsqsalw,  void, int, i32, i32)
+DEF_HELPER_3(dpsuhqbl,  void, int, i32, i32)
+DEF_HELPER_3(dpsuhqbr,  void, int, i32, i32)
+DEF_HELPER_3(extp,  void, int, int, int)
+DEF_HELPER_3(extpdp,void, int, int, int)
+DEF_HELPER_3(extpdpv,   void, int, i32, int)
+DEF_HELPER_3(extpv, void, int, i32, int)
+DEF_HELPER_3(extrsh,void, int, int, int)
+DEF_HELPER_3(extrw, void, int, int, int)
+DEF_HELPER_3(extrrw,void, int, int, int)
+DEF_HELPER_3(extrrsw,   void, int, int, int)
+DEF_HELPER_2(extrvsh,   i32,  int, i32);
+DEF_HELPER_3(extrvw,void, int, i32, int)
+DEF_HELPER_3(extrvrw,   void, int, i32, int)
+DEF_HELPER_3(extrvrsw,  void, int, i32, int)
+DEF_HELPER_3(insv,  void, int, i32, i32)
+DEF_HELPER_2(lbux,  tl,   tl,  int)
+DEF_HELPER_2(lhx,   i32,  i32, int)
+DEF_HELPER_2(lwx,   i32,  i32, int)
+DEF_HELPER_3(maqswphl,  void, int, i32, i32)
+DEF_HELPER_3(maqsawphl, void, int, i32, i32)
+DEF_HELPER_3(maqswphr,  void, int, i32, i32)
+DEF_HELPER_3(maqsawphr, void, int, i32, i32)
+DEF_HELPER_2(modsub,i32,  i32, i32)
+DEF_HELPER_2(mthlip,void, int, i32)
+DEF_HELPER_2(muleqswphl,i32,  i32, i32)
+DEF_HELPER_2(muleqswphr,i32,  i32, i32)
+DEF_HELPER_2(muleusphqbl,   i32,  i32, i32)
+DEF_HELPER_2(muleusphqbr,   i32,  i32, i32)
+DEF_HELPER_2(mulqrsph,  i32,  i32, i32)
+DEF_HELPER_3(mulsaqswph,void, int, i32, i32)
+DEF_HELPER_2(packrlph,  i32,  i32, i32)
+DEF_HELPER_2(pickqb,i32,  i32, i32)
+DEF_HELPER_1(preceqwphl,i32,  i32)
+DEF_HELPER_2(pickph,i32,  i32, i32)
+DEF_HELPER_1(preceqwphr,i32,  i32)
+DEF_HELPER_1(precequphqbl,  i32,  i32)
+DEF_HELPER_1(precequphqbla, i32,  i32)
+DEF_HELPER_1(precequphqbr,  i32,  i32)
+DEF_HELPER_1(precequphqbra, i32,  i32)
+DEF_HELPER_1(preceuphqbl,   i32,  i32)
+DEF_HELPER_1(preceuphqbla,  i32,  i32)
+DEF_HELPER_1(preceuphqbr,   i32,  i32)
+DEF_HELPER_1(preceuphqbra,  i32,  i32)
+DEF_HELPER_2(precrqqbph,i32,  i32, i32)
+DEF_HELPER_2(precrqphw, i32,  i32, i32)
+DEF_HELPER_2(precrqrsphw,   i32,  i32, i32)
+DEF_HELPER_2(precrqusqbph,  i32,  i32, i32)
+DEF_HELPER_1(radduwqb,  i32,  i32)
+DEF_HELPER_1(rddsp, i32,  i32)
+DEF_HELPER_1(replph,i32,  i32)
+DEF_HELPER_1(replqb,i32,  i32)
+DEF_HELPER_1(replvph,   i32,  i32)
+DEF_HELPER_1(replvqb,   i32,  i32)
+DEF_HELPER_2(shilo, void, int, int)
+DEF_HELPER_2(shilov,void, int, i32)
+DEF_HELPER_2(shllph,i32,  int, i32)
+DEF_HELPER_2(shllsph,   i32,  int, i32)
+DEF_HELPER_2(shllqb,i32,  int, i32)
+DEF_HELPER_2(shllsw,i32,  int, i32)
+DEF_HELPER_2(shllvph,   i32,  i32, i32)
+DEF_HELPER_2(shllvsph,  i32,  i32, i32)
+DEF_HELPER_2(shllvqb,   i32,  i32, i32)
+DEF_HELPER_2(shllvsw,   i32,  i32, i32)
+DEF_HELPER_2(shraph,i32,  int, i32)
+DEF_HELPER_2(shrarph,   i32,  int, i32)
+DEF_HELPER_2(shrarw,i32,  int, i32)
+DEF_HELPER_2(shravph,   i32,  i32, i32)
+DEF_HELPER_2(shravrph,  i32,  i32, i32)
+DEF_HELPER_2(shravrw,   i32,  i32, i32)
+DEF_HELPER_2(shrlqb,i32,  int, i32)
+DEF_HELPER_2(shrlvqb,   i32,  i32, i32)
+DEF_HELPER_2(subqph,i32,  i32, i32)
+DEF_HELPER_2(subqsph,   i32,  i32, i32)
+DEF_HELPER_2(subqsw,i32,  i32, i32)
+DEF_HELPER_2(subuqb,i32,  i32, i32)
+DEF_HELPER_2(subusqb,   i32,  i32, i32)

[Qemu-devel] [PATCH 0/4] MIPS ASE DSP Support for Qemu

2012-03-12 Thread Jia Liu

Hi all

This is the MIPS ASE DSP Support for Qemu.


Jia Liu (4):
  add MIPS DSP helpers define
  add MIPS DSP helpers implement
  add MIPS DSP translation
  add MIPS DSP testcase

 target-mips/helper.h   |  152 +
 target-mips/op_helper.c| 3936 
 target-mips/translate.c| 1114 +++-
 tests/tcg/mips/mips32-dsp/Makefile |  133 +
 tests/tcg/mips/mips32-dsp/absq_s_ph.c  |   28 +
 tests/tcg/mips/mips32-dsp/absq_s_w.c   |   35 +
 tests/tcg/mips/mips32-dsp/addq_ph.c|   29 +
 tests/tcg/mips/mips32-dsp/addq_s_ph.c  |   29 +
 tests/tcg/mips/mips32-dsp/addsc.c  |   28 +
 tests/tcg/mips/mips32-dsp/addu_qb.c|   28 +
 tests/tcg/mips/mips32-dsp/addu_s_qb.c  |   28 +
 tests/tcg/mips/mips32-dsp/addwc.c  |   28 +
 tests/tcg/mips/mips32-dsp/bitrev.c |   18 +
 tests/tcg/mips/mips32-dsp/bposge32.c   |   42 +
 tests/tcg/mips/mips32-dsp/cmp_eq_ph.c  |   33 +
 tests/tcg/mips/mips32-dsp/cmp_le_ph.c  |   33 +
 tests/tcg/mips/mips32-dsp/cmp_lt_ph.c  |   33 +
 tests/tcg/mips/mips32-dsp/cmpgu_eq_qb.c|   29 +
 tests/tcg/mips/mips32-dsp/cmpgu_le_qb.c|   29 +
 tests/tcg/mips/mips32-dsp/cmpgu_lt_qb.c|   29 +
 tests/tcg/mips/mips32-dsp/cmpu_eq_qb.c |   33 +
 tests/tcg/mips/mips32-dsp/cmpu_le_qb.c |   33 +
 tests/tcg/mips/mips32-dsp/cmpu_lt_qb.c |   33 +
 tests/tcg/mips/mips32-dsp/dpaq_s_w_ph.c|   29 +
 tests/tcg/mips/mips32-dsp/dpaq_sa_l_w.c|   29 +
 tests/tcg/mips/mips32-dsp/dpau_h_qbl.c |   25 +
 tests/tcg/mips/mips32-dsp/dpau_h_qbr.c |   25 +
 tests/tcg/mips/mips32-dsp/dpsq_s_w_ph.c|   25 +
 tests/tcg/mips/mips32-dsp/dpsq_sa_l_w.c|   29 +
 tests/tcg/mips/mips32-dsp/dpsu_h_qbl.c |   25 +
 tests/tcg/mips/mips32-dsp/dpsu_h_qbr.c |   25 +
 tests/tcg/mips/mips32-dsp/extp.c   |   42 +
 tests/tcg/mips/mips32-dsp/extpdp.c |   44 +
 tests/tcg/mips/mips32-dsp/extpdpv.c|   45 +
 tests/tcg/mips/mips32-dsp/extpv.c  |   43 +
 tests/tcg/mips/mips32-dsp/extr_r_w.c   |   23 +
 tests/tcg/mips/mips32-dsp/extr_rs_w.c  |   23 +
 tests/tcg/mips/mips32-dsp/extr_s_h.c   |   23 +
 tests/tcg/mips/mips32-dsp/extr_w.c |   23 +
 tests/tcg/mips/mips32-dsp/extrv_r_w.c  |   27 +
 tests/tcg/mips/mips32-dsp/extrv_rs_w.c |   27 +
 tests/tcg/mips/mips32-dsp/extrv_s_h.c  |   27 +
 tests/tcg/mips/mips32-dsp/extrv_w.c|   27 +
 tests/tcg/mips/mips32-dsp/insv.c   |   21 +
 tests/tcg/mips/mips32-dsp/lbux.c   |   21 +
 tests/tcg/mips/mips32-dsp/lhx.c|   21 +
 tests/tcg/mips/mips32-dsp/lwx.c|   21 +
 tests/tcg/mips/mips32-dsp/madd.c   |   29 +
 tests/tcg/mips/mips32-dsp/maddu.c  |   29 +
 tests/tcg/mips/mips32-dsp/maq_s_w_phl.c|   29 +
 tests/tcg/mips/mips32-dsp/maq_s_w_phr.c|   29 +
 tests/tcg/mips/mips32-dsp/maq_sa_w_phl.c   |   29 +
 tests/tcg/mips/mips32-dsp/maq_sa_w_phr.c   |   29 +
 tests/tcg/mips/mips32-dsp/mfhi.c   |   19 +
 tests/tcg/mips/mips32-dsp/mflo.c   |   19 +
 tests/tcg/mips/mips32-dsp/modsub.c |   28 +
 tests/tcg/mips/mips32-dsp/msub.c   |   28 +
 tests/tcg/mips/mips32-dsp/msubu.c  |   28 +
 tests/tcg/mips/mips32-dsp/mthi.c   |   19 +
 tests/tcg/mips/mips32-dsp/mthlip.c |   32 +
 tests/tcg/mips/mips32-dsp/mtlo.c   |   19 +
 tests/tcg/mips/mips32-dsp/muleq_s_w_phr.c  |   38 +
 tests/tcg/mips/mips32-dsp/muleu_s_ph_qbl.c |   23 +
 tests/tcg/mips/mips32-dsp/muleu_s_ph_qbr.c |   23 +
 tests/tcg/mips/mips32-dsp/mulq_rs_ph.c |   23 +
 tests/tcg/mips/mips32-dsp/mult.c   |   22 +
 tests/tcg/mips/mips32-dsp/multu.c  |   22 +
 tests/tcg/mips/mips32-dsp/packrl_ph.c  |   19 +
 tests/tcg/mips/mips32-dsp/pick_ph.c|   21 +
 tests/tcg/mips/mips32-dsp/pick_qb.c|   21 +
 tests/tcg/mips/mips32-dsp/preceq_w_phl.c   |   18 +
 tests/tcg/mips/mips32-dsp/preceq_w_phr.c   |   18 +
 tests/tcg/mips/mips32-dsp/precequ_ph_qbl.c |   18 +
 tests/tcg/mips/mips32-dsp/precequ_ph_qbla.c|   18 +
 tests/tcg/mips/mips32-dsp/precequ_ph_qbr.c |   18 +
 tests/tcg/mips/mips32-dsp/precequ_ph_qbra.c|   18 +
 tests/tcg/mips/mips32-dsp/preceu_ph_qbl.c  |   18 +
 tests/tcg/mips/mips32-dsp/preceu_ph_qbla.c |   18 +
 tests/tcg/mips/mips32-dsp/preceu_ph_qbr.c  |   18 +
 tests/tcg/mips/mips32-dsp/preceu_ph_qbra.c |   18 +
 tests/tcg/mips/mips32-dsp/precrq_ph_w.c|   19 +
 tests/tcg/mips/mips32-dsp/precrq_qb_ph.c   |   19 +
 tests/tcg/mips/mips32-dsp/precrq_rs_ph_w.c |   19 +
 tests/tcg/mips/mips32-dsp/precrqu_s_qb_ph.c|   19 +

[Qemu-devel] [PATCH 3/4] add MIPS DSP translation

2012-03-12 Thread Jia Liu


This patch is the translation of MIPS ASE DSP.

Signed-off-by: Jia Liu pro...@gmail.com
---
 target-mips/translate.c | 1114 +--
 1 files changed, 1088 insertions(+), 26 deletions(-)

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 8361d88..1fa5b28 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -249,6 +249,11 @@ enum {
 OPC_SYNCI= (0x1F  16) | OPC_REGIMM,
 };
 
+/* REGIMM mipsdsp opcodes */
+enum {
+OPC_BPOSGE32 = (0x1C  16) | OPC_REGIMM,
+};
+
 /* Special2 opcodes */
 #define MASK_SPECIAL2(op)  MASK_OP_MAJOR(op) | (op  0x3F)
 
@@ -312,6 +317,21 @@ enum {
 OPC_MODU_G_2E   = 0x23 | OPC_SPECIAL3,
 OPC_DMOD_G_2E   = 0x26 | OPC_SPECIAL3,
 OPC_DMODU_G_2E  = 0x27 | OPC_SPECIAL3,
+
+/* MIPS DSP */
+OPC_ABSQ_S_PH_DSP  = 0x12 | OPC_SPECIAL3,
+OPC_ADDU_QB_DSP= 0x10 | OPC_SPECIAL3,
+/* OPC_ADDUH_QB_DSP is same as OPC_MULT_G_2E. */
+/* OPC_ADDUH_QB_DSP = 0x18 | OPC_SPECIAL3, */
+OPC_APPEND_DSP = 0x31 | OPC_SPECIAL3,
+OPC_CMPU_EQ_QB_DSP = 0x11 | OPC_SPECIAL3,
+OPC_DPA_W_PH_DSP   = 0x30 | OPC_SPECIAL3,
+OPC_EXTR_W_DSP = 0x38 | OPC_SPECIAL3,
+OPC_INSV_DSP   = 0x0C | OPC_SPECIAL3,
+OPC_LX_DSP = 0x0A | OPC_SPECIAL3,
+/* OPC_MUL_PH_DSP is same as OPC_ADDUH_QB_DSP.   */
+/* OPC_MUL_PH_DSP  = 0x18 | OPC_SPECIAL3,   */
+OPC_SHLL_QB_DSP= 0x13 | OPC_SPECIAL3,
 };
 
 /* BSHFL opcodes */
@@ -331,6 +351,231 @@ enum {
 OPC_DSHD = (0x05  6) | OPC_DBSHFL,
 };
 
+#define MASK_ABSQ_S_PH(op) MASK_SPECIAL3(op) | (op  (0x1F  6))
+/* MIPS DSP */
+enum {
+OPC_ABSQ_S_PH   = (0x09  6) | OPC_ABSQ_S_PH_DSP,
+OPC_ABSQ_S_W= (0x11  6) | OPC_ABSQ_S_PH_DSP,
+OPC_BITREV  = (0x1B  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEQ_W_PHL= (0x0C  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEQ_W_PHR= (0x0D  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEQU_PH_QBL  = (0x04  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEQU_PH_QBLA = (0x06  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEQU_PH_QBR  = (0x05  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEQU_PH_QBRA = (0x07  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEU_PH_QBL   = (0x1C  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEU_PH_QBLA  = (0x1E  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEU_PH_QBR   = (0x1D  6) | OPC_ABSQ_S_PH_DSP,
+OPC_PRECEU_PH_QBRA  = (0x1F  6) | OPC_ABSQ_S_PH_DSP,
+OPC_REPL_PH = (0x0A  6) | OPC_ABSQ_S_PH_DSP,
+OPC_REPL_QB = (0x02  6) | OPC_ABSQ_S_PH_DSP,
+OPC_REPLV_PH= (0x0B  6) | OPC_ABSQ_S_PH_DSP,
+OPC_REPLV_QB= (0x03  6) | OPC_ABSQ_S_PH_DSP,
+};
+
+/* MIPS DSPR2 */
+enum {
+OPC_ABSQ_S_QB = (0x01  6) | OPC_ABSQ_S_PH_DSP,
+};
+
+#define MASK_ADDU_QB(op) MASK_SPECIAL3(op) | (op  (0x1F  6))
+/* MIPS DSP */
+enum {
+OPC_ADDQ_PH= (0x0A  6) | OPC_ADDU_QB_DSP,
+OPC_ADDQ_S_PH  = (0x0E  6) | OPC_ADDU_QB_DSP,
+OPC_ADDQ_S_W   = (0x16  6) | OPC_ADDU_QB_DSP,
+OPC_ADDSC  = (0x10  6) | OPC_ADDU_QB_DSP,
+OPC_ADDU_QB= (0x00  6) | OPC_ADDU_QB_DSP,
+OPC_ADDU_S_QB  = (0x04  6) | OPC_ADDU_QB_DSP,
+OPC_ADDWC  = (0x11  6) | OPC_ADDU_QB_DSP,
+OPC_MODSUB = (0x12  6) | OPC_ADDU_QB_DSP,
+OPC_MULEQ_S_W_PHL  = (0x1C  6) | OPC_ADDU_QB_DSP,
+OPC_MULEQ_S_W_PHR  = (0x1D  6) | OPC_ADDU_QB_DSP,
+OPC_MULEU_S_PH_QBL = (0x06  6) | OPC_ADDU_QB_DSP,
+OPC_MULEU_S_PH_QBR = (0x07  6) | OPC_ADDU_QB_DSP,
+OPC_MULQ_RS_PH = (0x1F  6) | OPC_ADDU_QB_DSP,
+OPC_RADDU_W_QB = (0x14  6) | OPC_ADDU_QB_DSP,
+OPC_SUBQ_PH= (0x0B  6) | OPC_ADDU_QB_DSP,
+OPC_SUBQ_S_PH  = (0x0F  6) | OPC_ADDU_QB_DSP,
+OPC_SUBQ_S_W   = (0x17  6) | OPC_ADDU_QB_DSP,
+OPC_SUBU_QB= (0x01  6) | OPC_ADDU_QB_DSP,
+OPC_SUBU_S_QB  = (0x05  6) | OPC_ADDU_QB_DSP,
+};
+
+/* MIPS DSPR2 */
+enum {
+OPC_ADDU_PH   = (0x08  6) | OPC_ADDU_QB_DSP,
+OPC_ADDU_S_PH = (0x0C  6) | OPC_ADDU_QB_DSP,
+OPC_MULQ_S_PH = (0x1E  6) | OPC_ADDU_QB_DSP,
+OPC_SUBU_PH   = (0x09  6) | OPC_ADDU_QB_DSP,
+OPC_SUBU_S_PH = (0x0D  6) | OPC_ADDU_QB_DSP,
+};
+
+#define OPC_ADDUH_QB_DSP OPC_MULT_G_2E
+#define MASK_ADDUH_QB(op) MASK_SPECIAL3(op) | (op  (0x1F  6))
+/* MIPS DSPR2 */
+enum {
+OPC_ADDQH_PH   = (0x08  6) | OPC_ADDUH_QB_DSP,
+OPC_ADDQH_R_PH = (0x0A  6) | OPC_ADDUH_QB_DSP,
+OPC_ADDQH_W= (0x10  6) | OPC_ADDUH_QB_DSP,
+OPC_ADDQH_R_W  = (0x12  6) | OPC_ADDUH_QB_DSP,
+OPC_ADDUH_QB   = (0x00  6) | OPC_ADDUH_QB_DSP,
+OPC_ADDUH_R_QB = (0x02  6) | OPC_ADDUH_QB_DSP,
+OPC_MUL_PH = (0x0C  6) | OPC_ADDUH_QB_DSP,
+OPC_MUL_S_PH   = (0x0E  6) | OPC_ADDUH_QB_DSP,
+OPC_SUBQH_PH   = (0x09  6) | OPC_ADDUH_QB_DSP,
+OPC_SUBQH_R_PH = (0x0B  6) | OPC_ADDUH_QB_DSP,
+OPC_SUBQH_W= (0x11  6) | OPC_ADDUH_QB_DSP,
+OPC_SUBQH_R_W  = (0x13  6) | OPC_ADDUH_QB_DSP,
+OPC_SUBUH_QB   = (0x01  6) | OPC_ADDUH_QB_DSP,
+OPC_SUBUH_R_QB = (0x03  6) |

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Hans de Goede


Hi,

On 03/11/2012 02:16 PM, Yonit Halperin wrote:

Hi,

We would like to implement seamless migration for Spice, i.e., keeping the 
currently opened spice client session valid after migration.
Today, the spice client establishes the connection to the destination before 
migration starts, and when migration completes, the client's session is moved 
to the destination, but all the session data is being reset.

We face 2 main challenges when coming to implement seamless migration:



snip (1)


(2) In order to restore the source-client spice session in the destination, we 
need to pass data from the source to the destination.
Example for such data: in flight copy paste data, in flight usb data
We want to pass the data from the source spice server to the destination, via 
Spice client. This introduces a possible race: after migration completes, the 
source qemu can be killed before the spice-server completes transferring the 
migration data to the client.


I don't understand why we must transfer this via the client, we should transfer 
this info using
established qemu migration technologies, and we should transfer it directly 
from the source
to the dest.

Passing this through the client, means trusting the client which is crazy (from 
a security pov),
the data passed is not always just data buffers often it contains state info. 
And transferring
this through the client means opening a whole window of injection 
vulnerabilities, which can simply
be avoided by sending the data directly.

I know this has been discussed before and I was not involved in that discussion 
due to -ENOTIME,
sorry about that. But just as the solution for sending the data directly from 
source to dest proposed
then was nacked by various qemu people, I nack the send the data through the 
client solution. That
one simply is not acceptable from a security pov. So we must re-think how we 
can send this data
directly from source to dest, in a way which is acceptable in upstream qemu.

Regards,

Hans

Re: [Qemu-devel] QAPI conversion status and async commands support

2012-03-12 Thread Wen Congyang

At 03/08/2012 02:12 AM, Luiz Capitulino Wrote:
 On Wed, 07 Mar 2012 11:36:06 -0600
 Anthony Liguori anth...@codemonkey.ws wrote:
 
 On 03/07/2012 11:29 AM, Paolo Bonzini wrote:
 Il 07/03/2012 17:36, Luiz Capitulino ha scritto:
 Hi there,

 In the last few weeks we've had some proposals for new QMP commands that 
 need
 to be asynchronous. As we lack a standard asynchronous API today, each 
 command
 ends up adding its own way to execute in the background.

 This multiplies the API complexity as each command has to be implemented 
 and
 learned by clients separately, with their own way of doing more or less the
 same things.

 The solution for this, envisioned for us for a long time now, is to 
 introduce
 an unified QMP API for asynchronous commands.

 But before doing this we have to:

1. Finish the commands conversion to the QAPI

   This is almost done, the only missing commands are: 
 add_graphics_client,
   do_closefd, do_device_add, do_device_del, do_getfd, do_migrate,
   do_netdev_add, do_netdev_del, do_qmp_capabilities and do_screen_dump.

   Note that do_migrate has already been posted to the list, and I have
   the screendump more or less done. Also, Anthony has an old branch 
 where most
   of the conversions are already done, they just need to be rebased  
 tested.

2. Integrate the new QAPI server

   Implemented by Anthony, may have missing pieces.

3. Implement async command support


 I think the missing commands to be converted can be done in around one 
 week,
 but unfortunately I've been busy at other things and will need a few days 
 to
 resume this work. Then there's the new QAPI server  async support, which 
 I'm
 not sure how much time we'll need to integrate them, but we should have 
 this
 done for 1.1.

 The main question is: what should we do for the already posted async 
 commands?
 Should we hold them until we finish this work?

 I think yes, and we could even have a list of features without which 1.1
 should not ship.  QOM buses, drive mirroring and QAPI async command
 support may be them.  Perhaps qtest too.

 Okay, let's get serious about what we can and can't do.

 Hard freeze for 1.1 is May 1st which is roughly 6 weeks from now.

 I think QOM buses can go in no problem along with qtest.  I would be okay 
 considering QOM buses a release blocker but probably not qtest.

 I'm not really sure about drive mirroring.  Is the work already done such 
 that 
 we just need to talk about merging it?

 With QAPI async command, I don't think 1.1 is a viable target.  We're not 
 just 
 talking about converting existing commands to QAPI, but also replacing the 
 QMP 
 server infrastructure.  I don't think that is a change that should be made 
 at 
 the tail end of the development cycle.
 
 I will have to agree with this, specially because I think the new server is
 going to generate some fun discussions because iirc it has some novel
 concepts. And those discussions burn a lot of time.
 
 On top of that we also have the async API to design. I have some ideas in 
 mind,
 but again, design  review cycles take some time.
 
 Having said that, I wonder if postponing want-to-be-async commands to 1.2 is
 a good thing. Two solutions other than waiting:
 
  1. Add synchronous versions of them, if possible (I think this is possible
 for the dump command)

I agree with this. libvirt uses migrate command to do dump now, and it cannot 
work
when the guest uses host's device. We need to fix this bug as quickly as 
possible.
So we need dump command recently. So I think we can add synchronous versions of
dump command. Is it OK? If it is OK, I will send the synchronous versions of
dump command.

Thanks
Wen Congyang

 
  2. Accept the ad-hoc async mechanism they introduce
 
 Btw, probably needless to say but, I'd greatly appreciate and encourage 
 anyone who
 wants to join this effort and work on the conversions, new QMP server, etc...

Re: [Qemu-devel] [PATCH v2 1/2] block: add the support to drain throttled requests

2012-03-12 Thread Zhi Yong Wu

On Mon, Mar 12, 2012 at 3:27 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 12/03/2012 07:29, zwu.ker...@gmail.com ha scritto:
 From: Zhi Yong Wu wu...@linux.vnet.ibm.com

 Signed-off-by: Zhi Yong Wu wu...@linux.vnet.ibm.com
 ---
  block.c     |   21 +
  block_int.h |    1 +
  2 files changed, 22 insertions(+), 0 deletions(-)

 diff --git a/block.c b/block.c
 index 52ffe14..0825168 100644
 --- a/block.c
 +++ b/block.c
 @@ -853,6 +853,21 @@ void bdrv_close_all(void)
      }
  }

 +/**
 + * Complete all pending requests for a block device
 + */
 +void bdrv_drain(BlockDriverState *bs)
 +{
 +    do {
 +        qemu_co_queue_restart_all(bs-throttled_reqs);
 +    } while (!qemu_co_queue_empty(bs-throttled_reqs));
 +
 +    qemu_aio_flush();

 This doesn't work, qemu_aio_flush can start new I/O.
Do you mean that it will start next I/O via the current request's cb?
if no, where will it start new I/O?

 Paolo





-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH v2 1/2] block: add the support to drain throttled requests

2012-03-12 Thread Paolo Bonzini

Il 12/03/2012 09:42, Zhi Yong Wu ha scritto:
 
  This doesn't work, qemu_aio_flush can start new I/O.
 Do you mean that it will start next I/O via the current request's cb?

Yes.

Paolo

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Hans de Goede


Hi,

On 03/12/2012 08:57 AM, Gerd Hoffmann wrote:

   Hi,


What about the second part? it's independant of the async issue.


Isn't this a client problem?  The client has this state, no?


It is state of the client-  server session.  Today spice client
creates a new session on migration, so there is simply no need to
maintain any state.  Drawback is that everything needs to be resent from
the server to the client.  Thats why we want be able to continue the
spice session, so the client caches will stay valid.

Of course the spice-server on the migration target needs the session
state for that, i.e. know for example which bits the client has cached
which it hasn't.


If the state is stored in the server, wouldn't it be marshaled as part
of the server's migration state?


spice-server is stateless today when it comes to migration.  QXL handles
all (device) state, by keeping track of some commands (such as
create/destroy surface) which it needs to transfer on migration, and by
asking spice-server to render all surfaces on migration, which
effectively flushes the spice server state to qxl device memory.

To transfer the client session state there are basically two options:

   (a) transfer it as part of the qemu migration data stream.  I don't
   want have any details about the qemu migration implementation
   and/or protocol in the spice-server library api, which basically
   leaves a ugly transfer-this-blob-for-me-please style interface
   as only option.

   (b) transfer it as part of the spice protocol.  As the spice
   client has a connection to both source and target while the
   migration runs we can send session state from the source host via
   spice client to the target host.  This needs some form of
   synchronization, to make sure both vmstate and spice migration
   are completed when qemu on the source machine quits.


The problem with (b) is, that iirc the way b was implemented in the past
was still the big blob approach, but then pass the blob through the client,
which means an evil client could modify it, causing all sorts of interesting
behavior inside spice-server. Since we're re-implementing this to me the
send a blob through the client approach is simply not acceptable from a
security pov, also see my previous mail in this thread.


I think (b) is the better approach.


I disagree. Note that there is more info to send over then just which
surfaces / images are cached by the client. There also is things like
partial complete agent channel messages, including how much bytes must be read
to complete the command, etc.

IMHO (b) would only be acceptable if the data send through the client stops
becoming a blob. Instead the client could simply send a list of all surface ids,
etc. which it has cached after it connects to / starts using the new host. Note
that the old hosts needs to send nothing for this, this is info the client 
already
has, also removing the need for synchronization. As for certain other data, such
as (but not limited to) partially parsed agent messages, these should be
send through the regular vmstate methods IMHO.

So I see 2 options

1) Do (a), sending everything that way
2) Do (a) sending non client state that way; and
   let the client send state like which surfaces it has cached
   when the new session starts.

Regards,

Hans

Re: [Qemu-devel] [PATCH v1 01/12] net: Add a hub net client

2012-03-12 Thread Zhi Yong Wu

On Fri, Mar 9, 2012 at 6:33 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 09/03/2012 10:00, zwu.ker...@gmail.com ha scritto:
 +        /* TODO use qemu_send_packet() or need to call *_deliver_* 
 directly? */
 +        /* TODO ignore return value? */
 +        qemu_send_packet(port-nc, buf, len);

 Did you see my message at

 http://permalink.gmane.org/gmane.comp.emulators.qemu/134998
Yeah, i ever saw this, but read it just now. In your comments,
there're some places where i have not completely understand.

 This means VLANs will wait for all receivers to be ready and then do N-1
 synchronous sends.  Your code will queue N-1 asynchronous sends.

 However, then I noticed that qemu_can_send_packet is not called very
 much, and I do not understand why qemu_net_queue_send and
 qemu_net_queue_send_iov do not call qemu_can_send_packet before calling
 deliver/deliver_iov.

 If they did, hubs could then do their own flow control via can_receive.
  When qemu_send_packet returns zero you increment a count of in-flight
 packets, and a sent-packet callback would decrement the same count.
sent-packet callback is sent_cb here? i noticed that sent_cb is currently NULL.
 When the count is non-zero, can_receive returns false (and vice versa).
Can you elaborate this? since can_receive is called finally by 
qemu_send_packet, how can can_receive return false or true based on the count?
  The sent_cb also needs to call qemu_flush_queued_packets when the
 count drop to zero.
When sent_cb is called, it means that the packet has been sent out, right?

 With this in place, I think the other TODO about the return value is
 easily solved; receive/receive_iov callbacks can simply return immediate
 success, and later block further sends.

 Due to the separate per-port send queues, when many sends are blocking
 many ports might linearize the same packet multiple times in
 qemu_net_queue_append_iov.  The limit however is packet size * number of
I think that when one packet need to be enqueued, we should at first
determine if it has existed in send queue.
 ports, which is not more than a few KB; it's more of a performance
 problem and VLANs/hubs should only be used when performance doesn't
 matter (as with the dump client).

 BTW, an additional cleanup that you can do on top is to remove the
 deliver/deliver_iov function pointers in net/queue.c and
 qemu_new_net_queue, because they will always be qemu_deliver_packet and
 qemu_deliver_packet_iov.


 ?

 Paolo



-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH] virtio-serial-bus: use correct lengths in control_out() message

2012-03-12 Thread Amit Shah

On (Sun) 11 Mar 2012 [17:52:59], Michael Tokarev wrote:
 In case of more than one control message, the code will use
 size of the largest message so far for all subsequent messages,
 instead of using size of current one.  Fix it.

Makes sense.  How did you detect this?  Any reproducible test-case?

One comment below.

 diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
 index e22940e..abe48ec 100644
 --- a/hw/virtio-serial-bus.c
 +++ b/hw/virtio-serial-bus.c
 @@ -451,28 +451,28 @@ static void control_out(VirtIODevice *vdev, VirtQueue 
 *vq)
  
  vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
  
  len = 0;
  buf = NULL;
  while (virtqueue_pop(vq, elem)) {
 -size_t cur_len, copied;
 +size_t cur_len;
  
  cur_len = iov_size(elem.out_sg, elem.out_num);
  /*
   * Allocate a new buf only if we didn't have one previously or
   * if the size of the buf differs
   */
  if (cur_len  len) {
  g_free(buf);
  
  buf = g_malloc(cur_len);
  len = cur_len;
  }
 -copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);
 +iov_to_buf(elem.out_sg, elem.out_num, buf, 0, cur_len);

Why drop 'copied'?  I don't think we have had a situation where copied
can be less than cur_len, and in any case we don't do anything special
as a recovery mechanism, but a warning message or an abort in case
copied != cur_len should work, I think.

 -handle_control_message(vser, buf, copied);
 +handle_control_message(vser, buf, cur_len);
  virtqueue_push(vq, elem, 0);
  }
  g_free(buf);
  virtio_notify(vdev, vq);
  }

Thanks,

Amit

Re: [Qemu-devel] [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-12 Thread Wen Congyang

At 03/09/2012 09:21 AM, Wen Congyang Wrote:
 At 03/08/2012 07:13 PM, Avi Kivity Wrote:
 On 03/08/2012 09:57 AM, Wen Congyang wrote:
 We can know the guest is paniced when the guest runs on xen.
 But we do not have such feature on kvm.

 Another purpose of this feature is: management app(for example:
 libvirt) can do auto dump when the guest is crashed. If management
 app does not do auto dump, the guest's user can do dump by hand if
 he sees the guest is paniced.

 I touch the hypervisor instead of using virtio-serial, because
 1. it is simple
 2. the virtio-serial is an optional device, and the guest may
not have such device.

 Changes from v2 to v3:
 1. correct spelling

 Changes from v1 to v2:
 1. split up host and guest-side changes
 2. introduce new request flag to avoid changing return values.

 I see no Documentation/ changes.

Do you have any other comments about this patch?

 
 What shoude be writen into Documentation/? I read the documents under
 Documentation/virtual/kvm, not all KVM_EXIT_* are writen in api.txt.
 In this patch set, I introduce a new exit_reason: KVM_EXIT_GUEST_PANICKED,
 but I donot know where I should write.

What about this?

Thanks
Wen Congyang

 
 Thanks
 Wen Congyang
 --
 To unsubscribe from this list: send the line unsubscribe linux-kernel in
 the body of a message to majord...@vger.kernel.org
 More majordomo info at  http://vger.kernel.org/majordomo-info.html
 Please read the FAQ at  http://www.tux.org/lkml/

[Qemu-devel] qemu in domain of xen

2012-03-12 Thread 李安伦

hi there guys

recently, i install a qemu in the domian U of xen , the OS of the domain U
is debian squeeze . i want to make use of the qemu to install a WINDOWS XP
in the domain U, but when i create it , the interface is staying in the
processing of formatting the disk. the system of the domain U is likely
halted, so i have to restart the domain U.

can the qemu be used to create another virtual machine in the virtual
machine of the domain U in xen .


every one expericed it what i said in front.

regards

[Qemu-devel] tuning parameters qed

2012-03-12 Thread PANKAJ RAWAT

Hi all i have a kvm machine and i use qed format as the disk image format
I want to improve the I/O performance of my machine
can anyone tell other then cluster_size and cache what are the tunining
parameters for qed so that i can improve the performance of my VM ?

-- 
*Pankaj Rawat*

Re: [Qemu-devel] [PATCH v1 01/12] net: Add a hub net client

2012-03-12 Thread Paolo Bonzini

Il 12/03/2012 09:59, Zhi Yong Wu ha scritto:
 However, then I noticed that qemu_can_send_packet is not called very
 much, and I do not understand why qemu_net_queue_send and
 qemu_net_queue_send_iov do not call qemu_can_send_packet before calling
 deliver/deliver_iov.
 
 If they did, hubs could then do their own flow control via can_receive.
 When qemu_send_packet returns zero you increment a count of in-flight
 packets, and a sent-packet callback would decrement the same count.

 sent-packet callback is sent_cb here? i noticed that sent_cb is currently 
 NULL.

Yes.

 When the count is non-zero, can_receive returns false (and vice versa).
 Can you elaborate this? since can_receive is called finally by 
 qemu_send_packet,
 how can can_receive return false or true based on the count?

Based on counts from the previous sends.

Paolo

Re: [Qemu-devel] [PATCH] virtio-serial-bus: use correct lengths in control_out() message

2012-03-12 Thread Michael Tokarev

On 12.03.2012 12:59, Amit Shah wrote:
 On (Sun) 11 Mar 2012 [17:52:59], Michael Tokarev wrote:
 In case of more than one control message, the code will use
 size of the largest message so far for all subsequent messages,
 instead of using size of current one.  Fix it.
 
 Makes sense.  How did you detect this?  Any reproducible test-case?

There's no test-case, and no detection, just reading the code.
Actually, I think, there's no bug here, but a very, well,
difficult to read code.

 One comment below.

and the answer is below, too.

 diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
 index e22940e..abe48ec 100644
 --- a/hw/virtio-serial-bus.c
 +++ b/hw/virtio-serial-bus.c
 @@ -451,28 +451,28 @@ static void control_out(VirtIODevice *vdev, VirtQueue 
 *vq)
  
  vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
  
  len = 0;
  buf = NULL;
  while (virtqueue_pop(vq, elem)) {
 -size_t cur_len, copied;
 +size_t cur_len;
  
  cur_len = iov_size(elem.out_sg, elem.out_num);
  /*
   * Allocate a new buf only if we didn't have one previously or
   * if the size of the buf differs
   */
  if (cur_len  len) {
  g_free(buf);
  
  buf = g_malloc(cur_len);
  len = cur_len;
  }
 -copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);
 +iov_to_buf(elem.out_sg, elem.out_num, buf, 0, cur_len);
 
 Why drop 'copied'?  I don't think we have had a situation where copied
 can be less than cur_len, and in any case we don't do anything special
 as a recovery mechanism, but a warning message or an abort in case
 copied != cur_len should work, I think.

In this case, copied was _always_ == cur_len.  That's why there's
actually no bug.  See:

   cur_len = iov_size(elem.out_sg, elem.out_num);
   len = max(cur_len, buflen) = roughly
   copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);

iov_to_buf() will stop copying when it reaches end of buf
(which is len bytes long) or end of iov, which is cur_len
bytes long.  Obviously in all cases it will be cur_len.
But it is obvious only when you write it one near another
and _think_.  And the reason for this confusion is the
introduction of this `copied' variable, which shouldn't
be there in the first place.

It is like doing, for a memcpy-like function:

 void *memdup(const void *src, size_t size) {
   char *dest = malloc(size+1);
   size_t copied = copybytes(dest, src, size+1);
   if (copied != size+1) {
  /* What?? */
   }
   return dest;
}

The only sane thing here, I think, is to drop 'copied',
to stop any possible confusion :)

 -handle_control_message(vser, buf, copied);
 +handle_control_message(vser, buf, cur_len);
  virtqueue_push(vq, elem, 0);
  }
  g_free(buf);
  virtio_notify(vdev, vq);
  }

Please belive me, I realized that the original code is
actually right only after re-reading your reply.  And
please note that even you, the author, don't understand
what it is doing :)  So I think the patch is correct
still ;)

Thanks!

/mjt

Re: [Qemu-devel] Support for Nested Paging

2012-03-12 Thread Stefan Hajnoczi

On Sat, Mar 10, 2012 at 04:44:27PM -0500, Ankur Agrawal wrote:
 I am a graduate student at Stony Brook University and am working on design
 and implementation of hypervisors for OSCAR lab (
 http://oscar.cs.stonybrook.edu/). Currently I am working on implementing
 emulation of Nested Page Tables in QEMU as present in AMD-V architectures.
 
 I would like to know if the QEMU team will be interested in having a patch
 which emulates the Nested Page Table and other hardware virtualization
 techniques supported by AMD-V or Intel-VT atchitectures.
 
 I would  love to help in maintenance of my patch or any other issues in the
 QEMU in future as well.
 
 I would also like to know if there is any chance that this can become a
 part of Google Summer of Code 2012.

Yes, you can propose your own project idea.  The key thing is to find a
mentor who is willing to work with you.  You can do that on this mailing
list and #qemu IRC.

I don't know the state of i386 TCG but I suggest checking if there are
prerequisites for vmx/svm that are missing from QEMU.  You may find the
i386 TCG needs some improvements before it becomes possible to work on
vmx/svm itself.  I don't know the answer but you'll need to check this
to see if your project idea is viable.

Stefan

Re: [Qemu-devel] Support for Nested Paging

2012-03-12 Thread Stefan Hajnoczi

On Sun, Mar 04, 2012 at 01:28:04AM +0800, 陳韋任 wrote:
  Also I am trying to understand the QEMU source with an objective of 
  participating in the Google Summer of Code and contributing to QEMU. I have 
  tried tracing through the code but seems this link 
  http://repo.or.cz/w/qemu/stefanha.git/blob_plain/refs/heads/tracing:/docs/tracing.txt
   is not updated because many of the options do not work here. I would very 
  happy if someone could provide me links to a good starting point to 
  understand QEMU source code.
 
   The tracing you mentioned is not tend to help reading the code. Depends on
 which part of QEMU you're trying to play with, you have some background
 knowledge of it.

I think the only thing missing is that you need to use:

  ./configure --enable-trace-backend=backend

The docs still say ./configure --trace-backend=backend.  I am sending
a pull request that includes a patch to update this.

Stefan

Re: [Qemu-devel] [PATCH 1/2] Support @documentencoding in scripts/texi2pod.pl

2012-03-12 Thread Stefan Hajnoczi

On Sun, Mar 11, 2012 at 01:56:41PM +0400, Michael Tokarev wrote:
 Ping?  It's been more than a month since this patch has been posted.
 
 Maybe it is a good candidate for -trivial queue?

One thing I don't understand is where you actually use
@documentencoding?  I don't see it in you 2 patches.

Stefan

[Qemu-devel] [PATCH 1/2] trace-events: Rename 'next' argument

2012-03-12 Thread Kevin Wolf

'next' is a systemtap keyword, so it's a bad idea to use it as an
argument name.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 trace-events |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/trace-events b/trace-events
index 606d903..2a96353 100644
--- a/trace-events
+++ b/trace-events
@@ -530,7 +530,7 @@ qemu_coroutine_terminate(void *co) self %p
 
 # qemu-coroutine-lock.c
 qemu_co_queue_next_bh(void) 
-qemu_co_queue_next(void *next) next %p
+qemu_co_queue_next(void *nxt) next %p
 qemu_co_mutex_lock_entry(void *mutex, void *self) mutex %p self %p
 qemu_co_mutex_lock_return(void *mutex, void *self) mutex %p self %p
 qemu_co_mutex_unlock_entry(void *mutex, void *self) mutex %p self %p
-- 
1.7.6.5

[Qemu-devel] [PATCH 2/2] tracetool: Forbid argument name 'next'

2012-03-12 Thread Kevin Wolf

It has happened more than once that patches that look perfectly sane
and work with simpletrace broke systemtap because they use 'next' as an
argument name for a tracing function. However, 'next' is a keyword for
systemtap, so we shouldn't use it.

Signed-off-by: Kevin Wolf kw...@redhat.com
---
 scripts/tracetool |4 
 1 files changed, 4 insertions(+), 0 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index 4c9951d..f892af4 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -81,6 +81,10 @@ get_args()
 args=${1#*\(}
 args=${args%%\)*}
 echo $args
+
+if (echo $args | grep [ *]next\($\|[, ]\)  /dev/null 21); then
+echo -e \n#error 'next' is a bad argument name (clash with systemtap 
keyword)\n 
+fi
 }
 
 # Get the argument name list of a trace event
-- 
1.7.6.5

Re: [Qemu-devel] [RFC PATCH 14/17] block: support FALLOC_FL_PUNCH_HOLE trimming

2012-03-12 Thread Paolo Bonzini

Il 09/03/2012 21:36, Richard Laager ha scritto:
 fallocate() only supports files. In my patch, I was fstat()ing the fd
 just after opening it and caching an is_device boolean. Then, when doing
 discards, if !is_device, call fallocate(), else (i.e. for devices) do
 the following (note: untested code):
 
 __uint64_t range[2];
 
 range[0] = sector_num  9;
 range[1] = (__uint64_t)nb_sectors  9;
 
 if (ioctl(s-fd, BLKDISCARD, range)  errno != EOPNOTSUPP) {
  return -errno;
 }
 return 0;

You can instead put this code in a separate function hdev_discard.  hdev
device is used instead of file when the backend is a special file (block
or character).

  sync requirements 
 
 I'm not sure if fallocate() and/or BLKDISCARD always guarantee that the
 discard has made it to stable storage. If they don't, does O_DIRECT or
 O_DSYNC on open() cause them to make any such guarantee?

O_DIRECT shouldn't have any effect; for O_DSYNC I don't think so.

  Thin vs. Thick Provisioning 
 
 On Fri, 2012-03-09 at 00:35 -0800, Chris Wedgwood wrote:
 Simplest still compare the blocks allocated by the file
 to it's length (ie. stat.st_blocks != stat.st_size9).
 
 I thought of this as well. It covers 99% of the cases, but there's one
 case where it breaks down. Imagine I have a sparse file backing my
 virtual disk. In the guest, I fill the virtual disk 100%. Then, I
 restart QEMU. Now it thinks that sparse file is non-sparse and stops
 issuing hole punches.

I would not really have any problem with that, it seems like a
relatively minor case (also because newer guests will allocate the first
partition at 1 MB, so you might still have a small hole at the beginning).

 To be completely correct, I suggest the following behavior:
  1. Add a discard boolean option to the disk layer.
  2. If discard is not specified:
   * For files, detect a true/false value by comparing
 stat.st_blocks != stat.st_size9.
   * For devices, assume a fixed value (true?).
  3. If discard is true, issue discards.
  4. If discard is false, do not issue discards.

The problem is, who will use this interface?

Paolo

[Qemu-devel] [PATCH] acpi: beginnings of piix acpi interface doc

2012-03-12 Thread Michael S. Tsirkin

Before we start tweaking and enhancing hardware, I think
it makes sense to document what we currently have, to make
sure we stay compatible.
This documents the hotplug interface for piix.
Stubs for cpu hotplug, PM.

Signed-off-by: Michael S. Tsirkin m...@redhat.com
---
 docs/acpi.txt |   32 
 1 files changed, 32 insertions(+), 0 deletions(-)
 create mode 100644 docs/acpi.txt

diff --git a/docs/acpi.txt b/docs/acpi.txt
new file mode 100644
index 000..4938d48
--- /dev/null
+++ b/docs/acpi.txt
@@ -0,0 +1,32 @@
+QEMU exposes the following registers to guests,
+intended primarily for use by the ACPI interface.
+
+PCI Hotplug
+
+
+Events use the standard GPE register:
+GPE  0xafe0 - an ACPI GPE register
+
+Hotplug events set GPE bit 1 (mask 0x2)
+
+The following registers are used for PCI hotplug.
+Each register is 32 bit (4 bytes) long, and has little endian format.
+Bits 0-31 in each register correspond to slots 0-31 on the root bus,
+respectively.
+
+UP   0xae00 - RO - Bit set by host on device insertion (note:existing 
implementations
+   trigger device check event)
+DOWN 0xae04 - RO - Bit set by host on user eject request
+EJ0  0xae08 - WO - Bit set by guest removes all power to device
+RMV  0xae0c - RO - Bit set by host if slot supports hotplug
+   (can not change while guest is up)
+
+
+Power management
+
+TODO
+
+
+CPU hotplug
+
+TODO
-- 
1.7.9.111.gf3fb0

Re: [Qemu-devel] [PATCH RFC v4 44/44] qom: Introduce CPU class

2012-03-12 Thread Igor Mammedov


On 03/10/2012 03:28 AM, Andreas Färber wrote:

Reintroduce CPUState as QOM object: It's abstract and derived directly
from TYPE_OBJECT for compatibility with the user emulators.
The identifier CPUState avoids conflicts between CPU() and the struct.

Introduce $(qom-twice-y) to build it separately for system and for user
emulators.

Prepare a virtual reset method, (re)introduce cpu_reset() as wrapper.

Signed-off-by: Andreas Färberafaer...@suse.de
Cc: Anthony Liguorianth...@codemonkey.ws
---
  Makefile.objs  |3 ++
  configure  |1 +
  include/qemu/cpu.h |   75 
  qom/Makefile   |1 +
  qom/cpu.c  |   58 
  5 files changed, 138 insertions(+), 0 deletions(-)
  create mode 100644 include/qemu/cpu.h
  create mode 100644 qom/cpu.c

diff --git a/Makefile.objs b/Makefile.objs
index 431b7a1..291baf5 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -14,6 +14,7 @@ universal-obj-y += $(qobject-obj-y)
  # QOM
  include $(SRC_PATH)/qom/Makefile
  qom-obj-y = $(addprefix qom/, $(qom-y))
+qom-obj-twice-y = $(addprefix qom/, $(qom-twice-y))

  universal-obj-y += $(qom-obj-y)

@@ -89,6 +90,7 @@ fsdev-obj-$(CONFIG_VIRTFS) += $(addprefix fsdev/, 
$(fsdev-nested-y))

  common-obj-y = $(block-obj-y) blockdev.o
  common-obj-y += $(net-obj-y)
+common-obj-y += $(qom-obj-twice-y)
  common-obj-$(CONFIG_LINUX) += $(fsdev-obj-$(CONFIG_LINUX))
  common-obj-y += readline.o console.o cursor.o
  common-obj-y += $(oslib-obj-y)
@@ -194,6 +196,7 @@ user-obj-y += cutils.o cache-utils.o
  user-obj-y += module.o
  user-obj-y += qemu-user.o
  user-obj-y += $(trace-obj-y)
+user-obj-y += $(qom-obj-twice-y)

  ##
  # libhw
diff --git a/configure b/configure
index 66a65d9..1826af5 100755
--- a/configure
+++ b/configure
@@ -3888,6 +3888,7 @@ fi
  d=libuser
  mkdir -p $d
  mkdir -p $d/trace
+mkdir -p $d/qom
  symlink $source_path/Makefile.user $d/Makefile

  if test $docs = yes ; then
diff --git a/include/qemu/cpu.h b/include/qemu/cpu.h
new file mode 100644
index 000..4291279
--- /dev/null
+++ b/include/qemu/cpu.h
@@ -0,0 +1,75 @@
+/*
+ * QEMU CPU model
+ *
+ * Copyright (c) 2012 SUSE LINUX Products GmbH
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see
+ *http://www.gnu.org/licenses/gpl-2.0.html
+ */
+#ifndef QEMU_CPU_H
+#define QEMU_CPU_H
+
+#include qemu/object.h
+
+/**
+ * SECTION:cpu
+ * @section_id: QEMU-cpu
+ * @title: CPU Class
+ * @short_description: Base class for all CPUs
+ */
+
+#define TYPE_CPU cpu
+
+#define CPU(obj) OBJECT_CHECK(CPUState, (obj), TYPE_CPU)
+#define CPU_CLASS(class) OBJECT_CLASS_CHECK(CPUClass, (class), TYPE_CPU)
+#define CPU_GET_CLASS(obj) OBJECT_GET_CLASS(CPUClass, (obj), TYPE_CPU)
+
+typedef struct CPUState CPUState;
+
+/**
+ * CPUClass:
+ * @reset: Callback to reset the #CPU to its initial state.
+ *
+ * Represents a CPU family or model.
+ */
+typedef struct CPUClass {
+/*  private*/
+ObjectClass parent_class;
+/*  public*/
+
+void (*reset)(CPUState *cpu);

Why not use Object* as argument here?
It will be easier to generalize later qdev code and not make special case when
adding cpus.

BTW how we are going to generalize qdev and make its infrastructure available
to other types except of DEVICE_TYPE.
Maybe we should introduce some (abstract) base class (or interface) for basic
device that will define methods like reset, realize, unrealize and use it in
qdev code instead of DEVICE_TYPE?


+} CPUClass;
+
+/**
+ * CPUState:
+ *
+ * State of one CPU core or thread.
+ */
+struct CPUState {
+/*  private*/
+Object parent_obj;
+/*  public*/
+
+/* TODO Move common fields from CPUState here. */
+};
+
+
+/**
+ * cpu_reset:
+ * @cpu: The CPU whose state is to be reset.
+ */
+void cpu_reset(CPUState *cpu);
+
+
+#endif
diff --git a/qom/Makefile b/qom/Makefile
index 885a263..34c6de5 100644
--- a/qom/Makefile
+++ b/qom/Makefile
@@ -1 +1,2 @@
  qom-y = object.o container.o qom-qobject.o
+qom-twice-y = cpu.o
diff --git a/qom/cpu.c b/qom/cpu.c
new file mode 100644
index 000..5b36046
--- /dev/null
+++ b/qom/cpu.c
@@ -0,0 +1,58 @@
+/*
+ * QEMU CPU model
+ *
+ * Copyright (c) 2012 SUSE LINUX Products GmbH
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ *

Re: [Qemu-devel] KVM call agenda for tuesday 31

2012-03-12 Thread Igor Mitsyanko


On 02/21/2012 07:33 PM, Peter Maydell wrote:


Short summary:
  * switch wp groups to bitfield rather than int array
  * convert sd.c to use memory_region_init_ram() to allocate the wp groups
(being careful to use memory_region_set_dirty() when we touch them)
  * we don't need variable-length fields for sd.c any more
  * rest of the vmstate conversion is straightforward



After a conversion to bitfield wp_groups consumes 2048 bytes for 32 GB 
SD image, do we really need live section for this?


--
Mitsyanko Igor
ASWG, Moscow RD center, Samsung Electronics
email: i.mitsya...@samsung.com

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Gerd Hoffmann

  Hi,

 The problem with (b) is, that iirc the way b was implemented in the past
 was still the big blob approach, but then pass the blob through the client,
 which means an evil client could modify it, causing all sorts of
 interesting
 behavior inside spice-server. Since we're re-implementing this to me the
 send a blob through the client approach is simply not acceptable from a
 security pov, also see my previous mail in this thread.

Agree.  It should be a normal spice message which goes through the spice
marshaller for parsing  sanity checking.

 I disagree. Note that there is more info to send over then just which
 surfaces / images are cached by the client. There also is things like
 partial complete agent channel messages, including how much bytes must
 be read
 to complete the command, etc.

Is there a complete list of the session state we need to save?

 IMHO (b) would only be acceptable if the data send through the client stops
 becoming a blob.

Using (a) to send a blob isn't better.

 Instead the client could simply send a list of all
 surface ids,
 etc. which it has cached after it connects to / starts using the new
 host. Note
 that the old hosts needs to send nothing for this, this is info the
 client already
 has, also removing the need for synchronization.

Yes, some session state is known to the client anyway so we don't need a
source - target channel for them.

 As for certain other
 data, such
 as (but not limited to) partially parsed agent messages, these should be
 send through the regular vmstate methods IMHO.

That isn't easy to handle via vmstate, at least as soon as this goes
beyond a fixed number of fields aka 'migrate over this struct for me'.
Think multiple spice clients connected at the same time.

 1) Do (a), sending everything that way
 2) Do (a) sending non client state that way; and
let the client send state like which surfaces it has cached
when the new session starts.

I think we have to look at each piece of state information needed by the
target and look how to handle this best.

cheers,
  Gerd

Re: [Qemu-devel] tuning parameters qed

2012-03-12 Thread Stefan Hajnoczi

On Mon, Mar 12, 2012 at 9:12 AM, PANKAJ RAWAT pankajr...@gmail.com wrote:
 Hi all i have a kvm machine and i use qed format as the disk image format
 I want to improve the I/O performance of my machine
 can anyone tell other then cluster_size and cache what are the tunining
 parameters for qed so that i can improve the performance of my VM ?

Cluster and L1/L2 table size are the only QED-specific parameters.

What to tune really depends on what bottleneck you are seeing.  You
need to do disk I/O benchmarks to find that out.

If you are generally looking for good settings, I suggest leaving the
QED parameters at their default and running with -drive
if=virtio,cache=none,aio=native,file=path/to/image.qed with a recent
QEMU and host kernel.

Stefan

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Alon Levy

On Mon, Mar 12, 2012 at 10:46:44AM +0100, Gerd Hoffmann wrote:
   Hi,
 
  The problem with (b) is, that iirc the way b was implemented in the past
  was still the big blob approach, but then pass the blob through the client,
  which means an evil client could modify it, causing all sorts of
  interesting
  behavior inside spice-server. Since we're re-implementing this to me the
  send a blob through the client approach is simply not acceptable from a
  security pov, also see my previous mail in this thread.
 
 Agree.  It should be a normal spice message which goes through the spice
 marshaller for parsing  sanity checking.
 
  I disagree. Note that there is more info to send over then just which
  surfaces / images are cached by the client. There also is things like
  partial complete agent channel messages, including how much bytes must
  be read
  to complete the command, etc.
 
 Is there a complete list of the session state we need to save?
 
  IMHO (b) would only be acceptable if the data send through the client stops
  becoming a blob.
 
 Using (a) to send a blob isn't better.

Actually, we discussed this in the past and one benefit is that network
between source and target qemu would be fast (otherwise migration
wouldn't work in the first place), as opposed to source-client and
client-dest. Additionally you save one transaction.

 
  Instead the client could simply send a list of all
  surface ids,
  etc. which it has cached after it connects to / starts using the new
  host. Note
  that the old hosts needs to send nothing for this, this is info the
  client already
  has, also removing the need for synchronization.
 
 Yes, some session state is known to the client anyway so we don't need a
 source - target channel for them.
 
  As for certain other
  data, such
  as (but not limited to) partially parsed agent messages, these should be
  send through the regular vmstate methods IMHO.
 
 That isn't easy to handle via vmstate, at least as soon as this goes
 beyond a fixed number of fields aka 'migrate over this struct for me'.
 Think multiple spice clients connected at the same time.

Migrate this struct n times for me.

 
  1) Do (a), sending everything that way
  2) Do (a) sending non client state that way; and
 let the client send state like which surfaces it has cached
 when the new session starts.
 
 I think we have to look at each piece of state information needed by the
 target and look how to handle this best.
 
 cheers,
   Gerd

Re: [Qemu-devel] change the default value of timeout

2012-03-12 Thread Stefan Hajnoczi

On Mon, Mar 12, 2012 at 3:17 AM, Zhang, Yang Z yang.z.zh...@intel.com wrote:
        Currently, if not using nonblocking mode, the default timeout of 
 select() in main_loop_wait is 1000ms. There has no problem if you run few 
 VMs. But when running more VMs like 32 or 64, then the problem is coming. Our 
 experience shows that when running 64 idle VMs, the pkg C6 residency is 88% 
 by default and it goes to 90% when I change timeout to 10s. And 2% means 1 
 watt in my box. Since this is only a timeout value for select, so I suggest 
 to use a more reasonable way to set it rather than using a fixed value. For 
 example, using an argument to set it by user. But I am not sure whether 
 something else is depend on the timeout.

It's not obvious to me why we have a finite select(2) timeout in
main-loop.c:main_loop_wait().  State changes should use
qemu_notify_event() to leave the blocking select(2) call.

A user-visible argument doesn't seem like a solution to me.  How
should a user choose a value?

How about testing with an infinite timeout (timeout == NULL)?  Try
both KVM and TCG to see if it has weird effects.

Stefan

Re: [Qemu-devel] [PATCH 1/2] Support @documentencoding in scripts/texi2pod.pl

2012-03-12 Thread Michael Tokarev

On 12.03.2012 12:21, Stefan Hajnoczi wrote:
 On Sun, Mar 11, 2012 at 01:56:41PM +0400, Michael Tokarev wrote:
 Ping?  It's been more than a month since this patch has been posted.

 Maybe it is a good candidate for -trivial queue?
 
 One thing I don't understand is where you actually use
 @documentencoding?  I don't see it in you 2 patches.

It is already used in the existing docs, that's the
whole point ;)

Thanks,

/mjt

Re: [Qemu-devel] [PATCH] Tracing documentation changes

2012-03-12 Thread Stefan Hajnoczi

On Sun, Mar 11, 2012 at 4:13 PM, Peter Teoh htmldevelo...@gmail.com wrote:
 I was trying out the tracing feature of QEMU after checking out the
 git tree at git://git.qemu.org/qemu.git, and managed to generate some
 traces, but the following are the changes needed to the documentation,
 in order to successfully generate the tracing.

 Some comments:  qemu-system-i386 was used because the present git tree
 does not generate any qemu binary at all.

 Comments?

Thanks for the documentation update.

Please see http://wiki.qemu.org/Contribute/SubmitAPatch.

Although i386-specific command-lines are helpful to someone using QEMU
for the first time, they are subject to change so I'd rather not
include them.

The ./configure --enable-trace-backend fix has been submitted and I
will send a pull request shortly.

The scripts/simpletrace.py update is useful - please submit that.

Stefan

Re: [Qemu-devel] [PATCH v2 1/5] qxl: switch qxl.c to trace-events

2012-03-12 Thread Gerd Hoffmann

On 03/11/12 20:26, Alon Levy wrote:
 dprint is still used for qxl_init_common one time prints.

I think we shouldn't simply convert the dprintf's into trace-points.

We should look at each dprintf and check whenever it makes sense at all,
whenever it makes sense at that place before converting it over to a
tracepoint.

 @@ -409,7 +410,7 @@ static void interface_attach_worker(QXLInstance *sin, 
 QXLWorker *qxl_worker)
  {
  PCIQXLDevice *qxl = container_of(sin, PCIQXLDevice, ssd.qxl);
  
 -dprint(qxl, 1, %s:\n, __FUNCTION__);
 +trace_qxl_interface_attach_worker();
  qxl-ssd.worker = qxl_worker;
  }

For example: Do we really need that one?

 @@ -505,9 +506,10 @@ static int interface_get_command(QXLInstance *sin, 
 struct QXLCommandExt *ext)
  QXLCommand *cmd;
  int notify, ret;
  
 +trace_qxl_interface_get_command_enter(qxl_mode_to_string(qxl-mode));
 +

Why this?

 -dprint(qxl, 1, %s: scheduling update_area_bh, #dirty %d\n,
 -   __func__, qxl-num_dirty_rects);
 +
 trace_qxl_interface_update_area_complete_schedule_bh(qxl-num_dirty_rects);

I think it makes more sense to have the tracepoint in the bottom half
handler instead.

  static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
  {
 -dprint(d, 1, %s: start%s\n, __FUNCTION__,
 -   loadvm ?  (loadvm) : );
 +trace_qxl_hard_reset_enter(loadvm);
  
  qxl_spice_reset_cursor(d);
  qxl_spice_reset_image_cache(d);
 @@ -934,7 +935,7 @@ static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
  qemu_spice_create_host_memslot(d-ssd);
  qxl_soft_reset(d);
  
 -dprint(d, 1, %s: done\n, __FUNCTION__);
 +trace_qxl_hard_reset_exit();
  }

Do we need the exit tracepoint?

  static void qxl_reset_memslots(PCIQXLDevice *d)
  {
 -dprint(d, 1, %s:\n, __FUNCTION__);
 +trace_qxl_reset_memslots();
  qxl_spice_reset_memslots(d);
  memset(d-guest_slots, 0, sizeof(d-guest_slots));
  }

Do we need that one?  qxl_hard_reset is the only caller of that function ...

 @@ -1216,8 +1213,8 @@ static void ioport_write(void *opaque, 
 target_phys_addr_t addr,
  if (d-mode != QXL_MODE_VGA) {
  break;
  }
 -dprint(d, 1, %s: unexpected port 0x%x (%s) in vga mode\n,
 -__func__, io_port, io_port_to_string(io_port));
 +trace_qxl_io_unexpected_vga_mode(
 +io_port, io_port_to_string(io_port));

We might want raise an error irq here, and have a tracepoint in
qxl_guest_bug() of course ...

  case QXL_IO_SET_MODE:
 -dprint(d, 1, QXL_SET_MODE %d\n, (int)val);
 +trace_qxl_io_set_mode(val);
  qxl_set_mode(d, val, 0);

Needed?  There is a tracepoint in qxl_set_mode() ...

  case QXL_IO_RESET:
 -dprint(d, 1, QXL_IO_RESET\n);
 +trace_qxl_io_reset();
  qxl_hard_reset(d, 0);

... likewise ...

  break;
  case QXL_IO_MEMSLOT_ADD:
 @@ -1337,7 +1334,7 @@ async_common:
async);
  goto cancel_async;
  }
 -dprint(d, 1, QXL_IO_CREATE_PRIMARY async=%d\n, async);
 +trace_qxl_io_create_primary(async);
  d-guest_primary.surface = d-ram-create_surface;
  qxl_create_guest_primary(d, 0, async);

... here too ...

We might want to have a trace_qxl_io_write(addr, val) at the start of
the function, so we see all guest writes.  Traces for the actual ops (if
needed at all) are probably much better placed into the functions
executing the op as they can trace more details (i.e. qxl_set_mode has
the tracepoint *after* looking up the mode so we can stick the mode info
into the trace too).

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] ide: Adds model=s option, allowing the user to override the default disk model name QEMU HARDDISK

2012-03-12 Thread Stefan Hajnoczi

On Sat, Mar 10, 2012 at 7:56 PM, Floris Bos b...@je-eigen-domein.nl wrote:
 @@ -1885,6 +1885,22 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, 
 IDEDriveKind kind,
         snprintf(s-drive_serial_str, sizeof(s-drive_serial_str),
                  QM%05d, s-drive_serial);
     }
 +    if (model) {
 +        strncpy(s-drive_model_str, model, sizeof(s-drive_model_str));

strncpy(3) does not NUL-terminate if the max length is reached.
Either you need to use pstrcpy() or specify sizeof(s-drive_model_str)
- 1 and make sure s-drive_model_str[40] = '\0'.

 @@ -146,6 +155,9 @@ static int ide_dev_initfn(IDEDevice *dev, IDEDriveKind 
 kind)
     if (!dev-serial) {
         dev-serial = g_strdup(s-drive_serial_str);
     }
 +    if (!dev-model) {
 +        dev-model = g_strdup(s-drive_model_str);
 +    }

Seems this will never be freed but dev-serial has the same issue, so
this isn't new.

Stefan

Re: [Qemu-devel] [PATCH] maintainers: Add docs/tracing.txt to Tracing

2012-03-12 Thread Stefan Hajnoczi

On Sat, Mar 10, 2012 at 12:37 PM, Andreas Färber afaer...@suse.de wrote:
 The topic of whether and by whom docs/tracing.txt is maintained was
 brought up. It currently does not have an official maintainer.

 Add it to the tracing section so that Stefan gets cc'ed on patches.

 Signed-off-by: Andreas Färber afaer...@suse.de
 Cc: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 Cc: Peter Maydell peter.mayd...@linaro.org
 ---
  MAINTAINERS |    1 +
  1 files changed, 1 insertions(+), 0 deletions(-)

Acked-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Gerd Hoffmann

  Hi,

 As for certain other
 data, such
 as (but not limited to) partially parsed agent messages, these should be
 send through the regular vmstate methods IMHO.

 That isn't easy to handle via vmstate, at least as soon as this goes
 beyond a fixed number of fields aka 'migrate over this struct for me'.
 Think multiple spice clients connected at the same time.
 
 Migrate this struct n times for me.

I think for the agent case this isn't needed.  Or is every client
allowed to speak to the agent in case of multiple clients connected?  I
somehow doubt this can work as the agent protocol can't multicast ...

cheers,
  Gerd

Re: [Qemu-devel] [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-12 Thread Avi Kivity

On 03/09/2012 03:21 AM, Wen Congyang wrote:
 
  Changes from v2 to v3:
  1. correct spelling
 
  Changes from v1 to v2:
  1. split up host and guest-side changes
  2. introduce new request flag to avoid changing return values.
  
  I see no Documentation/ changes.

 What shoude be writen into Documentation/? I read the documents under
 Documentation/virtual/kvm, not all KVM_EXIT_* are writen in api.txt.
 In this patch set, I introduce a new exit_reason: KVM_EXIT_GUEST_PANICKED,
 but I donot know where I should write.

Describe the exit (why it happens).  Also document the new hypercall. 
If there is no good place for it, start a new file.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 0/2 v3] kvm: notify host when guest panicked

2012-03-12 Thread Avi Kivity

On 03/12/2012 11:04 AM, Wen Congyang wrote:
 Do you have any other comments about this patch?


Not really, but I'm not 100% convinced the patch is worthwhile.  It's
likely to only be used by Linux, which has kexec facilities, and you can
put talk to management via virtio-serial and describe the crash in more
details than a simple hypercall.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH 1/2] Support @documentencoding in scripts/texi2pod.pl

2012-03-12 Thread Stefan Hajnoczi

On Mon, Mar 12, 2012 at 10:14 AM, Michael Tokarev m...@tls.msk.ru wrote:
 On 12.03.2012 12:21, Stefan Hajnoczi wrote:
 On Sun, Mar 11, 2012 at 01:56:41PM +0400, Michael Tokarev wrote:
 Ping?  It's been more than a month since this patch has been posted.

 Maybe it is a good candidate for -trivial queue?

 One thing I don't understand is where you actually use
 @documentencoding?  I don't see it in you 2 patches.

 It is already used in the existing docs, that's the
 whole point ;)

Ah, it would have been good to explain that in the commit description :).

Stefan

Re: [Qemu-devel] [PATCH v4 1/3] kvmclock: Always register type

2012-03-12 Thread Avi Kivity

On 03/10/2012 03:35 AM, Andreas Färber wrote:
 Am 05.03.2012 10:23, schrieb Avi Kivity:
  On 03/04/2012 10:32 PM, Andreas Färber wrote:
  Currently, the kvmclock type is only registered when kvm_enabled().
 
  This breaks when moving type registration to before command line
  parsing (so that QOM types can be used for CPU and machine).
 
  Since the QOM classes are lazy-initialized anyway and kvmclock_create()
  has another kvm_enabled() check, simply drop the KVM check in
  kvmclock_register_types().
 
  kvm-i8259, kvm-apic and kvm-ioapic do not suffer from such a check.
  
  Patch looks good.

 Ping for series.

 Avi, do you want to sign this patch off through uq/master? Or should I
 make the above a Reviewed-by (and remove the Cc:) within this series?


Reviewed-by: please.

-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] Re : Re : Regression: more 0.12 regression (SeaBIOS related?)

2012-03-12 Thread Alain Ribière

On Wed, Mar 07, 2012 at 06:31:31AM -0800, Alain Ribière wrote:

  I ran qemu 1.0.1 and the latest SeaBIOS (from the git) with the following 
  options :
  qemu-system-i386 -L git/bios -fda disk.img -no-fd-bootchk -boot a -m 16
  
  Here is the log :
  https://docs.google.com/open?id=0B7mz0vq6Rpb7UE1ibjJDcEhTRWlNV050QnMyMWwtZw
  
  Here is the floppy disk image I used :
  
  https://docs.google.com/open?id=0B7mz0vq6Rpb7bHpYaEt2SnVUUi1KaWE3a3lBQUJpQQ
  
  
  The floppy disk is simply a C-DOS 720 Ko floppy created by format
  a: /s. So it's quite empty.
  
  Qemu doesn't crash or freeze. But I can just type a single character
  and the nothing else. But the system is still running (there is a
  clock at the bottom right of the screen).
 
 I tracked this down.  Looks like the image takes over the PS2 irq and
 keyboard handling, but then occasionally calls into the BIOS.  When it
 does call the BIOS irq handler (manually), it expects the irq handler
 to enable the keyboard.  Weird.
 
 Anyway, the patch below fixes it for me.
 
 -Kevin

Great ! It works for me too.

Thanks a lot,

Alain

Re: [Qemu-devel] [RFC PATCH 07/17] block: make high level discard operation always zero

2012-03-12 Thread Avi Kivity

On 03/09/2012 06:42 PM, Kevin Wolf wrote:
 
  While (3) can be worked around, the only way around the other two,
  unfortunately, is support in the formats and protocols.  We will still
  provide device options to opt out of this, but with raw and qed covered
  (+ qcow2 without backing file, and qcow3 in the future) it should not
  be too bad.
  
  Can't qcow2 with a backing file also be supported?  Zero out the first
  cluster, and remember it.  The following discards can reuse this zero
  cluster, as long as it hasn't been overwritten.

 qcow2 can't handle clusters that are referenced twice from the same L1
 table. This would require a reverse lookup to adjust the QCOW_O_COPIED
 flags in the L2 tables containing the other references.

Don't follow, sorry.  What adjustment are you talking about?

If it's a 1-0 transition, is it mandatory to adjust the flag?  That is,
it it legal to have a refcount of exactly one, but have the flag clear?

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC PATCH 07/17] block: make high level discard operation always zero

2012-03-12 Thread Kevin Wolf

Am 12.03.2012 11:42, schrieb Avi Kivity:
 On 03/09/2012 06:42 PM, Kevin Wolf wrote:

 While (3) can be worked around, the only way around the other two,
 unfortunately, is support in the formats and protocols.  We will still
 provide device options to opt out of this, but with raw and qed covered
 (+ qcow2 without backing file, and qcow3 in the future) it should not
 be too bad.

 Can't qcow2 with a backing file also be supported?  Zero out the first
 cluster, and remember it.  The following discards can reuse this zero
 cluster, as long as it hasn't been overwritten.

 qcow2 can't handle clusters that are referenced twice from the same L1
 table. This would require a reverse lookup to adjust the QCOW_O_COPIED
 flags in the L2 tables containing the other references.
 
 Don't follow, sorry.  What adjustment are you talking about?
 
 If it's a 1-0 transition, is it mandatory to adjust the flag?  That is,
 it it legal to have a refcount of exactly one, but have the flag clear?

According to the spec it's illegal and qemu-img check will complain,
too. In practice, I'm not entirely sure if it will cause real corruption
or just unnecessary COWs. I believe it may be the latter.

But it's not worth the trouble anyway when we can have a real zero flag
in qcow3.

Kevin

Re: [Qemu-devel] [PATCH 2/2] tracetool: Forbid argument name 'next'

2012-03-12 Thread Stefan Hajnoczi

On Mon, Mar 12, 2012 at 9:34 AM, Kevin Wolf kw...@redhat.com wrote:
 It has happened more than once that patches that look perfectly sane
 and work with simpletrace broke systemtap because they use 'next' as an
 argument name for a tracing function. However, 'next' is a keyword for
 systemtap, so we shouldn't use it.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  scripts/tracetool |    4 
  1 files changed, 4 insertions(+), 0 deletions(-)

 diff --git a/scripts/tracetool b/scripts/tracetool
 index 4c9951d..f892af4 100755
 --- a/scripts/tracetool
 +++ b/scripts/tracetool
 @@ -81,6 +81,10 @@ get_args()
     args=${1#*\(}
     args=${args%%\)*}
     echo $args
 +
 +    if (echo $args | grep [ *]next\($\|[, ]\)  /dev/null 21); then
 +        echo -e \n#error 'next' is a bad argument name (clash with 
 systemtap keyword)\n 
 +    fi

Good idea, let's prevent it from being used.

I don't think this is the way to do it because callers will parse
stdout and we're not guaranteed to be generating C code where #error
works.  Instead, we can echo to stderr and do exit 1.

Stefan

Re: [Qemu-devel] [PATCH 2/2] tracetool: Forbid argument name 'next'

2012-03-12 Thread Kevin Wolf

Am 12.03.2012 12:01, schrieb Stefan Hajnoczi:
 On Mon, Mar 12, 2012 at 9:34 AM, Kevin Wolf kw...@redhat.com wrote:
 It has happened more than once that patches that look perfectly sane
 and work with simpletrace broke systemtap because they use 'next' as an
 argument name for a tracing function. However, 'next' is a keyword for
 systemtap, so we shouldn't use it.

 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  scripts/tracetool |4 
  1 files changed, 4 insertions(+), 0 deletions(-)

 diff --git a/scripts/tracetool b/scripts/tracetool
 index 4c9951d..f892af4 100755
 --- a/scripts/tracetool
 +++ b/scripts/tracetool
 @@ -81,6 +81,10 @@ get_args()
 args=${1#*\(}
 args=${args%%\)*}
 echo $args
 +
 +if (echo $args | grep [ *]next\($\|[, ]\)  /dev/null 21); then
 +echo -e \n#error 'next' is a bad argument name (clash with 
 systemtap keyword)\n 
 +fi
 
 Good idea, let's prevent it from being used.
 
 I don't think this is the way to do it because callers will parse
 stdout and we're not guaranteed to be generating C code where #error
 works.  Instead, we can echo to stderr and do exit 1.

Yes, we may generate something else additionally. But trace.h is
generated in any case and it will cause a compile error before any non-C
file is used.

I tried the 'exit 1' approach first and it doesn't really work. This
function is called from a subshell, so you don't exit tracetool but just
the one subshell and end up with a broken trace.h that will fail
compilation in the same place, just with a less helpful error message.

Kevin

Re: [Qemu-devel] [PATCH] virtio-serial-bus: use correct lengths in control_out() message

2012-03-12 Thread Amit Shah

On (Mon) 12 Mar 2012 [13:22:22], Michael Tokarev wrote:
 On 12.03.2012 12:59, Amit Shah wrote:
  On (Sun) 11 Mar 2012 [17:52:59], Michael Tokarev wrote:
  In case of more than one control message, the code will use
  size of the largest message so far for all subsequent messages,
  instead of using size of current one.  Fix it.
  
  Makes sense.  How did you detect this?  Any reproducible test-case?
 
 There's no test-case, and no detection, just reading the code.
 Actually, I think, there's no bug here, but a very, well,
 difficult to read code.

Do you mean this code is difficult to read, or in general?  Any ideas
to make it simpler (or at least details on what's difficult?)

  diff --git a/hw/virtio-serial-bus.c b/hw/virtio-serial-bus.c
  index e22940e..abe48ec 100644
  --- a/hw/virtio-serial-bus.c
  +++ b/hw/virtio-serial-bus.c
  @@ -451,28 +451,28 @@ static void control_out(VirtIODevice *vdev, 
  VirtQueue *vq)
   
   vser = DO_UPCAST(VirtIOSerial, vdev, vdev);
   
   len = 0;
   buf = NULL;
   while (virtqueue_pop(vq, elem)) {
  -size_t cur_len, copied;
  +size_t cur_len;
   
   cur_len = iov_size(elem.out_sg, elem.out_num);
   /*
* Allocate a new buf only if we didn't have one previously or
* if the size of the buf differs
*/
   if (cur_len  len) {
   g_free(buf);
   
   buf = g_malloc(cur_len);
   len = cur_len;
   }
  -copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);
  +iov_to_buf(elem.out_sg, elem.out_num, buf, 0, cur_len);
  
  Why drop 'copied'?  I don't think we have had a situation where copied
  can be less than cur_len, and in any case we don't do anything special
  as a recovery mechanism, but a warning message or an abort in case
  copied != cur_len should work, I think.
 
 In this case, copied was _always_ == cur_len.  That's why there's
 actually no bug.  See:
 
cur_len = iov_size(elem.out_sg, elem.out_num);
len = max(cur_len, buflen) = roughly
copied = iov_to_buf(elem.out_sg, elem.out_num, buf, 0, len);
 
 iov_to_buf() will stop copying when it reaches end of buf
 (which is len bytes long) or end of iov, which is cur_len
 bytes long.  Obviously in all cases it will be cur_len.
 But it is obvious only when you write it one near another
 and _think_.  And the reason for this confusion is the
 introduction of this `copied' variable, which shouldn't
 be there in the first place.
 
 It is like doing, for a memcpy-like function:
 
  void *memdup(const void *src, size_t size) {
char *dest = malloc(size+1);
size_t copied = copybytes(dest, src, size+1);
if (copied != size+1) {
   /* What?? */
}
return dest;
 }
 
 The only sane thing here, I think, is to drop 'copied',
 to stop any possible confusion :)

  -handle_control_message(vser, buf, copied);
  +handle_control_message(vser, buf, cur_len);
   virtqueue_push(vq, elem, 0);
   }
   g_free(buf);
   virtio_notify(vdev, vq);
   }
 
 Please belive me, I realized that the original code is
 actually right only after re-reading your reply.

Heh.

  And
 please note that even you, the author, don't understand
 what it is doing :)

Of course, I can't claim to remember everything, I sometimes don't
even remember stuff *while* coding it.  However, I do understand this
part of the code, what I meant above was iov_to_buf() could fail or
copy lesser than what was asked of it.  It's a memcpy() right now, it
could change to something else.  Ignoring return values, esp. in copy
functions, is not good style, even if you know it can't fail.

  So I think the patch is correct
 still ;)

No doubt about that.  I never said otherwise.  I just feel we
shouldn't ignore return values.

Amit

Re: [Qemu-devel] [PATCH] acpi: beginnings of piix acpi interface doc

2012-03-12 Thread Gleb Natapov

On Mon, Mar 12, 2012 at 11:35:29AM +0200, Michael S. Tsirkin wrote:
 Before we start tweaking and enhancing hardware, I think
 it makes sense to document what we currently have, to make
 sure we stay compatible.
 This documents the hotplug interface for piix.
 Stubs for cpu hotplug, PM.
 
We already have docs/specs/acpi_pci_hotplug.txt, no?

 Signed-off-by: Michael S. Tsirkin m...@redhat.com
 ---
  docs/acpi.txt |   32 
  1 files changed, 32 insertions(+), 0 deletions(-)
  create mode 100644 docs/acpi.txt
 
 diff --git a/docs/acpi.txt b/docs/acpi.txt
 new file mode 100644
 index 000..4938d48
 --- /dev/null
 +++ b/docs/acpi.txt
 @@ -0,0 +1,32 @@
 +QEMU exposes the following registers to guests,
 +intended primarily for use by the ACPI interface.
 +
 +PCI Hotplug
 +
 +
 +Events use the standard GPE register:
 +GPE  0xafe0 - an ACPI GPE register
 +
 +Hotplug events set GPE bit 1 (mask 0x2)
 +
 +The following registers are used for PCI hotplug.
 +Each register is 32 bit (4 bytes) long, and has little endian format.
 +Bits 0-31 in each register correspond to slots 0-31 on the root bus,
 +respectively.
 +
 +UP   0xae00 - RO - Bit set by host on device insertion (note:existing 
 implementations
 +   trigger device check event)
 +DOWN 0xae04 - RO - Bit set by host on user eject request
 +EJ0  0xae08 - WO - Bit set by guest removes all power to device
 +RMV  0xae0c - RO - Bit set by host if slot supports hotplug
 +   (can not change while guest is up)
 +
 +
 +Power management
 +
 +TODO
 +
 +
 +CPU hotplug
 +
 +TODO
 -- 
 1.7.9.111.gf3fb0

--
Gleb.

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Hans de Goede


Hi,

On 03/12/2012 10:46 AM, Gerd Hoffmann wrote:

   Hi,


The problem with (b) is, that iirc the way b was implemented in the past
was still the big blob approach, but then pass the blob through the client,
which means an evil client could modify it, causing all sorts of
interesting
behavior inside spice-server. Since we're re-implementing this to me the
send a blob through the client approach is simply not acceptable from a
security pov, also see my previous mail in this thread.


Agree.  It should be a normal spice message which goes through the spice
marshaller for parsing  sanity checking.


I disagree. Note that there is more info to send over then just which
surfaces / images are cached by the client. There also is things like
partial complete agent channel messages, including how much bytes must
be read
to complete the command, etc.


Is there a complete list of the session state we need to save?



There is still code in spice-server for the old seamless migration,
someone could go through that and use that as an initial list of
session state we need to save.


IMHO (b) would only be acceptable if the data send through the client stops
becoming a blob.


Using (a) to send a blob isn't better.



It has the distinct advantage that we can trust the contents of the
blob which makes life significantly easier IMHO.


Instead the client could simply send a list of all
surface ids,
etc. which it has cached after it connects to / starts using the new
host. Note
that the old hosts needs to send nothing for this, this is info the
client already
has, also removing the need for synchronization.


Yes, some session state is known to the client anyway so we don't need a
source-  target channel for them.


As for certain other
data, such
as (but not limited to) partially parsed agent messages, these should be
send through the regular vmstate methods IMHO.


That isn't easy to handle via vmstate, at least as soon as this goes
beyond a fixed number of fields aka 'migrate over this struct for me'.
Think multiple spice clients connected at the same time.


1) Do (a), sending everything that way
2) Do (a) sending non client state that way; and
let the client send state like which surfaces it has cached
when the new session starts.


I think we have to look at each piece of state information needed by the
target and look how to handle this best.


Agreed, so for starts someone needs to make a list of all
session state we need to save.

Regards,

Hans

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Alon Levy

On Mon, Mar 12, 2012 at 11:26:50AM +0100, Gerd Hoffmann wrote:
   Hi,
 
  As for certain other
  data, such
  as (but not limited to) partially parsed agent messages, these should be
  send through the regular vmstate methods IMHO.
 
  That isn't easy to handle via vmstate, at least as soon as this goes
  beyond a fixed number of fields aka 'migrate over this struct for me'.
  Think multiple spice clients connected at the same time.
  
  Migrate this struct n times for me.
 
 I think for the agent case this isn't needed.  Or is every client
 allowed to speak to the agent in case of multiple clients connected?  I
 somehow doubt this can work as the agent protocol can't multicast ...
 

Actually the agent protocol does extend nicely to multiple clients - I
forgot the name but there is an additional wrapper between the
client/server originating message and the guest received message, that
is currently used for server or client originating messages, and can be
reused to have multiple in flight different client messages. We don't
use it for guest generated messages, but we could as well. Multicast
would be another number.

 cheers,
   Gerd

Re: [Qemu-devel] [PATCH] ide: Adds model=s option, allowing the user to override the default disk model name QEMU HARDDISK

2012-03-12 Thread Floris Bos / Maxnet


On 03/12/2012 11:26 AM, Stefan Hajnoczi wrote:

On Sat, Mar 10, 2012 at 7:56 PM, Floris Bosb...@je-eigen-domein.nl  wrote:

@@ -1885,6 +1885,22 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, 
IDEDriveKind kind,
 snprintf(s-drive_serial_str, sizeof(s-drive_serial_str),
  QM%05d, s-drive_serial);
 }
+if (model) {
+strncpy(s-drive_model_str, model, sizeof(s-drive_model_str));

strncpy(3) does not NUL-terminate if the max length is reached.
Either you need to use pstrcpy() or specify sizeof(s-drive_model_str)
- 1 and make sure s-drive_model_str[40] = '\0'.


Thanks for the feedback.

Will change that line (and serial that used strncpy() as well) to 
pstrcpy(), correct the cosmetic issues mentioned by Andreas and submit a 
v2 patch.


--
Yours sincerely,

Floris Bos

Re: [Qemu-devel] Add GSoC project ideas to the wiki!

2012-03-12 Thread Kevin Wolf

Am 24.02.2012 10:19, schrieb Stefan Hajnoczi:
 This is a reminder that QEMU will apply for Google Summer of Code 2012 and we
 need project ideas and mentors.  Libvirt and kvm.ko projects are also welcome!
 
 http://wiki.qemu.org/Google_Summer_of_Code_2012
 
 Please add yourself to the wiki now if you want to mentor a project
 this summer.  I will file our application next week.

Natalia, I saw that you copied a few items from last year's Wiki page to
the current one. But did you check that all of them are still valid? For
example, we do have EHCI and xHCI in the tree for quite a while now. I
don't think there's enough left for a GSoC project with these.

Kevin

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Gerd Hoffmann

On 03/12/12 12:29, Alon Levy wrote:
 On Mon, Mar 12, 2012 at 11:26:50AM +0100, Gerd Hoffmann wrote:
   Hi,

 Migrate this struct n times for me.

 I think for the agent case this isn't needed.  Or is every client
 allowed to speak to the agent in case of multiple clients connected?  I
 somehow doubt this can work as the agent protocol can't multicast ...

 
 Actually the agent protocol does extend nicely to multiple clients - I
 forgot the name but there is an additional wrapper between the
 client/server originating message and the guest received message, that
 is currently used for server or client originating messages, and can be
 reused to have multiple in flight different client messages.

I think you'll have issues in the layer above though.  Two spice clients
doing cut+paste operations at the same time?  Two spice clients
requesting different screen resolutions?

cheers,
  Gerd

Re: [Qemu-devel] [PATCH] ide: Adds model=s option, allowing the user to override the default disk model name QEMU HARDDISK

2012-03-12 Thread Kevin Wolf

Am 12.03.2012 12:30, schrieb Floris Bos / Maxnet:
 On 03/12/2012 11:26 AM, Stefan Hajnoczi wrote:
 On Sat, Mar 10, 2012 at 7:56 PM, Floris Bosb...@je-eigen-domein.nl  wrote:
 @@ -1885,6 +1885,22 @@ int ide_init_drive(IDEState *s, BlockDriverState 
 *bs, IDEDriveKind kind,
  snprintf(s-drive_serial_str, sizeof(s-drive_serial_str),
   QM%05d, s-drive_serial);
  }
 +if (model) {
 +strncpy(s-drive_model_str, model, sizeof(s-drive_model_str));
 strncpy(3) does not NUL-terminate if the max length is reached.
 Either you need to use pstrcpy() or specify sizeof(s-drive_model_str)
 - 1 and make sure s-drive_model_str[40] = '\0'.
 
 Thanks for the feedback.
 
 Will change that line (and serial that used strncpy() as well) to 
 pstrcpy(), correct the cosmetic issues mentioned by Andreas and submit a 
 v2 patch.

Fixing serial as well is a good idea. Please submit a separate patch for
that, though, as it is an independent fix.

Kevin

Re: [Qemu-devel] [PATCH] qemu-io: add option to enable tracing

2012-03-12 Thread Stefan Hajnoczi

On Wed, Dec 21, 2011 at 3:35 PM, Stefan Hajnoczi
stefa...@linux.vnet.ibm.com wrote:

Ping?  Do you want to take this through the block tree?

 It can be useful to enable QEMU tracing when trying out block layer
 interfaces via qemu-io.  Tracing can be enabled using the new -t FILE
 option where the given file contains a list of trace events to enable
 (just like the qemu --trace events=FILE option).

  $ echo qemu_vfree my-events
  $ ./qemu-io -t my-events ...

 Remember to use ./configure --enable-trace-backend=BACKEND when building
 qemu-io.

 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  qemu-io.c |    8 +++-
  1 files changed, 7 insertions(+), 1 deletions(-)

 diff --git a/qemu-io.c b/qemu-io.c
 index ffa62fb..ad91fd6 100644
 --- a/qemu-io.c
 +++ b/qemu-io.c
 @@ -17,6 +17,7 @@
  #include qemu-common.h
  #include block_int.h
  #include cmd.h
 +#include trace/control.h

  #define VERSION        0.0.1

 @@ -1722,6 +1723,7 @@ static void usage(const char *name)
    -g, --growable       allow file to grow (only applies to protocols)\n
    -m, --misalign       misalign allocations for O_DIRECT\n
    -k, --native-aio     use kernel AIO implementation (on Linux only)\n
 +  -t, --trace FILE     enable trace events listed in the given file\n
    -h, --help           display this help and exit\n
    -V, --version        output version information and exit\n
  \n,
 @@ -1733,7 +1735,7 @@ int main(int argc, char **argv)
  {
     int readonly = 0;
     int growable = 0;
 -    const char *sopt = hVc:rsnmgk;
 +    const char *sopt = hVc:rsnmgkt:;
     const struct option lopt[] = {
         { help, 0, NULL, 'h' },
         { version, 0, NULL, 'V' },
 @@ -1745,6 +1747,7 @@ int main(int argc, char **argv)
         { misalign, 0, NULL, 'm' },
         { growable, 0, NULL, 'g' },
         { native-aio, 0, NULL, 'k' },
 +        { trace, 1, NULL, 't' },
         { NULL, 0, NULL, 0 }
     };
     int c;
 @@ -1776,6 +1779,9 @@ int main(int argc, char **argv)
         case 'k':
             flags |= BDRV_O_NATIVE_AIO;
             break;
 +        case 't':
 +            trace_backend_init(optarg, NULL);
 +            break;
         case 'V':
             printf(%s version %s\n, progname, VERSION);
             exit(0);
 --
 1.7.7.3

Re: [Qemu-devel] [PATCH v2 1/5] qxl: switch qxl.c to trace-events

2012-03-12 Thread Alon Levy

On Mon, Mar 12, 2012 at 11:20:55AM +0100, Gerd Hoffmann wrote:
 On 03/11/12 20:26, Alon Levy wrote:
  dprint is still used for qxl_init_common one time prints.
 
 I think we shouldn't simply convert the dprintf's into trace-points.
 
 We should look at each dprintf and check whenever it makes sense at all,
 whenever it makes sense at that place before converting it over to a
 tracepoint.

I had two changes of heart about this. Initially I started mechanically
converting, then I realized it only makes sense for recurring events,
and things we want to see come out of the same output. But then I
noticed a lot of existing users do use it for as verbose usage as we do
(bh calls) and it is not meant as a stable interface to anyone - clearly
something that should fit the developer and user, and if the subsystem
changes then the events can change.

Bottom line I think having most of the dprints as trace_events makes
sense, and we can use consistent naming to make enabling/disabling them
easy for systemtap/stderr (with monitor trace-events command) easy.

I only left very few dprints.

 
  @@ -409,7 +410,7 @@ static void interface_attach_worker(QXLInstance *sin, 
  QXLWorker *qxl_worker)
   {
   PCIQXLDevice *qxl = container_of(sin, PCIQXLDevice, ssd.qxl);
   
  -dprint(qxl, 1, %s:\n, __FUNCTION__);
  +trace_qxl_interface_attach_worker();
   qxl-ssd.worker = qxl_worker;
   }
 
 For example: Do we really need that one?

I look at it the other way around - can it repeat? yes, it's a callback
from the spice server which we don't control. does it serve the
same purpose as the dprint? yes.

 
  @@ -505,9 +506,10 @@ static int interface_get_command(QXLInstance *sin, 
  struct QXLCommandExt *ext)
   QXLCommand *cmd;
   int notify, ret;
   
  +trace_qxl_interface_get_command_enter(qxl_mode_to_string(qxl-mode));
  +
 
 Why this?

Simplyfication of the dprints to avoid introducing an additional trace
event. We had a dprint for level 1 for both VGA and other modes, so I
moved it up. This trace is for each request from the server, as opposed
to the _ret that is for each returned command (much less frequent).

 
  -dprint(qxl, 1, %s: scheduling update_area_bh, #dirty %d\n,
  -   __func__, qxl-num_dirty_rects);
  +
  trace_qxl_interface_update_area_complete_schedule_bh(qxl-num_dirty_rects);
 
 I think it makes more sense to have the tracepoint in the bottom half
 handler instead.

Why instead? I could add another one at the bottom half.

 
   static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
   {
  -dprint(d, 1, %s: start%s\n, __FUNCTION__,
  -   loadvm ?  (loadvm) : );
  +trace_qxl_hard_reset_enter(loadvm);
   
   qxl_spice_reset_cursor(d);
   qxl_spice_reset_image_cache(d);
  @@ -934,7 +935,7 @@ static void qxl_hard_reset(PCIQXLDevice *d, int loadvm)
   qemu_spice_create_host_memslot(d-ssd);
   qxl_soft_reset(d);
   
  -dprint(d, 1, %s: done\n, __FUNCTION__);
  +trace_qxl_hard_reset_exit();
   }
 
 Do we need the exit tracepoint?

With systemtap I'd say the whole function could be traced, and that
would mean both entry and exit. But you can't do that with the tracing
framework, so this is the only way to have both.

If having both dprints makes no sense, I guess having both trace events
makes none too.

 
   static void qxl_reset_memslots(PCIQXLDevice *d)
   {
  -dprint(d, 1, %s:\n, __FUNCTION__);
  +trace_qxl_reset_memslots();
   qxl_spice_reset_memslots(d);
   memset(d-guest_slots, 0, sizeof(d-guest_slots));
   }
 
 Do we need that one?  qxl_hard_reset is the only caller of that function ...

Agree, I'll drop it.

But maybe I should add trace events for all the qxl_spice_* calls?

 
  @@ -1216,8 +1213,8 @@ static void ioport_write(void *opaque, 
  target_phys_addr_t addr,
   if (d-mode != QXL_MODE_VGA) {
   break;
   }
  -dprint(d, 1, %s: unexpected port 0x%x (%s) in vga mode\n,
  -__func__, io_port, io_port_to_string(io_port));
  +trace_qxl_io_unexpected_vga_mode(
  +io_port, io_port_to_string(io_port));
 
 We might want raise an error irq here, and have a tracepoint in
 qxl_guest_bug() of course ...

ok, I think I can add the tracepoint for qxl_guest_bug. Raise an error
irq I'll do in another patch.

 
   case QXL_IO_SET_MODE:
  -dprint(d, 1, QXL_SET_MODE %d\n, (int)val);
  +trace_qxl_io_set_mode(val);
   qxl_set_mode(d, val, 0);
 
 Needed?  There is a tracepoint in qxl_set_mode() ...

But if qxl_set_mode can be called from multiple places it isn't
equivalent.

 
   case QXL_IO_RESET:
  -dprint(d, 1, QXL_IO_RESET\n);
  +trace_qxl_io_reset();
   qxl_hard_reset(d, 0);
 
 ... likewise ...

Same answer.

 
   break;
   case QXL_IO_MEMSLOT_ADD:
  @@ -1337,7 +1334,7 @@ async_common:
 async);
   goto cancel_async;
   }
  -dprint(d, 1,

Re: [Qemu-devel] [PATCH] virtio-serial-bus: use correct lengths in control_out() message

2012-03-12 Thread Michael Tokarev

On 12.03.2012 15:06, Amit Shah wrote:
 On (Mon) 12 Mar 2012 [13:22:22], Michael Tokarev wrote:
 On 12.03.2012 12:59, Amit Shah wrote:
 On (Sun) 11 Mar 2012 [17:52:59], Michael Tokarev wrote:
 In case of more than one control message, the code will use
 size of the largest message so far for all subsequent messages,
 instead of using size of current one.  Fix it.

 Makes sense.  How did you detect this?  Any reproducible test-case?

 There's no test-case, and no detection, just reading the code.
 Actually, I think, there's no bug here, but a very, well,
 difficult to read code.
 
 Do you mean this code is difficult to read, or in general?  Any ideas
 to make it simpler (or at least details on what's difficult?)

Just difficult to understand, and just this particular (very minor!)
place.

We got one thing, we requested to copy another, and we handle
3rd which is something else.  While actually we are supposed
to get, request and handle the _same_, or else we're doomed.

[]
   It's a memcpy() right now, it
 could change to something else.  Ignoring return values, esp. in copy
 functions, is not good style, even if you know it can't fail.

So that's the reason why the return value should be void, and
that the code should always request as many bytes as it actually
needs, and there must be some assert()s to ensure we're not
outside of something.

  So I think the patch is correct
 still ;)
 
 No doubt about that.  I never said otherwise.  I just feel we
 shouldn't ignore return values.

By _not_ ignoring return value in something like that is not far
away from checking if 1 is still equal to 1 after each instruction :)
I want to change this interface a bit, to be more obvious, to
stop all similar discussions and doubts.  It will handle the
requested amount and will abort() internally if it can't, so
we can stop bothering ignoring the return value, provided that
we actually request what we _want_ it to do, not what we have.
In this particular case, the 'size' argument of iov_from_buf()
should be 'bytes' or 'len', -- actual amount of bytes we need
to process, not size of the buffer we have in our disposal.

(For the iov_* family, we've another set, qemu_iovec_*, and
also qemu_sendv() Co, each of which have different and not
obvious at all interface :)

Thanks!

Re: [Qemu-devel] [PATCH] qemu-io: add option to enable tracing

2012-03-12 Thread Kevin Wolf

Am 12.03.2012 12:39, schrieb Stefan Hajnoczi:
 On Wed, Dec 21, 2011 at 3:35 PM, Stefan Hajnoczi
 stefa...@linux.vnet.ibm.com wrote:
 
 Ping?  Do you want to take this through the block tree?

Sorry, I completely missed this one.

 It can be useful to enable QEMU tracing when trying out block layer
 interfaces via qemu-io.  Tracing can be enabled using the new -t FILE
 option where the given file contains a list of trace events to enable
 (just like the qemu --trace events=FILE option).

  $ echo qemu_vfree my-events
  $ ./qemu-io -t my-events ...

 Remember to use ./configure --enable-trace-backend=BACKEND when building
 qemu-io.

 Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 ---
  qemu-io.c |8 +++-
  1 files changed, 7 insertions(+), 1 deletions(-)

 diff --git a/qemu-io.c b/qemu-io.c
 index ffa62fb..ad91fd6 100644
 --- a/qemu-io.c
 +++ b/qemu-io.c
 @@ -17,6 +17,7 @@
  #include qemu-common.h
  #include block_int.h
  #include cmd.h
 +#include trace/control.h

  #define VERSION0.0.1

 @@ -1722,6 +1723,7 @@ static void usage(const char *name)
-g, --growable   allow file to grow (only applies to protocols)\n
-m, --misalign   misalign allocations for O_DIRECT\n
-k, --native-aio use kernel AIO implementation (on Linux only)\n
 +  -t, --trace FILE enable trace events listed in the given file\n
-h, --help   display this help and exit\n
-V, --versionoutput version information and exit\n
  \n,
 @@ -1733,7 +1735,7 @@ int main(int argc, char **argv)
  {
 int readonly = 0;
 int growable = 0;
 -const char *sopt = hVc:rsnmgk;
 +const char *sopt = hVc:rsnmgkt:;
 const struct option lopt[] = {
 { help, 0, NULL, 'h' },
 { version, 0, NULL, 'V' },
 @@ -1745,6 +1747,7 @@ int main(int argc, char **argv)
 { misalign, 0, NULL, 'm' },
 { growable, 0, NULL, 'g' },
 { native-aio, 0, NULL, 'k' },
 +{ trace, 1, NULL, 't' },
 { NULL, 0, NULL, 0 }
 };
 int c;
 @@ -1776,6 +1779,9 @@ int main(int argc, char **argv)
 case 'k':
 flags |= BDRV_O_NATIVE_AIO;
 break;
 +case 't':
 +trace_backend_init(optarg, NULL);
 +break;

vl.c checks the return value of trace_backend_init. Shouldn't we do the
same here?

Also, I was considering adding a -t for the cache mode (option name for
consistency with qemu-img). Conversely, we'll probably want to add a
tracing option to qemu-img and -t isn't available any more there. Maybe
we should use a different letter?

Kevin

Re: [Qemu-devel] [PATCH] ide: Adds model=s option, allowing the user to override the default disk model name QEMU HARDDISK

2012-03-12 Thread Kevin Wolf

Am 10.03.2012 20:56, schrieb Floris Bos:
 Some Linux distributions use the 
 /dev/disk/by-id/scsi-SATA_name-of-disk-model_serial addressing scheme
 when refering to partitions in /etc/fstab and elsewhere.
 This causes problems when starting a disk image taken from an existing 
 physical server under qemu,
 because when running under qemu name-of-disk-model is always QEMU HARDDISK
 This patch introduces a model=s option which in combination with the existing 
 serial=s option can be used to
 fake the disk the operating system was previously on, allowing the OS to boot 
 properly.
 
 Cc: kw...@redhat.com
 Signed-off-by: Floris Bos d...@noc-ps.com

Oh, and now that I look at the actual patch, do we really want to add
the model=... option to -drive? I think just the qdev property may be
enough. When you need to set the option, you would need to use -device,
but it's not that hard.

Kevin

Re: [Qemu-devel] [PATCH 2/2] memory: print aliased IO ranges in info mtree

2012-03-12 Thread Avi Kivity

On 03/11/2012 01:00 PM, Blue Swirl wrote:
 Print also I/O ports behind bridges and other aliases.

 Signed-off-by: Blue Swirl blauwir...@gmail.com
 ---
  memory.c |   13 +
  1 files changed, 13 insertions(+), 0 deletions(-)

 diff --git a/memory.c b/memory.c
 index 4c3dc49..0201392 100644
 --- a/memory.c
 +++ b/memory.c
 @@ -1639,7 +1639,20 @@ void mtree_info(fprintf_function mon_printf, void *f)
  if (address_space_io.root 
  !QTAILQ_EMPTY(address_space_io.root-subregions)) {
  QTAILQ_INIT(ml_head);
 +
  mon_printf(f, I/O\n);
  mtree_print_mr(mon_printf, f, address_space_io.root, 0, 0, ml_head);
 +
 +/* print aliased I/O regions */
 +QTAILQ_FOREACH(ml, ml_head, queue) {
 +if (!ml-printed) {
 +mon_printf(f, %s\n, ml-mr-name);
 +mtree_print_mr(mon_printf, f, ml-mr, 0, 0, ml_head);
 +}
 +}
 +}
 +
 +QTAILQ_FOREACH_SAFE(ml, ml_head, queue, ml2) {
 +g_free(ml);
  }
  }

This is just duplication.  A little refactoring would eliminate it.

Also, we might have an alias from the memory address space and an alias
from the I/O address space pointing into the same region.  So maybe
alias printing should be consolidated for all address spaces.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [RFC PATCH 07/17] block: make high level discard operation always zero

2012-03-12 Thread Avi Kivity

On 03/12/2012 01:04 PM, Kevin Wolf wrote:
 
  qcow2 can't handle clusters that are referenced twice from the same L1
  table. This would require a reverse lookup to adjust the QCOW_O_COPIED
  flags in the L2 tables containing the other references.
  
  Don't follow, sorry.  What adjustment are you talking about?
  
  If it's a 1-0 transition, is it mandatory to adjust the flag?  That is,
  it it legal to have a refcount of exactly one, but have the flag clear?

 According to the spec it's illegal and qemu-img check will complain,
 too. In practice, I'm not entirely sure if it will cause real corruption
 or just unnecessary COWs. I believe it may be the latter.

We could retro-doc it but I'm not a huge fan of this practice.

 But it's not worth the trouble anyway when we can have a real zero flag
 in qcow3.

ok.  Thanks for the explanations.

-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH v2] ide: Adds model=s option, allowing the user to override the default disk model name QEMU HARDDISK

2012-03-12 Thread Floris Bos

Some Linux distributions use the 
/dev/disk/by-id/scsi-SATA_name-of-disk-model_serial addressing scheme
when refering to partitions in /etc/fstab and elsewhere.
This causes problems when starting a disk image taken from an existing physical 
server under qemu,
because when running under qemu name-of-disk-model is always QEMU HARDDISK
This patch introduces a model=s option which in combination with the existing 
serial=s option can be used to
fake the disk the operating system was previously on, allowing the OS to boot 
properly.

Cc: kw...@redhat.com
Signed-off-by: Floris Bos d...@noc-ps.com
---
 blockdev.c|5 +
 blockdev.h|2 ++
 hw/ide/core.c |   27 ++-
 hw/ide/internal.h |4 +++-
 hw/ide/qdev.c |   18 --
 qemu-config.c |4 
 qemu-options.hx   |4 +++-
 7 files changed, 55 insertions(+), 9 deletions(-)

diff --git a/blockdev.c b/blockdev.c
index d78aa51..3df9cb9 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -277,6 +277,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 const char *file = NULL;
 char devname[128];
 const char *serial;
+const char *model;
 const char *mediastr = ;
 BlockInterfaceType type;
 enum { MEDIA_DISK, MEDIA_CDROM } media;
@@ -313,6 +314,7 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 
 file = qemu_opt_get(opts, file);
 serial = qemu_opt_get(opts, serial);
+model = qemu_opt_get(opts, model);
 
 if ((buf = qemu_opt_get(opts, if)) != NULL) {
 pstrcpy(devname, sizeof(devname), buf);
@@ -534,6 +536,9 @@ DriveInfo *drive_init(QemuOpts *opts, int default_to_scsi)
 dinfo-refcount = 1;
 if (serial)
 strncpy(dinfo-serial, serial, sizeof(dinfo-serial) - 1);
+if (model) {
+pstrcpy(dinfo-model, sizeof(dinfo-model), model);
+}
 QTAILQ_INSERT_TAIL(drives, dinfo, next);
 
 bdrv_set_on_error(dinfo-bdrv, on_read_error, on_write_error);
diff --git a/blockdev.h b/blockdev.h
index 260e16b..21eb4b5 100644
--- a/blockdev.h
+++ b/blockdev.h
@@ -18,6 +18,7 @@ void blockdev_mark_auto_del(BlockDriverState *bs);
 void blockdev_auto_del(BlockDriverState *bs);
 
 #define BLOCK_SERIAL_STRLEN 20
+#define BLOCK_MODEL_STRLEN 40
 
 typedef enum {
 IF_DEFAULT = -1,/* for use with drive_add() only */
@@ -37,6 +38,7 @@ struct DriveInfo {
 int media_cd;
 QemuOpts *opts;
 char serial[BLOCK_SERIAL_STRLEN + 1];
+char model[BLOCK_MODEL_STRLEN + 1];
 QTAILQ_ENTRY(DriveInfo) next;
 int refcount;
 };
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 4d568ac..7bd2810 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -101,7 +101,7 @@ static void ide_identify(IDEState *s)
 put_le16(p + 21, 512); /* cache size in sectors */
 put_le16(p + 22, 4); /* ecc bytes */
 padstr((char *)(p + 23), s-version, 8); /* firmware version */
-padstr((char *)(p + 27), QEMU HARDDISK, 40); /* model */
+padstr((char *)(p + 27), s-drive_model_str, 40); /* model */
 #if MAX_MULT_SECTORS  1
 put_le16(p + 47, 0x8000 | MAX_MULT_SECTORS);
 #endif
@@ -189,7 +189,7 @@ static void ide_atapi_identify(IDEState *s)
 put_le16(p + 21, 512); /* cache size in sectors */
 put_le16(p + 22, 4); /* ecc bytes */
 padstr((char *)(p + 23), s-version, 8); /* firmware version */
-padstr((char *)(p + 27), QEMU DVD-ROM, 40); /* model */
+padstr((char *)(p + 27), s-drive_model_str, 40); /* model */
 put_le16(p + 48, 1); /* dword I/O (XXX: should not be set on CDROM) */
 #ifdef USE_DMA_CDROM
 put_le16(p + 49, 1  9 | 1  8); /* DMA and LBA supported */
@@ -246,7 +246,7 @@ static void ide_cfata_identify(IDEState *s)
 padstr((char *)(p + 10), s-drive_serial_str, 20); /* serial number */
 put_le16(p + 22, 0x0004);  /* ECC bytes */
 padstr((char *) (p + 23), s-version, 8);  /* Firmware Revision */
-padstr((char *) (p + 27), QEMU MICRODRIVE, 40);/* Model number */
+padstr((char *) (p + 27), s-drive_model_str, 40);/* Model number */
 #if MAX_MULT_SECTORS  1
 put_le16(p + 47, 0x8000 | MAX_MULT_SECTORS);
 #else
@@ -1834,7 +1834,7 @@ static const BlockDevOps ide_cd_block_ops = {
 };
 
 int ide_init_drive(IDEState *s, BlockDriverState *bs, IDEDriveKind kind,
-   const char *version, const char *serial)
+   const char *version, const char *serial, const char *model)
 {
 int cylinders, heads, secs;
 uint64_t nb_sectors;
@@ -1885,6 +1885,22 @@ int ide_init_drive(IDEState *s, BlockDriverState *bs, 
IDEDriveKind kind,
 snprintf(s-drive_serial_str, sizeof(s-drive_serial_str),
  QM%05d, s-drive_serial);
 }
+if (model) {
+pstrcpy(s-drive_model_str, sizeof(s-drive_model_str), model);
+} else {
+switch (kind) {
+case IDE_CD:
+strcpy(s-drive_model_str, QEMU DVD-ROM);
+break;
+case IDE_CFATA:
+strcpy(s-drive_model_str, QEMU

Re: [Qemu-devel] [PATCH] kvm: add flightrecorder script

2012-03-12 Thread Avi Kivity

On 03/09/2012 04:13 PM, Stefan Hajnoczi wrote:
 The kvm kernel module includes a number of trace events which can be
 useful when debugging system behavior.  Even on production systems these
 trace events can be used to observe guest behavior and identify the
 source of problems.

 The kvm_flightrecorder script is a command-line wrapper for the
 /sys/kernel/debug/tracing interface.  Kernel symbols do not need to be
 installed.

 This script captures a fixed-size buffer of KVM trace events.  Recent
 events overwrite the oldest events when the buffer size is exceeded and
 it is possible to leave KVM tracing enabled for any period of time with
 just a fixed-size buffer.  If the buffer is large enough this script is
 a useful tool for collecting detailed information after an issue occurs
 with a guest.  Hence the name flight recorder.

 The script can also be used in 'tail' mode to simply view KVM trace
 events as they occur.  This is handy for development and to ensure that
 the guest is indeed running.



Have you considered updating trace-cmd or perf-something instead?  While
each of these tools are useful we miss out on combining capabilities. 
For example you could 'perf probe' a dynamic tracepoint and flightrecord
it together with kvm tracepoints.

-- 
error compiling committee.c: too many arguments to function

Re: [Qemu-devel] [PATCH RESEND v5 3/6] Set runstate to INMIGRATE earlier

2012-03-12 Thread Stefano Stabellini

On Wed, 7 Mar 2012, Luiz Capitulino wrote:
 On Tue, 28 Feb 2012 15:51:32 +
 Stefano Stabellini stefano.stabell...@eu.citrix.com wrote:
 
  Set runstate to RUN_STATE_INMIGRATE as soon as we can on resume.
  
  Signed-off-by: Stefano Stabellini stefano.stabell...@eu.citrix.com
 
 Would be nice to explain why, but I suggested this myself so:
 
  Acked-by: Luiz Capitulino lcapitul...@redhat.com

Thanks!
It would be nice if somebody took the time to review patch #1 and #2 too
considering that they have been sitting there for a while...

Re: [Qemu-devel] [PATCH] ide: Adds model=s option, allowing the user to override the default disk model name QEMU HARDDISK

2012-03-12 Thread Floris Bos / Maxnet


On 03/12/2012 12:57 PM, Kevin Wolf wrote:

Am 10.03.2012 20:56, schrieb Floris Bos:

Some Linux distributions use the 
/dev/disk/by-id/scsi-SATA_name-of-disk-model_serial addressing scheme
when refering to partitions in /etc/fstab and elsewhere.
This causes problems when starting a disk image taken from an existing physical 
server under qemu,
because when running under qemu name-of-disk-model is always QEMU HARDDISK
This patch introduces a model=s option which in combination with the existing 
serial=s option can be used to
fake the disk the operating system was previously on, allowing the OS to boot 
properly.

Cc: kw...@redhat.com
Signed-off-by: Floris Bosd...@noc-ps.com

Oh, and now that I look at the actual patch, do we really want to add
the model=... option to -drive? I think just the qdev property may be
enough. When you need to set the option, you would need to use -device,
but it's not that hard.


Well, to me it makes sense to put it at the same places serial is, as 
both options have quite similar functions: faking drive attributes.


--
Yours sincerely,

Floris Bos

Re: [Qemu-devel] [PATCH v1 01/12] net: Add a hub net client

2012-03-12 Thread Zhi Yong Wu

On Mon, Mar 12, 2012 at 5:12 PM, Paolo Bonzini pbonz...@redhat.com wrote:
 Il 12/03/2012 09:59, Zhi Yong Wu ha scritto:
 However, then I noticed that qemu_can_send_packet is not called very
 much, and I do not understand why qemu_net_queue_send and
 qemu_net_queue_send_iov do not call qemu_can_send_packet before calling
 deliver/deliver_iov.

 If they did, hubs could then do their own flow control via can_receive.
 When qemu_send_packet returns zero you increment a count of in-flight
 packets, and a sent-packet callback would decrement the same count.

 sent-packet callback is sent_cb here? i noticed that sent_cb is currently 
 NULL.

 Yes.

 When the count is non-zero, can_receive returns false (and vice versa).
 Can you elaborate this? since can_receive is called finally by 
 qemu_send_packet,
 how can can_receive return false or true based on the count?

 Based on counts from the previous sends.
thanks.

I agree with your opition, but hope to get stefan's comments.

 Paolo



-- 
Regards,

Zhi Yong Wu

Re: [Qemu-devel] [PATCH 2/2] tracetool: Forbid argument name 'next'

2012-03-12 Thread Lluís Vilanova

Stefan Hajnoczi writes:

 On Mon, Mar 12, 2012 at 9:34 AM, Kevin Wolf kw...@redhat.com wrote:
 It has happened more than once that patches that look perfectly sane
 and work with simpletrace broke systemtap because they use 'next' as an
 argument name for a tracing function. However, 'next' is a keyword for
 systemtap, so we shouldn't use it.
 
 Signed-off-by: Kevin Wolf kw...@redhat.com
 ---
  scripts/tracetool |    4 
  1 files changed, 4 insertions(+), 0 deletions(-)
 
 diff --git a/scripts/tracetool b/scripts/tracetool
 index 4c9951d..f892af4 100755
 --- a/scripts/tracetool
 +++ b/scripts/tracetool
 @@ -81,6 +81,10 @@ get_args()
     args=${1#*\(}
     args=${args%%\)*}
     echo $args
 +
 +    if (echo $args | grep [ *]next\($\|[, ]\)  /dev/null 21); then
 +        echo -e \n#error 'next' is a bad argument name (clash with 
 systemtap keyword)\n 
 +    fi

 Good idea, let's prevent it from being used.

 I don't think this is the way to do it because callers will parse
 stdout and we're not guaranteed to be generating C code where #error
 works.  Instead, we can echo to stderr and do exit 1.

I'd rather wait for the python version of tracetool to be integrated, so that
less patches have to be rebased.

In addition, there's a nice 'error' routine to handle this type of cases.


Lluis

-- 
 And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer.
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Gerd Hoffmann

  Hi,

 Is there a complete list of the session state we need to save?

 
 There is still code in spice-server for the old seamless migration,
 someone could go through that and use that as an initial list of
 session state we need to save.

That doesn't help much as it is _way_ too old.  Predates surfaces  wan
support which needs additional state.  Predates smardcard and usb too.
Also agent didn't have bulky stuff (cut+paste) back then, so chances are
not bad that just not saving any agent state works in 99.99% of the
cases just fine, so I'm not sure this is handled at all.

Also some bits are are not needed any more.  Transfering the ticket has
been offloaded that to management.  QXL device handles some bits which
used to be transfered with the spice-server state (mouse pointer shape).

 Agreed, so for starts someone needs to make a list of all
 session state we need to save.

Here is what I'm aware of:

  * session id (needed when sending state via vmstate, I think we
don't need it when sending state via spice client).
  * surfaces known to the client (can also be negotiated between
client and target directly).
  * surface state (lossy vs. lossless quality if wan support is
enabled).  Dunno whenever the client knows this.
  * glz compression dictionary state (not sure what exactly is
transfered here and why).
  * vmchannel state (agent, smartcard, usb).

agent is tricky because spice-server needs to maintain state there
because of the message multiplexing.  A fixed number of fields and maybe
a VD_AGENT_MAX_DATA_SIZE-sized buffer could work for that though.

smartcard + usb: This is just pass-through for spice-server, right?
There shouldn't be anything to save, except maybe for stuff buffered in
spice-server.  Is there any?  I mean really in spice-server, migrating
spice-qemu-char.c buffers via vmstate is not a big issue.

cheers,
  Gerd

Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations

2012-03-12 Thread Paolo Bonzini

Il 10/03/2012 19:02, Richard Laager ha scritto:
 I propose adding the following behaviors in any event:
   * If a QEMU block device reports a discard_granularity  0, it
 must be equal to 2^n (n = 0), or QEMU's block core will change
 it to 0. (Non-power-of-two granularities are not likely to exist
 in the real world, and this assumption greatly simplifies
 ensuring correctness.)

Yeah, I was considering this to be simply a bug in the block device.

   * For SCSI, report an unmap_granularity to the guest as follows:
   max(logical_block_size, discard_granularity) / logical_block_size

This is more or less already in place later in the series.

 As a design concept, instead of guaranteeing that 512B zero'ing discards
 are supported, I think the QEMU block layer should instead guarantee
 aligned discards to QEMU block devices, emulating any misaligned
 discards (or portions thereof) by writing zeroes if (and only if)
 discard_zeros_data is set.

Yes, this can be done of course.  This series does not include it yet.

 This leaves one remaining issue: In raw-posix.c, for files (i.e. not
 devices), I assume you're going to advertise discard_granularity=1 and
 discard_zeros_data=1 when compiled with support for
 fallocate(FALLOC_FL_PUNCH_HOLE). Note, I'm assuming fallocate() actually
 guarantees that it zeros the data when punching holes.

It does, that's pretty much the definition of a hole.

 If the guest does a big discard (think mkfs) and fallocate() returns
 EOPNOTSUPP, you'll have to zero essentially the whole virtual disk,
 which, as you noted, will also allocate it (unless you explicitly check
 for holes). This is bad. It can be avoided by not advertising
 discard_zeros_data, but as you noted, that's unfortunate.

If you have a new kernel that supports SEEK_HOLE/SEEK_DATA, it can also
be done by skipping the zero write on known holes.

This could even be done at the block layer level using bdrv_is_allocated.

 If we could probe for FALLOC_FL_PUNCH_HOLE support, then we could avoid
 advertising discard support based on FALLOC_FL_PUNCH_HOLE when it is not
 going to work. This would side step these problems. 

... and introduce others when migrating if your datacenter doesn't have
homogeneous kernel versions and/or filesystems. :(

 You said it wasn't
 possible to probe for FALLOC_FL_PUNCH_HOLE. Have you considered probing
 by extending the file by one byte and then punching that:
 char buf = 0;
 fstat(s-fd, st);
 pwrite(s-fd, buf, 1, st.st_size + 1);
 has_discard = !fallocate(s-fd, FALLOC_FL_PUNCH_HOLE | 
 FALLOC_FL_KEEP_SIZE,
  st.st_size + 1, 1);
 ftruncate(s-fd, st.st_size);

Nice trick. :)   Yes, that could work.

Do you know if non-Linux operating systems have something similar to
BLKDISCARDZEROES?

Paolo

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Alon Levy

On Mon, Mar 12, 2012 at 12:34:42PM +0100, Gerd Hoffmann wrote:
 On 03/12/12 12:29, Alon Levy wrote:
  On Mon, Mar 12, 2012 at 11:26:50AM +0100, Gerd Hoffmann wrote:
Hi,
 
  Migrate this struct n times for me.
 
  I think for the agent case this isn't needed.  Or is every client
  allowed to speak to the agent in case of multiple clients connected?  I
  somehow doubt this can work as the agent protocol can't multicast ...
 
  
  Actually the agent protocol does extend nicely to multiple clients - I
  forgot the name but there is an additional wrapper between the
  client/server originating message and the guest received message, that
  is currently used for server or client originating messages, and can be
  reused to have multiple in flight different client messages.
 
 I think you'll have issues in the layer above though.  Two spice clients
 doing cut+paste operations at the same time?  Two spice clients
 requesting different screen resolutions?

Yeah, you're right of course, this needs to be dealt with somehow.
 cut+paste: maps nicely to a number of different buffers. Would need
 some policy, and the session agent becomes closer to a buffer manager.
 resolutions: again policy, perhaps have a master client, or if none
 defined let the last or just the first choose. Not sure.

But these issues don't need to be solved now, do they?

 
 cheers,
   Gerd
 
 ___
 Spice-devel mailing list
 spice-de...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/spice-devel

[Qemu-devel] [PATCH 7/7] vga: add trace event for ppm_save

2012-03-12 Thread Stefan Hajnoczi

From: Alon Levy al...@redhat.com

Signed-off-by: Alon Levy al...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 hw/vga.c |2 ++
 trace-events |3 +++
 2 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/hw/vga.c b/hw/vga.c
index 5994f43..6dc98f6 100644
--- a/hw/vga.c
+++ b/hw/vga.c
@@ -30,6 +30,7 @@
 #include pixel_ops.h
 #include qemu-timer.h
 #include xen.h
+#include trace.h
 
 //#define DEBUG_VGA
 //#define DEBUG_VGA_MEM
@@ -2372,6 +2373,7 @@ int ppm_save(const char *filename, struct DisplaySurface 
*ds)
 int ret;
 char *linebuf, *pbuf;
 
+trace_ppm_save(filename, ds);
 f = fopen(filename, wb);
 if (!f)
 return -1;
diff --git a/trace-events b/trace-events
index 94c4a6f..dfe28ed 100644
--- a/trace-events
+++ b/trace-events
@@ -662,3 +662,6 @@ dma_map_wait(void *dbs) dbs=%p
 # console.h
 displaysurface_free(void *display_state, void *display_surface) state=%p 
surface=%p
 displaysurface_resize(void *display_state, void *display_surface, int width, 
int height) state=%p surface=%p %dx%d
+
+# vga.c
+ppm_save(const char *filename, void *display_surface) %s surface=%p
-- 
1.7.9.1

[Qemu-devel] [PATCH 3/7] trace: make trace_thread_create() use its function arg

2012-03-12 Thread Stefan Hajnoczi

From: Jun Koi junkoi2...@gmail.com

This patch makes trace_thread_create() to use its function arg to
initialize thread.  The other choice is to make this a function to use
void arg, but i prefer this way.

Signed-off-by: Jun Koi junkoi2...@gmail.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 trace/simple.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/trace/simple.c b/trace/simple.c
index bbc9930..33ae486 100644
--- a/trace/simple.c
+++ b/trace/simple.c
@@ -363,7 +363,7 @@ static GThread *trace_thread_create(GThreadFunc fn)
 sigfillset(set);
 pthread_sigmask(SIG_SETMASK, set, oldset);
 #endif
-thread = g_thread_create(writeout_thread, NULL, FALSE, NULL);
+thread = g_thread_create(fn, NULL, FALSE, NULL);
 #ifndef _WIN32
 pthread_sigmask(SIG_SETMASK, oldset, NULL);
 #endif
-- 
1.7.9.1

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread David Jaša

Hans de Goede píše v Po 12. 03. 2012 v 09:51 +0100:
 Hi,
 
 On 03/12/2012 08:57 AM, Gerd Hoffmann wrote:
 Hi,
 
  What about the second part? it's independant of the async issue.
 
  Isn't this a client problem?  The client has this state, no?
 
  It is state of the client-  server session.  Today spice client
  creates a new session on migration, so there is simply no need to
  maintain any state.  Drawback is that everything needs to be resent from
  the server to the client.  Thats why we want be able to continue the
  spice session, so the client caches will stay valid.
 
  Of course the spice-server on the migration target needs the session
  state for that, i.e. know for example which bits the client has cached
  which it hasn't.
 
  If the state is stored in the server, wouldn't it be marshaled as part
  of the server's migration state?
 
  spice-server is stateless today when it comes to migration.  QXL handles
  all (device) state, by keeping track of some commands (such as
  create/destroy surface) which it needs to transfer on migration, and by
  asking spice-server to render all surfaces on migration, which
  effectively flushes the spice server state to qxl device memory.
 
  To transfer the client session state there are basically two options:
 
 (a) transfer it as part of the qemu migration data stream.  I don't
 want have any details about the qemu migration implementation
 and/or protocol in the spice-server library api, which basically
 leaves a ugly transfer-this-blob-for-me-please style interface
 as only option.
 
 (b) transfer it as part of the spice protocol.  As the spice
 client has a connection to both source and target while the
 migration runs we can send session state from the source host via
 spice client to the target host.  This needs some form of
 synchronization, to make sure both vmstate and spice migration
 are completed when qemu on the source machine quits.
 
 The problem with (b) is, that iirc the way b was implemented in the past
 was still the big blob approach, but then pass the blob through the client,
 which means an evil client could modify it, causing all sorts of interesting
 behavior inside spice-server. Since we're re-implementing this to me the
 send a blob through the client approach is simply not acceptable from a
 security pov, also see my previous mail in this thread.
 

In addition to security POV, it's also bad from network usage POV -
while network connection among hosts is gigabit or better, client may be
connected over high-latency low-bandwidth WAN. Sending any data through
client makes absolutely no sense in such cases.

David

  I think (b) is the better approach.
 
 I disagree. Note that there is more info to send over then just which
 surfaces / images are cached by the client. There also is things like
 partial complete agent channel messages, including how much bytes must be read
 to complete the command, etc.
 
 IMHO (b) would only be acceptable if the data send through the client stops
 becoming a blob. Instead the client could simply send a list of all surface 
 ids,
 etc. which it has cached after it connects to / starts using the new host. 
 Note
 that the old hosts needs to send nothing for this, this is info the client 
 already
 has, also removing the need for synchronization. As for certain other data, 
 such
 as (but not limited to) partially parsed agent messages, these should be
 send through the regular vmstate methods IMHO.
 
 So I see 2 options
 
 1) Do (a), sending everything that way
 2) Do (a) sending non client state that way; and
 let the client send state like which surfaces it has cached
 when the new session starts.
 
 Regards,
 
 Hans
 ___
 Spice-devel mailing list
 spice-de...@lists.freedesktop.org
 http://lists.freedesktop.org/mailman/listinfo/spice-devel

-- 

David Jaša, RHCE

SPICE QE based in Brno
GPG Key: 22C33E24 
Fingerprint: 513A 060B D1B4 2A72 7F0D 0278 B125 CD00 22C3 3E24

[Qemu-devel] KVM call agenda for Tuesday 13

2012-03-12 Thread Juan Quintela


Hi

Please send in any agenda items you are interested in covering.

Cheers,

Juan.

Re: [Qemu-devel] [PATCH RESEND v5 1/6] cirrus_vga: do not reset videoram

2012-03-12 Thread Avi Kivity

On 02/28/2012 05:51 PM, Stefano Stabellini wrote:
 There is no need to set the videoram to 0xff in cirrus_reset, because it
 is the BIOS' job.


Reviewed-by: Avi Kivity a...@redhat.com

-- 
error compiling committee.c: too many arguments to function

[Qemu-devel] [PATCH 4/7] docs: correct ./configure line in tracing.txt

2012-03-12 Thread Stefan Hajnoczi

From: Jun Koi junkoi2...@gmail.com

This patch corrects the configure's trace option in docs/tracing.txt.

Signed-off-by: Jun Koi junkoi2...@gmail.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 docs/tracing.txt |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/docs/tracing.txt b/docs/tracing.txt
index a92716f..c541133 100644
--- a/docs/tracing.txt
+++ b/docs/tracing.txt
@@ -9,7 +9,7 @@ for debugging, profiling, and observing execution.
 
 1. Build with the 'simple' trace backend:
 
-./configure --trace-backend=simple
+./configure --enable-trace-backend=simple
 make
 
 2. Create a file with the events you want to trace:
-- 
1.7.9.1

Re: [Qemu-devel] [Spice-devel] seamless migration with spice

2012-03-12 Thread Yonit Halperin


On 03/12/2012 11:46 AM, Gerd Hoffmann wrote:

   Hi,


The problem with (b) is, that iirc the way b was implemented in the past
was still the big blob approach, but then pass the blob through the client,
which means an evil client could modify it, causing all sorts of
interesting
behavior inside spice-server. Since we're re-implementing this to me the
send a blob through the client approach is simply not acceptable from a
security pov, also see my previous mail in this thread.


Agree.  It should be a normal spice message which goes through the spice
marshaller for parsing  sanity checking.


I disagree. Note that there is more info to send over then just which
surfaces / images are cached by the client. There also is things like
partial complete agent channel messages, including how much bytes must
be read
to complete the command, etc.


Is there a complete list of the session state we need to save?


IMHO (b) would only be acceptable if the data send through the client stops
becoming a blob.


Using (a) to send a blob isn't better.


Gerd/Hans,

Can you explain/exemplify, why sending data as a blob (either by (a) or 
(b)), that is verified only by the two ends that actually use it, is a 
problem?
Lets say the client/qemu are completely aware of the migration data, 
what prevent it from harming it then?



Instead the client could simply send a list of all
surface ids,
etc. which it has cached after it connects to / starts using the new
host. Note
that the old hosts needs to send nothing for this, this is info the
client already
has, also removing the need for synchronization.


Yes, some session state is known to the client anyway so we don't need a
source-  target channel for them.


As for certain other
data, such
as (but not limited to) partially parsed agent messages, these should be
send through the regular vmstate methods IMHO.


That isn't easy to handle via vmstate, at least as soon as this goes
beyond a fixed number of fields aka 'migrate over this struct for me'.
Think multiple spice clients connected at the same time.


1) Do (a), sending everything that way
2) Do (a) sending non client state that way; and
let the client send state like which surfaces it has cached
when the new session starts.


I think we have to look at each piece of state information needed by the
target and look how to handle this best.

cheers,
   Gerd

___
Spice-devel mailing list
spice-de...@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/spice-devel

[Qemu-devel] [PATCH 5/7] maintainers: Add docs/tracing.txt to Tracing

2012-03-12 Thread Stefan Hajnoczi

From: Andreas Färber afaer...@suse.de

The topic of whether and by whom docs/tracing.txt is maintained was
brought up. It currently does not have an official maintainer.

Add it to the tracing section so that Stefan gets cc'ed on patches.

Signed-off-by: Andreas Färber afaer...@suse.de
Acked-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
Cc: Peter Maydell peter.mayd...@linaro.org
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 MAINTAINERS |1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index d249947..f83d07c2 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -548,6 +548,7 @@ Tracing
 M: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
 S: Maintained
 F: trace/
+F: docs/tracing.txt
 T: git://github.com/stefanha/qemu.git tracing
 
 Checkpatch
-- 
1.7.9.1

[Qemu-devel] [PULL 0/7] Tracing patches

2012-03-12 Thread Stefan Hajnoczi

First round of tracing patches.  I should have sent these out a long time ago.
Once these are out of the way I can review and handle bigger patches from Lluís
and Harsh.

The following changes since commit a348f108842fb928563865c9918642900cd0d477:

  Add missing const attributes for MemoryRegionOps (2012-03-11 11:40:15 +)

are available in the git repository at:

  git://github.com/stefanha/qemu.git tracing

for you to fetch changes up to 727500181a2b2470a676e021205d170ede23beb7:

  vga: add trace event for ppm_save (2012-03-12 10:30:27 +)


Alon Levy (2):
  console: add some trace events
  vga: add trace event for ppm_save

Andreas Färber (1):
  maintainers: Add docs/tracing.txt to Tracing

Jun Koi (2):
  trace: make trace_thread_create() use its function arg
  docs: correct ./configure line in tracing.txt

Lluís Vilanova (1):
  trace: Provide a per-event status define for conditional compilation

Stefan Hajnoczi (1):
  tracetool: Omit useless QEMU_*_ENABLED() check

 MAINTAINERS   |1 +
 console.h |3 +++
 docs/tracing.txt  |   48 +---
 hw/vga.c  |2 ++
 scripts/tracetool |   13 +
 trace-events  |7 +++
 trace/simple.c|2 +-
 7 files changed, 64 insertions(+), 12 deletions(-)

-- 
1.7.9.1

[Qemu-devel] [PATCH 6/7] console: add some trace events

2012-03-12 Thread Stefan Hajnoczi

From: Alon Levy al...@redhat.com

Signed-off-by: Alon Levy al...@redhat.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 console.h|3 +++
 trace-events |4 
 2 files changed, 7 insertions(+), 0 deletions(-)

diff --git a/console.h b/console.h
index a95b581..4334db5 100644
--- a/console.h
+++ b/console.h
@@ -5,6 +5,7 @@
 #include qdict.h
 #include notify.h
 #include monitor.h
+#include trace.h
 
 /* keyboard/mouse support */
 
@@ -202,11 +203,13 @@ static inline DisplaySurface* 
qemu_create_displaysurface(DisplayState *ds, int w
 
 static inline DisplaySurface* qemu_resize_displaysurface(DisplayState *ds, int 
width, int height)
 {
+trace_displaysurface_resize(ds, ds-surface, width, height);
 return ds-allocator-resize_displaysurface(ds-surface, width, height);
 }
 
 static inline void qemu_free_displaysurface(DisplayState *ds)
 {
+trace_displaysurface_free(ds, ds-surface);
 ds-allocator-free_displaysurface(ds-surface);
 }
 
diff --git a/trace-events b/trace-events
index c5d0f0f..94c4a6f 100644
--- a/trace-events
+++ b/trace-events
@@ -658,3 +658,7 @@ dma_aio_cancel(void *dbs) dbs=%p
 dma_complete(void *dbs, int ret, void *cb) dbs=%p ret=%d cb=%p
 dma_bdrv_cb(void *dbs, int ret) dbs=%p ret=%d
 dma_map_wait(void *dbs) dbs=%p
+
+# console.h
+displaysurface_free(void *display_state, void *display_surface) state=%p 
surface=%p
+displaysurface_resize(void *display_state, void *display_surface, int width, 
int height) state=%p surface=%p %dx%d
-- 
1.7.9.1

[Qemu-devel] [PATCH 2/7] tracetool: Omit useless QEMU_*_ENABLED() check

2012-03-12 Thread Stefan Hajnoczi

SystemTap provides a semaphore that can optionally be tested before
executing a trace event.  The purpose of this mechanism is to skip
expensive tracing code when the trace event is disabled.

For example, some applications may have trace events that format or
convert strings for trace events.  This expensive processing should only
be done in the case where the trace event is enabled.

Since QEMU's generated trace events never have such special-purpose
code, there is no reason to add the semaphore check.

Reviewed-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
Signed-off-by: Stefan Hajnoczi stefa...@linux.vnet.ibm.com
---
 scripts/tracetool |4 +---
 1 files changed, 1 insertions(+), 3 deletions(-)

diff --git a/scripts/tracetool b/scripts/tracetool
index 701b517..65bd0a1 100755
--- a/scripts/tracetool
+++ b/scripts/tracetool
@@ -415,9 +415,7 @@ linetoh_dtrace()
 # Define an empty function for the trace event
 cat EOF
 static inline void trace_$name($args) {
-if (QEMU_${nameupper}_ENABLED()) {
-QEMU_${nameupper}($argnames);
-}
+QEMU_${nameupper}($argnames);
 }
 EOF
 }
-- 
1.7.9.1

Re: [Qemu-devel] [libvirt] Modern CPU models cannot be used with libvirt

2012-03-12 Thread Eduardo Habkost

On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
 On 03/11/2012 08:27 AM, Gleb Natapov wrote:
 On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
 Let's step back here.
 
 Why are you writing these patches?  It's probably not because you
 have a desire to say -cpu Westmere when you run QEMU on your laptop.
 I'd wager to say that no human has ever done that or that if they
 had, they did so by accident because they read documentation and
 thought they had to.

No, it's because libvirt doesn't handle all the tiny small details
involved in specifying a CPU. All libvirty knows about are a set of CPU
flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
but we would like to allow it to expose a Westmere-like CPU to the
guest.

libvirt does know how to use the Westmere CPU model today, if it is not
disabled by -nodefconfig. The interface it uses for probing has
deficiencies, but it works right now.


 Humans probably do one of two things: 1) no cpu option or 2) -cpu host.
 
 And both are not optimal. Actually both are bad. First one because
 default cpu is very conservative and the second because there is no
 guaranty that guest will continue to work after qemu or kernel upgrade.
 
 Let me elaborate about the later. Suppose host CPU has kill_guest
 feature and at the time a guest was installed it was not implemented by
 kvm. Since it was not implemented by kvm it was not present in vcpu
 during installation and the guest didn't install workaround kill_guest
 module. Now unsuspecting user upgrades the kernel and tries to restart
 the guest and fails. He writes angry letter to qemu-devel and is asked to
 reinstall his guest and move along.
 
 -cpu best wouldn't solve this.  You need a read/write configuration
 file where QEMU probes the available CPU and records it to be used
 for the lifetime of the VM.

If the CPU records are used for probing, this is yet another reason they
are not configuration, but defaults/templates to be used to build the
actual configuration.

IMHO, having to generate an opaque config file based on the results of
probing is poor interface design, for humans _and_ for machines. If we
have any bug on the probing, or on the data used as base for the
probing, or on the config generation, it will be impossible to deploy a
fix for the users.

This is why machine-types exist: you have the ability to implement
probing and/or sane defaults, but at the same time you can change the
probing behavior or the set of defaults without breaking existing
machines. This way, the config file contains only what the user really
wanted to configure, not some complex and opaque result of a probing
process.

Tthe fact that we have a _set_ of CPU definitions to choose from (or to
use as input for probing) instead of a single default CPU definition
that the user can change is a sign that that the cpudefs are _not_ user
configuration, but just templates/defaults.


[...]
 This discussion isn't about whether QEMU should have a Westmere
 processor definition.  In fact, I think I already applied that patch.
 
 It's a discussion about how we handle this up and down the stack.

Agreed on this point.

 
 The question is who should define and manage CPU compatibility.
 Right now QEMU does to a certain degree, libvirt discards this and
 does it's own thing, and VDSM/ovirt-engine assume that we're
 providing something and has built a UI around it.

libvirt doesn't discard this. If it just discarded this and properly
defined its own models, I wouldn't even have (re-)started this thread.

(Well, maybe I would have started a similar thread arguing that we are
wasting time working on equivalent known-to-work CPU model definitions
on Qemu and libvirt. Today we don't waste time doing it because libvirt
currently expects -nodefconfig to not disable the existing default
models).

 
 What I'm proposing we consider: have VDSM manage CPU definitions in
 order to provide a specific user experience in ovirt-engine.

I don't disagree completely with that. The problem is defining what's
CPU definitions. The current cpudef semantics is simply too low level,
it impacts other features that are _already_ managed by Qemu. Let me try
to enumerate:

- Some CPUID leafs are defined based on -smp;
- Some CPUID leafs depend on kernel capabilities;
- The availability of some CPUID leafs depend on some features
  being enabled or not, but they are simply not exposed if a proper
  'level' value is set.

We could have two approaches here: we can define some details of CPU
definitions as not configurable and others as must-be configurable,
and force management layer to agree with us about what should be
configurable or not.

Or, we could simply define that a sane set of CPU definitions are part
of a machine-type, and let managment to reconfigure parts of it if
desired, but do not force it to configure it if not needed.

 
 We would continue to have Westmere/etc in QEMU exposed as part of the
 user

Re: [Qemu-devel] [PATCH 2/4] block: image fragmentation statistics for qed

2012-03-12 Thread Stefan Hajnoczi

On Wed, Mar 7, 2012 at 9:22 AM, Dong Xu Wang wdon...@linux.vnet.ibm.com wrote:
 +    for (i = 0; i  table_noffsets; i++) {
 +        l2_offset = s-l1_table-offsets[i];
 +        if (l2_offset == 0) {

if (qed_offset_is_unalloc_cluster(l2_offset)) {

 +            continue;
 +        }
 +        ret = bdrv_pread(bs-file, l2_offset, table, size);
 +        if (ret  0) {
 +            qemu_vfree(table);
 +            return ret;
 +        }
 +        for (j = 0; j  size/sizeof(uint64_t); j++) {
 +            uint64_t *offset = (uint64_t *)(table + j);

No need to cast, table is already uint64_t*.

Since you don't write to the offset, I would just do:

uint64_t offset = table[j];

 +            if (*offset  cluster_size) {
 +                continue;
 +            }

if (!qed_check_cluster_offset(s, offset)) {
continue;
}

We will skip unallocated clusters and zero clusters.

Stefan

Re: [Qemu-devel] [RFC PATCH 06/17] block: use bdrv_{co, aio}_discard for write_zeroes operations

2012-03-12 Thread Kevin Wolf

Am 12.03.2012 13:27, schrieb Paolo Bonzini:
 Il 10/03/2012 19:02, Richard Laager ha scritto:
 I propose adding the following behaviors in any event:
   * If a QEMU block device reports a discard_granularity  0, it
 must be equal to 2^n (n = 0), or QEMU's block core will change
 it to 0. (Non-power-of-two granularities are not likely to exist
 in the real world, and this assumption greatly simplifies
 ensuring correctness.)
 
 Yeah, I was considering this to be simply a bug in the block device.

Block driver you mean? Yes, I think we can just assert() this.

Kevin

Re: [Qemu-devel] [libvirt] Modern CPU models cannot be used with libvirt

2012-03-12 Thread Daniel P. Berrange

On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
 On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
  On 03/11/2012 08:27 AM, Gleb Natapov wrote:
  On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
  Let's step back here.
  
  Why are you writing these patches?  It's probably not because you
  have a desire to say -cpu Westmere when you run QEMU on your laptop.
  I'd wager to say that no human has ever done that or that if they
  had, they did so by accident because they read documentation and
  thought they had to.
 
 No, it's because libvirt doesn't handle all the tiny small details
 involved in specifying a CPU. All libvirty knows about are a set of CPU
 flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
 but we would like to allow it to expose a Westmere-like CPU to the
 guest.

This is easily fixable in libvirt - so for the point of going discussion,
IMHO, we can assume libvirt will support level, family, xlevel, etc.


Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

Re: [Qemu-devel] [PATCH 1/4] block: add image fragmentation statistics to qemu-img

2012-03-12 Thread Stefan Hajnoczi

On Wed, Mar 7, 2012 at 9:22 AM, Dong Xu Wang wdon...@linux.vnet.ibm.com wrote:
 From: Dong Xu Wang wdon...@linux.vnet.ibm.com

 Discussion can be found at:
 http://patchwork.ozlabs.org/patch/128730/

 This patch add image fragmentation statistics while using qemu-img info.

 Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
 ---
  block.c     |   13 +
  block.h     |    7 +++
  block_int.h |    1 +
  qemu-img.c  |    9 +
  4 files changed, 30 insertions(+), 0 deletions(-)

 diff --git a/block.c b/block.c
 index 52ffe14..947607b 100644
 --- a/block.c
 +++ b/block.c
 @@ -2588,6 +2588,19 @@ int bdrv_get_info(BlockDriverState *bs, 
 BlockDriverInfo *bdi)
     return drv-bdrv_get_info(bs, bdi);
  }

 +int bdrv_get_fragment(BlockDriverState *bs, BlockFragInfo *bfi)

bdrv_get_fraginfo() makes it clearer that this function gets a summary
of fragmentation information.  bdrv_get_fragment() makes me think it
gets one specific fragment.

 +{
 +    BlockDriver *drv = bs-drv;
 +    if (!drv) {
 +        return -ENOMEDIUM;
 +    }
 +    if (!drv-bdrv_get_fragment) {
 +        return -ENOTSUP;
 +    }
 +    memset(bfi, 0, sizeof(*bfi));
 +    return drv-bdrv_get_fragment(bs, bfi);

For now this .drv_get_fraginfo() interface makes sense but if we merge
the QCOW2-QED in-place conversion patch series in the future it will
be possible to implement this in a generic way because image formats
will expose their guest - host mapping.

 @@ -1126,6 +1127,14 @@ static int img_info(int argc, char **argv)
             printf(cluster_size: %d\n, bdi.cluster_size);
         }
     }
 +    if (bdrv_get_fragment(bs, bfi) = 0) {

I think we need a separate sub-command for fragmentation info:

qemu-img fraginfo image-file

Utilities that invoke qemu-img info want it to be fast.  Reading all
metadata from a large image can take several seconds.  Since many
qemu-img info users don't need to see the fragmentation information,
it makes sense to put it in a new sub-command.

 +        if (bfi.total_clusters != 0  bfi.allocated_clusters != 0) {
 +            printf(%lld/%lld = %0.2f%% allocated, %0.2f%% fragmented\n,

Please use the PRId64 macro instead of %lld because some compilers
warn when uint64_t is not typedefed to long long int.

Stefan

Re: [Qemu-devel] [libvirt] Modern CPU models cannot be used with libvirt

2012-03-12 Thread Eduardo Habkost

On Sun, Mar 11, 2012 at 10:41:32AM -0500, Anthony Liguori wrote:
 On 03/11/2012 10:12 AM, Gleb Natapov wrote:
 On Sun, Mar 11, 2012 at 09:16:49AM -0500, Anthony Liguori wrote:
 If libvirt assumes anything about what kvm actually supports it is
 working only by sheer luck.
 
 Well the simple answer for libvirt is don't use -nodefconfig and
 then it can reuse the CPU definitions (including any that the user
 adds).
 CPU models should be usable even with -nodefconfig. CPU model is more
 like device. By -cpu Nehalem I am saying I want Nehalem device in my
 machine.
 
 Let's say we moved CPU definitions to /usr/share/qemu/cpu-models.xml.
 
 Obviously, we'd want a command line option to be able to change that
 location so we'd introduce -cpu-models PATH.
 
 But we want all of our command line options to be settable by the
 global configuration file so we would have a cpu-model=PATH to the
 configuration file.
 
 But why hard code a path when we can just set the default path in the
 configuration file so let's avoid hard coding and just put
 cpu-models=/usr/share/qemu/cpu-models.xml in the default
 configuration file.

We wouldn't do the above.

-nodefconfig should disable the loading of files on /etc, but it
shouldn't disable loading internal non-configurable data that we just
happened to choose to store outside the qemu binary because it makes
development easier.

Really, the requirement of a default configuration file is a problem
by itself. Qemu should not require a default configuration file to work,
and it shouldn't require users to copy the default configuration file to
change options from the default.

Doing this would make it impossible to deploy fixes to users if we evern
find out that the default configuration file had a serious bug. What if
a bug in our default configuration file has a serious security
implication?

 
 But now when libvirt uses -nodefconfig, those models go away.
 -nodefconfig means start QEMU in the most minimal state possible.
 You get what you pay for if you use it.
 
 We'll have the same problem with machine configuration files.  At
 some point in time, -nodefconfig will make machine models disappear.

It shouldn't. Machine-types are defaults to be used as base, they are
not user-provided configuration. And the fact that we decided to store
some data outside of the Qemu binary is orthogonal the design decisions
in the Qemu command-line and configuration interface.

As I said previously, requiring generation of opaque config files (and
copy the default config file and change it is included on my
definition of generation of opaque config files) is poor design, IMO.
I bet this even has an entry in some design anti-pattern catalog
somewhere.

-- 
Eduardo

Re: [Qemu-devel] [PATCH 1/4] block: add image fragmentation statistics to qemu-img

2012-03-12 Thread Kevin Wolf

Am 12.03.2012 14:07, schrieb Stefan Hajnoczi:
 On Wed, Mar 7, 2012 at 9:22 AM, Dong Xu Wang wdon...@linux.vnet.ibm.com 
 wrote:
 From: Dong Xu Wang wdon...@linux.vnet.ibm.com

 Discussion can be found at:
 http://patchwork.ozlabs.org/patch/128730/

 This patch add image fragmentation statistics while using qemu-img info.

 Signed-off-by: Dong Xu Wang wdon...@linux.vnet.ibm.com
 ---
  block.c |   13 +
  block.h |7 +++
  block_int.h |1 +
  qemu-img.c  |9 +
  4 files changed, 30 insertions(+), 0 deletions(-)

 diff --git a/block.c b/block.c
 index 52ffe14..947607b 100644
 --- a/block.c
 +++ b/block.c
 @@ -2588,6 +2588,19 @@ int bdrv_get_info(BlockDriverState *bs, 
 BlockDriverInfo *bdi)
 return drv-bdrv_get_info(bs, bdi);
  }

 +int bdrv_get_fragment(BlockDriverState *bs, BlockFragInfo *bfi)
 
 bdrv_get_fraginfo() makes it clearer that this function gets a summary
 of fragmentation information.  bdrv_get_fragment() makes me think it
 gets one specific fragment.
 
 +{
 +BlockDriver *drv = bs-drv;
 +if (!drv) {
 +return -ENOMEDIUM;
 +}
 +if (!drv-bdrv_get_fragment) {
 +return -ENOTSUP;
 +}
 +memset(bfi, 0, sizeof(*bfi));
 +return drv-bdrv_get_fragment(bs, bfi);
 
 For now this .drv_get_fraginfo() interface makes sense but if we merge
 the QCOW2-QED in-place conversion patch series in the future it will
 be possible to implement this in a generic way because image formats
 will expose their guest - host mapping.

We can probably resurrect Devin's patches for this part even now instead
of introducing an interface that we don't really want.

 
 @@ -1126,6 +1127,14 @@ static int img_info(int argc, char **argv)
 printf(cluster_size: %d\n, bdi.cluster_size);
 }
 }
 +if (bdrv_get_fragment(bs, bfi) = 0) {
 
 I think we need a separate sub-command for fragmentation info:
 
 qemu-img fraginfo image-file
 
 Utilities that invoke qemu-img info want it to be fast.  Reading all
 metadata from a large image can take several seconds.  Since many
 qemu-img info users don't need to see the fragmentation information,
 it makes sense to put it in a new sub-command.

Yes. If we wanted to merge it into an existing qemu-img subcommand, I
think check would be the one, as it scans the whole image already today
and fragmentation is something that could be added fairly easily.

Kevin

[Qemu-devel] [PATCH/RFC 2/7] Allow a qemu_fopen_socket() to be opened for writing

2012-03-12 Thread Marc-André Lureau

---
 migration-tcp.c  |2 +-
 migration-unix.c |2 +-
 qemu-file.h  |2 +-
 savevm.c |   36 
 4 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/migration-tcp.c b/migration-tcp.c
index 35a5781..f567898 100644
--- a/migration-tcp.c
+++ b/migration-tcp.c
@@ -140,7 +140,7 @@ static void tcp_accept_incoming_migration(void *opaque)
 goto out2;
 }
 
-f = qemu_fopen_socket(c);
+f = qemu_fopen_socket(c, r);
 if (f == NULL) {
 fprintf(stderr, could not qemu_fopen socket\n);
 goto out;
diff --git a/migration-unix.c b/migration-unix.c
index 169de88..3928f4c 100644
--- a/migration-unix.c
+++ b/migration-unix.c
@@ -137,7 +137,7 @@ static void unix_accept_incoming_migration(void *opaque)
 goto out2;
 }
 
-f = qemu_fopen_socket(c);
+f = qemu_fopen_socket(c, r);
 if (f == NULL) {
 fprintf(stderr, could not qemu_fopen socket\n);
 goto out;
diff --git a/qemu-file.h b/qemu-file.h
index 31b83f6..efd6b1c 100644
--- a/qemu-file.h
+++ b/qemu-file.h
@@ -67,7 +67,7 @@ QEMUFile *qemu_fopen_ops(void *opaque, QEMUFilePutBufferFunc 
*put_buffer,
  QEMUFileGetRateLimit *get_rate_limit);
 QEMUFile *qemu_fopen(const char *filename, const char *mode);
 QEMUFile *qemu_fdopen(int fd, const char *mode);
-QEMUFile *qemu_fopen_socket(int fd);
+QEMUFile *qemu_fopen_socket(int fd, const char *mode);
 QEMUFile *qemu_popen(FILE *popen_file, const char *mode);
 QEMUFile *qemu_popen_cmd(const char *command, const char *mode);
 int qemu_stdio_fd(QEMUFile *f);
diff --git a/savevm.c b/savevm.c
index 80be1ff..4171ebf 100644
--- a/savevm.c
+++ b/savevm.c
@@ -205,6 +205,21 @@ static int socket_get_buffer(void *opaque, uint8_t *buf, 
int64_t pos, int size)
 return len;
 }
 
+static int socket_put_buffer(void *opaque, const uint8_t *buf, int64_t pos, 
int size)
+{
+QEMUFileSocket *s = opaque;
+ssize_t len;
+
+do {
+len = write(s-fd, buf, size);
+} while (len == -1  socket_error() == EINTR);
+
+if (len == -1)
+len = -socket_error();
+
+return len;
+}
+
 static int socket_close(void *opaque)
 {
 QEMUFileSocket *s = opaque;
@@ -316,7 +331,7 @@ QEMUFile *qemu_fdopen(int fd, const char *mode)
 if (!s-stdio_file)
 goto fail;
 
-if(mode[0] == 'r') {
+if (mode[0] == 'r') {
 s-file = qemu_fopen_ops(s, NULL, stdio_get_buffer, stdio_fclose, 
 NULL, NULL, NULL);
 } else {
@@ -330,13 +345,26 @@ fail:
 return NULL;
 }
 
-QEMUFile *qemu_fopen_socket(int fd)
+QEMUFile *qemu_fopen_socket(int fd, const char *mode)
 {
 QEMUFileSocket *s = g_malloc0(sizeof(QEMUFileSocket));
 
+if (mode == NULL ||
+   (mode[0] != 'r'  mode[0] != 'w') ||
+   mode[1] != 'b' || mode[2] != 0) {
+fprintf(stderr, qemu_fopen_socket: Argument validity check failed\n);
+return NULL;
+}
+
 s-fd = fd;
-s-file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close, 
-NULL, NULL, NULL);
+if (mode[0] == 'r') {
+s-file = qemu_fopen_ops(s, NULL, socket_get_buffer, socket_close,
+ NULL, NULL, NULL);
+} else {
+s-file = qemu_fopen_ops(s, socket_put_buffer, NULL, socket_close,
+ NULL, NULL, NULL);
+}
+
 return s-file;
 }
 
-- 
1.7.7.6

Re: [Qemu-devel] [libvirt] Modern CPU models cannot be used with libvirt

2012-03-12 Thread Gleb Natapov

On Mon, Mar 12, 2012 at 01:04:19PM +, Daniel P. Berrange wrote:
 On Mon, Mar 12, 2012 at 09:52:27AM -0300, Eduardo Habkost wrote:
  On Sun, Mar 11, 2012 at 09:12:58AM -0500, Anthony Liguori wrote:
   On 03/11/2012 08:27 AM, Gleb Natapov wrote:
   On Sat, Mar 10, 2012 at 12:24:47PM -0600, Anthony Liguori wrote:
   Let's step back here.
   
   Why are you writing these patches?  It's probably not because you
   have a desire to say -cpu Westmere when you run QEMU on your laptop.
   I'd wager to say that no human has ever done that or that if they
   had, they did so by accident because they read documentation and
   thought they had to.
  
  No, it's because libvirt doesn't handle all the tiny small details
  involved in specifying a CPU. All libvirty knows about are a set of CPU
  flag bits, but it knows nothing about 'level', 'family', and 'xlevel',
  but we would like to allow it to expose a Westmere-like CPU to the
  guest.
 
 This is easily fixable in libvirt - so for the point of going discussion,
 IMHO, we can assume libvirt will support level, family, xlevel, etc.
 
And fill in all cpuid leafs by querying /dev/kvm when needed or, if TCG
is used, replicating QEMU logic? And since QEMU should be usable without
libvirt the same logic should be implemented in QEMU anyway.

--
Gleb.

Re: [Qemu-devel] [PATCH] ide: Adds model=s option, allowing the user to override the default disk model name QEMU HARDDISK

2012-03-12 Thread Kevin Wolf

Am 12.03.2012 13:09, schrieb Floris Bos / Maxnet:
 On 03/12/2012 12:57 PM, Kevin Wolf wrote:
 Am 10.03.2012 20:56, schrieb Floris Bos:
 Some Linux distributions use the 
 /dev/disk/by-id/scsi-SATA_name-of-disk-model_serial addressing scheme
 when refering to partitions in /etc/fstab and elsewhere.
 This causes problems when starting a disk image taken from an existing 
 physical server under qemu,
 because when running under qemu name-of-disk-model is always QEMU HARDDISK
 This patch introduces a model=s option which in combination with the 
 existing serial=s option can be used to
 fake the disk the operating system was previously on, allowing the OS to 
 boot properly.

 Cc: kw...@redhat.com
 Signed-off-by: Floris Bosd...@noc-ps.com
 Oh, and now that I look at the actual patch, do we really want to add
 the model=... option to -drive? I think just the qdev property may be
 enough. When you need to set the option, you would need to use -device,
 but it's not that hard.
 
 Well, to me it makes sense to put it at the same places serial is, as 
 both options have quite similar functions: faking drive attributes.

-drive serial=... is older than -device, this is why it was okay back
then and why we still have to maintain it for compatibility. If
serial=... was a new patch today, I would probably be asking the same
question there.

Kevin

1 2 3 4 >

1 - 100 of 394 matches

Mail list logo